* [PATCH bpf-next 0/4] x86/fgraph,bpf: Fix ORC stack unwind from kprobe_multi
@ 2026-01-12 21:49 Jiri Olsa
From: Jiri Olsa @ 2026-01-12 21:49 UTC
To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf
Cc: Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
Song Liu, Andrii Nakryiko, Mahe Tardy
hi,
Mahe reported a missing function in the stack trace taken on top of a
kprobe multi program. It turned out the latest fix [1] needs some more
fixing.
thanks,
jirka
[1] https://lore.kernel.org/bpf/20251104215405.168643-1-jolsa@kernel.org/
---
Jiri Olsa (4):
x86/fgraph: Fix return_to_handler regs.rsp value
x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
selftests/bpf: Fix kprobe multi stacktrace_ips test
selftests/bpf: Allow to benchmark trigger with stacktrace
arch/x86/include/asm/ftrace.h | 2 +-
arch/x86/kernel/ftrace_64.S | 4 +++-
kernel/trace/fgraph.c | 2 +-
tools/testing/selftests/bpf/bench.c | 4 ++++
tools/testing/selftests/bpf/bench.h | 1 +
tools/testing/selftests/bpf/benchs/bench_trigger.c | 1 +
tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c | 3 ++-
tools/testing/selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++
8 files changed, 31 insertions(+), 4 deletions(-)
* [PATCH bpf-next 1/4] x86/fgraph: Fix return_to_handler regs.rsp value
From: Jiri Olsa @ 2026-01-12 21:49 UTC
To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf
Cc: Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
	Song Liu, Andrii Nakryiko, Mahe Tardy

The previous change (the commit in the Fixes tag) stored the wrong value
in the regs.rsp slot: at the point it is saved, %rsp is already adjusted
by FRAME_SIZE, while the unwinder needs the original value (the one in
place right after the UNWIND_HINT_FUNC hint).

Fixes: 20a0bc10272f ("x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/ftrace_64.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index a132608265f6..613be81b6e88 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -368,13 +368,15 @@ SYM_CODE_START(return_to_handler)
 	subq $8, %rsp
 	UNWIND_HINT_FUNC
 
+	movq %rsp, %rdi
+
 	/* Save ftrace_regs for function exit context */
 	subq $(FRAME_SIZE), %rsp
 
 	movq %rax, RAX(%rsp)
 	movq %rdx, RDX(%rsp)
 	movq %rbp, RBP(%rsp)
-	movq %rsp, RSP(%rsp)
+	movq %rdi, RSP(%rsp)
 
 	movq %rsp, %rdi
 	call ftrace_return_to_handler
-- 
2.52.0
* [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Jiri Olsa @ 2026-01-12 21:49 UTC
To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf
Cc: Mahe Tardy, Peter Zijlstra, bpf, linux-trace-kernel, x86,
	Yonghong Song, Song Liu, Andrii Nakryiko

Mahe reported a missing function in the stack trace taken on top of a
kprobe multi program. The missing function is the very first one in the
stack trace, the one the bpf program is attached to.

  # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
  Attaching 1 probe...

        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+118

  ('*' is used for kprobe_multi attachment)

The reason is that the previous change (the Fixes commit) fixed the
stack unwind for tracepoints, but removed the attached function address
from stack traces taken on top of kprobe multi programs, which I also
overlooked in the related test (fixed in the following patch).

Tracepoint and kprobe_multi have different stack setups, but use the
same unwind path. I think it's better to keep the previous change, which
fixed the tracepoint unwind, and instead change the kprobe multi unwind
as explained below.

The bpf program stack unwind calls perf_callchain_kernel for the kernel
portion, and it follows one of two unwind paths based on the
X86_EFLAGS_FIXED bit in pt_regs.flags. When the bit is set, we unwind
from the stack represented by the pt_regs argument; otherwise we unwind
the currently executed stack up to the 'first_frame' boundary.

The 'first_frame' value is taken from the regs.rsp value, but the
ftrace_caller and ftrace_regs_caller (ftrace trampoline) functions set
regs.rsp to the previous stack frame, so we skip the attached function
entry.

If we switch the kprobe_multi unwind to use the X86_EFLAGS_FIXED bit, we
can control the start of the unwind and get the attached function
address back. As another benefit we also cut the extra unwind cycles
needed to reach the 'first_frame' boundary.

The speedup can be measured with the trigger bench for a kprobe_multi
program with stacktrace support.

- without bpf_get_stackid call:

  # ./bench -w2 -d5 -a -p1 trig-kprobe-multi
  Summary: hits 0.857 ± 0.003M/s ( 0.857M/prod), drops 0.000 ± 0.000M/s, total operations 0.857 ± 0.003M/s

- with bpf_get_stackid call:

  # ./bench -w2 -d5 -a -g -p1 trig-kprobe-multi
  Summary: hits 1.302 ± 0.002M/s ( 1.302M/prod), drops 0.000 ± 0.000M/s, total operations 1.302 ± 0.002M/s

Note the '-g' option for stacktrace is added in the following change.

To recreate the same stack setup for the return probe as we have for the
entry probe, we set the instruction pointer to the attached function
address, which gets us the same unwind setup and the same stack trace.

With the fix, entry probe:

  # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
  Attaching 1 probe...

        __x64_sys_newuname+9
        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+118

return probe:

  # bpftrace -e 'kretprobe:__x64_sys_newuname* { print(kstack)}'
  Attaching 1 probe...

        __x64_sys_newuname+4
        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+118

Fixes: 6d08340d1e35 ("Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"")
Reported-by: Mahe Tardy <mahe.tardy@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/include/asm/ftrace.h | 2 +-
 kernel/trace/fgraph.c         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index b08c95872eed..c56e1e63b893 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -57,7 +57,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
 }
 
 #define arch_ftrace_partial_regs(regs) do { \
-	regs->flags &= ~X86_EFLAGS_FIXED; \
+	regs->flags |= X86_EFLAGS_FIXED; \
 	regs->cs = __KERNEL_CS; \
 } while (0)
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index cc48d16be43e..6279e0a753cf 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -825,7 +825,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 	}
 
 	if (fregs)
-		ftrace_regs_set_instruction_pointer(fregs, ret);
+		ftrace_regs_set_instruction_pointer(fregs, trace.func);
 
 	bit = ftrace_test_recursion_trylock(trace.func, ret);
 	/*
-- 
2.52.0
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Steven Rostedt @ 2026-01-12 22:07 UTC
To: Jiri Olsa
Cc: Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy, Peter Zijlstra, bpf,
	linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko

On Mon, 12 Jan 2026 22:49:38 +0100
Jiri Olsa <jolsa@kernel.org> wrote:

> To recreate the same stack setup for the return probe as we have for the
> entry probe, we set the instruction pointer to the attached function
> address, which gets us the same unwind setup and the same stack trace.
>
> With the fix, entry probe:
>
>   # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
>   Attaching 1 probe...
>
>         __x64_sys_newuname+9
>         do_syscall_64+134
>         entry_SYSCALL_64_after_hwframe+118
>
> return probe:
>
>   # bpftrace -e 'kretprobe:__x64_sys_newuname* { print(kstack)}'
>   Attaching 1 probe...
>
>         __x64_sys_newuname+4
>         do_syscall_64+134
>         entry_SYSCALL_64_after_hwframe+118

But is this really correct?

The stack trace of the return from __x64_sys_newuname is from offset "+4".

The stack trace from entry is from offset "+9". Isn't it confusing that
the offset is likely not from the return portion of that function?

-- Steve

> Fixes: 6d08340d1e35 ("Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"")
> Reported-by: Mahe Tardy <mahe.tardy@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Jiri Olsa @ 2026-01-13 11:43 UTC
To: Steven Rostedt
Cc: Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy, Peter Zijlstra, bpf,
	linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko

On Mon, Jan 12, 2026 at 05:07:57PM -0500, Steven Rostedt wrote:
> On Mon, 12 Jan 2026 22:49:38 +0100
> Jiri Olsa <jolsa@kernel.org> wrote:
>
> > With the fix, entry probe:
> >
> >   # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
> >   Attaching 1 probe...
> >
> >         __x64_sys_newuname+9
> >         do_syscall_64+134
> >         entry_SYSCALL_64_after_hwframe+118
> >
> > return probe:
> >
> >   # bpftrace -e 'kretprobe:__x64_sys_newuname* { print(kstack)}'
> >   Attaching 1 probe...
> >
> >         __x64_sys_newuname+4
> >         do_syscall_64+134
> >         entry_SYSCALL_64_after_hwframe+118
>
> But is this really correct?
>
> The stack trace of the return from __x64_sys_newuname is from offset "+4".
>
> The stack trace from entry is from offset "+9". Isn't it confusing that
> the offset is likely not from the return portion of that function?

right, makes sense.. so standard kprobe actually skips the attached
function (__x64_sys_newuname) on the return probe stacktrace.. perhaps
we should do the same for kprobe_multi

I managed to get that with the change below, but it's wrong wrt the arch
code, note the ftrace_regs_set_stack_pointer(fregs, stack + 8) .. will
try to figure out a better way once we agree on the solution

thanks,
jirka


---
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c56e1e63b893..b0e8ce4934e7 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -71,6 +71,9 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
 #define ftrace_regs_set_instruction_pointer(fregs, _ip) \
 	do { arch_ftrace_regs(fregs)->regs.ip = (_ip); } while (0)
 
+#define ftrace_regs_set_stack_pointer(fregs, _sp) \
+	do { arch_ftrace_regs(fregs)->regs.sp = (_sp); } while (0)
+
 
 static __always_inline unsigned long
 ftrace_regs_get_return_address(struct ftrace_regs *fregs)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6279e0a753cf..b1510c412dcb 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -717,7 +717,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned long func,
 /* Retrieve a function return address to the trace stack on thread info.*/
 static struct ftrace_ret_stack *
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
-			unsigned long frame_pointer, int *offset)
+			unsigned long *stack, unsigned long frame_pointer,
+			int *offset)
 {
 	struct ftrace_ret_stack *ret_stack;
 
@@ -762,6 +763,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 
 	*offset += FGRAPH_FRAME_OFFSET;
 	*ret = ret_stack->ret;
+	*stack = (unsigned long) ret_stack->retp;
 	trace->func = ret_stack->func;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = current->curr_ret_depth;
@@ -810,12 +812,13 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long bitmap;
+	unsigned long stack;
 	unsigned long ret;
 	int offset;
 	int bit;
 	int i;
 
-	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer, &offset);
+	ret_stack = ftrace_pop_return_trace(&trace, &ret, &stack, frame_pointer, &offset);
 
 	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
@@ -824,8 +827,11 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
 		return (unsigned long)panic;
 	}
 
-	if (fregs)
-		ftrace_regs_set_instruction_pointer(fregs, trace.func);
+	if (fregs) {
+		ftrace_regs_set_instruction_pointer(fregs, ret);
+		ftrace_regs_set_stack_pointer(fregs, stack + 8);
+	}
+
 
 	bit = ftrace_test_recursion_trylock(trace.func, ret);
 	/*
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c
index e1a9b55e07cb..852830536109 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c
@@ -74,12 +74,20 @@ static void test_stacktrace_ips_kprobe_multi(bool retprobe)
 
 	load_kallsyms();
 
-	check_stacktrace_ips(bpf_map__fd(skel->maps.stackmap), skel->bss->stack_key, 5,
-			     ksym_get_addr("bpf_testmod_stacktrace_test"),
-			     ksym_get_addr("bpf_testmod_stacktrace_test_3"),
-			     ksym_get_addr("bpf_testmod_stacktrace_test_2"),
-			     ksym_get_addr("bpf_testmod_stacktrace_test_1"),
-			     ksym_get_addr("bpf_testmod_test_read"));
+	if (retprobe) {
+		check_stacktrace_ips(bpf_map__fd(skel->maps.stackmap), skel->bss->stack_key, 4,
+				     ksym_get_addr("bpf_testmod_stacktrace_test_3"),
+				     ksym_get_addr("bpf_testmod_stacktrace_test_2"),
+				     ksym_get_addr("bpf_testmod_stacktrace_test_1"),
+				     ksym_get_addr("bpf_testmod_test_read"));
+	} else {
+		check_stacktrace_ips(bpf_map__fd(skel->maps.stackmap), skel->bss->stack_key, 5,
+				     ksym_get_addr("bpf_testmod_stacktrace_test"),
+				     ksym_get_addr("bpf_testmod_stacktrace_test_3"),
+				     ksym_get_addr("bpf_testmod_stacktrace_test_2"),
+				     ksym_get_addr("bpf_testmod_stacktrace_test_1"),
+				     ksym_get_addr("bpf_testmod_test_read"));
+	}
 
 cleanup:
 	stacktrace_ips__destroy(skel);
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Andrii Nakryiko @ 2026-01-15 18:52 UTC
To: Jiri Olsa
Cc: Steven Rostedt, Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy,
	Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
	Song Liu, Andrii Nakryiko

On Tue, Jan 13, 2026 at 3:43 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Jan 12, 2026 at 05:07:57PM -0500, Steven Rostedt wrote:
> > But is this really correct?
> >
> > The stack trace of the return from __x64_sys_newuname is from offset "+4".
> >
> > The stack trace from entry is from offset "+9". Isn't it confusing that
> > the offset is likely not from the return portion of that function?
>
> right, makes sense.. so standard kprobe actually skips the attached
> function (__x64_sys_newuname) on the return probe stacktrace.. perhaps
> we should do the same for kprobe_multi

but it is quite nice to see what function we were kretprobing,
actually... How hard would it be to support that for singular kprobe as
well? And what does fexit's stack trace show for such case?

> I managed to get that with the change below, but it's wrong wrt the arch
> code, note the ftrace_regs_set_stack_pointer(fregs, stack + 8) .. will
> try to figure out a better way once we agree on the solution
>
> thanks,
> jirka
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Jiri Olsa @ 2026-01-16 16:25 UTC
To: Andrii Nakryiko
Cc: Jiri Olsa, Steven Rostedt, Masami Hiramatsu, Josh Poimboeuf,
	Mahe Tardy, Peter Zijlstra, bpf, linux-trace-kernel, x86,
	Yonghong Song, Song Liu, Andrii Nakryiko

On Thu, Jan 15, 2026 at 10:52:04AM -0800, Andrii Nakryiko wrote:
> On Tue, Jan 13, 2026 at 3:43 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Jan 12, 2026 at 05:07:57PM -0500, Steven Rostedt wrote:
> > > But is this really correct?
> > >
> > > The stack trace of the return from __x64_sys_newuname is from offset "+4".
> > >
> > > The stack trace from entry is from offset "+9". Isn't it confusing that
> > > the offset is likely not from the return portion of that function?
> >
> > right, makes sense.. so standard kprobe actually skips the attached
> > function (__x64_sys_newuname) on the return probe stacktrace.. perhaps
> > we should do the same for kprobe_multi
>
> but it is quite nice to see what function we were kretprobing,
> actually...

IIUC Steven doesn't like the wrong +offset that comes from the entry
probe, maybe we could have func+(ADDRESS_OF_RET+1) ..but not sure how
hard that would be

still.. you always have the attached function ip when you get the
stacktrace, so I'm not sure how useful it is to have it in the
stacktrace as well.. you can always add it yourself

> How hard would it be to support that for singular kprobe
> as well? And what does fexit's stack trace show for such case?

I think we will get the bpf_program address, so we see the attached
function in the stacktrace, will check

thanks,
jirka
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Andrii Nakryiko @ 2026-01-16 22:24 UTC
To: Jiri Olsa
Cc: Steven Rostedt, Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy,
	Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
	Song Liu, Andrii Nakryiko

On Fri, Jan 16, 2026 at 8:25 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> IIUC Steven doesn't like the wrong +offset that comes from the entry
> probe, maybe we could have func+(ADDRESS_OF_RET+1) ..but not sure how
> hard that would be
>
> still.. you always have the attached function ip when you get the
> stacktrace, so I'm not sure how useful it is to have it in the
> stacktrace as well.. you can always add it yourself

You can, but that's a custom thing that every single tool has to
implement. Having the traced function right there in the stack would be
so nice and convenient.

I don't insist, but I'm just saying that practically speaking this would
make sense. Even conceptually, a kretprobe is (logically) called from
the traced function right before exit. In reality it's not exactly like
that and we don't know where the ret happened, but having the traced
function in a kretprobe's stack trace is more useful than confusing,
IMO.

But again, I just found it interesting that we could have it, if we
wanted to.

> > How hard would it be to support that for singular kprobe
> > as well? And what does fexit's stack trace show for such case?
>
> I think we will get the bpf_program address, so we see the attached
> function in the stacktrace, will check
>
> thanks,
> jirka
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Steven Rostedt @ 2026-01-20 14:50 UTC
To: Andrii Nakryiko
Cc: Jiri Olsa, Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy,
	Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
	Song Liu, Andrii Nakryiko

On Fri, 16 Jan 2026 14:24:51 -0800
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> I don't insist, but I'm just saying that practically speaking this
> would make sense. Even conceptually, a kretprobe is (logically) called
> from the traced function right before exit. In reality it's not exactly
> like that and we don't know where the ret happened, but having the
> traced function in a kretprobe's stack trace is more useful than
> confusing, IMO.

It's more useful than confusing for you because you understand it. For
anyone else, it will be very confusing, or worse, miscalculated, to see
the backtrace coming from the start of the function when the function
has already executed.

Sure, having the function is very useful, but the function has already
completed. Technically it shouldn't be in the stacktrace. Having the
return address in the trace should point out that the stacktrace came
from the function right before that address.

-- Steve
* Re: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
From: Jiri Olsa @ 2026-01-20 16:17 UTC
To: Jiri Olsa
Cc: Steven Rostedt, Masami Hiramatsu, Josh Poimboeuf, Mahe Tardy,
	Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song,
	Song Liu, Andrii Nakryiko

On Tue, Jan 13, 2026 at 12:43:39PM +0100, Jiri Olsa wrote:
> right, makes sense.. so standard kprobe actually skips the attached
> function (__x64_sys_newuname) on the return probe stacktrace.. perhaps
> we should do the same for kprobe_multi
>
> I managed to get that with the change below, but it's wrong wrt the arch
> code, note the ftrace_regs_set_stack_pointer(fregs, stack + 8) .. will
> try to figure out a better way once we agree on the solution

SNIP

> -	if (fregs)
> -		ftrace_regs_set_instruction_pointer(fregs, trace.func);
> +	if (fregs) {
> +		ftrace_regs_set_instruction_pointer(fregs, ret);
> +		ftrace_regs_set_stack_pointer(fregs, stack + 8);

actually looks like this might be a better solution.. storing the proper
rsp value directly to the regs in return_to_handler

jirka

---
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index a132608265f6..971883045b75 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -368,13 +368,16 @@ SYM_CODE_START(return_to_handler)
 	subq $8, %rsp
 	UNWIND_HINT_FUNC
 
+	movq %rsp, %rdi
+	addq $8, %rdi
+
 	/* Save ftrace_regs for function exit context */
 	subq $(FRAME_SIZE), %rsp
 
 	movq %rax, RAX(%rsp)
 	movq %rdx, RDX(%rsp)
 	movq %rbp, RBP(%rsp)
-	movq %rsp, RSP(%rsp)
+	movq %rdi, RSP(%rsp)
 
 	movq %rsp, %rdi
 	call ftrace_return_to_handler
* [PATCH bpf-next 3/4] selftests/bpf: Fix kprobe multi stacktrace_ips test 2026-01-12 21:49 [PATCH bpf-next 0/4] x86/fgraph,bpf: Fix ORC stack unwind from kprobe_multi Jiri Olsa 2026-01-12 21:49 ` [PATCH bpf-next 1/4] x86/fgraph: Fix return_to_handler regs.rsp value Jiri Olsa 2026-01-12 21:49 ` [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path Jiri Olsa @ 2026-01-12 21:49 ` Jiri Olsa 2026-01-12 21:49 ` [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace Jiri Olsa 3 siblings, 0 replies; 17+ messages in thread From: Jiri Olsa @ 2026-01-12 21:49 UTC (permalink / raw) To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf Cc: Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy We now include the attached function in the stack trace, fixing the test accordingly. Fixes: c9e208fa93cd ("selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi") Signed-off-by: Jiri Olsa <jolsa@kernel.org> --- tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c index c9efdd2a5b18..43781a67051b 100644 --- a/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c +++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_ips.c @@ -74,7 +74,8 @@ static void test_stacktrace_ips_kprobe_multi(bool retprobe) load_kallsyms(); - check_stacktrace_ips(bpf_map__fd(skel->maps.stackmap), skel->bss->stack_key, 4, + check_stacktrace_ips(bpf_map__fd(skel->maps.stackmap), skel->bss->stack_key, 5, + ksym_get_addr("bpf_testmod_stacktrace_test"), ksym_get_addr("bpf_testmod_stacktrace_test_3"), ksym_get_addr("bpf_testmod_stacktrace_test_2"), ksym_get_addr("bpf_testmod_stacktrace_test_1"), -- 2.52.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-12 21:49 [PATCH bpf-next 0/4] x86/fgraph,bpf: Fix ORC stack unwind from kprobe_multi Jiri Olsa ` (2 preceding siblings ...) 2026-01-12 21:49 ` [PATCH bpf-next 3/4] selftests/bpf: Fix kprobe multi stacktrace_ips test Jiri Olsa @ 2026-01-12 21:49 ` Jiri Olsa 2026-01-15 18:48 ` Andrii Nakryiko 3 siblings, 1 reply; 17+ messages in thread From: Jiri Olsa @ 2026-01-12 21:49 UTC (permalink / raw) To: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf Cc: Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy Adding support to call bpf_get_stackid helper from trigger programs, so far added for kprobe multi. Adding the --stacktrace/-g option to enable it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> --- tools/testing/selftests/bpf/bench.c | 4 ++++ tools/testing/selftests/bpf/bench.h | 1 + .../selftests/bpf/benchs/bench_trigger.c | 1 + .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++ 4 files changed, 24 insertions(+) diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c index bd29bb2e6cb5..8dadd9c928ec 100644 --- a/tools/testing/selftests/bpf/bench.c +++ b/tools/testing/selftests/bpf/bench.c @@ -265,6 +265,7 @@ static const struct argp_option opts[] = { { "verbose", 'v', NULL, 0, "Verbose debug output"}, { "affinity", 'a', NULL, 0, "Set consumer/producer thread affinity"}, { "quiet", 'q', NULL, 0, "Be more quiet"}, + { "stacktrace", 'g', NULL, 0, "Get stack trace"}, { "prod-affinity", ARG_PROD_AFFINITY_SET, "CPUSET", 0, "Set of CPUs for producer threads; implies --affinity"}, { "cons-affinity", ARG_CONS_AFFINITY_SET, "CPUSET", 0, @@ -350,6 +351,9 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state) case 'q': env.quiet = true; break; + case 'g': + env.stacktrace = true; + break; case ARG_PROD_AFFINITY_SET: env.affinity = true; if (parse_num_list(arg, &env.prod_cpus.cpus, diff 
--git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h index bea323820ffb..7cf21936e7ed 100644 --- a/tools/testing/selftests/bpf/bench.h +++ b/tools/testing/selftests/bpf/bench.h @@ -26,6 +26,7 @@ struct env { bool list; bool affinity; bool quiet; + bool stacktrace; int consumer_cnt; int producer_cnt; int nr_cpus; diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c index 34018fc3927f..aeec9edd3851 100644 --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c @@ -146,6 +146,7 @@ static void setup_ctx(void) bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true); ctx.skel->rodata->batch_iters = args.batch_iters; + ctx.skel->rodata->stacktrace = env.stacktrace; } static void load_ctx(void) diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c index 2898b3749d07..479400d96fa4 100644 --- a/tools/testing/selftests/bpf/progs/trigger_bench.c +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c @@ -25,6 +25,23 @@ static __always_inline void inc_counter(void) __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1); } +volatile const int stacktrace; + +typedef __u64 stack_trace_t[128]; + +struct { + __uint(type, BPF_MAP_TYPE_STACK_TRACE); + __uint(max_entries, 16384); + __type(key, __u32); + __type(value, stack_trace_t); +} stackmap SEC(".maps"); + +static __always_inline void do_stacktrace(void *ctx) +{ + if (stacktrace) + bpf_get_stackid(ctx, &stackmap, 0); +} + SEC("?uprobe") int bench_trigger_uprobe(void *ctx) { @@ -96,6 +113,7 @@ SEC("?kprobe.multi/bpf_get_numa_node_id") int bench_trigger_kprobe_multi(void *ctx) { inc_counter(); + do_stacktrace(ctx); return 0; } -- 2.52.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-12 21:49 ` [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace Jiri Olsa @ 2026-01-15 18:48 ` Andrii Nakryiko 2026-01-15 18:50 ` Andrii Nakryiko ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Andrii Nakryiko @ 2026-01-15 18:48 UTC (permalink / raw) To: Jiri Olsa Cc: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <jolsa@kernel.org> wrote: > > Adding support to call bpf_get_stackid helper from trigger programs, > so far added for kprobe multi. > > Adding the --stacktrace/-g option to enable it. > > Signed-off-by: Jiri Olsa <jolsa@kernel.org> > --- > tools/testing/selftests/bpf/bench.c | 4 ++++ > tools/testing/selftests/bpf/bench.h | 1 + > .../selftests/bpf/benchs/bench_trigger.c | 1 + > .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++ > 4 files changed, 24 insertions(+) > This now actually becomes a stack trace benchmark :) But I don't mind, I think it would be good to be able to benchmark this. But I think we should then implement it for all different tracing programs (tp, raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare and contrast?... > diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c > index bd29bb2e6cb5..8dadd9c928ec 100644 > --- a/tools/testing/selftests/bpf/bench.c > +++ b/tools/testing/selftests/bpf/bench.c > @@ -265,6 +265,7 @@ static const struct argp_option opts[] = { > { "verbose", 'v', NULL, 0, "Verbose debug output"}, > { "affinity", 'a', NULL, 0, "Set consumer/producer thread affinity"}, > { "quiet", 'q', NULL, 0, "Be more quiet"}, > + { "stacktrace", 'g', NULL, 0, "Get stack trace"}, bikeshedding time: why "g"? why not -S or something like that? 
> { "prod-affinity", ARG_PROD_AFFINITY_SET, "CPUSET", 0, > "Set of CPUs for producer threads; implies --affinity"}, > { "cons-affinity", ARG_CONS_AFFINITY_SET, "CPUSET", 0, > @@ -350,6 +351,9 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state) > case 'q': > env.quiet = true; > break; > + case 'g': > + env.stacktrace = true; > + break; > case ARG_PROD_AFFINITY_SET: > env.affinity = true; > if (parse_num_list(arg, &env.prod_cpus.cpus, > diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h > index bea323820ffb..7cf21936e7ed 100644 > --- a/tools/testing/selftests/bpf/bench.h > +++ b/tools/testing/selftests/bpf/bench.h > @@ -26,6 +26,7 @@ struct env { > bool list; > bool affinity; > bool quiet; > + bool stacktrace; > int consumer_cnt; > int producer_cnt; > int nr_cpus; > diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c > index 34018fc3927f..aeec9edd3851 100644 > --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c > +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c > @@ -146,6 +146,7 @@ static void setup_ctx(void) > bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true); > > ctx.skel->rodata->batch_iters = args.batch_iters; > + ctx.skel->rodata->stacktrace = env.stacktrace; > } > > static void load_ctx(void) > diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c > index 2898b3749d07..479400d96fa4 100644 > --- a/tools/testing/selftests/bpf/progs/trigger_bench.c > +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c > @@ -25,6 +25,23 @@ static __always_inline void inc_counter(void) > __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1); > } > > +volatile const int stacktrace; > + > +typedef __u64 stack_trace_t[128]; > + > +struct { > + __uint(type, BPF_MAP_TYPE_STACK_TRACE); > + __uint(max_entries, 16384); > + __type(key, __u32); > + __type(value, stack_trace_t); > 
+} stackmap SEC(".maps"); > + > +static __always_inline void do_stacktrace(void *ctx) > +{ > + if (stacktrace) > + bpf_get_stackid(ctx, &stackmap, 0); > +} > + > SEC("?uprobe") > int bench_trigger_uprobe(void *ctx) > { > @@ -96,6 +113,7 @@ SEC("?kprobe.multi/bpf_get_numa_node_id") > int bench_trigger_kprobe_multi(void *ctx) > { > inc_counter(); > + do_stacktrace(ctx); > return 0; > } > > -- > 2.52.0 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-15 18:48 ` Andrii Nakryiko @ 2026-01-15 18:50 ` Andrii Nakryiko 2026-01-16 16:30 ` Jiri Olsa 2026-01-16 16:26 ` Jiri Olsa 2026-01-22 8:35 ` Jiri Olsa 2 siblings, 1 reply; 17+ messages in thread From: Andrii Nakryiko @ 2026-01-15 18:50 UTC (permalink / raw) To: Jiri Olsa Cc: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy On Thu, Jan 15, 2026 at 10:48 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <jolsa@kernel.org> wrote: > > > > Adding support to call bpf_get_stackid helper from trigger programs, > > so far added for kprobe multi. > > > > Adding the --stacktrace/-g option to enable it. > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org> > > --- > > tools/testing/selftests/bpf/bench.c | 4 ++++ > > tools/testing/selftests/bpf/bench.h | 1 + > > .../selftests/bpf/benchs/bench_trigger.c | 1 + > > .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++ > > 4 files changed, 24 insertions(+) > > > > This now actually becomes a stack trace benchmark :) But I don't mind, > I think it would be good to be able to benchmark this. But I think we > should then implement it for all different tracing programs (tp, > raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare > and contrast?... 
> > > diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c > > index bd29bb2e6cb5..8dadd9c928ec 100644 > > --- a/tools/testing/selftests/bpf/bench.c > > +++ b/tools/testing/selftests/bpf/bench.c > > @@ -265,6 +265,7 @@ static const struct argp_option opts[] = { > > { "verbose", 'v', NULL, 0, "Verbose debug output"}, > > { "affinity", 'a', NULL, 0, "Set consumer/producer thread affinity"}, > > { "quiet", 'q', NULL, 0, "Be more quiet"}, > > + { "stacktrace", 'g', NULL, 0, "Get stack trace"}, > > bikeshedding time: why "g"? why not -S or something like that? > > > { "prod-affinity", ARG_PROD_AFFINITY_SET, "CPUSET", 0, > > "Set of CPUs for producer threads; implies --affinity"}, > > { "cons-affinity", ARG_CONS_AFFINITY_SET, "CPUSET", 0, > > @@ -350,6 +351,9 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state) > > case 'q': > > env.quiet = true; > > break; > > + case 'g': > > + env.stacktrace = true; > > + break; > > case ARG_PROD_AFFINITY_SET: > > env.affinity = true; > > if (parse_num_list(arg, &env.prod_cpus.cpus, > > diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h > > index bea323820ffb..7cf21936e7ed 100644 > > --- a/tools/testing/selftests/bpf/bench.h > > +++ b/tools/testing/selftests/bpf/bench.h > > @@ -26,6 +26,7 @@ struct env { > > bool list; > > bool affinity; > > bool quiet; > > + bool stacktrace; > > int consumer_cnt; > > int producer_cnt; > > int nr_cpus; > > diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c > > index 34018fc3927f..aeec9edd3851 100644 > > --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c > > +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c > > @@ -146,6 +146,7 @@ static void setup_ctx(void) > > bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true); > > > > ctx.skel->rodata->batch_iters = args.batch_iters; > > + ctx.skel->rodata->stacktrace = env.stacktrace; > 
> } > > > > static void load_ctx(void) > > diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c > > index 2898b3749d07..479400d96fa4 100644 > > --- a/tools/testing/selftests/bpf/progs/trigger_bench.c > > +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c > > @@ -25,6 +25,23 @@ static __always_inline void inc_counter(void) > > __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1); > > } > > > > +volatile const int stacktrace; > > + > > +typedef __u64 stack_trace_t[128]; > > + > > +struct { > > + __uint(type, BPF_MAP_TYPE_STACK_TRACE); > > + __uint(max_entries, 16384); > > + __type(key, __u32); > > + __type(value, stack_trace_t); > > +} stackmap SEC(".maps"); oh, and why bother with STACK_TRACE map, just call bpf_get_stack() API and have maybe per-CPU scratch array for stack trace (per-CPU so that in multi-cpu benchmarks they don't just contend on the same cache lines) > > + > > +static __always_inline void do_stacktrace(void *ctx) > > +{ > > + if (stacktrace) > > + bpf_get_stackid(ctx, &stackmap, 0); > > +} > > + > > SEC("?uprobe") > > int bench_trigger_uprobe(void *ctx) > > { > > @@ -96,6 +113,7 @@ SEC("?kprobe.multi/bpf_get_numa_node_id") > > int bench_trigger_kprobe_multi(void *ctx) > > { > > inc_counter(); > > + do_stacktrace(ctx); > > return 0; > > } > > > > -- > > 2.52.0 > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-15 18:50 ` Andrii Nakryiko @ 2026-01-16 16:30 ` Jiri Olsa 0 siblings, 0 replies; 17+ messages in thread From: Jiri Olsa @ 2026-01-16 16:30 UTC (permalink / raw) To: Andrii Nakryiko Cc: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy On Thu, Jan 15, 2026 at 10:50:13AM -0800, Andrii Nakryiko wrote: SNIP > > > diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c > > > index 34018fc3927f..aeec9edd3851 100644 > > > --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c > > > +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c > > > @@ -146,6 +146,7 @@ static void setup_ctx(void) > > > bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true); > > > > > > ctx.skel->rodata->batch_iters = args.batch_iters; > > > + ctx.skel->rodata->stacktrace = env.stacktrace; > > > } > > > > > > static void load_ctx(void) > > > diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c > > > index 2898b3749d07..479400d96fa4 100644 > > > --- a/tools/testing/selftests/bpf/progs/trigger_bench.c > > > +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c > > > @@ -25,6 +25,23 @@ static __always_inline void inc_counter(void) > > > __sync_add_and_fetch(&hits[cpu & CPU_MASK].value, 1); > > > } > > > > > > +volatile const int stacktrace; > > > + > > > +typedef __u64 stack_trace_t[128]; > > > + > > > +struct { > > > + __uint(type, BPF_MAP_TYPE_STACK_TRACE); > > > + __uint(max_entries, 16384); > > > + __type(key, __u32); > > > + __type(value, stack_trace_t); > > > +} stackmap SEC(".maps"); > > oh, and why bother with STACK_TRACE map, just call bpf_get_stack() API > and have maybe per-CPU scratch array for stack trace (per-CPU so that > in multi-cpu benchmarks they don't just 
contend on the same cache > lines) ok, will change thanks, jirka ^ permalink raw reply [flat|nested] 17+ messages in thread
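[Editor's note: Andrii's per-CPU scratch suggestion above could look roughly like the sketch below — this is one possible reading, not code from the thread, and the `scratch` map name is made up. bpf_get_stack() writes into a caller-supplied buffer, so a BPF_MAP_TYPE_PERCPU_ARRAY with a single entry gives each CPU its own scratch space.]

```c
/* Hypothetical replacement for the STACK_TRACE map in trigger_bench.c;
 * reuses the stack_trace_t typedef and `stacktrace` rodata flag. */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, stack_trace_t);
} scratch SEC(".maps");

static __always_inline void do_stacktrace(void *ctx)
{
	__u32 key = 0;
	void *buf;

	if (!stacktrace)
		return;
	buf = bpf_map_lookup_elem(&scratch, &key);
	if (buf)
		bpf_get_stack(ctx, buf, sizeof(stack_trace_t), 0);
}
```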
* Re: [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-15 18:48 ` Andrii Nakryiko 2026-01-15 18:50 ` Andrii Nakryiko @ 2026-01-16 16:26 ` Jiri Olsa 2026-01-22 8:35 ` Jiri Olsa 2 siblings, 0 replies; 17+ messages in thread From: Jiri Olsa @ 2026-01-16 16:26 UTC (permalink / raw) To: Andrii Nakryiko Cc: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy On Thu, Jan 15, 2026 at 10:48:29AM -0800, Andrii Nakryiko wrote: > On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <jolsa@kernel.org> wrote: > > > > Adding support to call bpf_get_stackid helper from trigger programs, > > so far added for kprobe multi. > > > > Adding the --stacktrace/-g option to enable it. > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org> > > --- > > tools/testing/selftests/bpf/bench.c | 4 ++++ > > tools/testing/selftests/bpf/bench.h | 1 + > > .../selftests/bpf/benchs/bench_trigger.c | 1 + > > .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++ > > 4 files changed, 24 insertions(+) > > > > This now actually becomes a stack trace benchmark :) But I don't mind, > I think it would be good to be able to benchmark this. But I think we > should then implement it for all different tracing programs (tp, > raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare > and contrast?... yep, agreed > > > diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c > > index bd29bb2e6cb5..8dadd9c928ec 100644 > > --- a/tools/testing/selftests/bpf/bench.c > > +++ b/tools/testing/selftests/bpf/bench.c > > @@ -265,6 +265,7 @@ static const struct argp_option opts[] = { > > { "verbose", 'v', NULL, 0, "Verbose debug output"}, > > { "affinity", 'a', NULL, 0, "Set consumer/producer thread affinity"}, > > { "quiet", 'q', NULL, 0, "Be more quiet"}, > > + { "stacktrace", 'g', NULL, 0, "Get stack trace"}, > > bikeshedding time: why "g"? 
why not -S or something like that? perf tool strikes back ;-) -S is better thanks, jirka ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace 2026-01-15 18:48 ` Andrii Nakryiko 2026-01-15 18:50 ` Andrii Nakryiko 2026-01-16 16:26 ` Jiri Olsa @ 2026-01-22 8:35 ` Jiri Olsa 2 siblings, 0 replies; 17+ messages in thread From: Jiri Olsa @ 2026-01-22 8:35 UTC (permalink / raw) To: Andrii Nakryiko Cc: Masami Hiramatsu, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra, bpf, linux-trace-kernel, x86, Yonghong Song, Song Liu, Andrii Nakryiko, Mahe Tardy On Thu, Jan 15, 2026 at 10:48:29AM -0800, Andrii Nakryiko wrote: > On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <jolsa@kernel.org> wrote: > > > > Adding support to call bpf_get_stackid helper from trigger programs, > > so far added for kprobe multi. > > > > Adding the --stacktrace/-g option to enable it. > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org> > > --- > > tools/testing/selftests/bpf/bench.c | 4 ++++ > > tools/testing/selftests/bpf/bench.h | 1 + > > .../selftests/bpf/benchs/bench_trigger.c | 1 + > > .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++ > > 4 files changed, 24 insertions(+) > > > > This now actually becomes a stack trace benchmark :) But I don't mind, > I think it would be good to be able to benchmark this. But I think we > should then implement it for all different tracing programs (tp, > raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare > and contrast?... 
fyi I updated the bench for all program types and got some stats current fix WITHOUT stacktrace: usermode-count : 810.652 ± 1.036M/s kernel-count : 336.645 ± 2.812M/s syscall-count : 27.798 ± 0.063M/s fentry : 67.677 ± 0.291M/s fexit : 49.970 ± 0.214M/s fmodret : 52.860 ± 0.237M/s rawtp : 65.196 ± 0.224M/s tp : 34.120 ± 0.042M/s kprobe : 25.157 ± 0.019M/s kprobe-multi : 33.223 ± 0.205M/s kprobe-multi-all: 4.739 ± 0.003M/s kretprobe : 10.904 ± 0.020M/s kretprobe-multi: 15.996 ± 0.023M/s kretprobe-multi-all: 2.559 ± 0.092M/s current fix WITH stacktrace: usermode-count : 782.529 ± 5.866M/s kernel-count : 341.116 ± 2.247M/s syscall-count : 27.481 ± 0.267M/s fentry : 2.397 ± 0.026M/s fexit : 2.472 ± 0.008M/s fmodret : 2.475 ± 0.014M/s rawtp : 2.593 ± 0.031M/s tp : 2.641 ± 0.020M/s kprobe : 3.848 ± 0.014M/s kprobe-multi : 4.188 ± 0.025M/s kprobe-multi-all: 0.261 ± 0.026M/s kretprobe : 3.782 ± 0.011M/s kretprobe-multi: 4.157 ± 0.023M/s kretprobe-multi-all: 0.177 ± 0.000M/s with similar fix for fentry/fexit/raw_tp/tp WITH stacktrace: usermode-count : 792.613 ± 1.322M/s kernel-count : 337.725 ± 2.422M/s syscall-count : 27.363 ± 0.030M/s fentry : 14.911 ± 0.083M/s fexit : 13.749 ± 0.060M/s fmodret : 13.987 ± 0.049M/s rawtp : 13.760 ± 0.042M/s tp : 7.060 ± 0.026M/s kprobe : 3.920 ± 0.012M/s kprobe-multi : 4.186 ± 0.030M/s kprobe-multi-all: 0.281 ± 0.006M/s kretprobe : 3.782 ± 0.005M/s kretprobe-multi: 4.030 ± 0.014M/s kretprobe-multi-all: 0.178 ± 0.000M/s so cutting the extra initial unwind gets some speedup as expected I'm getting wrong callstack for rawtp programs, so I need to find out why, but the rest of the tracing programs fentry/fexit.. work ok jirka ^ permalink raw reply [flat|nested] 17+ messages in thread