From: Jiri Olsa <jolsa@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mahe Tardy <mahe.tardy@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
x86@kernel.org, Yonghong Song <yhs@fb.com>,
Song Liu <songliubraving@fb.com>,
Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path
Date: Mon, 12 Jan 2026 22:49:38 +0100 [thread overview]
Message-ID: <20260112214940.1222115-3-jolsa@kernel.org> (raw)
In-Reply-To: <20260112214940.1222115-1-jolsa@kernel.org>
Mahe reported missing function from stack trace on top of kprobe
multi program. The missing function is the very first one in the
stacktrace, the one that the bpf program is attached to.
# bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
Attaching 1 probe...
do_syscall_64+134
entry_SYSCALL_64_after_hwframe+118
('*' is used for kprobe_multi attachment)
The reason is that the previous change (the Fixes commit) fixed
stack unwind for tracepoint, but removed attached function address
from the stack trace on top of kprobe multi programs, which I also
overlooked in the related test (check following patch).
The tracepoint and kprobe_multi have different stack setup, but use
same unwind path. I think it's better to keep the previous change,
which fixed tracepoint unwind and instead change the kprobe multi
unwind as explained below.
The bpf program stack unwind calls perf_callchain_kernel for kernel
portion and it follows two unwind paths based on X86_EFLAGS_FIXED
bit in pt_regs.flags.
When the bit set we unwind from stack represented by pt_regs argument,
otherwise we unwind currently executed stack up to 'first_frame'
boundary.
The 'first_frame' value is taken from regs.rsp value, but ftrace_caller
and ftrace_regs_caller (ftrace trampoline) functions set the regs.rsp
to the previous stack frame, so we skip the attached function entry.
If we switch kprobe_multi unwind to use the X86_EFLAGS_FIXED bit,
we can control the start of the unwind and get back the attached
function address. As another benefit we also cut extra unwind cycles
needed to reach the 'first_frame' boundary.
The speedup can be meassured with trigger bench for kprobe_multi
program and stacktrace support.
- without bpf_get_stackid call:
# ./bench -w2 -d5 -a -p1 trig-kprobe-multi
Summary: hits 0.857 ± 0.003M/s ( 0.857M/prod), drops 0.000 ± 0.000M/s, total operations 0.857 ± 0.003M/s
- with bpf_get_stackid call:
# ./bench -w2 -d5 -a -g -p1 trig-kprobe-multi
Summary: hits 1.302 ± 0.002M/s ( 1.302M/prod), drops 0.000 ± 0.000M/s, total operations 1.302 ± 0.002M/s
Note the '-g' option for stacktrace added in following change.
To recreate same stack setup for return probe as we have for entry
probe, we set the instruction pointer to the attached function address,
which gets us the same unwind setup and same stack trace.
With the fix, entry probe:
# bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
Attaching 1 probe...
__x64_sys_newuname+9
do_syscall_64+134
entry_SYSCALL_64_after_hwframe+118
return probe:
# bpftrace -e 'kretprobe:__x64_sys_newuname* { print(kstack)}'
Attaching 1 probe...
__x64_sys_newuname+4
do_syscall_64+134
entry_SYSCALL_64_after_hwframe+118
Fixes: 6d08340d1e35 ("Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"")
Reported-by: Mahe Tardy <mahe.tardy@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/include/asm/ftrace.h | 2 +-
kernel/trace/fgraph.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index b08c95872eed..c56e1e63b893 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -57,7 +57,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
}
#define arch_ftrace_partial_regs(regs) do { \
- regs->flags &= ~X86_EFLAGS_FIXED; \
+ regs->flags |= X86_EFLAGS_FIXED; \
regs->cs = __KERNEL_CS; \
} while (0)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index cc48d16be43e..6279e0a753cf 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -825,7 +825,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
}
if (fregs)
- ftrace_regs_set_instruction_pointer(fregs, ret);
+ ftrace_regs_set_instruction_pointer(fregs, trace.func);
bit = ftrace_test_recursion_trylock(trace.func, ret);
/*
--
2.52.0
next prev parent reply other threads:[~2026-01-12 21:50 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-12 21:49 [PATCH bpf-next 0/4] x86/fgraph,bpf: Fix ORC stack unwind from kprobe_multi Jiri Olsa
2026-01-12 21:49 ` [PATCH bpf-next 1/4] x86/fgraph: Fix return_to_handler regs.rsp value Jiri Olsa
2026-01-12 21:49 ` Jiri Olsa [this message]
2026-01-12 22:07 ` [PATCH bpf-next 2/4] x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path Steven Rostedt
2026-01-13 11:43 ` Jiri Olsa
2026-01-15 18:52 ` Andrii Nakryiko
2026-01-16 16:25 ` Jiri Olsa
2026-01-16 22:24 ` Andrii Nakryiko
2026-01-20 14:50 ` Steven Rostedt
2026-01-20 16:17 ` Jiri Olsa
2026-01-12 21:49 ` [PATCH bpf-next 3/4] selftests/bpf: Fix kprobe multi stacktrace_ips test Jiri Olsa
2026-01-12 21:49 ` [PATCH bpf-next 4/4] selftests/bpf: Allow to benchmark trigger with stacktrace Jiri Olsa
2026-01-15 18:48 ` Andrii Nakryiko
2026-01-15 18:50 ` Andrii Nakryiko
2026-01-16 16:30 ` Jiri Olsa
2026-01-16 16:26 ` Jiri Olsa
2026-01-22 8:35 ` Jiri Olsa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260112214940.1222115-3-jolsa@kernel.org \
--to=jolsa@kernel.org \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=jpoimboe@kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mahe.tardy@gmail.com \
--cc=mhiramat@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=songliubraving@fb.com \
--cc=x86@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox