From: Jiri Olsa <jolsa@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
x86@kernel.org, Yonghong Song <yhs@fb.com>,
Song Liu <songliubraving@fb.com>,
Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCHv2 2/4] x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
Date: Mon, 3 Nov 2025 23:09:22 +0100 [thread overview]
Message-ID: <20251103220924.36371-3-jolsa@kernel.org> (raw)
In-Reply-To: <20251103220924.36371-1-jolsa@kernel.org>
Currently we don't get stack trace via ORC unwinder on top of fgraph exit
handler. We can see that when generating stacktrace from kretprobe_multi
bpf program which is based on fprobe/fgraph.
The reason is that the ORC unwind code won't get pass the return_to_handler
callback installed by fgraph return probe machinery.
Solving this by creating stack frame in return_to_handler expected by
ftrace_graph_ret_addr function to recover original return address and
continue with the unwind.
Also updating the pt_regs data with cs/flags/rsp which are needed for
successful stack retrieval from ebpf bpf_get_stackid helper.
- in get_perf_callchain we check user_mode(regs) so CS has to be set
- in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS/FIXED
has to be unset
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/include/asm/ftrace.h | 5 +++++
arch/x86/kernel/ftrace_64.S | 8 +++++++-
include/linux/ftrace.h | 10 +++++++++-
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 93156ac4ffe0..b08c95872eed 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -56,6 +56,11 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
return &arch_ftrace_regs(fregs)->regs;
}
+#define arch_ftrace_partial_regs(regs) do { \
+ regs->flags &= ~X86_EFLAGS_FIXED; \
+ regs->cs = __KERNEL_CS; \
+} while (0)
+
#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
(_regs)->ip = arch_ftrace_regs(fregs)->regs.ip; \
(_regs)->sp = arch_ftrace_regs(fregs)->regs.sp; \
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 367da3638167..823dbdd0eb41 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -354,12 +354,17 @@ SYM_CODE_START(return_to_handler)
UNWIND_HINT_UNDEFINED
ANNOTATE_NOENDBR
+ /* Restore return_to_handler value that got eaten by previous ret instruction. */
+ subq $8, %rsp
+ UNWIND_HINT_FUNC
+
/* Save ftrace_regs for function exit context */
subq $(FRAME_SIZE), %rsp
movq %rax, RAX(%rsp)
movq %rdx, RDX(%rsp)
movq %rbp, RBP(%rsp)
+ movq %rsp, RSP(%rsp)
movq %rsp, %rdi
call ftrace_return_to_handler
@@ -368,7 +373,8 @@ SYM_CODE_START(return_to_handler)
movq RDX(%rsp), %rdx
movq RAX(%rsp), %rax
- addq $(FRAME_SIZE), %rsp
+ addq $(FRAME_SIZE) + 8, %rsp
+
/*
* Jump back to the old return address. This cannot be JMP_NOSPEC rdi
* since IBT would demand that contain ENDBR, which simply isn't so for
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7ded7df6e9b5..07f8c309e432 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -193,6 +193,10 @@ static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs
#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \
defined(CONFIG_HAVE_FTRACE_REGS_HAVING_PT_REGS)
+#ifndef arch_ftrace_partial_regs
+#define arch_ftrace_partial_regs(regs) do {} while (0)
+#endif
+
static __always_inline struct pt_regs *
ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
{
@@ -202,7 +206,11 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
* Since arch_ftrace_get_regs() will check some members and may return
* NULL, we can not use it.
*/
- return &arch_ftrace_regs(fregs)->regs;
+ regs = &arch_ftrace_regs(fregs)->regs;
+
+ /* Allow arch specific updates to regs. */
+ arch_ftrace_partial_regs(regs);
+ return regs;
}
#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_FTRACE_REGS_HAVING_PT_REGS */
--
2.51.0
next prev parent reply other threads:[~2025-11-03 22:09 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-03 22:09 [PATCHv2 0/4] x86/fgraph,bpf: Fix ORC stack unwind from return probe Jiri Olsa
2025-11-03 22:09 ` [PATCHv2 1/4] Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" Jiri Olsa
2025-11-06 12:27 ` Peter Zijlstra
2025-11-03 22:09 ` Jiri Olsa [this message]
2025-11-03 23:47 ` [PATCHv2 2/4] x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe bot+bpf-ci
2025-11-04 9:27 ` Jiri Olsa
2025-11-06 12:29 ` Peter Zijlstra
2025-11-07 22:22 ` Alexei Starovoitov
2025-11-03 22:09 ` [PATCHv2 3/4] selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi Jiri Olsa
2025-11-03 22:09 ` [PATCHv2 4/4] selftests/bpf: Add stacktrace ips test for raw_tp Jiri Olsa
2025-11-03 23:47 ` bot+bpf-ci
2025-11-04 8:35 ` Jiri Olsa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251103220924.36371-3-jolsa@kernel.org \
--to=jolsa@kernel.org \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=jpoimboe@kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=songliubraving@fb.com \
--cc=x86@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.