From: Yonghong Song <yonghong.song@linux.dev>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
Salvatore Benedetto <salvabenedetto@meta.com>
Subject: [PATCH bpf-next] bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
Date: Mon, 9 Sep 2024 20:43:06 -0700
Message-ID: <20240910034306.3122378-1-yonghong.song@linux.dev>

Salvatore Benedetto reported that the kernel stack is empty when tracing
syscall tracepoints. For example, with the following command line:

  bpftrace -e 'tracepoint:syscalls:sys_enter_read { print("Kernel Stack\n"); print(kstack()); }'

the output is:
===
Kernel Stack
===

Further analysis shows that the pt_regs used for bpf syscall tracepoint
tracing is the one constructed during the user->kernel transition. The
call stack looks like:
perf_syscall_enter+0x88/0x7c0
trace_sys_enter+0x41/0x80
syscall_trace_enter+0x100/0x160
do_syscall_64+0x38/0xf0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The ip stored in this pt_regs is a user-space address, hence no kernel
stack is printed.
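A minimal userspace model of why the stack comes out empty, under stated assumptions: for tracepoint programs, the stack helpers take the pt_regs pointer from the ctx and pass it to get_perf_callchain() (kernel/events/callchain.c), which only walks the kernel stack when `kernel && !user_mode(regs)` holds. On x86-64, user_mode() tests the privilege bits of the saved cs, and registers saved at the user->kernel transition describe user mode. The names struct model_regs, model_user_mode() and model_kernel_frames() are hypothetical, and the selector values assume x86-64. The code is written as plain functions so it can be exercised from any caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct pt_regs (x86-64). */
struct model_regs {
	unsigned long ip;
	unsigned long cs;
};

/* x86-64 selectors: kernel code segment has RPL 0, user code segment RPL 3. */
#define MODEL_KERNEL_CS 0x10UL
#define MODEL_USER_CS   0x33UL

/* Mirrors x86-64 user_mode(): the low two bits of cs hold the privilege level. */
bool model_user_mode(const struct model_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/*
 * Hypothetical model of the "kernel && !user_mode(regs)" test in
 * get_perf_callchain(): the kernel unwinder only runs when regs describe
 * kernel mode. Returns how many kernel frames would be collected.
 */
int model_kernel_frames(const struct model_regs *regs, int frames_on_stack)
{
	assert(regs != NULL);
	if (model_user_mode(regs))
		return 0; /* regs saved at user->kernel entry: kernel part skipped */
	return frames_on_stack;
}
```

With the entry regs (user ip, user cs) the model reports zero kernel frames, which matches the empty "Kernel Stack" output above.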

To fix the issue, the pt_regs handed to the bpf program needs to carry a
kernel address. The kernel already does this in a few places. For
example, kernel/trace/bpf_trace.c has several
perf_fetch_caller_regs(fake_regs_ptr) call sites that either supply an
ip address or use the ip address to construct a call stack.

This patch follows the same approach and uses a fake pt_regs. The fake
pt_regs is stored on the local stack: syscall tracepoint tracing runs in
process context, so there is no possibility that concurrent syscall
tracepoint tracing invocations could interfere with each other. This is
similar to the perf_fetch_caller_regs() use in
perf_ftrace_function_call() in kernel/trace/trace_event_perf.c, where a
local pt_regs is used.
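A rough userspace analogue of the approach, assuming x86-64 semantics: a per-call fake register set lives in the caller's own stack frame, receives a code address and a frame address from the current context plus a kernel-mode cs, and the ctx's first member points at it. model_fetch_caller_regs() only approximates perf_fetch_caller_regs(), which on x86, as I understand it, fills ip, sp, cs (__KERNEL_CS) and flags (0). All model_* names are hypothetical; __builtin_return_address() and __builtin_frame_address() are GCC/Clang builtins.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the x86-64 pt_regs fields that matter here. */
struct model_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long cs;
	unsigned long flags;
};

#define MODEL_KERNEL_CS 0x10UL /* x86-64 kernel code segment, RPL 0 */

/* Hypothetical ctx: the regs pointer must be the first member. */
struct model_ctx {
	struct model_regs *regs;
	int syscall_nr;
};

bool model_user_mode(const struct model_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/*
 * Approximates perf_fetch_caller_regs() on x86-64: ip is a code address in
 * the caller, sp is a frame address, cs says "kernel mode", flags are 0.
 * noinline keeps __builtin_return_address(0) pointing into the caller.
 */
__attribute__((noinline)) void model_fetch_caller_regs(struct model_regs *regs)
{
	regs->ip = (unsigned long)__builtin_return_address(0);
	regs->sp = (unsigned long)__builtin_frame_address(0);
	regs->cs = MODEL_KERNEL_CS;
	regs->flags = 0;
}

/*
 * Analogue of the patched perf_call_bpf_enter(): fake_regs lives in this
 * call's own stack frame, so concurrent calls (each on its own stack)
 * never share it. *seen receives what a program would read via ctx.regs.
 */
void model_call_bpf_enter(int syscall_nr, struct model_regs *seen)
{
	struct model_regs fake_regs;
	struct model_ctx ctx;

	memset(&fake_regs, 0, sizeof(fake_regs));
	model_fetch_caller_regs(&fake_regs);
	ctx.regs = &fake_regs;
	ctx.syscall_nr = syscall_nr;

	*seen = *ctx.regs; /* copy out: fake_regs dies when this call returns */
}
```

Because the fake regs now report kernel mode, a stack walker modeled on the user_mode() check would proceed to unwind the kernel stack instead of skipping it.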

With this patch, the above bpftrace script produces the following output:
===
Kernel Stack
syscall_trace_enter+407
syscall_trace_enter+407
do_syscall_64+74
entry_SYSCALL_64_after_hwframe+75
===
Reported-by: Salvatore Benedetto <salvabenedetto@meta.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
kernel/trace/trace_syscalls.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 9c581d6da843..063f51952d49 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -559,12 +559,15 @@ static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *re
 		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
+	struct pt_regs fake_regs;
 	int i;
 
 	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
 
 	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	*(struct pt_regs **)&param = regs;
+	memset(&fake_regs, 0, sizeof(fake_regs));
+	perf_fetch_caller_regs(&fake_regs);
+	*(struct pt_regs **)&param = &fake_regs;
 	param.syscall_nr = rec->nr;
 	for (i = 0; i < sys_data->nb_args; i++)
 		param.args[i] = rec->args[i];
--
2.43.5
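The diff relies on the ctx layout: the bpf program, and helpers such as bpf_get_stackid_tp() in kernel/trace/bpf_trace.c, read the pt_regs pointer from the first pointer-sized bytes of param, which overlay param.ent; the BUILD_BUG_ON() guarantees those bytes can hold a pointer. Below is a userspace sketch of that layout with hypothetical model_* types; model_trace_entry mirrors the 8-byte size of struct trace_entry, and args[6] assumes SYSCALL_DEFINE_MAXARGS is 6. memcpy() stands in for the pointer cast because the kernel builds with -fno-strict-aliasing while a userspace build may not.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; model_trace_entry mirrors struct trace_entry's 8 bytes. */
struct model_regs {
	unsigned long ip;
};

struct model_trace_entry {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

struct model_syscall_tp {
	struct model_trace_entry ent;
	int syscall_nr;
	unsigned long args[6]; /* assumed SYSCALL_DEFINE_MAXARGS */
} __attribute__((aligned(8)));

/* Userspace counterpart of BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *)). */
_Static_assert(sizeof(struct model_trace_entry) >= sizeof(void *),
	       "ent must be large enough to hold the regs pointer");

/* Producer side: store the regs pointer over the leading bytes (ent). */
void model_fill_ctx(struct model_syscall_tp *param, struct model_regs *regs, int nr)
{
	memcpy(param, &regs, sizeof(regs));
	param->syscall_nr = nr;
}

/* Consumer side: what a helper does with *(struct pt_regs **)ctx. */
struct model_regs *model_regs_from_ctx(const void *ctx)
{
	struct model_regs *regs;

	memcpy(&regs, ctx, sizeof(regs));
	return regs;
}
```

Since only the pointer stored in those leading bytes changes, the patch can swap the entry regs for &fake_regs without touching the rest of the ctx layout.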