From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Tianyi Liu <i.pear@outlook.com>
Cc: andrii.nakryiko@gmail.com, ajor@meta.com,
albancrequy@linux.microsoft.com, bpf@vger.kernel.org,
flaniel@linux.microsoft.com, jolsa@kernel.org,
linux-trace-kernel@vger.kernel.org, linux@jordanrome.com,
mathieu.desnoyers@efficios.com, oleg@redhat.com
Subject: Re: [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe
Date: Sun, 25 Aug 2024 02:27:55 +0900 [thread overview]
Message-ID: <20240825022755.a430c0969a1e784adc444a4f@kernel.org> (raw)
In-Reply-To: <ME0P300MB04163A2993D1B545C3533DDC9D892@ME0P300MB0416.AUSP300.PROD.OUTLOOK.COM>
On Sat, 24 Aug 2024 13:49:26 +0800
Tianyi Liu <i.pear@outlook.com> wrote:
> Hi Masami and Andrii:
>
> I would like to share more information and ideas, but I'm possibly wrong.
>
> > > U(ret)probes are designed to be filterable using the PID, which is the
> > > second parameter in the perf_event_open syscall. Currently, uprobe works
> > > well with the filtering, but uretprobe is not affected by it. This often
> > > leads to users being disturbed by events from uninterested processes while
> > > using uretprobe.
> > >
> > > We found that the filter function was not invoked when uretprobe was
> > > initially implemented, and this has been existing for ten years. We have
> > > tested the patch under our workload, binding eBPF programs to uretprobe
> > > tracepoints, and confirmed that it resolved our problem.
> >
> > Is this eBPF related problem? It seems only perf record is also affected.
> > Let me try.
>
> I guess it should be a general issue and is not specific to BPF, because
> the BPF handler is only a event "consumer".
>
> >
> > And trace one of them;
> >
> > $ sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93928
> >
>
> A key trigger here is that the two tracer instances (either `bpftrace` or
> `perf record`) must be running *simultaneously*. One of them should use
> PID1 as filter, while the other uses PID2.
Even if I run `perf trace record` simultaneously, it filters the event
correctly. I just ran;
sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93927 -o trace1.data -- sleep 50 &
sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93928 -o trace2.data -- sleep 50 &
And dump trace1.data and trace2.data by `perf script`.
>
> I think the reason why only tracing PID1 fails to trigger the bug is that,
> uprobe uses some form of copy-on-write mechanism to create independent
> .text pages for the traced process. For example, if only PID1 is being
> traced, then only the libc.so used by PID1 is patched. Other processes
> will continue to use the unpatched original libc.so, so the event isn't
> triggered. In my reproduction example, the two bpftrace instances must be
> running at the same moment.
>
> > This is a bit confusing, because even if the kernel-side uretprobe
> > handler doesn't do the filtering by itself, uprobe subsystem shouldn't
> > install breakpoints on processes which don't have uretprobe requested
> > for (unless I'm missing something, of course).
>
> There're two tracers, one attached to PID1, and the other attached
> to PID2, as described above.
Yeah, but perf works fine. Maybe perf only enables its ring buffer for
the specific process and read from the specific ring buffer (Note, perf
has per-process ring buffer IIRC.) How does the bpftracer implement it?
>
> > It still needs to be fixed like you do in your patch, though. Even
> > more, we probably need a similar UPROBE_HANDLER_REMOVE handling in
> > handle_uretprobe_chain() to clean up breakpoint for processes which
> > don't have uretprobe attached anymore (but I think that's a separate
> > follow up).
>
> Agreed, the implementation of uretprobe should be almost the same as
> uprobe, but it seems uretprobe was ignored in previous modifications.
OK, I just have a confirmation from BPF people, because I could not
reproduce the issue with perf tool.
>
> > $ sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93928
> > ^C[ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ]
> >
> > And dump the data;
> >
> > $ sudo ~/bin/perf script
> > malloc-run 93928 [004] 351736.730649: raw_syscalls:sys_exit: NR 230 = 0
> > malloc-run 93928 [004] 351736.730694: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> > malloc-run 93928 [004] 351736.730696: raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> > malloc-run 93928 [004] 351738.730857: raw_syscalls:sys_exit: NR 230 = 0
> > malloc-run 93928 [004] 351738.730869: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> > malloc-run 93928 [004] 351738.730883: raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> > malloc-run 93928 [004] 351740.731110: raw_syscalls:sys_exit: NR 230 = 0
> > malloc-run 93928 [004] 351740.731125: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> > malloc-run 93928 [004] 351740.731127: raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> >
> > Hmm, it seems to trace one pid data. (without this change)
> > If this changes eBPF behavior, I would like to involve eBPF people to ask
> > this is OK. As far as from the viewpoint of perf tool, current code works.
>
> I tried this and also couldn't reproduce the bug. Even when running two
> perf instances simultaneously, `perf record` (or perhaps `perf trace` for
> convenience) only outputs events from the corresponding PID as expected.
> I initially suspected that perf might be applying another filter in user
> space, but that doesn't seem to be the case. I'll need to conduct further
> debugging, which might take some time.
>
> I also tried combining bpftrace with `perf trace`. Specifically, I used
> `perf trace` for PID1 and bpftrace for PID2. `perf trace` still only
> outputs events from PID1, but bpftrace prints events from both PIDs.
> I'm not yet sure why this is happening.
Yeah, if perf only reads the specific process's ring buffer, it should
work without this filter.
Thanks,
>
> Thanks so much,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
next prev parent reply other threads:[~2024-08-24 17:28 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <ME0P300MB0416034322B9915ECD3888649D882@ME0P300MB0416.AUSP300.PROD.OUTLOOK.COM>
2024-08-23 17:44 ` [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe Masami Hiramatsu
2024-08-23 19:07 ` Andrii Nakryiko
2024-08-24 5:49 ` Tianyi Liu
2024-08-24 17:27 ` Masami Hiramatsu [this message]
2024-08-25 17:14 ` Oleg Nesterov
2024-08-25 18:43 ` Oleg Nesterov
2024-08-25 22:40 ` Oleg Nesterov
2024-08-26 10:05 ` Jiri Olsa
2024-08-26 11:57 ` Oleg Nesterov
2024-08-26 12:24 ` Oleg Nesterov
2024-08-26 13:48 ` Jiri Olsa
2024-08-26 18:56 ` Oleg Nesterov
2024-08-26 21:25 ` Oleg Nesterov
2024-08-26 22:01 ` Jiri Olsa
2024-08-26 22:08 ` Andrii Nakryiko
2024-08-26 22:29 ` Oleg Nesterov
2024-08-27 13:07 ` Jiri Olsa
2024-08-27 13:45 ` Jiri Olsa
2024-08-27 16:45 ` Oleg Nesterov
2024-08-28 11:40 ` Jiri Olsa
2024-08-27 20:19 ` Oleg Nesterov
2024-08-28 11:46 ` Jiri Olsa
2024-08-29 15:20 ` Oleg Nesterov
2024-08-29 19:46 ` Jiri Olsa
2024-08-29 21:12 ` Oleg Nesterov
2024-08-29 23:22 ` Jiri Olsa
2024-08-27 6:27 ` Tianyi Liu
2024-08-27 10:08 ` Jiri Olsa
2024-08-27 10:20 ` Jiri Olsa
2024-08-27 10:54 ` Oleg Nesterov
2024-08-27 10:40 ` Oleg Nesterov
2024-08-27 13:32 ` Jiri Olsa
2024-08-27 14:26 ` Oleg Nesterov
2024-08-27 14:41 ` Jiri Olsa
2024-08-26 14:52 ` Tianyi Liu
2024-08-25 17:00 ` Oleg Nesterov
2024-08-30 10:12 ` Oleg Nesterov
2024-08-30 12:23 ` Oleg Nesterov
2024-08-30 13:34 ` Jiri Olsa
2024-08-30 15:51 ` Andrii Nakryiko
2024-09-02 9:11 ` Jiri Olsa
2024-09-03 18:09 ` Andrii Nakryiko
2024-09-03 18:11 ` Andrii Nakryiko
2024-09-03 19:15 ` Jiri Olsa
2024-09-01 19:22 ` Tianyi Liu
2024-09-01 23:26 ` Oleg Nesterov
2024-09-02 17:17 ` Oleg Nesterov
2024-09-03 14:33 ` Jiri Olsa
2024-09-06 10:43 ` Jiri Olsa
2024-09-06 19:18 ` Oleg Nesterov
2024-09-09 10:41 ` Jiri Olsa
2024-09-09 18:34 ` Oleg Nesterov
2024-09-10 8:45 ` Jiri Olsa
2024-09-07 19:19 ` Tianyi Liu
2024-09-08 13:15 ` Oleg Nesterov
2024-09-09 1:16 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240825022755.a430c0969a1e784adc444a4f@kernel.org \
--to=mhiramat@kernel.org \
--cc=ajor@meta.com \
--cc=albancrequy@linux.microsoft.com \
--cc=andrii.nakryiko@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=flaniel@linux.microsoft.com \
--cc=i.pear@outlook.com \
--cc=jolsa@kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux@jordanrome.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=oleg@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox