public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Tianyi Liu <i.pear@outlook.com>
Cc: andrii.nakryiko@gmail.com, ajor@meta.com,
	albancrequy@linux.microsoft.com, bpf@vger.kernel.org,
	flaniel@linux.microsoft.com, jolsa@kernel.org,
	linux-trace-kernel@vger.kernel.org, linux@jordanrome.com,
	mathieu.desnoyers@efficios.com, oleg@redhat.com
Subject: Re: [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe
Date: Sun, 25 Aug 2024 02:27:55 +0900	[thread overview]
Message-ID: <20240825022755.a430c0969a1e784adc444a4f@kernel.org> (raw)
In-Reply-To: <ME0P300MB04163A2993D1B545C3533DDC9D892@ME0P300MB0416.AUSP300.PROD.OUTLOOK.COM>

On Sat, 24 Aug 2024 13:49:26 +0800
Tianyi Liu <i.pear@outlook.com> wrote:

> Hi Masami and Andrii:
> 
> I would like to share more information and ideas, but I'm possibly wrong.
> 
> > > U(ret)probes are designed to be filterable using the PID, which is the
> > > second parameter in the perf_event_open syscall. Currently, uprobe works
> > > well with the filtering, but uretprobe is not affected by it. This often
> > > leads to users being disturbed by events from uninterested processes while
> > > using uretprobe.
> > >
> > > We found that the filter function was not invoked when uretprobe was
> > > initially implemented, and this has been existing for ten years. We have
> > > tested the patch under our workload, binding eBPF programs to uretprobe
> > > tracepoints, and confirmed that it resolved our problem.
> > 
> > Is this eBPF related problem? It seems only perf record is also affected.
> > Let me try.
> 
> I guess it should be a general issue and is not specific to BPF, because
> the BPF handler is only a event "consumer".
> 
> > 
> > And trace one of them;
> > 
> > $ sudo ~/bin/perf trace record -e probe_malloc:malloc__return  -p 93928
> > 
> 
> A key trigger here is that the two tracer instances (either `bpftrace` or
> `perf record`) must be running *simultaneously*. One of them should use
> PID1 as filter, while the other uses PID2.

Even if I run `perf trace record` simultaneously, it filters the event
correctly. I just ran;

sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93927 -o trace1.data -- sleep 50 &
sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93928 -o trace2.data -- sleep 50 &

And dump trace1.data and trace2.data by `perf script`.

> 
> I think the reason why only tracing PID1 fails to trigger the bug is that,
> uprobe uses some form of copy-on-write mechanism to create independent
> .text pages for the traced process. For example, if only PID1 is being
> traced, then only the libc.so used by PID1 is patched. Other processes
> will continue to use the unpatched original libc.so, so the event isn't
> triggered. In my reproduction example, the two bpftrace instances must be
> running at the same moment.
> 
> > This is a bit confusing, because even if the kernel-side uretprobe
> > handler doesn't do the filtering by itself, uprobe subsystem shouldn't
> > install breakpoints on processes which don't have uretprobe requested
> > for (unless I'm missing something, of course).
> 
> There're two tracers, one attached to PID1, and the other attached
> to PID2, as described above.

Yeah, but perf works fine. Maybe perf only enables its ring buffer for
the specific process and read from the specific ring buffer (Note, perf
has per-process ring buffer IIRC.) How does the bpftracer implement it?

> 
> > It still needs to be fixed like you do in your patch, though. Even
> > more, we probably need a similar UPROBE_HANDLER_REMOVE handling in
> > handle_uretprobe_chain() to clean up breakpoint for processes which
> > don't have uretprobe attached anymore (but I think that's a separate
> > follow up).
> 
> Agreed, the implementation of uretprobe should be almost the same as
> uprobe, but it seems uretprobe was ignored in previous modifications.

OK, I just have a confirmation from BPF people, because I could not
reproduce the issue with perf tool.

> 
> > $ sudo ~/bin/perf trace record -e probe_malloc:malloc__return  -p 93928
> > ^C[ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ]
> > 
> > And dump the data;
> > 
> > $ sudo ~/bin/perf script
> >       malloc-run   93928 [004] 351736.730649:       raw_syscalls:sys_exit: NR 230 = 0
> >       malloc-run   93928 [004] 351736.730694: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> >       malloc-run   93928 [004] 351736.730696:      raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> >       malloc-run   93928 [004] 351738.730857:       raw_syscalls:sys_exit: NR 230 = 0
> >       malloc-run   93928 [004] 351738.730869: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> >       malloc-run   93928 [004] 351738.730883:      raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> >       malloc-run   93928 [004] 351740.731110:       raw_syscalls:sys_exit: NR 230 = 0
> >       malloc-run   93928 [004] 351740.731125: probe_malloc:malloc__return: (561cfdeb30c0 <- 561cfdeb3204)
> >       malloc-run   93928 [004] 351740.731127:      raw_syscalls:sys_enter: NR 230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
> > 
> > Hmm, it seems to trace one pid data. (without this change)
> > If this changes eBPF behavior, I would like to involve eBPF people to ask
> > this is OK. As far as from the viewpoint of perf tool, current code works.
> 
> I tried this and also couldn't reproduce the bug. Even when running two
> perf instances simultaneously, `perf record` (or perhaps `perf trace` for
> convenience) only outputs events from the corresponding PID as expected.
> I initially suspected that perf might be applying another filter in user
> space, but that doesn't seem to be the case. I'll need to conduct further
> debugging, which might take some time.
> 
> I also tried combining bpftrace with `perf trace`. Specifically, I used
> `perf trace` for PID1 and bpftrace for PID2. `perf trace` still only
> outputs events from PID1, but bpftrace prints events from both PIDs.
> I'm not yet sure why this is happening.

Yeah, if perf only reads the specific process's ring buffer, it should
work without this filter.

Thanks,

> 
> Thanks so much,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

  reply	other threads:[~2024-08-24 17:28 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <ME0P300MB0416034322B9915ECD3888649D882@ME0P300MB0416.AUSP300.PROD.OUTLOOK.COM>
2024-08-23 17:44 ` [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe Masami Hiramatsu
2024-08-23 19:07   ` Andrii Nakryiko
2024-08-24  5:49     ` Tianyi Liu
2024-08-24 17:27       ` Masami Hiramatsu [this message]
2024-08-25 17:14       ` Oleg Nesterov
2024-08-25 18:43         ` Oleg Nesterov
2024-08-25 22:40         ` Oleg Nesterov
2024-08-26 10:05           ` Jiri Olsa
2024-08-26 11:57             ` Oleg Nesterov
2024-08-26 12:24               ` Oleg Nesterov
2024-08-26 13:48               ` Jiri Olsa
2024-08-26 18:56                 ` Oleg Nesterov
2024-08-26 21:25                 ` Oleg Nesterov
2024-08-26 22:01                   ` Jiri Olsa
2024-08-26 22:08                     ` Andrii Nakryiko
2024-08-26 22:29                     ` Oleg Nesterov
2024-08-27 13:07                       ` Jiri Olsa
2024-08-27 13:45                         ` Jiri Olsa
2024-08-27 16:45                         ` Oleg Nesterov
2024-08-28 11:40                           ` Jiri Olsa
2024-08-27 20:19                         ` Oleg Nesterov
2024-08-28 11:46                           ` Jiri Olsa
2024-08-29 15:20                             ` Oleg Nesterov
2024-08-29 19:46                               ` Jiri Olsa
2024-08-29 21:12                                 ` Oleg Nesterov
2024-08-29 23:22                                   ` Jiri Olsa
2024-08-27  6:27                   ` Tianyi Liu
2024-08-27 10:08               ` Jiri Olsa
2024-08-27 10:20                 ` Jiri Olsa
2024-08-27 10:54                   ` Oleg Nesterov
2024-08-27 10:40                 ` Oleg Nesterov
2024-08-27 13:32                   ` Jiri Olsa
2024-08-27 14:26                     ` Oleg Nesterov
2024-08-27 14:41                       ` Jiri Olsa
2024-08-26 14:52           ` Tianyi Liu
2024-08-25 17:00     ` Oleg Nesterov
2024-08-30 10:12 ` Oleg Nesterov
2024-08-30 12:23   ` Oleg Nesterov
2024-08-30 13:34   ` Jiri Olsa
2024-08-30 15:51     ` Andrii Nakryiko
2024-09-02  9:11       ` Jiri Olsa
2024-09-03 18:09         ` Andrii Nakryiko
2024-09-03 18:11           ` Andrii Nakryiko
2024-09-03 19:15             ` Jiri Olsa
2024-09-01 19:22   ` Tianyi Liu
2024-09-01 23:26     ` Oleg Nesterov
2024-09-02 17:17       ` Oleg Nesterov
2024-09-03 14:33         ` Jiri Olsa
2024-09-06 10:43     ` Jiri Olsa
2024-09-06 19:18       ` Oleg Nesterov
2024-09-09 10:41         ` Jiri Olsa
2024-09-09 18:34           ` Oleg Nesterov
2024-09-10  8:45             ` Jiri Olsa
2024-09-07 19:19       ` Tianyi Liu
2024-09-08 13:15         ` Oleg Nesterov
2024-09-09  1:16           ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240825022755.a430c0969a1e784adc444a4f@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=ajor@meta.com \
    --cc=albancrequy@linux.microsoft.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=flaniel@linux.microsoft.com \
    --cc=i.pear@outlook.com \
    --cc=jolsa@kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@jordanrome.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox