From: Namhyung Kim <namhyung@kernel.org>
To: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	acme@kernel.org, mingo@redhat.com, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, irogers@google.com,
	adrian.hunter@intel.com, peterz@infradead.org,
	kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Song Liu <song@kernel.org>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [RFC PATCH v1] perf trace: Mitigate failures in parallel perf trace instances
Date: Mon, 11 Aug 2025 13:15:12 -0700
Message-ID: <aJpPUHUEJ7cLKd8e@google.com>
In-Reply-To: <CAH0uvogvkRoHc6jWYSJHLenaRMru23YaGfA1i_vWZ6eF9LwVzw@mail.gmail.com>

Hello,

Sorry for the late reply.

On Mon, Jun 09, 2025 at 11:38:00AM -0700, Howard Chu wrote:
> Hi Jiri,
> 
> On Wed, Jun 4, 2025 at 3:25 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Jun 02, 2025 at 06:17:43PM -0400, Steven Rostedt wrote:
> > > On Fri, 30 May 2025 17:00:38 -0700
> > > Howard Chu <howardchu95@gmail.com> wrote:
> > >
> > > > Hello Namhyung,
> > > >
> > > > On Fri, May 30, 2025 at 4:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > On Wed, May 28, 2025 at 11:55:36PM -0700, Howard Chu wrote:
> > > > > > perf trace utilizes the tracepoint utility, the only filter in perf
> > > > > > trace is a filter on syscall type. For example, if perf traces only
> > > > > > openat, then it filters all the other syscalls, such as readlinkat,
> > > > > > readv, etc.
> > > > > >
> > > > > > This filtering is flawed. Consider this case: two perf trace
> > > > > > instances are running at the same time, trace instance A tracing
> > > > > > readlinkat, trace instance B tracing openat. When an openat syscall
> > > > > > enters, it triggers both BPF programs (sys_enter) in both perf trace
> > > > > > instances, these kernel functions will be executed:
> > > > > >
> > > > > > perf_syscall_enter
> > > > > >   perf_call_bpf_enter
> > > > > >     trace_call_bpf
> > > > > >       bpf_prog_run_array
> > > > > >
> > > > > > In bpf_prog_run_array:
> > > > > > ~~~
> > > > > > while ((prog = READ_ONCE(item->prog))) {
> > > > > >       run_ctx.bpf_cookie = item->bpf_cookie;
> > > > > >       ret &= run_prog(prog, ctx);
> > > > > >       item++;
> > > > > > }
> > > > > > ~~~
> > > > > >
> > > > > > I'm not a BPF expert, but by tinkering I found that if one of the BPF
> > > > > > programs returns 0, there will be no tracepoint sample. That is,
> > > > > >
> > > > > > (Is there a sample?) = ProgRetA & ProgRetB & ProgRetC
> > > > > >
> > > > > > Where ProgRetA is the return value of one of the BPF programs in the BPF
> > > > > > program array.
> > > > > >
> > > > > > Go back to the case, when two perf trace instances are tracing two
> > > > > > different syscalls, again, A is tracing readlinkat, B is tracing openat,
> > > > > > when an openat syscall enters, it triggers the sys_enter program in
> > > > > > instance A, call it ProgA, and the sys_enter program in instance B,
> > > > > > ProgB, now ProgA will return 0 because ProgA cares about readlinkat only,
> > > > > > even though ProgB returns 1; (Is there a sample?) = ProgRetA (0) &
> > > > > > ProgRetB (1) = 0. So there won't be a tracepoint sample in B's output,
> > > > > > when there really should be one.
> > > > >
> > > > > Sounds like a bug.  I think it should run bpf programs attached to the
> > > > > current perf_event only.  Isn't it the case for tracepoint + perf + bpf?
> > > >
> > > > I really can't answer that question.
> >
> > bpf programs for tracepoint are executed before the perf event specific
> > check/trigger in perf_trace_run_bpf_submit
> >
> > bpf programs array is part of struct trace_event_call so it's global per
> > tracepoint, not per perf event

Right, I think we need a way to attach a BPF program to the perf_event (as
an overflow handler), not to the trace_event_call, when it comes to a
tracepoint event, so that it only affects the behavior of the calling
thread.  It would access the trace data as sample raw data from ctx.
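
To make the difference concrete, here is a rough userspace sketch (not
kernel code).  It only models the AND-ing of return values from the loop
quoted above and contrasts it with a per-event check.  The names prog_fn,
prog_a, prog_b, run_array and run_per_event are made up for illustration,
and the syscall numbers assume x86_64.

```c
#include <stddef.h>

/* Hypothetical stand-in for a BPF program: nonzero means "keep the sample". */
typedef unsigned int (*prog_fn)(int syscall_nr);

/* Syscall numbers as on x86_64; used here only as example values. */
enum { NR_OPENAT = 257, NR_READLINKAT = 267 };

/* Instance A only cares about readlinkat, instance B only about openat. */
unsigned int prog_a(int nr) { return nr == NR_READLINKAT; }
unsigned int prog_b(int nr) { return nr == NR_OPENAT; }

/*
 * Models the shared, per-tracepoint program array: every program runs and
 * the results are AND-ed, so a single 0 drops the sample for everyone.
 */
unsigned int run_array(const prog_fn *progs, int syscall_nr)
{
	unsigned int ret = 1;

	for (; *progs != NULL; progs++)
		ret &= (*progs)(syscall_nr);
	return ret;
}

/*
 * Models the per-event idea: each event consults only its own program,
 * so one instance's filter cannot suppress another instance's sample.
 */
unsigned int run_per_event(prog_fn prog, int syscall_nr)
{
	return prog(syscall_nr);
}
```

With a NULL-terminated array { prog_a, prog_b, NULL }, run_array() gives 0
for openat, which is the lost sample in B.  run_per_event(prog_b, NR_OPENAT)
gives 1 while run_per_event(prog_a, NR_OPENAT) still gives 0, which is what
each instance expects.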

Maybe it needs new link_create flags and requires BPF_PROG_TYPE_PERF_EVENT.

Wdyt?

Thanks,
Namhyung


