Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events

From: Kyle Huey <me@kylehuey.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <olsajiri@gmail.com>,
	khuey@kylehuey.com, Ingo Molnar <mingo@redhat.com>,
	 Namhyung Kim <namhyung@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	 robert@ocallahan.org, Joe Damato <jdamato@fastly.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	 Adrian Hunter <adrian.hunter@intel.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	 Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	linux-perf-users@vger.kernel.org,  linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events
Date: Mon, 15 Jul 2024 08:19:44 -0700
Message-ID: <CAP045Aq3Mv2oDMCU8-Afe7Ne+RLH62120F3RWqc+p9STpcxyxg@mail.gmail.com>
In-Reply-To: <20240715150410.GJ14400@noisy.programming.kicks-ass.net>

On Mon, Jul 15, 2024 at 8:04 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 15, 2024 at 07:33:57AM -0700, Kyle Huey wrote:
> > On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > > Urgh, so wth does event_is_tracing do with event->prog? And can't we
> > > clean this up?
> >
> > Tracing events keep track of the bpf program in event->prog solely for
> > cleanup. The bpf programs are stored in and invoked from
> > event->tp_event->prog_array, but when the event is destroyed it needs
> > to know which bpf program to remove from that array.
>
> Yeah, figured it out eventually.. Does look like it needs event->prog
> and we can't easily remedy this dual use :/
>
> > > That whole perf_event_is_tracing() is a pretty gross function.
> > >
> > > Also, I think the default return value of bpf_overflow_handler() is
> > > wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> > > but if we do call it, but then have !event->prog on the re-read, we
> > > still return 0.
> >
> > The synchronization model here isn't quite clear to me but I don't
> > think this matters in practice. Once event->prog is set the only
> > allowed change is for it to be cleared when the perf event is freed.
> > Anything else is refused by perf_event_set_bpf_handler() with EEXIST.
> > Can that free race with an overflow handler? I'm not sure, but even if
> > it can, dropping an overflow for an event that's being freed seems
> > fine to me. If it can't race then we could remove the condition on the
> > re-read entirely.
>
> Right, also rcu_read_lock() is cheap enough to unconditionally do I'm
> thinking.
>
> So since we have two distinct users of event->prog, I figured we could
> distinguish them from one of the LSB in the pointer value, which then
> got me the below.
>
> But now that I see the end result I'm not at all sure this is sane.
>
> But I figure it ought to work...

I think this would probably work but stealing the bit seems far more
complicated than just gating on perf_event_is_tracing().

Would it assuage your concerns at all if I made event->prog a simple
union of, say, handler_prog and sample_prog (still discriminated by
perf_event_is_tracing() where necessary), with appropriate comments,
and changed the two code paths accordingly?
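
Roughly, as a userspace sketch of the union idea. Everything here is a
placeholder rather than kernel code: struct perf_event_sketch, the
is_tracing flag (standing in for perf_event_is_tracing()), the two
accessors, and in particular which member name goes with which use are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct bpf_prog. */
struct bpf_prog { int id; };

struct perf_event_sketch {
	bool is_tracing;	/* stand-in for perf_event_is_tracing(event) */
	union {
		/* Non-tracing events: program run from the overflow path. */
		struct bpf_prog *handler_prog;
		/*
		 * Tracing events: remembered only so the program can be
		 * removed from tp_event->prog_array on destroy.
		 */
		struct bpf_prog *sample_prog;
	};
};

/* Program the overflow path may run, or NULL for tracing events. */
static struct bpf_prog *overflow_prog(const struct perf_event_sketch *event)
{
	if (event->is_tracing)
		return NULL;
	return event->handler_prog;
}

/* Program to detach on destroy; only meaningful for tracing events. */
static struct bpf_prog *cleanup_prog(const struct perf_event_sketch *event)
{
	assert(event->is_tracing);
	return event->sample_prog;
}
```

The storage is still one pointer; the union only makes each code path
name the use it relies on.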

- Kyle

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ab6c4c942f79..5ec78346c2a1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9594,6 +9594,13 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
>  }
>
>  #ifdef CONFIG_BPF_SYSCALL
> +
> +static inline struct bpf_prog *event_prog(struct perf_event *event)
> +{
> +       unsigned long _prog = (unsigned long)READ_ONCE(event->prog);
> +       return (void *)(_prog & ~1);
> +}
> +
>  static int bpf_overflow_handler(struct perf_event *event,
>                                 struct perf_sample_data *data,
>                                 struct pt_regs *regs)
> @@ -9603,19 +9610,21 @@ static int bpf_overflow_handler(struct perf_event *event,
>                 .event = event,
>         };
>         struct bpf_prog *prog;
> -       int ret = 0;
> +       int ret = 1;
> +
> +       guard(rcu)();
>
> -       ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> -       if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> -               goto out;
> -       rcu_read_lock();
>         prog = READ_ONCE(event->prog);
> -       if (prog) {
> +       if (!((unsigned long)prog & 1))
> +               return ret;
> +
> +       prog = (void *)((unsigned long)prog & ~1);
> +
> +       if (unlikely(__this_cpu_inc_return(bpf_prog_active) == 1)) {
>                 perf_prepare_sample(data, event, regs);
> +               ctx.regs = perf_arch_bpf_user_pt_regs(regs);
>                 ret = bpf_prog_run(prog, &ctx);
>         }
> -       rcu_read_unlock();
> -out:
>         __this_cpu_dec(bpf_prog_active);
>
>         return ret;
> @@ -9652,14 +9661,14 @@ static inline int perf_event_set_bpf_handler(struct perf_event *event,
>                 return -EPROTO;
>         }
>
> -       event->prog = prog;
> +       event->prog = (void *)((unsigned long)prog | 1);
>         event->bpf_cookie = bpf_cookie;
>         return 0;
>  }
>
>  static inline void perf_event_free_bpf_handler(struct perf_event *event)
>  {
> -       struct bpf_prog *prog = event->prog;
> +       struct bpf_prog *prog = event_prog(event);
>
>         if (!prog)
>                 return;
> @@ -9707,7 +9716,7 @@ static int __perf_event_overflow(struct perf_event *event,
>
>         ret = __perf_event_account_interrupt(event, throttle);
>
> -       if (event->prog && !bpf_overflow_handler(event, data, regs))
> +       if (!bpf_overflow_handler(event, data, regs))
>                 return ret;
>
>         /*
> @@ -12026,10 +12035,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>                 context = parent_event->overflow_handler_context;
>  #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
>                 if (parent_event->prog) {
> -                       struct bpf_prog *prog = parent_event->prog;
> +                       struct bpf_prog *prog = event_prog(parent_event);
>
>                         bpf_prog_inc(prog);
> -                       event->prog = prog;
> +                       event->prog = parent_event->prog;
>                 }
>  #endif
>         }
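
For reference, the low-bit tagging that the quoted patch applies to
event->prog can be sketched in userspace as follows. This is a minimal
illustration, not the kernel code: struct prog, PROG_HANDLER_TAG,
tag_handler(), is_handler() and untag() are hypothetical names, and it
assumes the pointed-to object is at least 2-byte aligned so that bit 0
is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_prog. */
struct prog { int id; };

#define PROG_HANDLER_TAG ((uintptr_t)1)

/* Store a pointer with bit 0 set to mark it as an overflow-handler program. */
static void *tag_handler(struct prog *p)
{
	/* Bit 0 is only available if the pointer has it clear to begin with. */
	assert(((uintptr_t)p & PROG_HANDLER_TAG) == 0);
	return (void *)((uintptr_t)p | PROG_HANDLER_TAG);
}

/* True if the stored value carries the handler tag. */
static bool is_handler(const void *stored)
{
	return ((uintptr_t)stored & PROG_HANDLER_TAG) != 0;
}

/* Recover the real pointer regardless of whether the tag is set. */
static struct prog *untag(const void *stored)
{
	return (struct prog *)((uintptr_t)stored & ~PROG_HANDLER_TAG);
}
```

Every reader of the stored value has to go through untag() before
dereferencing it, which is the extra complexity being weighed against
gating on perf_event_is_tracing().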

Thread overview: 17+ messages
2024-07-13  4:46 [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events Kyle Huey
2024-07-13 20:32 ` Jiri Olsa
2024-07-15 11:12   ` Peter Zijlstra
2024-07-15 14:33     ` Kyle Huey
2024-07-15 15:04       ` Peter Zijlstra
2024-07-15 15:19         ` Kyle Huey [this message]
2024-07-15 16:30           ` Peter Zijlstra
2024-07-15 16:48             ` Kyle Huey
2024-07-16  7:25               ` Jiri Olsa
2024-07-19 18:26                 ` Andrii Nakryiko
2024-07-26 12:37                   ` Kyle Huey
2024-07-26 16:34                     ` Andrii Nakryiko
2024-07-26 16:35                       ` Kyle Huey
2024-08-05 11:55                         ` Joe Damato
2024-08-13 10:37                           ` Joe Damato
2024-08-13 13:38                             ` Kyle Huey
2024-07-20 16:03                 ` Masami Hiramatsu