Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Kyle Huey <me@kylehuey.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
	khuey@kylehuey.com, Ingo Molnar <mingo@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	robert@ocallahan.org, Joe Damato <jdamato@fastly.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events
Date: Mon, 15 Jul 2024 17:04:10 +0200	[thread overview]
Message-ID: <20240715150410.GJ14400@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CAP045ArBNZ559RFrvDTsHj42S4W+BuReHe+XV2tBPSeoHOMMpA@mail.gmail.com>

On Mon, Jul 15, 2024 at 07:33:57AM -0700, Kyle Huey wrote:
> On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > Urgh, so wth does event_is_tracing do with event->prog? And can't we
> > clean this up?
> 
> Tracing events keep track of the bpf program in event->prog solely for
> cleanup. The bpf programs are stored in and invoked from
> event->tp_event->prog_array, but when the event is destroyed it needs
> to know which bpf program to remove from that array.

Yeah, figured it out eventually.. Does look like it needs event->prog
and we can't easily remedy this dual use :/

> > That whole perf_event_is_tracing() is a pretty gross function.
> >
> > Also, I think the default return value of bpf_overflow_handler() is
> > wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> > but if we do call it, but then have !event->prog on the re-read, we
> > still return 0.
> 
> The synchronization model here isn't quite clear to me but I don't
> think this matters in practice. Once event->prog is set the only
> allowed change is for it to be cleared when the perf event is freed.
> Anything else is refused by perf_event_set_bpf_handler() with EEXIST.
> Can that free race with an overflow handler? I'm not sure, but even if
> it can, dropping an overflow for an event that's being freed seems
> fine to me. If it can't race then we could remove the condition on the
> re-read entirely.

Right, also rcu_read_lock() is cheap enough to unconditionally do I'm
thinking.

So since we have two distinct users of event->prog, I figured we could
distinguish them from one of the LSB in the pointer value, which then
got me the below.

But now that I see the end result I'm not at all sure this is sane.

But I figure it ought to work... 

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab6c4c942f79..5ec78346c2a1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9594,6 +9594,13 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
 }
 
 #ifdef CONFIG_BPF_SYSCALL
+
+static inline struct bpf_prog *event_prog(struct perf_event *event)
+{
+	unsigned long _prog = (unsigned long)READ_ONCE(event->prog);
+	return (void *)(_prog & ~1);
+}
+
 static int bpf_overflow_handler(struct perf_event *event,
 				struct perf_sample_data *data,
 				struct pt_regs *regs)
@@ -9603,19 +9610,21 @@ static int bpf_overflow_handler(struct perf_event *event,
 		.event = event,
 	};
 	struct bpf_prog *prog;
-	int ret = 0;
+	int ret = 1;
+
+	guard(rcu)();
 
-	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
-	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
-		goto out;
-	rcu_read_lock();
 	prog = READ_ONCE(event->prog);
-	if (prog) {
+	if (!((unsigned long)prog & 1))
+		return ret;
+
+	prog = (void *)((unsigned long)prog & ~1);
+
+	if (unlikely(__this_cpu_inc_return(bpf_prog_active) == 1)) {
 		perf_prepare_sample(data, event, regs);
+		ctx.regs = perf_arch_bpf_user_pt_regs(regs);
 		ret = bpf_prog_run(prog, &ctx);
 	}
-	rcu_read_unlock();
-out:
 	__this_cpu_dec(bpf_prog_active);
 
 	return ret;
@@ -9652,14 +9661,14 @@ static inline int perf_event_set_bpf_handler(struct perf_event *event,
 		return -EPROTO;
 	}
 
-	event->prog = prog;
+	event->prog = (void *)((unsigned long)prog | 1);
 	event->bpf_cookie = bpf_cookie;
 	return 0;
 }
 
 static inline void perf_event_free_bpf_handler(struct perf_event *event)
 {
-	struct bpf_prog *prog = event->prog;
+	struct bpf_prog *prog = event_prog(event);
 
 	if (!prog)
 		return;
@@ -9707,7 +9716,7 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
-	if (event->prog && !bpf_overflow_handler(event, data, regs))
+	if (!bpf_overflow_handler(event, data, regs))
 		return ret;
 
 	/*
@@ -12026,10 +12035,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		context = parent_event->overflow_handler_context;
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
 		if (parent_event->prog) {
-			struct bpf_prog *prog = parent_event->prog;
+			struct bpf_prog *prog = event_prog(parent_event);
 
 			bpf_prog_inc(prog);
-			event->prog = prog;
+			event->prog = parent_event->prog;
 		}
 #endif
 	}

next prev parent reply	other threads:[~2024-07-15 15:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-13  4:46 [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events Kyle Huey
2024-07-13 20:32 ` Jiri Olsa
2024-07-15 11:12   ` Peter Zijlstra
2024-07-15 14:33     ` Kyle Huey
2024-07-15 15:04       ` Peter Zijlstra [this message]
2024-07-15 15:19         ` Kyle Huey
2024-07-15 16:30           ` Peter Zijlstra
2024-07-15 16:48             ` Kyle Huey
2024-07-16  7:25               ` Jiri Olsa
2024-07-19 18:26                 ` Andrii Nakryiko
2024-07-26 12:37                   ` Kyle Huey
2024-07-26 16:34                     ` Andrii Nakryiko
2024-07-26 16:35                       ` Kyle Huey
2024-08-05 11:55                         ` Joe Damato
2024-08-13 10:37                           ` Joe Damato
2024-08-13 13:38                             ` Kyle Huey
2024-07-20 16:03                 ` Masami Hiramatsu

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:ab6c4c942f7 dfblob:5ec78346c2a )
 OR (
bs:"Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240715150410.GJ14400@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=irogers@google.com \
    --cc=jdamato@fastly.com \
    --cc=kan.liang@linux.intel.com \
    --cc=khuey@kylehuey.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=me@kylehuey.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=olsajiri@gmail.com \
    --cc=robert@ocallahan.org \
    --cc=song@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.