From: Namhyung Kim <namhyung@kernel.org>
To: Marco Elver <elver@google.com>
Cc: Kyle Huey <me@kylehuey.com>, Kyle Huey <khuey@kylehuey.com>,
open list <linux-kernel@vger.kernel.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Jiri Olsa <jolsa@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Robert O'Callahan <robert@ocallahan.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
Date: Thu, 7 Dec 2023 14:38:35 -0800 [thread overview]
Message-ID: <ZXJJa5re536_e7c1@google.com> (raw)
In-Reply-To: <CANpmjNOCFz43zhJm1mS5Ai1XTQiZ_-BW2K3z8vJEV4CyfoZtCA@mail.gmail.com>
Hello,
On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote:
> On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
> >
> >
> >
> > On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
> >>
> >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
> >> >
> >> > The perf subsystem already allows an overflow handler to clear pending_kill
> >> > to suppress a fasync signal (although nobody currently does this). Allow an
> >> > overflow handler to suppress the other visible side effects of an overflow,
> >> > namely event_limit accounting and SIGTRAP generation.
Well, I think it can still hit the throttling logic and generate
a PERF_RECORD_THROTTLE. But it should be rare..
> >> >
> >> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> >> > ---
> >> > kernel/events/core.c | 10 +++++++---
> >> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> > index b704d83a28b2..19fddfc27a4a 100644
> >> > --- a/kernel/events/core.c
> >> > +++ b/kernel/events/core.c
> >> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
> >> > */
> >> >
> >> > event->pending_kill = POLL_IN;
> >> > +
> >> > + READ_ONCE(event->overflow_handler)(event, data, regs);
> >> > +
> >> > + if (!event->pending_kill)
> >> > + return ret;
> >>
> >> It's not at all intuitive that resetting pending_kill to 0 will
> >> suppress everything else, too. There is no relationship between the
> >> fasync signals and SIGTRAP. pending_kill is for the former and
> >> pending_sigtrap is for the latter. One should not affect the other.
> >>
> >> A nicer solution would be to properly undo the various pending_*
> >> states (in the case of pending_sigtrap being set it should be enough
> >> to reset pending_sigtrap to 0, and also decrement
> >> event->ctx->nr_pending).
> >
> >
> > I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit).
> >
> >> Although I can see why this solution is simpler. Perhaps with enough
> >> comments it might be clearer.
> >>
> >> Preferences?
> >
> >
> > The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch specific code on arches I don't have access to though.
>
> Hmm.
>
> Maybe wait for perf maintainers to say what is preferrable. (I could
> live with just making sure this has no other weird side effects and
> more comments.)
What if we can call bpf handler directly and check the return value?
Then I think we can also get rid of the original overflow handler.
Something like this (untested..)
Thanks,
Namhyung
---8<---
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e85cd1c0eaf3..1eba6f5bb70b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,7 +809,6 @@ struct perf_event {
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
#ifdef CONFIG_BPF_SYSCALL
- perf_overflow_handler_t orig_overflow_handler;
struct bpf_prog *prog;
u64 bpf_cookie;
#endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c72a41f11af..e1a00646dbbe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
* Generic event overflow handling, sampling.
*/
+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
+#endif
+
static int __perf_event_overflow(struct perf_event *event,
int throttle, struct perf_sample_data *data,
struct pt_regs *regs)
@@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event,
ret = __perf_event_account_interrupt(event, throttle);
+#ifdef CONFIG_BPF_SYSCALL
+ if (event->prog && bpf_overflow_handler(event, data, regs) == 0)
+ return ret;
+#endif
+
/*
* XXX event_limit might not quite work as expected on inherited
* events
@@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event)
}
#ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
+static int bpf_overflow_handler(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
@@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event,
.event = event,
};
struct bpf_prog *prog;
- int ret = 0;
+ int ret = 1;
ctx.regs = perf_arch_bpf_user_pt_regs(regs);
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
@@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event,
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
- if (!ret)
- return;
-
- event->orig_overflow_handler(event, data, regs);
+ return ret;
}
static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
event->prog = prog;
event->bpf_cookie = bpf_cookie;
- event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
- WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
return 0;
}
@@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
if (!prog)
return;
- WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
event->prog = NULL;
bpf_prog_put(prog);
}
@@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
overflow_handler = parent_event->overflow_handler;
context = parent_event->overflow_handler_context;
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
- if (overflow_handler == bpf_overflow_handler) {
+ if (parent_event->prog) {
struct bpf_prog *prog = parent_event->prog;
bpf_prog_inc(prog);
event->prog = prog;
- event->orig_overflow_handler =
- parent_event->orig_overflow_handler;
}
#endif
}
next prev parent reply other threads:[~2023-12-07 22:38 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-07 16:34 [PATCH v2 0/3] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
2023-12-07 16:34 ` [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Kyle Huey
2023-12-07 17:05 ` Marco Elver
[not found] ` <CAP045Ap8z0qUpuYtbf9hpBqfnngNU7wVT0HM0XwQMrYYt9CAkg@mail.gmail.com>
2023-12-07 17:53 ` Marco Elver
2023-12-07 22:38 ` Namhyung Kim [this message]
2023-12-07 23:02 ` Kyle Huey
2023-12-07 16:34 ` [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects Kyle Huey
2023-12-07 19:12 ` Andrii Nakryiko
2023-12-07 16:34 ` [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses " Kyle Huey
2023-12-07 19:12 ` Andrii Nakryiko
2023-12-07 19:20 ` Marco Elver
2023-12-07 22:56 ` Kyle Huey
2023-12-08 1:08 ` Kyle Huey
2023-12-08 3:46 ` Yonghong Song
2023-12-08 8:06 ` Marco Elver
2023-12-08 17:09 ` Kyle Huey
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXJJa5re536_e7c1@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=andrii.nakryiko@gmail.com \
--cc=elver@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=khuey@kylehuey.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=me@kylehuey.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=robert@ocallahan.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox