From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
Kan Liang <kan.liang@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-perf-users@vger.kernel.org, KP Singh <kpsingh@kernel.org>,
Song Liu <song@kernel.org>,
bpf@vger.kernel.org, Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH] perf bpf-filter: Support multiple events properly
Date: Mon, 5 Aug 2024 14:12:01 -0700 [thread overview]
Message-ID: <ZrFAIc0G8n0-zgxt@google.com> (raw)
In-Reply-To: <ZrEolmUz_2I1fmdJ@x1>
On Mon, Aug 05, 2024 at 04:31:34PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 05, 2024 at 11:56:56AM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 12:03:14PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Fri, Aug 02, 2024 at 10:37:52AM -0700, Namhyung Kim wrote:
> > > > So far it used tgid as a key to get the filter expressions in the
> > > > pinned filters map for regular users but it won't work well if the has
> > > > more than one filters at the same time. Let's add the event id to the
> > > > key of the filter hash map so that it can identify the right filter
> > > > expression in the BPF program.
> > > >
> > > > As the event can be inherited to child tasks, it should use the primary
> > > > id which belongs to the parent (original) event. Since evsel opens the
> > > > event for multiple CPUs and tasks, it needs to maintain a separate hash
> > > > map for the event id.
> > >
> > > I'm trying to test it now, it would be nice to have the series of events
> > > needed to test that the feature is working.
> >
> > Sure, I used the following command.
> >
> > ./perf record -e cycles --filter 'ip < 0xffffffff00000000' -e instructions --filter 'period < 100000' -o- ./perf test -w noploop | ./perf script -i-
>
> Thanks
>
> > >
> > > Some comments below.
> > >
> > > > In the user space, it keeps a list for the multiple evsel and release
> > > > the entries in the both hash map when it closes the event.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > > tools/perf/util/bpf-filter.c | 288 ++++++++++++++++---
> > > > tools/perf/util/bpf_skel/sample-filter.h | 11 +-
> > > > tools/perf/util/bpf_skel/sample_filter.bpf.c | 42 ++-
> > > > tools/perf/util/bpf_skel/vmlinux/vmlinux.h | 5 +
> > > > 4 files changed, 304 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> > > > index c5eb0b7eec19..69b147cba969 100644
> > > > --- a/tools/perf/util/bpf-filter.c
> > > > +++ b/tools/perf/util/bpf-filter.c
> > > > @@ -1,4 +1,45 @@
> > > > /* SPDX-License-Identifier: GPL-2.0 */
> > > > +/**
> > > > + * Generic event filter for sampling events in BPF.
> > > > + *
> > > > + * The BPF program is fixed and just to read filter expressions in the 'filters'
> > > > + * map and compare the sample data in order to reject samples that don't match.
> > > > + * Each filter expression contains a sample flag (term) to compare, an operation
> > > > + * (==, >=, and so on) and a value.
> > > > + *
> > > > + * Note that each entry has an array of filter repxressions and it only succeeds
> > >
> > > expressions
> >
> > Oops, thanks.
> >
> > >
> > > > + * when all of the expressions are satisfied. But it supports the logical OR
> > > > + * using a GROUP operation which is satisfied when any of its member expression
> > > > + * is evaluated to true. But it doesn't allow nested GROUP operations for now.
> > > > + *
> > > > + * To support non-root users, the filters map can be loaded and pinned in the BPF
> > > > + * filesystem by root (perf record --setup-filter pin). Then each user will get
> > > > + * a new entry in the shared filters map to fill the filter expressions. And the
> > > > + * BPF program will find the filter using (task-id, event-id) as a key.
> > > > + *
> > > > + * The pinned BPF object (shared for regular users) has:
> > > > + *
> > > > + * event_hash |
> > > > + * | | |
> > > > + * event->id ---> | id | ---+ idx_hash | filters
> > > > + * | | | | | | | |
> > > > + * | .... | +-> | idx | --+--> | exprs | ---> perf_bpf_filter_entry[]
> > > > + * | | | | | | .op
> > > > + * task id (tgid) --------------+ | .... | | | ... | .term (+ part)
> > > > + * | .value
> > > > + * |
> > > > + * ======= (root would skip this part) ======== (compares it in a loop)
> > > > + *
> > > > + * This is used for per-task use cases while system-wide profiling (normally from
> > > > + * root user) uses a separate copy of the program and the maps for its own so that
> > > > + * it can proceed even if a lot of non-root users are using the filters at the
> > > > + * same time. In this case the filters map has a single entry and no need to use
> > > > + * the hash maps to get the index (key) of the filters map (IOW it's always 0).
> > > > + *
> > > > + * The BPF program returns 1 to accept the sample or 0 to drop it.
> > > > + * The 'dropped' map is to keep how many samples it dropped by the filter and
> > > > + * it will be reported as lost samples.
> > >
> > > I think there is value in reporting how many were filtered out, I'm just
> > > unsure about reporting it as "lost" samples, as lost has another
> > > semantic associated, i.e. ring buffer was full or couldn't process it
> > > for some other resource starvation issue, no?
> >
> > Then we need a way to save the information. It could be a new record
> > type (PERF_RECORD_DROPPED_SAMPLES), a new misc flag in the lost samples
>
> I guess "PERF_RECORD_FILTERED_SAMPLES" would be better, more precise,
> wdyt?
>
> > record or a header field. I prefer the misc flag.
>
> I think we can have both filtered and lost samples, so I would prefer
> the new record type.
I think we can have two LOST_SAMPLES records then - one with the new
misc flag for BPF and the other (without the flag) for the usual lost
samples. This would require minimal changes IMHO.
Thanks,
Namhyung
>
> > Also there should be a separate PERF_RECORD_LOST record in the middle
> > when there's actual issue. Without that we can say it's from the BPF.
>
> Right, disambiguating filtered from lost samples is indeed useful.
>
> - Arnaldo
>
> > Thanks,
> > Namhyung
next prev parent reply other threads:[~2024-08-05 21:12 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 17:37 [PATCH] perf bpf-filter: Support multiple events properly Namhyung Kim
2024-08-05 15:03 ` Arnaldo Carvalho de Melo
2024-08-05 18:56 ` Namhyung Kim
2024-08-05 19:31 ` Arnaldo Carvalho de Melo
2024-08-05 21:12 ` Namhyung Kim [this message]
2024-08-13 23:33 ` Namhyung Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZrFAIc0G8n0-zgxt@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=bpf@vger.kernel.org \
--cc=eranian@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.