public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org, KP Singh <kpsingh@kernel.org>,
	Song Liu <song@kernel.org>,
	bpf@vger.kernel.org, Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH] perf bpf-filter: Support multiple events properly
Date: Mon, 5 Aug 2024 14:12:01 -0700	[thread overview]
Message-ID: <ZrFAIc0G8n0-zgxt@google.com> (raw)
In-Reply-To: <ZrEolmUz_2I1fmdJ@x1>

On Mon, Aug 05, 2024 at 04:31:34PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 05, 2024 at 11:56:56AM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 12:03:14PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Fri, Aug 02, 2024 at 10:37:52AM -0700, Namhyung Kim wrote:
> > > > So far it used tgid as a key to get the filter expressions in the
> > > > pinned filters map for regular users but it won't work well if the has
> > > > more than one filters at the same time.  Let's add the event id to the
> > > > key of the filter hash map so that it can identify the right filter
> > > > expression in the BPF program.
> > > > 
> > > > As the event can be inherited to child tasks, it should use the primary
> > > > id which belongs to the parent (original) event.  Since evsel opens the
> > > > event for multiple CPUs and tasks, it needs to maintain a separate hash
> > > > map for the event id.
> > > 
> > > I'm trying to test it now, it would be nice to have the series of events
> > > needed to test that the feature is working.
> > 
> > Sure, I used the following command.
> > 
> >   ./perf record -e cycles --filter 'ip < 0xffffffff00000000' -e instructions --filter 'period < 100000' -o- ./perf test -w noploop | ./perf script -i-
> 
> Thanks
>  
> > > 
> > > Some comments below.
> > >  
> > > > In the user space, it keeps a list for the multiple evsel and release
> > > > the entries in the both hash map when it closes the event.
> > > > 
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  tools/perf/util/bpf-filter.c                 | 288 ++++++++++++++++---
> > > >  tools/perf/util/bpf_skel/sample-filter.h     |  11 +-
> > > >  tools/perf/util/bpf_skel/sample_filter.bpf.c |  42 ++-
> > > >  tools/perf/util/bpf_skel/vmlinux/vmlinux.h   |   5 +
> > > >  4 files changed, 304 insertions(+), 42 deletions(-)
> > > > 
> > > > diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> > > > index c5eb0b7eec19..69b147cba969 100644
> > > > --- a/tools/perf/util/bpf-filter.c
> > > > +++ b/tools/perf/util/bpf-filter.c
> > > > @@ -1,4 +1,45 @@
> > > >  /* SPDX-License-Identifier: GPL-2.0 */
> > > > +/**
> > > > + * Generic event filter for sampling events in BPF.
> > > > + *
> > > > + * The BPF program is fixed and just to read filter expressions in the 'filters'
> > > > + * map and compare the sample data in order to reject samples that don't match.
> > > > + * Each filter expression contains a sample flag (term) to compare, an operation
> > > > + * (==, >=, and so on) and a value.
> > > > + *
> > > > + * Note that each entry has an array of filter repxressions and it only succeeds
> > > 
> > >                                                   expressions
> > 
> > Oops, thanks.
> > 
> > > 
> > > > + * when all of the expressions are satisfied.  But it supports the logical OR
> > > > + * using a GROUP operation which is satisfied when any of its member expression
> > > > + * is evaluated to true.  But it doesn't allow nested GROUP operations for now.
> > > > + *
> > > > + * To support non-root users, the filters map can be loaded and pinned in the BPF
> > > > + * filesystem by root (perf record --setup-filter pin).  Then each user will get
> > > > + * a new entry in the shared filters map to fill the filter expressions.  And the
> > > > + * BPF program will find the filter using (task-id, event-id) as a key.
> > > > + *
> > > > + * The pinned BPF object (shared for regular users) has:
> > > > + *
> > > > + *                  event_hash                   |
> > > > + *                  |        |                   |
> > > > + *   event->id ---> |   id   | ---+   idx_hash   |     filters
> > > > + *                  |        |    |   |      |   |    |       |
> > > > + *                  |  ....  |    +-> |  idx | --+--> | exprs | --->  perf_bpf_filter_entry[]
> > > > + *                                |   |      |   |    |       |               .op
> > > > + *   task id (tgid) --------------+   | .... |   |    |  ...  |               .term (+ part)
> > > > + *                                               |                            .value
> > > > + *                                               |
> > > > + *   ======= (root would skip this part) ========                     (compares it in a loop)
> > > > + *
> > > > + * This is used for per-task use cases while system-wide profiling (normally from
> > > > + * root user) uses a separate copy of the program and the maps for its own so that
> > > > + * it can proceed even if a lot of non-root users are using the filters at the
> > > > + * same time.  In this case the filters map has a single entry and no need to use
> > > > + * the hash maps to get the index (key) of the filters map (IOW it's always 0).
> > > > + *
> > > > + * The BPF program returns 1 to accept the sample or 0 to drop it.
> > > > + * The 'dropped' map is to keep how many samples it dropped by the filter and
> > > > + * it will be reported as lost samples.
> > > 
> > > I think there is value in reporting how many were filtered out, I'm just
> > > unsure about reporting it as "lost" samples, as lost has another
> > > semantic associated, i.e. ring buffer was full or couldn't process it
> > > for some other resource starvation issue, no?
> > 
> > Then we need a way to save the information.  It could be a new record
> > type (PERF_RECORD_DROPPED_SAMPLES), a new misc flag in the lost samples
> 
> I guess "PERF_RECORD_FILTERED_SAMPLES" would be better, more precise,
> wdyt?
> 
> > record or a header field.  I prefer the misc flag.
> 
> I think we can have both filtered and lost samples, so I would prefer
> the new record type.

I think we can have two LOST_SAMPLES records then - one with the new
misc flag for BPF and the other (without the flag) for the usual lost
samples.  This would require minimal changes IMHO.

Thanks,
Namhyung

>  
> > Also there should be a separate PERF_RECORD_LOST record in the middle
> > when there's actual issue.  Without that we can say it's from the BPF.
> 
> Right, disambiguating filtered from lost samples is indeed useful.
> 
> - Arnaldo
>  
> > Thanks,
> > Namhyung

  reply	other threads:[~2024-08-05 21:12 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-02 17:37 [PATCH] perf bpf-filter: Support multiple events properly Namhyung Kim
2024-08-05 15:03 ` Arnaldo Carvalho de Melo
2024-08-05 18:56   ` Namhyung Kim
2024-08-05 19:31     ` Arnaldo Carvalho de Melo
2024-08-05 21:12       ` Namhyung Kim [this message]
2024-08-13 23:33         ` Namhyung Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrFAIc0G8n0-zgxt@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox