From: Adrian Hunter <adrian.hunter@intel.com>
To: Namhyung Kim <namhyung@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Ian Rogers <irogers@google.com>,
	Andi Kleen <ak@linux.intel.com>,
	Kan Liang <kan.liang@linux.intel.com>, Song Liu <song@kernel.org>,
	Stephane Eranian <eranian@google.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Leo Yan <leo.yan@linaro.org>, James Clark <james.clark@arm.com>,
	Hao Luo <haoluo@google.com>, LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH 3/8] perf record: Add BPF event filter support
Date: Tue, 7 Mar 2023 15:04:27 +0200
Message-ID: <4d6f69ce-e765-13f6-ae30-8ec63eaf4c34@intel.com>
In-Reply-To: <20230222230141.1729048-4-namhyung@kernel.org>

On 23/02/23 01:01, Namhyung Kim wrote:
> Use --filter option to set BPF filter for generic events other than the
> tracepoints or Intel PT.  The BPF program will check the sample data and
> filter according to the expression.
> 
> For example, the below is the typical perf record for frequency mode.
> The sample period started from 1 and increased gradually.
> 
> $ sudo ./perf record -e cycles true
> $ sudo ./perf script
>        perf-exec 2272336 546683.916875:          1 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916892:          1 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916899:          3 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916905:         17 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916911:        100 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916917:        589 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916924:       3470 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>        perf-exec 2272336 546683.916930:      20465 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>             true 2272336 546683.916940:     119873 cycles:  ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
>             true 2272336 546683.917003:     461349 cycles:  ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
>             true 2272336 546683.917237:     635778 cycles:  ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
> 
> When you add a BPF filter to get samples having periods greater than 1000,
> the output would look like below:
> 
> $ sudo ./perf record -e cycles --filter 'period > 1000' true
> $ sudo ./perf script
>        perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
>        perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
>        perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
>        perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
>        perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
>             true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
>             true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
>  tools/perf/util/bpf_counter.c            |  3 +--
>  tools/perf/util/evlist.c                 | 25 +++++++++++++++++-------
>  tools/perf/util/evsel.c                  |  2 ++
>  tools/perf/util/parse-events.c           |  8 +++-----
>  5 files changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index ff815c2f67e8..9f4e1337e6dc 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -119,9 +119,12 @@ OPTIONS
>  	  "perf report" to view group events together.
>  
>  --filter=<filter>::
> -        Event filter. This option should follow an event selector (-e) which
> -	selects either tracepoint event(s) or a hardware trace PMU
> -	(e.g. Intel PT or CoreSight).
> +	Event filter.  This option should follow an event selector (-e).
> +	If the event is a tracepoint, the filter string will be parsed by
> +	the kernel.  If the event is a hardware trace PMU (e.g. Intel PT
> +	or CoreSight), it'll be processed as an address filter.  Otherwise
> +	it means a general filter using BPF which can be applied for any
> +	kind of events.

'kind of events' -> 'kind of event'

>  
>  	- tracepoint filters
>  
> @@ -174,6 +177,12 @@ OPTIONS
>  	within a single mapping.  MMAP events (or /proc/<pid>/maps) can be
>  	examined to determine if that is a possibility.
>  
> +	- bpf filters
> +
> +	BPF filter can access the sample data and make a decision based on the

'BPF filter' -> 'A BPF filter'

> +	data.  Users need to set the appropriate sample type to use the BPF

'the appropriate' -> 'an appropriate'

Perhaps this could also expand on what "appropriate sample type" means
here, since the user does not actually specify sample_type directly.

What happens if the sample_type is not appropriate?
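
As a rough illustration of the question, the sketch below shows one way a
filter term could be checked against an event's sample_type before the
filter is attached. It is not taken from the patch: the sketch_* names and
the term table are hypothetical, and only the bit positions are copied from
the PERF_SAMPLE_* values in include/uapi/linux/perf_event.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit positions copied from PERF_SAMPLE_* in include/uapi/linux/perf_event.h. */
#define SKETCH_SAMPLE_IP     (1ULL << 0)
#define SKETCH_SAMPLE_PERIOD (1ULL << 8)

/* Hypothetical table: filter term -> sample_type bit it depends on. */
struct sketch_term {
	const char *name;
	uint64_t    sample_bit;
};

static const struct sketch_term sketch_terms[] = {
	{ "ip",     SKETCH_SAMPLE_IP },
	{ "period", SKETCH_SAMPLE_PERIOD },
};

/* Return the sample_type bit a term needs, or 0 if the term is unknown. */
static uint64_t sketch_required_bit(const char *term)
{
	assert(term != NULL);
	for (size_t i = 0; i < sizeof(sketch_terms) / sizeof(sketch_terms[0]); i++) {
		if (strcmp(sketch_terms[i].name, term) == 0)
			return sketch_terms[i].sample_bit;
	}
	return 0;
}

/* True only if the event's sample_type already carries the data the term reads. */
static bool sketch_term_usable(uint64_t sample_type, const char *term)
{
	uint64_t bit = sketch_required_bit(term);

	return bit != 0 && (sample_type & bit) != 0;
}
```

With a check of this shape, a filter on 'period' for an event whose
sample_type lacks the period bit could be rejected with an error message
instead of silently matching nothing, which is the case the documentation
would ideally describe.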

> +	filter.
> +
>  	Multiple filters can be separated with space or comma.
>  
>  --exclude-perf::
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index eeee899fcf34..0414385794ee 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -781,8 +781,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
>  
>  static inline bool bpf_counter_skip(struct evsel *evsel)
>  {
> -	return list_empty(&evsel->bpf_counter_list) &&
> -		evsel->follower_skel == NULL;
> +	return evsel->bpf_counter_ops == NULL;
>  }
>  
>  int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 817df2504a1e..1ae047b24c89 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -31,6 +31,7 @@
>  #include "util/evlist-hybrid.h"
>  #include "util/pmu.h"
>  #include "util/sample.h"
> +#include "util/bpf-filter.h"
>  #include <signal.h>
>  #include <unistd.h>
>  #include <sched.h>
> @@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
>  	int err = 0;
>  
>  	evlist__for_each_entry(evlist, evsel) {
> -		if (evsel->filter == NULL)
> -			continue;
> -
>  		/*
>  		 * filters only work for tracepoint event, which doesn't have cpu limit.
>  		 * So evlist and evsel should always be same.
>  		 */
> -		err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> -		if (err) {
> -			*err_evsel = evsel;
> -			break;
> +		 if (evsel->filter) {

Extra space before 'if'

> +			err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> +			if (err) {
> +				*err_evsel = evsel;
> +				break;
> +			}
> +		}
> +
> +		/*
> +		 * non-tracepoint events can have BPF filters.
> +		 */
> +		if (!list_empty(&evsel->bpf_filters)) {
> +			err = perf_bpf_filter__prepare(evsel);
> +			if (err) {
> +				*err_evsel = evsel;
> +				break;
> +			}
>  		}
>  	}
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 51e8ce6edddc..cae624fde026 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -50,6 +50,7 @@
>  #include "off_cpu.h"
>  #include "../perf-sys.h"
>  #include "util/parse-branch-options.h"
> +#include "util/bpf-filter.h"
>  #include <internal/xyarray.h>
>  #include <internal/lib.h>
>  #include <internal/threadmap.h>
> @@ -1494,6 +1495,7 @@ void evsel__exit(struct evsel *evsel)
>  	assert(list_empty(&evsel->core.node));
>  	assert(evsel->evlist == NULL);
>  	bpf_counter__destroy(evsel);
> +	perf_bpf_filter__destroy(evsel);
>  	evsel__free_counts(evsel);
>  	perf_evsel__free_fd(&evsel->core);
>  	perf_evsel__free_id(&evsel->core);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 0336ff27c15f..4371a2bb2564 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -27,6 +27,7 @@
>  #include "perf.h"
>  #include "util/parse-events-hybrid.h"
>  #include "util/pmu-hybrid.h"
> +#include "util/bpf-filter.h"
>  #include "tracepoint.h"
>  #include "thread_map.h"
>  
> @@ -2537,11 +2538,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
>  		perf_pmu__scan_file(pmu, "nr_addr_filters",
>  				    "%d", &nr_addr_filters);
>  
> -	if (!nr_addr_filters) {
> -		fprintf(stderr,
> -			"This CPU does not support address filtering\n");
> -		return -1;
> -	}
> +	if (!nr_addr_filters)
> +		return perf_bpf_filter__parse(&evsel->bpf_filters, str);
>  
>  	if (evsel__append_addr_filter(evsel, str) < 0) {
>  		fprintf(stderr,
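
For reference, the three-way handling that the documentation hunk describes
and that this set_filter() change introduces can be summarised as the
following sketch. It is a simplified model, assuming the event type and the
PMU's nr_addr_filters value are the only inputs; the enum and function
names are hypothetical and do not exist in perf.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three ways a --filter string can be handled. */
enum sketch_filter_kind {
	SKETCH_FILTER_TRACEPOINT, /* string is parsed by the kernel */
	SKETCH_FILTER_ADDR,       /* hardware trace PMU address filter */
	SKETCH_FILTER_BPF,        /* general sample filter using BPF */
};

/*
 * Classify a filter the way the quoted documentation describes:
 * tracepoints first, then PMUs that report address filters,
 * and everything else falls through to the BPF filter.
 */
static enum sketch_filter_kind
sketch_classify_filter(bool is_tracepoint, int nr_addr_filters)
{
	assert(nr_addr_filters >= 0);

	if (is_tracepoint)
		return SKETCH_FILTER_TRACEPOINT;
	if (nr_addr_filters > 0)
		return SKETCH_FILTER_ADDR;
	return SKETCH_FILTER_BPF;
}
```

One consequence visible in this model: an event on a PMU without address
filters no longer gets the "This CPU does not support address filtering"
message, because the string is handed to the BPF filter parser instead.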

