From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Andi Kleen <ak@linux.intel.com>,
Kan Liang <kan.liang@linux.intel.com>, Song Liu <song@kernel.org>,
Stephane Eranian <eranian@google.com>,
Ravi Bangoria <ravi.bangoria@amd.com>,
Leo Yan <leo.yan@linaro.org>, James Clark <james.clark@arm.com>,
Hao Luo <haoluo@google.com>, LKML <linux-kernel@vger.kernel.org>,
linux-perf-users@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH 03/10] perf record: Add BPF event filter support
Date: Wed, 15 Mar 2023 10:47:29 -0300 [thread overview]
Message-ID: <ZBHMcfCr+GeQJ9Sz@kernel.org> (raw)
In-Reply-To: <20230314234237.3008956-4-namhyung@kernel.org>
On Tue, Mar 14, 2023 at 04:42:30PM -0700, Namhyung Kim wrote:
> Use --filter option to set BPF filter for generic events other than the
> tracepoints or Intel PT. The BPF program will check the sample data and
> filter according to the expression.
>
> For example, below is a typical perf record session in frequency mode.
> The sample period starts at 1 and increases gradually.
>
> $ sudo ./perf record -e cycles true
> $ sudo ./perf script
> perf-exec 2272336 546683.916875: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916892: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916899: 3 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916905: 17 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916911: 100 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916917: 589 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916924: 3470 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916930: 20465 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> true 2272336 546683.916940: 119873 cycles: ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
> true 2272336 546683.917003: 461349 cycles: ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
> true 2272336 546683.917237: 635778 cycles: ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
>
> When you add a BPF filter to keep only samples with a period greater
> than 1000, the output looks like this:
Had to add this to keep util/python.c from dragging in util/bpf-filter.c:
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index be336f1b2b689602..153a13cdca9df1ea 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -19,6 +19,7 @@
#include "mmap.h"
#include "stat.h"
#include "metricgroup.h"
+#include "util/bpf-filter.h"
#include "util/env.h"
#include "util/pmu.h"
#include <internal/lib.h>
@@ -135,6 +136,18 @@ int bpf_counter__disable(struct evsel *evsel __maybe_unused)
return 0;
}
+// Stubs so we don't drag in util/bpf-filter.c
+
+int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
+int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
/*
* Support debug printing even though util/debug.c is not linked. That means
* implementing 'verbose' and 'eprintf'.
Please run 'perf test' before submitting patches,
- Arnaldo
> $ sudo ./perf record -e cycles --filter 'period > 1000' true
> $ sudo ./perf script
> perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
> perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
> perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
> true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
> true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>
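Conceptually, the generated BPF program just evaluates the expression
against the sample data and rejects non-matching samples. A plain-C model
of the 'period > 1000' decision (hypothetical names for illustration, not
the actual BPF skeleton):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the filter decision: keep a sample only when
 * its period exceeds the user-supplied threshold ('period > 1000'). */
struct sample {
	uint64_t period;
};

static bool keep_sample(const struct sample *s, uint64_t threshold)
{
	return s->period > threshold;
}
```

In the real patch this comparison runs in a BPF program attached to the
event, so non-matching samples never reach the ring buffer.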
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
> tools/perf/util/bpf_counter.c | 3 +--
> tools/perf/util/evlist.c | 25 +++++++++++++++++-------
> tools/perf/util/evsel.c | 2 ++
> tools/perf/util/parse-events.c | 8 +++-----
> 5 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index ff815c2f67e8..122f71726eaa 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -119,9 +119,12 @@ OPTIONS
> "perf report" to view group events together.
>
> --filter=<filter>::
> - Event filter. This option should follow an event selector (-e) which
> - selects either tracepoint event(s) or a hardware trace PMU
> - (e.g. Intel PT or CoreSight).
> + Event filter. This option should follow an event selector (-e).
> + If the event is a tracepoint, the filter string will be parsed by
> + the kernel. If the event is a hardware trace PMU (e.g. Intel PT
> + or CoreSight), it'll be processed as an address filter. Otherwise
> + it is treated as a general BPF filter, which can be applied to any
> + kind of event.
>
> - tracepoint filters
>
> @@ -176,6 +179,12 @@ OPTIONS
>
> Multiple filters can be separated with space or comma.
>
> + - bpf filters
> +
> + A BPF filter can access the sample data and make a decision based on the
> + data. Users need to set an appropriate sample type to use the BPF
> + filter.
> +
> --exclude-perf::
> Don't record events issued by perf itself. This option should follow
> an event selector (-e) which selects tracepoint event(s). It adds a
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index aa78a15a6f0a..1b77436e067e 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -763,8 +763,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
>
> static inline bool bpf_counter_skip(struct evsel *evsel)
> {
> - return list_empty(&evsel->bpf_counter_list) &&
> - evsel->follower_skel == NULL;
> + return evsel->bpf_counter_ops == NULL;
> }
>
> int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index b74e12239aec..cc491a037836 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -31,6 +31,7 @@
> #include "util/evlist-hybrid.h"
> #include "util/pmu.h"
> #include "util/sample.h"
> +#include "util/bpf-filter.h"
> #include <signal.h>
> #include <unistd.h>
> #include <sched.h>
> @@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
> int err = 0;
>
> evlist__for_each_entry(evlist, evsel) {
> - if (evsel->filter == NULL)
> - continue;
> -
> /*
> * filters only work for tracepoint event, which doesn't have cpu limit.
> * So evlist and evsel should always be same.
> */
> - err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> - if (err) {
> - *err_evsel = evsel;
> - break;
> + if (evsel->filter) {
> + err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> + if (err) {
> + *err_evsel = evsel;
> + break;
> + }
> + }
> +
> + /*
> + * non-tracepoint events can have BPF filters.
> + */
> + if (!list_empty(&evsel->bpf_filters)) {
> + err = perf_bpf_filter__prepare(evsel);
> + if (err) {
> + *err_evsel = evsel;
> + break;
> + }
> }
> }
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index a83d8cd5eb51..dc3faf005c3b 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -50,6 +50,7 @@
> #include "off_cpu.h"
> #include "../perf-sys.h"
> #include "util/parse-branch-options.h"
> +#include "util/bpf-filter.h"
> #include <internal/xyarray.h>
> #include <internal/lib.h>
> #include <internal/threadmap.h>
> @@ -1517,6 +1518,7 @@ void evsel__exit(struct evsel *evsel)
> assert(list_empty(&evsel->core.node));
> assert(evsel->evlist == NULL);
> bpf_counter__destroy(evsel);
> + perf_bpf_filter__destroy(evsel);
> evsel__free_counts(evsel);
> perf_evsel__free_fd(&evsel->core);
> perf_evsel__free_id(&evsel->core);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 3b2e5bb3e852..6c5cf5244486 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -28,6 +28,7 @@
> #include "perf.h"
> #include "util/parse-events-hybrid.h"
> #include "util/pmu-hybrid.h"
> +#include "util/bpf-filter.h"
> #include "tracepoint.h"
> #include "thread_map.h"
>
> @@ -2542,11 +2543,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
> perf_pmu__scan_file(pmu, "nr_addr_filters",
> "%d", &nr_addr_filters);
>
> - if (!nr_addr_filters) {
> - fprintf(stderr,
> - "This CPU does not support address filtering\n");
> - return -1;
> - }
> + if (!nr_addr_filters)
> + return perf_bpf_filter__parse(&evsel->bpf_filters, str);
>
> if (evsel__append_addr_filter(evsel, str) < 0) {
> fprintf(stderr,
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
- Arnaldo
Thread overview: 20+ messages
2023-03-14 23:42 [PATCHSET 00/10] perf record: Implement BPF sample filter (v5) Namhyung Kim
2023-03-14 23:42 ` [PATCH 01/10] perf bpf filter: Introduce basic BPF filter expression Namhyung Kim
2023-03-14 23:42 ` [PATCH 02/10] perf bpf filter: Implement event sample filtering Namhyung Kim
2023-03-15 16:24 ` Arnaldo Carvalho de Melo
2023-03-15 16:39 ` Arnaldo Carvalho de Melo
2023-03-15 16:51 ` Namhyung Kim
2023-03-15 20:12 ` Arnaldo Carvalho de Melo
2023-03-16 5:18 ` Namhyung Kim
2023-03-16 10:15 ` Arnaldo Carvalho de Melo
2023-03-14 23:42 ` [PATCH 03/10] perf record: Add BPF event filter support Namhyung Kim
2023-03-15 13:47 ` Arnaldo Carvalho de Melo [this message]
2023-03-15 16:41 ` Namhyung Kim
2023-03-15 16:51 ` Arnaldo Carvalho de Melo
2023-03-14 23:42 ` [PATCH 04/10] perf record: Record dropped sample count Namhyung Kim
2023-03-14 23:42 ` [PATCH 05/10] perf bpf filter: Add 'pid' sample data support Namhyung Kim
2023-03-14 23:42 ` [PATCH 06/10] perf bpf filter: Add more weight " Namhyung Kim
2023-03-14 23:42 ` [PATCH 07/10] perf bpf filter: Add data_src " Namhyung Kim
2023-03-14 23:42 ` [PATCH 08/10] perf bpf filter: Add logical OR operator Namhyung Kim
2023-03-14 23:42 ` [PATCH 09/10] perf bpf filter: Show warning for missing sample flags Namhyung Kim
2023-03-14 23:42 ` [PATCH 10/10] perf record: Update documentation for BPF filters Namhyung Kim