[PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters
@ 2025-08-06 16:22 Ilya Leoshkevich
  2025-08-06 16:22 ` [PATCH v5 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 16:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich

v4: https://lore.kernel.org/bpf/20250806114227.14617-1-iii@linux.ibm.com/
v4 -> v5: Fix a typo in the commit message (Yonghong).

v3: https://lore.kernel.org/bpf/20250805130346.1225535-1-iii@linux.ibm.com/
v3 -> v4: Rename the new field to dont_enable (Alexei, Eduard).
          Switch the Fixes: tag in patch 2 (Alexander, Thomas).
          Fix typos in the cover letter (Thomas).

v2: https://lore.kernel.org/bpf/20250728144340.711196-1-tmricht@linux.ibm.com/
v2 -> v3: Use no_ioctl_enable in perf.

v1: https://lore.kernel.org/bpf/20250725093405.3629253-1-tmricht@linux.ibm.com/
v1 -> v2: Introduce no_ioctl_enable (Jiri).

Hi,

This series fixes a regression caused by moving UID filtering to BPF.
The regression affects all events that support auxiliary data, most
notably, "cycles" events on s390, but also PT events on Intel. The
symptom is missing events when UID filtering is enabled.

Patch 1 introduces a new option for the
bpf_program__attach_perf_event_opts() function.
Patch 2 makes use of it in perf, and also contains a lot of technical
details of why exactly the problem is occurring.

Thanks to Thomas Richter for the investigation and the initial version
of this fix, and to Jiri Olsa for suggestions.

Best regards,
Ilya


Ilya Leoshkevich (2):
  libbpf: Add the ability to suppress perf event enablement
  perf bpf-filter: Enable events manually

 tools/lib/bpf/libbpf.c       | 13 ++++++++-----
 tools/lib/bpf/libbpf.h       |  4 +++-
 tools/perf/util/bpf-filter.c |  5 ++++-
 3 files changed, 15 insertions(+), 7 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v5 1/2] libbpf: Add the ability to suppress perf event enablement
  2025-08-06 16:22 [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
@ 2025-08-06 16:22 ` Ilya Leoshkevich
  2025-08-06 16:22 ` [PATCH v5 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
  2025-08-07 16:20 ` [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 16:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich, Eduard Zingerman

Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.

Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/lib/bpf/libbpf.c | 13 ++++++++-----
 tools/lib/bpf/libbpf.h |  4 +++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fb4d92c5c339..8f5a81b672e1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10965,11 +10965,14 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
 		}
 		link->link.fd = pfd;
 	}
-	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
-		err = -errno;
-		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
-			prog->name, pfd, errstr(err));
-		goto err_out;
+
+	if (!OPTS_GET(opts, dont_enable, false)) {
+		if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
+				prog->name, pfd, errstr(err));
+			goto err_out;
+		}
 	}
 
 	return &link->link;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d1cf813a057b..455a957cb702 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -499,9 +499,11 @@ struct bpf_perf_event_opts {
 	__u64 bpf_cookie;
 	/* don't use BPF link when attach BPF program */
 	bool force_ioctl_attach;
+	/* don't automatically enable the event */
+	bool dont_enable;
 	size_t :0;
 };
-#define bpf_perf_event_opts__last_field force_ioctl_attach
+#define bpf_perf_event_opts__last_field dont_enable
 
 LIBBPF_API struct bpf_link *
 bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v5 2/2] perf bpf-filter: Enable events manually
  2025-08-06 16:22 [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
  2025-08-06 16:22 ` [PATCH v5 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
@ 2025-08-06 16:22 ` Ilya Leoshkevich
  2025-08-07 16:20 ` [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 16:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich

On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file. On x86 the command:

  # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this
behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
	   +-> bpf_program.attach_perf_event()
	       +-> bpf_program.attach_perf_event_opts()
	           +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However the event's
   ring-buffer to store the samples generated by hardware is not
   allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
	   +-> evlist__mmap_ex()
	       +-> perf_evlist__mmap_ops()
	           +-> mmap_per_cpu()
	               +-> mmap_per_evsel()
	                   +-> mmap__mmap()
	                       +-> perf_mmap__mmap()
	                           +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap()
   the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():

   Hardware sampling is stopped. No samples are generated and saved
   anymore.

3. After the event 'cycles' has been mapped, the event is enabled a
   second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
	   +-> evsel__enable_cpu()
	       +-> perf_evsel__enable_cpu()
	           +-> perf_evsel__run_ioctl()
	               +-> perf_evsel__ioctl()
	                   +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second

      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

   is just a NOP in this case. The first invocation in (1.) sets the
   event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to
   PERF_EVENT_STATE_ACTIVE.

This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.

To remedy this, remove the first invocation of

   ioctl(..., PERF_EVENT_IOC_ENABLE, ...).

in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.

Output after:

 # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
 [ perf record: Woken up 3 times to write data ]
 [ perf record: Captured and wrote 0.876 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      16200  (99.5%)
              SAMPLE events:      16200
 #

The software event succeeded both before and after the patch:

 # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
					  ./perf test -w thloop 2
 [ perf record: Woken up 7 times to write data ]
 [ perf record: Captured and wrote 2.870 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      53506  (99.8%)
              SAMPLE events:      53506
 #

Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/perf/util/bpf-filter.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index d0e013eeb0f7..a0b11f35395f 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -451,6 +451,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	struct bpf_link *link;
 	struct perf_bpf_filter_entry *entry;
 	bool needs_idx_hash = !target__has_cpu(target);
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts,
+			    .dont_enable = true);
 
 	entry = calloc(MAX_FILTERS, sizeof(*entry));
 	if (entry == NULL)
@@ -522,7 +524,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	prog = skel->progs.perf_sample_filter;
 	for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
 		for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
-			link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
+			link = bpf_program__attach_perf_event_opts(prog, FD(evsel, x, y),
+								   &pe_opts);
 			if (IS_ERR(link)) {
 				pr_err("Failed to attach perf sample-filter program\n");
 				ret = PTR_ERR(link);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters
  2025-08-06 16:22 [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
  2025-08-06 16:22 ` [PATCH v5 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
  2025-08-06 16:22 ` [PATCH v5 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
@ 2025-08-07 16:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-07 16:20 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: ast, daniel, andrii, irogers, acme, bpf, linux-perf-users,
	linux-kernel, linux-s390, tmricht, jolsa, hca, gor, agordeev

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed,  6 Aug 2025 18:22:40 +0200 you wrote:
> v4: https://lore.kernel.org/bpf/20250806114227.14617-1-iii@linux.ibm.com/
> v4 -> v5: Fix a typo in the commit message (Yonghong).
> 
> v3: https://lore.kernel.org/bpf/20250805130346.1225535-1-iii@linux.ibm.com/
> v3 -> v4: Rename the new field to dont_enable (Alexei, Eduard).
>           Switch the Fixes: tag in patch 2 (Alexander, Thomas).
>           Fix typos in the cover letter (Thomas).
> 
> [...]

Here is the summary with links:
  - [v5,1/2] libbpf: Add the ability to suppress perf event enablement
    https://git.kernel.org/bpf/bpf/c/9474e27a24a4
  - [v5,2/2] perf bpf-filter: Enable events manually
    https://git.kernel.org/bpf/bpf/c/5e2ac8e8571d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-08-07 16:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-06 16:22 [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
2025-08-06 16:22 ` [PATCH v5 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
2025-08-06 16:22 ` [PATCH v5 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
2025-08-07 16:20 ` [PATCH v5 0/2] perf/s390: Regression: Move uid filtering to BPF filters patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).