* [PATCH bpf] libbpf: eBPF fails on events with auxiliary data
@ 2025-07-25 9:34 Thomas Richter
2025-07-25 14:06 ` Jiri Olsa
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Richter @ 2025-07-25 9:34 UTC (permalink / raw)
To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung, bpf
Cc: agordeev, gor, sumanthk, hca, japo, Thomas Richter,
Ilya Leoshkevich
On linux-next
commit b4c658d4d63d61 ("perf target: Remove uid from target")
introduces a regression on s390. In fact the regression exists
on all platforms when the event supports auxiliary data gathering.
Command
# ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
# ./perf report --stats | grep SAMPLE
#
does not generate samples in the perf.data file.
On x86 command
# sudo perf record -e intel_pt// -u 0 ls
is broken too.
Looking at the sequence of calls in 'perf record' reveals this
behavior:
1. The event 'cycles' is created and enabled:
record__open()
+-> evlist__apply_filters()
+-> perf_bpf_filter__prepare()
+-> bpf_program.attach_perf_event()
+-> bpf_program.attach_perf_event_opts()
+-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
The event 'cycles' is enabled and active now. However the event's
ring-buffer to store the samples generated by hardware is not
allocated yet. This happens now after enabling the event:
2. The event's fd is mmap() to create the ring buffer:
record__open()
+-> record__mmap()
+-> record__mmap_evlist()
+-> evlist__mmap_ex()
+-> perf_evlist__mmap_ops()
+-> mmap_per_cpu()
+-> mmap_per_evsel()
+-> mmap__mmap()
+-> perf_mmap__mmap()
+-> mmap()
This allocates the ring-buffer for the event 'cycles'. With mmap()
the kernel creates the ring buffer:
perf_mmap(): kernel function to create the event's ring
| buffer to save the sampled data.
|
+-> ring_buffer_attach(): Allocates memory for ring buffer.
| The PMU has auxiliary data setup function. The
| has_aux(event) condition is true and the PMU's
| stop() is called to stop sampling. It is not
| restarted:
| if (has_aux(event))
| perf_event_stop(event, 0);
|
+-> cpumsf_pmu_stop():
Hardware sampling is stopped. No samples are generated and saved
anymore.
3. After the event 'cycles' has been mapped, the event is enabled a
second time in:
__cmd_record()
+-> evlist__enable()
+-> __evlist__enable()
+-> evsel__enable_cpu()
+-> perf_evsel__enable_cpu()
+-> perf_evsel__run_ioctl()
+-> perf_evsel__ioctl()
+-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
The second
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
is just a NOP in this case. The first invocation in (1.) sets the
event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
perf_ioctl()
+-> _perf_ioctl()
+-> _perf_event_enable()
+-> __perf_event_enable() returns immediately because
event::state is already set to
PERF_EVENT_STATE_ACTIVE.
This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU call backs setup_aux() and
free_aux() are defined. Without both call back functions,
cpumsf_pmu_stop() is not invoked and sampling continues.
To remedy this, remove the first invocation of
ioctl(..., PERF_EVENT_IOC_ENABLE, ...).
in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.
Output after:
# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 16200 (99.5%)
SAMPLE events: 16200
#
The software event succeeded before and after the patch:
# ./perf record -e cpu-clock -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 53506 (99.8%)
SAMPLE events: 53506
#
Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")
To: Andrii Nakryiko <andriin@fb.com>
To: Ian Rogers <irogers@google.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
tools/lib/bpf/libbpf.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 162ebd16a59f..5973412a1031 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10960,12 +10960,6 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
}
link->link.fd = pfd;
}
- if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
- err = -errno;
- pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
- prog->name, pfd, errstr(err));
- goto err_out;
- }
return &link->link;
err_out:
--
2.50.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH bpf] libbpf: eBPF fails on events with auxiliary data
2025-07-25 9:34 [PATCH bpf] libbpf: eBPF fails on events with auxiliary data Thomas Richter
@ 2025-07-25 14:06 ` Jiri Olsa
2025-07-31 7:41 ` Thomas Richter
0 siblings, 1 reply; 3+ messages in thread
From: Jiri Olsa @ 2025-07-25 14:06 UTC (permalink / raw)
To: Thomas Richter
Cc: linux-kernel, linux-s390, linux-perf-users, acme, namhyung, bpf,
agordeev, gor, sumanthk, hca, japo, Ilya Leoshkevich
On Fri, Jul 25, 2025 at 11:34:05AM +0200, Thomas Richter wrote:
> On linux-next
> commit b4c658d4d63d61 ("perf target: Remove uid from target")
> introduces a regression on s390. In fact the regression exists
> on all platforms when the event supports auxiliary data gathering.
>
> Command
> # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.011 MB perf.data ]
> # ./perf report --stats | grep SAMPLE
> #
>
> does not generate samples in the perf.data file.
> On x86 command
> # sudo perf record -e intel_pt// -u 0 ls
> is broken too.
>
> Looking at the sequence of calls in 'perf record' reveals this
> behavior:
> 1. The event 'cycles' is created and enabled:
> record__open()
> +-> evlist__apply_filters()
> +-> perf_bpf_filter__prepare()
> +-> bpf_program.attach_perf_event()
> +-> bpf_program.attach_perf_event_opts()
> +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
> The event 'cycles' is enabled and active now. However the event's
> ring-buffer to store the samples generated by hardware is not
> allocated yet. This happens now after enabling the event:
>
> 2. The event's fd is mmap() to create the ring buffer:
> record__open()
> +-> record__mmap()
> +-> record__mmap_evlist()
> +-> evlist__mmap_ex()
> +-> perf_evlist__mmap_ops()
> +-> mmap_per_cpu()
> +-> mmap_per_evsel()
> +-> mmap__mmap()
> +-> perf_mmap__mmap()
> +-> mmap()
>
> This allocates the ring-buffer for the event 'cycles'. With mmap()
> the kernel creates the ring buffer:
>
> perf_mmap(): kernel function to create the event's ring
> | buffer to save the sampled data.
> |
> +-> ring_buffer_attach(): Allocates memory for ring buffer.
> | The PMU has auxiliary data setup function. The
> | has_aux(event) condition is true and the PMU's
> | stop() is called to stop sampling. It is not
> | restarted:
> | if (has_aux(event))
> | perf_event_stop(event, 0);
> |
> +-> cpumsf_pmu_stop():
>
> Hardware sampling is stopped. No samples are generated and saved
> anymore.
>
> 3. After the event 'cycles' has been mapped, the event is enabled a
> second time in:
> __cmd_record()
> +-> evlist__enable()
> +-> __evlist__enable()
> +-> evsel__enable_cpu()
> +-> perf_evsel__enable_cpu()
> +-> perf_evsel__run_ioctl()
> +-> perf_evsel__ioctl()
> +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
> The second
> ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> is just a NOP in this case. The first invocation in (1.) sets the
> event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
> perf_ioctl()
> +-> _perf_ioctl()
> +-> _perf_event_enable()
> +-> __perf_event_enable() returns immediately because
> event::state is already set to
> PERF_EVENT_STATE_ACTIVE.
>
> This happens on s390, because the event 'cycles' offers the possibility
> to save auxilary data. The PMU call backs setup_aux() and
> free_aux() are defined. Without both call back functions,
> cpumsf_pmu_stop() is not invoked and sampling continues.
>
> To remedy this, remove the first invocation of
> ioctl(..., PERF_EVENT_IOC_ENABLE, ...).
> in step (1.) Create the event in step (1.) and enable it in step (3.)
> after the ring buffer has been mapped.
>
> Output after:
> # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
> [ perf record: Woken up 3 times to write data ]
> [ perf record: Captured and wrote 0.876 MB perf.data ]
> # ./perf report --stats | grep SAMPLE
> SAMPLE events: 16200 (99.5%)
> SAMPLE events: 16200
> #
>
> The software event succeeded before and after the patch:
> # ./perf record -e cpu-clock -aB --synth=no -u 0 -- ./perf test -w thloop 2
> [ perf record: Woken up 7 times to write data ]
> [ perf record: Captured and wrote 2.870 MB perf.data ]
> # ./perf report --stats | grep SAMPLE
> SAMPLE events: 53506 (99.8%)
> SAMPLE events: 53506
> #
>
> Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")
> To: Andrii Nakryiko <andriin@fb.com>
> To: Ian Rogers <irogers@google.com>
> To: Ilya Leoshkevich <iii@linux.ibm.com>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
> tools/lib/bpf/libbpf.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 162ebd16a59f..5973412a1031 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10960,12 +10960,6 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
> }
> link->link.fd = pfd;
> }
> - if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
> - err = -errno;
> - pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
> - prog->name, pfd, errstr(err));
> - goto err_out;
> - }
I think this might break existing users depending on this
could we instead add some 'enable' flag to bpf_perf_event_opts and perf
would use bpf_program__attach_perf_event_opts function instead?
jirka
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH bpf] libbpf: eBPF fails on events with auxiliary data
2025-07-25 14:06 ` Jiri Olsa
@ 2025-07-31 7:41 ` Thomas Richter
0 siblings, 0 replies; 3+ messages in thread
From: Thomas Richter @ 2025-07-31 7:41 UTC (permalink / raw)
To: Jiri Olsa, Ian Rogers, Ilya Leoshkevich
Cc: linux-kernel, linux-s390, linux-perf-users, acme, namhyung, bpf,
agordeev, gor, sumanthk, hca, japo
On 7/25/25 16:06, Jiri Olsa wrote:
> On Fri, Jul 25, 2025 at 11:34:05AM +0200, Thomas Richter wrote:
>> On linux-next
>> commit b4c658d4d63d61 ("perf target: Remove uid from target")
>> introduces a regression on s390. In fact the regression exists
>> on all platforms when the event supports auxiliary data gathering.
>>
>> Command
>> # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.011 MB perf.data ]
>> # ./perf report --stats | grep SAMPLE
>> #
....
>> - goto err_out;
>> - }
>
> I think this might break existing users depending on this
>
> could we instead add some 'enable' flag to bpf_perf_event_opts and perf
> would use bpf_program__attach_perf_event_opts function instead?
>
> jirka
>
Hi Jiri, Ilya and Ian,
Jiri recommended a more flexible approach and I submitted version 2
https://lore.kernel.org/all/20250728144340.711196-1-tmricht@linux.ibm.com
This version now also passed the eBPF ci-test suite.
I am blind on the eBPF topic and do not have enough knowledge to dig deeper into eBPF.
Please note that this is not an s390 specific issue. It happens on other platforms
when the event supports auxiliary data gathering, i. e. has_aux() returns true.
I would like to ask someone with more eBPF knowledge for some help to drive this further.
Thanks a lot.
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-07-31 7:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-25 9:34 [PATCH bpf] libbpf: eBPF fails on events with auxiliary data Thomas Richter
2025-07-25 14:06 ` Jiri Olsa
2025-07-31 7:41 ` Thomas Richter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).