* [PATCH v4 0/2] perf/s390: Regression: Move uid filtering to BPF filters
@ 2025-08-06 11:40 Ilya Leoshkevich
  2025-08-06 11:40 ` [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
  2025-08-06 11:40 ` [PATCH v4 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
  0 siblings, 2 replies; 7+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 11:40 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich

v3: https://lore.kernel.org/bpf/20250805130346.1225535-1-iii@linux.ibm.com/
v3 -> v4: Rename the new field to dont_enable (Alexei, Eduard).
          Switch the Fixes: tag in patch 2 (Alexander, Thomas).
          Fix typos in the cover letter (Thomas).

v2: https://lore.kernel.org/bpf/20250728144340.711196-1-tmricht@linux.ibm.com/
v2 -> v3: Use no_ioctl_enable in perf.

v1: https://lore.kernel.org/bpf/20250725093405.3629253-1-tmricht@linux.ibm.com/
v1 -> v2: Introduce no_ioctl_enable (Jiri).

Hi,

This series fixes a regression caused by moving UID filtering to BPF.
The regression affects all events that support auxiliary data, most
notably, "cycles" events on s390, but also PT events on Intel. The
symptom is missing events when UID filtering is enabled.

Patch 1 introduces a new option for the
bpf_program__attach_perf_event_opts() function.
Patch 2 makes use of it in perf, and also contains a lot of technical
details of why exactly the problem is occurring.

Thanks to Thomas Richter for the investigation and the initial version
of this fix, and to Jiri Olsa for suggestions.

Best regards,
Ilya

Ilya Leoshkevich (2):
  libbpf: Add the ability to suppress perf event enablement
  perf bpf-filter: Enable events manually

 tools/lib/bpf/libbpf.c       | 13 ++++++++-----
 tools/lib/bpf/libbpf.h       |  4 +++-
 tools/perf/util/bpf-filter.c |  5 ++++-
 3 files changed, 15 insertions(+), 7 deletions(-)

-- 
2.50.1

* [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement
  2025-08-06 11:40 [PATCH v4 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
@ 2025-08-06 11:40 ` Ilya Leoshkevich
  2025-08-06 15:25   ` Yonghong Song
  2025-08-06 11:40 ` [PATCH v4 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
  1 sibling, 1 reply; 7+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 11:40 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich, Eduard Zingerman

Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.

Add a new no_ioctl_enable field to struct bpf_perf_event_opts. While
introducing ioctl_enable instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/lib/bpf/libbpf.c | 13 ++++++++-----
 tools/lib/bpf/libbpf.h |  4 +++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fb4d92c5c339..8f5a81b672e1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10965,11 +10965,14 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
 		}
 		link->link.fd = pfd;
 	}
-	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
-		err = -errno;
-		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
-			prog->name, pfd, errstr(err));
-		goto err_out;
+
+	if (!OPTS_GET(opts, dont_enable, false)) {
+		if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
+			err = -errno;
+			pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
+				prog->name, pfd, errstr(err));
+			goto err_out;
+		}
 	}
 
 	return &link->link;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d1cf813a057b..455a957cb702 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -499,9 +499,11 @@ struct bpf_perf_event_opts {
 	__u64 bpf_cookie;
 	/* don't use BPF link when attach BPF program */
 	bool force_ioctl_attach;
+	/* don't automatically enable the event */
+	bool dont_enable;
 	size_t :0;
 };
-#define bpf_perf_event_opts__last_field force_ioctl_attach
+#define bpf_perf_event_opts__last_field dont_enable
 
 LIBBPF_API struct bpf_link *
 bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);
-- 
2.50.1

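The choice of a negatively named flag in the patch above can be illustrated with a small standalone sketch. It is not libbpf code: struct demo_attach_opts, DEMO_OPTS_HAS(), DEMO_OPTS_GET() and demo_should_enable() are hypothetical names that only mimic the size-guarded options pattern seen in the diff. The assumption being illustrated is that a zero-initialized options struct, a NULL options pointer, and a caller built against an older, shorter struct should all keep the previous behavior (the event gets enabled), which a flag whose "off" value is zero provides without extra work on the caller's side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extensible options struct; NOT libbpf's definition. */
struct demo_attach_opts {
	size_t sz;                 /* size of the struct as the caller compiled it */
	unsigned long long cookie;
	bool force_ioctl_attach;
	bool dont_enable;          /* newest field; older callers do not know about it */
};

/* The size field must come first so that it can always be read safely. */
static_assert(offsetof(struct demo_attach_opts, sz) == 0, "sz must be the first member");

/* A field is usable only if the caller's struct is large enough to contain it. */
#define DEMO_OPTS_HAS(opts, field) \
	((opts) != NULL && \
	 (opts)->sz >= offsetof(struct demo_attach_opts, field) + sizeof((opts)->field))

/* Read a field, falling back to a default for NULL opts or short structs. */
#define DEMO_OPTS_GET(opts, field, fallback) \
	(DEMO_OPTS_HAS(opts, field) ? (opts)->field : (fallback))

/* Returns true when the attach path should issue the enable ioctl. */
static bool demo_should_enable(const struct demo_attach_opts *opts)
{
	return !DEMO_OPTS_GET(opts, dont_enable, false);
}
```

In this sketch, demo_should_enable() reports true for a NULL pointer, for a struct that only has sz filled in, and for a struct whose sz ends before dont_enable; only a caller that both knows the field and sets it suppresses the enable step.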
* Re: [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement
  2025-08-06 11:40 ` [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
@ 2025-08-06 15:25   ` Yonghong Song
  0 siblings, 0 replies; 7+ messages in thread
From: Yonghong Song @ 2025-08-06 15:25 UTC
  To: Ilya Leoshkevich, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Ian Rogers, Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Eduard Zingerman

On 8/6/25 4:40 AM, Ilya Leoshkevich wrote:
> Automatically enabling a perf event after attaching a BPF prog to it is
> not always desirable.
>
> Add a new no_ioctl_enable field to struct bpf_perf_event_opts. While

no_ioctl_enable => dont_enable

> introducing ioctl_enable instead would be nicer in that it would avoid
> a double negation in the implementation, it would make
> DECLARE_LIBBPF_OPTS() less efficient.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Suggested-by: Jiri Olsa <jolsa@kernel.org>
> Tested-by: Thomas Richter <tmricht@linux.ibm.com>
> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  tools/lib/bpf/libbpf.c | 13 ++++++++-----
>  tools/lib/bpf/libbpf.h |  4 +++-
>  2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index fb4d92c5c339..8f5a81b672e1 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10965,11 +10965,14 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
>  		}
>  		link->link.fd = pfd;
>  	}
> -	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
> -		err = -errno;
> -		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
> -			prog->name, pfd, errstr(err));
> -		goto err_out;
> +
> +	if (!OPTS_GET(opts, dont_enable, false)) {
> +		if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
> +			err = -errno;
> +			pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
> +				prog->name, pfd, errstr(err));
> +			goto err_out;
> +		}
>  	}
>
>  	return &link->link;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index d1cf813a057b..455a957cb702 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -499,9 +499,11 @@ struct bpf_perf_event_opts {
>  	__u64 bpf_cookie;
>  	/* don't use BPF link when attach BPF program */
>  	bool force_ioctl_attach;
> +	/* don't automatically enable the event */
> +	bool dont_enable;
>  	size_t :0;
>  };
> -#define bpf_perf_event_opts__last_field force_ioctl_attach
> +#define bpf_perf_event_opts__last_field dont_enable
>
>  LIBBPF_API struct bpf_link *
>  bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);

* [PATCH v4 2/2] perf bpf-filter: Enable events manually
  2025-08-06 11:40 [PATCH v4 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
  2025-08-06 11:40 ` [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
@ 2025-08-06 11:40 ` Ilya Leoshkevich
  2025-08-06 22:53   ` Namhyung Kim
  1 sibling, 1 reply; 7+ messages in thread
From: Ilya Leoshkevich @ 2025-08-06 11:40 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo
  Cc: bpf, linux-perf-users, linux-kernel, linux-s390, Thomas Richter,
	Jiri Olsa, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Ilya Leoshkevich

On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file. On x86 the command:

   # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this
behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
           +-> bpf_program.attach_perf_event()
               +-> bpf_program.attach_perf_event_opts()
                   +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However, the event's
   ring buffer to store the samples generated by hardware is not
   allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
           +-> evlist__mmap_ex()
               +-> perf_evlist__mmap_ops()
                   +-> mmap_per_cpu()
                       +-> mmap_per_evsel()
                           +-> mmap__mmap()
                               +-> perf_mmap__mmap()
                                   +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap()
   the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():

   Hardware sampling is stopped. No samples are generated and saved
   anymore.

3. After the event 'cycles' has been mapped, the event is enabled a
   second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
           +-> evsel__enable_cpu()
               +-> perf_evsel__enable_cpu()
                   +-> perf_evsel__run_ioctl()
                       +-> perf_evsel__ioctl()
                           +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second

      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

   is just a NOP in this case. The first invocation in (1.) sets the
   event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to
   PERF_EVENT_STATE_ACTIVE.

This happens on s390 because the event 'cycles' can save auxiliary
data: the PMU callbacks setup_aux() and free_aux() are defined.
Without both callback functions, cpumsf_pmu_stop() is not invoked and
sampling continues.

To remedy this, remove the first invocation of

   ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

in step (1.). Create the event in step (1.) and enable it only in
step (3.), after the ring buffer has been mapped.

Output after:

   # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
   [ perf record: Woken up 3 times to write data ]
   [ perf record: Captured and wrote 0.876 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
                 SAMPLE events:      16200  (99.5%)
                 SAMPLE events:      16200
   #

The software event succeeded both before and after the patch:

   # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
                                     ./perf test -w thloop 2
   [ perf record: Woken up 7 times to write data ]
   [ perf record: Captured and wrote 2.870 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
                 SAMPLE events:      53506  (99.8%)
                 SAMPLE events:      53506
   #

Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/perf/util/bpf-filter.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index d0e013eeb0f7..a0b11f35395f 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -451,6 +451,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	struct bpf_link *link;
 	struct perf_bpf_filter_entry *entry;
 	bool needs_idx_hash = !target__has_cpu(target);
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts,
+			    .dont_enable = true);
 
 	entry = calloc(MAX_FILTERS, sizeof(*entry));
 	if (entry == NULL)
@@ -522,7 +524,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	prog = skel->progs.perf_sample_filter;
 	for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
 		for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
-			link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
+			link = bpf_program__attach_perf_event_opts(prog, FD(evsel, x, y),
+								   &pe_opts);
 			if (IS_ERR(link)) {
 				pr_err("Failed to attach perf sample-filter program\n");
 				ret = PTR_ERR(link);
-- 
2.50.1

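The three steps in the patch description above can be replayed with a toy state machine. This is a deliberately simplified model under stated assumptions, not kernel code: struct demo_event, demo_enable(), demo_mmap() and demo_record_samples() are hypothetical names, and the model captures only the two facts the description relies on, namely that enabling an already active event is a no-op, and that mapping the ring buffer of an AUX-capable event stops sampling without restarting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the ordering problem; all names are hypothetical. */
enum demo_state { DEMO_INACTIVE, DEMO_ACTIVE };

struct demo_event {
	enum demo_state state; /* what the core believes about the event */
	bool has_aux;          /* the PMU defines AUX buffer callbacks */
	bool hw_sampling;      /* whether hardware is actually producing samples */
};

/* Models the enable ioctl: returns early if the event is already active. */
static void demo_enable(struct demo_event *ev)
{
	assert(ev != NULL);
	if (ev->state == DEMO_ACTIVE)
		return;                  /* nothing is restarted */
	ev->state = DEMO_ACTIVE;
	ev->hw_sampling = true;
}

/* Models mapping the ring buffer: AUX-capable events get stopped. */
static void demo_mmap(struct demo_event *ev)
{
	assert(ev != NULL);
	if (ev->has_aux)
		ev->hw_sampling = false; /* stopped, state left untouched */
}

/*
 * Replays steps (1.) to (3.). enable_on_attach = true models attaching
 * the filter with the automatic enable, false models suppressing it.
 * Returns whether samples are being produced at the end.
 */
static bool demo_record_samples(bool has_aux, bool enable_on_attach)
{
	struct demo_event ev = { DEMO_INACTIVE, has_aux, false };

	if (enable_on_attach)
		demo_enable(&ev); /* step 1: attach filter, event enabled early */
	demo_mmap(&ev);           /* step 2: ring buffer is mapped */
	demo_enable(&ev);         /* step 3: explicit enable by the recorder */
	return ev.hw_sampling;
}
```

In this model, demo_record_samples(true, true) ends with sampling off, which corresponds to the missing samples, while demo_record_samples(true, false) ends with sampling on. Events without AUX support keep sampling in both orderings, in line with the cpu-clock run reported in the description.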
* Re: [PATCH v4 2/2] perf bpf-filter: Enable events manually
  2025-08-06 11:40 ` [PATCH v4 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
@ 2025-08-06 22:53   ` Namhyung Kim
  2025-08-06 23:38     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2025-08-06 22:53 UTC
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Ian Rogers,
	Arnaldo Carvalho de Melo, bpf, linux-perf-users, linux-kernel,
	linux-s390, Thomas Richter, Jiri Olsa, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev

Hello,

On Wed, Aug 06, 2025 at 01:40:35PM +0200, Ilya Leoshkevich wrote:
> On s390, and, in general, on all platforms where the respective event
> supports auxiliary data gathering, the command:
[...]
> Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
> Suggested-by: Jiri Olsa <jolsa@kernel.org>
> Tested-by: Thomas Richter <tmricht@linux.ibm.com>
> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

* Re: [PATCH v4 2/2] perf bpf-filter: Enable events manually
  2025-08-06 22:53   ` Namhyung Kim
@ 2025-08-06 23:38     ` Alexei Starovoitov
  2025-08-07  5:02       ` Namhyung Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2025-08-06 23:38 UTC
  To: Namhyung Kim
  Cc: Ilya Leoshkevich, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Ian Rogers, Arnaldo Carvalho de Melo, bpf,
	linux-perf-use., LKML, linux-s390, Thomas Richter, Jiri Olsa,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev

On Wed, Aug 6, 2025 at 3:53 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Wed, Aug 06, 2025 at 01:40:35PM +0200, Ilya Leoshkevich wrote:
[...]
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Do you mind if I take the whole set through the bpf tree ?
I'm planning to send bpf PR in a couple days, so by -rc1
all trees will see the fix.

* Re: [PATCH v4 2/2] perf bpf-filter: Enable events manually
  2025-08-06 23:38     ` Alexei Starovoitov
@ 2025-08-07  5:02       ` Namhyung Kim
  0 siblings, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2025-08-07 5:02 UTC
  To: Alexei Starovoitov
  Cc: Ilya Leoshkevich, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Ian Rogers, Arnaldo Carvalho de Melo, bpf,
	linux-perf-use., LKML, linux-s390, Thomas Richter, Jiri Olsa,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev

Hi Alexei,

On Wed, Aug 06, 2025 at 04:38:09PM -0700, Alexei Starovoitov wrote:
> On Wed, Aug 6, 2025 at 3:53 PM Namhyung Kim <namhyung@kernel.org> wrote:
[...]
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Do you mind if I take the whole set through the bpf tree ?
> I'm planning to send bpf PR in a couple days, so by -rc1
> all trees will see the fix.

Sure, I don't think we have conflicting changes and we'll sync
perf-tools-next once -rc1 is released.

Thanks,
Namhyung
