All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"linux-perf-use." <linux-perf-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Jiri Olsa <jolsa@kernel.org>, Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>
Subject: Re: [PATCH v4 2/2] perf bpf-filter: Enable events manually
Date: Wed, 6 Aug 2025 22:02:38 -0700	[thread overview]
Message-ID: <aJQzbpV_NXCD5-Ob@google.com> (raw)
In-Reply-To: <CAADnVQJG6U6X1qarpbdXra12m-PhNJK5f-jyw695osnOm6AZnQ@mail.gmail.com>

Hi Alexei,

On Wed, Aug 06, 2025 at 04:38:09PM -0700, Alexei Starovoitov wrote:
> On Wed, Aug 6, 2025 at 3:53 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Wed, Aug 06, 2025 at 01:40:35PM +0200, Ilya Leoshkevich wrote:
> > > On s390, and, in general, on all platforms where the respective event
> > > supports auxiliary data gathering, the command:
> > >
> > >    # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
> > >    [ perf record: Woken up 1 times to write data ]
> > >    [ perf record: Captured and wrote 0.011 MB perf.data ]
> > >    # ./perf report --stats | grep SAMPLE
> > >    #
> > >
> > > does not generate samples in the perf.data file. On x86 the command:
> > >
> > >   # sudo perf record -e intel_pt// -u 0 ls
> > >
> > > is broken too.
> > >
> > > Looking at the sequence of calls in 'perf record' reveals this
> > > behavior:
> > >
> > > 1. The event 'cycles' is created and enabled:
> > >
> > >    record__open()
> > >    +-> evlist__apply_filters()
> > >        +-> perf_bpf_filter__prepare()
> > >          +-> bpf_program.attach_perf_event()
> > >              +-> bpf_program.attach_perf_event_opts()
> > >                  +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
> > >
> > >    The event 'cycles' is enabled and active now. However the event's
> > >    ring-buffer to store the samples generated by hardware is not
> > >    allocated yet.
> > >
> > > 2. The event's fd is mmap()ed to create the ring buffer:
> > >
> > >    record__open()
> > >    +-> record__mmap()
> > >        +-> record__mmap_evlist()
> > >          +-> evlist__mmap_ex()
> > >              +-> perf_evlist__mmap_ops()
> > >                  +-> mmap_per_cpu()
> > >                      +-> mmap_per_evsel()
> > >                          +-> mmap__mmap()
> > >                              +-> perf_mmap__mmap()
> > >                                  +-> mmap()
> > >
> > >    This allocates the ring buffer for the event 'cycles'. With mmap()
> > >    the kernel creates the ring buffer:
> > >
> > >    perf_mmap(): kernel function to create the event's ring
> > >    |            buffer to save the sampled data.
> > >    |
> > >    +-> ring_buffer_attach(): Allocates memory for ring buffer.
> > >        |        The PMU has auxiliary data setup function. The
> > >        |        has_aux(event) condition is true and the PMU's
> > >        |        stop() is called to stop sampling. It is not
> > >        |        restarted:
> > >        |
> > >        |        if (has_aux(event))
> > >        |                perf_event_stop(event, 0);
> > >        |
> > >        +-> cpumsf_pmu_stop():
> > >
> > >    Hardware sampling is stopped. No samples are generated and saved
> > >    anymore.
> > >
> > > 3. After the event 'cycles' has been mapped, the event is enabled a
> > >    second time in:
> > >
> > >    __cmd_record()
> > >    +-> evlist__enable()
> > >        +-> __evlist__enable()
> > >          +-> evsel__enable_cpu()
> > >              +-> perf_evsel__enable_cpu()
> > >                  +-> perf_evsel__run_ioctl()
> > >                      +-> perf_evsel__ioctl()
> > >                          +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
> > >
> > >    The second
> > >
> > >       ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> > >
> > >    is just a NOP in this case. The first invocation in (1.) sets the
> > >    event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
> > >
> > >    perf_ioctl()
> > >    +-> _perf_ioctl()
> > >        +-> _perf_event_enable()
> > >            +-> __perf_event_enable()
> > >
> > >    return immediately because event::state is already set to
> > >    PERF_EVENT_STATE_ACTIVE.
> > >
> > > This happens on s390, because the event 'cycles' offers the possibility
> > > to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
> > > defined. Without both callback functions, cpumsf_pmu_stop() is not
> > > invoked and sampling continues.
> > >
> > > To remedy this, remove the first invocation of
> > >
> > >    ioctl(..., PERF_EVENT_IOC_ENABLE, ...).
> > >
> > > in step (1.) Create the event in step (1.) and enable it in step (3.)
> > > after the ring buffer has been mapped.
> > >
> > > Output after:
> > >
> > >  # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
> > >  [ perf record: Woken up 3 times to write data ]
> > >  [ perf record: Captured and wrote 0.876 MB perf.data ]
> > >  # ./perf  report --stats | grep SAMPLE
> > >               SAMPLE events:      16200  (99.5%)
> > >               SAMPLE events:      16200
> > >  #
> > >
> > > The software event succeeded both before and after the patch:
> > >
> > >  # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
> > >                                         ./perf test -w thloop 2
> > >  [ perf record: Woken up 7 times to write data ]
> > >  [ perf record: Captured and wrote 2.870 MB perf.data ]
> > >  # ./perf  report --stats | grep SAMPLE
> > >               SAMPLE events:      53506  (99.8%)
> > >               SAMPLE events:      53506
> > >  #
> > >
> > > Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
> > > Suggested-by: Jiri Olsa <jolsa@kernel.org>
> > > Tested-by: Thomas Richter <tmricht@linux.ibm.com>
> > > Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
> > > Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
> 
> Do you mind if I take the whole set through the bpf tree ?
> 
> I'm planning to send bpf PR in a couple days, so by -rc1
> all trees will see the fix.

Sure, I don't think we have conflicting changes and we'll sync
perf-tools-next once -rc1 is released.

Thanks,
Namhyung


      reply	other threads:[~2025-08-07  5:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 11:40 [PATCH v4 0/2] perf/s390: Regression: Move uid filtering to BPF filters Ilya Leoshkevich
2025-08-06 11:40 ` [PATCH v4 1/2] libbpf: Add the ability to suppress perf event enablement Ilya Leoshkevich
2025-08-06 15:25   ` Yonghong Song
2025-08-06 11:40 ` [PATCH v4 2/2] perf bpf-filter: Enable events manually Ilya Leoshkevich
2025-08-06 22:53   ` Namhyung Kim
2025-08-06 23:38     ` Alexei Starovoitov
2025-08-07  5:02       ` Namhyung Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJQzbpV_NXCD5-Ob@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=tmricht@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.