linux-perf-users.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Kees Cook <keescook@chromium.org>
Cc: Kees Cook <kees@kernel.org>,
	linux-perf-users <linux-perf-users@vger.kernel.org>
Subject: Re: Getting PMU stats on specific syscalls
Date: Thu, 11 Jan 2024 12:53:25 -0300
Message-ID: <ZaAO9cKhjbmTM0tJ@kernel.org>
In-Reply-To: <202401101640.21C835A8A@keescook>

Em Wed, Jan 10, 2024 at 04:40:57PM -0800, Kees Cook escreveu:
> On Tue, Jan 09, 2024 at 10:45:11PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jan 09, 2024 at 05:04:54PM -0800, Kees Cook escreveu:
> > > On January 9, 2024 3:30:32 PM PST, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > > >On Tue, Jan 9, 2024, 7:55 PM Kees Cook <keescook@chromium.org> wrote:
> > > >> I'd like to get PMU stats only on specific syscalls. I haven't been able
> > > >> to figure this out. i.e. how do I run:

> > > >>         perf stat -e some_syscall_name_here make -j128

> > > >Try with the technique described here:

> > > >https://www.spinics.net/lists/linux-perf-users/msg09253.html

> > > >Using the syscalls:sys_enter_syscall_name and
> > > >syscalls:sys_exit_syscall_name as on off switches.

> > > Yeah, I can get wall-clock time, but I'm trying to get much more
> > > detailed stats (cache hits, cycle counts, etc).

> > > It seems I can only get PMU counts from a whole process. I was hoping
> > > to avoid creating a synthetic workload that only exercises the one
> > > syscall and instead get a view of the PMU counters under a real load
> > > but only for the duration of the syscall...
> > 
> > This is the sequence I thought would do what you want:
> 
> Oh cool! Thanks for the example. I will give this a shot.

Hope it helps, but adding support for perf stat to run while some
function runs seems like a great new feature to have, something like:

  perf stat --function FOO make -j128

That would enable the default events but aggregate them only while the
kernel function FOO runs (for a userspace function one would use
-x /path/to/binary, as with 'perf probe').

This would be similar in implementation to the use of BPF in the tool
below, which may be of interest to you as well.

Using the function graph tracer:

root@number:~# perf ftrace latency -T schedule
^C#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |     318459 | ##########                                     |
     1 - 2    us |     455298 | ###############                                |
     2 - 4    us |     214732 | #######                                        |
     4 - 8    us |     154981 | #####                                          |
     8 - 16   us |     175314 | #####                                          |
    16 - 32   us |      19021 |                                                |
    32 - 64   us |       1066 |                                                |
    64 - 128  us |        502 |                                                |
   128 - 256  us |        435 |                                                |
   256 - 512  us |        219 |                                                |
   512 - 1024 us |       1188 |                                                |
     1 - 2    ms |        172 |                                                |
     2 - 4    ms |        262 |                                                |
     4 - 8    ms |        535 |                                                |
     8 - 16   ms |        788 |                                                |
    16 - 32   ms |        513 |                                                |
    32 - 64   ms |        379 |                                                |
    64 - 128  ms |         82 |                                                |
   128 - 256  ms |         65 |                                                |
   256 - 512  ms |        111 |                                                |
   512 - 1024 ms |         15 |                                                |
     1 - ...   s |          0 |                                                |
root@number:~#

Aggregating using BPF:

root@number:~# perf ftrace latency --use-bpf -T schedule
^C#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |      49377 | ###                                            |
     1 - 2    us |     169563 | ############                                   |
     2 - 4    us |     166872 | ############                                   |
     4 - 8    us |      90711 | ######                                         |
     8 - 16   us |      64548 | ####                                           |
    16 - 32   us |      77211 | #####                                          |
    32 - 64   us |       4567 |                                                |
    64 - 128  us |        486 |                                                |
   128 - 256  us |        339 |                                                |
   256 - 512  us |        174 |                                                |
   512 - 1024 us |        112 |                                                |
     1 - 2    ms |        602 |                                                |
     2 - 4    ms |         76 |                                                |
     4 - 8    ms |         65 |                                                |
     8 - 16   ms |        223 |                                                |
    16 - 32   ms |        315 |                                                |
    32 - 64   ms |        258 |                                                |
    64 - 128  ms |        175 |                                                |
   128 - 256  ms |         54 |                                                |
   256 - 512  ms |         42 |                                                |
   512 - 1024 ms |         63 |                                                |
     1 - ...   s |         35 |                                                |
root@number:~#

With a syscall:

root@number:~# perf ftrace latency -bT __x64_sys_write find /
<BIG find / output SNIP>
/var/www
/var/www/cgi-bin
/var/www/html
/var/yp
/var/.updated
/wb
#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |    5286452 | #######################                        |
     1 - 2    us |    5205312 | ######################                         |
     2 - 4    us |      35561 |                                                |
     4 - 8    us |       3852 |                                                |
     8 - 16   us |        957 |                                                |
    16 - 32   us |        281 |                                                |
    32 - 64   us |         54 |                                                |
    64 - 128  us |         37 |                                                |
   128 - 256  us |         20 |                                                |
   256 - 512  us |          6 |                                                |
   512 - 1024 us |          5 |                                                |
     1 - 2    ms |          4 |                                                |
     2 - 4    ms |          3 |                                                |
     4 - 8    ms |         44 |                                                |
     8 - 16   ms |        226 |                                                |
    16 - 32   ms |        742 |                                                |
    32 - 64   ms |        229 |                                                |
    64 - 128  ms |          9 |                                                |
   128 - 256  ms |          0 |                                                |
   256 - 512  ms |          0 |                                                |
   512 - 1024 ms |          0 |                                                |
     1 - ...   s |          0 |                                                |
root@number:~#

root@number:~# perf ftrace latency -h

 Usage: perf ftrace [<options>] [<command>]
    or: perf ftrace [<options>] -- [<command>] [<options>]
    or: perf ftrace {trace|latency} [<options>] [<command>]
    or: perf ftrace {trace|latency} [<options>] -- [<command>] [<options>]

    -b, --use-bpf         Use BPF to measure function latency
    -n, --use-nsec        Use nano-second histogram
    -T, --trace-funcs <func>
                          Show latency of given function

root@number:~#

The BPF code basically puts a kprobe at function entry and a kretprobe
at function exit, and aggregates the time deltas between them. For
'perf stat' we would instead take deltas of free-running perf counters
and, at the end, present them just like 'perf stat' does rather than as
the histogram above. The BPF code is straightforward:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_skel/func_latency.bpf.c
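
To make the entry/exit bookkeeping concrete, here is a minimal user-space
model in Python. It is only a sketch of the idea, not the BPF program
linked above: the class and function names are made up for illustration,
timestamps are passed in by the caller instead of being read from the
kernel, and the power-of-two bucketing is inferred from the 22 rows of the
histograms shown earlier.

```python
NUM_BUCKETS = 22  # rows in the histograms above: "0 - 1 us" up to "1 - ... s"


def bucket_index(delta_us):
    """Map a latency in microseconds to a power-of-two bucket.

    Bucket 0 holds deltas below 1 us, bucket k holds [2^(k-1), 2^k) us,
    and the last bucket collects everything from 2^20 us (about 1 s) up.
    """
    return min(delta_us.bit_length(), NUM_BUCKETS - 1)


class LatencyModel:
    """Hypothetical stand-in for the entry/exit probe pair."""

    def __init__(self):
        self.start_ns = {}              # thread id -> timestamp taken at entry
        self.hist = [0] * NUM_BUCKETS   # one counter per histogram row

    def on_entry(self, tid, now_ns):
        # Role of the kprobe: remember when this thread entered the function.
        self.start_ns[tid] = now_ns

    def on_exit(self, tid, now_ns):
        # Role of the kretprobe: compute the delta and account for it.
        start = self.start_ns.pop(tid, None)
        if start is None:
            return                      # exit seen without a matching entry
        delta_us = (now_ns - start) // 1000
        self.hist[bucket_index(delta_us)] += 1


model = LatencyModel()
model.on_entry(tid=1, now_ns=0)
model.on_exit(tid=1, now_ns=500)        # 0 us, lands in the "0 - 1 us" row
model.on_entry(tid=2, now_ns=1_000)
model.on_exit(tid=2, now_ns=4_500)      # 3 us, lands in the "2 - 4 us" row
```

Keying the start timestamp by thread id is what lets concurrent callers
be measured independently; this simple model does not handle a thread
re-entering the function before it has returned.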

The userspace part just attaches the kprobe (function entry) and
kretprobe (function exit) BPF programs in func_latency.bpf.c to the
function of interest:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_ftrace.c#n86
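
As for the counter-delta variant suggested earlier in this message
(snapshot free-running counters at entry, add the difference at exit),
the bookkeeping could look roughly like the Python model below. This is
purely illustrative: no such perf feature exists yet, the class name and
the event names used in the example are assumptions, and the counter
values are supplied by the caller rather than read from a PMU.

```python
class FunctionStatModel:
    """Hypothetical model of per-function counter aggregation."""

    def __init__(self, events):
        self.events = tuple(events)
        self.snapshots = {}                          # thread id -> counters at entry
        self.totals = dict.fromkeys(self.events, 0)  # accumulated deltas
        self.calls = 0

    def on_entry(self, tid, counters):
        # Snapshot the free-running counters when the function is entered.
        self.snapshots[tid] = {e: counters[e] for e in self.events}

    def on_exit(self, tid, counters):
        # Add only what was counted while the function was running.
        snap = self.snapshots.pop(tid, None)
        if snap is None:
            return                                   # no matching entry seen
        for e in self.events:
            self.totals[e] += counters[e] - snap[e]
        self.calls += 1


stat = FunctionStatModel(["cycles", "cache-misses"])  # illustrative event names
stat.on_entry(7, {"cycles": 100, "cache-misses": 5})
stat.on_exit(7, {"cycles": 350, "cache-misses": 9})
stat.on_entry(7, {"cycles": 1000, "cache-misses": 20})
stat.on_exit(7, {"cycles": 1100, "cache-misses": 21})
for event, value in stat.totals.items():
    print(f"{value:>12}  {event}")
```

Whatever the counters accumulate between the first exit and the second
entry is never added to the totals, which is the point of gating the
aggregation on the function.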

- Arnaldo
