linux-perf-users.vger.kernel.org archive mirror
* Getting PMU stats on specific syscalls
@ 2024-01-09 22:55 Kees Cook
  2024-01-09 23:52 ` Namhyung Kim
       [not found] ` <CA+JHD90kw0CX9=E18A7NBJrxdPDQuwrew355RV47oBhn_1s_QQ@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Kees Cook @ 2024-01-09 22:55 UTC (permalink / raw)
  To: linux-perf-users

Hi,

I'd like to get PMU stats only on specific syscalls. I haven't been able
to figure this out. i.e. how do I run:

	perf stat -e some_syscall_name_here make -j128

I can't figure out how to limit collection to only time spent during the
syscall I'm interested in. I tried building synthetic events, but that
doesn't seem to work...

Thanks!

-Kees

-- 
Kees Cook


* Re: Getting PMU stats on specific syscalls
  2024-01-09 22:55 Getting PMU stats on specific syscalls Kees Cook
@ 2024-01-09 23:52 ` Namhyung Kim
  2024-01-10  1:01   ` Kees Cook
       [not found] ` <CA+JHD90kw0CX9=E18A7NBJrxdPDQuwrew355RV47oBhn_1s_QQ@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-01-09 23:52 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-perf-users

Hello,

On Tue, Jan 9, 2024 at 2:56 PM Kees Cook <keescook@chromium.org> wrote:
>
> Hi,
>
> I'd like to get PMU stats only on specific syscalls. I haven't been able
> to figure this out. i.e. how do I run:
>
>         perf stat -e some_syscall_name_here make -j128

How about this?

  perf stat -e syscalls:sys_enter_<syscall_name> -- make -j128

>
> I can't figure out how to limit collection to only time spent during the
> syscall I'm interested in. I tried building synthetic events, but that
> doesn't seem to work...

Hmm.. ok.  It seems you want to use the PMU only for a syscall.
I don't think perf supports that for now, but maybe it's possible
to attach custom BPF programs to read perf counters at syscall
enter and exit and to save the diff in a map.
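
Something along those lines, perhaps (a rough, untested libbpf-style sketch
of that idea; the program and map names are made up, and userspace would
still have to open one hardware counter per CPU and put the fds into the
perf event array map):

/*
 * counter_delta.bpf.c - hypothetical sketch, not from this thread:
 * read a perf counter at syscall enter/exit and accumulate per-task
 * deltas in a map.  Userspace fills 'counters' with one perf fd per CPU.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} counters SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u32);	/* tid */
	__type(value, __u64);	/* counter value at sys_enter */
} start SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u32);	/* tid */
	__type(value, __u64);	/* accumulated delta */
} total SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int on_enter(void *ctx)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	struct bpf_perf_event_value v;

	/* snapshot the counter for this CPU at syscall entry */
	if (bpf_perf_event_read_value(&counters, BPF_F_CURRENT_CPU, &v, sizeof(v)))
		return 0;
	bpf_map_update_elem(&start, &tid, &v.counter, BPF_ANY);
	return 0;
}

SEC("tracepoint/syscalls/sys_exit_write")
int on_exit(void *ctx)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	struct bpf_perf_event_value v;
	__u64 *begin, *sum, delta, zero = 0;

	begin = bpf_map_lookup_elem(&start, &tid);
	if (!begin || bpf_perf_event_read_value(&counters, BPF_F_CURRENT_CPU, &v, sizeof(v)))
		return 0;
	delta = v.counter - *begin;
	bpf_map_delete_elem(&start, &tid);

	/* accumulate the per-task delta */
	sum = bpf_map_lookup_elem(&total, &tid);
	if (!sum) {
		bpf_map_update_elem(&total, &tid, &zero, BPF_NOEXIST);
		sum = bpf_map_lookup_elem(&total, &tid);
		if (!sum)
			return 0;
	}
	__sync_fetch_and_add(sum, delta);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

After the run, userspace would walk the 'total' map and print the sums.
Note the counters are per CPU, so this counts everything that runs on the
CPU between enter and exit, and loses track if the task migrates in
between; it's only a starting point.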

Thanks,
Namhyung


* Re: Getting PMU stats on specific syscalls
  2024-01-09 23:52 ` Namhyung Kim
@ 2024-01-10  1:01   ` Kees Cook
  2024-01-10  1:52     ` Leo Yan
  0 siblings, 1 reply; 8+ messages in thread
From: Kees Cook @ 2024-01-10  1:01 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users



On January 9, 2024 3:52:53 PM PST, Namhyung Kim <namhyung@kernel.org> wrote:
>Hello,
>
>On Tue, Jan 9, 2024 at 2:56 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> Hi,
>>
>> I'd like to get PMU stats only on specific syscalls. I haven't been able
>> to figure this out. i.e. how do I run:
>>
>>         perf stat -e some_syscall_name_here make -j128
>
>How about this?
>
>  perf stat -e syscalls:sys_enter_<syscall_name> -- make -j128

Unfortunately this is just a count of calls.

>
>>
>> I can't figure out how to limit collection to only time spent during the
>> syscall I'm interested in. I tried building synthetic events, but that
>> doesn't seem to work...
>
>Hmm.. ok.  It seems you want to use the PMU only for a syscall.
>I don't think perf supports that for now, but maybe it's possible
>to attach custom BPF programs to read perf counters at syscall
>enter and exit and to save the diff in a map.

Does BPF have access to the PMU counters? I don't know how to even approach writing that though. :P

-- 
Kees Cook


* Re: Getting PMU stats on specific syscalls
       [not found] ` <CA+JHD90kw0CX9=E18A7NBJrxdPDQuwrew355RV47oBhn_1s_QQ@mail.gmail.com>
@ 2024-01-10  1:04   ` Kees Cook
  2024-01-10  1:45     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Kees Cook @ 2024-01-10  1:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users



On January 9, 2024 3:30:32 PM PST, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
>On Tue, Jan 9, 2024, 7:55 PM Kees Cook <keescook@chromium.org> wrote:
>
>> Hi,
>>
>> I'd like to get PMU stats only on specific syscalls. I haven't been able
>> to figure this out. i.e. how do I run:
>>
>>         perf stat -e some_syscall_name_here make -j128
>>
>>
>>
>> Try with the technique described here:
>>
>
>https://www.spinics.net/lists/linux-perf-users/msg09253.html
>
>Using the syscalls:sys_enter_syscall_name and
>syscalls:sys_exit_syscall_name as on/off switches.

Yeah, I can get wall-clock time, but I'm trying to get much more detailed stats (cache hits, cycle counts, etc).

It seems I can only get PMU counts from a whole process. I was hoping to avoid creating a synthetic workload that only exercises the one syscall and instead get a view of the PMU counters under a real load but only for the duration of the syscall...

-- 
Kees Cook


* Re: Getting PMU stats on specific syscalls
  2024-01-10  1:04   ` Kees Cook
@ 2024-01-10  1:45     ` Arnaldo Carvalho de Melo
  2024-01-11  0:40       ` Kees Cook
  0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-01-10  1:45 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-perf-users

Em Tue, Jan 09, 2024 at 05:04:54PM -0800, Kees Cook escreveu:
> On January 9, 2024 3:30:32 PM PST, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> >On Tue, Jan 9, 2024, 7:55 PM Kees Cook <keescook@chromium.org> wrote:
> >> I'd like to get PMU stats only on specific syscalls. I haven't been able
> >> to figure this out. i.e. how do I run:

> >>         perf stat -e some_syscall_name_here make -j128

> >> Try with the technique described here:

> >https://www.spinics.net/lists/linux-perf-users/msg09253.html

> >Using the syscalls:sys_enter_syscall_name and
> >syscalls:sys_exit_syscall_name as on/off switches.
 
> Yeah, I can get wall-clock time, but I'm trying to get much more
> detailed stats (cache hits, cycle counts, etc).
 
> It seems I can only get PMU counts from a whole process. I was hoping
> to avoid creating a synthetic workload that only exercises the one
> syscall and instead get a view of the PMU counters under a real load
> but only for the duration of the syscall...

This is the sequence I thought would do what you want:

First, do a run collecting the metrics you want plus the slice demarcators, in
this case while running 'find /':

[root@five ~]# perf record -e '{cycles,instructions,cache-misses}' -e syscalls:sys_enter_write,syscalls:sys_exit_write find / > /dev/null
[ perf record: Woken up 213 times to write data ]
[ perf record: Captured and wrote 55.935 MB perf.data (756776 samples) ]
[root@five ~]#

this will collect lots of samples:

[root@five ~]# perf script | wc -l
756776
[root@five ~]#

Now consider only samples that take place after a syscalls:sys_enter_write, and
stop considering them as soon as a syscalls:sys_exit_write is seen.

[root@five ~]# perf script --switch-on=syscalls:sys_enter_write --switch-off=syscalls:sys_exit_write | wc -l
5854
[root@five ~]#

Now a report with all the samples:

[root@five ~]# perf report  | head
# Total Lost Samples: 0
#
# Samples: 293K of events 'anon group { cycles, instructions, cache-misses }'
# Event count (approx.): 127206014994
#
#                 Overhead  Command  Shared Object                                   Symbol
# ........................  .......  ..............................................  ........................
[root@five ~]# perf report  | head -20
# Total Lost Samples: 0
#
# Samples: 293K of events 'anon group { cycles, instructions, cache-misses }'
# Event count (approx.): 127206014994
#
#                 Overhead  Command  Shared Object                                   Symbol
# ........................  .......  ..............................................  .........................
#
     2.63%   2.04%   2.36%  find     [kernel.kallsyms]                               [k] btrfs_search_slot
     2.27%   1.74%   2.67%  find     [kernel.kallsyms]                               [k] __radix_tree_lookup
     2.11%   1.81%   1.80%  find     [kernel.kallsyms]                               [k] btrfs_bin_search
     1.99%   1.98%   2.41%  find     [kernel.kallsyms]                               [k] srso_alias_safe_ret
     1.87%   2.20%   1.00%  find     [kernel.kallsyms]                               [k] read_extent_buffer
     1.74%   2.26%   0.62%  find     [kernel.kallsyms]                               [k] btrfs_real_readdir
     1.71%   1.22%   1.66%  find     [kernel.kallsyms]                               [k] read_block_for_search
     1.60%   1.64%   1.10%  find     [kernel.kallsyms]                               [k] entry_SYSCALL_64
     1.60%   1.34%   1.20%  find     [kernel.kallsyms]                               [k] kmem_cache_alloc
[root@five ~]#

And now one with just what is inside sys_enter_write <-> sys_exit_write:

[root@five ~]# perf report --switch-on=syscalls:sys_enter_write --switch-off=syscalls:sys_exit_write | head -40
# Total Lost Samples: 0
#
# Samples: 5K of events 'anon group { cycles, instructions, cache-misses }'
# Event count (approx.): 2406980186
#
#                 Overhead  Command  Shared Object      Symbol
# ........................  .......  .................  ......................................
#
    10.72%  11.71%  13.61%  find     [kernel.kallsyms]  [k] perf_output_sample
     7.56%  10.25%   8.68%  find     [kernel.kallsyms]  [k] __output_copy
     7.06%  11.47%   7.89%  find     [kernel.kallsyms]  [k] memcpy
     6.60%   5.45%   6.54%  find     [kernel.kallsyms]  [k] vfs_write
     4.51%   4.27%   5.58%  find     [kernel.kallsyms]  [k] srso_alias_safe_ret
     4.37%   2.93%   5.25%  find     [kernel.kallsyms]  [k] perf_tp_event
     4.20%   3.94%   3.92%  find     [kernel.kallsyms]  [k] perf_event_output_forward
     4.12%   5.56%   2.63%  find     [kernel.kallsyms]  [k] perf_output_begin_forward
     4.00%   2.84%   1.88%  find     [kernel.kallsyms]  [k] perf_prepare_sample
     2.97%   2.40%   1.60%  find     [kernel.kallsyms]  [k] __perf_event_header__init_id
     2.93%   4.35%   3.03%  find     [kernel.kallsyms]  [k] srso_alias_return_thunk
     2.93%   1.88%   1.70%  find     [kernel.kallsyms]  [k] perf_syscall_exit
     2.60%   2.84%   2.37%  find     [kernel.kallsyms]  [k] security_file_permission
     2.50%   2.57%   1.86%  find     [kernel.kallsyms]  [k] selinux_file_permission
     2.42%   0.71%   2.49%  find     [kernel.kallsyms]  [k] perf_tp_event_match
     2.36%   1.06%   1.64%  find     [kernel.kallsyms]  [k] __task_pid_nr_ns
     2.20%   2.24%   2.37%  find     [kernel.kallsyms]  [k] ksys_write
     1.77%   2.01%   3.78%  find     [kernel.kallsyms]  [k] __fget_light
     1.62%   2.82%   1.01%  find     [kernel.kallsyms]  [k] native_sched_clock
     1.55%   1.56%   1.79%  find     [kernel.kallsyms]  [k] __rcu_read_unlock
     1.45%   0.84%   0.72%  find     [kernel.kallsyms]  [k] syscall_exit_to_user_mode_prepare
     1.41%   0.98%   1.25%  find     [kernel.kallsyms]  [k] __perf_event_overflow
     1.36%   0.81%   0.63%  find     [kernel.kallsyms]  [k] __fsnotify_parent
     1.23%   1.36%   1.03%  find     [kernel.kallsyms]  [k] do_syscall_64
     1.19%   0.26%   1.36%  find     [kernel.kallsyms]  [k] rw_verify_area
     1.07%   0.56%   0.87%  find     [kernel.kallsyms]  [k] write_null
     1.06%   1.36%   0.38%  find     [kernel.kallsyms]  [k] perf_instruction_pointer
     0.96%   0.58%   1.21%  find     [kernel.kallsyms]  [k] inode_security
     0.92%   0.33%   0.51%  find     [kernel.kallsyms]  [k] perf_trace_buf_update
[root@five ~]#

If you want to sum up the samples for each of the events, you can do as
below; but yeah, the missing feature here is aggregating those counters
as early as possible, without generating a perf.data file for post
processing.

- Arnaldo

[root@five ~]# perf script -g python
generated Python script: perf-script.py
[root@five ~]# cat perf-script.py
# perf script event handlers, generated by perf script -g python
# Licensed under the terms of the GNU GPL License version 2

# The common_* event handler fields are the most useful fields common to
# all events.  They don't necessarily correspond to the 'common_*' fields
# in the format files.  Those fields not available as handler params can
# be retrieved using Python functions of the form common_*(context).
# See the perf-script-python Documentation for the list of available functions.

from __future__ import print_function

import os
import sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *
from Core import *


def trace_begin():
	print("in trace_begin")

def trace_end():
	print("in trace_end")

def syscalls__sys_exit_write(event_name, context, common_cpu,
	common_secs, common_nsecs, common_pid, common_comm,
	common_callchain, __syscall_nr, ret, perf_sample_dict):
		print_header(event_name, common_cpu, common_secs, common_nsecs,
			common_pid, common_comm)

		print("__syscall_nr=%d, ret=%d" % \
		(__syscall_nr, ret))

		print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')

		for node in common_callchain:
			if 'sym' in node:
				print("	[%x] %s%s%s%s" % (
					node['ip'], node['sym']['name'],
					"+0x{:x}".format(node['sym_off']) if 'sym_off' in node else "",
					" ({})".format(node['dso'])  if 'dso' in node else "",
					" " + node['sym_srcline'] if 'sym_srcline' in node else ""))
			else:
				print("	[%x]" % (node['ip']))

		print()

def syscalls__sys_enter_write(event_name, context, common_cpu,
	common_secs, common_nsecs, common_pid, common_comm,
	common_callchain, __syscall_nr, fd, buf, count,
		perf_sample_dict):
		print_header(event_name, common_cpu, common_secs, common_nsecs,
			common_pid, common_comm)

		print("__syscall_nr=%d, fd=%u, buf=%u, " \
		"count=%u" % \
		(__syscall_nr, fd, buf, count))

		print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')

		for node in common_callchain:
			if 'sym' in node:
				print("	[%x] %s%s%s%s" % (
					node['ip'], node['sym']['name'],
					"+0x{:x}".format(node['sym_off']) if 'sym_off' in node else "",
					" ({})".format(node['dso'])  if 'dso' in node else "",
					" " + node['sym_srcline'] if 'sym_srcline' in node else ""))
			else:
				print("	[%x]" % (node['ip']))

		print()

def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict):
		print(get_dict_as_string(event_fields_dict))
		print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')

def print_header(event_name, cpu, secs, nsecs, pid, comm):
	print("%-20s %5u %05u.%09u %8u %-20s " % \
	(event_name, cpu, secs, nsecs, pid, comm), end="")

def get_dict_as_string(a_dict, delimiter=' '):
	return delimiter.join(['%s=%s'%(k,str(v))for k,v in sorted(a_dict.items())])
[root@five ~]#


* Re: Getting PMU stats on specific syscalls
  2024-01-10  1:01   ` Kees Cook
@ 2024-01-10  1:52     ` Leo Yan
  0 siblings, 0 replies; 8+ messages in thread
From: Leo Yan @ 2024-01-10  1:52 UTC (permalink / raw)
  To: Kees Cook; +Cc: Namhyung Kim, linux-perf-users

On Tue, Jan 09, 2024 at 05:01:59PM -0800, Kees Cook wrote:

[...]

> >> I can't figure out how to limit collection to only time spent during the
> >> syscall I'm interested in. I tried building synthetic events, but that
> >> doesn't seem to work...
> >
> >Hmm.. ok.  It seems you want to use the PMU only for a syscall.
> >I don't think perf supports that for now, but maybe it's possible
> >to attach custom BPF programs to read perf counters at syscall
> >enter and exit and to save the diff in a map.
> 
> Does BPF have access to the PMU counters? I don't know how to even approach writing that though. :P

An alternative is to use ftrace's function_graph tracer to measure a
function duration, e.g.

      # cd /sys/kernel/debug/tracing
      # echo syscall_function_name > set_graph_function
      # echo function_graph > current_tracer

      do some test

      # cat trace

I think you can even access the PMU counters for measuring a duration
in user space - though you might not be able to directly measure a
specific syscall, see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c#n127
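
For example, here is a minimal, untested sketch along those lines with
libperf, counting cycles for the current thread around a region of
interest (the file name and build line are just assumptions):

/* region_count.c - build with: gcc region_count.c -lperf (libperf from tools/lib/perf) */
#include <linux/perf_event.h>
#include <perf/evsel.h>
#include <perf/threadmap.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr = {
		.type   = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
	};
	struct perf_thread_map *threads = perf_thread_map__new_dummy();
	struct perf_counts_values before, after;
	struct perf_evsel *evsel;

	perf_thread_map__set_pid(threads, 0, 0);	/* 0 == current process */
	evsel = perf_evsel__new(&attr);
	if (perf_evsel__open(evsel, NULL, threads) < 0)
		return 1;

	perf_evsel__read(evsel, 0, 0, &before);
	write(1, "x", 1);		/* region of interest, e.g. a single syscall */
	perf_evsel__read(evsel, 0, 0, &after);

	printf("cycles: %llu\n", (unsigned long long)(after.val - before.val));

	perf_evsel__close(evsel);
	perf_evsel__delete(evsel);
	perf_thread_map__put(threads);
	return 0;
}

That gives per-thread counts around an arbitrary region in your own code
(including the overhead of the two reads), but as noted it can't isolate
a syscall issued somewhere inside a real workload.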

Thanks,
Leo


* Re: Getting PMU stats on specific syscalls
  2024-01-10  1:45     ` Arnaldo Carvalho de Melo
@ 2024-01-11  0:40       ` Kees Cook
  2024-01-11 15:53         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Kees Cook @ 2024-01-11  0:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Kees Cook, linux-perf-users

On Tue, Jan 09, 2024 at 10:45:11PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jan 09, 2024 at 05:04:54PM -0800, Kees Cook escreveu:
> > On January 9, 2024 3:30:32 PM PST, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > >On Tue, Jan 9, 2024, 7:55 PM Kees Cook <keescook@chromium.org> wrote:
> > >> I'd like to get PMU stats only on specific syscalls. I haven't been able
> > >> to figure this out. i.e. how do I run:
> 
> > >>         perf stat -e some_syscall_name_here make -j128
> 
> > >> Try with the technique described here:
> 
> > >https://www.spinics.net/lists/linux-perf-users/msg09253.html
> 
> > >Using the syscalls:sys_enter_syscall_name and
> > >syscalls:sys_exit_syscall_name as on/off switches.
>  
> > Yeah, I can get wall-clock time, but I'm trying to get much more
> > detailed stats (cache hits, cycle counts, etc).
>  
> > It seems I can only get PMU counts from a whole process. I was hoping
> > to avoid creating a synthetic workload that only exercises the one
> > syscall and instead get a view of the PMU counters under a real load
> > but only for the duration of the syscall...
> 
> This is the sequence I thought would do what you want:

Oh cool! Thanks for the example. I will give this a shot.

-- 
Kees Cook


* Re: Getting PMU stats on specific syscalls
  2024-01-11  0:40       ` Kees Cook
@ 2024-01-11 15:53         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-01-11 15:53 UTC (permalink / raw)
  To: Kees Cook; +Cc: Kees Cook, linux-perf-users

Em Wed, Jan 10, 2024 at 04:40:57PM -0800, Kees Cook escreveu:
> On Tue, Jan 09, 2024 at 10:45:11PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jan 09, 2024 at 05:04:54PM -0800, Kees Cook escreveu:
> > > On January 9, 2024 3:30:32 PM PST, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > > >On Tue, Jan 9, 2024, 7:55 PM Kees Cook <keescook@chromium.org> wrote:
> > > >> I'd like to get PMU stats only on specific syscalls. I haven't been able
> > > >> to figure this out. i.e. how do I run:

> > > >>         perf stat -e some_syscall_name_here make -j128

> > > >> Try with the technique described here:

> > > >https://www.spinics.net/lists/linux-perf-users/msg09253.html

> > > >Using the syscalls:sys_enter_syscall_name and
> > > >syscalls:sys_exit_syscall_name as on/off switches.

> > > Yeah, I can get wall-clock time, but I'm trying to get much more
> > > detailed stats (cache hits, cycle counts, etc).

> > > It seems I can only get PMU counts from a whole process. I was hoping
> > > to avoid creating a synthetic workload that only exercises the one
> > > syscall and instead get a view of the PMU counters under a real load
> > > but only for the duration of the syscall...
> > 
> > This is the sequence I thought would do what you want:
> 
> Oh cool! Thanks for the example. I will give this a shot.

Hope it helps, but adding support for perf stat to run while some
function runs seems like a great new feature to have, something like:

  perf stat --function FOO make -j128

That would enable the default events but aggregate them only while the
kernel function FOO runs (for userspace functions one would add -x
/path/to/binary, as with 'perf probe').

It would be similar in implementation to the BPF usage in the tool below,
which may be of interest as well.

Using the function graph tracer:

root@number:~# perf ftrace latency -T schedule
^C#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |     318459 | ##########                                     |
     1 - 2    us |     455298 | ###############                                |
     2 - 4    us |     214732 | #######                                        |
     4 - 8    us |     154981 | #####                                          |
     8 - 16   us |     175314 | #####                                          |
    16 - 32   us |      19021 |                                                |
    32 - 64   us |       1066 |                                                |
    64 - 128  us |        502 |                                                |
   128 - 256  us |        435 |                                                |
   256 - 512  us |        219 |                                                |
   512 - 1024 us |       1188 |                                                |
     1 - 2    ms |        172 |                                                |
     2 - 4    ms |        262 |                                                |
     4 - 8    ms |        535 |                                                |
     8 - 16   ms |        788 |                                                |
    16 - 32   ms |        513 |                                                |
    32 - 64   ms |        379 |                                                |
    64 - 128  ms |         82 |                                                |
   128 - 256  ms |         65 |                                                |
   256 - 512  ms |        111 |                                                |
   512 - 1024 ms |         15 |                                                |
     1 - ...   s |          0 |                                                |
root@number:~#

Aggregating using BPF:

root@number:~# perf ftrace latency --use-bpf -T schedule
^C#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |      49377 | ###                                            |
     1 - 2    us |     169563 | ############                                   |
     2 - 4    us |     166872 | ############                                   |
     4 - 8    us |      90711 | ######                                         |
     8 - 16   us |      64548 | ####                                           |
    16 - 32   us |      77211 | #####                                          |
    32 - 64   us |       4567 |                                                |
    64 - 128  us |        486 |                                                |
   128 - 256  us |        339 |                                                |
   256 - 512  us |        174 |                                                |
   512 - 1024 us |        112 |                                                |
     1 - 2    ms |        602 |                                                |
     2 - 4    ms |         76 |                                                |
     4 - 8    ms |         65 |                                                |
     8 - 16   ms |        223 |                                                |
    16 - 32   ms |        315 |                                                |
    32 - 64   ms |        258 |                                                |
    64 - 128  ms |        175 |                                                |
   128 - 256  ms |         54 |                                                |
   256 - 512  ms |         42 |                                                |
   512 - 1024 ms |         63 |                                                |
     1 - ...   s |         35 |                                                |
root@number:~#

With a syscall:

root@number:~# perf ftrace latency -bT __x64_sys_write find /
<BIG find / output SNIP>
/var/www
/var/www/cgi-bin
/var/www/html
/var/yp
/var/.updated
/wb
#   DURATION     |      COUNT | GRAPH                                          |
     0 - 1    us |    5286452 | #######################                        |
     1 - 2    us |    5205312 | ######################                         |
     2 - 4    us |      35561 |                                                |
     4 - 8    us |       3852 |                                                |
     8 - 16   us |        957 |                                                |
    16 - 32   us |        281 |                                                |
    32 - 64   us |         54 |                                                |
    64 - 128  us |         37 |                                                |
   128 - 256  us |         20 |                                                |
   256 - 512  us |          6 |                                                |
   512 - 1024 us |          5 |                                                |
     1 - 2    ms |          4 |                                                |
     2 - 4    ms |          3 |                                                |
     4 - 8    ms |         44 |                                                |
     8 - 16   ms |        226 |                                                |
    16 - 32   ms |        742 |                                                |
    32 - 64   ms |        229 |                                                |
    64 - 128  ms |          9 |                                                |
   128 - 256  ms |          0 |                                                |
   256 - 512  ms |          0 |                                                |
   512 - 1024 ms |          0 |                                                |
     1 - ...   s |          0 |                                                |
root@number:~#

root@number:~# perf ftrace latency -h

 Usage: perf ftrace [<options>] [<command>]
    or: perf ftrace [<options>] -- [<command>] [<options>]
    or: perf ftrace {trace|latency} [<options>] [<command>]
    or: perf ftrace {trace|latency} [<options>] -- [<command>] [<options>]

    -b, --use-bpf         Use BPF to measure function latency
    -n, --use-nsec        Use nano-second histogram
    -T, --trace-funcs <func>
                          Show latency of given function

root@number:~#

The BPF code basically puts a kprobe at function entry and a kretprobe
at function exit, and goes on aggregating the time deltas; we would
instead do deltas on free-running perf counters and, at the end, present
them just like 'perf stat' does instead of the histogram above. The BPF is
straightforward:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_skel/func_latency.bpf.c

The userspace part just associates the function of interest with the
kprobe (function entry) and kretprobe (function exit) BPF programs in
func_latency.bpf.c:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_ftrace.c#n86

- Arnaldo

