From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>,
Ian Rogers <irogers@google.com>,
Namhyung Kim <namhyung@kernel.org>,
maz@kernel.org, marcan@marcan.st, linux-kernel@vger.kernel.org,
linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus
Date: Fri, 15 Dec 2023 14:49:14 -0300
Message-ID: <ZXyRmpyzNGd3gKwV@kernel.org>
In-Reply-To: <ZXyEJYXVNn2i06ro@FVFF77S0Q05N>
On Fri, Dec 15, 2023 at 04:51:49PM +0000, Mark Rutland wrote:
> On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland wrote:
> > > On ARM it'll be essentially the same as on x86: if you open an event with
> > > type==PERF_TYPE_HARDWARE (without the extended HW type pointing to a
> > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever
> > > happens to be found by perf_init_event() when iterating over the 'pmus' list.
> > > If you open an event with type==PERF_TYPE_HARDWARE and cpu!=-1, the event
> > > will be opened on the appropriate CPU PMU, by virtue of being rejected by the others
> > > when perf_init_event() iterates over the 'pmus' list.
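For reference, the "extended HW type pointing to a specific PMU" that you mention is the PMU's type id stuffed into the upper 32 bits of attr.config. A minimal sketch, assuming a perf_event.h recent enough to have PERF_PMU_TYPE_SHIFT and with pmu_type read from /sys/bus/event_source/devices/<pmu>/type:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Target one specific CPU PMU via the extended hardware type: the PMU type id
 * goes in config[63:32], the generic event id stays in config[31:0]. */
static int open_cycles_on_pmu(unsigned int pmu_type, int cpu)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type   = PERF_TYPE_HARDWARE;
        attr.size   = sizeof(attr);
        attr.config = ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT) |
                      PERF_COUNT_HW_CPU_CYCLES;

        return syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
                       PERF_FLAG_FD_CLOEXEC);
}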
> > The way that it is working now on my Intel hybrid system, with Kan's
> > patch, is equivalent to using this on the ROC-RK3399-PC board I have:
> > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P
> > Wouldn't it be better to make 'perf top' on ARM work the way it is being
> > done on x86 now, i.e. default to opening the two events, one per PMU, and
> > allow the user to switch back and forth using the TUI/stdio?
> TBH, for perf top I don't know *which* behaviour is preferable, but I agree
> that it'd be good for x86 and arm to work in the same way.
Right, reducing the difference in the user experience across arches.
> For design-cleanliness and consistency with other commands I can see that
> opening those separately is probably for the best, but for typical usage of
> perf top it's really nice to have those presented together without having to
> tab back-and-forth between the distinct PMUs, and so the existing behaviour of
Humm, so you would want two histogram viewers, one for each PMU, side by
side?
> using CPU-bound PERF_TYPE_HARDWARE events is arguably nicer for the user.
So, on ARM64, start the following 'perf trace' session, then run the
stock 'perf top':
root@roc-rk3399-pc:~# perf trace -e perf_event_open
<SNIP calls doing capability queries and setting up sideband stuff>
535.764 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 19
535.783 ( 0.067 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 28
535.854 ( 0.063 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 29
535.920 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 30
535.939 ( 0.016 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 4, group_fd: -1, flags: FD_CLOEXEC) = 31
535.959 ( 0.011 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 5, group_fd: -1, flags: FD_CLOEXEC) = 32
root@roc-rk3399-pc:~# grep "CPU part" /proc/cpuinfo | uniq -c
4 CPU part : 0xd03
2 CPU part : 0xd08
root@roc-rk3399-pc:~#
It is already doing what you suggest, right? PERF_TYPE_HARDWARE, one
counter per CPU, maps to armv8_cortex_a72/cycles/P and
armv8_cortex_a53/cycles/P.
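I.e., per CPU the stock 'perf top' is doing the equivalent of this rough, untested reduction of the syscalls in the trace above (error handling and the mmap/sideband setup omitted):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One generic cycles counter pinned to 'cpu'; the kernel routes it to
 * whichever CPU PMU (a53 or a72 here) covers that CPU. */
static int open_cycles_on_cpu(int cpu)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type        = PERF_TYPE_HARDWARE;
        attr.size        = sizeof(attr);
        attr.config      = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_freq = 4000;
        attr.freq        = 1;
        attr.precise_ip  = 3;     /* the 'P' modifier */
        attr.disabled    = 1;

        /* pid = -1, cpu = N: system wide, one fd per CPU, as in the trace */
        return syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
                       PERF_FLAG_FD_CLOEXEC);
}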
One thing I'm thinking is that we should split this per PMU at the
hist_entry, so that we could show how many samples/events came from each
of them...
- Arnaldo
> I don't have a strong feeling either way; I'm personally happy so long as
> explicit pmu_name/event/ events don't get silently converted into
> PERF_TYPE_HARDWARE events, and as long as we correctly set the extended
> HW type when we decide to use that.
>
> Thanks,
> Mark.
>
> > Kan, I also noticed that the name of the event is:
> >
> > 1K cpu_atom/cycles:P/ ◆
> > 11K cpu_core/cycles:P/
> >
> > If I try to use that on the command line:
> >
> > root@number:~# perf top -e cpu_atom/cycles:P/
> > event syntax error: 'cpu_atom/cycles:P/'
> > \___ Bad event or PMU
> >
> > Unable to find PMU or event on a PMU of 'cpu_atom'
> >
> > Initial error:
> > event syntax error: 'cpu_atom/cycles:P/'
> > \___ unknown term 'cycles:P' for pmu 'cpu_atom'
> >
> > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware
> > Run 'perf list' for a list of valid events
> >
> > Usage: perf top [<options>]
> >
> > -e, --event <event> event selector. use 'perf list' to list available events
> > root@number:~#
> >
> > It should be:
> >
> > "cpu_atom/cycles/P"
Thread overview: 21+ messages
2023-12-08 21:08 [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus kan.liang
2023-12-11 21:13 ` Arnaldo Carvalho de Melo
2023-12-12 15:56 ` Liang, Kan
2023-12-12 16:58 ` Arnaldo Carvalho de Melo
2023-12-12 17:23 ` Namhyung Kim
2023-12-12 18:00 ` Ian Rogers
2023-12-12 18:31 ` Mark Rutland
2023-12-12 18:49 ` Namhyung Kim
2023-12-12 19:22 ` Liang, Kan
2023-12-13 12:05 ` Mark Rutland
2023-12-12 19:26 ` Ian Rogers
2023-12-15 15:36 ` Arnaldo Carvalho de Melo
2023-12-15 16:51 ` Mark Rutland
2023-12-15 17:49 ` Arnaldo Carvalho de Melo [this message]
2024-01-05 12:31 ` Mark Rutland
2023-12-15 17:59 ` Liang, Kan
2023-12-15 18:26 ` Arnaldo Carvalho de Melo
2023-12-15 18:53 ` Liang, Kan
2023-12-18 20:23 ` Arnaldo Carvalho de Melo
2023-12-18 21:07 ` Liang, Kan
2023-12-12 0:02 ` Ian Rogers