From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	Namhyung Kim <namhyung@kernel.org>,
	maz@kernel.org, marcan@marcan.st, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus
Date: Fri, 15 Dec 2023 14:49:14 -0300
Message-ID: <ZXyRmpyzNGd3gKwV@kernel.org>
In-Reply-To: <ZXyEJYXVNn2i06ro@FVFF77S0Q05N>

On Fri, Dec 15, 2023 at 04:51:49PM +0000, Mark Rutland wrote:
> On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland wrote:
> > > On ARM it'll be essentially the same as on x86: if you open an event with
> > > type==PERF_TYPE_HARDWARE (without the extended HW type pointing to a
> > > specific PMU) and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever
> > > happens to be found by perf_init_event() when iterating over the 'pmus' list.

> > > If you open an event with type==PERF_TYPE_HARDWARE and cpu!=-1, the event
> > > will be opened on the appropriate CPU PMU, by virtue of being rejected by the
> > > others when perf_init_event() iterates over the 'pmus' list.

> > The way it is working now on my Intel hybrid system, with Kan's
> > patch, is equivalent to using this on the RK3399pc board I have:

> > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P

> > Wouldn't it be better to make 'perf top' on ARM work the way it is being
> > done on x86 now, i.e. default to opening the two events, one per PMU, and
> > allow the user to switch back and forth using the TUI/stdio?
 
> TBH, for perf top I don't know *which* behaviour is preferable, but I agree
> that it'd be good for x86 and arm to work in the same way.

Right, reducing the difference in the user experience across arches.
 
> For design-cleanliness and consistency with other commands I can see that
> opening those separately is probably for the best, but for typical usage of
> perf top it's really nice to have those presented together without having to
> tab back-and-forth between the distinct PMUs, and so the existing behaviour of

Humm, so you would want two histogram viewers, one for each PMU, side by
side?

> using CPU-bound PERF_TYPE_HARDWARE events is arguably nicer for the user.

So, on ARM64, start the following 'perf trace' session, then run the
stock 'perf top':

root@roc-rk3399-pc:~# perf trace -e perf_event_open
<SNIP calls doing capability queries and setting up sideband stuff>
   535.764 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 19
   535.783 ( 0.067 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 28
   535.854 ( 0.063 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 29
   535.920 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 30
   535.939 ( 0.016 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 4, group_fd: -1, flags: FD_CLOEXEC) = 31
   535.959 ( 0.011 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 5, group_fd: -1, flags: FD_CLOEXEC) = 32

root@roc-rk3399-pc:~# grep "CPU part" /proc/cpuinfo  | uniq -c
      4 CPU part	: 0xd03
      2 CPU part	: 0xd08
root@roc-rk3399-pc:~#

It is already doing what you suggest, right? PERF_TYPE_HARDWARE, one
counter per CPU, which ends up on armv8_cortex_a53/cycles/P or
armv8_cortex_a72/cycles/P depending on which CPU it was opened on.
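For reference, the trace shows config: 0 (PERF_COUNT_HW_CPU_CYCLES) with nothing in the upper bits of config, i.e. no extended HW type, so each per-CPU event is placed by the kernel as Mark describes. When perf does want a PERF_TYPE_HARDWARE event aimed at one specific PMU, that PMU's type number goes in the upper 32 bits of attr.config. The sketch below only illustrates that encoding; the helper names are hypothetical, and the constants are local copies of the values in include/uapi/linux/perf_event.h (assumed here rather than included, so it builds without kernel headers).

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of values from include/uapi/linux/perf_event.h. */
#define SKETCH_PERF_COUNT_HW_CPU_CYCLES 0ull          /* PERF_COUNT_HW_CPU_CYCLES */
#define SKETCH_PERF_PMU_TYPE_SHIFT      32            /* PERF_PMU_TYPE_SHIFT */
#define SKETCH_PERF_HW_EVENT_MASK       0xffffffffull /* PERF_HW_EVENT_MASK */

/* Hypothetical helper: build attr.config for a PERF_TYPE_HARDWARE event
 * aimed at one PMU. pmu_type is assumed to be that PMU's dynamic type,
 * read from /sys/bus/event_source/devices/<pmu>/type. A pmu_type of 0
 * means "no extended type", leaving the choice of CPU PMU to the kernel. */
static uint64_t hw_config_for_pmu(uint32_t pmu_type, uint64_t hw_event)
{
	/* The upper 32 bits are reserved for the PMU type. */
	assert((hw_event & ~SKETCH_PERF_HW_EVENT_MASK) == 0);
	return ((uint64_t)pmu_type << SKETCH_PERF_PMU_TYPE_SHIFT) | hw_event;
}

/* Recover the PMU type from an encoded config (0 if none was set). */
static uint32_t hw_config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> SKETCH_PERF_PMU_TYPE_SHIFT);
}

/* Recover the generic hardware event id from an encoded config. */
static uint64_t hw_config_event(uint64_t config)
{
	return config & SKETCH_PERF_HW_EVENT_MASK;
}
```

The result would go into attr.config with attr.type left as PERF_TYPE_HARDWARE; that is what "setting the extended HW type" amounts to.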

One thing I'm thinking is that we should split this per PMU at the
hist_entry, so that we could show how many samples/events came from each
of them...
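The per-PMU split at the hist_entry suggested above could look roughly like the sketch below. It is hypothetical, not perf's actual struct hist_entry: samples are attributed to a PMU via their CPU (perf top already requests PERF_SAMPLE_CPU, see sample_type in the trace), and the CPU to PMU table is hard-coded from the /proc/cpuinfo output above, assuming the uniq -c ordering means CPUs 0-3 are part 0xd03 (Cortex-A53) and CPUs 4-5 are part 0xd08 (Cortex-A72).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PMU ids for the RK3399 board above. */
enum sketch_pmu { PMU_CORTEX_A53, PMU_CORTEX_A72, NR_SKETCH_PMUS };

#define NR_SKETCH_CPUS 6

/* Hard-coded CPU -> PMU map (assumed from /proc/cpuinfo); real code would
 * derive it from each PMU's cpu mask instead. */
static const enum sketch_pmu cpu_to_pmu[NR_SKETCH_CPUS] = {
	PMU_CORTEX_A53, PMU_CORTEX_A53, PMU_CORTEX_A53, PMU_CORTEX_A53,
	PMU_CORTEX_A72, PMU_CORTEX_A72,
};

/* Per-PMU counters a histogram entry could carry next to its totals. */
struct hist_entry_sketch {
	uint64_t nr_samples[NR_SKETCH_PMUS]; /* samples seen per PMU */
	uint64_t period[NR_SKETCH_PMUS];     /* summed sample periods per PMU */
};

/* Account one sample, using the CPU it was taken on to pick the PMU. */
static void hist_entry_sketch__add_sample(struct hist_entry_sketch *he,
					  int cpu, uint64_t period)
{
	assert(cpu >= 0 && cpu < NR_SKETCH_CPUS);

	enum sketch_pmu pmu = cpu_to_pmu[cpu];

	he->nr_samples[pmu]++;
	he->period[pmu] += period;
}
```

With counts like these per entry, a single histogram could show a per-PMU breakdown in extra columns instead of requiring one viewer per PMU.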

- Arnaldo
 
> I don't have a strong feeling either way; I'm personally happy so long as
> explicit pmu_name/event/ events don't get silently converted into
> PERF_TYPE_HARDWARE events, and as long as we correctly set the extended
> HW type when we decide to use that.
> 
> Thanks,
> Mark.
> 
> > Kan, I also noticed that the name of the event is:
> > 
> > 1K cpu_atom/cycles:P/
> > 11K cpu_core/cycles:P/
> > 
> > If I try to use that on the command line:
> > 
> > root@number:~# perf top -e cpu_atom/cycles:P/
> > event syntax error: 'cpu_atom/cycles:P/'
> >                               \___ Bad event or PMU
> > 
> > Unable to find PMU or event on a PMU of 'cpu_atom'
> > 
> > Initial error:
> > event syntax error: 'cpu_atom/cycles:P/'
> >                               \___ unknown term 'cycles:P' for pmu 'cpu_atom'
> > 
> > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware
> > Run 'perf list' for a list of valid events
> > 
> >  Usage: perf top [<options>]
> > 
> >     -e, --event <event>   event selector. use 'perf list' to list available events
> > root@number:~#
> > 
> > It should be:
> > 
> >   "cpu_atom/cycles/P"
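To make the naming mismatch concrete: the displayed "cpu_atom/cycles:P/" needs the ":P" modifier moved outside the slashes to become the "cpu_atom/cycles/P" form that -e accepts. The helper below is a hypothetical sketch of that mapping only; the real fix would more likely be to generate the name correctly where perf builds it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "pmu/event:mods/" as "pmu/event/mods",
 * the form the -e parser accepts. Names without that shape are copied
 * unchanged. Returns 0 on success, -1 if out is too small. */
static int pmu_event_name_fixup(const char *in, char *out, size_t outsz)
{
	assert(in != NULL && out != NULL && outsz > 0);

	size_t len = strlen(in);
	int n;

	if (len > 1 && in[len - 1] == '/') {
		const char *first = strchr(in, '/'); /* non-NULL: in[len-1] is '/' */
		const char *last = in + len - 1;

		if (first < last) {
			/* Look for ':' only between the two slashes. */
			const char *colon = memchr(first + 1, ':',
						   (size_t)(last - (first + 1)));
			if (colon != NULL) {
				/* "pmu/event" + "/" + "mods" (drop the trailing '/'). */
				n = snprintf(out, outsz, "%.*s/%.*s",
					     (int)(colon - in), in,
					     (int)(last - (colon + 1)), colon + 1);
				return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
			}
		}
	}

	n = snprintf(out, outsz, "%s", in);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```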

Thread overview: 21+ messages
2023-12-08 21:08 [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus kan.liang
2023-12-11 21:13 ` Arnaldo Carvalho de Melo
2023-12-12 15:56   ` Liang, Kan
2023-12-12 16:58     ` Arnaldo Carvalho de Melo
2023-12-12 17:23     ` Namhyung Kim
2023-12-12 18:00       ` Ian Rogers
2023-12-12 18:31         ` Mark Rutland
2023-12-12 18:49           ` Namhyung Kim
2023-12-12 19:22             ` Liang, Kan
2023-12-13 12:05               ` Mark Rutland
2023-12-12 19:26             ` Ian Rogers
2023-12-15 15:36           ` Arnaldo Carvalho de Melo
2023-12-15 16:51             ` Mark Rutland
2023-12-15 17:49               ` Arnaldo Carvalho de Melo [this message]
2024-01-05 12:31                 ` Mark Rutland
2023-12-15 17:59             ` Liang, Kan
2023-12-15 18:26               ` Arnaldo Carvalho de Melo
2023-12-15 18:53                 ` Liang, Kan
2023-12-18 20:23                   ` Arnaldo Carvalho de Melo
2023-12-18 21:07                     ` Liang, Kan
2023-12-12  0:02 ` Ian Rogers
