linux-perf-users.vger.kernel.org archive mirror
* [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
@ 2023-11-21 12:08 Hector Martin
  2023-11-21 13:40 ` Marc Zyngier
  2023-11-21 23:43 ` Bagas Sanjaya
  0 siblings, 2 replies; 53+ messages in thread
From: Hector Martin @ 2023-11-21 12:08 UTC (permalink / raw)
  To: linux-perf-users, LKML; +Cc: Marc Zyngier, Asahi Linux

Perf broke on all Apple ARM64 systems (tested almost everything), and
according to maz also on Juno (so, probably all big.LITTLE) since v6.5.

Test command:

sudo taskset -c 0 ./perf stat -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls

Since this is taskset to CPU #0 (LITTLE core, icestorm), only events for
icestorm are expected.
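
To make that expectation checkable, here is a minimal sketch that asks sysfs which PMU claims a given CPU. It assumes each CPU PMU directory under /sys/bus/event_source/devices exposes a `cpus` list and a `type` ID, as the arm_pmu-based drivers do on the machines I have looked at; the helper names `parse_cpu_list` and `pmus_for_cpu` are made up for illustration.

```python
from pathlib import Path

# Standard sysfs location for perf PMU devices.
SYSFS_PMUS = Path("/sys/bus/event_source/devices")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def pmus_for_cpu(cpu, root=SYSFS_PMUS):
    """Return {pmu_name: type_id} for every PMU whose 'cpus' file lists cpu.

    PMUs without a 'cpus' file (assumed here to be non-CPU PMUs) are skipped.
    """
    found = {}
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        cpus_file, type_file = dev / "cpus", dev / "type"
        if not (cpus_file.is_file() and type_file.is_file()):
            continue
        if cpu in parse_cpu_list(cpus_file.read_text()):
            found[dev.name] = int(type_file.read_text())
    return found
```

Calling `pmus_for_cpu(0)` on an affected machine should name only the PMU that covers CPU 0, which is the PMU whose counter is expected to tick under `taskset -c 0`.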

I bisected the breakage to two distinct points:

5ea8f2ccffb is the first bad commit. With its parent, the output is as
expected (same as v6.4):

         3,297,462      apple_icestorm_pmu/cycles/

     <not counted>      apple_firestorm_pmu/cycles/   (0.00%)
     <not counted>      cycles                        (0.00%)

With 5ea8f2ccffb everything breaks:

   <not supported>      apple_icestorm_pmu/cycles/

   <not supported>      apple_firestorm_pmu/cycles/

     <not counted>      cycles                        (0.00%)

Somewhere along the way to 82fe2e45cdb00 things get even worse (didn't
bother bisecting this range). With its parent:

   <not supported>      apple_icestorm_pmu/cycles/

   <not supported>      apple_firestorm_pmu/cycles/

   <not supported>      apple_icestorm_pmu/cycles/

   <not supported>      apple_firestorm_pmu/cycles/

Then 82fe2e45cdb00 leads to the current v6.5 behavior:

     <not counted>      apple_icestorm_pmu/cycles/    (0.00%)
     <not counted>      apple_firestorm_pmu/cycles/   (0.00%)
     <not counted>      cycles                        (0.00%)

If I taskset the task to CPU#2 (big core, firestorm), I get events:

         1,454,858      apple_icestorm_pmu/cycles/

         1,454,760      apple_firestorm_pmu/cycles/

         1,454,384      cycles


So the current behavior is that all output seems to come from the
firestorm PMU event counter, regardless of requested event.

This is all unchanged and still broken in v6.7-rc2.

- Hector


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 12:08 [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5 Hector Martin
@ 2023-11-21 13:40 ` Marc Zyngier
  2023-11-21 15:24   ` Marc Zyngier
  2023-11-21 23:43 ` Bagas Sanjaya
  1 sibling, 1 reply; 53+ messages in thread
From: Marc Zyngier @ 2023-11-21 13:40 UTC (permalink / raw)
  To: Hector Martin, Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: linux-perf-users, LKML, Asahi Linux, Mark Rutland

[Adding key people on Cc]

On Tue, 21 Nov 2023 12:08:48 +0000,
Hector Martin <marcan@marcan.st> wrote:
> 
> Perf broke on all Apple ARM64 systems (tested almost everything), and
> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.

I can confirm that at least on 6.7-rc2, perf is pretty busted on any
asymmetric ARM platform. It isn't clear what criteria are used to pick
the PMU, but nothing works anymore.

The saving grace in my case is that Debian still ships a 6.1 perftool
package, but that's obviously not going to last.

I'm happy to test potential fixes.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 13:40 ` Marc Zyngier
@ 2023-11-21 15:24   ` Marc Zyngier
  2023-11-21 15:40     ` Mark Rutland
                       ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Marc Zyngier @ 2023-11-21 15:24 UTC (permalink / raw)
  To: Mark Rutland, Hector Martin, Arnaldo Carvalho de Melo, Ian Rogers,
	James Clark
  Cc: linux-perf-users, LKML, Asahi Linux

On Tue, 21 Nov 2023 13:40:31 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> [Adding key people on Cc]
> 
> On Tue, 21 Nov 2023 12:08:48 +0000,
> Hector Martin <marcan@marcan.st> wrote:
> > 
> > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> 
> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> asymmetric ARM platform. It isn't clear what criteria are used to pick
> the PMU, but nothing works anymore.
> 
> The saving grace in my case is that Debian still ships a 6.1 perftool
> package, but that's obviously not going to last.
> 
> I'm happy to test potential fixes.

At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
-vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
CPU):

<quote>
maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
 apple_firestorm_pmu/cycles/ -e cycles ls
Using CPUID 0x00000000612f0280
Attempt to add: apple_icestorm_pmu/cycles=0/
..after resolving event: apple_icestorm_pmu/cycles=0/
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xb00000000
  disabled                         1
------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -95
Attempt to add: apple_firestorm_pmu/cycles=0/
..after resolving event: apple_firestorm_pmu/cycles=0/
Control descriptor is not initialized
Opening: apple_icestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
Opening: apple_firestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
Opening: cycles
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
apple_icestorm_pmu/cycles/: -1: 0 873709 0
apple_firestorm_pmu/cycles/: -1: 0 873709 0
cycles: -1: 0 873709 0
apple_icestorm_pmu/cycles/: 0 873709 0
apple_firestorm_pmu/cycles/: 0 873709 0
cycles: 0 873709 0

 Performance counter stats for 'ls':

     <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
     <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
     <not counted>      cycles                                                                  (0.00%)

       0.000002250 seconds time elapsed

       0.000000000 seconds user
       0.000000000 seconds sys
</quote>
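
For anyone reading the dump above, a small sketch of how the per-counter lines can be interpreted. It assumes (from my reading of perf stat's verbose output, not from anything stated in this thread) that the three trailing numbers are the raw count, the time enabled and the time running, and that a running time of 0 is what gets reported as `<not counted>`. The names `parse_counter_line` and `summarize` are hypothetical.

```python
import re

# "<event>: [<cpu>: ]<val> <ena> <run>", e.g.
# "apple_icestorm_pmu/cycles/: -1: 0 873709 0" or "cycles: 0 873709 0".
LINE_RE = re.compile(
    r"^(?P<name>.+?): (?:(?P<cpu>-?\d+): )?"
    r"(?P<val>\d+) (?P<ena>\d+) (?P<run>\d+)$"
)


def parse_counter_line(line):
    """Split one verbose counter line into (name, val, ena, run)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a counter line: {line!r}")
    return m["name"], int(m["val"]), int(m["ena"]), int(m["run"])


def summarize(line):
    """Return (name, scaled_count_or_None, percent_of_enabled_time_running).

    Assumption: a counter that never ran (run == 0) has no usable count,
    and a partially scheduled counter is scaled by ena / run.
    """
    name, val, ena, run = parse_counter_line(line)
    if run == 0:
        return name, None, 0.0
    scaled = val if run == ena else round(val * ena / run)
    return name, scaled, 100.0 * run / ena
```

Read this way, all three events in the icestorm run were enabled for the same 873709 ns but never scheduled on a counter, which matches the `(0.00%)` column in the summary.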

If I run the same thing on another CPU cluster (firestorm), I get
this:

<quote>
maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
 apple_firestorm_pmu/cycles/ -e cycles ls
Using CPUID 0x00000000612f0280
Attempt to add: apple_icestorm_pmu/cycles=0/
..after resolving event: apple_icestorm_pmu/cycles=0/
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xb00000000
  disabled                         1
------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -95
Attempt to add: apple_firestorm_pmu/cycles=0/
..after resolving event: apple_firestorm_pmu/cycles=0/
Control descriptor is not initialized
Opening: apple_icestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
Opening: apple_firestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
Opening: cycles
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
cycles: -1: 1034653 469125 469125
apple_icestorm_pmu/cycles/: 1035101 469125 469125
apple_firestorm_pmu/cycles/: 1035035 469125 469125
cycles: 1034653 469125 469125

 Performance counter stats for 'ls':

         1,035,101      apple_icestorm_pmu/cycles/                                            
         1,035,035      apple_firestorm_pmu/cycles/                                           
         1,034,653      cycles                                                                

       0.000001333 seconds time elapsed

       0.000000000 seconds user
       0.000000000 seconds sys
</quote>

which doesn't make any sense either. I really don't understand what
this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
nor what this 'cycles=0' stuff is.

/puzzled

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:24   ` Marc Zyngier
@ 2023-11-21 15:40     ` Mark Rutland
  2023-11-21 15:46       ` Ian Rogers
  2023-11-21 15:41     ` Ian Rogers
  2023-11-23 14:23     ` Mark Rutland
  2 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-21 15:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Hector Martin, Arnaldo Carvalho de Melo, Ian Rogers, James Clark,
	linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> > 
> > [Adding key people on Cc]
> > 
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@marcan.st> wrote:
> > > 
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > 
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > the PMU, but nothing works anymore.
> > 
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> > 
> > I'm happy to test potential fixes.
> 
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):

IIUC the tool is doing the wrong thing here and overriding explicit
${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
that ${pmu}'s type and event namespace.

Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
targeted to a specific PMU, it's semantically wrong to rewrite events like
this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
PERF_COUNT_HW_${EVENT}.
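
For reference, a sketch of that extended PERF_TYPE_HARDWARE encoding as I understand it from include/uapi/linux/perf_event.h: the PMU type ID goes in the upper 32 bits of `config` and the generic hardware event ID in the lower bits. The constant names mirror the uapi header; the two helper functions are hypothetical and only illustrate the bit layout.

```python
# Values as defined in include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def encode_hw_config(pmu_type, hw_event):
    """Build an extended PERF_TYPE_HARDWARE config targeting one PMU."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_event & PERF_HW_EVENT_MASK)


def decode_hw_config(config):
    """Split an extended config into (pmu_type, hw_event)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK
```

With this layout, the `config 0xb00000000` in the failed open decodes to PMU type 11 with event 0 (cycles), whereas the opens that succeed carry a plain `config 0`, i.e. no PMU type in the upper bits at all.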

Mark.



* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:24   ` Marc Zyngier
  2023-11-21 15:40     ` Mark Rutland
@ 2023-11-21 15:41     ` Ian Rogers
  2023-11-21 15:56       ` Mark Rutland
  2023-11-23 14:23     ` Mark Rutland
  2 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-21 15:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 7:24 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> >
> > [Adding key people on Cc]
> >
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@marcan.st> wrote:
> > >
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > the PMU, but nothing works anymore.
> >
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> >
> > I'm happy to test potential fixes.
>
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):
>
> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> arch                    builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> bench                   builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> Build                   builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> builtin-annotate.c      builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> builtin-annotate.o      builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> builtin-bench.c         builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> builtin-bench.o         builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c     jvmti            perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c    libapi   PERF-VERSION-FILE
> builtin-buildid-list.c  builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> builtin-buildid-list.o  builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> builtin-c2c.c           builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> builtin-c2c.o           builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> builtin-config.c        builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> builtin-config.o        builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> builtin-daemon.c        builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> builtin-daemon.o        builtin-list.c      builtin-version.c    perf             ui
> builtin-data.c          builtin-list.o      builtin-version.o    perf-archive     util
> builtin-data.o          builtin-lock.c      check-headers.sh     perf-archive.sh
> builtin-diff.c          builtin-mem.c       command-list.txt     perf.c
> apple_icestorm_pmu/cycles/: -1: 0 873709 0
> apple_firestorm_pmu/cycles/: -1: 0 873709 0
> cycles: -1: 0 873709 0
> apple_icestorm_pmu/cycles/: 0 873709 0
> apple_firestorm_pmu/cycles/: 0 873709 0
> cycles: 0 873709 0
>
>  Performance counter stats for 'ls':
>
>      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
>      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
>      <not counted>      cycles                                                                  (0.00%)
>
>        0.000002250 seconds time elapsed
>
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
>
> If I run the same thing on another CPU cluster (firestorm), I get
> this:
>
> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> arch                    builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> bench                   builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> Build                   builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> builtin-annotate.c      builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> builtin-annotate.o      builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> builtin-bench.c         builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> builtin-bench.o         builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c     jvmti            perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c    libapi   PERF-VERSION-FILE
> builtin-buildid-list.c  builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> builtin-buildid-list.o  builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> builtin-c2c.c           builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> builtin-c2c.o           builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> builtin-config.c        builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> builtin-config.o        builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> builtin-daemon.c        builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> builtin-daemon.o        builtin-list.c      builtin-version.c    perf             ui
> builtin-data.c          builtin-list.o      builtin-version.o    perf-archive     util
> builtin-data.o          builtin-lock.c      check-headers.sh     perf-archive.sh
> builtin-diff.c          builtin-mem.c       command-list.txt     perf.c
> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> cycles: -1: 1034653 469125 469125
> apple_icestorm_pmu/cycles/: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: 1035035 469125 469125
> cycles: 1034653 469125 469125
>
>  Performance counter stats for 'ls':
>
>          1,035,101      apple_icestorm_pmu/cycles/
>          1,035,035      apple_firestorm_pmu/cycles/
>          1,034,653      cycles
>
>        0.000001333 seconds time elapsed
>
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
>
> which doesn't make any sense either. I really don't understand what
> this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> nor what this 'cycle=0' stuff is.

Hi Marc,

I'm unclear if you are running a newer perf tool on an older kernel or
not. In any case I'll assume the kernel and perf tool versions match.
In Linux 6.6 this patch was added to the ARM PMU:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/perf/arm_pmu.c?id=5c816728651ae425954542fed64d21d40cb75a9f

My guess is that the apple_icestorm_pmu requires a similar patch. The
perf tool is supposed to not use extended types when they aren't
supported:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
So I share your confusion as to why something broke.

PERF_TYPE_HARDWARE is a legacy type where there are hardcoded type and
config values that correspond to an event. The PMU driver turns legacy
events into the real types. On big.LITTLE systems, if the legacy events
are monitoring a task, a different event is needed for each PMU (i.e. >1
event). In your example you are monitoring 'ls', a task, and so
different cycles events are necessary. The PMU is identified in the high
32 bits of the config (the extended type).
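
To make the extended-type encoding concrete, here is a minimal Python
sketch. The shift and mask values mirror PERF_PMU_TYPE_SHIFT and
PERF_HW_EVENT_MASK from the perf_event uapi header; the helper names are
invented for this illustration and are not part of the perf tool.

```python
# Constants mirroring include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def encode_extended_hw_config(pmu_type, hw_event):
    """Pack a PMU type ID into the high 32 bits of a legacy hardware config."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_event & PERF_HW_EVENT_MASK)


def decode_extended_hw_config(config):
    """Split a config into (pmu_type, hw_event); pmu_type 0 means no PMU was named."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


if __name__ == "__main__":
    # The config value seen in the failing sys_perf_event_open call.
    pmu_type, hw_event = decode_extended_hw_config(0xB00000000)
    print(pmu_type, hw_event)
```

Decoding the 0xb00000000 config from the -vvv output quoted above gives
PMU type 11 and event 0 (PERF_COUNT_HW_CPU_CYCLES), whereas the three
events that were successfully opened carry config 0, i.e. no PMU in the
high bits.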

Thanks for reporting the issue,
Ian

> /puzzled
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:40     ` Mark Rutland
@ 2023-11-21 15:46       ` Ian Rogers
  2023-11-21 16:02         ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-21 15:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > On Tue, 21 Nov 2023 13:40:31 +0000,
> > Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > [Adding key people on Cc]
> > >
> > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > Hector Martin <marcan@marcan.st> wrote:
> > > >
> > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > >
> > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > the PMU, but nothing works anymore.
> > >
> > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > package, but that's obviously not going to last.
> > >
> > > I'm happy to test potential fixes.
> >
> > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > CPU):
>
> IIUC the tool is doing the wrong thing here and overriding explicit
> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> that ${pmu}'s type and event namespace.
>
> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> targetted to a specific PMU, it's semantically wrong to rewrite events like
> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> PERF_COUNT_HW_${EVENT}.

If you name a PMU and an event then the event should only be opened on
that PMU, 100% agree. There's a bunch of output, but when the legacy
cycles event is opened it appears to be because it was explicitly
requested.

Thanks,
Ian

> Mark.
>
> > <quote>
> > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> >  apple_firestorm_pmu/cycles/ -e cycles ls
> > Using CPUID 0x00000000612f0280
> > Attempt to add: apple_icestorm_pmu/cycles=0/
> > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > Opening: unknown-hardware:HG
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   config                           0xb00000000
> >   disabled                         1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
> > Attempt to add: apple_firestorm_pmu/cycles=0/
> > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > Control descriptor is not initialized
> > Opening: apple_icestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> > Opening: apple_firestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > apple_icestorm_pmu/cycles/: -1: 0 873709 0
> > apple_firestorm_pmu/cycles/: -1: 0 873709 0
> > cycles: -1: 0 873709 0
> > apple_icestorm_pmu/cycles/: 0 873709 0
> > apple_firestorm_pmu/cycles/: 0 873709 0
> > cycles: 0 873709 0
> >
> >  Performance counter stats for 'ls':
> >
> >      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
> >      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
> >      <not counted>      cycles                                                                  (0.00%)
> >
> >        0.000002250 seconds time elapsed
> >
> >        0.000000000 seconds user
> >        0.000000000 seconds sys
> > </quote>
> >
> > If I run the same thing on another CPU cluster (firestorm), I get
> > this:
> >
> > <quote>
> > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> >  apple_firestorm_pmu/cycles/ -e cycles ls
> > Using CPUID 0x00000000612f0280
> > Attempt to add: apple_icestorm_pmu/cycles=0/
> > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > Opening: unknown-hardware:HG
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   config                           0xb00000000
> >   disabled                         1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
> > Attempt to add: apple_firestorm_pmu/cycles=0/
> > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > Control descriptor is not initialized
> > Opening: apple_icestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> > Opening: apple_firestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> > apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> > cycles: -1: 1034653 469125 469125
> > apple_icestorm_pmu/cycles/: 1035101 469125 469125
> > apple_firestorm_pmu/cycles/: 1035035 469125 469125
> > cycles: 1034653 469125 469125
> >
> >  Performance counter stats for 'ls':
> >
> >          1,035,101      apple_icestorm_pmu/cycles/
> >          1,035,035      apple_firestorm_pmu/cycles/
> >          1,034,653      cycles
> >
> >        0.000001333 seconds time elapsed
> >
> >        0.000000000 seconds user
> >        0.000000000 seconds sys
> > </quote>
> >
> > which doesn't make any sense either. I really don't understand what
> > this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> > nor what this 'cycle=0' stuff is.
> >
> > /puzzled
> >
> >       M.
> >
> > --
> > Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:41     ` Ian Rogers
@ 2023-11-21 15:56       ` Mark Rutland
  2023-11-21 16:03         ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-21 15:56 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 07:41:17AM -0800, Ian Rogers wrote:
> Hi Marc,

Hi Ian,

> I'm unclear if you are running a newer perf tool on an older kernel or
> not. In any case I'll assume the kernel and perf tool versions match.
> In Linux 6.6 this patch was added to the ARM PMU:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/perf/arm_pmu.c?id=5c816728651ae425954542fed64d21d40cb75a9f
> 
> My guess is that the apple_icestorm_pmu requires a similar patch. 

The apple_icestorm_pmu PMU driver uses the arm_pmu framework, so it's using
that code (since v6.6).

> The perf tool is supposed to not use extended types when they aren't
> supported:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532

How does that is_event_supported() check actually work? I suspect that's giving
the wrong answer.

Regardless, I think the tool is doing something semantically wrong, see below.

> So I share your confusion as to why something broke.
> 
> PERF_TYPE_HARDWARE is a legacy type where there are hardcoded type and
> config values that correspond to an event. The PMU driver turns legacy
> events into the real types. On BIG.little systems if the legacy events
> are monitoring a task a different event is needed for each PMU (ie >1
> event). In your example you are monitoring 'ls', a task, and so
> different cycles events are necessary. In the high 32-bits (the
> extended type) the PMU is identified.

I think the interesting thing here is that the tool is mapping events with an
explicit PMU into legacy PERF_TYPE_HARDWARE events, which is the opposite
direction from the one intended. Regardless of whether PERF_TYPE_HARDWARE
events can be targeted to a specific PMU, if the user has requested to use a
specific PMU we should be using that PMU and related event namespace.
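
As a rough illustration of what "that PMU and related event namespace" means
in practice: each registered PMU publishes its dynamic type ID and its named
events under /sys/bus/event_source/devices/. The Python sketch below only
reads those files; the function names are made up for this example, and it
makes no claim about how the perf tool itself performs the lookup.

```python
from pathlib import Path

# Standard sysfs location where the kernel exposes registered PMUs.
SYSFS_PMU_ROOT = Path("/sys/bus/event_source/devices")


def pmu_type(pmu_name, root=SYSFS_PMU_ROOT):
    """Return the dynamic perf_event_attr.type the kernel assigned to a PMU."""
    return int((Path(root) / pmu_name / "type").read_text().strip())


def pmu_event_terms(pmu_name, event_name, root=SYSFS_PMU_ROOT):
    """Return the raw term string the PMU publishes for one of its named events."""
    return (Path(root) / pmu_name / "events" / event_name).read_text().strip()
```

On the affected machine, pmu_type("apple_icestorm_pmu") and
pmu_type("apple_firestorm_pmu") would be expected to return the two real
type IDs rather than 0, and an event opened for a named PMU would carry that
type together with the config described by the PMU's own events/ entry.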

Marc's command line was:

	sudo taskset -c 0 ./perf stat -vvv \
		-e apple_icestorm_pmu/cycles/ \
		-e apple_firestorm_pmu/cycles/ \
		-e cycles \
	ls

... and so the apple_*_pmu events should target their respective PMUs, and the
plain 'cycles' event could legitimately be opened as a single
PERF_TYPE_HARDWARE event, or split into two directed PERF_TYPE_HARDWARE events
targeting the two PMUs.

However, the tool opens three (undirected?) PERF_TYPE_HARDWARE events:

Opening: apple_icestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
Opening: apple_firestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
Opening: cycles
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:46       ` Ian Rogers
@ 2023-11-21 16:02         ` Mark Rutland
  2023-11-21 16:09           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-21 16:02 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > [Adding key people on Cc]
> > > >
> > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > Hector Martin <marcan@marcan.st> wrote:
> > > > >
> > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > >
> > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > the PMU, but nothing works anymore.
> > > >
> > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > package, but that's obviously not going to last.
> > > >
> > > > I'm happy to test potential fixes.
> > >
> > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > CPU):
> >
> > IIUC the tool is doing the wrong thing here and overriding explicit
> > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > that ${pmu}'s type and event namespace.
> >
> > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > targetted to a specific PMU, it's semantically wrong to rewrite events like
> > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > PERF_COUNT_HW_${EVENT}.
> 
> If you name a PMU and an event then the event should only be opened on
> that PMU, 100% agree. There's a bunch of output, but when the legacy
> cycles event is opened it appears to be because it was explicitly
> requested.

I think you've missed that the named PMU events are being erroneously
transformed into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.

  Opening: apple_firestorm_pmu/cycles/
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    size                             136
    config                           0 (PERF_COUNT_HW_CPU_CYCLES)
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4

... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
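
For reference, a minimal sketch of the check being made here: an event
named as ${pmu}/${event}/ would be expected to carry that PMU's dynamic
type, which the kernel exports in /sys/bus/event_source/devices/<pmu>/type,
rather than type 0. The helper names and the sysfs_root parameter are
made up for illustration; this is not perf's code.

```python
from pathlib import Path

PERF_TYPE_HARDWARE = 0  # legacy type value from the perf_event_open(2) ABI

DEFAULT_SYSFS_ROOT = "/sys/bus/event_source/devices"


def pmu_type(pmu_name, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the dynamic perf type number the kernel exports for a PMU."""
    return int((Path(sysfs_root) / pmu_name / "type").read_text().strip())


def uses_named_pmu(attr_type, pmu_name, sysfs_root=DEFAULT_SYSFS_ROOT):
    """True if attr_type is the named PMU's own type, not the legacy type."""
    return attr_type == pmu_type(pmu_name, sysfs_root)
```

Under this sketch, the "type 0" shown in the dump above would fail
uses_named_pmu() for either Apple PMU, since their exported types are
non-zero.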

Marc said that he bisected the issue down to commit:

  5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")

... so it looks like something is going wrong when the events are being parsed,
e.g. losing the HW PMU information?

Thanks,
Mark.

>
> 
> Thanks,
> Ian
> 
> > Mark.
> >
> > > <quote>
> > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
> > > Using CPUID 0x00000000612f0280
> > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > Opening: unknown-hardware:HG
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   config                           0xb00000000
> > >   disabled                         1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > sys_perf_event_open failed, error -95
> > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > Control descriptor is not initialized
> > > Opening: apple_icestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> > > Opening: apple_firestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > Opening: cycles
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > apple_icestorm_pmu/cycles/: -1: 0 873709 0
> > > apple_firestorm_pmu/cycles/: -1: 0 873709 0
> > > cycles: -1: 0 873709 0
> > > apple_icestorm_pmu/cycles/: 0 873709 0
> > > apple_firestorm_pmu/cycles/: 0 873709 0
> > > cycles: 0 873709 0
> > >
> > >  Performance counter stats for 'ls':
> > >
> > >      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
> > >      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
> > >      <not counted>      cycles                                                                  (0.00%)
> > >
> > >        0.000002250 seconds time elapsed
> > >
> > >        0.000000000 seconds user
> > >        0.000000000 seconds sys
> > > </quote>
> > >
> > > If I run the same thing on another CPU cluster (firestorm), I get
> > > this:
> > >
> > > <quote>
> > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
> > > Using CPUID 0x00000000612f0280
> > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > Opening: unknown-hardware:HG
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   config                           0xb00000000
> > >   disabled                         1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > sys_perf_event_open failed, error -95
> > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > Control descriptor is not initialized
> > > Opening: apple_icestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> > > Opening: apple_firestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> > > Opening: cycles
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> > > apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> > > cycles: -1: 1034653 469125 469125
> > > apple_icestorm_pmu/cycles/: 1035101 469125 469125
> > > apple_firestorm_pmu/cycles/: 1035035 469125 469125
> > > cycles: 1034653 469125 469125
> > >
> > >  Performance counter stats for 'ls':
> > >
> > >          1,035,101      apple_icestorm_pmu/cycles/
> > >          1,035,035      apple_firestorm_pmu/cycles/
> > >          1,034,653      cycles
> > >
> > >        0.000001333 seconds time elapsed
> > >
> > >        0.000000000 seconds user
> > >        0.000000000 seconds sys
> > > </quote>
> > >
> > > which doesn't make any sense either. I really don't understand what
> > > this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> > > nor what this 'cycles=0' stuff is.
> > >
> > > /puzzled
> > >
> > >       M.
> > >
> > > --
> > > Without deviation from the norm, progress is not possible.

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:56       ` Mark Rutland
@ 2023-11-21 16:03         ` Ian Rogers
  2023-11-21 16:08           ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-21 16:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 7:56 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 07:41:17AM -0800, Ian Rogers wrote:
> > Hi Marc,
>
> Hi Ian,
>
> > I'm unclear if you are running a newer perf tool on an older kernel or
> > not. In any case I'll assume the kernel and perf tool versions match.
> > In Linux 6.6 this patch was added to the ARM PMU:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/perf/arm_pmu.c?id=5c816728651ae425954542fed64d21d40cb75a9f
> >
> > My guess is that the apple_icestorm_pmu requires a similar patch.
>
> The apple_icestorm_pmu PMU driver uses the arm_pmu framework, so it's using
> that code (since v6.6).
>
> > The perf tool is supposed to not use extended types when they aren't
> > supported:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
>
> How does that is_event_supported() check actually work? I suspect that's giving
> the wrong answer.

Maybe, the implementation is to check using perf_event_open:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n232

This is recycling logic from perf list where many legacy cache events
are elided due to a lack of support.

> Regardless, I think the tool is doing something semantically wrong, see below.
>
> > So I share your confusion as to why something broke.
> >
> > PERF_TYPE_HARDWARE is a legacy type where there are hardcoded type and
> > config values that correspond to an event. The PMU driver turns legacy
> > events into the real types. On big.LITTLE systems if the legacy events
> > are monitoring a task a different event is needed for each PMU (ie >1
> > event). In your example you are monitoring 'ls', a task, and so
> > different cycles events are necessary. In the high 32-bits (the
> > extended type) the PMU is identified.
>
> I think the interesting thing here is that the tool is mapping events with an
> explicit PMU into legacy PERF_TYPE_HARDWARE events, which is the opposite
> direction than intended. Regardless of whether PERF_TYPE_HARDWARE events can be
> targeted to a specific PMU, if the user has requested to use a specific PMU we
> should be using that PMU and related event namespace.
>
> Marc's command line was:
>
>         sudo taskset -c 0 ./perf stat -vvv \
>                 -e apple_icestorm_pmu/cycles/ \
>                 -e apple_firestorm_pmu/cycles/ \
>                 -e cycles \

-e cycles here is a direct request for the legacy cycles event. It
will match in the parser here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.l?h=perf-tools-next#n301

which goes to:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.y?h=perf-tools-next#n397

and as this is a hardware event there is wildcard expansion on each core PMU.
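
A rough sketch of what that expansion amounts to, assuming the extended
type encoding from include/uapi/linux/perf_event.h (PMU type in the high
32 bits of config, legacy event id in the low 32 bits). The function
names are invented for illustration and this is not the parser's code.

```python
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32         # as defined in the uapi perf_event.h
PERF_HW_EVENT_MASK = 0xFFFFFFFF  # low 32 bits hold the legacy event id


def directed_hw_config(pmu_type, hw_id):
    """Build a PERF_TYPE_HARDWARE config directed at one PMU."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_id & PERF_HW_EVENT_MASK)


def split_hw_config(config):
    """Split a config into (extended PMU type, legacy event id)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


def expand_legacy_event(hw_id, core_pmu_types):
    """Return one (attr.type, attr.config) pair per core PMU."""
    return [(PERF_TYPE_HARDWARE, directed_hw_config(t, hw_id))
            for t in core_pmu_types]
```

With the PMU types 10 and 11 reported for this machine,
expand_legacy_event(PERF_COUNT_HW_CPU_CYCLES, [10, 11]) gives configs
0xa00000000 and 0xb00000000. split_hw_config(0xb00000000) returns
(11, 0), which matches the probe in the -vvv output; a config of plain 0
carries no PMU in the high bits.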

Thanks,
Ian

>         ls
>
> ... and so the apple_*_pmu events should target their respective PMUs, and the
> plain 'cycles' event could legitimately be opened as a single
> PERF_TYPE_HARDWARE event, or split into two directed PERF_TYPE_HARDWARE events
> targeting the two PMUs.
>
> However, the tool opens three (undirected?) PERF_TYPE_HARDWARE events:
>
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
>
> Mark.

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:03         ` Ian Rogers
@ 2023-11-21 16:08           ` Mark Rutland
  0 siblings, 0 replies; 53+ messages in thread
From: Mark Rutland @ 2023-11-21 16:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 08:03:11AM -0800, Ian Rogers wrote:
> On Tue, Nov 21, 2023 at 7:56 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Nov 21, 2023 at 07:41:17AM -0800, Ian Rogers wrote:
> > > Hi Marc,
> >
> > Hi Ian,
> >
> > > I'm unclear if you are running a newer perf tool on an older kernel or
> > > not. In any case I'll assume the kernel and perf tool versions match.
> > > In Linux 6.6 this patch was added to the ARM PMU:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/perf/arm_pmu.c?id=5c816728651ae425954542fed64d21d40cb75a9f
> > >
> > > My guess is that the apple_icestorm_pmu requires a similar patch.
> >
> > The apple_icestorm_pmu PMU driver uses the arm_pmu framework, so it's using
> > that code (since v6.6).
> >
> > > The perf tool is supposed to not use extended types when they aren't
> > > supported:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
> >
> > How does that is_event_supported() check actually work? I suspect that's giving
> > the wrong answer.
> 
> Maybe, the implementation is to check using perf_event_open:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n232
> 
> This is recycling logic from perf list where many legacy cache events
> are elided due to a lack of support.
> 
> > Regardless, I think the tool is doing something semantically wrong, see below.
> >
> > > So I share your confusion as to why something broke.
> > >
> > > PERF_TYPE_HARDWARE is a legacy type where there are hardcoded type and
> > > config values that correspond to an event. The PMU driver turns legacy
> > > events into the real types. On big.LITTLE systems if the legacy events
> > > are monitoring a task a different event is needed for each PMU (ie >1
> > > event). In your example you are monitoring 'ls', a task, and so
> > > different cycles events are necessary. In the high 32-bits (the
> > > extended type) the PMU is identified.
> >
> > I think the interesting thing here is that the tool is mapping events with an
> > explicit PMU into legacy PERF_TYPE_HARDWARE events, which is the opposite
> > direction than intended. Regardless of whether PERF_TYPE_HARDWARE events can be
> > > targeted to a specific PMU, if the user has requested to use a specific PMU we
> > should be using that PMU and related event namespace.
> >
> > Marc's command line was:
> >
> >         sudo taskset -c 0 ./perf stat -vvv \
> >                 -e apple_icestorm_pmu/cycles/ \
> >                 -e apple_firestorm_pmu/cycles/ \
> >                 -e cycles \
> 
> -e cycles here is a direct request for the legacy cycles event. It
> will match in the parser here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.l?h=perf-tools-next#n301
> 
> which goes to:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.y?h=perf-tools-next#n397
> 
> and as this is a hardware event there is wildcard expansion on each core PMU.

Please read the rest of my message, which was talking about the other two
events.

Mark.

> 
> Thanks,
> Ian
> 
> >         ls
> >
> > ... and so the apple_*_pmu events should target their respective PMUs, and the
> > plain 'cycles' event could legitimately be opened as a single
> > PERF_TYPE_HARDWARE event, or split into two directed PERF_TYPE_HARDWARE events
> > > targeting the two PMUs.
> > >
> > > However, the tool opens three (undirected?) PERF_TYPE_HARDWARE events:
> >
> > Opening: apple_icestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> > Opening: apple_firestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> >
> > Mark.

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:02         ` Mark Rutland
@ 2023-11-21 16:09           ` Ian Rogers
  2023-11-21 16:15             ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-21 16:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > [Adding key people on Cc]
> > > > >
> > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > >
> > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > >
> > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > > > > the PMU, but nothing works anymore.
> > > > >
> > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > package, but that's obviously not going to last.
> > > > >
> > > > > I'm happy to test potential fixes.
> > > >
> > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > CPU):
> > >
> > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > that ${pmu}'s type and event namespace.
> > >
> > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > PERF_COUNT_HW_${EVENT}.
> >
> > If you name a PMU and an event then the event should only be opened on
> > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > cycles event is opened it appears to be because it was explicitly
> > requested.
>
> I think you've missed that the named PMU events are being erroneously transformed
> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
>
>   Opening: apple_firestorm_pmu/cycles/
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             0 (PERF_TYPE_HARDWARE)
>     size                             136
>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>     sample_type                      IDENTIFIER
>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>     disabled                         1
>     inherit                          1
>     enable_on_exec                   1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
>
> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
>
> Marc said that he bisected the issue down to commit:
>
>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
>
> ... so it looks like something is going wrong when the events are being parsed,
> e.g. losing the HW PMU information?

Ok, I think I'm getting confused by other things. This looks like the issue.

I think it may be working as intended, but not how you intended :-) If
a core PMU is listed and then a legacy event, the legacy event should
be opened on the core PMU as a legacy event with the extended type
set. This is to allow things like legacy cache events to be opened on
a specified PMU. Legacy event names match with a higher priority than
those in sysfs or json as they are hard coded. Presumably the
expectation was that, because the PMU advertises a cycles event
(probably in sysfs), the sysfs event is what would be matched.
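
To make the priority question concrete, here is a toy model of the two
lookup orders. It is only a sketch: the tables, the resolve_term() name
and the sysfs event code 0x02 are placeholders for illustration, not
taken from the perf sources or from the Apple PMU driver.

```python
PERF_TYPE_HARDWARE = 0
PERF_PMU_TYPE_SHIFT = 32

# Hard-coded legacy names (ids from the perf_event_open(2) ABI).
LEGACY_HW_EVENTS = {"cycles": 0, "instructions": 1}


def resolve_term(name, pmu_type, sysfs_events, legacy_first):
    """Resolve `pmu/name/` to an (attr.type, attr.config) pair.

    sysfs_events maps event names the PMU advertises to raw event codes.
    legacy_first selects which namespace wins when both know the name.
    """
    legacy = None
    if name in LEGACY_HW_EVENTS:
        config = (pmu_type << PERF_PMU_TYPE_SHIFT) | LEGACY_HW_EVENTS[name]
        legacy = (PERF_TYPE_HARDWARE, config)
    native = (pmu_type, sysfs_events[name]) if name in sysfs_events else None
    first, second = (legacy, native) if legacy_first else (native, legacy)
    return first if first is not None else second
```

With legacy_first=True, "cycles" on a PMU of type 10 resolves to a
PERF_TYPE_HARDWARE event with the extended type set; with
legacy_first=False it resolves to the PMU's own type and the code the
PMU advertises, which is what the reporters expected.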

Thanks,
Ian

> Thanks,
> Mark.
>
> >
> >
> > Thanks,
> > Ian
> >
> > > Mark.
> > >
> > > > <quote>
> > > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
> > > > Using CPUID 0x00000000612f0280
> > > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > > Opening: unknown-hardware:HG
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   config                           0xb00000000
> > > >   disabled                         1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > > sys_perf_event_open failed, error -95
> > > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > > Control descriptor is not initialized
> > > > Opening: apple_icestorm_pmu/cycles/
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> > > > Opening: apple_firestorm_pmu/cycles/
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > Opening: cycles
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> > > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > > apple_icestorm_pmu/cycles/: -1: 0 873709 0
> > > > apple_firestorm_pmu/cycles/: -1: 0 873709 0
> > > > cycles: -1: 0 873709 0
> > > > apple_icestorm_pmu/cycles/: 0 873709 0
> > > > apple_firestorm_pmu/cycles/: 0 873709 0
> > > > cycles: 0 873709 0
> > > >
> > > >  Performance counter stats for 'ls':
> > > >
> > > >      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
> > > >      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
> > > >      <not counted>      cycles                                                                  (0.00%)
> > > >
> > > >        0.000002250 seconds time elapsed
> > > >
> > > >        0.000000000 seconds user
> > > >        0.000000000 seconds sys
> > > > </quote>
> > > >
> > > > If I run the same thing on another CPU cluster (firestorm), I get
> > > > this:
> > > >
> > > > <quote>
> > > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> > > >  apple_firestorm_pmu/cycles/ -e cycles ls
> > > > Using CPUID 0x00000000612f0280
> > > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > > Opening: unknown-hardware:HG
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   config                           0xb00000000
> > > >   disabled                         1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > > sys_perf_event_open failed, error -95
> > > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > > Control descriptor is not initialized
> > > > Opening: apple_icestorm_pmu/cycles/
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> > > > Opening: apple_firestorm_pmu/cycles/
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> > > > Opening: cycles
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> > > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > > apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> > > > apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> > > > cycles: -1: 1034653 469125 469125
> > > > apple_icestorm_pmu/cycles/: 1035101 469125 469125
> > > > apple_firestorm_pmu/cycles/: 1035035 469125 469125
> > > > cycles: 1034653 469125 469125
> > > >
> > > >  Performance counter stats for 'ls':
> > > >
> > > >          1,035,101      apple_icestorm_pmu/cycles/
> > > >          1,035,035      apple_firestorm_pmu/cycles/
> > > >          1,034,653      cycles
> > > >
> > > >        0.000001333 seconds time elapsed
> > > >
> > > >        0.000000000 seconds user
> > > >        0.000000000 seconds sys
> > > > </quote>
> > > >
> > > > which doesn't make any sense either. I really don't understand what
> > > > this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> > > > nor what this 'cycle=0' stuff is.
> > > >
> > > > /puzzled
> > > >
> > > >       M.
> > > >
> > > > --
> > > > Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:09           ` Ian Rogers
@ 2023-11-21 16:15             ` Mark Rutland
  2023-11-21 16:38               ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-21 16:15 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > [Adding key people on Cc]
> > > > > >
> > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > >
> > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > >
> > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > the PMU, but nothing works anymore.
> > > > > >
> > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > package, but that's obviously not going to last.
> > > > > >
> > > > > > I'm happy to test potential fixes.
> > > > >
> > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > CPU):
> > > >
> > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > that ${pmu}'s type and event namespace.
> > > >
> > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > PERF_COUNT_HW_${EVENT}.
> > >
> > > If you name a PMU and an event then the event should only be opened on
> > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > cycles event is opened it appears to be because it was explicitly
> > > requested.
> >
> > I think you've missed that the named PMU events are being erroneously transformed
> > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> >
> >   Opening: apple_firestorm_pmu/cycles/
> >   ------------------------------------------------------------
> >   perf_event_attr:
> >     type                             0 (PERF_TYPE_HARDWARE)
> >     size                             136
> >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >     sample_type                      IDENTIFIER
> >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >     disabled                         1
> >     inherit                          1
> >     enable_on_exec                   1
> >     exclude_guest                    1
> >   ------------------------------------------------------------
> >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> >
> > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> >
> > Marc said that he bisected the issue down to commit:
> >
> >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> >
> > ... so it looks like something is going wrong when the events are being parsed,
> > e.g. losing the HW PMU information?
> 
> Ok, I think I'm getting confused by other things. This looks like the issue.
> 
> I think it may be working as intended, but not how you intended :-) If
> a core PMU is listed and then a legacy event, the legacy event should
> be opened on the core PMU as a legacy event with the extended type
> set. This is to allow things like legacy cache events to be opened on
> a specified PMU. Legacy event names match with a higher priority than
> those in sysfs or json as they are hard coded. 

That has never been the case previously, so this is user-visible breakage, and
it prevents users from being able to do the right thing, so I think that's a
broken design.
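
For reference, the "extended type" encoding under discussion can be sketched as
below. This is a minimal illustration, not the tool's actual code: the constants
are meant to mirror the values in include/uapi/linux/perf_event.h, while the two
helper functions are hypothetical names introduced only for this example. It
decodes the config value 0xb00000000 seen in the -vvv probe output.

```python
# Constants intended to mirror include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def encode_extended_hw(pmu_type, hw_id):
    """Hypothetical helper: pack a PMU type into the upper 32 bits of config
    for an attr whose type stays PERF_TYPE_HARDWARE."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_id & PERF_HW_EVENT_MASK)


def decode_extended_hw(config):
    """Hypothetical helper: split config into (pmu_type, legacy hardware id)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


# The probe in the -vvv dump used type 0 (PERF_TYPE_HARDWARE) with this config.
pmu_type, hw_id = decode_extended_hw(0xB00000000)
print(pmu_type, hw_id)
```

Under these assumptions the probe targets PMU type 11 with the legacy cycles id,
i.e. it is still a PERF_TYPE_HARDWARE event rather than an event in the PMU's
own namespace (attr.type set to the PMU's type and config taken from its sysfs
event definition).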

> Presumably the expectation was that by advertising a cycles event, presumably
> in sysfs, then this is what would be matched.

I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
*in that PMU's namespace* is used. Overriding that breaks long-established
practice and provides users with no recourse to get the behaviour they expect
(and previously had).

I do think that (regardless of whether this was the semantic you intended)
silently overriding events with legacy events is a bug, and one we should fix.
As I mentioned in another reply, just because the events have the same name
does not mean that they are semantically the same, so we're liable to give
people the wrong numbers anyhow.

Can we fix this?

Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:15             ` Mark Rutland
@ 2023-11-21 16:38               ` Ian Rogers
  2023-11-22  3:23                 ` Hector Martin
  2023-11-22 13:03                 ` Mark Rutland
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Rogers @ 2023-11-21 16:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >
> > > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > > >
> > > > > > > [Adding key people on Cc]
> > > > > > >
> > > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > > >
> > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > >
> > > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > > the PMU, but nothing works anymore.
> > > > > > >
> > > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > > package, but that's obviously not going to last.
> > > > > > >
> > > > > > > I'm happy to test potential fixes.
> > > > > >
> > > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > > CPU):
> > > > >
> > > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > that ${pmu}'s type and event namespace.
> > > > >
> > > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > PERF_COUNT_HW_${EVENT}.
> > > >
> > > > If you name a PMU and an event then the event should only be opened on
> > > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > cycles event is opened it appears to be because it was explicitly
> > > > requested.
> > >
> > > I think you've missed that the named PMU events are being erroneously transformed
> > > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > >
> > >   Opening: apple_firestorm_pmu/cycles/
> > >   ------------------------------------------------------------
> > >   perf_event_attr:
> > >     type                             0 (PERF_TYPE_HARDWARE)
> > >     size                             136
> > >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >     sample_type                      IDENTIFIER
> > >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >     disabled                         1
> > >     inherit                          1
> > >     enable_on_exec                   1
> > >     exclude_guest                    1
> > >   ------------------------------------------------------------
> > >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > >
> > > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > >
> > > Marc said that he bisected the issue down to commit:
> > >
> > >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > >
> > > ... so it looks like something is going wrong when the events are being parsed,
> > > e.g. losing the HW PMU information?
> >
> > Ok, I think I'm getting confused by other things. This looks like the issue.
> >
> > I think it may be working as intended, but not how you intended :-) If
> > a core PMU is listed and then a legacy event, the legacy event should
> > be opened on the core PMU as a legacy event with the extended type
> > set. This is to allow things like legacy cache events to be opened on
> > a specified PMU. Legacy event names match with a higher priority than
> > those in sysfs or json as they are hard coded.
>
> That has never been the case previously, so this is user-visible breakage, and
> it prevents users from being able to do the right thing, so I think that's a
> broken design.

So the problem was caused by ARM and Intel doing two different things.
Intel did at least contribute to the perf tool in support for their
BIG.little/hybrid, so that's why the semantics match their approach.

> > Presumably the expectation was that by advertising a cycles event, presumably
> > in sysfs, then this is what would be matched.
>
> I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
> *in that PMU's namespace* is used. Overriding that breaks long-established
> practice and provides users with no recourse to get the behaviour they expect
> (and previously had).

On ARM but not Intel.

> I do think that (regardless of whether this was the semantic you intended)
> silently overriding events with legacy events is a bug, and one we should fix.
> As I mentioned in another reply, just because the events have the same name
> does not mean that they are semantically the same, so we're liable to give
> people the wrong numbers anyhow.
>
> Can we fix this?

So I'd like to fix this, some things from various conversations:

1) we lack testing. Our testing relies on the sysfs of the machine
being run on, which is better than nothing. I think ideally we'd have
a collection of zipped up sysfs directories and then we could have a
test that asserts on ARM you get the behavior you want.

2) for RISC-V they want to make the legacy event matching something in
user land to simplify the PMU driver.

3) I'd like to get rid of the PMU json interface. My idea is to
convert json events/metrics into sysfs style files, zip these up and
then link them into the perf binary. On Intel the json is 70% of the
binary (7MB out of 10MB) and we may get this down to 3MB with this
approach. The json lookup would need to incorporate the cpuid matching
that currently exists. When we look up an event I'd like the approach
to be like unionfs with a specified but configurable order. Users
could provide directories of their own events/metrics for various
PMUs, and then this approach could be used to help with (1).
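
A rough sketch of what such an ordered, unionfs-like lookup could look like is
below. Everything in it is hypothetical (the layer names, the placeholder
encoding strings and the function names are made up for illustration); it only
shows how the search order decides whether pmu/cycles/ resolves to the legacy
event or to the PMU's own sysfs event, and that a legacy-only name such as
LLC-load-misses can still resolve by falling through to a later layer.

```python
# Hypothetical layers; the encodings are placeholder strings, not real
# perf_event_attr values.
LEGACY = {
    "cycles": "legacy:cycles",
    "LLC-load-misses": "legacy:LLC-load-misses",
}
SYSFS = {
    "apple_icestorm_pmu": {"cycles": "sysfs:apple_icestorm_pmu/cycles"},
}


def lookup(layer, pmu, name):
    """Return the encoding a single layer provides for pmu/name, or None."""
    if layer == "legacy":
        return LEGACY.get(name)  # legacy names are not tied to one PMU
    if layer == "sysfs":
        return SYSFS.get(pmu, {}).get(name)
    return None


def resolve_event(pmu, name, order):
    """Walk the layers in the given order and return the first match."""
    for layer in order:
        encoding = lookup(layer, pmu, name)
        if encoding is not None:
            return layer, encoding
    return None


# Legacy-first reproduces the behaviour reported in this thread.
print(resolve_event("apple_icestorm_pmu", "cycles", ("legacy", "sysfs")))
# Sysfs-first picks the event from the PMU's own namespace.
print(resolve_event("apple_icestorm_pmu", "cycles", ("sysfs", "legacy")))
# A name sysfs does not advertise still falls through to the legacy layer.
print(resolve_event("apple_icestorm_pmu", "LLC-load-misses", ("sysfs", "legacy")))
```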

Those proposals are not something to add as a -rc fix, so what I think
you're asking for here is a "if ARM" fix somewhere in the event
parsing. That's of course possible but it will cause problems if you
did say:

perf stat -e arm_pmu/LLC-load-misses/ ...

as I doubt the PMU driver is advertising this legacy event in sysfs
and the "if ARM" logic would presumably be trying to disable legacy
events in the term list for the ARM PMU.

Given all of this, is anything actually broken and needing a fix for 6.7?

Thanks,
Ian

> Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 12:08 [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5 Hector Martin
  2023-11-21 13:40 ` Marc Zyngier
@ 2023-11-21 23:43 ` Bagas Sanjaya
  2023-12-06 12:09   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 53+ messages in thread
From: Bagas Sanjaya @ 2023-11-21 23:43 UTC (permalink / raw)
  To: Hector Martin, Linux perf Profiling, Linux Kernel Mailing List
  Cc: Marc Zyngier, Asahi Linux Mailing List, Ian Rogers, Kan Liang,
	Arnaldo Carvalho de Melo


On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> Perf broke on all Apple ARM64 systems (tested almost everything), and
> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> 
> Test command:
> 
> sudo taskset -c 0 ./perf stat -e apple_icestorm_pmu/cycles/ -e
> apple_firestorm_pmu/cycles/ -e cycles ls
> 
> Since this is taskset to CPU #0 (LITTLE core, icestorm), only events for
> icestorm are expected.
> 
> I bisected the breakage to two distinct points:
> 
> 5ea8f2ccffb is the first bad commit. With its parent, the output is as
> expected (same as v6.4):
> 
>          3,297,462      apple_icestorm_pmu/cycles/
> 
>      <not counted>      apple_firestorm_pmu/cycles/
>                        (0.00%)
>      <not counted>      cycles
>                        (0.00%)
> 
> With 5ea8f2ccffb everything breaks:
> 
>    <not supported>      apple_icestorm_pmu/cycles/
> 
>    <not supported>      apple_firestorm_pmu/cycles/
> 
>      <not counted>      cycles
>                        (0.00%)
> 
> Somewhere along the way to 82fe2e45cdb00 things get even worse (didn't
> bother bisecting this range). With its parent:
> 
>    <not supported>      apple_icestorm_pmu/cycles/
> 
>    <not supported>      apple_firestorm_pmu/cycles/
> 
>    <not supported>      apple_icestorm_pmu/cycles/
> 
>    <not supported>      apple_firestorm_pmu/cycles/
> 
> Then 82fe2e45cdb00 leads to the current v6.5 behavior:
> 
>      <not counted>      apple_icestorm_pmu/cycles/
>                        (0.00%)
>      <not counted>      apple_firestorm_pmu/cycles/
>                        (0.00%)
>      <not counted>      cycles
>                        (0.00%)
> 
> If I taskset the task to CPU#2 (big core, firestorm), I get events:
> 
>          1,454,858      apple_icestorm_pmu/cycles/
> 
>          1,454,760      apple_firestorm_pmu/cycles/
> 
>          1,454,384      cycles
> 
> 
> So the current behavior is that all output seems to come from the
> firestorm PMU event counter, regardless of requested event.
> 
> This is all unchanged and still broken in v6.7-rc2.
> 

Thanks for the regression report (and it has been handled well already).
I'm adding it to regzbot for tracking:

#regzbot ^introduced: 5ea8f2ccffb239

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:38               ` Ian Rogers
@ 2023-11-22  3:23                 ` Hector Martin
  2023-11-22 13:06                   ` Arnaldo Carvalho de Melo
  2023-11-22 13:03                 ` Mark Rutland
  1 sibling, 1 reply; 53+ messages in thread
From: Hector Martin @ 2023-11-22  3:23 UTC (permalink / raw)
  To: Ian Rogers, Mark Rutland
  Cc: Marc Zyngier, Arnaldo Carvalho de Melo, James Clark,
	linux-perf-users, LKML, Asahi Linux



On 2023/11/22 1:38, Ian Rogers wrote:
> On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
>>
>> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
>>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
>>>>
>>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
>>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
>>>>>>
>>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
>>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
>>>>>>> Marc Zyngier <maz@kernel.org> wrote:
>>>>>>>>
>>>>>>>> [Adding key people on Cc]
>>>>>>>>
>>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
>>>>>>>> Hector Martin <marcan@marcan.st> wrote:
>>>>>>>>>
>>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>>>>
>>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
>>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
>>>>>>>> the PMU, but nothing works anymore.
>>>>>>>>
>>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
>>>>>>>> package, but that's obviously not going to last.
>>>>>>>>
>>>>>>>> I'm happy to test potential fixes.
>>>>>>>
>>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
>>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
>>>>>>> CPU):
>>>>>>
>>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
>>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
>>>>>> that ${pmu}'s type and event namespace.
>>>>>>
>>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
>>>>>> targeted to a specific PMU, it's semantically wrong to rewrite events like
>>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
>>>>>> PERF_COUNT_HW_${EVENT}.
>>>>>
>>>>> If you name a PMU and an event then the event should only be opened on
>>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
>>>>> cycles event is opened it appears to be because it was explicitly
>>>>> requested.
>>>>
>>>> I think you've missed that the named PMU events are being erroneously transformed
>>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
>>>>
>>>>   Opening: apple_firestorm_pmu/cycles/
>>>>   ------------------------------------------------------------
>>>>   perf_event_attr:
>>>>     type                             0 (PERF_TYPE_HARDWARE)
>>>>     size                             136
>>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>     sample_type                      IDENTIFIER
>>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>     disabled                         1
>>>>     inherit                          1
>>>>     enable_on_exec                   1
>>>>     exclude_guest                    1
>>>>   ------------------------------------------------------------
>>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
>>>>
>>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
>>>>
>>>> Marc said that he bisected the issue down to commit:
>>>>
>>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
>>>>
>>>> ... so it looks like something is going wrong when the events are being parsed,
>>>> e.g. losing the HW PMU information?
>>>
>>> Ok, I think I'm getting confused by other things. This looks like the issue.
>>>
>>> I think it may be working as intended, but not how you intended :-) If
>>> a core PMU is listed and then a legacy event, the legacy event should
>>> be opened on the core PMU as a legacy event with the extended type
>>> set. This is to allow things like legacy cache events to be opened on
>>> a specified PMU. Legacy event names match with a higher priority than
>>> those in sysfs or json as they are hard coded.
>>
>> That has never been the case previously, so this is user-visible breakage, and
>> it prevents users from being able to do the right thing, so I think that's a
>> broken design.
> 
> So the problem was caused by ARM and Intel doing two different things.
> Intel did at least contribute to the perf tool in support for their
> BIG.little/hybrid, so that's why the semantics match their approach.
> 
>>> Presumably the expectation was that by advertising a cycles event, presumably
>>> in sysfs, then this is what would be matched.
>>
>> I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
>> *in that PMU's namespace* is used. Overriding that breaks long-established
>> practice and provides users with no recourse to get the behaviour they expect
>> (and previously had).
> 
> On ARM but not Intel.
> 
>> I do think that (regardless of whether this was the semantic you intended)
>> silently overriding events with legacy events is a bug, and one we should fix.
>> As I mentioned in another reply, just because the events have the same name
>> does not mean that they are semantically the same, so we're liable to give
>> people the wrong numbers anyhow.
>>
>> Can we fix this?
> 
> So I'd like to fix this, some things from various conversations:
> 
> 1) we lack testing. Our testing relies on the sysfs of the machine
> being run on, which is better than nothing. I think ideally we'd have
> a collection of zipped up sysfs directories and then we could have a
> test that asserts on ARM you get the behavior you want.
> 
> 2) for RISC-V they want to make the legacy event matching something in
> user land to simplify the PMU driver.
> 
> 3) I'd like to get rid of the PMU json interface. My idea is to
> convert json events/metrics into sysfs style files, zip these up and
> then link them into the perf binary. On Intel the json is 70% of the
> binary (7MB out of 10MB) and we may get this down to 3MB with this
> approach. The json lookup would need to incorporate the cpuid matching
> that currently exists. When we look up an event I'd like the approach
> to be like unionfs with a specified but configurable order. Users
> could provide directories of their own events/metrics for various
> PMUs, and then this approach could be used to help with (1).
> 
> Those proposals are not something to add as a -rc fix, so what I think
> you're asking for here is a "if ARM" fix somewhere in the event
> parsing. That's of course possible but it will cause problems if you
> did say:
> 
> perf stat -e arm_pmu/LLC-load-misses/ ...
> 
> as I doubt the PMU driver is advertising this legacy event in sysfs
> and the "if ARM" logic would presumably be trying to disable legacy
> events in the term list for the ARM PMU.
> 
> Given all of this, is anything actually broken and needing a fix for 6.7?

You literally cannot use perf correctly on ARM big.LITTLE systems since
6.5, while it worked fine on 6.4. So, yes, it's broken and it needs
fixing. This is a major regression.

$ taskset -c 0 perf stat -e apple_icestorm_pmu/cycles/ echo


 Performance counter stats for 'echo':

     <not counted>      apple_icestorm_pmu/cycles/u
                       (0.00%)

       0.001385544 seconds time elapsed

       0.001375000 seconds user
       0.000000000 seconds sys


$ taskset -c 2 perf stat -e apple_firestorm_pmu/cycles/ echo


 Performance counter stats for 'echo':

           169,965      apple_firestorm_pmu/cycles/u


       0.000466667 seconds time elapsed

       0.000475000 seconds user
       0.000000000 seconds sys


Both of those should return counts. One does not, and it doesn't even
seem to be predictable which one you get. *On my particular system, it
is currently impossible to get any performance counter data from the E
cores, as far as I can tell, no matter how you invoke perf*.

Feel free to argue semantics as to what went wrong or how it should be
fixed, but there is no question that this is a regression that requires
a fix. Perf is currently simply broken here, where it wasn't in 6.4.

- Hector


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 16:38               ` Ian Rogers
  2023-11-22  3:23                 ` Hector Martin
@ 2023-11-22 13:03                 ` Mark Rutland
  2023-11-22 15:29                   ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-22 13:03 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 08:38:45AM -0800, Ian Rogers wrote:
> On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >
> > > > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > >
> > > > > > > > [Adding key people on Cc]
> > > > > > > >
> > > > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > > > >
> > > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > > >
> > > > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > > > the PMU, but nothing works anymore.
> > > > > > > >
> > > > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > > > package, but that's obviously not going to last.
> > > > > > > >
> > > > > > > > I'm happy to test potential fixes.
> > > > > > >
> > > > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > > > CPU):
> > > > > >
> > > > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > > that ${pmu}'s type and event namespace.
> > > > > >
> > > > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > > PERF_COUNT_HW_${EVENT}.
> > > > >
> > > > > If you name a PMU and an event then the event should only be opened on
> > > > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > > cycles event is opened it appears to be because it was explicitly
> > > > > requested.
> > > >
> > > > I think you've missed that the named PMU events are being erroneously transformed
> > > > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > >
> > > >   Opening: apple_firestorm_pmu/cycles/
> > > >   ------------------------------------------------------------
> > > >   perf_event_attr:
> > > >     type                             0 (PERF_TYPE_HARDWARE)
> > > >     size                             136
> > > >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >     sample_type                      IDENTIFIER
> > > >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >     disabled                         1
> > > >     inherit                          1
> > > >     enable_on_exec                   1
> > > >     exclude_guest                    1
> > > >   ------------------------------------------------------------
> > > >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > >
> > > > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > >
> > > > Marc said that he bisected the issue down to commit:
> > > >
> > > >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > >
> > > > ... so it looks like something is going wrong when the events are being parsed,
> > > > e.g. losing the HW PMU information?
> > >
> > > Ok, I think I'm getting confused by other things. This looks like the issue.
> > >
> > > I think it may be working as intended, but not how you intended :-) If
> > > a core PMU is listed and then a legacy event, the legacy event should
> > > be opened on the core PMU as a legacy event with the extended type
> > > set. This is to allow things like legacy cache events to be opened on
> > > a specified PMU. Legacy event names match with a higher priority than
> > > those in sysfs or json as they are hard coded.
> >
> > That has never been the case previously, so this is user-visible breakage, and
> > it prevents users from being able to do the right thing, so I think that's a
> > broken design.
> 
> So the problem was caused by ARM and Intel doing two different things.
> Intel did at least contribute to the perf tool in support for their
> BIG.little/hybrid, so that's why the semantics match their approach.

I appreciate that, and I agree that from the Arm side we haven't been as
engaged with userspace on this front (please understand I'm the messenger here,
this is something I've repeatedly asked for within Arm).

Regardless, I don't think that changes the substance of the bug, which is that
we're converting named-pmu events into entirely different PERF_TYPE_HARDWARE
events.

I agree that expanding plain legacy event names to a set of PMU-targeted legacy
events makes sense (and even for Arm, that's the right thing to do, IMO). If
I ask for 'cycles' and that gets expanded to multiple legacy cycles events that
target specific CPU PMUs, that's good.

The thing that doesn't make sense here is converting named-pmu events into
legacy events. If I ask for 'apple_firestorm_pmu/cycles/', that should be the
'cycles' event in the apple_firestorm_pmu's event namespace, and *shouldn't* be
converted to a (potentially semantically different) PERF_TYPE_HARDWARE event,
even if that's targeted towards the apple_firestorm_pmu. I think that should
be true for *any* PMU, whether that's an arm/x86/whatever CPU PMU or a system
PMU.
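
To make the distinction concrete, here is a minimal Python sketch of the two
encodings. It models only the `type` and `config` fields of perf_event_attr.
The constants are from include/uapi/linux/perf_event.h; the helper names and
the example values (dynamic type 7, event code 0x11) are illustrative
assumptions, not anything taken from the perf tool itself.

```python
PERF_TYPE_HARDWARE = 0          # legacy "generic hardware" event type
PERF_COUNT_HW_CPU_CYCLES = 0    # legacy cycles id
PERF_PMU_TYPE_SHIFT = 32        # extended PMU type lives in config[63:32]

def named_pmu_event(pmu_type, sysfs_config):
    """${pmu}/${event}/ resolved in the PMU's own event namespace."""
    return {"type": pmu_type, "config": sysfs_config}

def legacy_event_on_pmu(pmu_type, hw_id):
    """Legacy event steered to one PMU via the extended type bits."""
    return {"type": PERF_TYPE_HARDWARE,
            "config": (pmu_type << PERF_PMU_TYPE_SHIFT) | hw_id}

# Hypothetical PMU with dynamic type 7 whose sysfs event encodes as 0x11.
print(named_pmu_event(7, 0x11))
print(legacy_event_on_pmu(7, PERF_COUNT_HW_CPU_CYCLES))
```

The two results differ in both `type` and `config`, so the kernel sees two
different requests even though the user typed the same event name.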

> > > Presumably the expectation was that by advertising a cycles event, presumably
> > > in sysfs, then this is what would be matched.

Yes. That's how this has always worked prior to the changes Marc referenced.
Note that this can *also* be expanded to events from json databases, but was
*never* previously silently converted to a PERF_TYPE_HARDWARE event.

Please note that the events in sysfs are *namespaced* to the PMU (specifically,
when using that PMU's dynamic type); they are not necessarily the same as
legacy events (though they may have similar or matching
names in some cases), they may be semantically distinct from the legacy events
even if the names match, and it is incorrect to conflate the two.

> > I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
> > *in that PMU's namespace* is used. Overriding that breaks long-established
> > practice and provides users with no recourse to get the behaviour they expect
> > (and previously had).
> 
> On ARM but not Intel.

As above, I don't think the CPU architecture matters here for the case that I'm
saying is broken. I think that regardless of CPU architecture (or for any
non-CPU PMU) it is semantically incorrect to convert a named-pmu event to a
legacy event.

> > I do think that (regardless of whether this was the semantic you intended)
> > silently overriding events with legacy events is a bug, and one we should fix.
> > As I mentioned in another reply, just because the events have the same name
> > does not mean that they are semantically the same, so we're liable to give
> > people the wrong numbers anyhow.
> >
> > Can we fix this?
> 
> So I'd like to fix this, some things from various conversations:
> 
> 1) we lack testing. Our testing relies on the sysfs of the machine
> being run on, which is better than nothing. I think ideally we'd have
> a collection of zipped up sysfs directories and then we could have a
> test that asserts on ARM you get the behavior you want.

I agree we lack testing, and I'd be happy to help here going forwards, though I
don't think this is a prerequisite for fixing this issue.

> 2) for RISC-V they want to make the legacy event matching something in
> user land to simplify the PMU driver.

Ok; I see how this might be related, but it doesn't sound like a prerequisite
for fixing this issue -- there are plenty of people in this thread who can
test.

> 3) I'd like to get rid of the PMU json interface. My idea is to
> convert json events/metrics into sysfs style files, zip these up and
> then link them into the perf binary. On Intel the json is 70% of the
> binary (7MB out of 10MB) and we may get this down to 3MB with this
> approach. The json lookup would need to incorporate the cpuid matching
> that currently exists. When we look up an event I'd like the approach
> to be like unionfs with a specified but configurable order. Users
> could provide directories of their own events/metrics for various
> PMUs, and then this approach could be used to help with (1).

I can see how that might interact with whatever changes we make to fix this
issue, but this seems like a future aspiration, and not a prerequisite for
fixing the existing functional regression.

> Those proposals are not something to add as a -rc fix, so what I think
> you're asking for here is a "if ARM" fix somewhere in the event
> parsing. That's of course possible but it will cause problems if you
> did say:
> 
> perf stat -e arm_pmu/LLC-load-misses/ ...

As above, I do not think this is an arm-specific issue, we're just the canary
in the coalmine.

Please note that:

	perf stat -e arm_pmu/LLC-load-misses/ ...

... would never have worked previously. No arm_pmu instances have a
"LLC-load-misses" event in their event namespaces, and we don't have any
userspace file mapping that event.

That said, if I really wanted that legacy event, I'd have asked for it bare,
e.g.

	perf stat -e LLC-load-misses

... and we're in agreement that it's sensible to expand this to multiple
PERF_TYPE_HARDWARE events targeting the individual CPU PMUs.

So I see no need to do anything to have magic for 'arm_pmu/LLC-load-misses/'.
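
As a rough illustration of that expansion, the sketch below turns a bare
LLC-load-misses request into one event per core PMU. One caveat: in the uapi
header, cache events are encoded with PERF_TYPE_HW_CACHE rather than
PERF_TYPE_HARDWARE, with the cache id, op and result packed into the low
config bits. The function name and the PMU type numbers 7 and 8 are
assumptions for the example.

```python
PERF_TYPE_HW_CACHE = 3
PERF_COUNT_HW_CACHE_LL = 2            # last-level cache
PERF_COUNT_HW_CACHE_OP_READ = 0       # "load"
PERF_COUNT_HW_CACHE_RESULT_MISS = 1   # "misses"
PERF_PMU_TYPE_SHIFT = 32              # extended PMU type in config[63:32]

def expand_llc_load_misses(core_pmu_types):
    """Expand bare 'LLC-load-misses' into one event per core PMU type."""
    cache_config = (PERF_COUNT_HW_CACHE_LL
                    | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                    | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
    return [{"type": PERF_TYPE_HW_CACHE,
             "config": (pmu_type << PERF_PMU_TYPE_SHIFT) | cache_config}
            for pmu_type in core_pmu_types]

# Two hypothetical core PMUs with dynamic types 7 and 8.
for attr in expand_llc_load_misses([7, 8]):
    print(attr["type"], hex(attr["config"]))
```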

> as I doubt the PMU driver is advertising this legacy event in sysfs
> and the "if ARM" logic would presumably be trying to disable legacy
> events in the term list for the ARM PMU.
> 
> Given all of this, is anything actually broken and needing a fix for 6.7?

There is absolutely a bug that needs to be fixed here (and needs to be
backported to stable so that it gets picked up by distributions).

Thanks,
Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22  3:23                 ` Hector Martin
@ 2023-11-22 13:06                   ` Arnaldo Carvalho de Melo
  2023-11-22 15:33                     ` Ian Rogers
  2023-11-22 15:49                     ` Mark Rutland
  0 siblings, 2 replies; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-22 13:06 UTC (permalink / raw)
  To: Hector Martin
  Cc: Ian Rogers, Mark Rutland, Marc Zyngier, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> On 2023/11/22 1:38, Ian Rogers wrote:
> > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> >>>>>>>>
> >>>>>>>> [Adding key people on Cc]
> >>>>>>>>
> >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> >>>>>>>>>
> >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >>>>>>>>
> >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> >>>>>>>> the PMU, but nothing works anymore.
> >>>>>>>>
> >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> >>>>>>>> package, but that's obviously not going to last.
> >>>>>>>>
> >>>>>>>> I'm happy to test potential fixes.
> >>>>>>>
> >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> >>>>>>> CPU):
> >>>>>>
> >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> >>>>>> that ${pmu}'s type and event namespace.
> >>>>>>
> >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> >>>>>> targeted to a specific PMU, it's semantically wrong to rewrite events like
> >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> >>>>>> PERF_COUNT_HW_${EVENT}.
> >>>>>
> >>>>> If you name a PMU and an event then the event should only be opened on
> >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> >>>>> cycles event is opened it appears to be because it was explicitly
> >>>>> requested.
> >>>>
> >>>> I think you've missed that the named PMU events are being erroneously transformed
> >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> >>>>
> >>>>   Opening: apple_firestorm_pmu/cycles/
> >>>>   ------------------------------------------------------------
> >>>>   perf_event_attr:
> >>>>     type                             0 (PERF_TYPE_HARDWARE)
> >>>>     size                             136
> >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >>>>     sample_type                      IDENTIFIER
> >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >>>>     disabled                         1
> >>>>     inherit                          1
> >>>>     enable_on_exec                   1
> >>>>     exclude_guest                    1
> >>>>   ------------------------------------------------------------
> >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> >>>>
> >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> >>>>
> >>>> Marc said that he bisected the issue down to commit:
> >>>>
> >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> >>>>
> >>>> ... so it looks like something is going wrong when the events are being parsed,
> >>>> e.g. losing the HW PMU information?
> >>>
> >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> >>>
> >>> I think it may be working as intended, but not how you intended :-) If
> >>> a core PMU is listed and then a legacy event, the legacy event should

The point is that "cycles", when prefixed with "pmu/", shouldn't be
treated as the legacy "cycles" (HW/0); in that setting it means "cycles" for
that PMU. (But we only have "cpu_cycles" for at least the a53 and a72 PMUs I
have access to on a Libre Computer rockchip 3399-pc hybrid board; if we use
that, then we get what we want/had before, see below.)

And there is an attempt at using the specified PMU, see the first
perf_event_open:

root@roc-rk3399-pc:~# strace -e perf_event_open perf stat -vv -e cycles,armv8_cortex_a53/cycles/,armv8_cortex_a72/cycles/ echo
Using CPUID 0x00000000410fd082
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x700000000
  disabled                         1
------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
                                                   
//// HERE: it tries config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES taking into
//account the PMU number 0x7

root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/type
7
root@roc-rk3399-pc:~#
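
For anyone reading the raw config value, a small Python sketch of how
0x700000000 splits into the PMU type and the legacy event id. The shift and
mask match PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK in the uapi header; the
function name is made up for this example.

```python
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF

def split_extended_config(config):
    """Return (pmu_type, legacy_event_id) from an extended-type config."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK

pmu_type, hw_id = split_extended_config(0x700000000)
print(pmu_type, hw_id)   # prints "7 0"
```

So the upper half is the armv8_cortex_a53 type (7) and the lower half is
PERF_COUNT_HW_CPU_CYCLES (0).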

But then we don't have "cycles" in that PMU:

root@roc-rk3399-pc:~# ls -la /sys/devices/armv8_cortex_a53/events/cycles
ls: cannot access '/sys/devices/armv8_cortex_a53/events/cycles': No such file or directory
root@roc-rk3399-pc:~#

Maybe:

root@roc-rk3399-pc:~# taskset -c 5,6 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
Using CPUID 0x00000000410fd034
Control descriptor is not initialized

armv8_cortex_a53/cpu_cycles/: 0 2079000 0
armv8_cortex_a72/cpu_cycles/: 2488961 2079000 2079000

 Performance counter stats for 'echo':

     <not counted>      armv8_cortex_a53/cpu_cycles/                                            (0.00%)
           2488961      armv8_cortex_a72/cpu_cycles/

       0.003449266 seconds time elapsed

       0.003502000 seconds user
       0.000000000 seconds sys


root@roc-rk3399-pc:~# taskset -c 0,1,2,3,4 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
Using CPUID 0x00000000410fd034
Control descriptor is not initialized

armv8_cortex_a53/cpu_cycles/: 2986601 6999416 6999416
armv8_cortex_a72/cpu_cycles/: 0 6999416 0

 Performance counter stats for 'echo':

           2986601      armv8_cortex_a53/cpu_cycles/
     <not counted>      armv8_cortex_a72/cpu_cycles/                                            (0.00%)

       0.011434508 seconds time elapsed

       0.003911000 seconds user
       0.007454000 seconds sys


root@roc-rk3399-pc:~#

root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/events/cpu_cycles
event=0x0011
root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a72/events/cpu_cycles
event=0x0011
root@roc-rk3399-pc:~#

And the syscalls seem sane:

root@roc-rk3399-pc:~# strace -e perf_event_open taskset -c 0,1,2,3,4 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
Using CPUID 0x00000000410fd034
Control descriptor is not initialized
perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=0x88 /* PERF_ATTR_SIZE_??? */, config=0x11, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 14573, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=0x8 /* PERF_TYPE_??? */, size=0x88 /* PERF_ATTR_SIZE_??? */, config=0x11, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 14573, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14573, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
armv8_cortex_a53/cpu_cycles/: 3227098 4480875 4480875
armv8_cortex_a72/cpu_cycles/: 0 4480875 0

 Performance counter stats for 'echo':

           3227098      armv8_cortex_a53/cpu_cycles/
     <not counted>      armv8_cortex_a72/cpu_cycles/                                            (0.00%)

       0.008381759 seconds time elapsed

       0.004064000 seconds user
       0.004121000 seconds sys


--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=14572, si_uid=0} ---
+++ exited with 0 +++
root@roc-rk3399-pc:~#

As:

root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/type
7
root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a72/type
8
root@roc-rk3399-pc:~#

See the type=0x7 and type=0x8.
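
A rough Python sketch of how those two sysfs files relate to the syscall
arguments above. It assumes, as a simplification, that the "event" term maps
straight onto `config`; the real bit placement is described by the PMU's
format/ directory. Function names are hypothetical.

```python
def parse_event_terms(text):
    """Parse a sysfs event description such as 'event=0x0011' into a dict."""
    terms = {}
    for term in text.strip().split(","):
        name, _, value = term.partition("=")
        # A bare term with no '=value' is treated as a flag set to 1.
        terms[name] = int(value, 0) if value else 1
    return terms

def attr_from_sysfs(type_text, event_text):
    """Build the (type, config) pair from the contents of the two files."""
    # Simplifying assumption: 'event' occupies the low bits of config.
    return {"type": int(type_text), "config": parse_event_terms(event_text)["event"]}

# Contents as shown by the cat commands in this message.
print(attr_from_sysfs("7\n", "event=0x0011\n"))   # armv8_cortex_a53
print(attr_from_sysfs("8\n", "event=0x0011\n"))   # armv8_cortex_a72
```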

So what we need here seems to be to translate the generic term "cycles"
to "cpu_cycles" when a PMU is explicitly passed in the event name and
it doesn't have "cycles" and then just retry.
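
Something along these lines, as a Python sketch of the idea rather than of
the actual parser. The alias table and function name are hypothetical, and
the event sets stand in for the contents of each PMU's sysfs events/
directory.

```python
# Hypothetical alias table: generic term -> PMU-specific sysfs event name.
GENERIC_ALIASES = {"cycles": "cpu_cycles"}

def resolve_event_name(requested, pmu_events):
    """Pick the event name to use for ${pmu}/${requested}/."""
    if requested in pmu_events:
        return requested              # the PMU advertises it directly
    alias = GENERIC_ALIASES.get(requested)
    if alias in pmu_events:
        return alias                  # retry with the translated name
    return None                       # nothing matched in this PMU's namespace

a53_events = {"cpu_cycles", "inst_retired"}   # illustrative subset
print(resolve_event_name("cycles", a53_events))
```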

- Arnaldo


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 13:03                 ` Mark Rutland
@ 2023-11-22 15:29                   ` Ian Rogers
  2023-11-22 16:08                     ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 15:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Wed, Nov 22, 2023 at 5:04 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 08:38:45AM -0800, Ian Rogers wrote:
> > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > > On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >
> > > > > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > >
> > > > > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > [Adding key people on Cc]
> > > > > > > > >
> > > > > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > > > > >
> > > > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > > > >
> > > > > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > > > > the PMU, but nothing works anymore.
> > > > > > > > >
> > > > > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > > > > package, but that's obviously not going to last.
> > > > > > > > >
> > > > > > > > > I'm happy to test potential fixes.
> > > > > > > >
> > > > > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > > > > CPU):
> > > > > > >
> > > > > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > > > that ${pmu}'s type and event namespace.
> > > > > > >
> > > > > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > > > PERF_COUNT_HW_${EVENT}.
> > > > > >
> > > > > > If you name a PMU and an event then the event should only be opened on
> > > > > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > > > cycles event is opened it appears to be because it was explicitly
> > > > > > requested.
> > > > >
> > > > > I think you've missed that the named PMU events are being erroneously transformed
> > > > > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > > >
> > > > >   Opening: apple_firestorm_pmu/cycles/
> > > > >   ------------------------------------------------------------
> > > > >   perf_event_attr:
> > > > >     type                             0 (PERF_TYPE_HARDWARE)
> > > > >     size                             136
> > > > >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > > >     sample_type                      IDENTIFIER
> > > > >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > > >     disabled                         1
> > > > >     inherit                          1
> > > > >     enable_on_exec                   1
> > > > >     exclude_guest                    1
> > > > >   ------------------------------------------------------------
> > > > >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > >
> > > > > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > > >
> > > > > Marc said that he bisected the issue down to commit:
> > > > >
> > > > >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > > >
> > > > > ... so it looks like something is going wrong when the events are being parsed,
> > > > > e.g. losing the HW PMU information?
> > > >
> > > > Ok, I think I'm getting confused by other things. This looks like the issue.
> > > >
> > > > I think it may be working as intended, but not how you intended :-) If
> > > > a core PMU is listed and then a legacy event, the legacy event should
> > > > be opened on the core PMU as a legacy event with the extended type
> > > > set. This is to allow things like legacy cache events to be opened on
> > > > a specified PMU. Legacy event names match with a higher priority than
> > > > those in sysfs or json as they are hard coded.
> > >
> > > That has never been the case previously, so this is user-visible breakage, and
> > > it prevents users from being able to do the right thing, so I think that's a
> > > broken design.
> >
> > So the problem was caused by ARM and Intel doing two different things.
> > Intel did at least contribute to the perf tool in support for their
> > BIG.little/hybrid, so that's why the semantics match their approach.
>
> I appreciate that, and I agree that from the Arm side we haven't been as
> engaged with userspace on this front (please understand I'm the messenger here,
> this is something I've repeatedly asked for within Arm).
>
> Regardless, I don't think that changes the substance of the bug, which is that
> we're converting named-pmu events into entirely different PERF_TYPE_HARDWARE
> events.
>
> I agree that expanding plain legacy event names to a set of PMU-targeted legacy
> events makes sense (and even for Arm, that's the right thing to do, IMO). If
> I ask for 'cycles' and that gets expanded to multiple legacy cycles events that
> target specific CPU PMUs, that's good.
>
> The thing that doesn't make sense here is converting named-pmu events into
> legacy events. If I ask for 'apple_firestorm_pmu/cycles/', that should be the
> 'cycles' event in the apple_firestorm_pmu's event namespace, and *shouldn't* be
> converted to a (potentially semantically different) PERF_TYPE_HARDWARE event,
> even if that's targeted towards the apple_firestorm_pmu. I think that should
> be true for *any* PMU, whether that's an arm/x86/whatever CPU PMU or a system
> PMU.

This is saying that legacy events have lower priority than sysfs events. We
haven't done this historically, as it requires extra PMU set up. On an
Intel Tigerlake:

```
$ ls /sys/devices/cpu/events
branch-instructions  cache-misses      instructions  ref-cycles        topdown-be-bound
branch-misses        cache-references  mem-loads     slots             topdown-fe-bound
bus-cycles           cpu-cycles        mem-stores    topdown-bad-spec  topdown-retiring
```
here (at least) branch-misses, bus-cycles, cache-references,
cpu-cycles and instructions overlap with legacy event names
```
$ perf --version
perf version 6.5.6
$ perf stat -vv -e branch-misses,bus-cycles,cache-references,cpu-cycles,instructions true
Using CPUID GenuineIntel-6-8D-1
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x5 (PERF_COUNT_HW_BRANCH_MISSES)
...
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x6 (PERF_COUNT_HW_BUS_CYCLES)
...
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x2 (PERF_COUNT_HW_CACHE_REFERENCES)
...
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0 (PERF_COUNT_HW_CPU_CYCLES)
...
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
...
branch-misses: -1: 6571 826226 826226
bus-cycles: -1: 31411 826226 826226
cache-references: -1: 19507 826226 826226
cpu-cycles: -1: 1127215 826226 826226
instructions: -1: 1301583 826226 826226
branch-misses: 6571 826226 826226
bus-cycles: 31411 826226 826226
cache-references: 19507 826226 826226
cpu-cycles: 1127215 826226 826226
instructions: 1301583 826226 826226

Performance counter stats for 'true':
...
```
i.e. with perf 6.5, for all of these events, even though sysfs has
matching events, we're opening them with PERF_TYPE_HARDWARE.
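
For anyone reading the verbose output above: each "name: a b c" line can be
read as a count followed by the time enabled and the time running. The
following is a minimal Python sketch for pulling those fields apart. The
helper names parse_count_line and counted_fraction are made up for
illustration, and the field interpretation is an assumption based on how
perf stat -v output is usually read, not something stated in this thread.

```python
def parse_count_line(line):
    """Split a verbose count line into (name, value, enabled, running).

    Assumes the last three whitespace-separated fields are integers and the
    first field is the event name followed by a colon. Any extra field in
    between (such as the "-1:" seen above) is ignored.
    """
    fields = line.split()
    value, enabled, running = (int(f) for f in fields[-3:])
    name = fields[0].rstrip(":")
    return name, value, enabled, running


def counted_fraction(enabled, running):
    """Fraction of the enabled time during which the event was counting."""
    return running / enabled if enabled else 0.0


name, value, enabled, running = parse_count_line(
    "cpu-cycles: 1127215 826226 826226")
print(name, value, counted_fraction(enabled, running))
```

A running time of 0 with a non-zero enabled time would correspond to an
event that was opened but never scheduled on a CPU its PMU covers.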

> > > > Presumably the expectation was that by advertising a cycles event, presumably
> > > > in sysfs, then this is what would be matched.
>
> Yes. That's how this has always worked prior to the changes Marc referenced.
> Note that this can *also* be expanded to events from json databases, but was
> *never* previously silently converted to a PERF_TYPE_HARDWARE event.
>
> Please note that the events in sysfs are *namespaced* to the PMU (specifically,
> when using that PMU's dynamic type); they are not necessarily the same as
> legacy events (though they may have similar or matching
> names in some cases), they may be semantically distinct from the legacy events
> even if the names match, and it is incorrect to conflate the two.

This was a behavior added by Intel so that say cpu_atom/legacy-event/
would only open as a hardware event on that PMU. The point of the
blamed change is to make that behavior consistent for all core PMUs.

> > > I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
> > > *in that PMU's namespace* is used. Overriding that breaks long-established
> > > practice and provides users with no recourse to get the behaviour they expect
> > > (and previously had).
> >
> > On ARM but not Intel.
>
> As above, I don't think the CPU architecture matters here for the case that I'm
> saying is broken. I think that regardless of CPU architecture (or for any
> non-CPU PMU) it is semantically incorrect to convert a named-pmu event to a
> legacy event.

So perf's behavior has always been that legacy events take priority
over sysfs and json events. The distinction here is that a core PMU
is explicitly listed, and it doesn't seem unreasonable to use core PMU
names with legacy events, which is the behavior Intel added.
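
The priority order described here can be sketched roughly as follows. This
is an illustrative Python model only, not perf's parser: resolve_event, the
LEGACY_EVENTS table and the sysfs_events/json_events arguments are all
hypothetical names, and putting sysfs ahead of json is an assumption (the
text above only says legacy comes before both).

```python
# Hypothetical table of legacy event names; not the full list perf knows.
LEGACY_EVENTS = {
    "cycles", "cpu-cycles", "instructions", "branch-misses",
    "bus-cycles", "cache-references",
}


def resolve_event(name, sysfs_events, json_events):
    """Return which source wins for an event name, or None if unknown."""
    if name in LEGACY_EVENTS:
        # A legacy name wins even when sysfs advertises the same name.
        return "legacy"
    if name in sysfs_events:
        return "sysfs"
    if name in json_events:
        # Assumption: json is consulted after sysfs.
        return "json"
    return None


# "cycles" is in sysfs for this imaginary PMU but still resolves as legacy.
print(resolve_event("cycles", {"cycles"}, set()))
# "cpu_cycles" (underscore) is not a legacy name, so sysfs is used.
print(resolve_event("cpu_cycles", {"cpu_cycles"}, set()))
```

Under this model a sysfs event whose name collides with a legacy name can
never be selected, which is the aliasing case being discussed.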

> > > I do think that (regardless of whether this was the semantic you intended)
> > > silently overriding events with legacy events is a bug, and one we should fix.
> > > As I mentioned in another reply, just because the events have the same name
> > > does not mean that they are semantically the same, so we're liable to give
> > > people the wrong numbers anyhow.
> > >
> > > Can we fix this?
> >
> > So I'd like to fix this, some things from various conversations:
> >
> > 1) we lack testing. Our testing relies on the sysfs of the machine
> > being run on, which is better than nothing. I think ideally we'd have
> > a collection of zipped up sysfs directories and then we could have a
> > test that asserts on ARM you get the behavior you want.
>
> I agree we lack testing, and I'd be happy to help here going forwards, though I
> don't think this is a prerequisite for fixing this issue.
>
> > 2) for RISC-V they want to make the legacy event matching something in
> > user land to simplify the PMU driver.
>
> Ok; I see how this might be related, but it doesn't sound like a prerequisite
> for fixing this issue -- there are plenty of people in this thread who can
> test.
>
> > 3) I'd like to get rid of the PMU json interface. My idea is to
> > convert json events/metrics into sysfs style files, zip these up and
> > then link them into the perf binary. On Intel the json is 70% of the
> > binary (7MB out of 10MB) and we may get this down to 3MB with this
> > approach. The json lookup would need to incorporate the cpuid matching
> > that currently exists. When we look up an event I'd like the approach
> > to be like unionfs with a specified but configurable order. Users
> > could provide directories of their own events/metrics for various
> > PMUs, and then this approach could be used to help with (1).
>
> I can see how that might interact with whatever changes we make to fix this
> issue, but this seems like a future aspiration, and not a prerequisite for
> fixing the existing functional regression.
>
> > Those proposals are not something to add as a -rc fix, so what I think
> > you're asking for here is a "if ARM" fix somewhere in the event
> > parsing. That's of course possible but it will cause problems if you
> > did say:
> >
> > perf stat -e arm_pmu/LLC-load-misses/ ...
>
> As above, I do not think this is an arm-specific issue, we're just the canary
> in the coalmine.

Disagree, see comments above. A behavior change here would impact Intel.

> Please note that:
>
>         perf stat -e arm_pmu/LLC-load-misses/ ...
>
> ... would never have worked previously. No arm_pmu instances have a
> "LLC-load-misses" event in their event namespaces, and we don't have any
> userspace file mapping that event.

This event was only an example; perf list will show you the events
that work. The point is that a legacy event may not be available on
both big.LITTLE PMU types, so being able to designate the PMU there
is helpful.

> That said, If I really wanted that legacy event, I'd have asked for it bare,
> e.g.
>
>         perf stat -e LLC-load-misses
>
> ... and we're in agreement that it's sensible to expand this to multiple
> PERF_TYPE_HARDWARE events targeting the individual CPU PMUs.
>
> So I see no need to do anything to have magic for 'arm_pmu/LLC-load-misses/'.
>
> > as I doubt the PMU driver is advertising this legacy event in sysfs
> > and the "if ARM" logic would presumably be trying to disable legacy
> > events in the term list for the ARM PMU.
> >
> > Given all of this, is anything actually broken and needing a fix for 6.7?
>
> There is absolutely a bug that needs to be fixed here (and needs to be
> backported to stable so that it gets picked up by distributions).

I'm not seeing this. The behavior is consistent with Intel; this has
gone two releases without being spotted; it was triggered by a PMU
event name aliasing a legacy event name; and the behavior has always
been that legacy event names have higher priority than sysfs and json
events.

Whilst I'm seeing a lot of complaining, I've not seen a proposal of
what behavior you want. Isn't it a PMU bug if the legacy event
specifying the PMU doesn't get opened by the core PMU? Fixing the PMU
driver appears to be the right fix and means there is consistency on
core events across architectures.

Thanks,
Ian

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 13:06                   ` Arnaldo Carvalho de Melo
@ 2023-11-22 15:33                     ` Ian Rogers
  2023-11-22 15:49                     ` Mark Rutland
  1 sibling, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 15:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Hector Martin, Mark Rutland, Marc Zyngier,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

On Wed, Nov 22, 2023 at 5:06 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> > On 2023/11/22 1:38, Ian Rogers wrote:
> > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> > >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> > >>>>>>>>
> > >>>>>>>> [Adding key people on Cc]
> > >>>>>>>>
> > >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> > >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > >>>>>>>>
> > >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> > >>>>>>>> the PMU, but nothing works anymore.
> > >>>>>>>>
> > >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> > >>>>>>>> package, but that's obviously not going to last.
> > >>>>>>>>
> > >>>>>>>> I'm happy to test potential fixes.
> > >>>>>>>
> > >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > >>>>>>> CPU):
> > >>>>>>
> > >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> > >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > >>>>>> that ${pmu}'s type and event namespace.
> > >>>>>>
> > >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > >>>>>> targetted to a specific PMU, it's semantically wrong to rewrite events like
> > >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > >>>>>> PERF_COUNT_HW_${EVENT}.
> > >>>>>
> > >>>>> If you name a PMU and an event then the event should only be opened on
> > >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> > >>>>> cycles event is opened it appears to be because it was explicitly
> > >>>>> requested.
> > >>>>
> > >>>> I think you've missed that the named PMU events are being erroneously transformed
> > >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > >>>>
> > >>>>   Opening: apple_firestorm_pmu/cycles/
> > >>>>   ------------------------------------------------------------
> > >>>>   perf_event_attr:
> > >>>>     type                             0 (PERF_TYPE_HARDWARE)
> > >>>>     size                             136
> > >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >>>>     sample_type                      IDENTIFIER
> > >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >>>>     disabled                         1
> > >>>>     inherit                          1
> > >>>>     enable_on_exec                   1
> > >>>>     exclude_guest                    1
> > >>>>   ------------------------------------------------------------
> > >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > >>>>
> > >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > >>>>
> > >>>> Marc said that he bisected the issue down to commit:
> > >>>>
> > >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > >>>>
> > >>>> ... so it looks like something is going wrong when the events are being parsed,
> > >>>> e.g. losing the HW PMU information?
> > >>>
> > >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> > >>>
> > >>> I think it may be working as intended, but not how you intended :-) If
> > >>> a core PMU is listed and then a legacy event, the legacy event should
>
> The point is that "cycles" when prefixed with "pmu/" shouldn't be
> considered "cycles" as HW/0, in that setting it is "cycles" for that
> PMU. (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> have access in a Libre Computer rockchip 3399-pc hybrid board, if we use
> it, then we get what we want/had before, see below):
>
> And there is an attempt at using the specified PMU, see the first
> perf_event_open:
>
> root@roc-rk3399-pc:~# strace -e perf_event_open perf stat -vv -e cycles,armv8_cortex_a53/cycles/,armv8_cortex_a72/cycles/ echo
> Using CPUID 0x00000000410fd082
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0x700000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
>
> //// HERE: it tries config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES taking into
> //account the PMU number 0x7
>
> root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/type
> 7
> root@roc-rk3399-pc:~#
>
> But then we don't have "cycles" in that PMU:
>
> root@roc-rk3399-pc:~# ls -la /sys/devices/armv8_cortex_a53/events/cycles
> ls: cannot access '/sys/devices/armv8_cortex_a53/events/cycles': No such file or directory
> root@roc-rk3399-pc:~#
>
> Maybe:
>
> root@roc-rk3399-pc:~# taskset -c 5,6 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
> Using CPUID 0x00000000410fd034
> Control descriptor is not initialized
>
> armv8_cortex_a53/cpu_cycles/: 0 2079000 0
> armv8_cortex_a72/cpu_cycles/: 2488961 2079000 2079000
>
>  Performance counter stats for 'echo':
>
>      <not counted>      armv8_cortex_a53/cpu_cycles/                                            (0.00%)
>            2488961      armv8_cortex_a72/cpu_cycles/
>
>        0.003449266 seconds time elapsed
>
>        0.003502000 seconds user
>        0.000000000 seconds sys
>
>
> root@roc-rk3399-pc:~# taskset -c 0,1,2,3,4 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
> Using CPUID 0x00000000410fd034
> Control descriptor is not initialized
>
> armv8_cortex_a53/cpu_cycles/: 2986601 6999416 6999416
> armv8_cortex_a72/cpu_cycles/: 0 6999416 0
>
>  Performance counter stats for 'echo':
>
>            2986601      armv8_cortex_a53/cpu_cycles/
>      <not counted>      armv8_cortex_a72/cpu_cycles/                                            (0.00%)
>
>        0.011434508 seconds time elapsed
>
>        0.003911000 seconds user
>        0.007454000 seconds sys
>
>
> root@roc-rk3399-pc:~#
>
> root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/events/cpu_cycles
> event=0x0011
> root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a72/events/cpu_cycles
> event=0x0011
> root@roc-rk3399-pc:~#
>
> And the syscalls seem sane:
>
> root@roc-rk3399-pc:~# strace -e perf_event_open taskset -c 0,1,2,3,4 perf stat -v -e armv8_cortex_a53/cpu_cycles/,armv8_cortex_a72/cpu_cycles/ echo
> Using CPUID 0x00000000410fd034
> Control descriptor is not initialized
> perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=0x88 /* PERF_ATTR_SIZE_??? */, config=0x11, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 14573, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=0x8 /* PERF_TYPE_??? */, size=0x88 /* PERF_ATTR_SIZE_??? */, config=0x11, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 14573, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14573, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> armv8_cortex_a53/cpu_cycles/: 3227098 4480875 4480875
> armv8_cortex_a72/cpu_cycles/: 0 4480875 0
>
>  Performance counter stats for 'echo':
>
>            3227098      armv8_cortex_a53/cpu_cycles/
>      <not counted>      armv8_cortex_a72/cpu_cycles/                                            (0.00%)
>
>        0.008381759 seconds time elapsed
>
>        0.004064000 seconds user
>        0.004121000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=14572, si_uid=0} ---
> +++ exited with 0 +++
> root@roc-rk3399-pc:~#
>
> As:
>
> root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a53/type
> 7
> root@roc-rk3399-pc:~# cat /sys/devices/armv8_cortex_a72/type
> 8
> root@roc-rk3399-pc:~#
>
> See the type=0x7 and type=0x8.
>
> So what we need here seems to be to translate the generic term "cycles"
> to "cpu_cycles" when a PMU is explicitly passed in the event name and
> it doesn't have "cycles" and then just retry.

The PMU driver does the legacy-to-raw encoding translation; this is an
assumption the tool makes about core PMUs. You can see ARM's PMU driver
doing the mapping here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_pmuv3.c#n40
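
For reference, the extended PERF_TYPE_HARDWARE encoding visible in the
strace output quoted above (config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES) can be
sketched as below. The constant values are believed to match
include/uapi/linux/perf_event.h; the helper names are made up for
illustration, and the PMU type 7 is simply the value read from
/sys/devices/armv8_cortex_a53/type on that board.

```python
# Values as defined in the kernel's uapi perf_event.h (to the best of my
# knowledge); the helpers below are illustrative, not a perf API.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def extended_hw_config(pmu_type, hw_id):
    """Build attr.config for a legacy HW event directed at one PMU."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_id & PERF_HW_EVENT_MASK)


def split_hw_config(config):
    """Recover (pmu_type, hw_id) from an extended HW config value."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


config = extended_hw_config(7, PERF_COUNT_HW_CPU_CYCLES)
print(hex(config), split_hw_config(config))
```

The attr.type stays PERF_TYPE_HARDWARE in this scheme; only the upper half
of config names the PMU, and the driver is then expected to translate the
legacy id in the lower half into its own raw event number.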

Thanks,
Ian


> - Arnaldo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 13:06                   ` Arnaldo Carvalho de Melo
  2023-11-22 15:33                     ` Ian Rogers
@ 2023-11-22 15:49                     ` Mark Rutland
  2023-11-22 16:04                       ` Ian Rogers
  2023-11-22 16:19                       ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 53+ messages in thread
From: Mark Rutland @ 2023-11-22 15:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Hector Martin, Ian Rogers, Marc Zyngier, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Wed, Nov 22, 2023 at 10:06:23AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> > On 2023/11/22 1:38, Ian Rogers wrote:
> > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> > >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> > >>>>>>>>
> > >>>>>>>> [Adding key people on Cc]
> > >>>>>>>>
> > >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> > >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > >>>>>>>>
> > >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> > >>>>>>>> the PMU, but nothing works anymore.
> > >>>>>>>>
> > >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> > >>>>>>>> package, but that's obviously not going to last.
> > >>>>>>>>
> > >>>>>>>> I'm happy to test potential fixes.
> > >>>>>>>
> > >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > >>>>>>> CPU):
> > >>>>>>
> > >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> > >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > >>>>>> that ${pmu}'s type and event namespace.
> > >>>>>>
> > >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > >>>>>> targetted to a specific PMU, it's semantically wrong to rewrite events like
> > >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > >>>>>> PERF_COUNT_HW_${EVENT}.
> > >>>>>
> > >>>>> If you name a PMU and an event then the event should only be opened on
> > >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> > >>>>> cycles event is opened it appears to be because it was explicitly
> > >>>>> requested.
> > >>>>
> > >>>> I think you've missed that the named PMU events are being erroneously transformed
> > >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > >>>>
> > >>>>   Opening: apple_firestorm_pmu/cycles/
> > >>>>   ------------------------------------------------------------
> > >>>>   perf_event_attr:
> > >>>>     type                             0 (PERF_TYPE_HARDWARE)
> > >>>>     size                             136
> > >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >>>>     sample_type                      IDENTIFIER
> > >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >>>>     disabled                         1
> > >>>>     inherit                          1
> > >>>>     enable_on_exec                   1
> > >>>>     exclude_guest                    1
> > >>>>   ------------------------------------------------------------
> > >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > >>>>
> > >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > >>>>
> > >>>> Marc said that he bisected the issue down to commit:
> > >>>>
> > >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > >>>>
> > >>>> ... so it looks like something is going wrong when the events are being parsed,
> > >>>> e.g. losing the HW PMU information?
> > >>>
> > >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> > >>>
> > >>> I think it may be working as intended, but not how you intended :-) If
> > >>> a core PMU is listed and then a legacy event, the legacy event should
> 
> The point is that "cycles" when prefixed with "pmu/" shouldn't be
> considered "cycles" as HW/0, in that setting it is "cycles" for that
> PMU.

Exactly.

> (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> have access in a Libre Computer rockchip 3399-pc hybrid board, if we use
> it, then we get what we want/had before, see below):

Both Cortex-A53 and Cortex-A72 have the common PMUv3 events, so they have
"cpu_cycles" and "bus_cycles".

The Apple PMUs that Hector and Marc are using don't follow the PMUv3
architecture, and just have a "cycles" event.

[...]

> So what we need here seems to be to translate the generic term "cycles"
> to "cpu_cycles" when a PMU is explicitly passed in the event name and
> it doesn't have "cycles" and then just retry.

I'm not sure we need to map that.

My thinking is:

* If the user asks for "cycles" without a PMU name, that should use the
  PERF_TYPE_HARDWARE cycles event. The ARM PMUs handle that correctly when the
  event is directed to them.

* If the user asks for "${pmu}/cycles/", that should only use the "cycles"
  event in that PMU's namespace, not PERF_TYPE_HARDWARE.

* If we need a way to say "use the PERF_TYPE_HARDWARE cycles event on ${pmu}",
  then we should have a new syntax for that (e.g. as we have for raw events),
  e.g. it would be possible to have "pmu/hw:cycles/" or something like that.

That way there's no ambiguity.
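
The three rules above can be sketched as a small classifier. This is a
rough Python illustration only: the function name classify and the result
labels are invented here, the "hw:" prefix is just the example syntax
floated above and does not exist in perf, and a bare name is treated as
legacy without checking whether it really is a legacy event name.

```python
def classify(spec):
    """Map an event spec to (kind, pmu, event) under the proposed rules."""
    if "/" not in spec:
        # Rule 1: bare name, use the PERF_TYPE_HARDWARE event.
        return ("legacy", None, spec)

    pmu, _, rest = spec.partition("/")
    event = rest.rstrip("/")

    if event.startswith("hw:"):
        # Rule 3: hypothetical explicit request for a legacy event on a PMU.
        return ("legacy-on-pmu", pmu, event[len("hw:"):])

    # Rule 2: only look in that PMU's own event namespace.
    return ("pmu-namespace", pmu, event)


for spec in ("cycles", "apple_icestorm_pmu/cycles/", "pmu/hw:cycles/"):
    print(spec, "->", classify(spec))
```

With this split, "${pmu}/cycles/" never silently becomes a
PERF_TYPE_HARDWARE event; the legacy event on a given PMU has to be asked
for explicitly.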

Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 15:49                     ` Mark Rutland
@ 2023-11-22 16:04                       ` Ian Rogers
  2023-11-22 16:26                         ` Arnaldo Carvalho de Melo
  2023-11-22 16:19                       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 16:04 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Arnaldo Carvalho de Melo, Hector Martin, Marc Zyngier,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

On Wed, Nov 22, 2023 at 7:49 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Nov 22, 2023 at 10:06:23AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> > > On 2023/11/22 1:38, Ian Rogers wrote:
> > > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> > > >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> > > >>>>>>>>
> > > >>>>>>>> [Adding key people on Cc]
> > > >>>>>>>>
> > > >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> > > >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > >>>>>>>>
> > > >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > >>>>>>>> the PMU, but nothing works anymore.
> > > >>>>>>>>
> > > >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> > > >>>>>>>> package, but that's obviously not going to last.
> > > >>>>>>>>
> > > >>>>>>>> I'm happy to test potential fixes.
> > > >>>>>>>
> > > >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > >>>>>>> CPU):
> > > >>>>>>
> > > >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> > > >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > >>>>>> that ${pmu}'s type and event namespace.
> > > >>>>>>
> > > >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > >>>>>> targetted to a specific PMU, it's semantically wrong to rewrite events like
> > > >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > >>>>>> PERF_COUNT_HW_${EVENT}.
> > > >>>>>
> > > >>>>> If you name a PMU and an event then the event should only be opened on
> > > >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > >>>>> cycles event is opened it appears to be because it was explicitly
> > > >>>>> requested.
> > > >>>>
> > > >>>> I think you've missed that the named PMU events are being erroneously transformed
> > > >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > >>>>
> > > >>>>   Opening: apple_firestorm_pmu/cycles/
> > > >>>>   ------------------------------------------------------------
> > > >>>>   perf_event_attr:
> > > >>>>     type                             0 (PERF_TYPE_HARDWARE)
> > > >>>>     size                             136
> > > >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >>>>     sample_type                      IDENTIFIER
> > > >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >>>>     disabled                         1
> > > >>>>     inherit                          1
> > > >>>>     enable_on_exec                   1
> > > >>>>     exclude_guest                    1
> > > >>>>   ------------------------------------------------------------
> > > >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > >>>>
> > > >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > >>>>
> > > >>>> Marc said that he bisected the issue down to commit:
> > > >>>>
> > > >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > >>>>
> > > >>>> ... so it looks like something is going wrong when the events are being parsed,
> > > >>>> e.g. losing the HW PMU information?
> > > >>>
> > > >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> > > >>>
> > > >>> I think it may be working as intended, but not how you intended :-) If
> > > >>> a core PMU is listed and then a legacy event, the legacy event should
> >
> > The point is that "cycles" when prefixed with "pmu/" shouldn't be
> > considered "cycles" as HW/0, in that setting it is "cycles" for that
> > PMU.
>
> Exactly.
>
> > (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> > have access in a Libre Computer rockchip 3399-pc hybrid board, if we use
> > it, then we get what we want/had before, see below):
>
> Both Cortex-A53 and Cortex-A72 have the common PMUv3 events, so they have
> "cpu_cycles" and "bus_cycles".
>
> The Apple PMUs that Hector and Marc are using don't follow the PMUv3
> architecture, and just have a "cycles" event.
>
> [...]
>
> > So what we need here seems to be to translate the generic term "cycles"
> > to "cpu_cycles" when a PMU is explicitly passed in the event name and
> > it doesn't have "cycles" and then just retry.
>
> I'm not sure we need to map that.
>
> My thinking is:
>
> * If the user asks for "cycles" without a PMU name, that should use the
>   PERF_TYPE_HARDWARE cycles event. The ARM PMUs handle that correctly when the
>   event is directed to them.
>
> * If the user asks for "${pmu}/cycles/", that should only use the "cycles"
>   event in that PMU's namespace, not PERF_TYPE_HARDWARE.
>
> * If we need a way to say "use the PERF_TYPE_HARDWARE cycles event on ${pmu}",
>   then we should have a new syntax for that (e.g. as we have for raw events),
>   e.g. it would be possible to have "pmu/hw:cycles/" or something like that.
>
> That way there's no ambiguity.

This would break cpu_core/LLC-load-misses/ on Intel hybrid as the
LLC-load-misses event is legacy and not advertised in either sysfs or
in json.

Thanks,
Ian

> Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 15:29                   ` Ian Rogers
@ 2023-11-22 16:08                     ` Mark Rutland
  2023-11-22 16:29                       ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-22 16:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Wed, Nov 22, 2023 at 07:29:34AM -0800, Ian Rogers wrote:
> On Wed, Nov 22, 2023 at 5:04 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Nov 21, 2023 at 08:38:45AM -0800, Ian Rogers wrote:
> > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > > > On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > > > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > [Adding key people on Cc]
> > > > > > > > > >
> > > > > > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > > > > >
> > > > > > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > > > > > the PMU, but nothing works anymore.
> > > > > > > > > >
> > > > > > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > > > > > package, but that's obviously not going to last.
> > > > > > > > > >
> > > > > > > > > > I'm happy to test potential fixes.
> > > > > > > > >
> > > > > > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > > > > > CPU):
> > > > > > > >
> > > > > > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > > > > that ${pmu}'s type and event namespace.
> > > > > > > >
> > > > > > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > > > > targetted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > > > > PERF_COUNT_HW_${EVENT}.
> > > > > > >
> > > > > > > If you name a PMU and an event then the event should only be opened on
> > > > > > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > > > > cycles event is opened it appears to be because it was explicitly
> > > > > > > requested.
> > > > > >
> > > > > > I think you've missed that the named PMU events are being erroneously transformed
> > > > > > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > > > >
> > > > > >   Opening: apple_firestorm_pmu/cycles/
> > > > > >   ------------------------------------------------------------
> > > > > >   perf_event_attr:
> > > > > >     type                             0 (PERF_TYPE_HARDWARE)
> > > > > >     size                             136
> > > > > >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > > > >     sample_type                      IDENTIFIER
> > > > > >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > > > >     disabled                         1
> > > > > >     inherit                          1
> > > > > >     enable_on_exec                   1
> > > > > >     exclude_guest                    1
> > > > > >   ------------------------------------------------------------
> > > > > >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > > >
> > > > > > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > > > >
> > > > > > Marc said that he bisected the issue down to commit:
> > > > > >
> > > > > >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > > > >
> > > > > > ... so it looks like something is going wrong when the events are being parsed,
> > > > > > e.g. losing the HW PMU information?
> > > > >
> > > > > Ok, I think I'm getting confused by other things. This looks like the issue.
> > > > >
> > > > > I think it may be working as intended, but not how you intended :-) If
> > > > > a core PMU is listed and then a legacy event, the legacy event should
> > > > > be opened on the core PMU as a legacy event with the extended type
> > > > > set. This is to allow things like legacy cache events to be opened on
> > > > > a specified PMU. Legacy event names match with a higher priority than
> > > > > those in sysfs or json as they are hard coded.
> > > >
> > > > That has never been the case previously, so this is user-visible breakage, and
> > > > it prevents users from being able to do the right thing, so I think that's a
> > > > broken design.
> > >
> > > So the problem was caused by ARM and Intel doing two different things.
> > > Intel did at least contribute to the perf tool in support for their
> > > BIG.little/hybrid, so that's why the semantics match their approach.
> >
> > I appreciate that, and I agree that from the Arm side we haven't been as
> > engaged with userspace on this front (please understand I'm the messenger here,
> > this is something I've repeatedly asked for within Arm).
> >
> > Regardless, I don't think that changes the substance of the bug, which is that
> > we're converting named-pmu events into entirely different PERF_TYPE_HARDWARE
> > events.
> >
> > I agree that expanding plain legacy event names to a set of PMU-targeted legacy
> > events makes sense (and even for Arm, that's the right thing to do, IMO). If
> > I ask for 'cycles' and that gets expanded to multiple legacy cycles events that
> > target specific CPU PMUs, that's good.
> >
> > The thing that doesn't make sense here is converting named-pmu events into
> > legacy events. If I ask for 'apple_firestorm_pmu/cycles/', that should be the
> > 'cycles' event in the apple_firestorm_pmu's event namespace, and *shouldn't* be
> > converted to a (potentially semantically different) PERF_TYPE_HARDWARE event,
> > even if that's targetted towards the apple_firestorm_pmu. I think that should
> > be true for *any* PMU, whether that's an arm/x86/whatever CPU PMU or a system
> > PMU.
> 
> This is saying that legacy events are lower than system events. We
> don't do this historically, as it requires extra PMU setup. On an
> Intel Tigerlake:
> 
> ```
> $ ls /sys/devices/cpu/events
> branch-instructions  cache-misses      instructions  ref-cycles
> topdown-be-bound
> branch-misses        cache-references  mem-loads     slots
> topdown-fe-bound
> bus-cycles           cpu-cycles        mem-stores    topdown-bad-spec
> topdown-retiring
> ```
> here (at least) branch-misses, bus-cycles, cache-references,
> cpu-cycles and instructions overlap with legacy event names
> ```
> $ perf --version
> perf version 6.5.6
> $ perf stat -vv -e branch-misses,bus-cycles,cache-references,cpu-cycles,instructions true

Here you *aren't* using a named PMU. As I said before, using the
PERF_TYPE_HARDWARE events in this case is entirely fine; it's just the
${pmu}/${eventname}/ case that I'm saying should use the PMU's namespace,
which was historically the case, and is what users are depending upon.

i.e.

	perf stat -e cycles ./workload

... can/should use PERF_TYPE_HARDWARE events, as it used to.

However:

	perf stat -e ${pmu}/cycles/ ./workload

... should use the PMU's namespaced events, as it used to.
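
To make the two cases concrete, here is a minimal Python sketch of the resolution rule described above. It is not perf's parser: the `resolve` helper and the `PMUS` table are hypothetical, and the dynamic type 10 and event code 0x02 are placeholder values. Only PERF_TYPE_HARDWARE = 0 and PERF_COUNT_HW_CPU_CYCLES = 0 are taken from the perf ABI.

```python
PERF_TYPE_HARDWARE = 0          # perf ABI value
PERF_COUNT_HW_CPU_CYCLES = 0    # perf ABI value

# Hypothetical PMU table: a dynamic type plus the PMU's own event namespace.
# The numbers are placeholders, not values read from real hardware.
PMUS = {
    "apple_icestorm_pmu": {"type": 10, "events": {"cycles": 0x02}},
}

# Legacy (generic) event names known to the tool itself.
LEGACY_HW = {"cycles": PERF_COUNT_HW_CPU_CYCLES}


def resolve(spec):
    """Return (attr.type, attr.config) for an event spec."""
    if "/" in spec:
        # Named PMU: look the event up in that PMU's namespace only.
        pmu_name, event = spec.rstrip("/").split("/", 1)
        pmu = PMUS[pmu_name]
        return pmu["type"], pmu["events"][event]
    # Bare name: using a legacy PERF_TYPE_HARDWARE event is fine.
    return PERF_TYPE_HARDWARE, LEGACY_HW[spec]


print(resolve("cycles"))
print(resolve("apple_icestorm_pmu/cycles/"))
```

The point of the sketch is that the two spellings take different branches: the named form never consults the legacy table.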

> Using CPUID GenuineIntel-6-8D-1
> intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>  type                             0 (PERF_TYPE_HARDWARE)
>  size                             136
>  config                           0x5 (PERF_COUNT_HW_BRANCH_MISSES)
> ...
> ------------------------------------------------------------
> perf_event_attr:
>  type                             0 (PERF_TYPE_HARDWARE)
>  size                             136
>  config                           0x6 (PERF_COUNT_HW_BUS_CYCLES)
> ...
> ------------------------------------------------------------
> perf_event_attr:
>  type                             0 (PERF_TYPE_HARDWARE)
>  size                             136
>  config                           0x2 (PERF_COUNT_HW_CACHE_REFERENCES)
> ...
> ------------------------------------------------------------
> perf_event_attr:
>  type                             0 (PERF_TYPE_HARDWARE)
>  size                             136
>  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> ...
> ------------------------------------------------------------
> perf_event_attr:
>  type                             0 (PERF_TYPE_HARDWARE)
>  size                             136
>  config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
> ...
> branch-misses: -1: 6571 826226 826226
> bus-cycles: -1: 31411 826226 826226
> cache-references: -1: 19507 826226 826226
> cpu-cycles: -1: 1127215 826226 826226
> instructions: -1: 1301583 826226 826226
> branch-misses: 6571 826226 826226
> bus-cycles: 31411 826226 826226
> cache-references: 19507 826226 826226
> cpu-cycles: 1127215 826226 826226
> instructions: 1301583 826226 826226
> 
> Performance counter stats for 'true':
> ...
> ```
> i.e. with perf 6.5, even though sysfs has these events, we're opening
> all of them with PERF_TYPE_HARDWARE.

As above, this is a different case.

> 
> > > > > Presumably the expectation was that by advertising a cycles event, presumably
> > > > > in sysfs, then this is what would be matched.
> >
> > Yes. That's how this has always worked prior to the changes Marc referenced.
> > Note that this can *also* be expanded to events from json databases, but was
> > *never* previously silently converted to a PERF_TYPE_HARDWARE event.
> >
> > Please note that the events in sysfs are *namespaced* to the PMU (specifically,
> > when using that PMU's dynamic type); they are not necessarily the same as
> > legacy events (though they may have similar or matching
> > names in some cases), they may be semantically distinct from the legacy events
> > even if the names match, and it is incorrect to conflate the two.
> 
> This was a behavior added by Intel so that say cpu_atom/legacy-event/
> would only open as a hardware event on that PMU. The point of the
> blamed change is to make that behavior consistent for all core PMUs.

Ok, so Intel has an Intel-specific behaviour change, which was ok for them.

That was made generic, but caused a functional regression on arm (and possibly
other architectures if anyone else cares about the namespaced events).

Why can't this be returned to being x86-specific?

> > > > I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
> > > > *in that PMU's namespace* is used. Overriding that breaks long-established
> > > > practice and provides users with no recourse to get the behavioru they expect
> > > > (and previosuly had).
> > >
> > > On ARM but not Intel.
> >
> > As above, I don't think the CPU architecture matters here for the case that I'm
> > saying is broken. I think that regardless of CPU architecture (or for any
> > non-CPU PMU) it is semantically incorrect to convert a named-pmu event to a
> > legacy event.
> 
> So perf's behavior has always been that legacy event priority is
> greater-than sysfs and json. The distinction here is that a core PMU
> is explicitly listed and it doesn't seem unreasonable to use core PMU
> names with legacy events, the behavior Intel added.

That may be ok for Intel, but given it *is* causing functional problems for
others, why must it remain generic?

> > > > I do think that (regardless of whether this was the semantic you intended)
> > > > silently overriding events with legacy events is a bug, and one we should fix.
> > > > As I mentioned in another reply, just because the events have the same name
> > > > does not mean that they are semantically the same, so we're liable to give
> > > > people the wrong numbers anyhow.
> > > >
> > > > Can we fix this?
> > >
> > > So I'd like to fix this, some things from various conversations:
> > >
> > > 1) we lack testing. Our testing relies on the sysfs of the machine
> > > being run on, which is better than nothing. I think ideally we'd have
> > > a collection of zipped up sysfs directories and then we could have a
> > > test that asserts on ARM you get the behavior you want.
> >
> > I agree we lack testing, and I'd be happy to help here going forwards, though I
> > don't think this is a prerequisite for fixing this issue.
> >
> > > 2) for RISC-V they want to make the legacy event matching something in
> > > user land to simplify the PMU driver.
> >
> > Ok; I see how this might be related, but it doesn't sound like a prerequisite
> > for fixing this issue -- there are plenty of people in this thread who can
> > test.
> >
> > > 3) I'd like to get rid of the PMU json interface. My idea is to
> > > convert json events/metrics into sysfs style files, zip these up and
> > > then link them into the perf binary. On Intel the json is 70% of the
> > > binary (7MB out of 10MB) and we may get this down to 3MB with this
> > > approach. The json lookup would need to incorporate the cpuid matching
> > > that currently exists. When we look up an event I'd like the approach
> > > to be like unionfs with a specified but configurable order. Users
> > > could provide directories of their own events/metrics for various
> > > PMUs, and then this approach could be used to help with (1).
> >
> > I can see how that might interact with whatever changes we make to fix this
> > issue, but this seems like a future aspiration, and not a prerequisite for
> > fixing the existing functional regression.
> >
> > > Those proposals are not something to add as a -rc fix, so what I think
> > > you're asking for here is a "if ARM" fix somewhere in the event
> > > parsing. That's of course possible but it will cause problems if you
> > > did say:
> > >
> > > perf stat -e arm_pmu/LLC-load-misses/ ...
> >
> > As above, I do not think this is an arm-specific issue, we're just the canary
> > in the coalmine.
> 
> Disagree, see comments above. A behavior change here would impact Intel.

Ok, so have Intel keep the Intel behaviour?

> > Please note that:
> >
> >         perf stat -e arm_pmu/LLC-load-misses/ ...
> >
> > ... would never have worked previously. No arm_pmu instances have a
> > "LLC-load-misses" event in their event namespaces, and we don't have any
> > userspace file mapping that event.
> 
> This event was for the purpose of giving an example, perf list will
> show you events that work. The point is that a legacy event may not be
> available on both BIG.little PMU types so being able to designate the
> PMU there is helpful.

Sure, but (as per my reply to Arnaldo), it's possible to add an unambiguous way
to specify that, e.g. a 'hw:' prefix like:

	some_arm_pmu/hw:LLC-load-misses/

... which wouldn't clash and cause the regression that users are seeing.

> > That said, If I really wanted that legacy event, I'd have asked for it bare,
> > e.g.
> >
> >         perf stat -e LLC-load-misses
> >
> > ... and we're in agreement that it's sensible to expand this to multiple
> > PERF_TYPE_HARDWARE events targeting the individual CPU PMUs.
> >
> > So I see no need to do anything to have magic for 'arm_pmu/LLC-load-misses/'.
> >
> > > as I doubt the PMU driver is advertising this legacy event in sysfs
> > > and the "if ARM" logic would presumably be trying to disable legacy
> > > events in the term list for the ARM PMU.
> > >
> > > Given all of this, is anything actually broken and needing a fix for 6.7?
> >
> > There is absolutely a bug that needs to be fixed here (and needs to be
> > backported to stable so that it gets picked up by distributions).
> 
> I'm not seeing this. The behavior is consistent with Intel, this has
> gone 2 releases without being spotted,

This has gone two releases because people have just updated their tools. The
prior behaviour for Arm has been there for most of a decade.

> it was triggered by a PMU event
> name aliasing a legacy event name and the behavior has always been
> legacy event names have higher priority than sysfs and json events.

That has been the case for plain events without a PMU name. That was never the
case for events with a PMU name, or there would not have been any difference in
behaviour.

> Whilst I'm seeing a lot of complaining, I've not seen a proposal of
> what behavior you want. 

As per my initial reply, the behaviour we want is that:

  pmu/eventname/ 

... opens 'eventname' in that PMU's event namespace, rather than converting the
event into a PERF_TYPE_HARDWARE event. That was the prior behaviour, which
people have been using for most of a decade.

I understand that there was some Intel-specific behaviour, and that may need to
be kept for Intel. Making that behaviour generic broke other existing users.

If we need a mechanism to target a legacy event to a specific PMU, we can add
an unambiguous way of describing that (e.g. the 'hw:' prefix I've suggested a
few times).
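
As a sketch only: the 'hw:' prefix is a suggestion made in this thread, not syntax that perf implements, and the `parse_term` helper below is hypothetical. It shows how such a prefix could be told apart from a plain namespaced event without any ambiguity.

```python
def parse_term(spec):
    """Split 'pmu/[hw:]event/' into (pmu, event, wants_legacy).

    wants_legacy is True only when the hypothetical 'hw:' prefix is present,
    i.e. when the user explicitly asks for the legacy event on that PMU.
    """
    pmu, _, rest = spec.partition("/")
    if not rest.endswith("/"):
        raise ValueError(f"expected 'pmu/event/', got {spec!r}")
    event = rest[:-1]
    wants_legacy = event.startswith("hw:")
    if wants_legacy:
        event = event[len("hw:"):]
    return pmu, event, wants_legacy


print(parse_term("pmu/eventname/"))
print(parse_term("some_arm_pmu/hw:LLC-load-misses/"))
```

With this split, 'pmu/eventname/' always means the PMU's own namespace, and the legacy lookup happens only on explicit request.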


> Isn't it a PMU bug if the legacy event specifying the PMU doesn't get opened
> by the core PMU?

No?

Prior to that mechanism being added to the kernel, there was no way to do that.

When the mechanism was added to x86 specifically, it wasn't a generic feature.

> Fixing the PMU driver appears to be the right fix and means there is
> consistency on core events across architectures.

I think that's orthogonal.

Adding support to the PMU drivers (which has already been done, per the commit
you quoted before) is good so that userspace can do the right thing for:

	perf stat -e some_generic_event ./workload

... but that should not be necessary to retain the existing behaviour for:

	perf stat -e pmu/some_similarly_named_event/ ./workload

Thanks,
Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 15:49                     ` Mark Rutland
  2023-11-22 16:04                       ` Ian Rogers
@ 2023-11-22 16:19                       ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-22 16:19 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Hector Martin, Ian Rogers, Marc Zyngier, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

Em Wed, Nov 22, 2023 at 03:49:18PM +0000, Mark Rutland escreveu:
> On Wed, Nov 22, 2023 at 10:06:23AM -0300, Arnaldo Carvalho de Melo wrote:
> > The point is that "cycles" when prefixed with "pmu/" shouldn't be
> > considered "cycles" as HW/0, in that setting it is "cycles" for that
> > PMU.
 
> Exactly.
 
> > (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> > have access in a Libre Computer rockchip 3399-pc hybrid board, if we use
> > it, then we get what we want/had before, see below):
 
> Both Cortex-A53 and Cortex-A72 have the common PMUv3 events, so they have
> "cpu_cycles" and "bus_cycles".

root@roc-rk3399-pc:~# ls -la /sys/devices/*/events/*cycles
-r--r--r-- 1 root root 4096 Nov 22 12:35 /sys/devices/armv8_cortex_a53/events/bus_cycles
-r--r--r-- 1 root root 4096 Nov 22 12:35 /sys/devices/armv8_cortex_a53/events/cpu_cycles
-r--r--r-- 1 root root 4096 Nov 22 12:35 /sys/devices/armv8_cortex_a72/events/bus_cycles
-r--r--r-- 1 root root 4096 Nov 22 12:35 /sys/devices/armv8_cortex_a72/events/cpu_cycles
root@roc-rk3399-pc:~#

But on x86, on an AMD machine:

⬢[acme@toolbox ~]$ ls -la /sys/devices/*/events/*cycles
-r--r--r--. 1 nobody nobody 4096 Nov 22 12:48 /sys/devices/cpu/events/cpu-cycles
⬢[acme@toolbox ~]$

And an Intel:

[acme@quaco asahi]$ ls -la /sys/devices/*/events/*cycles
-r--r--r--. 1 root root 4096 Nov 22 13:11 /sys/devices/cpu/events/bus-cycles
-r--r--r--. 1 root root 4096 Nov 22 13:11 /sys/devices/cpu/events/cpu-cycles
-r--r--r--. 1 root root 4096 Nov 22 13:11 /sys/devices/cpu/events/ref-cycles
[acme@quaco asahi]$

Note the slight difference between those: '-' versus '_'.
 
> The Apple PMUs that Hector and Marc are using don't follow the PMUv3
> architecture, and just have a "cycles" event.

I see, and even when prefixed with the PMU name, as
"apple_icestorm_pmu/cycles/", the legacy name ends up trumping that and the
event is moved to (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES) instead of
(/sys/devices/apple_icestorm_pmu/type,
/sys/devices/apple_icestorm_pmu/events/cycles), as I noticed with:

sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)

I.e.:

type=PERF_TYPE_HARDWARE, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES

It should be:

type=/sys/devices/apple_icestorm_pmu/type, config=/sys/devices/apple_icestorm_pmu/events/cycles

That is the minimal patch needed to address the reported regression, even if
it uses some kludge to buy time for a more elegant longer-term solution.
Ian?
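
To spell out the two encodings being compared, here is a minimal Python sketch. The helper names are hypothetical, the fake sysfs tree is built in a temporary directory, and the event code 0x02 is a placeholder. It assumes the PMU's dynamic type lives in the PMU directory's `type` file and that the event file uses the simple "event=0x.." form; real event files can carry more terms.

```python
import os
import tempfile

PERF_TYPE_HARDWARE = 0
PERF_PMU_TYPE_SHIFT = 32  # upper 32 bits of config carry the PMU type


def legacy_config(pmu_type, hw_id):
    """Config for a PERF_TYPE_HARDWARE event targeted at one PMU."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | hw_id


def sysfs_event(sysfs_root, pmu, event):
    """Return (type, config) from the PMU's own sysfs description.

    Only handles the simple 'event=0x..' form; real event files may hold
    several comma-separated terms that are mapped through format/ files.
    """
    pmu_dir = os.path.join(sysfs_root, pmu)
    with open(os.path.join(pmu_dir, "type")) as f:
        pmu_type = int(f.read())
    with open(os.path.join(pmu_dir, "events", event)) as f:
        key, _, value = f.read().strip().partition("=")
    if key != "event":
        raise ValueError(f"unsupported event description: {key!r}")
    return pmu_type, int(value, 16)


# Build a fake sysfs tree so the sketch runs anywhere (placeholder values).
with tempfile.TemporaryDirectory() as root:
    pmu_dir = os.path.join(root, "apple_icestorm_pmu")
    os.makedirs(os.path.join(pmu_dir, "events"))
    with open(os.path.join(pmu_dir, "type"), "w") as f:
        f.write("7\n")
    with open(os.path.join(pmu_dir, "events", "cycles"), "w") as f:
        f.write("event=0x02\n")
    print("namespaced:", sysfs_event(root, "apple_icestorm_pmu", "cycles"))

# What the strace above shows instead: legacy cycles with PMU type 7 on top.
print("legacy:", (PERF_TYPE_HARDWARE, hex(legacy_config(7, 0))))
```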

> [...]
 
> > So what we need here seems to be to translate the generic term "cycles"
> > to "cpu_cycles" when a PMU is explicitly passed in the event name and
> > it doesn't have "cycles" and then just retry.
> 
> I'm not sure we need to map that.
> 
> My thinking is:
> 
> * If the user asks for "cycles" without a PMU name, that should use the
>   PERF_TYPE_HARDWARE cycles event. The ARM PMUs handle that correctly when the
>   event is directed to them.
> 
> * If the user asks for "${pmu}/cycles/", that should only use the "cycles"
>   event in that PMU's namespace, not PERF_TYPE_HARDWARE.

And thus, armv8_cortex_a53/cycles/ and armv8_cortex_a72/cycles/ should
just fail as there is no "cycles" for that PMU, no fallback.
 
> * If we need a way to say "use the PERF_TYPE_HARDWARE cycles event on ${pmu}",
>   then we should have a new syntax for that (e.g. as we have for raw events),
>   e.g. it would be possible to have "pmu/hw:cycles/" or something like that.
> 
> That way there's no ambiguity.

- Arnaldo


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:04                       ` Ian Rogers
@ 2023-11-22 16:26                         ` Arnaldo Carvalho de Melo
  2023-11-22 16:33                           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-22 16:26 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Mark Rutland, Hector Martin, Marc Zyngier,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

Em Wed, Nov 22, 2023 at 08:04:26AM -0800, Ian Rogers escreveu:
> On Wed, Nov 22, 2023 at 7:49 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Wed, Nov 22, 2023 at 10:06:23AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> > > > On 2023/11/22 1:38, Ian Rogers wrote:
> > > > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > > >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> [Adding key people on Cc]
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > >>>>>>>>
> > > > >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > >>>>>>>> the PMU, but nothing works anymore.
> > > > >>>>>>>>
> > > > >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > >>>>>>>> package, but that's obviously not going to last.
> > > > >>>>>>>>
> > > > >>>>>>>> I'm happy to test potential fixes.
> > > > >>>>>>>
> > > > >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > >>>>>>> CPU):
> > > > >>>>>>
> > > > >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> > > > >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > >>>>>> that ${pmu}'s type and event namespace.
> > > > >>>>>>
> > > > >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > >>>>>> targetted to a specific PMU, it's semantically wrong to rewrite events like
> > > > >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > >>>>>> PERF_COUNT_HW_${EVENT}.
> > > > >>>>>
> > > > >>>>> If you name a PMU and an event then the event should only be opened on
> > > > >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > >>>>> cycles event is opened it appears to be because it was explicitly
> > > > >>>>> requested.
> > > > >>>>
> > > > >>>> I think you've missed that the named PMU events are being erroneously transformed
> > > > >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > > >>>>
> > > > >>>>   Opening: apple_firestorm_pmu/cycles/
> > > > >>>>   ------------------------------------------------------------
> > > > >>>>   perf_event_attr:
> > > > >>>>     type                             0 (PERF_TYPE_HARDWARE)
> > > > >>>>     size                             136
> > > > >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > > >>>>     sample_type                      IDENTIFIER
> > > > >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > > >>>>     disabled                         1
> > > > >>>>     inherit                          1
> > > > >>>>     enable_on_exec                   1
> > > > >>>>     exclude_guest                    1
> > > > >>>>   ------------------------------------------------------------
> > > > >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > >>>>
> > > > >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > > >>>>
> > > > >>>> Marc said that he bisected the issue down to commit:
> > > > >>>>
> > > > >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > > >>>>
> > > > >>>> ... so it looks like something is going wrong when the events are being parsed,
> > > > >>>> e.g. losing the HW PMU information?
> > > > >>>
> > > > >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> > > > >>>
> > > > >>> I think it may be working as intended, but not how you intended :-) If
> > > > >>> a core PMU is listed and then a legacy event, the legacy event should
> > >
> > > The point is that "cycles" when prefixed with "pmu/" shouldn't be
> > > considered "cycles" as HW/0, in that setting it is "cycles" for that
> > > PMU.
> >
> > Exactly.
> >
> > > (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> > > have access in a Libre Computer rockchip 3399-pc hybrid board, if we use
> > > it, then we get what we want/had before, see below):
> >
> > Both Cortex-A53 and Cortex-A72 have the common PMUv3 events, so they have
> > "cpu_cycles" and "bus_cycles".
> >
> > > The Apple PMUs that Hector and Marc are using don't follow the PMUv3
> > architecture, and just have a "cycles" event.
> >
> > [...]
> >
> > > So what we need here seems to be to translate the generic term "cycles"
> > > > to "cpu_cycles" when a PMU is explicitly passed in the event name and
> > > it doesn't have "cycles" and then just retry.
> >
> > I'm not sure we need to map that.
> >
> > My thinking is:
> >
> > * If the user asks for "cycles" without a PMU name, that should use the
> >   PERF_TYPE_HARDWARE cycles event. The ARM PMUs handle that correctly when the
> >   event is directed to them.
> >
> > * If the user asks for "${pmu}/cycles/", that should only use the "cycles"
> >   event in that PMU's namespace, not PERF_TYPE_HARDWARE.
> >
> > > * If we need a way to say "use the PERF_TYPE_HARDWARE cycles event on ${pmu}",
> >   then we should have a new syntax for that (e.g. as we have for raw events),
> >   e.g. it would be possible to have "pmu/hw:cycles/" or something like that.
> >
> > That way there's no ambiguity.
> 
> This would break cpu_core/LLC-load-misses/ on Intel hybrid as the
> LLC-load-misses event is legacy and not advertised in either sysfs or
> in json.

Indeed:

[root@quaco ~]# ls /sys/devices/cpu/events/
branch-instructions  bus-cycles    cache-references  instructions  mem-stores  topdown-fetch-bubbles     topdown-recovery-bubbles.scale  topdown-slots-retired  topdown-total-slots.scale
branch-misses        cache-misses  cpu-cycles        mem-loads     ref-cycles  topdown-recovery-bubbles  topdown-slots-issued            topdown-total-slots
[root@quaco ~]# strace -e perf_event_open perf stat -e cpu/LLC-load-misses/ echo
perf_event_open({type=PERF_TYPE_HW_CACHE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_CACHE_RESULT_MISS<<16|PERF_COUNT_HW_CACHE_OP_READ<<8|PERF_COUNT_HW_CACHE_LL, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 41467, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=41467, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---

 Performance counter stats for 'echo':

             1,015      cpu/LLC-load-misses/

       0.005167119 seconds time elapsed

       0.000821000 seconds user
       0.004105000 seconds sys


--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=41466, si_uid=0} ---
+++ exited with 0 +++
[root@quaco ~]#
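
For reference, the config value in the strace line above can be reproduced with a few lines of Python. The constants are the perf ABI enum values; the `hw_cache_config` helper name is only illustrative.

```python
PERF_TYPE_HW_CACHE = 3

PERF_COUNT_HW_CACHE_LL = 2           # last-level cache
PERF_COUNT_HW_CACHE_OP_READ = 0      # read accesses
PERF_COUNT_HW_CACHE_RESULT_MISS = 1  # misses


def hw_cache_config(cache_id, op_id, result_id):
    """Pack a PERF_TYPE_HW_CACHE config: id | op << 8 | result << 16."""
    return cache_id | (op_id << 8) | (result_id << 16)


config = hw_cache_config(
    PERF_COUNT_HW_CACHE_LL,
    PERF_COUNT_HW_CACHE_OP_READ,
    PERF_COUNT_HW_CACHE_RESULT_MISS,
)
print(PERF_TYPE_HW_CACHE, hex(config))  # prints: 3 0x10002
```

No sysfs event file is involved here, which is why cpu/LLC-load-misses/ only works through the legacy path.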

Would it be difficult, before doing the current expansion to
PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES, to just check whether the
specified PMU has an event with the specified name and, if it does, use that?
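
A minimal Python sketch of that lookup order, under stated assumptions: `resolve_named` and the PMU tables are hypothetical, the PMU types and event codes are placeholders, and the legacy fallback puts the PMU type in the upper 32 bits of config as in the `0x7<<32|...` trace earlier in the thread. This is not how perf's parser is structured.

```python
PERF_TYPE_HARDWARE = 0
PERF_TYPE_HW_CACHE = 3
PERF_PMU_TYPE_SHIFT = 32

# Legacy names known to the tool: name -> (type, config).
LEGACY = {
    "cycles": (PERF_TYPE_HARDWARE, 0),
    "LLC-load-misses": (PERF_TYPE_HW_CACHE, 0x10002),
}

# Hypothetical PMUs; types and event codes are placeholders.
APPLE = {"type": 7, "events": {"cycles": 0x02}}
CPU_CORE = {"type": 4, "events": {"cpu-cycles": 0x3C}}


def resolve_named(pmu, event):
    """Resolve 'pmu/event/': the PMU's own namespace first, legacy second."""
    if event in pmu["events"]:
        # The PMU advertises this name itself, so use its own encoding.
        return pmu["type"], pmu["events"][event]
    if event in LEGACY:
        # Fall back to the legacy event, targeted at this PMU.
        legacy_type, legacy_id = LEGACY[event]
        return legacy_type, (pmu["type"] << PERF_PMU_TYPE_SHIFT) | legacy_id
    raise LookupError(f"unknown event {event!r}")


print(resolve_named(APPLE, "cycles"))
print(resolve_named(CPU_CORE, "LLC-load-misses"))
```

With this order, apple_icestorm_pmu/cycles/ stays in the PMU's namespace while cpu_core/LLC-load-misses/ still resolves through the legacy table.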

- Arnaldo


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:08                     ` Mark Rutland
@ 2023-11-22 16:29                       ` Ian Rogers
  2023-11-22 16:55                         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 16:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Wed, Nov 22, 2023 at 8:08 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Nov 22, 2023 at 07:29:34AM -0800, Ian Rogers wrote:
> > On Wed, Nov 22, 2023 at 5:04 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Tue, Nov 21, 2023 at 08:38:45AM -0800, Ian Rogers wrote:
> > > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > > > > On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > > On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > > > > > On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > > > > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > > > > > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > > > > > > Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > [Adding key people on Cc]
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > > > > > > > Hector Martin <marcan@marcan.st> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > > > > > >
> > > > > > > > > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > > > > > > > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > > > > > > > the PMU, but nothing works anymore.
> > > > > > > > > > >
> > > > > > > > > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > > > > > > > package, but that's obviously not going to last.
> > > > > > > > > > >
> > > > > > > > > > > I'm happy to test potential fixes.
> > > > > > > > > >
> > > > > > > > > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > > > > > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > > > > > > CPU):
> > > > > > > > >
> > > > > > > > > IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > > > > > ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > > > > > that ${pmu}'s type and event namespace.
> > > > > > > > >
> > > > > > > > > Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > > > > > > targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > > > > > this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > > > > > PERF_COUNT_HW_${EVENT}.
> > > > > > > >
> > > > > > > > If you name a PMU and an event then the event should only be opened on
> > > > > > > > that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > > > > > cycles event is opened it appears to be because it was explicitly
> > > > > > > > requested.
> > > > > > >
> > > > > > > > I think you've missed that the named PMU events are being erroneously transformed
> > > > > > > into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > > > > >
> > > > > > >   Opening: apple_firestorm_pmu/cycles/
> > > > > > >   ------------------------------------------------------------
> > > > > > >   perf_event_attr:
> > > > > > >     type                             0 (PERF_TYPE_HARDWARE)
> > > > > > >     size                             136
> > > > > > >     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > > > > >     sample_type                      IDENTIFIER
> > > > > > >     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > > > > >     disabled                         1
> > > > > > >     inherit                          1
> > > > > > >     enable_on_exec                   1
> > > > > > >     exclude_guest                    1
> > > > > > >   ------------------------------------------------------------
> > > > > > >   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > > > >
> > > > > > > ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > > > > >
> > > > > > > Marc said that he bisected the issue down to commit:
> > > > > > >
> > > > > > >   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > > > > >
> > > > > > > ... so it looks like something is going wrong when the events are being parsed,
> > > > > > > e.g. losing the HW PMU information?
> > > > > >
> > > > > > Ok, I think I'm getting confused by other things. This looks like the issue.
> > > > > >
> > > > > > I think it may be working as intended, but not how you intended :-) If
> > > > > > a core PMU is listed and then a legacy event, the legacy event should
> > > > > > be opened on the core PMU as a legacy event with the extended type
> > > > > > set. This is to allow things like legacy cache events to be opened on
> > > > > > a specified PMU. Legacy event names match with a higher priority than
> > > > > > those in sysfs or json as they are hard coded.
> > > > >
> > > > > That has never been the case previously, so this is user-visible breakage, and
> > > > > it prevents users from being able to do the right thing, so I think that's a
> > > > > broken design.
> > > >
> > > > So the problem was caused by ARM and Intel doing two different things.
> > > > Intel did at least contribute to the perf tool in support for their
> > > > BIG.little/hybrid, so that's why the semantics match their approach.
> > >
> > > I appreciate that, and I agree that from the Arm side we haven't been as
> > > engaged with userspace on this front (please understand I'm the messenger here,
> > > this is something I've repeatedly asked for within Arm).
> > >
> > > Regardless, I don't think that changes the substance of the bug, which is that
> > > we're converting named-pmu events into entirely different PERF_TYPE_HARDWARE
> > > events.
> > >
> > > I agree that expanding plain legacy event names to a set of PMU-targeted legacy
> > > events makes sense (and even for Arm, that's the right thing to do, IMO). If
> > > I ask for 'cycles' and that gets expanded to multiple legacy cycles events that
> > > target specific CPU PMUs, that's good.
> > >
> > > The thing that doesn't make sense here is converting named-pmu events into
> > > legacy events. If I ask for 'apple_firestorm_pmu/cycles/', that should be the
> > > 'cycles' event in the apple_firestorm_pmu's event namespace, and *shouldn't* be
> > > converted to a (potentially semantically different) PERF_TYPE_HARDWARE event,
> > > even if that's targeted towards the apple_firestorm_pmu. I think that should
> > > be true for *any* PMU, whether that's an arm/x86/whatever CPU PMU or a system
> > > PMU.
> >
> > This is saying that legacy events have lower priority than sysfs events. We
> > haven't done this historically, as it requires extra PMU setup. On an
> > Intel Tigerlake:
> >
> > ```
> > $ ls /sys/devices/cpu/events
> > branch-instructions  cache-misses      instructions  ref-cycles        topdown-be-bound
> > branch-misses        cache-references  mem-loads     slots             topdown-fe-bound
> > bus-cycles           cpu-cycles        mem-stores    topdown-bad-spec  topdown-retiring
> > ```
> > here (at least) branch-misses, bus-cycles, cache-references,
> > cpu-cycles and instructions overlap with legacy event names
> > ```
> > $ perf --version
> > perf version 6.5.6
> > $ perf stat -vv -e branch-misses,bus-cycles,cache-references,cpu-cycles,instructions true
>
> Here you *aren't* using a named PMU. As I said before, using the
> PERF_TYPE_HARDWARE events in this case is entirely fine, it's just the
> ${pmu}/${eventname}/ case that I'm saying should use the PMU's namespace,
> which was historically the case, and is what users are depending upon.
>
> i.e.
>
>         perf stat -e cycles ./workload
>
> ... can/should use PERF_TYPE_HARDWARE events, as it used to
>
> However:
>
>         perf stat -e ${pmu}/cycles/ ./workload
>
> ... should use the PMU's namespaced events, as it used to
>
> > Using CPUID GenuineIntel-6-8D-1
> > intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >  type                             0 (PERF_TYPE_HARDWARE)
> >  size                             136
> >  config                           0x5 (PERF_COUNT_HW_BRANCH_MISSES)
> > ...
> > ------------------------------------------------------------
> > perf_event_attr:
> >  type                             0 (PERF_TYPE_HARDWARE)
> >  size                             136
> >  config                           0x6 (PERF_COUNT_HW_BUS_CYCLES)
> > ...
> > ------------------------------------------------------------
> > perf_event_attr:
> >  type                             0 (PERF_TYPE_HARDWARE)
> >  size                             136
> >  config                           0x2 (PERF_COUNT_HW_CACHE_REFERENCES)
> > ...
> > ------------------------------------------------------------
> > perf_event_attr:
> >  type                             0 (PERF_TYPE_HARDWARE)
> >  size                             136
> >  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > ...
> > ------------------------------------------------------------
> > perf_event_attr:
> >  type                             0 (PERF_TYPE_HARDWARE)
> >  size                             136
> >  config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
> > ...
> > branch-misses: -1: 6571 826226 826226
> > bus-cycles: -1: 31411 826226 826226
> > cache-references: -1: 19507 826226 826226
> > cpu-cycles: -1: 1127215 826226 826226
> > instructions: -1: 1301583 826226 826226
> > branch-misses: 6571 826226 826226
> > bus-cycles: 31411 826226 826226
> > cache-references: 19507 826226 826226
> > cpu-cycles: 1127215 826226 826226
> > instructions: 1301583 826226 826226
> >
> > Performance counter stats for 'true':
> > ...
> > ```
> > i.e. with perf 6.5, even though sysfs has these events, we're opening all
> > of them with PERF_TYPE_HARDWARE.
>
> As above, this is a different case.
>
> >
> > > > > > Presumably the expectation was that by advertising a cycles event, presumably
> > > > > > in sysfs, then this is what would be matched.
> > >
> > > Yes. That's how this has always worked prior to the changes Marc referenced.
> > > Note that this can *also* be expanded to events from json databases, but was
> > > *never* previously silently converted to a PERF_TYPE_HARDWARE event.
> > >
> > > Please note that the events in sysfs are *namespaced* to the PMU (specifically,
> > > when using that PMU's dynamic type); they are not necessarily the same as
> > > legacy events (though they may have similar or matching
> > > names in some cases), they may be semantically distinct from the legacy events
> > > even if the names match, and it is incorrect to conflate the two.
> >
> > This was a behavior added by Intel so that say cpu_atom/legacy-event/
> > would only open as a hardware event on that PMU. The point of the
> > blamed change is to make that behavior consistent for all core PMUs.
>
> Ok, so Intel has an intel-specific behaviour change, which was ok for them.
>
> That was made generic, but caused a functional regression on arm (and possibly
> other architectures if anyone else cares about the namespaced events).
>
> Why can't this be returned to being x86 specific?
>
> > > > > I expect that if I ask for ${pmu}/${event}/, that PMU is used, and the event
> > > > > *in that PMU's namespace* is used. Overriding that breaks long-established
> > > > > practice and provides users with no recourse to get the behaviour they expect
> > > > > (and previously had).
> > > >
> > > > On ARM but not Intel.
> > >
> > > As above, I don't think the CPU architecture matters here for the case that I'm
> > > saying is broken. I think that regardless of CPU architecture (or for any
> > > non-CPU PMU) it is semantically incorrect to convert a named-pmu event to a
> > > legacy event.
> >
> > So perf's behavior has always been that legacy event priority is
> > greater-than sysfs and json. The distinction here is that a core PMU
> > is explicitly listed and it doesn't seem unreasonable to use core PMU
> > names with legacy events, the behavior Intel added.
>
> That may be ok for Intel, but given it *is* causing functional problems for
> others, why must it remain generic?
>
> > > > > I do think that (regardless of whether this was the semantic you intended)
> > > > > silently overriding events with legacy events is a bug, and one we should fix.
> > > > > As I mentioned in another reply, just because the events have the same name
> > > > > does not mean that they are semantically the same, so we're liable to give
> > > > > people the wrong numbers anyhow.
> > > > >
> > > > > Can we fix this?
> > > >
> > > > So I'd like to fix this, some things from various conversations:
> > > >
> > > > 1) we lack testing. Our testing relies on the sysfs of the machine
> > > > being run on, which is better than nothing. I think ideally we'd have
> > > > a collection of zipped up sysfs directories and then we could have a
> > > > test that asserts on ARM you get the behavior you want.
> > >
> > > I agree we lack testing, and I'd be happy to help here going forwards, though I
> > > don't think this is a prerequisite for fixing this issue.
> > >
> > > > 2) for RISC-V they want to make the legacy event matching something in
> > > > user land to simplify the PMU driver.
> > >
> > > Ok; I see how this might be related, but it doesn't sound like a prerequisite
> > > for fixing this issue -- there are plenty of people in this thread who can
> > > test.
> > >
> > > > 3) I'd like to get rid of the PMU json interface. My idea is to
> > > > convert json events/metrics into sysfs style files, zip these up and
> > > > then link them into the perf binary. On Intel the json is 70% of the
> > > > binary (7MB out of 10MB) and we may get this down to 3MB with this
> > > > approach. The json lookup would need to incorporate the cpuid matching
> > > > that currently exists. When we look up an event I'd like the approach
> > > > to be like unionfs with a specified but configurable order. Users
> > > > could provide directories of their own events/metrics for various
> > > > PMUs, and then this approach could be used to help with (1).
> > >
> > > I can see how that might interact with whatever changes we make to fix this
> > > issue, but this seems like a future aspiration, and not a prerequisite for
> > > fixing the existing functional regression.
> > >
> > > > Those proposals are not something to add as a -rc fix, so what I think
> > > > you're asking for here is a "if ARM" fix somewhere in the event
> > > > parsing. That's of course possible but it will cause problems if you
> > > > did say:
> > > >
> > > > perf stat -e arm_pmu/LLC-load-misses/ ...
> > >
> > > As above, I do not think this is an arm-specific issue, we're just the canary
> > > in the coalmine.
> >
> > Disagree, see comments above. A behavior change here would impact Intel.
>
> Ok, so have Intel keep the Intel behaviour?
>
> > > Please note that:
> > >
> > >         perf stat -e arm_pmu/LLC-load-misses/ ...
> > >
> > > ... would never have worked previously. No arm_pmu instances have a
> > > "LLC-load-misses" event in their event namespaces, and we don't have any
> > > userspace file mapping that event.
> >
> > This event was for the purpose of giving an example, perf list will
> > show you events that work. The point is that a legacy event may not be
> > available on both BIG.little PMU types so being able to designate the
> > PMU there is helpful.
>
> Sure, but (as per my reply to Arnaldo), it's possible to add an unambiguous way
> to specify that, e.g a 'hw:' prefix like:
>
>         some_arm_pmu/hw:LLC-load-misses/
>
> ... which wouldn't clash and cause the regression that users are seeing.
>
> > > That said, If I really wanted that legacy event, I'd have asked for it bare,
> > > e.g.
> > >
> > >         perf stat -e LLC-load-misses
> > >
> > > ... and we're in agreement that it's sensible to expand this to multiple
> > > PERF_TYPE_HARDWARE events targeting the individual CPU PMUs.
> > >
> > > So I see no need to do anything to have magic for 'arm_pmu/LLC-load-misses/'.
> > >
> > > > as I doubt the PMU driver is advertising this legacy event in sysfs
> > > > and the "if ARM" logic would presumably be trying to disable legacy
> > > > events in the term list for the ARM PMU.
> > > >
> > > > Given all of this, is anything actually broken and needing a fix for 6.7?
> > >
> > > There is absolutely a bug that needs to be fixed here (and needs to be
> > > backported to stable so that it gets picked up by distributions).
> >
> > I'm not seeing this. The behavior is consistent with Intel, this has
> > gone 2 releases without being spotted,
>
> This has gone two releases because people have just updated their tools. The
> prior behaviour for Arm has been there for most of a decade.
>
> > it was triggered by a PMU event
> > name aliasing a legacy event name and the behavior has always been
> > legacy event names have higher priority than sysfs and json events.
>
> That has been the case for plain events without a PMU name. That was never the
> case for events with a PMU name, or there would not have been any difference in
> behaviour.
>
> > Whilst I'm seeing a lot of complaining, I've not seen a proposal of
> > what behavior you want.
>
> As per my initial reply, the behaviour we want is that:
>
>   pmu/eventname/
>
> ... opens 'eventname' in that PMU's event namespace, rather than converting the
> event into a PERF_TYPE_HARDWARE event. That was the prior behaviour, which
> people have been using for most of a decade.
>
> I understand that there was some Intel-specific behaviour, and that may need to
> be kept for Intel. Making that behaviour generic broke other existing users.
>
> If we need a mechanism to target a legacy event to a specific PMU, we can add
> an unambiguous way of describing that (e.g. the 'hw:' prefix I've suggested a
> few times).
>
>
> > Isn't it a PMU bug if the legacy event specifying the PMU doesn't get opened
> > by the core PMU?
>
> No?
>
> Prior to that mechanism being added to the kernel, there was no way to do that.
>
> When the mechanism was added to x86 specifically, it wasn't a generic feature.
>
> > Fixing the PMU driver appears to be the right fix and means there is
> > consistency on core events across architectures.
>
> I think that's orthogonal.
>
> Adding support to the PMU drivers (which has already been done, per the commit
> you quoted before) is good so that userspace can do the right thing for:
>
>         perf stat -e some_generic_event ./workload
>
> ... but that should not be necessary to retain the existing behaviour for:
>
>         perf stat -e pmu/some_similarly_named_event/ ./workload
>
> Thanks,
> Mark.

Given the PMU mapping exists, what is the difficulty in the case of
this PMU? I could explain what I see on ARMv8 devices and the broken
PMU landscape from the last 10 years but that hardly feels
constructive here. I'm not understanding the difficulty of
translating:

struct perf_event_attr {
...
 .type = PERF_TYPE_HARDWARE,
 .config = <pmu's type> << 32 | PERF_COUNT_HW_CPU_CYCLES,
...
}

to the event called "cycles" that the PMU is advertising, given that the
mapping already has to exist for every core PMU driver?
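
For reference, here is a minimal sketch of the encoding above, assuming
the extended-type layout where the PMU's dynamic type sits in the upper
32 bits of config. The constants are local copies (mirroring
PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES and PERF_PMU_TYPE_SHIFT) so
that it builds without kernel headers; struct sketch_attr and
sketch_legacy_on_pmu() are made-up names for illustration, not perf code.

```c
#include <stdint.h>

/* Local copies of the uapi values, so this builds without kernel headers. */
#define SKETCH_PERF_TYPE_HARDWARE        0u
#define SKETCH_PERF_COUNT_HW_CPU_CYCLES  0u
#define SKETCH_PERF_PMU_TYPE_SHIFT       32

/* Hypothetical cut-down stand-in for struct perf_event_attr. */
struct sketch_attr {
	uint32_t type;
	uint64_t config;
};

/* Encode "legacy hardware event hw_id, restricted to the PMU with pmu_type". */
static struct sketch_attr sketch_legacy_on_pmu(uint32_t pmu_type, uint64_t hw_id)
{
	struct sketch_attr attr;

	attr.type = SKETCH_PERF_TYPE_HARDWARE;
	attr.config = ((uint64_t)pmu_type << SKETCH_PERF_PMU_TYPE_SHIFT) | hw_id;
	return attr;
}
```

With a PMU whose dynamic type happens to be 7, asking for cycles this way
gives type 0 and config 0x700000000. The PMU driver then has to map that
legacy id onto whatever it calls its own cycles event.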

I can look at doing an event parser change like:

```
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aa2f5c6fc7fc..9a18fda525d2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
                                                          err_str,
/*help=*/NULL);
                       return -EINVAL;
               }
-               if (perf_pmu__supports_legacy_cache(pmu)) {
+               if (perf_pmu__supports_legacy_cache(pmu) &&
+                   !perf_pmu__have_event(pmu, term->val.str)) {
                       attr->type = PERF_TYPE_HW_CACHE;
                       return
parse_events__decode_legacy_cache(term->config, pmu->type,
                                                                &attr->config);
@@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
                                                          err_str,
/*help=*/NULL);
                       return -EINVAL;
               }
-               attr->type = PERF_TYPE_HARDWARE;
-               attr->config = term->val.num;
-               if (perf_pmus__supports_extended_type())
-                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+               if (perf_pmu__have_event(pmu, term->val.str)) {
+                       /* If the PMU has a sysfs or json event prefer
it over legacy. ARM requires this. */
+                       term->term_type = PARSE_EVENTS__TERM_TYPE_USER;
+               } else {
+                       attr->type = PERF_TYPE_HARDWARE;
+                       attr->config = term->val.num;
+                       if (perf_pmus__supports_extended_type())
+                               attr->config |= (__u64)pmu->type <<
PERF_PMU_TYPE_SHIFT;
+               }
               return 0;
       }
       if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
```
(note: this is incomplete as term->val.str isn't populated for
PARSE_EVENTS__TERM_TYPE_HARDWARE)

but this is a behavioral change on Intel and shouldn't therefore come
in as an rc fix.
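
Stripped of the parser details, the ordering the diff is aiming for can
be sketched as below. This is only a model under stated assumptions:
struct sketch_pmu, the event list and sketch_resolve() are hypothetical
and stand in for the real PMU lookup (perf_pmu__have_event() in the
diff), which consults both sysfs and json.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a PMU: a name plus the event names it advertises. */
struct sketch_pmu {
	const char *name;
	const char *const *events;	/* NULL-terminated list */
};

enum sketch_encoding {
	SKETCH_PMU_EVENT,	/* use the PMU's own type and event namespace */
	SKETCH_LEGACY_EVENT,	/* fall back to the legacy hardware/cache encoding */
};

/* Return 1 if the PMU advertises an event with exactly this name. */
static int sketch_pmu_has_event(const struct sketch_pmu *pmu, const char *name)
{
	for (size_t i = 0; pmu->events[i] != NULL; i++) {
		if (strcmp(pmu->events[i], name) == 0)
			return 1;
	}
	return 0;
}

/* For pmu/name/: prefer the PMU's own event, otherwise treat name as legacy. */
static enum sketch_encoding sketch_resolve(const struct sketch_pmu *pmu,
					   const char *name)
{
	return sketch_pmu_has_event(pmu, name) ? SKETCH_PMU_EVENT
					       : SKETCH_LEGACY_EVENT;
}
```

Under this model apple_firestorm_pmu/cycles/ resolves to the PMU's own
event, while a name the PMU does not advertise (LLC-load-misses, say)
still falls through to the legacy encoding.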

Thanks,
Ian

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:26                         ` Arnaldo Carvalho de Melo
@ 2023-11-22 16:33                           ` Ian Rogers
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Hector Martin, Marc Zyngier,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

On Wed, Nov 22, 2023 at 8:26 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Nov 22, 2023 at 08:04:26AM -0800, Ian Rogers escreveu:
> > On Wed, Nov 22, 2023 at 7:49 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Wed, Nov 22, 2023 at 10:06:23AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Wed, Nov 22, 2023 at 12:23:27PM +0900, Hector Martin escreveu:
> > > > > On 2023/11/22 1:38, Ian Rogers wrote:
> > > > > > On Tue, Nov 21, 2023 at 8:15 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >> On Tue, Nov 21, 2023 at 08:09:37AM -0800, Ian Rogers wrote:
> > > > > >>> On Tue, Nov 21, 2023 at 8:03 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >>>> On Tue, Nov 21, 2023 at 07:46:57AM -0800, Ian Rogers wrote:
> > > > > >>>>> On Tue, Nov 21, 2023 at 7:40 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >>>>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > > > >>>>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
> > > > > >>>>>>> Marc Zyngier <maz@kernel.org> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> [Adding key people on Cc]
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > > >>>>>>>> Hector Martin <marcan@marcan.st> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > > >>>>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
> > > > > >>>>>>>> the PMU, but nothing works anymore.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > > >>>>>>>> package, but that's obviously not going to last.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I'm happy to test potential fixes.
> > > > > >>>>>>>
> > > > > >>>>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > > > >>>>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > > > >>>>>>> CPU):
> > > > > >>>>>>
> > > > > >>>>>> IIUC the tool is doing the wrong thing here and overriding explicit
> > > > > >>>>>> ${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
> > > > > >>>>>> that ${pmu}'s type and event namespace.
> > > > > >>>>>>
> > > > > >>>>>> Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
> > > > > >>>>>> targeted to a specific PMU, it's semantically wrong to rewrite events like
> > > > > >>>>>> this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
> > > > > >>>>>> PERF_COUNT_HW_${EVENT}.
> > > > > >>>>>
> > > > > >>>>> If you name a PMU and an event then the event should only be opened on
> > > > > >>>>> that PMU, 100% agree. There's a bunch of output, but when the legacy
> > > > > >>>>> cycles event is opened it appears to be because it was explicitly
> > > > > >>>>> requested.
> > > > > >>>>
> > > > > >>>> I think you've missed that the named PMU events are being erroneously transformed
> > > > > >>>> into PERF_TYPE_HARDWARE events. Look at the -vvv output, e.g.
> > > > > >>>>
> > > > > >>>>   Opening: apple_firestorm_pmu/cycles/
> > > > > >>>>   ------------------------------------------------------------
> > > > > >>>>   perf_event_attr:
> > > > > >>>>     type                             0 (PERF_TYPE_HARDWARE)
> > > > > >>>>     size                             136
> > > > > >>>>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > > > >>>>     sample_type                      IDENTIFIER
> > > > > >>>>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > > > >>>>     disabled                         1
> > > > > >>>>     inherit                          1
> > > > > >>>>     enable_on_exec                   1
> > > > > >>>>     exclude_guest                    1
> > > > > >>>>   ------------------------------------------------------------
> > > > > >>>>   sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > > > >>>>
> > > > > >>>> ... which should not be PERF_TYPE_HARDWARE && PERF_COUNT_HW_CPU_CYCLES.
> > > > > >>>>
> > > > > >>>> Marc said that he bisected the issue down to commit:
> > > > > >>>>
> > > > > >>>>   5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
> > > > > >>>>
> > > > > >>>> ... so it looks like something is going wrong when the events are being parsed,
> > > > > >>>> e.g. losing the HW PMU information?
> > > > > >>>
> > > > > >>> Ok, I think I'm getting confused by other things. This looks like the issue.
> > > > > >>>
> > > > > >>> I think it may be working as intended, but not how you intended :-) If
> > > > > >>> a core PMU is listed and then a legacy event, the legacy event should
> > > >
> > > > The point is that "cycles" when prefixed with "pmu/" shouldn't be
> > > > considered "cycles" as HW/0, in that setting it is "cycles" for that
> > > > PMU.
> > >
> > > Exactly.
> > >
> > > > (but we only have "cpu_cycles" for at least the a53 and a72 PMUs I
> > > > have access to on a Libre Computer rockchip 3399-pc hybrid board; if we use
> > > > it, then we get what we want/had before, see below):
> > >
> > > Both Cortex-A53 and Cortex-A72 have the common PMUv3 events, so they have
> > > "cpu_cycles" and "bus_cycles".
> > >
> > > The Apple PMUs that Hector and Marc are using don't follow the PMUv3
> > > architecture, and just have a "cycles" event.
> > >
> > > [...]
> > >
> > > > So what we need here seems to be to translate the generic term "cycles"
> > > > to "cpu_cycles" when a PMU is explicitly passed in the event name and
> > > > it doesn't have "cycles" and then just retry.
> > >
> > > I'm not sure we need to map that.
> > >
> > > My thinking is:
> > >
> > > * If the user asks for "cycles" without a PMU name, that should use the
> > >   PERF_TYPE_HARDWARE cycles event. The ARM PMUs handle that correctly when the
> > >   event is directed to them.
> > >
> > > * If the user asks for "${pmu}/cycles/", that should only use the "cycles"
> > >   event in that PMU's namespace, not PERF_TYPE_HARDWARE.
> > >
> > > * If we need a way to say "use the PERF_TYPE_HARDWARE cycles event on ${pmu}",
> > >   then we should have a new syntax for that (e.g. as we have for raw events),
> > >   e.g. it would be possible to have "pmu/hw:cycles/" or something like that.
> > >
> > > That way there's no ambiguity.
> >
> > This would break cpu_core/LLC-load-misses/ on Intel hybrid as the
> > LLC-load-misses event is legacy and not advertised in either sysfs or
> > in json.
>
> Indeed:
>
> [root@quaco ~]# ls /sys/devices/cpu/events/
> branch-instructions  bus-cycles    cache-references  instructions  mem-stores  topdown-fetch-bubbles     topdown-recovery-bubbles.scale  topdown-slots-retired  topdown-total-slots.scale
> branch-misses        cache-misses  cpu-cycles        mem-loads     ref-cycles  topdown-recovery-bubbles  topdown-slots-issued            topdown-total-slots
> [root@quaco ~]# strace -e perf_event_open perf stat -e cpu/LLC-load-misses/ echo
> perf_event_open({type=PERF_TYPE_HW_CACHE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_CACHE_RESULT_MISS<<16|PERF_COUNT_HW_CACHE_OP_READ<<8|PERF_COUNT_HW_CACHE_LL, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 41467, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=41467, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
>  Performance counter stats for 'echo':
>
>              1,015      cpu/LLC-load-misses/
>
>        0.005167119 seconds time elapsed
>
>        0.000821000 seconds user
>        0.004105000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=41466, si_uid=0} ---
> +++ exited with 0 +++
> [root@quaco ~]#
>
> Is it difficult, before doing the current expansion to
> PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES, to just check whether the
> specified PMU has an event with the specified name and, if so, use that?

Agreed and I've sent an early cut of this. The issue is that then we
end up changing the encoding on Intel. I also don't see why ARM
doesn't just fix their PMU.
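
For the sysfs half of that check, a rough standalone sketch looks like
this. It assumes the usual /sys/bus/event_source/devices/<pmu>/events/
layout and ignores json events entirely; the helper names are made up
for illustration and are not what the perf tool uses.

```c
#include <stdio.h>
#include <unistd.h>

/* Build the sysfs path of a PMU event alias; 0 on success, -1 if it did not fit. */
static int sketch_event_path(char *buf, size_t len, const char *pmu,
			     const char *event)
{
	int n = snprintf(buf, len, "/sys/bus/event_source/devices/%s/events/%s",
			 pmu, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if the running kernel exposes the alias in sysfs, 0 otherwise. */
static int sketch_sysfs_has_event(const char *pmu, const char *event)
{
	char path[256];

	if (sketch_event_path(path, sizeof(path), pmu, event) < 0)
		return 0;
	return access(path, F_OK) == 0;
}
```

On the Apple machines sketch_sysfs_has_event("apple_firestorm_pmu",
"cycles") would be expected to return 1, going by what Mark says about
those PMUs having a "cycles" event.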

Thanks,
Ian

> - Arnaldo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:29                       ` Ian Rogers
@ 2023-11-22 16:55                         ` Arnaldo Carvalho de Melo
  2023-11-22 16:59                           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-22 16:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Mark Rutland, Marc Zyngier, Hector Martin,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

Em Wed, Nov 22, 2023 at 08:29:58AM -0800, Ian Rogers escreveu:
> I can look at doing an event parser change like:
> 
> ```
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index aa2f5c6fc7fc..9a18fda525d2 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                                                           err_str,
> /*help=*/NULL);
>                        return -EINVAL;
>                }
> -               if (perf_pmu__supports_legacy_cache(pmu)) {
> +               if (perf_pmu__supports_legacy_cache(pmu) &&
> +                   !perf_pmu__have_event(pmu, term->val.str)) {
>                        attr->type = PERF_TYPE_HW_CACHE;
>                        return
> parse_events__decode_legacy_cache(term->config, pmu->type,
>                                                                 &attr->config);
> @@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                                                           err_str,
> /*help=*/NULL);
>                        return -EINVAL;
>                }
> -               attr->type = PERF_TYPE_HARDWARE;
> -               attr->config = term->val.num;
> -               if (perf_pmus__supports_extended_type())
> -                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> +               if (perf_pmu__have_event(pmu, term->val.str)) {
> +                       /* If the PMU has a sysfs or json event prefer
> it over legacy. ARM requires this. */
> +                       term->term_type = PARSE_EVENTS__TERM_TYPE_USER;
> +               } else {
> +                       attr->type = PERF_TYPE_HARDWARE;
> +                       attr->config = term->val.num;
> +                       if (perf_pmus__supports_extended_type())
> +                               attr->config |= (__u64)pmu->type <<
> PERF_PMU_TYPE_SHIFT;
> +               }
>                return 0;
>        }
>        if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
> ```
> (note: this is incomplete as term->val.str isn't populated for
> PARSE_EVENTS__TERM_TYPE_HARDWARE)

Yeah, I had to apply manually as your MUA mangled it, then it didn't
build, had to remove some consts, then there was a struct member
mistake, after all fixed I get to the patch below, but it now segfaults,
probably what you mention...

root@roc-rk3399-pc:~# strace -e perf_event_open taskset -c 4,5 perf stat -v -e cycles,armv8_cortex_a53/cycles/,armv8_cortex_a72/cycles/ echo
Using CPUID 0x00000000410fd082
perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault
root@roc-rk3399-pc:~#
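
A hedged aside on the likely cause of the si_addr=NULL fault, as a sketch rather than a confirmed diagnosis: as far as I can tell, the term value in tools/perf/util/parse-events.h is a union of a string pointer and a number. For a PARSE_EVENTS__TERM_TYPE_HARDWARE term only the number is set, and for cycles that number is PERF_COUNT_HW_CPU_CYCLES (0), so reading val.str gives a NULL pointer on common ABIs. The type and helper names below are hypothetical stand-ins, not the perf sources:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the value part of a parsed event term. */
union toy_term_val {
	char *str;	/* set for named (sysfs/JSON) terms */
	uint64_t num;	/* set for legacy hardware terms */
};

/* Hypothetical lookup: returns 1 if name matches a known event, 0 otherwise. */
static int toy_have_event(const char *name)
{
	/* Guard against the aliased NULL; an unguarded strcmp() would crash here. */
	if (name == NULL)
		return 0;
	return strcmp(name, "cycles") == 0;
}

/* Builds a hardware term the way the note above describes: only num is set. */
static union toy_term_val toy_hw_term(uint64_t hw_id)
{
	union toy_term_val val;

	memset(&val, 0, sizeof(val));
	val.num = hw_id;
	return val;
}
```

With toy_hw_term(0), the str member reads back as NULL, so a lookup that passes it straight to a string comparison would fault at address 0, which would match the strace output above.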

- Arnaldo

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aa2f5c6fc7fc..1e648454cc49 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -976,7 +976,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
 			   struct parse_events_error *err)
 {
 	if (term->type_term == PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE) {
-		const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
+		struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
 
 		if (!pmu) {
 			char *err_str;
@@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
 							   err_str, /*help=*/NULL);
 			return -EINVAL;
 		}
-		if (perf_pmu__supports_legacy_cache(pmu)) {
+		if (perf_pmu__supports_legacy_cache(pmu) &&
+		    !perf_pmu__have_event(pmu, term->val.str)) {
 			attr->type = PERF_TYPE_HW_CACHE;
 			return parse_events__decode_legacy_cache(term->config, pmu->type,
 								 &attr->config);
@@ -994,7 +995,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
 			term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
 	}
 	if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
-		const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
+		struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
 
 		if (!pmu) {
 			char *err_str;
@@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
 							   err_str, /*help=*/NULL);
 			return -EINVAL;
 		}
-		attr->type = PERF_TYPE_HARDWARE;
-		attr->config = term->val.num;
-		if (perf_pmus__supports_extended_type())
-			attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+		if (perf_pmu__have_event(pmu, term->val.str)) {
+			/* If the PMU has a sysfs or JSON event prefer it over legacy. ARM requires this. */
+			term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
+		} else {
+			attr->type = PERF_TYPE_HARDWARE;
+			attr->config = term->val.num;
+			if (perf_pmus__supports_extended_type())
+			    attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+		}
 		return 0;
 	}
 	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:55                         ` Arnaldo Carvalho de Melo
@ 2023-11-22 16:59                           ` Ian Rogers
  2023-11-23  4:33                             ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-22 16:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Marc Zyngier, Hector Martin,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

On Wed, Nov 22, 2023 at 8:55 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Nov 22, 2023 at 08:29:58AM -0800, Ian Rogers escreveu:
> > I can look at doing an event parser change like:
> >
> > ```
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index aa2f5c6fc7fc..9a18fda525d2 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                                                           err_str,
> > /*help=*/NULL);
> >                        return -EINVAL;
> >                }
> > -               if (perf_pmu__supports_legacy_cache(pmu)) {
> > +               if (perf_pmu__supports_legacy_cache(pmu) &&
> > +                   !perf_pmu__have_event(pmu, term->val.str)) {
> >                        attr->type = PERF_TYPE_HW_CACHE;
> >                        return
> > parse_events__decode_legacy_cache(term->config, pmu->type,
> >                                                                 &attr->config);
> > @@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                                                           err_str,
> > /*help=*/NULL);
> >                        return -EINVAL;
> >                }
> > -               attr->type = PERF_TYPE_HARDWARE;
> > -               attr->config = term->val.num;
> > -               if (perf_pmus__supports_extended_type())
> > -                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > +               if (perf_pmu__have_event(pmu, term->val.str)) {
> > +                       /* If the PMU has a sysfs or json event prefer
> > it over legacy. ARM requires this. */
> > +                       term->term_type = PARSE_EVENTS__TERM_TYPE_USER;
> > +               } else {
> > +                       attr->type = PERF_TYPE_HARDWARE;
> > +                       attr->config = term->val.num;
> > +                       if (perf_pmus__supports_extended_type())
> > +                               attr->config |= (__u64)pmu->type <<
> > PERF_PMU_TYPE_SHIFT;
> > +               }
> >                return 0;
> >        }
> >        if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
> > ```
> > (note: this is incomplete as term->val.str isn't populated for
> > PARSE_EVENTS__TERM_TYPE_HARDWARE)
>
> Yeah, I had to apply manually as your MUA mangled it, then it didn't
> build, had to remove some consts, then there was a struct member
> mistake, after all fixed I get to the patch below, but it now segfaults,
> probably what you mention...
>
> root@roc-rk3399-pc:~# strace -e perf_event_open taskset -c 4,5 perf stat -v -e cycles,armv8_cortex_a53/cycles/,armv8_cortex_a72/cycles/ echo
> Using CPUID 0x00000000410fd082
> perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
> +++ killed by SIGSEGV +++
> Segmentation fault
> root@roc-rk3399-pc:~#

Right, I have something further along that fails tests. I'll try to
send out an RFC today, but given the Intel behavior change ¯\_(ツ)_/¯
But Intel don't appear to have an issue having two things called, for
example, cycles and them both being a cycles event so they may not
care. It is only ARM's PMUs that appear broken in this way.

Thanks,
Ian

> - Arnaldo
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index aa2f5c6fc7fc..1e648454cc49 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -976,7 +976,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                            struct parse_events_error *err)
>  {
>         if (term->type_term == PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE) {
> -               const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> +               struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
>
>                 if (!pmu) {
>                         char *err_str;
> @@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                                                            err_str, /*help=*/NULL);
>                         return -EINVAL;
>                 }
> -               if (perf_pmu__supports_legacy_cache(pmu)) {
> +               if (perf_pmu__supports_legacy_cache(pmu) &&
> +                   !perf_pmu__have_event(pmu, term->val.str)) {
>                         attr->type = PERF_TYPE_HW_CACHE;
>                         return parse_events__decode_legacy_cache(term->config, pmu->type,
>                                                                  &attr->config);
> @@ -994,7 +995,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                         term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
>         }
>         if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
> -               const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> +               struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
>
>                 if (!pmu) {
>                         char *err_str;
> @@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
>                                                            err_str, /*help=*/NULL);
>                         return -EINVAL;
>                 }
> -               attr->type = PERF_TYPE_HARDWARE;
> -               attr->config = term->val.num;
> -               if (perf_pmus__supports_extended_type())
> -                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> +               if (perf_pmu__have_event(pmu, term->val.str)) {
> +                       /* If the PMU has a sysfs or JSON event prefer it over legacy. ARM requires this. */
> +                       term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
> +               } else {
> +                       attr->type = PERF_TYPE_HARDWARE;
> +                       attr->config = term->val.num;
> +                       if (perf_pmus__supports_extended_type())
> +                           attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> +               }
>                 return 0;
>         }
>         if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-22 16:59                           ` Ian Rogers
@ 2023-11-23  4:33                             ` Ian Rogers
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2023-11-23  4:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Marc Zyngier, Hector Martin,
	Arnaldo Carvalho de Melo, James Clark, linux-perf-users, LKML,
	Asahi Linux

On Wed, Nov 22, 2023 at 8:59 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Nov 22, 2023 at 8:55 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > Em Wed, Nov 22, 2023 at 08:29:58AM -0800, Ian Rogers escreveu:
> > > I can look at doing an event parser change like:
> > >
> > > ```
> > > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > > index aa2f5c6fc7fc..9a18fda525d2 100644
> > > --- a/tools/perf/util/parse-events.c
> > > +++ b/tools/perf/util/parse-events.c
> > > @@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
> > >                                                           err_str,
> > > /*help=*/NULL);
> > >                        return -EINVAL;
> > >                }
> > > -               if (perf_pmu__supports_legacy_cache(pmu)) {
> > > +               if (perf_pmu__supports_legacy_cache(pmu) &&
> > > +                   !perf_pmu__have_event(pmu, term->val.str)) {
> > >                        attr->type = PERF_TYPE_HW_CACHE;
> > >                        return
> > > parse_events__decode_legacy_cache(term->config, pmu->type,
> > >                                                                 &attr->config);
> > > @@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
> > >                                                           err_str,
> > > /*help=*/NULL);
> > >                        return -EINVAL;
> > >                }
> > > -               attr->type = PERF_TYPE_HARDWARE;
> > > -               attr->config = term->val.num;
> > > -               if (perf_pmus__supports_extended_type())
> > > -                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > > +               if (perf_pmu__have_event(pmu, term->val.str)) {
> > > +                       /* If the PMU has a sysfs or json event prefer
> > > it over legacy. ARM requires this. */
> > > +                       term->term_type = PARSE_EVENTS__TERM_TYPE_USER;
> > > +               } else {
> > > +                       attr->type = PERF_TYPE_HARDWARE;
> > > +                       attr->config = term->val.num;
> > > +                       if (perf_pmus__supports_extended_type())
> > > +                               attr->config |= (__u64)pmu->type <<
> > > PERF_PMU_TYPE_SHIFT;
> > > +               }
> > >                return 0;
> > >        }
> > >        if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
> > > ```
> > > (note: this is incomplete as term->val.str isn't populated for
> > > PARSE_EVENTS__TERM_TYPE_HARDWARE)
> >
> > Yeah, I had to apply manually as your MUA mangled it, then it didn't
> > build, had to remove some consts, then there was a struct member
> > mistake, after all fixed I get to the patch below, but it now segfaults,
> > probably what you mention...
> >
> > root@roc-rk3399-pc:~# strace -e perf_event_open taskset -c 4,5 perf stat -v -e cycles,armv8_cortex_a53/cycles/,armv8_cortex_a72/cycles/ echo
> > Using CPUID 0x00000000410fd082
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x7<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
> > +++ killed by SIGSEGV +++
> > Segmentation fault
> > root@roc-rk3399-pc:~#
>
> Right, I have something further along that fails tests. I'll try to
> send out an RFC today, but given the Intel behavior change ¯\_(ツ)_/¯
> But Intel don't appear to have an issue having two things called, for
> example, cycles and them both being a cycles event so they may not
> care. It is only ARM's PMUs that appear broken in this way.

To work around the PMU bug, I posted:
https://lore.kernel.org/lkml/20231123042922.834425-1-irogers@google.com/

Thanks,
Ian

> Thanks,
> Ian
>
> > - Arnaldo
> >
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index aa2f5c6fc7fc..1e648454cc49 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -976,7 +976,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                            struct parse_events_error *err)
> >  {
> >         if (term->type_term == PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE) {
> > -               const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> > +               struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> >
> >                 if (!pmu) {
> >                         char *err_str;
> > @@ -986,7 +986,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                                                            err_str, /*help=*/NULL);
> >                         return -EINVAL;
> >                 }
> > -               if (perf_pmu__supports_legacy_cache(pmu)) {
> > +               if (perf_pmu__supports_legacy_cache(pmu) &&
> > +                   !perf_pmu__have_event(pmu, term->val.str)) {
> >                         attr->type = PERF_TYPE_HW_CACHE;
> >                         return parse_events__decode_legacy_cache(term->config, pmu->type,
> >                                                                  &attr->config);
> > @@ -994,7 +995,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                         term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
> >         }
> >         if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
> > -               const struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> > +               struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
> >
> >                 if (!pmu) {
> >                         char *err_str;
> > @@ -1004,10 +1005,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
> >                                                            err_str, /*help=*/NULL);
> >                         return -EINVAL;
> >                 }
> > -               attr->type = PERF_TYPE_HARDWARE;
> > -               attr->config = term->val.num;
> > -               if (perf_pmus__supports_extended_type())
> > -                       attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > +               if (perf_pmu__have_event(pmu, term->val.str)) {
> > +                       /* If the PMU has a sysfs or JSON event prefer it over legacy. ARM requires this. */
> > +                       term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
> > +               } else {
> > +                       attr->type = PERF_TYPE_HARDWARE;
> > +                       attr->config = term->val.num;
> > +                       if (perf_pmus__supports_extended_type())
> > +                           attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > +               }
> >                 return 0;
> >         }
> >         if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 15:24   ` Marc Zyngier
  2023-11-21 15:40     ` Mark Rutland
  2023-11-21 15:41     ` Ian Rogers
@ 2023-11-23 14:23     ` Mark Rutland
  2023-11-23 14:45       ` Marc Zyngier
  2023-11-23 15:14       ` Ian Rogers
  2 siblings, 2 replies; 53+ messages in thread
From: Mark Rutland @ 2023-11-23 14:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Hector Martin, Arnaldo Carvalho de Melo, Ian Rogers, James Clark,
	linux-perf-users, LKML, Asahi Linux

On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> > 
> > [Adding key people on Cc]
> > 
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@marcan.st> wrote:
> > > 
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > 
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > the PMU, but nothing works anymore.
> > 
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> > 
> > I'm happy to test potential fixes.
> 
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):

Looking at this with fresh(er) eyes, I think there's a userspace bug here,
regardless of whether one believes it's correct to convert a named-pmu event to
a PERF_TYPE_HARDWARE event directed at that PMU.

It looks like the userspace tool is dropping the extended type ID after an
initial probe, and requests events with plain PERF_TYPE_HARDWARE (without an
extended type ID), which explains why we seem to get events from one PMU only.

More detail below...

Marc, if you have time, could you run the same commands (on the same kernel)
with a perf tool build from v6.4?

> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------

Here config[31:0] is 0 (PERF_COUNT_HW_CPU_CYCLES), and config[63:32] is 0xb,
which is presumably the PMU ID for the apple_icestorm_pmu.
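
As a minimal sketch of that split, assuming the layout from include/uapi/linux/perf_event.h (PERF_PMU_TYPE_SHIFT is 32 and the low 32 bits carry the generic event ID), the config value can be taken apart and rebuilt as below. The TOY_ macros and function names are made up for illustration and are not part of perf:

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/perf_event.h; redefined so the sketch stands alone. */
#define TOY_PMU_TYPE_SHIFT	32
#define TOY_HW_EVENT_MASK	0xffffffffULL

/* Upper 32 bits: PMU type ID the event is directed at (0 means no specific PMU). */
static uint32_t toy_config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> TOY_PMU_TYPE_SHIFT);
}

/* Lower 32 bits: generic hardware event ID (0 is PERF_COUNT_HW_CPU_CYCLES). */
static uint32_t toy_config_hw_event(uint64_t config)
{
	return (uint32_t)(config & TOY_HW_EVENT_MASK);
}

/* Inverse: build an extended-type config from a PMU type ID and an event ID. */
static uint64_t toy_config_encode(uint32_t pmu_type, uint32_t hw_event)
{
	return ((uint64_t)pmu_type << TOY_PMU_TYPE_SHIFT) | hw_event;
}
```

For 0xb00000000 this gives PMU type 0xb and event 0. The PMU type can be cross-checked on the machine by reading /sys/bus/event_source/devices/apple_icestorm_pmu/type.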

The attr doesn't contain exclude_guest=1, so this will be rejected by the PMU
driver due to its mode exclusion requirements.

> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95

... which is what we see here (this is EOPNOTSUPP, which __hw_perf_event_init()
in drivers/perf/arm_pmu.c returns when the requested mode exclusion options
aren't supported).

So far, so good...

> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------

... but here, the extended type ID has been dropped, and this event is no
longer directed towards the apple_firestorm_pmu PMU, so the kernel can direct
this to *any* CPU PMU...

> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3

... and *some* PMU accepts it.

> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------

Likewise here, no extended type ID...

> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------

Likewise here, no extended type ID...

> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
> bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
> Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
> builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
> builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
> builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
> builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
> builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
> builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
> builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
> builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
> builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
> builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
> builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
> builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
> builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
> builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
> builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
> apple_icestorm_pmu/cycles/: -1: 0 873709 0
> apple_firestorm_pmu/cycles/: -1: 0 873709 0
> cycles: -1: 0 873709 0
> apple_icestorm_pmu/cycles/: 0 873709 0
> apple_firestorm_pmu/cycles/: 0 873709 0
> cycles: 0 873709 0
> 
>  Performance counter stats for 'ls':
> 
>      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
>      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
>      <not counted>      cycles                                                                  (0.00%)
> 
>        0.000002250 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.000000000 seconds sys

So it looks like the tool has expanded the requested
'apple_icestorm_pmu/cycles/' event into three cycles events, each opened
without an extended type ID.

AFAICT, the kernel has done exactly what it has always done for
PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES events: pick the first PMU which
said it can handle them.
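
To make that selection rule concrete, here is a toy model, not kernel code. It assumes every CPU PMU accepts a plain cycles event, so "first PMU which said it can handle them" collapses to "first PMU in the list". The struct, the function name and the list order are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a registered CPU PMU; not a kernel structure. */
struct toy_pmu {
	const char *name;
	uint32_t type;	/* dynamic PMU type ID, as exposed in sysfs */
};

/*
 * Toy dispatch for a PERF_TYPE_HARDWARE event: an extended type ID in
 * config[63:32] selects that PMU; 0 falls back to the first PMU in the list.
 * Returns NULL if the list is empty or the requested type ID matches no PMU.
 */
static const struct toy_pmu *toy_pick_pmu(const struct toy_pmu *pmus, size_t n,
					  uint64_t config)
{
	uint32_t want = (uint32_t)(config >> 32);
	size_t i;

	if (n == 0)
		return NULL;
	if (want == 0)
		return &pmus[0];	/* plain event: first PMU that accepts it */
	for (i = 0; i < n; i++) {
		if (pmus[i].type == want)
			return &pmus[i];
	}
	return NULL;
}
```

In this model all three events from the log above, each opened with config 0, resolve to the same PMU regardless of the name the user asked for. That is consistent with the three counters staying at zero when the task is pinned to a CPU of the other type.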

> If I run the same thing on another CPU cluster (firestorm), I get
> this:
> 
> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95

Again, we see one request with an extended type ID, which fails due to mode exclusion requirements...

> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------

... but all subsequent requests do not have an extended type ID, and the kernel
directs these to whichever PMU accepts the event first...

> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
> bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
> Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
> builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
> builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
> builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
> builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
> builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
> builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
> builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
> builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
> builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
> builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
> builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
> builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
> builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
> builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
> builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> cycles: -1: 1034653 469125 469125
> apple_icestorm_pmu/cycles/: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: 1035035 469125 469125
> cycles: 1034653 469125 469125
> 
>  Performance counter stats for 'ls':
> 
>          1,035,101      apple_icestorm_pmu/cycles/                                            
>          1,035,035      apple_firestorm_pmu/cycles/                                           
>          1,034,653      cycles                                                                
> 
>        0.000001333 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>

... and in this case the workload was run on a CPU affine to that arbitrary
PMU, hence we managed to count.

So AFAICT, this is a userspace bug, maybe related to the way we probe for
supported PMU features?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-23 14:23     ` Mark Rutland
@ 2023-11-23 14:45       ` Marc Zyngier
  2023-11-23 15:14       ` Ian Rogers
  1 sibling, 0 replies; 53+ messages in thread
From: Marc Zyngier @ 2023-11-23 14:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Hector Martin, Arnaldo Carvalho de Melo, Ian Rogers, James Clark,
	linux-perf-users, LKML, Asahi Linux

On Thu, 23 Nov 2023 14:23:10 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > On Tue, 21 Nov 2023 13:40:31 +0000,
> > Marc Zyngier <maz@kernel.org> wrote:
> > > 
> > > [Adding key people on Cc]
> > > 
> > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > Hector Martin <marcan@marcan.st> wrote:
> > > > 
> > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > 
> > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > > the PMU, but nothing works anymore.
> > > 
> > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > package, but that's obviously not going to last.
> > > 
> > > I'm happy to test potential fixes.
> > 
> > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > CPU):
> 
> Looking at this with fresh(er) eyes, I think there's a userspace bug here,
> regardless of whether one believes it's correct to convert a named-pmu event to
> a PERF_TYPE_HARDWARE event directed at that PMU.
> 
> It looks like the userspace tool is dropping the extended type ID after an
> initial probe, and requests events with plain PERF_TYPE_HARDWARE (without an
> extended type ID), which explains why we seem to get events from one PMU only.
> 
> More detail below...
> 
> Marc, if you have time, could you run the same commands (on the same kernel)
> with a perf tool build from v6.4?

Here you go:

<quote>
$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e  apple_firestorm_pmu/cycles/ -e cycles ls >/dev/null
Using CPUID 0x00000000610f0280
Attempting to add event pmu 'apple_icestorm_pmu' with 'cycles,' that may result in non-fatal errors
After aliases, add event pmu 'apple_icestorm_pmu' with 'event,' that may result in non-fatal errors
Attempting to add event pmu 'apple_firestorm_pmu' with 'cycles,' that may result in non-fatal errors
After aliases, add event pmu 'apple_firestorm_pmu' with 'event,' that may result in non-fatal errors
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             10
  size                             136
  config                           0x2
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624462  cpu -1  group_fd -1  flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
  type                             11
  size                             136
  config                           0x2
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624462  cpu -1  group_fd -1  flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
  size                             136
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624462  cpu -1  group_fd -1  flags 0x8 = 5
apple_icestorm_pmu/cycles/: -1: 1492180 724333 724333
apple_firestorm_pmu/cycles/: -1: 0 724333 0
cycles: -1: 0 724333 0
apple_icestorm_pmu/cycles/: 1492180 724333 724333
apple_firestorm_pmu/cycles/: 0 724333 0
cycles: 0 724333 0

 Performance counter stats for 'ls':

         1,492,180      apple_icestorm_pmu/cycles/                                            
     <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
     <not counted>      cycles                                                                  (0.00%)

       0.000001917 seconds time elapsed

       0.000000000 seconds user
       0.000000000 seconds sys
</quote>

and on the other cluster:

<quote>
$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e  apple_firestorm_pmu/cycles/ -e cycles ls >/dev/null
Using CPUID 0x00000000610f0280
Attempting to add event pmu 'apple_icestorm_pmu' with 'cycles,' that may result in non-fatal errors
After aliases, add event pmu 'apple_icestorm_pmu' with 'event,' that may result in non-fatal errors
Attempting to add event pmu 'apple_firestorm_pmu' with 'cycles,' that may result in non-fatal errors
After aliases, add event pmu 'apple_firestorm_pmu' with 'event,' that may result in non-fatal errors
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             10
  size                             136
  config                           0x2
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624466  cpu -1  group_fd -1  flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
  type                             11
  size                             136
  config                           0x2
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624466  cpu -1  group_fd -1  flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
  size                             136
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 1624466  cpu -1  group_fd -1  flags 0x8 = 5
apple_icestorm_pmu/cycles/: -1: 0 593209 0
apple_firestorm_pmu/cycles/: -1: 1038247 593209 593209
cycles: -1: 1037870 593209 593209
apple_icestorm_pmu/cycles/: 0 593209 0
apple_firestorm_pmu/cycles/: 1038247 593209 593209
cycles: 1037870 593209 593209

 Performance counter stats for 'ls':

     <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
         1,038,247      apple_firestorm_pmu/cycles/                                           
         1,037,870      cycles                                                                

       0.000001500 seconds time elapsed

       0.000000000 seconds user
       0.000000000 seconds sys
</quote>

For the record, this is on a 6.6-rc6 kernel, userspace perf as of v6.4.0.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-23 14:23     ` Mark Rutland
  2023-11-23 14:45       ` Marc Zyngier
@ 2023-11-23 15:14       ` Ian Rogers
  2023-11-23 16:48         ` Mark Rutland
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2023-11-23 15:14 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Thu, Nov 23, 2023 at 6:23 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > On Tue, 21 Nov 2023 13:40:31 +0000,
> > Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > [Adding key people on Cc]
> > >
> > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > Hector Martin <marcan@marcan.st> wrote:
> > > >
> > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > >
> > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > > the PMU, but nothing works anymore.
> > >
> > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > package, but that's obviously not going to last.
> > >
> > > I'm happy to test potential fixes.
> >
> > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > CPU):
>
> Looking at this with fresh(er) eyes, I think there's a userspace bug here,
> regardless of whether one believes it's correct to convert a named-pmu event to
> a PERF_TYPE_HARDWARE event directed at that PMU.
>
> It looks like the userspace tool is dropping the extended type ID after an
> initial probe, and requests events with plain PERF_TYPE_HARDWARE (without an
> extended type ID), which explains why we seem to get events from one PMU only.
>
> More detail below...
>
> Marc, if you have time, could you run the same commands (on the same kernel)
> with a perf tool build from v6.4?
>
> > <quote>
> > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> >  apple_firestorm_pmu/cycles/ -e cycles ls
> > Using CPUID 0x00000000612f0280
> > Attempt to add: apple_icestorm_pmu/cycles=0/
> > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > Opening: unknown-hardware:HG
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   config                           0xb00000000
> >   disabled                         1
> > ------------------------------------------------------------
>
> Here config[31:0] is 0 (PERF_COUNT_HW_CPU_CYCLES), and config[63:32] is 0xb,
> which is presumably the PMU ID for the apple_icestorm_pmu.
>
> The attr doesn't contain exclude_guest=1, so this will be rejected by the PMU
> driver due to its mode exclusion requirements.
>
> > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
>
> ... which is what we see here (this is EOPNOTSUPP, which __hw_perf_event_init()
> in drivers/perf/arm_pmu.c returns when the requested mode exclusion
> options aren't supported).
>
> So far, so good...
>
> > Attempt to add: apple_firestorm_pmu/cycles=0/
> > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > Control descriptor is not initialized
> > Opening: apple_icestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
>
> ... but here, the extended type ID has been dropped, and this event is no
> longer directed towards the apple_firestorm_pmu PMU, so the kernel can direct
> this to *any* CPU PMU...
>
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
>
> ... and *some* PMU accepts it.
>
> > Opening: apple_firestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
>
> Likewise here, no extended type ID...
>
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
>
> Likewise here, no extended type ID...
>
> > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > apple_icestorm_pmu/cycles/: -1: 0 873709 0
> > apple_firestorm_pmu/cycles/: -1: 0 873709 0
> > cycles: -1: 0 873709 0
> > apple_icestorm_pmu/cycles/: 0 873709 0
> > apple_firestorm_pmu/cycles/: 0 873709 0
> > cycles: 0 873709 0
> >
> >  Performance counter stats for 'ls':
> >
> >      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
> >      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
> >      <not counted>      cycles                                                                  (0.00%)
> >
> >        0.000002250 seconds time elapsed
> >
> >        0.000000000 seconds user
> >        0.000000000 seconds sys
>
> So it looks like the tool has expanded the requested
> 'apple_icestorm_pmu/cycles/' event into three cycles events, each opened
> without an extended type ID.
>
> AFAICT, the kernel has done exactly what it has always done for
> PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES events: pick the first PMU which
> said it can handle them.
>
> > If I run the same thing on another CPU cluster (firestorm), I get
> > this:
> >
> > <quote>
> > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> >  apple_firestorm_pmu/cycles/ -e cycles ls
> > Using CPUID 0x00000000612f0280
> > Attempt to add: apple_icestorm_pmu/cycles=0/
> > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > Opening: unknown-hardware:HG
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   config                           0xb00000000
> >   disabled                         1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
>
> Again, we see one request with an extended type ID, which fails due to mode exclusion requirements...
>
> > Attempt to add: apple_firestorm_pmu/cycles=0/
> > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > Control descriptor is not initialized
> > Opening: apple_icestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> > Opening: apple_firestorm_pmu/cycles/
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
>
> ... but all subsequent requests do not have an extended type ID, and the kernel
> directs these to whichever PMU accepts the event first...
>
> > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> > apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> > cycles: -1: 1034653 469125 469125
> > apple_icestorm_pmu/cycles/: 1035101 469125 469125
> > apple_firestorm_pmu/cycles/: 1035035 469125 469125
> > cycles: 1034653 469125 469125
> >
> >  Performance counter stats for 'ls':
> >
> >          1,035,101      apple_icestorm_pmu/cycles/
> >          1,035,035      apple_firestorm_pmu/cycles/
> >          1,034,653      cycles
> >
> >        0.000001333 seconds time elapsed
> >
> >        0.000000000 seconds user
> >        0.000000000 seconds sys
> > </quote>
>
> ... and in this case the workload was run on a CPU affine to that arbitrary
> PMU, hence we managed to count.
>
> So AFAICT, this is a userspace bug, maybe related to the way we probe for
> supported PMU features?

Probing PMU features is done by trying to perf_event_open events. For
extended types it is a cycles event on each core PMU:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532

The is_event_supported logic is here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n232

There is the following comment:

if (open_return == -EACCES) {
	/*
	 * This happens if the paranoid value
	 * /proc/sys/kernel/perf_event_paranoid is set to 2
	 * Re-run with exclude_kernel set; we don't do that
	 * by default as some ARM machines do not support it.
	 *
	 */
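
As a rough illustration of that flow (a sketch, not the actual perf code):
the probe opens the event with no exclusion bits set and only retries, with
exclude_kernel, when the failure is -EACCES. Everything named below is
hypothetical: struct probe_attr is a cut-down stand-in for struct
perf_event_attr, probe_open_fn stands in for the perf_event_open() call, and
the two fake_open_* functions only model the two failure modes discussed in
this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct perf_event_attr. */
struct probe_attr {
	unsigned int exclude_kernel : 1;
	unsigned int exclude_guest  : 1;
};

/* Stand-in for perf_event_open(): returns an fd (>= 0) or -errno. */
typedef int (*probe_open_fn)(const struct probe_attr *attr);

/* Probe once; only an -EACCES failure triggers a retry with exclude_kernel. */
static bool probe_supported(probe_open_fn open_fn)
{
	struct probe_attr attr = { 0 };
	int ret = open_fn(&attr);

	if (ret == -EACCES) {
		attr.exclude_kernel = 1;
		ret = open_fn(&attr);
	}
	return ret >= 0;
}

/* Fake kernel: refuses kernel-mode counting, as with a high paranoid value. */
static int fake_open_paranoid(const struct probe_attr *attr)
{
	return attr->exclude_kernel ? 3 : -EACCES;
}

/* Fake driver: insists on exclude_guest, like the -95 case in the trace. */
static int fake_open_needs_exclude_guest(const struct probe_attr *attr)
{
	return attr->exclude_guest ? 3 : -EOPNOTSUPP;
}
```

In this model probe_supported(fake_open_paranoid) succeeds on the retry,
while probe_supported(fake_open_needs_exclude_guest) reports "unsupported",
because -EOPNOTSUPP is not retried and exclude_guest is never set.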

Thanks,
Ian

> Thanks,
> Mark.


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-23 15:14       ` Ian Rogers
@ 2023-11-23 16:48         ` Mark Rutland
  2023-11-23 17:08           ` James Clark
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2023-11-23 16:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	James Clark, linux-perf-users, LKML, Asahi Linux

On Thu, Nov 23, 2023 at 07:14:21AM -0800, Ian Rogers wrote:
> On Thu, Nov 23, 2023 at 6:23 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> > > On Tue, 21 Nov 2023 13:40:31 +0000,
> > > Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > [Adding key people on Cc]
> > > >
> > > > On Tue, 21 Nov 2023 12:08:48 +0000,
> > > > Hector Martin <marcan@marcan.st> wrote:
> > > > >
> > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > >
> > > > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > > > asymmetric ARM platform. It isn't clear what criteria are used to pick
> > > > the PMU, but nothing works anymore.
> > > >
> > > > The saving grace in my case is that Debian still ships a 6.1 perftool
> > > > package, but that's obviously not going to last.
> > > >
> > > > I'm happy to test potential fixes.
> > >
> > > At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> > > -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> > > CPU):
> >
> > Looking at this with fresh(er) eyes, I think there's a userspace bug here,
> > regardless of whether one believes it's correct to convert a named-pmu event to
> > a PERF_TYPE_HARDWARE event directed at that PMU.
> >
> > It looks like the userspace tool is dropping the extended type ID after an
> > initial probe, and requests events with plain PERF_TYPE_HARDWARE (without an
> > extended type ID), which explains why we seem to get events from one PMU only.
> >
> > More detail below...
> >
> > Marc, if you have time, could you run the same commands (on the same kernel)
> > with a perf tool build from v6.4?
> >
> > > <quote>
> > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> > >  apple_firestorm_pmu/cycles/ -e cycles ls
> > > Using CPUID 0x00000000612f0280
> > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > Opening: unknown-hardware:HG
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   config                           0xb00000000
> > >   disabled                         1
> > > ------------------------------------------------------------
> >
> > Here config[31:0] is 0 (PERF_COUNT_HW_CPU_CYCLES), and config[63:32] is 0xb,
> > which is presumably the PMU ID for the apple_icestorm_pmu.
> >
> > The attr doesn't contain exclude_guest=1, so this will be rejected by the PMU
> > driver due to its mode exclusion requirements.
> >
> > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > sys_perf_event_open failed, error -95
> >
> > ... which is what we see here (this is EOPNOTSUPP, which __hw_perf_event_init()
> > in drivers/perf/arm_pmu.c returns when the requested mode exclusion
> > options aren't supported).
> >
> > So far, so good...
> >
> > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > Control descriptor is not initialized
> > > Opening: apple_icestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> >
> > ... but here, the extended type ID has been dropped, and this event is no
> > longer directed towards the apple_firestorm_pmu PMU, so the kernel can direct
> > this to *any* CPU PMU...
> >
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> >
> > ... and *some* PMU accepts it.
> >
> > > Opening: apple_firestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> >
> > Likewise here, no extended type ID...
> >
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> > > Opening: cycles
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> >
> > Likewise here, no extended type ID...
> >
> > > sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > apple_icestorm_pmu/cycles/: -1: 0 873709 0
> > > apple_firestorm_pmu/cycles/: -1: 0 873709 0
> > > cycles: -1: 0 873709 0
> > > apple_icestorm_pmu/cycles/: 0 873709 0
> > > apple_firestorm_pmu/cycles/: 0 873709 0
> > > cycles: 0 873709 0
> > >
> > >  Performance counter stats for 'ls':
> > >
> > >      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
> > >      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
> > >      <not counted>      cycles                                                                  (0.00%)
> > >
> > >        0.000002250 seconds time elapsed
> > >
> > >        0.000000000 seconds user
> > >        0.000000000 seconds sys
> >
> > So it looks like the tool has expanded the requested
> > 'apple_icestorm_pmu/cycles/' event into three cycles events, each opened
> > without an extended type ID.
> >
> > AFAICT, the kernel has done exactly what it has always done for
> > PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES events: pick the first PMU which
> > said it can handle them.
> >
> > > If I run the same thing on another CPU cluster (firestorm), I get
> > > this:
> > >
> > > <quote>
> > > maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> > >  apple_firestorm_pmu/cycles/ -e cycles ls
> > > Using CPUID 0x00000000612f0280
> > > Attempt to add: apple_icestorm_pmu/cycles=0/
> > > ..after resolving event: apple_icestorm_pmu/cycles=0/
> > > Opening: unknown-hardware:HG
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   config                           0xb00000000
> > >   disabled                         1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> > > sys_perf_event_open failed, error -95
> >
> > Again, we see one request with an extended type ID, which fails due to mode exclusion requirements...
> >
> > > Attempt to add: apple_firestorm_pmu/cycles=0/
> > > ..after resolving event: apple_firestorm_pmu/cycles=0/
> > > Control descriptor is not initialized
> > > Opening: apple_icestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> > > Opening: apple_firestorm_pmu/cycles/
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> > > Opening: cycles
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> >
> > ... but all subsequent requests do not have an extended type ID, and the kernel
> > directs these to whichever PMU accepts the event first...
> >
> > > sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> > > arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> > > bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> > > Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> > > builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> > > builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> > > builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> > > builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> > > builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
> > > builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
> > > builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> > > builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> > > builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> > > builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> > > builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> > > builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> > > builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> > > builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
> > > builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
> > > builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
> > > builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
> > > apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> > > apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> > > cycles: -1: 1034653 469125 469125
> > > apple_icestorm_pmu/cycles/: 1035101 469125 469125
> > > apple_firestorm_pmu/cycles/: 1035035 469125 469125
> > > cycles: 1034653 469125 469125
> > >
> > >  Performance counter stats for 'ls':
> > >
> > >          1,035,101      apple_icestorm_pmu/cycles/
> > >          1,035,035      apple_firestorm_pmu/cycles/
> > >          1,034,653      cycles
> > >
> > >        0.000001333 seconds time elapsed
> > >
> > >        0.000000000 seconds user
> > >        0.000000000 seconds sys
> > > </quote>
> >
> > ... and in this case the workload was run on a CPU affine to that arbitrary
> > PMU, hence we managed to count.
> >
> > So AFAICT, this is a userspace bug, maybe related to the way we probe for
> > supported PMU features?
> 
> Probing PMU features is done by trying to perf_event_open events. For
> extended types it is a cycles event on each core PMU:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
> 
> The is_event_supported logic is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n232

Ah, so IIUC what's happening is:

1) Userspace tries to detect extended type support, with a cycles event
   directed to one of the CPU PMUs. The attr for this does not have
   exclude_guest set.

2) In the kernel, the core perf code sees the extended hw type id, and directs
   this towards the correct PMU (apple_icestorm_pmu).

3) The PMU driver looks at the attr, sees exclude_guest is not set, and returns
   -EOPNOTSUPP, exactly as it would regardless of whether the extended hw type
   is used.

   Note: this happens to be a difference between x86 PMUs and the apple_* PMUs,
   but this is a legitimate part of the perf ABI, not an arm-specific quirk or
   bug.

4) Userspace receives -EOPNOTSUPP, and so decides the extended hw type is not
   supported (even though the kernel does support the extended hw type id, and
   the event was rejected for orthogonal reasons).

5) Userspace avoids the extended hw type, but still uses
   PERF_TYPE_HARDWARE events for named-pmu events.

Does that sound plausible to you, or have I misunderstood?
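
A minimal sketch of the encoding behind steps 1) and 2), assuming the layout
described earlier in the thread (hardware event ID in config[31:0], PMU type
ID in config[63:32]). The constants mirror include/uapi/linux/perf_event.h;
the helper names are made up for illustration, and 0xb is only presumed to be
the apple_icestorm_pmu type ID, as read off the -vvv log.

```python
# Constants as defined in include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def encode_extended_hw_config(pmu_type, hw_id):
    """Hypothetical helper: PMU type ID in config[63:32], hw event in config[31:0]."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_id & PERF_HW_EVENT_MASK)


def decode_extended_hw_config(config):
    """Hypothetical helper: split config back into (pmu_type, hw_id)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


# The probe in the log used config 0xb00000000 with type PERF_TYPE_HARDWARE.
probe_config = encode_extended_hw_config(0xB, PERF_COUNT_HW_CPU_CYCLES)
print(hex(probe_config))

# The later opens used config 0: no PMU type ID left in the upper bits.
print(decode_extended_hw_config(0))
```

With a plain config of 0 the upper half decodes to PMU type 0, i.e. nothing
tells the kernel which CPU PMU was meant.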

From Marc's reply at:

  https://lore.kernel.org/lkml/86edggzfxx.wl-maz@kernel.org/

... with perf built from v6.4, the perf tool can open named pmu events without
issue, and sets exclude_guest in the attr. So it seems like there's a mismatch
between regular opening of events and probing for extended hw type that causes
that to differ.

AFAICT, the kernel is doing the right thing here, but the userspace detection
of extended type id support happens to differ from regular event opening, and
mis-interprets -EOPNOTSUPP as "the kernel doesn't support extended type IDs"
rather than "The kernel was able to consume the extended type ID, but the
specific PMU targeted said it doesn't support this attr".
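
The mismatch can be shown with a toy model. This is only a sketch under
stated assumptions: toy_pmu_event_init() is a made-up stand-in for a PMU
driver that insists on exclude_guest=1, the dict stands in for
perf_event_attr, and neither probe function is the perf tool's real code or
a proposed fix.

```python
EOPNOTSUPP = 95  # Linux errno value; matches "error -95" in the -vvv log


def toy_pmu_event_init(attr):
    """Toy PMU driver: rejects any attr that does not set exclude_guest."""
    if not attr.get("exclude_guest", 0):
        return -EOPNOTSUPP
    return 0


def naive_probe(config):
    """Probe without exclude_guest; any error is read as 'no extended type'."""
    attr = {"type": 0, "config": config, "disabled": 1}
    return toy_pmu_event_init(attr) == 0


def probe_like_regular_open(config):
    """Probe with the same exclude_guest=1 that regular event opening sets."""
    attr = {"type": 0, "config": config, "disabled": 1, "exclude_guest": 1}
    return toy_pmu_event_init(attr) == 0


print(naive_probe(0xB00000000))              # False: rejected for exclude_guest
print(probe_like_regular_open(0xB00000000))  # True: same config is accepted
```

In the toy model both probes carry the same extended type ID; only the
unrelated exclude_guest bit decides the outcome, which is why the error code
alone cannot answer the question the probe is asking.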

IIUC that means this'll be broken on older kernels (those before the extended
hw type id support was introduced), too?

It sounds like we need to make (4) more robust? I'm not immediately sure how,
given the rat's nest of returns in perf_event_open(), but I'm happy to try to
help with that.

It also seems like (5) is a problem regardless. If the user asks for a named
PMU event on an older kernel (before the extended hw type id was a thing), and
the tool converts that to a plain PERF_TYPE_HARDWARE event, it's liable
to be handled by a different PMU than the one the user asked for.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-23 16:48         ` Mark Rutland
@ 2023-11-23 17:08           ` James Clark
  2023-11-23 17:15             ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: James Clark @ 2023-11-23 17:08 UTC (permalink / raw)
  To: Mark Rutland, Ian Rogers
  Cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	linux-perf-users, LKML, Asahi Linux



On 23/11/2023 16:48, Mark Rutland wrote:
> On Thu, Nov 23, 2023 at 07:14:21AM -0800, Ian Rogers wrote:
>> On Thu, Nov 23, 2023 at 6:23 AM Mark Rutland <mark.rutland@arm.com> wrote:
>>>
>>> On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
>>>> On Tue, 21 Nov 2023 13:40:31 +0000,
>>>> Marc Zyngier <maz@kernel.org> wrote:
>>>>>
>>>>> [Adding key people on Cc]
>>>>>
>>>>> On Tue, 21 Nov 2023 12:08:48 +0000,
>>>>> Hector Martin <marcan@marcan.st> wrote:
>>>>>>
>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>
>>>>> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
>>>>> asymmetric ARM platform. It isn't clear what criteria is used to pick
>>>>> the PMU, but nothing works anymore.
>>>>>
>>>>> The saving grace in my case is that Debian still ships a 6.1 perftool
>>>>> package, but that's obviously not going to last.
>>>>>
>>>>> I'm happy to test potential fixes.
>>>>
>>>> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
>>>> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
>>>> CPU):
>>>
>>> Looking at this with fresh(er) eyes, I think there's a userspace bug here,
>>> regardless of whether one believes it's correct to convert a named-pmu event to
>>> a PERF_TYPE_HARDWARE event directed at that PMU.
>>>
>>> It looks like the userspace tool is dropping the extended type ID after an
>>> initial probe, and requests events with plain PERF_TYPE_HARDWARE (without an
>>> extended type ID), which explains why we seem to get events from one PMU only.
>>>
>>> More detail below...
>>>
>>> Marc, if you have time, could you run the same commands (on the same kernel)
>>> with a perf tool build from v6.4?
>>>
>>>> <quote>
>>>> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>>>>  apple_firestorm_pmu/cycles/ -e cycles ls
>>>> Using CPUID 0x00000000612f0280
>>>> Attempt to add: apple_icestorm_pmu/cycles=0/
>>>> ..after resolving event: apple_icestorm_pmu/cycles=0/
>>>> Opening: unknown-hardware:HG
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   config                           0xb00000000
>>>>   disabled                         1
>>>> ------------------------------------------------------------
>>>
>>> Here config[31:0] is 0 (PERF_COUNT_HW_CPU_CYCLES), and config[63:32] is 0xb,
>>> which is presumably the PMU ID for the apple_icestorm_pmu.
>>>
>>> The attr doesn't contain exclude_guest=1, so this will be rejected by the PMU
>>> driver due to its mode exclusion requirements.
>>>
>>>> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
>>>> sys_perf_event_open failed, error -95
>>>
>>> ... which is what we see here (this is EOPNOTSUPP, which __hw_perf_event_init()
>>> in drivers/perf/arm_pmu.c returns when the requested mode exclusion
>>> options aren't supported).
>>>
>>> So far, so good...
>>>
>>>> Attempt to add: apple_firestorm_pmu/cycles=0/
>>>> ..after resolving event: apple_firestorm_pmu/cycles=0/
>>>> Control descriptor is not initialized
>>>> Opening: apple_icestorm_pmu/cycles/
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>
>>> ... but here, the extended type ID has been dropped, and this event is no
>>> longer directed towards the apple_firestorm_pmu PMU, so the kernel can direct
>>> this to *any* CPU PMU...
>>>
>>>> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
>>>
>>> ... and *some* PMU accepts it.
>>>
>>>> Opening: apple_firestorm_pmu/cycles/
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>
>>> Likewise here, no extended type ID...
>>>
>>>> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
>>>> Opening: cycles
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>
>>> Likewise here, no extended type ID...
>>>
>>>> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
>>>> arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
>>>> bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
>>>> Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
>>>> builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
>>>> builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
>>>> builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
>>>> builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
>>>> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
>>>> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
>>>> builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
>>>> builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
>>>> builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
>>>> builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
>>>> builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
>>>> builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
>>>> builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
>>>> builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
>>>> builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
>>>> builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
>>>> builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
>>>> apple_icestorm_pmu/cycles/: -1: 0 873709 0
>>>> apple_firestorm_pmu/cycles/: -1: 0 873709 0
>>>> cycles: -1: 0 873709 0
>>>> apple_icestorm_pmu/cycles/: 0 873709 0
>>>> apple_firestorm_pmu/cycles/: 0 873709 0
>>>> cycles: 0 873709 0
>>>>
>>>>  Performance counter stats for 'ls':
>>>>
>>>>      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
>>>>      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
>>>>      <not counted>      cycles                                                                  (0.00%)
>>>>
>>>>        0.000002250 seconds time elapsed
>>>>
>>>>        0.000000000 seconds user
>>>>        0.000000000 seconds sys
>>>
>>> So it looks like the tool has expanded the requested
>>> 'apple_icestorm_pmu/cycles/' event into three cycles events, each opened
>>> without an extended type ID.
>>>
>>> AFAICT, the kernel has done exactly what it has always done for
>>> PERF_TYPE_HARDWARE/PERF_COUNT_HW_CPU_CYCLES events: pick the first PMU which
>>> said it can handle them.
>>>
>>>> If I run the same thing on another CPU cluster (firestorm), I get
>>>> this:
>>>>
>>>> <quote>
>>>> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>>>>  apple_firestorm_pmu/cycles/ -e cycles ls
>>>> Using CPUID 0x00000000612f0280
>>>> Attempt to add: apple_icestorm_pmu/cycles=0/
>>>> ..after resolving event: apple_icestorm_pmu/cycles=0/
>>>> Opening: unknown-hardware:HG
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   config                           0xb00000000
>>>>   disabled                         1
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
>>>> sys_perf_event_open failed, error -95
>>>
>>> Again, we see one request with an extended type ID, which fails due to mode exclusion requirements...
>>>
>>>> Attempt to add: apple_firestorm_pmu/cycles=0/
>>>> ..after resolving event: apple_firestorm_pmu/cycles=0/
>>>> Control descriptor is not initialized
>>>> Opening: apple_icestorm_pmu/cycles/
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
>>>> Opening: apple_firestorm_pmu/cycles/
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
>>>> Opening: cycles
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>   size                             136
>>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>>   sample_type                      IDENTIFIER
>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>   disabled                         1
>>>>   inherit                          1
>>>>   enable_on_exec                   1
>>>>   exclude_guest                    1
>>>> ------------------------------------------------------------
>>>
>>> ... but all subsequent requests do not have an extended type ID, and the kernel
>>> directs these to whichever PMU accepts the event first...
>>>
>>>> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
>>>> arch                  builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
>>>> bench                 builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
>>>> Build                 builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
>>>> builtin-annotate.c    builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
>>>> builtin-annotate.o    builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
>>>> builtin-bench.c               builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
>>>> builtin-bench.o               builtin-help.c      builtin-report.o     include          perf-read-vdso.c
>>>> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c   jvmti            perf-sys.h
>>>> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c  libapi   PERF-VERSION-FILE
>>>> builtin-buildid-list.c        builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
>>>> builtin-buildid-list.o        builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
>>>> builtin-c2c.c         builtin-kallsyms.o  builtin-stat.o       libsymbol        python
>>>> builtin-c2c.o         builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
>>>> builtin-config.c      builtin-kvm.c       builtin-top.c        Makefile.config  scripts
>>>> builtin-config.o      builtin-kvm.o       builtin-top.o        Makefile.perf    tests
>>>> builtin-daemon.c      builtin-kwork.c     builtin-trace.c      MANIFEST         trace
>>>> builtin-daemon.o      builtin-list.c      builtin-version.c    perf             ui
>>>> builtin-data.c                builtin-list.o      builtin-version.o    perf-archive     util
>>>> builtin-data.o                builtin-lock.c      check-headers.sh     perf-archive.sh
>>>> builtin-diff.c                builtin-mem.c       command-list.txt     perf.c
>>>> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
>>>> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
>>>> cycles: -1: 1034653 469125 469125
>>>> apple_icestorm_pmu/cycles/: 1035101 469125 469125
>>>> apple_firestorm_pmu/cycles/: 1035035 469125 469125
>>>> cycles: 1034653 469125 469125
>>>>
>>>>  Performance counter stats for 'ls':
>>>>
>>>>          1,035,101      apple_icestorm_pmu/cycles/
>>>>          1,035,035      apple_firestorm_pmu/cycles/
>>>>          1,034,653      cycles
>>>>
>>>>        0.000001333 seconds time elapsed
>>>>
>>>>        0.000000000 seconds user
>>>>        0.000000000 seconds sys
>>>> </quote>
>>>
>>> ... and in this case the workload was run on a CPU affine to that arbitrary
>>> PMU, hence we managed to count.
>>>
>>> So AFAICT, this is a userspace bug, maybe related to the way we probe for
>>> supported PMU features?
>>
>> Probing PMU features is done by trying to perf_event_open events. For
>> extended types it is a cycles event on each core PMU:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
>>
>> The is_event_supported logic is here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n232
> 
> Ah, so IIUC what's happening is:
> 
> 1) Userspace tries to detect extended type support, with a cycles event
>    directed to one of the CPU PMUs. The attr for this does not have
>    exclude_guest set.
> 
> 2) In the kernel, the core perf code sees the extended hw type id, and directs
>    this towards the correct PMU (apple_icestorm_pmu).
> 
> 3) The PMU driver looks at the attr, sees exclude_guest is not set, and returns
>    -EOPNOTSUPP, exactly as it would regardless of whether the extended hw type
>    is used.
> 
>    Note: this happens to be a difference between x86 PMUs and the apple_* PMUs,
>    but this is a legitimate part of the perf ABI, not an arm-specific quirk or
>    bug.
> 
> 4) Userspace receives -EOPNOTSUPP, and so decides the extended hw type is not
>    supported (even though the kernel does support the extended hw type id, and
>    the event was rejected for orthogonal reasons).
> 
> 5) Userspace avoids the extended hw type, but still uses
>    PERF_TYPE_HARDWARE events for named-pmu events.
> 
> Does that sound plausible to you, or have I misunderstood?
> 
> From Marc's reply at:
> 
>   https://lore.kernel.org/lkml/86edggzfxx.wl-maz@kernel.org/
> 
> ... with perf built from v6.4, the perf tool can open named pmu events without
> issue, and sets exclude_guest in the attr. So it seems like there's a mismatch
> between regular opening of events and probing for extended hw type that causes
> that to differ.
> 
> AFAICT, the kernel is doing the right thing here, but the userspace detection
> of extended type id support happens to differ from regular event opening, and
> mis-interprets -EOPNOTSUPP as "the kernel doesn't support extended type IDs"
> rather than "The kernel was able to consume the extended type ID, but the
> specific PMU targeted said it doesn't support this attr".
> 
> IIUC that means this'll be broken on older kernels (those before the extended
> hw type id support was introduced), too?
> 
> It sounds like we need to make (4) more robust? I'm not immediately sure how,
> given the rat's nest of returns in perf_event_open(), but I'm happy to try to
> help with that.

It might be worth reporting extended HW ID support in the caps folder of
the PMU so that Perf can look there instead of trying to open the event.
It's something that we know will always be on or always be off so it
doesn't make sense to try to discover it by opening an event.
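
A rough sketch of what such a lookup could look like from userspace. The
caps directory under /sys/bus/event_source/devices/<pmu>/ already exists for
PMUs that export capabilities, but the capability name "extended_hw_type"
below is purely hypothetical, as is the pmu_has_cap() helper; nothing like
it is exported today.

```python
import os


def pmu_has_cap(pmu, cap, sysfs_root="/sys/bus/event_source/devices"):
    """Return True if <sysfs_root>/<pmu>/caps/<cap> exists and is not "0".

    A missing PMU, missing caps directory, or unreadable file is treated as
    "capability not present".
    """
    path = os.path.join(sysfs_root, pmu, "caps", cap)
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:
        return False
    return value not in ("", "0")


# Hypothetical usage; "extended_hw_type" is an assumed file name:
#   pmu_has_cap("apple_icestorm_pmu", "extended_hw_type")
```

Reading a file like this would not depend on how a particular PMU driver
validates the attr, which is the property the probe by event opening lacks.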

> 
> It also seems like (5) is a problem regardless. If the user asks for a named
> PMU event on an older kernel (before the extended hw type id was a thing), and
> the tool converts that to a plain PERF_TYPE_HARDWARE event, it's liable
> to be handled by a different PMU than the one the user asked for.
> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-23 17:08           ` James Clark
@ 2023-11-23 17:15             ` Mark Rutland
  0 siblings, 0 replies; 53+ messages in thread
From: Mark Rutland @ 2023-11-23 17:15 UTC (permalink / raw)
  To: James Clark
  Cc: Ian Rogers, Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	linux-perf-users, LKML, Asahi Linux

On Thu, Nov 23, 2023 at 05:08:43PM +0000, James Clark wrote:
> On 23/11/2023 16:48, Mark Rutland wrote:
> > Ah, so IIUC what's happening is:
> > 
> > 1) Userspace tries to detect extended type support, with a cycles event
> >    directed to one of the CPU PMUs. The attr for this does not have
> >    exclude_guest set.
> > 
> > 2) In the kernel, the core perf code sees the extended hw type id, and directs
> >    this towards the correct PMU (apple_icestorm_pmu).
> > 
> > 3) The PMU driver looks at the attr, sees exclude_guest is not set, and returns
> >    -EOPNOTSUPP, exactly as it would regardless of whether the extended hw type
> >    is used.
> > 
> >    Note: this happens to be a difference between x86 PMUs and the apple_* PMUs,
> >    but this is a legitimate part of the perf ABI, not an arm-specific quirk or
> >    bug.
> > 
> > 4) Userspace receives -EOPNOTSUPP, and so decides the extended hw type is not
> >    supported (even though the kernel does support the extended hw type id, and
> >    the event was rejected for orthogonal reasons).

> > It sounds like we need to make (4) more robust? I'm not immediately sure how,
> > given the rat's nest of returns in perf_event_open(), but I'm happy to try to
> > help with that.
> 
> It might be worth reporting extended HW ID support in the caps folder of
> the PMU so that Perf can look there instead of trying to open the event.
> It's something that we know will always be on or always be off so it
> doesn't make sense to try to discover it by opening an event.

Yep, I'm open to that idea. I'm more than happy to expose something that
indicates "this PMU supports the extended HW ID" and/or "this kernel supports
the extended HW ID".

Given that the actual PMU drivers don't see the extended cap, and that's
handled by the core, I'd like to make the core logic unconditional and remove
the kernel-internal PERF_PMU_CAP_EXTENDED_HW_TYPE cap. So I'd lean towards the
"this kernel supports the extended HW ID" option.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-11-21 23:43 ` Bagas Sanjaya
@ 2023-12-06 12:09   ` Linux regression tracking #update (Thorsten Leemhuis)
  2024-08-01 19:05     ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-12-06 12:09 UTC (permalink / raw)
  To: Linux perf Profiling, Linux Kernel Mailing List

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 22.11.23 00:43, Bagas Sanjaya wrote:
> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.

#regzbot fix: perf parse-events: Make legacy events lower priority than
sysfs/JSON
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2023-12-06 12:09   ` Linux regression tracking #update (Thorsten Leemhuis)
@ 2024-08-01 19:05     ` Ian Rogers
  2024-08-07  8:54       ` Thorsten Leemhuis
  2025-03-09 21:19       ` Ian Rogers
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Rogers @ 2024-08-01 19:05 UTC (permalink / raw)
  To: Linux regressions mailing list, to: Mark Rutland
  Cc: Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	Asahi Linux

On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
(Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>
> [TLDR: This mail in primarily relevant for Linux kernel regression
> tracking. See link in footer if these mails annoy you.]
>
> On 22.11.23 00:43, Bagas Sanjaya wrote:
> > On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> >> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>
> #regzbot fix: perf parse-events: Make legacy events lower priority than
> sysfs/JSON
> #regzbot ignore-activity

Note, this is still broken. The patch changed the priority in the case
that you do something like:

$ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark

but if you do:

$ perf stat -e 'cycles' benchmark

then the broken behavior will happen as legacy events have priority
over sysfs/json events in that case. To fix this you need to revert:
4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
events over legacy"

This causes some testing issues resolved in this unmerged patch series:
https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/

There is a bug as the arm_dsu PMU advertises an event called "cycles"
and this PMU is present on Ampere systems. Reverting the commit above
will cause an issue: commit 7b100989b4f6 ("perf evlist: Remove
__evlist__add_default"), which fixed ARM's big.LITTLE systems by opening
a cycles event on all PMUs rather than just one, will cause the arm_dsu
event to be opened by perf record and fail, as that event doesn't
support sampling.

The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
fixes this by only opening the cycles event on core PMUs when choosing
default events.

Rather than take this patch, the revert happened as Linus runs the
command "perf record -e cycles:pp" (i.e. using a specified event and not
defaults) and considers it a regression in the perf tool to need to do
"perf record -e 'armv8_pmuv3_0/cycles/pp'" on an Ampere system. It was
pointed out that not specifying -e will choose the cycles event
correctly, and with better precision than pp on systems that support it,
but it was still considered a regression in the perf tool so the revert
went ahead. There is a lack of perf testing coverage for ARM, in
particular as they choose to do everything in a different way to x86.
The patch in question was in the linux-next tree for weeks without
issues.

ARM/Ampere could fix this by renaming the event from cycles to
cpu_cycles, or by following Intel's convention that anything uncore
uses the name clockticks rather than cycles. This could break people
who rely on an event called arm_dsu/cycles/ but I imagine such people
are rare. There has been no progress I'm aware of on renaming the
event.

Making perf not terminate when it fails to open an event for perf record
seems like the most likely workaround, as that is at least something
under the tool maintainers' control. ARM have discussed doing this on
the lists:
https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
but since the revert in v6.10 no patches have appeared for the v6.11
merge window. Feature work like CoreSight improvements and ARMv9 is
being actively pursued by ARM, but feature work won't resolve this
regression.

I'm keen to see such patches, as there are perf stat fixes that depend
on the stacked parse-events fixes and are consequently not merged, which
affects more than just ARM.

There is a related discussion that events specified without PMUs
should inherently only mean core PMUs. Unfortunately such a change
would break uncore events specified without a PMU, for example `perf
stat -e data_read -a sleep 1` gathers read memory bandwidth on uncore
memory controllers on recent Intel devices. Not specifying a PMU for
uncore events is also assumed by perf metrics, so a large number of
metrics would need updating to make such a change work. Many existing
JSON uncore events specify a PMU in their name, like
UNC_M2HBM_CMS_CLOCKTICKS, and it feels somewhat redundant to have to
write that as m2hbm/UNC_M2HBM_CMS_CLOCKTICKS/. It is unclear who would
pursue fixing all of this, and so it seems that not specifying a PMU
with an event will keep meaning that perf tries to open the event on
all PMUs that advertise such an event.
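To see which PMUs a bare event name could match on a given machine, something like the following minimal sketch may help. It only looks at sysfs event files under /sys/bus/event_source/devices/<pmu>/events/, so it ignores JSON events and legacy encodings, and it is not how perf itself does the lookup; the function name is made up for illustration.

```python
from pathlib import Path

SYSFS_PMUS = Path("/sys/bus/event_source/devices")


def pmus_advertising(event, root=SYSFS_PMUS):
    """Return the sorted names of PMUs whose sysfs events/ directory has `event`."""
    root = Path(root)
    if not root.is_dir():
        return []
    # A PMU "advertises" the event here if <pmu>/events/<event> exists.
    return sorted(p.name for p in root.iterdir()
                  if (p / "events" / event).exists())
```

Calling pmus_advertising("cycles") on a system with a DSU PMU would be expected to list the arm_dsu instance alongside the core PMU, which is the name clash described above.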

Thanks,
Ian

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-01 19:05     ` Ian Rogers
@ 2024-08-07  8:54       ` Thorsten Leemhuis
  2024-08-14 16:28         ` James Clark
  2025-03-09 21:19       ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Thorsten Leemhuis @ 2024-08-07  8:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux, Ian Rogers,
	Linux regressions mailing list, to: Mark Rutland

On 01.08.24 21:05, Ian Rogers wrote:
> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> [TLDR: This mail in primarily relevant for Linux kernel regression
>> tracking. See link in footer if these mails annoy you.]
>>
>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>
>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>> sysfs/JSON
>> #regzbot ignore-activity
> 
> Note, this is still broken.

Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
this? Or is this a "we are screwed one way or another and someone has to
bite the bullet" situation?

Ciao, Thorsten

> The patch changed the priority in the case
> that you do something like:
> 
> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> 
> but if you do:
> 
> $ perf stat -e 'cycles' benchmark
> 
> then the broken behavior will happen as legacy events have priority
> over sysfs/json events in that case. To fix this you need to revert:
> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> events over legacy"
> 
> This causes some testing issues resolved in this unmerged patch series:
> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> 
> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> and this PMU is present on Ampere systems. Reverting the commit above
> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> be opened by perf record and fail as the event won't support sampling.
> 
> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> fixes this by only opening the cycles event on core PMUs when choosing
> default events.
> 
> Rather than take this patch the revert happened as Linus runs the
> command "perf record -e cycles:pp" (ie using a specified event and not
> defaults) and considers it a regression in the perf tool that on an
> Ampere system to need to do "perf record -e
> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> will choose the cycles event correctly and with better precision the
> pp for systems that support it, but it was still considered a
> regression in the perf tool so the revert was made to happen. There is
> a lack of perf testing coverage for ARM, in particular as they choose
> to do everything in a different way to x86. The patch in question was
> in the linux-next tree for weeks without issues.
> 
> ARM/Ampere could fix this by renaming the event from cycles to
> cpu_cycles, or by following Intel's convention that anything uncore
> uses the name clockticks rather than cycles. This could break people
> who rely on an event called arm_dsu/cycles/ but I imagine such people
> are rare. There has been no progress I'm aware of on renaming the
> event.
> 
> Making perf not terminate on opening an event for perf record seems
> like the most likely workaround as that is at least something under
> the tool maintainers control. ARM have discussed doing this on the
> lists:
> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> but since the revert in v6.10 no patches have appeared for the v6.11
> merge window. Feature work like coresight improvements and ARMv9 are
> being actively pursued by ARM, but feature work won't resolve this
> regression.
> 
> I'm keen to see such patches as there are perf stat fixes reliant on
> the stacked parse event fixes that are consequently not merged
> affecting more than just ARM.
> 
> There is a related discussion that events specified without PMUs
> should inherently only mean core PMUs. Unfortunately such a change
> would break uncore events specified without a PMU, for example `perf
> stat -e data_read -a sleep 1` gathers read memory bandwidth on uncore
> memory controllers on recent Intel devices. Not specifying a PMU for
> uncore events is also assumed by perf metrics, so a large number of
> metrics would need updating to make such a change work. Many existing
> JSON uncore events specify a PMU in their name like
> UNC_M2HBM_CMS_CLOCKTICKS and it feels somewhat redundant to have to
> make that h2hbm/UNC_M2HBM_CMS_CLOCKTICKS/. It is unclear who would
> pursue fixing all of this, and so it seems not specifying a PMU with
> an event for perf will keep meaning trying to open the event on all
> PMUs that advertise such an event.
> 
> Thanks,
> Ian
> 
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> That page also explains what to do if mails like this annoy you.
>>
> 
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-07  8:54       ` Thorsten Leemhuis
@ 2024-08-14 16:28         ` James Clark
  2024-08-14 16:41           ` Arnaldo Carvalho de Melo
  2024-08-15 17:29           ` Ian Rogers
  0 siblings, 2 replies; 53+ messages in thread
From: James Clark @ 2024-08-14 16:28 UTC (permalink / raw)
  To: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland
  Cc: Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux, Ian Rogers,
	Linux regressions mailing list, to: Mark Rutland



On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> On 01.08.24 21:05, Ian Rogers wrote:
>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>
>>> [TLDR: This mail in primarily relevant for Linux kernel regression
>>> tracking. See link in footer if these mails annoy you.]
>>>
>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>
>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>>> sysfs/JSON
>>> #regzbot ignore-activity
>>
>> Note, this is still broken.
> 
> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> this? Or is this a "we are screwed one way or another and someone has to
> bite the bullet" situation?
> 
> Ciao, Thorsten
> 
>> The patch changed the priority in the case
>> that you do something like:
>>
>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>
>> but if you do:
>>
>> $ perf stat -e 'cycles' benchmark
>>
>> then the broken behavior will happen as legacy events have priority
>> over sysfs/json events in that case. To fix this you need to revert:
>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>> events over legacy"
>>
>> This causes some testing issues resolved in this unmerged patch series:
>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>
>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>> and this PMU is present on Ampere systems. Reverting the commit above
>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>> be opened by perf record and fail as the event won't support sampling.
>>
>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>> fixes this by only opening the cycles event on core PMUs when choosing
>> default events.
>>
>> Rather than take this patch the revert happened as Linus runs the
>> command "perf record -e cycles:pp" (ie using a specified event and not
>> defaults) and considers it a regression in the perf tool that on an
>> Ampere system to need to do "perf record -e
>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>> will choose the cycles event correctly and with better precision the
>> pp for systems that support it, but it was still considered a
>> regression in the perf tool so the revert was made to happen. There is
>> a lack of perf testing coverage for ARM, in particular as they choose
>> to do everything in a different way to x86. The patch in question was
>> in the linux-next tree for weeks without issues.
>>
>> ARM/Ampere could fix this by renaming the event from cycles to
>> cpu_cycles, or by following Intel's convention that anything uncore
>> uses the name clockticks rather than cycles. This could break people
>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>> are rare. There has been no progress I'm aware of on renaming the
>> event.
>>
>> Making perf not terminate on opening an event for perf record seems
>> like the most likely workaround as that is at least something under
>> the tool maintainers control. ARM have discussed doing this on the
>> lists:
>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>> but since the revert in v6.10 no patches have appeared for the v6.11
>> merge window. Feature work like coresight improvements and ARMv9 are
>> being actively pursued by ARM, but feature work won't resolve this
>> regression.
>>

I got some hardware with the DSU PMU so I'm going to have a go at trying
to send some fixes for this. My initial idea was to try to incorporate the
"not terminate on opening" change as discussed in the link directly
above, and then do the revert of the "revert of prefer sysfs/json".

FWIW I don't think Juno currently is broken if the kernel supports 
extended type ID? I could have missed some output in this thread but it 
seems like it's mostly related to Apple M hardware. I'm also a bit 
confused why the "supports extended type" check fails there, but maybe 
the v6.9 commit 25412c036 from Mark is missing?

I sent a small fix the other day to make perf stat default arguments 
work on Juno, and didn't notice anything out of the ordinary: 
https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
I agree that change is quite narrow but it does incrementally improve 
things for the time being. It's possible that it would become redundant 
if I can just include Ian's change to use strings for Perf stat.

Of course I only think I have a handle on the issue right now; it seems
like it has a lot of moving parts and something else always comes up. If
I hit a wall at some point I will come back here.

Thanks
James

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-14 16:28         ` James Clark
@ 2024-08-14 16:41           ` Arnaldo Carvalho de Melo
  2024-08-15 15:15             ` James Clark
  2024-08-15 17:29           ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-14 16:41 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list

On Wed, Aug 14, 2024 at 05:28:42PM +0100, James Clark wrote:
> 
> 
> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> > On 01.08.24 21:05, Ian Rogers wrote:
> > > On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> > > (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> > > > 
> > > > [TLDR: This mail in primarily relevant for Linux kernel regression
> > > > tracking. See link in footer if these mails annoy you.]
> > > > 
> > > > On 22.11.23 00:43, Bagas Sanjaya wrote:
> > > > > On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > 
> > > > #regzbot fix: perf parse-events: Make legacy events lower priority than
> > > > sysfs/JSON
> > > > #regzbot ignore-activity
> > > 
> > > Note, this is still broken.
> > 
> > Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> > this? Or is this a "we are screwed one way or another and someone has to
> > bite the bullet" situation?
> > 
> > Ciao, Thorsten
> > 
> > > The patch changed the priority in the case
> > > that you do something like:
> > > 
> > > $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> > > 
> > > but if you do:
> > > 
> > > $ perf stat -e 'cycles' benchmark
> > > 
> > > then the broken behavior will happen as legacy events have priority
> > > over sysfs/json events in that case. To fix this you need to revert:
> > > 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> > > events over legacy"
> > > 
> > > This causes some testing issues resolved in this unmerged patch series:
> > > https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> > > 
> > > There is a bug as the arm_dsu PMU advertises an event called "cycles"
> > > and this PMU is present on Ampere systems. Reverting the commit above
> > > will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> > > __evlist__add_default") to fix ARM's BIG.little systems (opening a
> > > cycles event on all PMUs not just 1) will cause the arm_dsu event to
> > > be opened by perf record and fail as the event won't support sampling.
> > > 
> > > The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> > > fixes this by only opening the cycles event on core PMUs when choosing
> > > default events.
> > > 
> > > Rather than take this patch the revert happened as Linus runs the
> > > command "perf record -e cycles:pp" (ie using a specified event and not
> > > defaults) and considers it a regression in the perf tool that on an
> > > Ampere system to need to do "perf record -e
> > > 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> > > will choose the cycles event correctly and with better precision the
> > > pp for systems that support it, but it was still considered a
> > > regression in the perf tool so the revert was made to happen. There is
> > > a lack of perf testing coverage for ARM, in particular as they choose
> > > to do everything in a different way to x86. The patch in question was
> > > in the linux-next tree for weeks without issues.
> > > 
> > > ARM/Ampere could fix this by renaming the event from cycles to
> > > cpu_cycles, or by following Intel's convention that anything uncore
> > > uses the name clockticks rather than cycles. This could break people
> > > who rely on an event called arm_dsu/cycles/ but I imagine such people
> > > are rare. There has been no progress I'm aware of on renaming the
> > > event.
> > > 
> > > Making perf not terminate on opening an event for perf record seems
> > > like the most likely workaround as that is at least something under
> > > the tool maintainers control. ARM have discussed doing this on the
> > > lists:
> > > https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> > > but since the revert in v6.10 no patches have appeared for the v6.11
> > > merge window. Feature work like coresight improvements and ARMv9 are
> > > being actively pursued by ARM, but feature work won't resolve this
> > > regression.
> > > 
> 
> I got some hardware with the DSU PMU so I'm going to have a go at trying to
> send some fixes for this. My initial idea was to try incorporate the "not
> terminate on opening" change as discussed in the link directly above. And
> then do the revert of the "revert of prefer sysfs/json".
> 
> FWIW I don't think Juno currently is broken if the kernel supports extended
> type ID? I could have missed some output in this thread but it seems like
> it's mostly related to Apple M hardware. I'm also a bit confused why the
> "supports extended type" check fails there, but maybe the v6.9 commit
> 25412c036 from Mark is missing?
> 
> I sent a small fix the other day to make perf stat default arguments work on
> Juno, and didn't notice anything out of the ordinary: https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> I agree that change is quite narrow but it does incrementally improve things
> for the time being. It's possible that it would become redundant if I can
> just include Ian's change to use strings for Perf stat.
> 
> Of course I only think I have a handle on the issue right now, seems like it
> has a lot of moving parts and something else always comes up. If I hit a
> wall at some point I will come back here.

Thanks for working on this, hopefully we'll get to a solution that keeps
all the expectations expressed in this thread about not breaking
existing muscle memory and that allows us to progress on this matter.

- Arnaldo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-14 16:41           ` Arnaldo Carvalho de Melo
@ 2024-08-15 15:15             ` James Clark
  2024-08-15 15:20               ` James Clark
  2024-08-15 15:27               ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 53+ messages in thread
From: James Clark @ 2024-08-15 15:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list



On 14/08/2024 5:41 pm, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 14, 2024 at 05:28:42PM +0100, James Clark wrote:
>>
>>
>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
>>> On 01.08.24 21:05, Ian Rogers wrote:
>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>
>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
>>>>> tracking. See link in footer if these mails annoy you.]
>>>>>
>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>
>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>>>>> sysfs/JSON
>>>>> #regzbot ignore-activity
>>>>
>>>> Note, this is still broken.
>>>
>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
>>> this? Or is this a "we are screwed one way or another and someone has to
>>> bite the bullet" situation?
>>>
>>> Ciao, Thorsten
>>>
>>>> The patch changed the priority in the case
>>>> that you do something like:
>>>>
>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>>>
>>>> but if you do:
>>>>
>>>> $ perf stat -e 'cycles' benchmark
>>>>
>>>> then the broken behavior will happen as legacy events have priority
>>>> over sysfs/json events in that case. To fix this you need to revert:
>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>>>> events over legacy"
>>>>
>>>> This causes some testing issues resolved in this unmerged patch series:
>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>>>
>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>>>> and this PMU is present on Ampere systems. Reverting the commit above
>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>>>> be opened by perf record and fail as the event won't support sampling.
>>>>
>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>> fixes this by only opening the cycles event on core PMUs when choosing
>>>> default events.
>>>>
>>>> Rather than take this patch the revert happened as Linus runs the
>>>> command "perf record -e cycles:pp" (ie using a specified event and not
>>>> defaults) and considers it a regression in the perf tool that on an
>>>> Ampere system to need to do "perf record -e
>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>>>> will choose the cycles event correctly and with better precision the
>>>> pp for systems that support it, but it was still considered a
>>>> regression in the perf tool so the revert was made to happen. There is
>>>> a lack of perf testing coverage for ARM, in particular as they choose
>>>> to do everything in a different way to x86. The patch in question was
>>>> in the linux-next tree for weeks without issues.
>>>>
>>>> ARM/Ampere could fix this by renaming the event from cycles to
>>>> cpu_cycles, or by following Intel's convention that anything uncore
>>>> uses the name clockticks rather than cycles. This could break people
>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>>>> are rare. There has been no progress I'm aware of on renaming the
>>>> event.
>>>>
>>>> Making perf not terminate on opening an event for perf record seems
>>>> like the most likely workaround as that is at least something under
>>>> the tool maintainers control. ARM have discussed doing this on the
>>>> lists:
>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>>>> but since the revert in v6.10 no patches have appeared for the v6.11
>>>> merge window. Feature work like coresight improvements and ARMv9 are
>>>> being actively pursued by ARM, but feature work won't resolve this
>>>> regression.
>>>>
>>
>> I got some hardware with the DSU PMU so I'm going to have a go at trying to
>> send some fixes for this. My initial idea was to try incorporate the "not
>> terminate on opening" change as discussed in the link directly above. And
>> then do the revert of the "revert of prefer sysfs/json".
>>
>> FWIW I don't think Juno currently is broken if the kernel supports extended
>> type ID? I could have missed some output in this thread but it seems like
>> it's mostly related to Apple M hardware. I'm also a bit confused why the
>> "supports extended type" check fails there, but maybe the v6.9 commit
>> 25412c036 from Mark is missing?
>>
>> I sent a small fix the other day to make perf stat default arguments work on
>> Juno, and didn't notice anything out of the ordinary: https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
>> I agree that change is quite narrow but it does incrementally improve things
>> for the time being. It's possible that it would become redundant if I can
>> just include Ian's change to use strings for Perf stat.
>>
>> Of course I only think I have a handle on the issue right now, seems like it
>> has a lot of moving parts and something else always comes up. If I hit a
>> wall at some point I will come back here.
> 
> Thanks for working on this, hopefully we'll get to a solution that keeps
> all the expectations expressed in this thread about not breaking
> existing muscle memory and that allows us to progress on this matter.
> 
> - Arnaldo

Hi Arnaldo,

In one of your investigations here 
https://lore.kernel.org/lkml/Zld3dlJHjFMFG02v@x1/ comparing "cycles", 
"cpu-cycles" and "cpu_cycles" events on Arm you say only some of them 
open events on both core types. I wasn't able to reproduce that on 
perf-tools-next (27ac597c0e) or v6.9 (a38297e3fb) for perf record or 
stat. I guessed the 6.9 tag because you only mentioned it was on tip and 
it was 29th May. For me they all open exactly the same two legacy events 
with the extended type ID set.
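For anyone following along, "legacy event with the extended type ID set" means the PMU's type number is placed in the upper 32 bits of attr.config while attr.type stays PERF_TYPE_HARDWARE. A minimal sketch of that encoding follows; the constants mirror the uapi perf_event.h values as I understand them, the helper names are made up, and the PMU type numbers used in any call are hypothetical (the real ones come from /sys/bus/event_source/devices/<pmu>/type).

```python
# Values as defined in include/uapi/linux/perf_event.h (to my understanding).
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def extended_hw_config(pmu_type, hw_event):
    """Build attr.config for a legacy hardware event aimed at one PMU.

    Layout is 0xEEEEEEEE000000AA: EEEEEEEE is the PMU type ID, AA the event ID.
    """
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_event & PERF_HW_EVENT_MASK)


def split_extended_hw_config(config):
    """Return (pmu_type, hw_event) from an extended legacy config value."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK
```

So on a two-cluster system, "cycles" ends up as two events that differ only in the upper half of config, one per core PMU.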

It looks like the behavior you see would be caused by either missing 
this kernel change:

   5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
    (v6.6 release)

Or this userspace change, although that is unlikely as it was a fix for Apple M hardware:

   25412c036 ("perf print-events: make is_event_supported() more robust")
    (v6.9 release)

Do you remember if you were using a new kernel or only testing a new 
Perf? Or if you don't mind could you re-test? Hopefully not to derail 
the discussion but I just want to make sure I'm not missing some other 
third issue before I start hacking away.

I believe we still need to revert the revert of the JSON/legacy change
because, as Mark mentions, there is no guarantee that a PMU's named event
is the same as a legacy event of the same name, so we do want to prefer
sysfs/JSON. There are some other edge cases like new Perf on an old
kernel before we added extended type support, but I don't think I'll
list all of them.

Having said that, I believe that currently all the sysfs and legacy 
events actually _are_ the same. So it's not a user facing issue _yet_, 
or at least on any hardware mentioned in these threads.

Thanks
James

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-15 15:15             ` James Clark
@ 2024-08-15 15:20               ` James Clark
  2024-08-15 15:27               ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 53+ messages in thread
From: James Clark @ 2024-08-15 15:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list



On 15/08/2024 4:15 pm, James Clark wrote:
> 
> 
> On 14/08/2024 5:41 pm, Arnaldo Carvalho de Melo wrote:
>> On Wed, Aug 14, 2024 at 05:28:42PM +0100, James Clark wrote:
>>>
>>>
>>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
>>>> On 01.08.24 21:05, Ian Rogers wrote:
>>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>>
>>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
>>>>>> tracking. See link in footer if these mails annoy you.]
>>>>>>
>>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost 
>>>>>>>> everything), and
>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) 
>>>>>>>> since v6.5.
>>>>>>
>>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority 
>>>>>> than
>>>>>> sysfs/JSON
>>>>>> #regzbot ignore-activity
>>>>>
>>>>> Note, this is still broken.
>>>>
>>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
>>>> this? Or is this a "we are screwed one way or another and someone 
>>>> has to
>>>> bite the bullet" situation?
>>>>
>>>> Ciao, Thorsten
>>>>
>>>>> The patch changed the priority in the case
>>>>> that you do something like:
>>>>>
>>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>>>>
>>>>> but if you do:
>>>>>
>>>>> $ perf stat -e 'cycles' benchmark
>>>>>
>>>>> then the broken behavior will happen as legacy events have priority
>>>>> over sysfs/json events in that case. To fix this you need to revert:
>>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>>>>> events over legacy"
>>>>>
>>>>> This causes some testing issues resolved in this unmerged patch 
>>>>> series:
>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>>>>
>>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>>>>> and this PMU is present on Ampere systems. Reverting the commit above
>>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>>>>> be opened by perf record and fail as the event won't support sampling.
>>>>>
>>>>> The patch 
>>>>> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>>> fixes this by only opening the cycles event on core PMUs when choosing
>>>>> default events.
>>>>>
>>>>> Rather than take this patch the revert happened as Linus runs the
>>>>> command "perf record -e cycles:pp" (ie using a specified event and not
>>>>> defaults) and considers it a regression in the perf tool that on an
>>>>> Ampere system to need to do "perf record -e
>>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>>>>> will choose the cycles event correctly and with better precision the
>>>>> pp for systems that support it, but it was still considered a
>>>>> regression in the perf tool so the revert was made to happen. There is
>>>>> a lack of perf testing coverage for ARM, in particular as they choose
>>>>> to do everything in a different way to x86. The patch in question was
>>>>> in the linux-next tree for weeks without issues.
>>>>>
>>>>> ARM/Ampere could fix this by renaming the event from cycles to
>>>>> cpu_cycles, or by following Intel's convention that anything uncore
>>>>> uses the name clockticks rather than cycles. This could break people
>>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>>>>> are rare. There has been no progress I'm aware of on renaming the
>>>>> event.
>>>>>
>>>>> Making perf not terminate on opening an event for perf record seems
>>>>> like the most likely workaround as that is at least something under
>>>>> the tool maintainers control. ARM have discussed doing this on the
>>>>> lists:
>>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>>>>> but since the revert in v6.10 no patches have appeared for the v6.11
>>>>> merge window. Feature work like coresight improvements and ARMv9 are
>>>>> being actively pursued by ARM, but feature work won't resolve this
>>>>> regression.
>>>>>
>>>
>>> I got some hardware with the DSU PMU so I'm going to have a go at 
>>> trying to
>>> send some fixes for this. My initial idea was to try incorporate the 
>>> "not
>>> terminate on opening" change as discussed in the link directly above. 
>>> And
>>> then do the revert of the "revert of prefer sysfs/json".
>>>
>>> FWIW I don't think Juno currently is broken if the kernel supports 
>>> extended
>>> type ID? I could have missed some output in this thread but it seems 
>>> like
>>> it's mostly related to Apple M hardware. I'm also a bit confused why the
>>> "supports extended type" check fails there, but maybe the v6.9 commit
>>> 25412c036 from Mark is missing?
>>>
>>> I sent a small fix the other day to make perf stat default arguments 
>>> work on
>>> Juno, and didn't notice anything out of the ordinary: 
>>> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
>>> I agree that change is quite narrow but it does incrementally improve 
>>> things
>>> for the time being. It's possible that it would become redundant if I 
>>> can
>>> just include Ian's change to use strings for Perf stat.
>>>
>>> Of course I only think I have a handle on the issue right now, seems 
>>> like it
>>> has a lot of moving parts and something else always comes up. If I hit a
>>> wall at some point I will come back here.
>>
>> Thanks for working on this, hopefully we'll get to a solution that keeps
>> all the expectations expressed in this thread about not breaking
>> existing muscle memory and that allows us to progress on this matter.
>>
>> - Arnaldo
> 
> Hi Arnaldo,
> 
> In one of your investigations here 
> https://lore.kernel.org/lkml/Zld3dlJHjFMFG02v@x1/ comparing "cycles", 
> "cpu-cycles" and "cpu_cycles" events on Arm you say only some of them 
> open events on both core types. I wasn't able to reproduce that on 
> perf-tools-next (27ac597c0e) or v6.9 (a38297e3fb) for perf record or 
> stat. I guessed the 6.9 tag because you only mentioned it was on tip and 
> it was 29th May. For me they all open exactly the same two legacy events 
> with the extended type ID set.

Minor correction, one opens using the PMU type rather than a legacy event 
with extended type ID. But importantly they do all open on both CPU types.
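
As a rough illustration of the two encodings mentioned here, the sketch below builds the (type, config) pair for a legacy hardware event with the extended type ID in the upper 32 bits of config, and for an event opened directly by PMU type. The shift and mask mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from the uapi perf_event.h header. The PMU type IDs 8 and 9 and the raw event number 0x11 are assumed values for illustration only (real type IDs are read from /sys/bus/event_source/devices/<pmu>/type), and the helper names are hypothetical.

```python
# Constants mirroring include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def legacy_with_extended_type(pmu_type, hw_event):
    """(attr.type, attr.config) for a legacy event pinned to one PMU.

    The PMU type ID goes in the upper 32 bits of config, the legacy
    hardware event ID in the lower 32 bits.
    """
    config = (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_event & PERF_HW_EVENT_MASK)
    return PERF_TYPE_HARDWARE, config


def pmu_typed_event(pmu_type, raw_config):
    """(attr.type, attr.config) for an event opened by PMU type directly."""
    return pmu_type, raw_config


def decode_extended(config):
    """Split an extended-type config back into (pmu_type, hw_event)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


if __name__ == "__main__":
    # Hypothetical type IDs for the two core PMUs of a big.LITTLE system.
    for pmu_type in (8, 9):
        attr_type, config = legacy_with_extended_type(
            pmu_type, PERF_COUNT_HW_CPU_CYCLES)
        print(f"legacy:    type={attr_type} config={config:#x}")
    # Assumed raw event number, used only to show the other form.
    print("pmu-typed: type=%d config=%#x" % pmu_typed_event(8, 0x11))
```

Either form ends up scoped to one PMU, which is why both can open on both CPU types.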


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-15 15:15             ` James Clark
  2024-08-15 15:20               ` James Clark
@ 2024-08-15 15:27               ` Arnaldo Carvalho de Melo
  2024-08-15 15:53                 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-15 15:27 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list

On Thu, Aug 15, 2024 at 04:15:41PM +0100, James Clark wrote:
> 
> 
> On 14/08/2024 5:41 pm, Arnaldo Carvalho de Melo wrote:
> > On Wed, Aug 14, 2024 at 05:28:42PM +0100, James Clark wrote:
> > > 
> > > 
> > > On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> > > > On 01.08.24 21:05, Ian Rogers wrote:
> > > > > On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> > > > > (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> > > > > > 
> > > > > > [TLDR: This mail in primarily relevant for Linux kernel regression
> > > > > > tracking. See link in footer if these mails annoy you.]
> > > > > > 
> > > > > > On 22.11.23 00:43, Bagas Sanjaya wrote:
> > > > > > > On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> > > > > > > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > > > > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > > > > > 
> > > > > > #regzbot fix: perf parse-events: Make legacy events lower priority than
> > > > > > sysfs/JSON
> > > > > > #regzbot ignore-activity
> > > > > 
> > > > > Note, this is still broken.
> > > > 
> > > > Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> > > > this? Or is this a "we are screwed one way or another and someone has to
> > > > bite the bullet" situation?
> > > > 
> > > > Ciao, Thorsten
> > > > 
> > > > > The patch changed the priority in the case
> > > > > that you do something like:
> > > > > 
> > > > > $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> > > > > 
> > > > > but if you do:
> > > > > 
> > > > > $ perf stat -e 'cycles' benchmark
> > > > > 
> > > > > then the broken behavior will happen as legacy events have priority
> > > > > over sysfs/json events in that case. To fix this you need to revert:
> > > > > 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> > > > > events over legacy"
> > > > > 
> > > > > This causes some testing issues resolved in this unmerged patch series:
> > > > > https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> > > > > 
> > > > > There is a bug as the arm_dsu PMU advertises an event called "cycles"
> > > > > and this PMU is present on Ampere systems. Reverting the commit above
> > > > > will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> > > > > __evlist__add_default") to fix ARM's BIG.little systems (opening a
> > > > > cycles event on all PMUs not just 1) will cause the arm_dsu event to
> > > > > be opened by perf record and fail as the event won't support sampling.
> > > > > 
> > > > > The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> > > > > fixes this by only opening the cycles event on core PMUs when choosing
> > > > > default events.
> > > > > 
> > > > > Rather than take this patch the revert happened as Linus runs the
> > > > > command "perf record -e cycles:pp" (ie using a specified event and not
> > > > > defaults) and considers it a regression in the perf tool that on an
> > > > > Ampere system to need to do "perf record -e
> > > > > 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> > > > > will choose the cycles event correctly and with better precision the
> > > > > pp for systems that support it, but it was still considered a
> > > > > regression in the perf tool so the revert was made to happen. There is
> > > > > a lack of perf testing coverage for ARM, in particular as they choose
> > > > > to do everything in a different way to x86. The patch in question was
> > > > > in the linux-next tree for weeks without issues.
> > > > > 
> > > > > ARM/Ampere could fix this by renaming the event from cycles to
> > > > > cpu_cycles, or by following Intel's convention that anything uncore
> > > > > uses the name clockticks rather than cycles. This could break people
> > > > > who rely on an event called arm_dsu/cycles/ but I imagine such people
> > > > > are rare. There has been no progress I'm aware of on renaming the
> > > > > event.
> > > > > 
> > > > > Making perf not terminate on opening an event for perf record seems
> > > > > like the most likely workaround as that is at least something under
> > > > > the tool maintainers control. ARM have discussed doing this on the
> > > > > lists:
> > > > > https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> > > > > but since the revert in v6.10 no patches have appeared for the v6.11
> > > > > merge window. Feature work like coresight improvements and ARMv9 are
> > > > > being actively pursued by ARM, but feature work won't resolve this
> > > > > regression.
> > > > > 
> > > 
> > > I got some hardware with the DSU PMU so I'm going to have a go at trying to
> > > send some fixes for this. My initial idea was to try incorporate the "not
> > > terminate on opening" change as discussed in the link directly above. And
> > > then do the revert of the "revert of prefer sysfs/json".
> > > 
> > > FWIW I don't think Juno currently is broken if the kernel supports extended
> > > type ID? I could have missed some output in this thread but it seems like
> > > it's mostly related to Apple M hardware. I'm also a bit confused why the
> > > "supports extended type" check fails there, but maybe the v6.9 commit
> > > 25412c036 from Mark is missing?
> > > 
> > > I sent a small fix the other day to make perf stat default arguments work on
> > > Juno, and didn't notice anything out of the ordinary: https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> > > I agree that change is quite narrow but it does incrementally improve things
> > > for the time being. It's possible that it would become redundant if I can
> > > just include Ian's change to use strings for Perf stat.
> > > 
> > > Of course I only think I have a handle on the issue right now, seems like it
> > > has a lot of moving parts and something else always comes up. If I hit a
> > > wall at some point I will come back here.
> > 
> > Thanks for working on this, hopefully we'll get to a solution that keeps
> > all the expectations expressed in this thread about not breaking
> > existing muscle memory and that allows us to progress on this matter.
> > 
> > - Arnaldo
> 
> Hi Arnaldo,
> 
> In one of your investigations here
> https://lore.kernel.org/lkml/Zld3dlJHjFMFG02v@x1/ comparing "cycles",
> "cpu-cycles" and "cpu_cycles" events on Arm you say only some of them open
> events on both core types. I wasn't able to reproduce that on
> perf-tools-next (27ac597c0e) or v6.9 (a38297e3fb) for perf record or stat. I
> guessed the 6.9 tag because you only mentioned it was on tip and it was 29th
> May. For me they all open exactly the same two legacy events with the
> extended type ID set.
> 
> It looks like the behavior you see would be caused by either missing this
> kernel change:
> 
>   5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
>    (v6.6 release)
> 
> Or this userspace change, but unlikely as it was a fix for Apple M hardware:
> 
>   25412c036 ("perf print-events: make is_event_supported() more robust")
>    (v6.9 release)
> 
> Do you remember if you were using a new kernel or only testing a new Perf?

I normally use the distro/SoC provided kernel, didn't I add the 'uname
-a' output in those investigations (/me slaps himself in the face
speculatively...)?

> Or if you don't mind could you re-test? Hopefully not to derail the

Sure

> discussion but I just want to make sure I'm not missing some other third
> issue before I start hacking away.

This is full of subtleties and has generated a lot of back and forth, so
making sure we don't miss anything is what we should do.
 
> I believe we still need to revert the revert of the JSON/legacy change.

Good to see progress on assessing that.

/me goes and turns on his trusty libre computer board...

- Arnaldo

> Because as Mark mentions there is no guarantee that a PMU's named event is
> the same as a legacy event of the same name, so we do want to prefer
> sysfs/JSON. There are some other edge cases like new Perf on an old kernel
> before we added extended type support, but I don't think I'll list all of
> them.
> 
> Having said that, I believe that currently all the sysfs and legacy events
> actually _are_ the same. So it's not a user facing issue _yet_, or at least
> on any hardware mentioned in these threads.
> 
> Thanks
> James

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-15 15:27               ` Arnaldo Carvalho de Melo
@ 2024-08-15 15:53                 ` Arnaldo Carvalho de Melo
  2024-08-16  8:57                   ` James Clark
  0 siblings, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-15 15:53 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list

On Thu, Aug 15, 2024 at 12:27:21PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 15, 2024 at 04:15:41PM +0100, James Clark wrote:
> > In one of your investigations here
> > https://lore.kernel.org/lkml/Zld3dlJHjFMFG02v@x1/ comparing "cycles",
> > "cpu-cycles" and "cpu_cycles" events on Arm you say only some of them open
> > events on both core types. I wasn't able to reproduce that on
> > perf-tools-next (27ac597c0e) or v6.9 (a38297e3fb) for perf record or stat. I
> > guessed the 6.9 tag because you only mentioned it was on tip and it was 29th
> > May. For me they all open exactly the same two legacy events with the
> > extended type ID set.
> > 
> > It looks like the behavior you see would be caused by either missing this
> > kernel change:
> > 
> >   5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
> >    (v6.6 release)

What I have now is:

6.1.92-15907-gf36fd2695db3

It was a bit older, but 6.1 ish as well, I'll try to either get a new
kernel from Libre Computer or build one myself.

- Arnaldo

> > Or this userspace change, but unlikely as it was a fix for Apple M hardware:
> > 
> >   25412c036 ("perf print-events: make is_event_supported() more robust")
> >    (v6.9 release)
> > 
> > Do you remember if you were using a new kernel or only testing a new Perf?
> 
> I normally use the distro/SoC provided kernel, didn't I add the 'uname
> -a' output in those investigations (/me slaps himself in the face
> speculatively...)?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-14 16:28         ` James Clark
  2024-08-14 16:41           ` Arnaldo Carvalho de Melo
@ 2024-08-15 17:29           ` Ian Rogers
  2024-08-16  9:22             ` James Clark
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2024-08-15 17:29 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list, Atish Patra

On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> > On 01.08.24 21:05, Ian Rogers wrote:
> >> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> >> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> >>>
> >>> [TLDR: This mail in primarily relevant for Linux kernel regression
> >>> tracking. See link in footer if these mails annoy you.]
> >>>
> >>> On 22.11.23 00:43, Bagas Sanjaya wrote:
> >>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> >>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >>>
> >>> #regzbot fix: perf parse-events: Make legacy events lower priority than
> >>> sysfs/JSON
> >>> #regzbot ignore-activity
> >>
> >> Note, this is still broken.
> >
> > Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> > this? Or is this a "we are screwed one way or another and someone has to
> > bite the bullet" situation?
> >
> > Ciao, Thorsten
> >
> >> The patch changed the priority in the case
> >> that you do something like:
> >>
> >> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> >>
> >> but if you do:
> >>
> >> $ perf stat -e 'cycles' benchmark
> >>
> >> then the broken behavior will happen as legacy events have priority
> >> over sysfs/json events in that case. To fix this you need to revert:
> >> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> >> events over legacy"
> >>
> >> This causes some testing issues resolved in this unmerged patch series:
> >> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> >>
> >> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> >> and this PMU is present on Ampere systems. Reverting the commit above
> >> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> >> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> >> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> >> be opened by perf record and fail as the event won't support sampling.
> >>
> >> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >> fixes this by only opening the cycles event on core PMUs when choosing
> >> default events.
> >>
> >> Rather than take this patch the revert happened as Linus runs the
> >> command "perf record -e cycles:pp" (ie using a specified event and not
> >> defaults) and considers it a regression in the perf tool that on an
> >> Ampere system to need to do "perf record -e
> >> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> >> will choose the cycles event correctly and with better precision the
> >> pp for systems that support it, but it was still considered a
> >> regression in the perf tool so the revert was made to happen. There is
> >> a lack of perf testing coverage for ARM, in particular as they choose
> >> to do everything in a different way to x86. The patch in question was
> >> in the linux-next tree for weeks without issues.
> >>
> >> ARM/Ampere could fix this by renaming the event from cycles to
> >> cpu_cycles, or by following Intel's convention that anything uncore
> >> uses the name clockticks rather than cycles. This could break people
> >> who rely on an event called arm_dsu/cycles/ but I imagine such people
> >> are rare. There has been no progress I'm aware of on renaming the
> >> event.
> >>
> >> Making perf not terminate on opening an event for perf record seems
> >> like the most likely workaround as that is at least something under
> >> the tool maintainers control. ARM have discussed doing this on the
> >> lists:
> >> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> >> but since the revert in v6.10 no patches have appeared for the v6.11
> >> merge window. Feature work like coresight improvements and ARMv9 are
> >> being actively pursued by ARM, but feature work won't resolve this
> >> regression.
> >>
>
> I got some hardware with the DSU PMU so I'm going to have a go at trying
> to send some fixes for this. My initial idea was to try incorporate the
> "not terminate on opening" change as discussed in the link directly
> above. And then do the revert of the "revert of prefer sysfs/json".

Thanks, I think this would be good. The biggest issue is that none of
the record logic expects a file descriptor to be not opened, deleting
unopened evsels from the evlist breaks all the indexing into the
mmaps, etc. Tbh, you probably wouldn't do the code this way if it was
written afresh. Perhaps a hashmap would map from an evsel to ring
buffer mmaps, etc., avoiding global state and benefitting from
encapsulation. I'd focus on just doing the expedient thing in the
changes, which probably just means making the record code tolerant of
evsels that fail to open and not modifying the evlist due to the risk
it breaks the indices.
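
A minimal sketch of that "tolerate failed opens, keep the indices" idea, written as a toy model rather than as perf code. The names open_all() and live_events() are hypothetical, and the opener is passed in as a callback so the sketch stays self-contained. The point is only that a failed open leaves a placeholder in its slot, so every other evsel keeps the index it had in the list.

```python
def open_all(evsels, try_open):
    """Open every evsel, keeping one slot per evsel even on failure.

    try_open is a caller-supplied callable that returns a file
    descriptor or raises OSError (an assumption of this sketch).
    """
    fds = []
    for evsel in evsels:
        try:
            fds.append(try_open(evsel))
        except OSError:
            # Placeholder instead of deleting the evsel, so the
            # positions of later entries do not shift.
            fds.append(None)
    return fds


def live_events(evsels, fds):
    """Return (index, evsel) pairs for the evsels that did open."""
    return [(i, e) for i, (e, fd) in enumerate(zip(evsels, fds))
            if fd is not None]


if __name__ == "__main__":
    def fake_open(name):
        # Stand-in for an event that cannot be opened for sampling.
        if name.startswith("arm_dsu"):
            raise OSError("sampling not supported")
        return 3

    names = ["armv8_pmuv3_0/cycles/", "arm_dsu_0/cycles/",
             "armv8_pmuv3_1/cycles/"]
    fds = open_all(names, fake_open)
    print(live_events(names, fds))
```

Consumers that index by position then only need to skip the placeholder slots.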

(To point out the obvious, this work wouldn't be necessary if arm_dsu
event were renamed from "cycles" to "cpu_cycles" which would also make
it more intention revealing alongside the arm_dsu's "bus_cycles" event
name).

> FWIW I don't think Juno currently is broken if the kernel supports
> extended type ID? I could have missed some output in this thread but it
> seems like it's mostly related to Apple M hardware. I'm also a bit
> confused why the "supports extended type" check fails there, but maybe
> the v6.9 commit 25412c036 from Mark is missing?

So I think your later emails clarify Arnaldo is probably missing:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f

Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
(iirc), this regression report, etc. My understanding is that Apple M
has something like a v2 ARM PMU and the legacy events are encoded
incorrectly in the driver for this. The regression in v6.5 happened
because ARM's core PMUs had previously been treated as uncore PMUs,
meaning we wouldn't try to program legacy events on them. Fixing the
handling of ARM's core PMUs broke Apple M due to the broken legacy
event mappings. Why not fix the Apple M PMU driver? Well there was
anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
RISC-V PMU driver wants to delegate the mapping of legacy events to
the perf tool so the driver needn't be aware of all and future RISC-V
configurations. The fix discussed with Mark, Atish, etc. has been to
swap the priority of legacy and sysfs/json events so that the latter
has priority. We need the revert of the revert as currently we only do
this if a PMU is specified with an event, not for the general wildcard
PMUs case that most people use. There was huge fallout from flipping
the priority particularly on Intel as all test expectations needed
updating. I've sent out similar fixes that need incorporating when the
revert is reverted. Ideally tools/perf/tests/parse-events.c would be
updated to cover ARM's PMUs that don't follow the normal pattern that
the core PMU is called "cpu" (this would mean that we were testing
that event parsing on ARM was WAI wrt encoding priorities, BIG.little,
etc).

> I sent a small fix the other day to make perf stat default arguments
> work on Juno, and didn't notice anything out of the ordinary:
> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> I agree that change is quite narrow but it does incrementally improve
> things for the time being. It's possible that it would become redundant
> if I can just include Ian's change to use strings for Perf stat.

I'd prefer we didn't merge this as we'd need to rebase:
https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
and those changes would then delete the code introduced. I'm fine with
adding the tests.

There are more exotic heterogeneous core things upcoming, probably
also from ARM, and the thought of duplicating the default attribute
logic and event parsing constraints is just something I'd prefer not
to have to do.

> Of course I only think I have a handle on the issue right now, seems
> like it has a lot of moving parts and something else always comes up. If
> I hit a wall at some point I will come back here.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-15 15:53                 ` Arnaldo Carvalho de Melo
@ 2024-08-16  8:57                   ` James Clark
  0 siblings, 0 replies; 53+ messages in thread
From: James Clark @ 2024-08-16  8:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Ian Rogers,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list



On 15/08/2024 4:53 pm, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 15, 2024 at 12:27:21PM -0300, Arnaldo Carvalho de Melo wrote:
>> On Thu, Aug 15, 2024 at 04:15:41PM +0100, James Clark wrote:
>>> In one of your investigations here
>>> https://lore.kernel.org/lkml/Zld3dlJHjFMFG02v@x1/ comparing "cycles",
>>> "cpu-cycles" and "cpu_cycles" events on Arm you say only some of them open
>>> events on both core types. I wasn't able to reproduce that on
>>> perf-tools-next (27ac597c0e) or v6.9 (a38297e3fb) for perf record or stat. I
>>> guessed the 6.9 tag because you only mentioned it was on tip and it was 29th
>>> May. For me they all open exactly the same two legacy events with the
>>> extended type ID set.
>>>
>>> It looks like the behavior you see would be caused by either missing this
>>> kernel change:
>>>
>>>    5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
>>>     (v6.6 release)
> 
> What I have now is:
> 
> 6.1.92-15907-gf36fd2695db3
> 
> It was a bit older, but 6.1 ish as well, I'll try to either get a new
> kernel from Libre Computer or build one myself.
> 
> - Arnaldo
>

Thanks for the confirmation. In that case you may not even need to 
retest. I was only wondering if it was broken from v6.6 onwards, but 6.1 
not working is expected. And I'm certain that you'll find any later 
versions working.

>>> Or this userspace change, but unlikely as it was a fix for Apple M hardware:
>>>
>>>    25412c036 ("perf print-events: make is_event_supported() more robust")
>>>     (v6.9 release)
>>>
>>> Do you remember if you were using a new kernel or only testing a new Perf?
>>
>> I normally use the distro/SoC provided kernel, didn't I add the 'uname
>> -a' output in those investigations (/me slaps himself in the face
>> speculatively...)?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-15 17:29           ` Ian Rogers
@ 2024-08-16  9:22             ` James Clark
  2024-08-16 15:30               ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: James Clark @ 2024-08-16  9:22 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list, Atish Patra



On 15/08/2024 6:29 pm, Ian Rogers wrote:
> On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
>>> On 01.08.24 21:05, Ian Rogers wrote:
>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>
>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
>>>>> tracking. See link in footer if these mails annoy you.]
>>>>>
>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>
>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>>>>> sysfs/JSON
>>>>> #regzbot ignore-activity
>>>>
>>>> Note, this is still broken.
>>>
>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
>>> this? Or is this a "we are screwed one way or another and someone has to
>>> bite the bullet" situation?
>>>
>>> Ciao, Thorsten
>>>
>>>> The patch changed the priority in the case
>>>> that you do something like:
>>>>
>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>>>
>>>> but if you do:
>>>>
>>>> $ perf stat -e 'cycles' benchmark
>>>>
>>>> then the broken behavior will happen as legacy events have priority
>>>> over sysfs/json events in that case. To fix this you need to revert:
>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>>>> events over legacy"
>>>>
>>>> This causes some testing issues resolved in this unmerged patch series:
>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>>>
>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>>>> and this PMU is present on Ampere systems. Reverting the commit above
>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>>>> be opened by perf record and fail as the event won't support sampling.
>>>>
>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>> fixes this by only opening the cycles event on core PMUs when choosing
>>>> default events.
>>>>
>>>> Rather than take this patch the revert happened as Linus runs the
>>>> command "perf record -e cycles:pp" (ie using a specified event and not
>>>> defaults) and considers it a regression in the perf tool to need to
>>>> do, on an Ampere system, "perf record -e
>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>>>> will choose the cycles event correctly and with better precision than
>>>> pp for systems that support it, but it was still considered a
>>>> regression in the perf tool so the revert was made to happen. There is
>>>> a lack of perf testing coverage for ARM, in particular as they choose
>>>> to do everything in a different way to x86. The patch in question was
>>>> in the linux-next tree for weeks without issues.
>>>>
>>>> ARM/Ampere could fix this by renaming the event from cycles to
>>>> cpu_cycles, or by following Intel's convention that anything uncore
>>>> uses the name clockticks rather than cycles. This could break people
>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>>>> are rare. There has been no progress I'm aware of on renaming the
>>>> event.
>>>>
>>>> Making perf not terminate on opening an event for perf record seems
>>>> like the most likely workaround as that is at least something under
>>>> the tool maintainers control. ARM have discussed doing this on the
>>>> lists:
>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>>>> but since the revert in v6.10 no patches have appeared for the v6.11
>>>> merge window. Feature work like coresight improvements and ARMv9 are
>>>> being actively pursued by ARM, but feature work won't resolve this
>>>> regression.
>>>>
>>
>> I got some hardware with the DSU PMU so I'm going to have a go at trying
>> to send some fixes for this. My initial idea was to try to incorporate the
>> "not terminate on opening" change as discussed in the link directly
>> above. And then do the revert of the "revert of prefer sysfs/json".
> 
> Thanks, I think this would be good. The biggest issue is that none of
> the record logic expects a file descriptor to be not opened, deleting
> unopened evsels from the evlist breaks all the indexing into the
> mmaps, etc. Tbh, you probably wouldn't do the code this way if it was
> written afresh. Perhaps a hashmap would map from an evsel to ring
> buffer mmaps, etc. Trying to avoid having global state and benefitting
> from encapsulation. I'd focus on just doing the expedient thing in the
> changes, which probably just means making the record code tolerant of
> evsels that fail to open and not modifying the evlist due to the risk
> it breaks the indices.
> 

Thanks for the tips.
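
A rough sketch of that "tolerate a failed open, leave the evlist alone"
idea, purely for illustration: sketch_evsel, can_open and
open_all_tolerant are made-up names, the descriptors are fake, and none
of this is perf's actual record code. Failed entries stay in the array
with fd set to -1, so later entries keep their index:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an evsel; perf's real struct evsel is far richer. */
struct sketch_evsel {
	const char *name;
	int fd;       /* -1 means the event failed to open */
	int can_open; /* stand-in for whether the (simulated) open succeeds */
};

/*
 * Try to open every event. A failure is recorded in place instead of
 * removing the entry, so anything indexed by position stays valid.
 * Returns the number of events that opened.
 */
static size_t open_all_tolerant(struct sketch_evsel *evsels, size_t n)
{
	size_t opened = 0;

	assert(evsels != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (evsels[i].can_open) {
			evsels[i].fd = 100 + (int)i; /* fake descriptor */
			opened++;
		} else {
			evsels[i].fd = -1; /* keep the slot, mark it unopened */
		}
	}
	return opened;
}
```

Consumers would then have to skip entries whose fd is -1 rather than
assume every index has an open descriptor.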

> (To point out the obvious, this work wouldn't be necessary if arm_dsu
> event were renamed from "cycles" to "cpu_cycles" which would also make
> it more intention revealing alongside the arm_dsu's "bus_cycles" event
> name).
> 

I understand but I can imagine the following conversation if we rename that:

   User: "I updated my kernel and now my (non Perf) tool fails to open
          the DSU cycles event because it doesn't exist anymore"

   Linus/maintainers: "Oh ok yes that was a userspace breaking change,
                      let's revert it"

Just because Perf can handle 3 different names for cycles doesn't mean 
other tools can.

>> FWIW I don't think Juno currently is broken if the kernel supports
>> extended type ID? I could have missed some output in this thread but it
>> seems like it's mostly related to Apple M hardware. I'm also a bit
>> confused why the "supports extended type" check fails there, but maybe
>> the v6.9 commit 25412c036 from Mark is missing?
> 
> So I think your later emails clarify Arnaldo is probably missing:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
> 
> Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
> (iirc), this regression report, etc. My understanding is that Apple M
> has something like a v2 ARM PMU and the legacy events are encoded
> incorrectly in the driver for this. The regression in v6.5 happened

I'm not sure about that. The M PMU events may be incomplete, but the two 
that are there have a mapping that looks sane:

   static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
	PERF_MAP_ALL_UNSUPPORTED,
	[PERF_COUNT_HW_CPU_CYCLES]	= M1_PMU_PERFCTR_CPU_CYCLES,
	[PERF_COUNT_HW_INSTRUCTIONS]	= M1_PMU_PERFCTR_INSTRUCTIONS,
	/* No idea about the rest yet */
   };

And they map to the same named events:

   static struct attribute *m1_pmu_event_attrs[] = {
	M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
	M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
	NULL,
   };

So in this case I can't see using legacy vs sysfs events making a 
difference. Maybe there is some other case that was mentioned in a 
previous thread that I missed though.

> because ARM's core PMUs had previously been treated as uncore PMUs,
> meaning we wouldn't try to program legacy events on them. Fixing the
> handling of ARM's core PMUs broke Apple M due to the broken legacy
> event mappings. Why not fix the Apple M PMU driver? Well there was
> anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
> RISC-V PMU driver wants to delegate the mapping of legacy events to
> the perf tool so the driver needn't be aware of all and future RISC-V
> configurations. The fix discussed with Mark, Atish, etc. has been to
> swap the priority of legacy and sysfs/json events so that the latter
> has priority. We need the revert of the revert as currently we only do
> this if a PMU is specified with an event, not for the general wildcard
> PMUs case that most people use. There was huge fallout from flipping

Yep makes sense to do the revert if RISC-V isn't going to support any 
legacy events. Although from what I understand that would technically 
only require JSON to be the highest priority? Because putting named 
events in sysfs still requires kernel involvement so doesn't get you any 
further than supporting the legacy events?

Seems like there is another reason to do the revert though as Mark 
mentioned: That now directly specifying the PMU eg "-e 
arm_cortex_a56/cycles/" opens a legacy event if the event matches one, 
which is not the best thing to do. But the revert fixes this AFAIK, so 
while having the priority JSON/legacy/sysfs might work for RISC-V it 
wouldn't work for a platform that wants a slightly different sysfs event 
than legacy but with the same name. And the priority should be 
JSON/sysfs/legacy.

> the priority particularly on Intel as all test expectations needed
> updating. I've sent out similar fixes that need incorporating when the
> revert is reverted. Ideally tools/perf/tests/parse-events.c would be
> updated to cover ARM's PMUs that don't follow the normal pattern that
> the core PMU is called "cpu" (this would mean that we were testing
> that event parsing on ARM was WAI wrt encoding priorities, BIG.little,
> etc).
> 
>> I sent a small fix the other day to make perf stat default arguments
>> work on Juno, and didn't notice anything out of the ordinary:
>> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
>> I agree that change is quite narrow but it does incrementally improve
>> things for the time being. It's possible that it would become redundant
>> if I can just include Ian's change to use strings for Perf stat.
> 
> I'd prefer we didn't merge this as we'd need to rebase:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> and those changes would then delete the code introduced. I'm fine with
> adding the tests.
> 
> There are more exotic heterogeneous core things upcoming, probably
> also from ARM, and the thought of duplicating the default attribute
> logic and event parsing constraints is just something I'd prefer not
> to have to do.
> 

Yep I don't have any strong feelings about this. Even if we don't merge 
it, it helped me understand the code and the issue a bit.

I think one thing I assumed about your change was that there was some 
dependency on these other changes. But the more I look at it I think 
it's actually fine on its own?

Using the cycles string actually works today, even on Apple M. The only 
real remaining issue is softening the error for failure to open, but 
that's _after_ doing the revert of the revert and is separate.

I will re-test that one today with fresh eyes.

>> Of course I only think I have a handle on the issue right now, seems
>> like it has a lot of moving parts and something else always comes up. If
>> I hit a wall at some point I will come back here.
> 
> Thanks,
> Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-16  9:22             ` James Clark
@ 2024-08-16 15:30               ` Ian Rogers
  2024-08-17  1:38                 ` Atish Kumar Patra
  2024-08-19 14:56                 ` James Clark
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Rogers @ 2024-08-16 15:30 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list, Atish Patra

On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 15/08/2024 6:29 pm, Ian Rogers wrote:
> > On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
> >> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> >>> On 01.08.24 21:05, Ian Rogers wrote:
> >>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> >>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> >>>>>
> >>>>> [TLDR: This mail is primarily relevant for Linux kernel regression
> >>>>> tracking. See link in footer if these mails annoy you.]
> >>>>>
> >>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
> >>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> >>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >>>>>
> >>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
> >>>>> sysfs/JSON
> >>>>> #regzbot ignore-activity
> >>>>
> >>>> Note, this is still broken.
> >>>
> >>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> >>> this? Or is this a "we are screwed one way or another and someone has to
> >>> bite the bullet" situation?
> >>>
> >>> Ciao, Thorsten
> >>>
> >>>> The patch changed the priority in the case
> >>>> that you do something like:
> >>>>
> >>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> >>>>
> >>>> but if you do:
> >>>>
> >>>> $ perf stat -e 'cycles' benchmark
> >>>>
> >>>> then the broken behavior will happen as legacy events have priority
> >>>> over sysfs/json events in that case. To fix this you need to revert:
> >>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> >>>> events over legacy"
> >>>>
> >>>> This causes some testing issues resolved in this unmerged patch series:
> >>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> >>>>
> >>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> >>>> and this PMU is present on Ampere systems. Reverting the commit above
> >>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> >>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> >>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> >>>> be opened by perf record and fail as the event won't support sampling.
> >>>>
> >>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >>>> fixes this by only opening the cycles event on core PMUs when choosing
> >>>> default events.
> >>>>
> >>>> Rather than take this patch the revert happened as Linus runs the
> >>>> command "perf record -e cycles:pp" (ie using a specified event and not
> >>>> defaults) and considers it a regression in the perf tool to need to
> >>>> do, on an Ampere system, "perf record -e
> >>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> >>>> will choose the cycles event correctly and with better precision than
> >>>> pp for systems that support it, but it was still considered a
> >>>> regression in the perf tool so the revert was made to happen. There is
> >>>> a lack of perf testing coverage for ARM, in particular as they choose
> >>>> to do everything in a different way to x86. The patch in question was
> >>>> in the linux-next tree for weeks without issues.
> >>>>
> >>>> ARM/Ampere could fix this by renaming the event from cycles to
> >>>> cpu_cycles, or by following Intel's convention that anything uncore
> >>>> uses the name clockticks rather than cycles. This could break people
> >>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
> >>>> are rare. There has been no progress I'm aware of on renaming the
> >>>> event.
> >>>>
> >>>> Making perf not terminate on opening an event for perf record seems
> >>>> like the most likely workaround as that is at least something under
> >>>> the tool maintainers control. ARM have discussed doing this on the
> >>>> lists:
> >>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> >>>> but since the revert in v6.10 no patches have appeared for the v6.11
> >>>> merge window. Feature work like coresight improvements and ARMv9 are
> >>>> being actively pursued by ARM, but feature work won't resolve this
> >>>> regression.
> >>>>
> >>
> >> I got some hardware with the DSU PMU so I'm going to have a go at trying
> >> to send some fixes for this. My initial idea was to try to incorporate the
> >> "not terminate on opening" change as discussed in the link directly
> >> above. And then do the revert of the "revert of prefer sysfs/json".
> >
> > Thanks, I think this would be good. The biggest issue is that none of
> > the record logic expects a file descriptor to be not opened, deleting
> > unopened evsels from the evlist breaks all the indexing into the
> > mmaps, etc. Tbh, you probably wouldn't do the code this way if it was
> > written afresh. Perhaps a hashmap would map from an evsel to ring
> > buffer mmaps, etc. Trying to avoid having global state and benefitting
> > from encapsulation. I'd focus on just doing the expedient thing in the
> > changes, which probably just means making the record code tolerant of
> > evsels that fail to open and not modifying the evlist due to the risk
> > it breaks the indices.
> >
>
> Thanks for the tips.
>
> > (To point out the obvious, this work wouldn't be necessary if arm_dsu
> > event were renamed from "cycles" to "cpu_cycles" which would also make
> > it more intention revealing alongside the arm_dsu's "bus_cycles" event
> > name).
> >
>
> I understand but I can imagine the following conversation if we rename that:
>
>    User: "I updated my kernel and now my (non Perf) tool fails to open
>           the DSU cycles event because it doesn't exist anymore"
>
>    Linus/maintainers: "Oh ok yes that was a userspace breaking change,
>                       let's revert it"
>
> Just because Perf can handle 3 different names for cycles doesn't mean
> other tools can.

cycles was a bad event name, and dsu is a terrible name for what is
mainly the L3 cache. I'm fine with the risk that users relying on the
two combined (arm_dsu/cycles/) get broken, as Neoverse users with
uncore permissions are much rarer than, say, Apple M users. Having a
cycles and a bus_cycles event is already ambiguous; they sound the
same. Renaming cycles to cpu_cycles would be best.

> >> FWIW I don't think Juno currently is broken if the kernel supports
> >> extended type ID? I could have missed some output in this thread but it
> >> seems like it's mostly related to Apple M hardware. I'm also a bit
> >> confused why the "supports extended type" check fails there, but maybe
> >> the v6.9 commit 25412c036 from Mark is missing?
> >
> > So I think your later emails clarify Arnaldo is probably missing:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
> >
> > Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
> > (iirc), this regression report, etc. My understanding is that Apple M
> > has something like a v2 ARM PMU and the legacy events are encoded
> > incorrectly in the driver for this. The regression in v6.5 happened
>
> I'm not sure about that. The M PMU events may be incomplete, but the two
> that are there have a mapping that looks sane:
>
>    static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
>         PERF_MAP_ALL_UNSUPPORTED,
>         [PERF_COUNT_HW_CPU_CYCLES]      = M1_PMU_PERFCTR_CPU_CYCLES,
>         [PERF_COUNT_HW_INSTRUCTIONS]    = M1_PMU_PERFCTR_INSTRUCTIONS,
>         /* No idea about the rest yet */
>    };
>
> And they map to the same named events:
>
>    static struct attribute *m1_pmu_event_attrs[] = {
>         M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
>         M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
>         NULL,
>    };
>
> So in this case I can't see using legacy vs sysfs events making a
> difference. Maybe there is some other case that was mentioned in a
> previous thread that I missed though.

No idea, iirc Mark Rutland requested not to use legacy events for Apple M.

> > because ARM's core PMUs had previously been treated as uncore PMUs,
> > meaning we wouldn't try to program legacy events on them. Fixing the
> > handling of ARM's core PMUs broke Apple M due to the broken legacy
> > event mappings. Why not fix the Apple M PMU driver? Well there was
> > anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
> > RISC-V PMU driver wants to delegate the mapping of legacy events to
> > the perf tool so the driver needn't be aware of all and future RISC-V
> > configurations. The fix discussed with Mark, Atish, etc. has been to
> > swap the priority of legacy and sysfs/json events so that the latter
> > has priority. We need the revert of the revert as currently we only do
> > this if a PMU is specified with an event, not for the general wildcard
> > PMUs case that most people use. There was huge fallout from flipping
>
> Yep makes sense to do the revert if RISC-V isn't going to support any
> legacy events. Although from what I understand that would technically
> only require JSON to be the highest priority? Because putting named
> events in sysfs still requires kernel involvement so doesn't get you any
> further than supporting the legacy events?

The sysfs and json event handling is interwoven, for example you can
add to a sysfs event with json information. There are basically two
approaches in the event parser, hardcoded legacy things and event
names (optionally with PMU names). I'm trying to get rid of the
hardcoded legacy things as they were fine when you had a single core
type, but I want to have events everywhere - say instructions and
cycles on a GPU so we can compute IPC on a GPU. For RISC-V as long as the
legacy events are covered as names in json and json/sysfs has priority
over legacy then things will be fine.

> Seems like there is another reason to do the revert though as Mark
> mentioned: That now directly specifying the PMU eg "-e
> arm_cortex_a56/cycles/" opens a legacy event if the event matches one,
> which is not the best thing to do. But the revert fixes this AFAIK, so
> while having the priority JSON/legacy/sysfs might work for RISC-V it
> wouldn't work for a platform that wants a slightly different sysfs event
> than legacy but with the same name. And the priority should be
> JSON/sysfs/legacy.

For events with a PMU specified, sysfs/json has priority over legacy
names, so I don't understand what you're saying here. Your example
shouldn't be broken. The revert is for the case where no PMU is
specified, where the priority is the opposite, which is at best
inconsistent.
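
To make the two orderings concrete, here is a minimal sketch. It is not
the perf event parser: the enum, the two name tables and
resolve_event() are hypothetical stand-ins, and the flag simply selects
which source wins when both know the name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sources an event name can resolve to; not perf's real types. */
enum event_source { SRC_NONE, SRC_LEGACY, SRC_SYSFS_JSON };

/* Hypothetical tables standing in for the legacy names and sysfs/JSON. */
static const char *const legacy_names[] = { "cycles", "instructions" };
static const char *const sysfs_json_names[] = {
	"cycles", "instructions", "bus_cycles",
};

static bool in_table(const char *const *table, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i], name) == 0)
			return true;
	return false;
}

/*
 * Resolve an event name. With prefer_sysfs_json set, sysfs/JSON wins
 * when both sources know the name; otherwise the legacy table wins.
 */
static enum event_source resolve_event(const char *name, bool prefer_sysfs_json)
{
	assert(name != NULL);

	bool legacy = in_table(legacy_names,
			       sizeof(legacy_names) / sizeof(legacy_names[0]), name);
	bool sysfs = in_table(sysfs_json_names,
			      sizeof(sysfs_json_names) / sizeof(sysfs_json_names[0]),
			      name);

	if (prefer_sysfs_json)
		return sysfs ? SRC_SYSFS_JSON : (legacy ? SRC_LEGACY : SRC_NONE);
	return legacy ? SRC_LEGACY : (sysfs ? SRC_SYSFS_JSON : SRC_NONE);
}
```

In this sketch the PMU-specified case corresponds to
prefer_sysfs_json = true and the no-PMU case to false, so the same name
"cycles" resolves differently depending on how it was written.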

> > the priority particularly on Intel as all test expectations needed
> > updating. I've sent out similar fixes that need incorporating when the
> > revert is reverted. Ideally tools/perf/tests/parse-events.c would be
> > updated to cover ARM's PMUs that don't follow the normal pattern that
> > the core PMU is called "cpu" (this would mean that we were testing
> > that event parsing on ARM was WAI wrt encoding priorities, BIG.little,
> > etc).
> >
> >> I sent a small fix the other day to make perf stat default arguments
> >> work on Juno, and didn't notice anything out of the ordinary:
> >> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> >> I agree that change is quite narrow but it does incrementally improve
> >> things for the time being. It's possible that it would become redundant
> >> if I can just include Ian's change to use strings for Perf stat.
> >
> > I'd prefer we didn't merge this as we'd need to rebase:
> > https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> > and those changes would then delete the code introduced. I'm fine with
> > adding the tests.
> >
> > There are more exotic heterogeneous core things upcoming, probably
> > also from ARM, and the thought of duplicating the default attribute
> > logic and event parsing constraints is just something I'd prefer not
> > to have to do.
> >
>
> Yep I don't have any strong feelings about this. Even if we don't merge
> it, it helped me understand the code and the issue a bit.
>
> I think one thing I assumed about your change was that there was some
> dependency on these other changes. But the more I look at it I think
> it's actually fine on its own?

Which change? If the change is trying to use "cycles" to open on all
PMUs because it will be wildcarded then it will run into the priority
issue.

> Using the cycles string actually works today, even on Apple M. The only
> real remaining issue is softening the error for failure to open, but
> that's _after_ doing the revert of the revert and is separate.
>
> I will re-test that one today with fresh eyes.

Perhaps it is other legacy events, not cycles and instructions. There
must have been a reason for this regression report but I don't have an
Apple M CPU to test on.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-16 15:30               ` Ian Rogers
@ 2024-08-17  1:38                 ` Atish Kumar Patra
  2024-08-20  8:58                   ` James Clark
  2024-08-19 14:56                 ` James Clark
  1 sibling, 1 reply; 53+ messages in thread
From: Atish Kumar Patra @ 2024-08-17  1:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, Thorsten Leemhuis, Arnaldo Carvalho de Melo,
	Mark Rutland, Linux perf Profiling, Linux Kernel Mailing List,
	James Clark, cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list

On Fri, Aug 16, 2024 at 8:30 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 15/08/2024 6:29 pm, Ian Rogers wrote:
> > > On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
> > >> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> > >>> On 01.08.24 21:05, Ian Rogers wrote:
> > >>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> > >>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> > >>>>>
> > >>>>> [TLDR: This mail is primarily relevant for Linux kernel regression
> > >>>>> tracking. See link in footer if these mails annoy you.]
> > >>>>>
> > >>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
> > >>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> > >>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > >>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > >>>>>
> > >>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
> > >>>>> sysfs/JSON
> > >>>>> #regzbot ignore-activity
> > >>>>
> > >>>> Note, this is still broken.
> > >>>
> > >>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> > >>> this? Or is this a "we are screwed one way or another and someone has to
> > >>> bite the bullet" situation?
> > >>>
> > >>> Ciao, Thorsten
> > >>>
> > >>>> The patch changed the priority in the case
> > >>>> that you do something like:
> > >>>>
> > >>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> > >>>>
> > >>>> but if you do:
> > >>>>
> > >>>> $ perf stat -e 'cycles' benchmark
> > >>>>
> > >>>> then the broken behavior will happen as legacy events have priority
> > >>>> over sysfs/json events in that case. To fix this you need to revert:
> > >>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> > >>>> events over legacy"
> > >>>>
> > >>>> This causes some testing issues resolved in this unmerged patch series:
> > >>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> > >>>>
> > >>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> > >>>> and this PMU is present on Ampere systems. Reverting the commit above
> > >>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> > >>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> > >>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> > >>>> be opened by perf record and fail as the event won't support sampling.
> > >>>>
> > >>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> > >>>> fixes this by only opening the cycles event on core PMUs when choosing
> > >>>> default events.
> > >>>>
> > >>>> Rather than take this patch the revert happened as Linus runs the
> > >>>> command "perf record -e cycles:pp" (ie using a specified event and not
> > >>>> defaults) and considers it a regression in the perf tool to need to
> > >>>> do, on an Ampere system, "perf record -e
> > >>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> > >>>> will choose the cycles event correctly and with better precision than
> > >>>> pp for systems that support it, but it was still considered a
> > >>>> regression in the perf tool so the revert was made to happen. There is
> > >>>> a lack of perf testing coverage for ARM, in particular as they choose
> > >>>> to do everything in a different way to x86. The patch in question was
> > >>>> in the linux-next tree for weeks without issues.
> > >>>>
> > >>>> ARM/Ampere could fix this by renaming the event from cycles to
> > >>>> cpu_cycles, or by following Intel's convention that anything uncore
> > >>>> uses the name clockticks rather than cycles. This could break people
> > >>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
> > >>>> are rare. There has been no progress I'm aware of on renaming the
> > >>>> event.
> > >>>>
> > >>>> Making perf not terminate on opening an event for perf record seems
> > >>>> like the most likely workaround as that is at least something under
> > >>>> the tool maintainers control. ARM have discussed doing this on the
> > >>>> lists:
> > >>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> > >>>> but since the revert in v6.10 no patches have appeared for the v6.11
> > >>>> merge window. Feature work like coresight improvements and ARMv9 are
> > >>>> being actively pursued by ARM, but feature work won't resolve this
> > >>>> regression.
> > >>>>
> > >>
> > >> I got some hardware with the DSU PMU so I'm going to have a go at trying
> > >> to send some fixes for this. My initial idea was to try to incorporate the
> > >> "not terminate on opening" change as discussed in the link directly
> > >> above. And then do the revert of the "revert of prefer sysfs/json".
> > >
> > > Thanks, I think this would be good. The biggest issue is that none of
> > > the record logic expects a file descriptor to be not opened, deleting
> > > unopened evsels from the evlist breaks all the indexing into the
> > > mmaps, etc. Tbh, you probably wouldn't do the code this way if it was
> > > written afresh. Perhaps a hashmap would map from an evsel to ring
> > > buffer mmaps, etc. Trying to avoid having global state and benefitting
> > > from encapsulation. I'd focus on just doing the expedient thing in the
> > > changes, which probably just means making the record code tolerant of
> > > evsels that fail to open and not modifying the evlist due to the risk
> > > it breaks the indices.
> > >
> >
> > Thanks for the tips.
> >
> > > (To point out the obvious, this work wouldn't be necessary if arm_dsu
> > > event were renamed from "cycles" to "cpu_cycles" which would also make
> > > it more intention revealing alongside the arm_dsu's "bus_cycles" event
> > > name).
> > >
> >
> > I understand but I can imagine the following conversation if we rename that:
> >
> >    User: "I updated my kernel and now my (non Perf) tool fails to open
> >           the DSU cycles event because it doesn't exist anymore"
> >
> >    Linus/maintainers: "Oh ok yes that was a userspace breaking change,
> >                       let's revert it"
> >
> > Just because Perf can handle 3 different names for cycles doesn't mean
> > other tools can.
>
> cycles was a bad event name, and dsu is a terrible name for what is
> mainly the L3 cache. I'm fine with the risk that users relying on the
> two combined (arm_dsu/cycles/) get broken, as Neoverse users with
> uncore permissions are much rarer than, say, Apple M users. Having a
> cycles and a bus_cycles event is already ambiguous; they sound the
> same. Renaming cycles to cpu_cycles would be best.
>
> > >> FWIW I don't think Juno currently is broken if the kernel supports
> > >> extended type ID? I could have missed some output in this thread but it
> > >> seems like it's mostly related to Apple M hardware. I'm also a bit
> > >> confused why the "supports extended type" check fails there, but maybe
> > >> the v6.9 commit 25412c036 from Mark is missing?
> > >
> > > So I think your later emails clarify Arnaldo is probably missing:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
> > >
> > > Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
> > > (iirc), this regression report, etc. My understanding is that Apple M
> > > has something like a v2 ARM PMU and the legacy events are encoded
> > > incorrectly in the driver for this. The regression in v6.5 happened
> >
> > I'm not sure about that. The M PMU events may be incomplete, but the two
> > that are there have a mapping that looks sane:
> >
> >    static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
> >         PERF_MAP_ALL_UNSUPPORTED,
> >         [PERF_COUNT_HW_CPU_CYCLES]      = M1_PMU_PERFCTR_CPU_CYCLES,
> >         [PERF_COUNT_HW_INSTRUCTIONS]    = M1_PMU_PERFCTR_INSTRUCTIONS,
> >         /* No idea about the rest yet */
> >    };
> >
> > And they map to the same named events:
> >
> >    static struct attribute *m1_pmu_event_attrs[] = {
> >         M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
> >         M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
> >         NULL,
> >    };
> >
> > So in this case I can't see using legacy vs sysfs events making a
> > difference. Maybe there is some other case that was mentioned in a
> > previous thread that I missed though.
>
> No idea, iirc Mark Rutland requested not to use legacy events for Apple M.
>
> > > because ARM's core PMUs had previously been treated as uncore PMUs,
> > > meaning we wouldn't try to program legacy events on them. Fixing the
> > > handling of ARM's core PMUs broke Apple M due to the broken legacy
> > > event mappings. Why not fix the Apple M PMU driver? Well there was
> > > anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
> > > RISC-V PMU driver wants to delegate the mapping of legacy events to
> > > the perf tool so the driver needn't be aware of all and future RISC-V
> > > configurations. The fix discussed with Mark, Atish, etc. has been to
> > > swap the priority of legacy and sysfs/json events so that the latter
> > > has priority. We need the revert of the revert as currently we only do
> > > this if a PMU is specified with an event, not for the general wildcard
> > > PMUs case that most people use. There was huge fallout from flipping
> >
> > Yep makes sense to do the revert if RISC-V isn't going to support any
> > legacy events. Although from what I understand that would technically
> > only require JSON to be the highest priority? Because putting named
> > events in sysfs still requires kernel involvement so doesn't get you any
> > further than supporting the legacy events?
>
> The sysfs and json event handling is interwoven, for example you can
> add to a sysfs event with json information. There are basically two
> approaches in the event parser, hardcoded legacy things and event
> names (optionally with PMU names). I'm trying to get rid of the
> hardcoded legacy things as they were fine when you had a single core
> type, but I want to have events everywhere - say instructions and
> cycles on a GPU so we can IPC on a GPU. For RISC-V as long as the
> legacy events are covered as names in json and json/sysfs has priority
> over legacy then things will be fine.
>

RISC-V does want to support legacy events as that's how users on other
architectures are used to running perf. It would be weird if we didn't
support them.

Our initial reasoning behind relying on json for legacy events was to
avoid vendor-specific encodings for these events in the driver. Unlike
other ISAs, the RISC-V ISA doesn't define an event encoding for these
legacy events. As a result, every platform vendor will have a custom
encoding. Managing them in the driver is cumbersome. Many thanks to Ian
for posting the patches to reverse the priority, which works fine for
RISC-V.

However, I understand that it is easier said than done and some use
cases are broken. We also discovered that there are a few other use
cases which still have the same problem even if we solve the bigger
problem via json parsing for legacy events.

1. Any other user profiling application that invokes perf system calls
directly may also try to just use legacy event attributes in
perf_event_attr. The Android simpleperf application also falls in this
category. We need to describe the platform-specific encoding somewhere
for these applications.

2. Perf running inside guests may run on any hardware and can't be tied
to a platform-specific json file. If we bind the legacy events via a
json file, those users won't be able to use the perf cycles or
instructions events without the json file available.
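
Case 1 above can be sketched as follows. This is only a minimal
illustration, assuming a hypothetical tool that fills in perf_event_attr
itself; the helper names make_legacy_cycles_attr() and open_on_self()
are made up for the example. The attr carries only the generic
PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES pair, so no json file in
the perf tool is involved and the kernel driver has to supply the
platform encoding:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build the legacy "cycles" attr a non-perf tool might use. */
static struct perf_event_attr make_legacy_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;         /* legacy, PMU-agnostic type */
	attr.config = PERF_COUNT_HW_CPU_CYCLES; /* generic ID, no vendor encoding */
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;
	return attr;
}

/*
 * Hypothetical helper: open the event on the calling thread (pid 0),
 * on any CPU (cpu -1), with no group leader and no flags.
 * Returns a file descriptor, or -1 with errno set. glibc provides no
 * wrapper for perf_event_open, so syscall() is used directly.
 */
static int open_on_self(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}
```

Whether open_on_self() succeeds depends on the kernel, the PMU driver
and perf_event_paranoid, so a caller should treat -1 as an expected
outcome rather than a fatal error.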

I don't have any good solutions for the above problems without
specifying the encoding in the driver itself. Given all the problems
around json parsing for legacy events, we are thinking of biting the
bullet and allowing platform vendors to encode the legacy events in the
driver itself, similar to other ISAs. We will try to keep the interface
as scalable as possible.

Any suggestions?

> > Seems like there is another reason to do the revert though as Mark
> > mentioned: That now directly specifying the PMU eg "-e
> > arm_cortex_a56/cycles/" opens a legacy event if the event matches one,
> > which is not the best thing to do. But the revert fixes this AFAIK, so
> > while having the priority JSON/legacy/sysfs might work for RISC-V it
> > wouldn't work for a platform that wants a slightly different sysfs event
> > than legacy but with the same name. And the priority should be
> > JSON/sysfs/legacy.
>
> The priority for events with a PMU is the sysfs/json has a priority
> over legacy names, so I don't understand what you're saying here. Your
> example shouldn't be broken. The revert is for the case where no PMU
> is specified, where the priority is the opposite which is at best
> inconsistent.
>
> > > the priority particularly on Intel as all test expectations needed
> > > updating. I've sent out similar fixes that need incorporating when the
> > > revert is reverted. Ideally tools/perf/tests/parse-events.c would be
> > > updated to cover ARM's PMUs that don't follow the normal pattern that
> > > the core PMU is called "cpu" (this would mean that we were testing
> > > event parsing on ARM was WAI wrt encoding priorities, BIG.little,
> > > etc).
> > >
> > >> I sent a small fix the other day to make perf stat default arguments
> > >> work on Juno, and didn't notice anything out of the ordinary:
> > >> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> > >> I agree that change is quite narrow but it does incrementally improve
> > >> things for the time being. It's possible that it would become redundant
> > >> if I can just include Ian's change to use strings for Perf stat.
> > >
> > > I'd prefer we didn't merge this as we'd need to rebase:
> > > https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> > > and those changes would then delete the code introduced. I'm fine with
> > > adding the tests.
> > >
> > > There are more exotic heterogeneous core things upcoming, probably
> > > also from ARM, and the thought of duplicating the default attribute
> > > logic and event parsing constraints is just something I'd prefer not
> > > to have to do.
> > >
> >
> > Yep I don't have any strong feelings about this. Even if we don't merge
> > it it helped me understand the code and the issue a bit.
> >
> > I think one thing I assumed about your change was that there was some
> > dependency on these other changes. But the more I look at it I think
> > it's actually fine on it's own?
>
> Which change? If the change is trying to use "cycles" to open on all
> PMUs because it will be wild carded then it will run into the priority
> issue.
>
> > Using the cycles string actually works today, even on Apple M. The only
> > real remaining issue is softening the error for failure to open, but
> > that's _after_ doing the revert of the revert and is separate.
> >
> > I will re-test that one today with fresh eyes.
>
> Perhaps it is other legacy events, not cycles and instructions. There
> must have been a reason for this regression report but I don't have an
> Apple M CPU to test on.
>
> Thanks,
> Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-16 15:30               ` Ian Rogers
  2024-08-17  1:38                 ` Atish Kumar Patra
@ 2024-08-19 14:56                 ` James Clark
  2024-08-19 15:44                   ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: James Clark @ 2024-08-19 14:56 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list, Atish Patra



On 16/08/2024 4:30 pm, Ian Rogers wrote:
> On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 15/08/2024 6:29 pm, Ian Rogers wrote:
>>> On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
>>>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
>>>>> On 01.08.24 21:05, Ian Rogers wrote:
>>>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>>>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>>>
>>>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
>>>>>>> tracking. See link in footer if these mails annoy you.]
>>>>>>>
>>>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>>>
>>>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>>>>>>> sysfs/JSON
>>>>>>> #regzbot ignore-activity
>>>>>>
>>>>>> Note, this is still broken.
>>>>>
>>>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
>>>>> this? Or is this a "we are screwed one way or another and someone has to
>>>>> bite the bullet" situation?
>>>>>
>>>>> Ciao, Thorsten
>>>>>
>>>>>> The patch changed the priority in the case
>>>>>> that you do something like:
>>>>>>
>>>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>>>>>
>>>>>> but if you do:
>>>>>>
>>>>>> $ perf stat -e 'cycles' benchmark
>>>>>>
>>>>>> then the broken behavior will happen as legacy events have priority
>>>>>> over sysfs/json events in that case. To fix this you need to revert:
>>>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>>>>>> events over legacy"
>>>>>>
>>>>>> This causes some testing issues resolved in this unmerged patch series:
>>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>>>>>
>>>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>>>>>> and this PMU is present on Ampere systems. Reverting the commit above
>>>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>>>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>>>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>>>>>> be opened by perf record and fail as the event won't support sampling.
>>>>>>
>>>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>>>> fixes this by only opening the cycles event on core PMUs when choosing
>>>>>> default events.
>>>>>>
>>>>>> Rather than take this patch the revert happened as Linus runs the
>>>>>> command "perf record -e cycles:pp" (ie using a specified event and not
>>>>>> defaults) and considers it a regression in the perf tool that on an
>>>>>> Ampere system to need to do "perf record -e
>>>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>>>>>> will choose the cycles event correctly and with better precision the
>>>>>> pp for systems that support it, but it was still considered a
>>>>>> regression in the perf tool so the revert was made to happen. There is
>>>>>> a lack of perf testing coverage for ARM, in particular as they choose
>>>>>> to do everything in a different way to x86. The patch in question was
>>>>>> in the linux-next tree for weeks without issues.
>>>>>>
>>>>>> ARM/Ampere could fix this by renaming the event from cycles to
>>>>>> cpu_cycles, or by following Intel's convention that anything uncore
>>>>>> uses the name clockticks rather than cycles. This could break people
>>>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>>>>>> are rare. There has been no progress I'm aware of on renaming the
>>>>>> event.
>>>>>>
>>>>>> Making perf not terminate on opening an event for perf record seems
>>>>>> like the most likely workaround as that is at least something under
>>>>>> the tool maintainers control. ARM have discussed doing this on the
>>>>>> lists:
>>>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>>>>>> but since the revert in v6.10 no patches have appeared for the v6.11
>>>>>> merge window. Feature work like coresight improvements and ARMv9 are
>>>>>> being actively pursued by ARM, but feature work won't resolve this
>>>>>> regression.
>>>>>>
>>>>
>>>> I got some hardware with the DSU PMU so I'm going to have a go at trying
>>>> to send some fixes for this. My initial idea was to try incorporate the
>>>> "not terminate on opening" change as discussed in the link directly
>>>> above. And then do the revert of the "revert of prefer sysfs/json".
>>>
>>> Thanks, I think this would be good. The biggest issue is that none of
>>> the record logic expects a file descriptor to be not opened, deleting
>>> unopened evsels from the evlist breaks all the indexing into the
>>> mmaps, etc. Tbh, you probably wouldn't do the code this way if was
>>> written afresh. Perhaps a hashmap would map from an evsel to ring
>>> buffer mmaps, etc. Trying to avoid having global state and benefitting
>>> from encapsulation. I'd focus on just doing the expedient thing in the
>>> changes, which probably just means making the record code tolerant of
>>> evsels that fail to open and not modifying the evlist due to the risk
>>> it breaks the indices.
>>>
>>
>> Thanks for the tips.
>>
>>> (To point out the obvious, this work wouldn't be necessary if arm_dsu
>>> event were renamed from "cycles" to "cpu_cycles" which would also make
>>> it more intention revealing alongside the arm_dsu's "bus_cycles" event
>>> name).
>>>
>>
>> I understand but I can imagine the following conversation if we rename that:
>>
>>     User: "I updated my kernel and now my (non Perf) tool fails to open
>>            the DSU cycles event because it doesn't exist anymore"
>>
>>     Linus/maintainers: "Oh ok yes that was a userspace breaking change,
>>                        lets revert it"
>>
>> Just because Perf can handle 3 different names for cycles doesn't mean
>> other tools can.
> 
> cycles was a bad event name, dsu is a terrible name for what is mainly
> the l3 cache, the risk that the two are combined get broken I'm fine
> with as neoverse users with uncore permissions are say much rarer than
> Apple M users. Having a cycles and a bus_cycles event is already
> ambiguous, they sound the same. Renaming cycles to cpu_cycles would be
> best.
> 
>>>> FWIW I don't think Juno currently is broken if the kernel supports
>>>> extended type ID? I could have missed some output in this thread but it
>>>> seems like it's mostly related to Apple M hardware. I'm also a bit
>>>> confused why the "supports extended type" check fails there, but maybe
>>>> the v6.9 commit 25412c036 from Mark is missing?
>>>
>>> So I think your later emails clarify Arnaldo is probably missing:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
>>>
>>> Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
>>> (iirc), this regression report, etc. My understanding is that Apple M
>>> has something like a v2 ARM PMU and the legacy events are encoded
>>> incorrectly in the driver for this. The regression in v6.5 happened
>>
>> I'm not sure about that. The M PMU events may be incomplete, but the two
>> that are there have a mapping that looks sane:
>>
>>     static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
>>          PERF_MAP_ALL_UNSUPPORTED,
>>          [PERF_COUNT_HW_CPU_CYCLES]      = M1_PMU_PERFCTR_CPU_CYCLES,
>>          [PERF_COUNT_HW_INSTRUCTIONS]    = M1_PMU_PERFCTR_INSTRUCTIONS,
>>          /* No idea about the rest yet */
>>     };
>>
>> And they map to the same named events:
>>
>>     static struct attribute *m1_pmu_event_attrs[] = {
>>          M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
>>          M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
>>          NULL,
>>     };
>>
>> So in this case I can't see using legacy vs sysfs events making a
>> difference. Maybe there is some other case that was mentioned in a
>> previous thread that I missed though.
> 
> No idea, iirc Mark Rutland requested not to use legacy events for Apple M.
> 

The point I was trying to make here was that there isn't _technically_ 
any user facing bug on Apple M with both a new kernel and new perf, 
despite the issues Mark mentioned.

I think there's a bit more subtlety in Mark's request. Using sysfs is
only required for old kernels that don't support extended type ID, and
it's not specific to Apple M, that's for everywhere. The other case he
mentioned was when the events are slightly different but with the same
name as legacy, which isn't the case here specifically but is already
fixed by a24d9d9dc ("perf parse-events: Make legacy events lower
priority than sysfs/JSON") (v6.8).
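
For reference, the "extended type ID" mentioned above is the
perf_event_attr encoding in which a legacy hardware event carries the
target PMU's type in the upper 32 bits of attr.config. A minimal sketch,
assuming the PMU type has already been read from
/sys/bus/event_source/devices/<pmu>/type; the helper names are
hypothetical and not taken from the perf tool:

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* Fallbacks for older UAPI headers; values as in include/uapi/linux/perf_event.h. */
#ifndef PERF_PMU_TYPE_SHIFT
#define PERF_PMU_TYPE_SHIFT 32
#endif
#ifndef PERF_HW_EVENT_MASK
#define PERF_HW_EVENT_MASK 0xffffffff
#endif

/* Hypothetical helper: combine a PMU type with a legacy PERF_COUNT_HW_* ID. */
static uint64_t extended_hw_config(uint32_t pmu_type, uint64_t hw_id)
{
	return ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) |
	       (hw_id & PERF_HW_EVENT_MASK);
}

/* Hypothetical helper: recover the PMU type from an extended config value. */
static uint32_t extended_hw_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> PERF_PMU_TYPE_SHIFT);
}
```

The result is used as attr.config together with attr.type =
PERF_TYPE_HARDWARE. On kernels where the PMU does not advertise
PERF_PMU_CAP_EXTENDED_HW_TYPE this encoding is not available, which is
the case where the sysfs event has to be used instead.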

>>> because ARM's core PMUs had previously been treated as uncore PMUs,
>>> meaning we wouldn't try to program legacy events on them. Fixing the
>>> handling of ARM's core PMUs broke Apple M due to the broken legacy
>>> event mappings. Why not fix the Apple M PMU driver? Well there was
>>> anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
>>> RISC-V PMU driver wants to delegate the mapping of legacy events to
>>> the perf tool so the driver needn't be aware of all and future RISC-V
>>> configurations. The fix discussed with Mark, Atish, etc. has been to
>>> swap the priority of legacy and sysfs/json events so that the latter
>>> has priority. We need the revert of the revert as currently we only do
>>> this if a PMU is specified with an event, not for the general wildcard
>>> PMUs case that most people use. There was huge fallout from flipping
>>
>> Yep makes sense to do the revert if RISC-V isn't going to support any
>> legacy events. Although from what I understand that would technically
>> only require JSON to be the highest priority? Because putting named
>> events in sysfs still requires kernel involvement so doesn't get you any
>> further than supporting the legacy events?
> 
> The sysfs and json event handling is interwoven, for example you can
> add to a sysfs event with json information. There are basically two
> approaches in the event parser, hardcoded legacy things and event
> names (optionally with PMU names). I'm trying to get rid of the
> hardcoded legacy things as they were fine when you had a single core
> type, but I want to have events everywhere - say instructions and
> cycles on a GPU so we can IPC on a GPU. For RISC-V as long as the
> legacy events are covered as names in json and json/sysfs has priority
> over legacy then things will be fine.
> 
>> Seems like there is another reason to do the revert though as Mark
>> mentioned: That now directly specifying the PMU eg "-e
>> arm_cortex_a56/cycles/" opens a legacy event if the event matches one,
>> which is not the best thing to do. But the revert fixes this AFAIK, so
>> while having the priority JSON/legacy/sysfs might work for RISC-V it
>> wouldn't work for a platform that wants a slightly different sysfs event
>> than legacy but with the same name. And the priority should be
>> JSON/sysfs/legacy.
> 
> The priority for events with a PMU is the sysfs/json has a priority
> over legacy names, so I don't understand what you're saying here. Your
> example shouldn't be broken. The revert is for the case where no PMU
> is specified, where the priority is the opposite which is at best
> inconsistent.
> 

Yep you're right, I got confused with the original bug report which is 
now old. With commit a24d9d9dc ("perf parse-events: Make legacy events 
lower priority than sysfs/JSON") (v6.8) named PMUs do prioritize sysfs.
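
To spell out the two orderings being discussed, here is a toy model. It
is not the perf tool's parser; the enum and function names are made up,
and it only encodes the behaviour described in this thread (sysfs/JSON
first when a PMU is named, legacy first for a bare event name while the
revert is in place):

```c
#include <stdbool.h>

enum event_source { SRC_NONE, SRC_SYSFS_JSON, SRC_LEGACY };

/*
 * Toy model: decide which definition of an event name wins.
 *   pmu_named      - the user wrote "pmu/event/" rather than just "event"
 *   in_sysfs_json  - some PMU exposes the name via sysfs or JSON
 *   is_legacy_name - the name is one of the hardcoded legacy events
 */
static enum event_source pick_source(bool pmu_named, bool in_sysfs_json,
				     bool is_legacy_name)
{
	if (pmu_named) {
		/* Named PMU: sysfs/JSON has priority over legacy. */
		if (in_sysfs_json)
			return SRC_SYSFS_JSON;
		return is_legacy_name ? SRC_LEGACY : SRC_NONE;
	}
	/* Bare event name: legacy has priority over sysfs/JSON. */
	if (is_legacy_name)
		return SRC_LEGACY;
	return in_sysfs_json ? SRC_SYSFS_JSON : SRC_NONE;
}
```

With "cycles" present in both places, pick_source(true, true, true)
yields SRC_SYSFS_JSON while pick_source(false, true, true) yields
SRC_LEGACY, which is the inconsistency Ian points out.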

>>> the priority particularly on Intel as all test expectations needed
>>> updating. I've sent out similar fixes that need incorporating when the
>>> revert is reverted. Ideally tools/perf/tests/parse-events.c would be
>>> updated to cover ARM's PMUs that don't follow the normal pattern that
>>> the core PMU is called "cpu" (this would mean that we were testing
>>> event parsing on ARM was WAI wrt encoding priorities, BIG.little,
>>> etc).
>>>
>>>> I sent a small fix the other day to make perf stat default arguments
>>>> work on Juno, and didn't notice anything out of the ordinary:
>>>> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
>>>> I agree that change is quite narrow but it does incrementally improve
>>>> things for the time being. It's possible that it would become redundant
>>>> if I can just include Ian's change to use strings for Perf stat.
>>>
>>> I'd prefer we didn't merge this as we'd need to rebase:
>>> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
>>> and those changes would then delete the code introduced. I'm fine with
>>> adding the tests.
>>>
>>> There are more exotic heterogeneous core things upcoming, probably
>>> also from ARM, and the thought of duplicating the default attribute
>>> logic and event parsing constraints is just something I'd prefer not
>>> to have to do.
>>>
>>
>> Yep I don't have any strong feelings about this. Even if we don't merge
>> it it helped me understand the code and the issue a bit.
>>
>> I think one thing I assumed about your change was that there was some
>> dependency on these other changes. But the more I look at it I think
>> it's actually fine on it's own?
> 
> Which change? If the change is trying to use "cycles" to open on all
> PMUs because it will be wild carded then it will run into the priority
> issue.
> 

Just patch 3 here: 
https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/

I assume it works because we don't open on uncore right now. But I'm 
still rebasing and testing it. So we could merge that, and then when we 
do the priority revert along with the fix to ignore the DSU error it 
will continue to work.

>> Using the cycles string actually works today, even on Apple M. The only
>> real remaining issue is softening the error for failure to open, but
>> that's _after_ doing the revert of the revert and is separate.
>>
>> I will re-test that one today with fresh eyes.
> 
> Perhaps it is other legacy events, not cycles and instructions. There
> must have been a reason for this regression report but I don't have an
> Apple M CPU to test on.
> 

This regression report is for various (admittedly extremely confusing)
combinations of kernel and perf versions without the following patches:

5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
    (v6.6 kernel release)

25412c036 ("perf print-events: make is_event_supported() more robust")
    (v6.9 Perf release for Apple M)

a24d9d9dc ("perf parse-events: Make legacy events lower priority than
             sysfs/JSON")
    (v6.8 Perf)

With all of those applied everything is fixed even on Apple M. I don't
think anything needs to be fixed for the bare "-e cycles" that you
mentioned at the beginning of the chain because that never regressed: it
actually never worked on big.LITTLE until 5c81672865, and after that
using legacy was fine. I don't think Mark actually wants bare "cycles"
to _not_ use legacy either because it never did. He only mentioned what 
happens when you really do want to target a PMU with a name (already 
fixed in a24d9d9dc).

> Thanks,
> Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-19 14:56                 ` James Clark
@ 2024-08-19 15:44                   ` Ian Rogers
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2024-08-19 15:44 UTC (permalink / raw)
  To: James Clark
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list, Atish Patra

On Mon, Aug 19, 2024 at 7:56 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 16/08/2024 4:30 pm, Ian Rogers wrote:
> > On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 15/08/2024 6:29 pm, Ian Rogers wrote:
> >>> On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
> >>>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> >>>>> On 01.08.24 21:05, Ian Rogers wrote:
> >>>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> >>>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> >>>>>>>
> >>>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
> >>>>>>> tracking. See link in footer if these mails annoy you.]
> >>>>>>>
> >>>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
> >>>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >>>>>>>
> >>>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
> >>>>>>> sysfs/JSON
> >>>>>>> #regzbot ignore-activity
> >>>>>>
> >>>>>> Note, this is still broken.
> >>>>>
> >>>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> >>>>> this? Or is this a "we are screwed one way or another and someone has to
> >>>>> bite the bullet" situation?
> >>>>>
> >>>>> Ciao, Thorsten
> >>>>>
> >>>>>> The patch changed the priority in the case
> >>>>>> that you do something like:
> >>>>>>
> >>>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> >>>>>>
> >>>>>> but if you do:
> >>>>>>
> >>>>>> $ perf stat -e 'cycles' benchmark
> >>>>>>
> >>>>>> then the broken behavior will happen as legacy events have priority
> >>>>>> over sysfs/json events in that case. To fix this you need to revert:
> >>>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> >>>>>> events over legacy"
> >>>>>>
> >>>>>> This causes some testing issues resolved in this unmerged patch series:
> >>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> >>>>>>
> >>>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> >>>>>> and this PMU is present on Ampere systems. Reverting the commit above
> >>>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> >>>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> >>>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> >>>>>> be opened by perf record and fail as the event won't support sampling.
> >>>>>>
> >>>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >>>>>> fixes this by only opening the cycles event on core PMUs when choosing
> >>>>>> default events.
> >>>>>>
> >>>>>> Rather than take this patch the revert happened as Linus runs the
> >>>>>> command "perf record -e cycles:pp" (ie using a specified event and not
> >>>>>> defaults) and considers it a regression in the perf tool that on an
> >>>>>> Ampere system to need to do "perf record -e
> >>>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> >>>>>> will choose the cycles event correctly and with better precision the
> >>>>>> pp for systems that support it, but it was still considered a
> >>>>>> regression in the perf tool so the revert was made to happen. There is
> >>>>>> a lack of perf testing coverage for ARM, in particular as they choose
> >>>>>> to do everything in a different way to x86. The patch in question was
> >>>>>> in the linux-next tree for weeks without issues.
> >>>>>>
> >>>>>> ARM/Ampere could fix this by renaming the event from cycles to
> >>>>>> cpu_cycles, or by following Intel's convention that anything uncore
> >>>>>> uses the name clockticks rather than cycles. This could break people
> >>>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
> >>>>>> are rare. There has been no progress I'm aware of on renaming the
> >>>>>> event.
> >>>>>>
> >>>>>> Making perf not terminate on opening an event for perf record seems
> >>>>>> like the most likely workaround as that is at least something under
> >>>>>> the tool maintainers control. ARM have discussed doing this on the
> >>>>>> lists:
> >>>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> >>>>>> but since the revert in v6.10 no patches have appeared for the v6.11
> >>>>>> merge window. Feature work like coresight improvements and ARMv9 are
> >>>>>> being actively pursued by ARM, but feature work won't resolve this
> >>>>>> regression.
> >>>>>>
> >>>>
> >>>> I got some hardware with the DSU PMU so I'm going to have a go at trying
> >>>> to send some fixes for this. My initial idea was to try incorporate the
> >>>> "not terminate on opening" change as discussed in the link directly
> >>>> above. And then do the revert of the "revert of prefer sysfs/json".
> >>>
> >>> Thanks, I think this would be good. The biggest issue is that none of
> >>> the record logic expects a file descriptor to be not opened, deleting
> >>> unopened evsels from the evlist breaks all the indexing into the
> >>> mmaps, etc. Tbh, you probably wouldn't do the code this way if was
> >>> written afresh. Perhaps a hashmap would map from an evsel to ring
> >>> buffer mmaps, etc. Trying to avoid having global state and benefitting
> >>> from encapsulation. I'd focus on just doing the expedient thing in the
> >>> changes, which probably just means making the record code tolerant of
> >>> evsels that fail to open and not modifying the evlist due to the risk
> >>> it breaks the indices.
> >>>
> >>
> >> Thanks for the tips.
> >>
> >>> (To point out the obvious, this work wouldn't be necessary if arm_dsu
> >>> event were renamed from "cycles" to "cpu_cycles" which would also make
> >>> it more intention revealing alongside the arm_dsu's "bus_cycles" event
> >>> name).
> >>>
> >>
> >> I understand but I can imagine the following conversation if we rename that:
> >>
> >>     User: "I updated my kernel and now my (non Perf) tool fails to open
> >>            the DSU cycles event because it doesn't exist anymore"
> >>
> >>     Linus/maintainers: "Oh ok yes that was a userspace breaking change,
> >>                        lets revert it"
> >>
> >> Just because Perf can handle 3 different names for cycles doesn't mean
> >> other tools can.
> >
> > cycles was a bad event name, dsu is a terrible name for what is mainly
> > the l3 cache, the risk that the two are combined get broken I'm fine
> > with as neoverse users with uncore permissions are say much rarer than
> > Apple M users. Having a cycles and a bus_cycles event is already
> > ambiguous, they sound the same. Renaming cycles to cpu_cycles would be
> > best.
> >
> >>>> FWIW I don't think Juno currently is broken if the kernel supports
> >>>> extended type ID? I could have missed some output in this thread but it
> >>>> seems like it's mostly related to Apple M hardware. I'm also a bit
> >>>> confused why the "supports extended type" check fails there, but maybe
> >>>> the v6.9 commit 25412c036 from Mark is missing?
> >>>
> >>> So I think your later emails clarify Arnaldo is probably missing:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
> >>>
> >>> Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
> >>> (iirc), this regression report, etc. My understanding is that Apple M
> >>> has something like a v2 ARM PMU and the legacy events are encoded
> >>> incorrectly in the driver for this. The regression in v6.5 happened
> >>
> >> I'm not sure about that. The M PMU events may be incomplete, but the two
> >> that are there have a mapping that looks sane:
> >>
> >>     static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
> >>          PERF_MAP_ALL_UNSUPPORTED,
> >>          [PERF_COUNT_HW_CPU_CYCLES]      = M1_PMU_PERFCTR_CPU_CYCLES,
> >>          [PERF_COUNT_HW_INSTRUCTIONS]    = M1_PMU_PERFCTR_INSTRUCTIONS,
> >>          /* No idea about the rest yet */
> >>     };
> >>
> >> And they map to the same named events:
> >>
> >>     static struct attribute *m1_pmu_event_attrs[] = {
> >>          M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
> >>          M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
> >>          NULL,
> >>     };
> >>
> >> So in this case I can't see using legacy vs sysfs events making a
> >> difference. Maybe there is some other case that was mentioned in a
> >> previous thread that I missed though.
> >
> > No idea, iirc Mark Rutland requested not to use legacy events for Apple M.
> >
>
> The point I was trying to make here was that there isn't _technically_
> any user facing bug on Apple M with both a new kernel and new perf,
> despite the issues Mark mentioned.
>
> I think there's a bit more subtlety in Mark's request. Using sysfs is
> only required for old kernels that don't support extended type ID, and
> it's not specific to apple M, that's for everywhere. The other case he
> mentioned was when the events are slightly different but with the same
> name as legacy, which isn't the case here specifically but is already
> fixed by  ("perf parse-events: Make legacy events lower priority than
> sysfs/JSON") (v6.8).
>
> >>> because ARM's core PMUs had previously been treated as uncore PMUs,
> >>> meaning we wouldn't try to program legacy events on them. Fixing the
> >>> handling of ARM's core PMUs broke Apple M due to the broken legacy
> >>> event mappings. Why not fix the Apple M PMU driver? Well there was
> >>> anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
> >>> RISC-V PMU driver wants to delegate the mapping of legacy events to
> >>> the perf tool so the driver needn't be aware of all and future RISC-V
> >>> configurations. The fix discussed with Mark, Atish, etc. has been to
> >>> swap the priority of legacy and sysfs/json events so that the latter
> >>> has priority. We need the revert of the revert as currently we only do
> >>> this if a PMU is specified with an event, not for the general wildcard
> >>> PMUs case that most people use. There was huge fallout from flipping
> >>
> >> Yep makes sense to do the revert if RISC-V isn't going to support any
> >> legacy events. Although from what I understand that would technically
> >> only require JSON to be the highest priority? Because putting named
> >> events in sysfs still requires kernel involvement so doesn't get you any
> >> further than supporting the legacy events?
> >
> > The sysfs and json event handling is interwoven, for example you can
> > add to a sysfs event with json information. There are basically two
> > approaches in the event parser, hardcoded legacy things and event
> > names (optionally with PMU names). I'm trying to get rid of the
> > hardcoded legacy things as they were fine when you had a single core
> > type, but I want to have events everywhere - say instructions and
> > cycles on a GPU so we can IPC on a GPU. For RISC-V as long as the
> > legacy events are covered as names in json and json/sysfs has priority
> > over legacy then things will be fine.
> >
> >> Seems like there is another reason to do the revert though as Mark
> >> mentioned: That now directly specifying the PMU eg "-e
> >> arm_cortex_a56/cycles/" opens a legacy event if the event matches one,
> >> which is not the best thing to do. But the revert fixes this AFAIK, so
> >> while having the priority JSON/legacy/sysfs might work for RISC-V it
> >> wouldn't work for a platform that wants a slightly different sysfs event
> >> than legacy but with the same name. And the priority should be
> >> JSON/sysfs/legacy.
> >
> > The priority for events with a PMU is the sysfs/json has a priority
> > over legacy names, so I don't understand what you're saying here. Your
> > example shouldn't be broken. The revert is for the case where no PMU
> > is specified, where the priority is the opposite which is at best
> > inconsistent.
> >
>
> Yep you're right, I got confused with the original bug report which is
> now old. With commit a24d9d9dc ("perf parse-events: Make legacy events
> lower priority than sysfs/JSON") (v6.8) named PMUs do prioritize sysfs.
>
> >>> the priority particularly on Intel as all test expectations needed
> >>> updating. I've sent out similar fixes that need incorporating when the
> >>> revert is reverted. Ideally tools/perf/tests/parse-events.c would be
> >>> updated to cover ARM's PMUs that don't follow the normal pattern that
> >>> the core PMU is called "cpu" (this would mean that we were testing
> >>> event parsing on ARM was WAI wrt encoding priorities, BIG.little,
> >>> etc).
> >>>
> >>>> I sent a small fix the other day to make perf stat default arguments
> >>>> work on Juno, and didn't notice anything out of the ordinary:
> >>>> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> >>>> I agree that change is quite narrow but it does incrementally improve
> >>>> things for the time being. It's possible that it would become redundant
> >>>> if I can just include Ian's change to use strings for Perf stat.
> >>>
> >>> I'd prefer we didn't merge this as we'd need to rebase:
> >>> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> >>> and those changes would then delete the code introduced. I'm fine with
> >>> adding the tests.
> >>>
> >>> There are more exotic heterogeneous core things upcoming, probably
> >>> also from ARM, and the thought of duplicating the default attribute
> >>> logic and event parsing constraints is just something I'd prefer not
> >>> to have to do.
> >>>
> >>
> >> Yep I don't have any strong feelings about this. Even if we don't merge
> >> it it helped me understand the code and the issue a bit.
> >>
> >> I think one thing I assumed about your change was that there was some
> >> dependency on these other changes. But the more I look at it I think
> >> it's actually fine on its own?
> >
> > Which change? If the change is trying to use "cycles" to open on all
> > PMUs because it will be wild carded then it will run into the priority
> > issue.
> >
>
> Just patch 3 here:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
>
> I assume it works because we don't open on uncore right now. But I'm
> still rebasing and testing it. So we could merge that, and then when we
> do the priority revert along with the fix to ignore the DSU error it
> will continue to work.
>
> >> Using the cycles string actually works today, even on Apple M. The only
> >> real remaining issue is softening the error for failure to open, but
> >> that's _after_ doing the revert of the revert and is separate.
> >>
> >> I will re-test that one today with fresh eyes.
> >
> > Perhaps it is other legacy events, not cycles and instructions. There
> > must have been a reason for this regression report but I don't have an
> > Apple M CPU to test on.
> >
>
> This regression report is for various (admittedly extremely confusing)
> combinations of kernels and perfs without the following patches:
>
> 5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
>     (v6.6 kernel release)
>
> 25412c036 ("perf print-events: make is_event_supported() more robust")
>     (v6.9 Perf release for Apple M)
>
> a24d9d9dc ("perf parse-events: Make legacy events lower priority than
>              sysfs/JSON")
>     (v6.8 Perf)
>
> With all of those applied everything is fixed even on Apple M. I don't
> think anything needs to be fixed for the bare "-e cycles" that you
> mentioned at the beginning of the chain because that never regressed, it
> actually never worked on big.LITTLE until 5c81672865, and after that
> using legacy was fine. I don't think Mark actually wants bare "cycles"
> to _not_ use legacy either because it never did. He only mentioned what
> happens when you really do want to target a PMU with a name (already
> fixed in a24d9d9dc).

I'm not clear: is your point that when we get regression reports on
the tool like this, and Mark says things to me face-to-face at LPC, we
should ignore the issue and wait for the driver fix? The PMU driver
for Apple M has fixed the legacy defaults for instructions and cycles,
great - this was the obvious fix for a driver issue from the get-go.
Has it fixed all legacy values? Are you saying we should flip from
sysfs/json preferred over legacy to legacy preferred over sysfs/json?
I would still like to get rid of legacy events having different
wildcard behavior, cpu-cycles (legacy - matches only core PMUs) vs
cpu_cycles (sysfs - matches on all PMUs), but if we need to carry this
awkwardness for the sake of arm_dsu then *sigh* ok, it'll forever be a
potential trap when writing metrics - beware magic legacy names that
won't work on anything other than core PMUs. We carry lots of other
discrepancies around, for things like arbitrary hex cutoff values to
work around PMU suffix naming (5fabcdef vs a53 - both hex suffixes
with different interpretations), hotplug handling, etc. One concern
that's been raised is other tools being able to work correctly; given
the minefield set up in this regard I can imagine legacy events
working but little else. At least we can work to have the reference
implementation that comes with the kernel working.
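
To make the hex suffix point concrete, here is a minimal sketch of that
kind of cutoff heuristic. The function name, the cutoff of 4 hex digits
and the "uncore_foo" PMU name are all assumptions made up for
illustration; this is not the code or the cutoff that perf uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: shorter hex-looking suffixes are kept as part of the name. */
#define MIN_HEX_SUFFIX_LEN 4

/*
 * Return the length of a PMU name with a trailing "_<hex>" instance
 * suffix removed. A suffix shorter than the cutoff (such as "a53") is
 * treated as part of the name, so the full length is returned.
 */
size_t pmu_name_len_no_hex_suffix(const char *name)
{
    size_t len;
    const char *underscore;

    assert(name != NULL);
    len = strlen(name);
    underscore = strrchr(name, '_');
    if (underscore == NULL)
        return len;
    if (strlen(underscore + 1) < MIN_HEX_SUFFIX_LEN)
        return len;
    for (const char *p = underscore + 1; *p != '\0'; p++) {
        if (!isxdigit((unsigned char)*p))
            return len;
    }
    return (size_t)(underscore - name);
}
```

With this cutoff "armv8_cortex_a53" keeps its full name while
"uncore_foo_5fabcdef" is trimmed to "uncore_foo", even though both
suffixes are valid hex, which is the ambiguity described above.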

Thanks,
Ian


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-17  1:38                 ` Atish Kumar Patra
@ 2024-08-20  8:58                   ` James Clark
  0 siblings, 0 replies; 53+ messages in thread
From: James Clark @ 2024-08-20  8:58 UTC (permalink / raw)
  To: Atish Kumar Patra, Ian Rogers
  Cc: Thorsten Leemhuis, Arnaldo Carvalho de Melo, Mark Rutland,
	Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Asahi Linux,
	Linux regressions mailing list



On 17/08/2024 2:38 am, Atish Kumar Patra wrote:
> On Fri, Aug 16, 2024 at 8:30 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@linaro.org> wrote:
>>>
>>>
>>>
>>> On 15/08/2024 6:29 pm, Ian Rogers wrote:
>>>> On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@linaro.org> wrote:
>>>>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
>>>>>> On 01.08.24 21:05, Ian Rogers wrote:
>>>>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
>>>>>>> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>>>>
>>>>>>>> [TLDR: This mail is primarily relevant for Linux kernel regression
>>>>>>>> tracking. See link in footer if these mails annoy you.]
>>>>>>>>
>>>>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
>>>>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
>>>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
>>>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>>>>>>>>
>>>>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
>>>>>>>> sysfs/JSON
>>>>>>>> #regzbot ignore-activity
>>>>>>>
>>>>>>> Note, this is still broken.
>>>>>>
>>>>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
>>>>>> this? Or is this a "we are screwed one way or another and someone has to
>>>>>> bite the bullet" situation?
>>>>>>
>>>>>> Ciao, Thorsten
>>>>>>
>>>>>>> The patch changed the priority in the case
>>>>>>> that you do something like:
>>>>>>>
>>>>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>>>>>>>
>>>>>>> but if you do:
>>>>>>>
>>>>>>> $ perf stat -e 'cycles' benchmark
>>>>>>>
>>>>>>> then the broken behavior will happen as legacy events have priority
>>>>>>> over sysfs/json events in that case. To fix this you need to revert:
>>>>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
>>>>>>> events over legacy"
>>>>>>>
>>>>>>> This causes some testing issues resolved in this unmerged patch series:
>>>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
>>>>>>>
>>>>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
>>>>>>> and this PMU is present on Ampere systems. Reverting the commit above
>>>>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
>>>>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
>>>>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
>>>>>>> be opened by perf record and fail as the event won't support sampling.
>>>>>>>
>>>>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>>>>> fixes this by only opening the cycles event on core PMUs when choosing
>>>>>>> default events.
>>>>>>>
>>>>>>> Rather than take this patch the revert happened as Linus runs the
>>>>>>> command "perf record -e cycles:pp" (ie using a specified event and not
>>>>>>> defaults) and considers it a regression in the perf tool that on an
>>>>>>> Ampere system to need to do "perf record -e
>>>>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
>>>>>>> will choose the cycles event correctly and with better precision the
>>>>>>> pp for systems that support it, but it was still considered a
>>>>>>> regression in the perf tool so the revert was made to happen. There is
>>>>>>> a lack of perf testing coverage for ARM, in particular as they choose
>>>>>>> to do everything in a different way to x86. The patch in question was
>>>>>>> in the linux-next tree for weeks without issues.
>>>>>>>
>>>>>>> ARM/Ampere could fix this by renaming the event from cycles to
>>>>>>> cpu_cycles, or by following Intel's convention that anything uncore
>>>>>>> uses the name clockticks rather than cycles. This could break people
>>>>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
>>>>>>> are rare. There has been no progress I'm aware of on renaming the
>>>>>>> event.
>>>>>>>
>>>>>>> Making perf not terminate on opening an event for perf record seems
>>>>>>> like the most likely workaround as that is at least something under
>>>>>>> the tool maintainers control. ARM have discussed doing this on the
>>>>>>> lists:
>>>>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
>>>>>>> but since the revert in v6.10 no patches have appeared for the v6.11
>>>>>>> merge window. Feature work like coresight improvements and ARMv9 are
>>>>>>> being actively pursued by ARM, but feature work won't resolve this
>>>>>>> regression.
>>>>>>>
>>>>>
>>>>> I got some hardware with the DSU PMU so I'm going to have a go at trying
>>>>> to send some fixes for this. My initial idea was to try incorporate the
>>>>> "not terminate on opening" change as discussed in the link directly
>>>>> above. And then do the revert of the "revert of prefer sysfs/json".
>>>>
>>>> Thanks, I think this would be good. The biggest issue is that none of
>>>> the record logic expects a file descriptor to be not opened, deleting
>>>> unopened evsels from the evlist breaks all the indexing into the
>>>> mmaps, etc. Tbh, you probably wouldn't do the code this way if was
>>>> written afresh. Perhaps a hashmap would map from an evsel to ring
>>>> buffer mmaps, etc. Trying to avoid having global state and benefitting
>>>> from encapsulation. I'd focus on just doing the expedient thing in the
>>>> changes, which probably just means making the record code tolerant of
>>>> evsels that fail to open and not modifying the evlist due to the risk
>>>> it breaks the indices.
>>>>
>>>
>>> Thanks for the tips.
>>>
>>>> (To point out the obvious, this work wouldn't be necessary if arm_dsu
>>>> event were renamed from "cycles" to "cpu_cycles" which would also make
>>>> it more intention revealing alongside the arm_dsu's "bus_cycles" event
>>>> name).
>>>>
>>>
>>> I understand but I can imagine the following conversation if we rename that:
>>>
>>>     User: "I updated my kernel and now my (non Perf) tool fails to open
>>>            the DSU cycles event because it doesn't exist anymore"
>>>
>>>     Linus/maintainers: "Oh ok yes that was a userspace breaking change,
>>>                        let's revert it"
>>>
>>> Just because Perf can handle 3 different names for cycles doesn't mean
>>> other tools can.
>>
>> cycles was a bad event name, dsu is a terrible name for what is mainly
>> the l3 cache, and I'm fine with the risk that users of the two combined
>> get broken, as neoverse users with uncore permissions are, say, much
>> rarer than Apple M users. Having a cycles and a bus_cycles event is
>> already ambiguous, they sound the same. Renaming cycles to cpu_cycles
>> would be best.
>>
>>>>> FWIW I don't think Juno currently is broken if the kernel supports
>>>>> extended type ID? I could have missed some output in this thread but it
>>>>> seems like it's mostly related to Apple M hardware. I'm also a bit
>>>>> confused why the "supports extended type" check fails there, but maybe
>>>>> the v6.9 commit 25412c036 from Mark is missing?
>>>>
>>>> So I think your later emails clarify Arnaldo is probably missing:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
>>>>
>>>> Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
>>>> (iirc), this regression report, etc. My understanding is that Apple M
>>>> has something like a v2 ARM PMU and the legacy events are encoded
>>>> incorrectly in the driver for this. The regression in v6.5 happened
>>>
>>> I'm not sure about that. The M PMU events may be incomplete, but the two
>>> that are there have a mapping that looks sane:
>>>
>>>     static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
>>>          PERF_MAP_ALL_UNSUPPORTED,
>>>          [PERF_COUNT_HW_CPU_CYCLES]      = M1_PMU_PERFCTR_CPU_CYCLES,
>>>          [PERF_COUNT_HW_INSTRUCTIONS]    = M1_PMU_PERFCTR_INSTRUCTIONS,
>>>          /* No idea about the rest yet */
>>>     };
>>>
>>> And they map to the same named events:
>>>
>>>     static struct attribute *m1_pmu_event_attrs[] = {
>>>          M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
>>>          M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
>>>          NULL,
>>>     };
>>>
>>> So in this case I can't see using legacy vs sysfs events making a
>>> difference. Maybe there is some other case that was mentioned in a
>>> previous thread that I missed though.
>>
>> No idea, iirc Mark Rutland requested not to use legacy events for Apple M.
>>
>>>> because ARM's core PMUs had previously been treated as uncore PMUs,
>>>> meaning we wouldn't try to program legacy events on them. Fixing the
>>>> handling of ARM's core PMUs broke Apple M due to the broken legacy
>>>> event mappings. Why not fix the Apple M PMU driver? Well there was
>>>> anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
>>>> RISC-V PMU driver wants to delegate the mapping of legacy events to
>>>> the perf tool so the driver needn't be aware of all and future RISC-V
>>>> configurations. The fix discussed with Mark, Atish, etc. has been to
>>>> swap the priority of legacy and sysfs/json events so that the latter
>>>> has priority. We need the revert of the revert as currently we only do
>>>> this if a PMU is specified with an event, not for the general wildcard
>>>> PMUs case that most people use. There was huge fallout from flipping
>>>
>>> Yep makes sense to do the revert if RISC-V isn't going to support any
>>> legacy events. Although from what I understand that would technically
>>> only require JSON to be the highest priority? Because putting named
>>> events in sysfs still requires kernel involvement so doesn't get you any
>>> further than supporting the legacy events?
>>
>> The sysfs and json event handling is interwoven, for example you can
>> add to a sysfs event with json information. There are basically two
>> approaches in the event parser, hardcoded legacy things and event
>> names (optionally with PMU names). I'm trying to get rid of the
>> hardcoded legacy things as they were fine when you had a single core
>> type, but I want to have events everywhere - say instructions and
>> cycles on a GPU so we can IPC on a GPU. For RISC-V as long as the
>> legacy events are covered as names in json and json/sysfs has priority
>> over legacy then things will be fine.
>>
> 
> RISC-V does want to support legacy events as that's how users on other
> architectures are used to running perf. It would be weird if we didn't
> support them.
> 
> Our initial reasoning behind relying on json for legacy events was to
> avoid vendor specific encodings for these events in the driver. Unlike
> other ISAs, the RISC-V ISA doesn't define an event encoding for these
> legacy events. As a result every platform vendor will have a custom
> encoding. Managing them in the driver is cumbersome. Many thanks to Ian
> for posting the patches to reverse the priority, which works fine for
> RISC-V.
> 
> However, I understand that it is easier said than done and some use
> cases are broken. We also discovered there are a few other use cases
> which still have the same problem even if we solve the bigger problem
> via json parsing for legacy events.
> 
> 1. Any other user profiling application that invokes perf system calls
> directly may also try to just use legacy event attributes in
> perf_event_attr. The Android simpleperf application also falls in this
> category. We need to describe the platform specific encoding somewhere
> for these applications.
> 

I think this use case is important. Not just for profiling applications 
but even something that wants to monitor itself. I imagine opening 
PERF_COUNT_HW_CPU_CYCLES or INSTRUCTIONS is actually somewhat common, and 
I don't think every application that wants to do perf system calls 
should have to maintain JSON mappings for all platforms. That doesn't 
sound feasible to me, unless there is a smart way to do it? Maybe the 
mappings could be in libperf or something? But then that still requires 
everyone to add that as a dependency and keep it up to date. By that 
point you might as well just add them in the kernel and keep the 
existing interface.
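
For reference, a self-monitoring application of the kind described
above might look roughly like the sketch below. It only assumes the
perf_event_open(2) syscall and the uapi <linux/perf_event.h> header;
the helper names legacy_hw_attr() and self_counter_open() are made up
for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Build an attr for a legacy (PERF_TYPE_HARDWARE) event such as cycles. */
struct perf_event_attr legacy_hw_attr(uint64_t hw_id)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE; /* generic ID, mapped by the PMU driver */
    attr.config = hw_id;            /* e.g. PERF_COUNT_HW_CPU_CYCLES */
    attr.disabled = 1;              /* start stopped, enable explicitly later */
    attr.exclude_kernel = 1;        /* count user space only */
    attr.exclude_hv = 1;
    return attr;
}

/*
 * glibc provides no wrapper for perf_event_open, so call it through
 * syscall(2). Counts the calling thread on any CPU, with no group
 * leader. Returns a file descriptor, or -1 with errno set.
 */
long self_counter_open(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    return syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}
```

A caller would typically enable the counter with
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) and later read() a single 64-bit
count from the descriptor. Nothing in this path consults JSON or sysfs,
so the mapping of the generic ID has to come from the kernel.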

James


* Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
  2024-08-01 19:05     ` Ian Rogers
  2024-08-07  8:54       ` Thorsten Leemhuis
@ 2025-03-09 21:19       ` Ian Rogers
  1 sibling, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2025-03-09 21:19 UTC (permalink / raw)
  To: Linux regressions mailing list, to: Mark Rutland
  Cc: Linux perf Profiling, Linux Kernel Mailing List, James Clark,
	cc: Marc Zyngier, Hector Martin, Arnaldo Carvalho de Melo,
	Asahi Linux

On Thu, Aug 1, 2024 at 12:05 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
> >
> > [TLDR: This mail is primarily relevant for Linux kernel regression
> > tracking. See link in footer if these mails annoy you.]
> >
> > On 22.11.23 00:43, Bagas Sanjaya wrote:
> > > On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> > >> Perf broke on all Apple ARM64 systems (tested almost everything), and
> > >> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >
> > #regzbot fix: perf parse-events: Make legacy events lower priority than
> > sysfs/JSON
> > #regzbot ignore-activity
>
> Note, this is still broken. The patch changed the priority in the case
> that you do something like:
>
> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
>
> but if you do:
>
> $ perf stat -e 'cycles' benchmark
>
> then the broken behavior will happen as legacy events have priority
> over sysfs/json events in that case. To fix this you need to revert:
> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> events over legacy"

This still hasn't been fixed and I'm at the point of saying I no
longer care except I want consistency. Let's revert the prioritization
of sysfs/json events for PMUs. I don't want to carry around patches
like:
https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org
If this re-opens this bug then I'm fine with that, and I'm happy to
point to James and Arnaldo's comments [1] saying that somehow legacy
events are better, because drill down or something (what a bit pattern
has to do with that, no idea, we already default on Intel to
non-legacy events and drill down just dandily for topdown). Whatever,
I'm fed up with dealing with mine and others' comments being taken out
of context. I'm fed up with the ambiguity of two encoding systems, one
with and one without PMUs specified. I'm fed up with working on PMU
and event encoding, ordering, matching, metrics, etc. where it is
unclear what the behavior should be. I'm fed up with ARM choosing bad
uncore event names, refusing to correct them and creating a massive
mess they barely help clean up other than by largely reposting my
patches. I'm fed up that all of this was done for ARM and then they
don't seem to care about its resolution or testing the original
regression. Yes, this sucks as user land won't be able to be a source
for event configuration fixes. Yes, this sucks as such functionality
would slim down PMU drivers and was a behavior requested by RISC-V
face-to-face with a maintainer. I don't see why I should have to fight
for this other than I unexpectedly broke things in the first place
(this regression report) and I was trying to help RISC-V.

To be specific, I don't want the event 'instructions' to be encoded as
type 'hardware' and config 'instructions', be reported as
'cpu_core/instructions/', but then have that event encoded as type 4
(RAW) and config 0xc0.
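
Spelled out as perf_event_attr type/config pairs, the mismatch looks
like the sketch below. The struct and function names are made up for
illustration; the values are the ones given above plus the uapi enum
constants from <linux/perf_event.h>.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>

/* The text refers to "type 4 (RAW)"; check that against the uapi enum. */
static_assert(PERF_TYPE_RAW == 4, "PERF_TYPE_RAW is expected to be 4");

/* Just the two perf_event_attr fields that differ between the encodings. */
struct event_encoding {
    uint32_t type;
    uint64_t config;
};

/* Legacy form: generic hardware type, the driver maps the generic ID. */
struct event_encoding instructions_legacy(void)
{
    struct event_encoding e = { PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS };
    return e;
}

/* sysfs/JSON form as given in the text: type 4 (RAW), config 0xc0. */
struct event_encoding instructions_sysfs_example(void)
{
    struct event_encoding e = { PERF_TYPE_RAW, 0xc0 };
    return e;
}

/* Returns 1 when two encodings would program the same type and config. */
int same_encoding(struct event_encoding a, struct event_encoding b)
{
    return a.type == b.type && a.config == b.config;
}
```

The same event name ends up as two different type/config pairs
depending on which path the parser takes, which is the inconsistency
being objected to.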

There is also the fact that cpu-cycles will only wildcard on core PMUs,
but cpu_cycles will wildcard on all of them. Again, why do I have to
fight for sanity? Let's just back out everything this regression report
created. We check legacy events and do their behaviors, otherwise
we fall back on sysfs/json.
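
As a toy model of that ordering (legacy checked first, sysfs/json as
the fallback), something like the sketch below. The tables and names
here are invented for illustration and are not the lists or functions
in perf's event parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum event_source { SRC_NONE, SRC_LEGACY, SRC_SYSFS_JSON };

/* Illustrative tables only; not the names perf actually knows about. */
static const char *const legacy_names[] = { "cycles", "cpu-cycles", "instructions" };
static const char *const sysfs_json_names[] = { "cycles", "cpu_cycles", "instructions" };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search of a small name table. */
static int in_table(const char *const *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Legacy-first policy: a legacy match wins, otherwise try sysfs/json. */
enum event_source resolve_legacy_first(const char *name)
{
    assert(name != NULL);
    if (in_table(legacy_names, ARRAY_LEN(legacy_names), name))
        return SRC_LEGACY;
    if (in_table(sysfs_json_names, ARRAY_LEN(sysfs_json_names), name))
        return SRC_SYSFS_JSON;
    return SRC_NONE;
}
```

Under this policy "cycles" and "cpu-cycles" resolve to the legacy
encoding even when a PMU also advertises "cycles" in sysfs, while
"cpu_cycles" only exists as a sysfs/json name.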

Thanks,
Ian

[1] https://lore.kernel.org/all/Z8sMcta0zTWeOso4@x1/


Thread overview: 53+ messages
2023-11-21 12:08 [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5 Hector Martin
2023-11-21 13:40 ` Marc Zyngier
2023-11-21 15:24   ` Marc Zyngier
2023-11-21 15:40     ` Mark Rutland
2023-11-21 15:46       ` Ian Rogers
2023-11-21 16:02         ` Mark Rutland
2023-11-21 16:09           ` Ian Rogers
2023-11-21 16:15             ` Mark Rutland
2023-11-21 16:38               ` Ian Rogers
2023-11-22  3:23                 ` Hector Martin
2023-11-22 13:06                   ` Arnaldo Carvalho de Melo
2023-11-22 15:33                     ` Ian Rogers
2023-11-22 15:49                     ` Mark Rutland
2023-11-22 16:04                       ` Ian Rogers
2023-11-22 16:26                         ` Arnaldo Carvalho de Melo
2023-11-22 16:33                           ` Ian Rogers
2023-11-22 16:19                       ` Arnaldo Carvalho de Melo
2023-11-22 13:03                 ` Mark Rutland
2023-11-22 15:29                   ` Ian Rogers
2023-11-22 16:08                     ` Mark Rutland
2023-11-22 16:29                       ` Ian Rogers
2023-11-22 16:55                         ` Arnaldo Carvalho de Melo
2023-11-22 16:59                           ` Ian Rogers
2023-11-23  4:33                             ` Ian Rogers
2023-11-21 15:41     ` Ian Rogers
2023-11-21 15:56       ` Mark Rutland
2023-11-21 16:03         ` Ian Rogers
2023-11-21 16:08           ` Mark Rutland
2023-11-23 14:23     ` Mark Rutland
2023-11-23 14:45       ` Marc Zyngier
2023-11-23 15:14       ` Ian Rogers
2023-11-23 16:48         ` Mark Rutland
2023-11-23 17:08           ` James Clark
2023-11-23 17:15             ` Mark Rutland
2023-11-21 23:43 ` Bagas Sanjaya
2023-12-06 12:09   ` Linux regression tracking #update (Thorsten Leemhuis)
2024-08-01 19:05     ` Ian Rogers
2024-08-07  8:54       ` Thorsten Leemhuis
2024-08-14 16:28         ` James Clark
2024-08-14 16:41           ` Arnaldo Carvalho de Melo
2024-08-15 15:15             ` James Clark
2024-08-15 15:20               ` James Clark
2024-08-15 15:27               ` Arnaldo Carvalho de Melo
2024-08-15 15:53                 ` Arnaldo Carvalho de Melo
2024-08-16  8:57                   ` James Clark
2024-08-15 17:29           ` Ian Rogers
2024-08-16  9:22             ` James Clark
2024-08-16 15:30               ` Ian Rogers
2024-08-17  1:38                 ` Atish Kumar Patra
2024-08-20  8:58                   ` James Clark
2024-08-19 14:56                 ` James Clark
2024-08-19 15:44                   ` Ian Rogers
2025-03-09 21:19       ` Ian Rogers
