* Re: perf stat issue with 7.0.0rc3
[not found] ` <CAP-5=fWG0Q-xNGn_-_KGCopaNsWG5rUN3pYNLEh3K7+xE0OwVQ@mail.gmail.com>
@ 2026-03-17 19:39 ` Arnaldo Carvalho de Melo
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 19:39 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Fri, Mar 13, 2026 at 08:41:48AM -0700, Ian Rogers wrote:
> On Fri, Mar 13, 2026 at 8:19 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > On Fri, Mar 13, 2026 at 02:13:18PM +0100, Thomas Richter wrote:
> > > I just discovered a strange behavior on linux 7.0.0rc3.
> > > This is the expected output, however when I use perf version 7.0.0rc3:
> > > bash-5.3# ./perf -v
> > > perf version 7.0.rc3.g1f318b96cc84
> > > bash-5.3# ./perf stat -- true
> > > Error:
> > > No supported events found.
> > > trace.args_alignment
> > And this last line is even stranger, in my case I get something else,
> > also on x86_64:
> > root@number:~# perf stat -- true
> > Error:
> > No supported events found.
> > addr2line.style
> > root@number:~#
> > On an ARM machine:
> > acme@raspberrypi:~/git/perf-tools $ perf stat -- true
> > Error:
> > No supported events found.
> > acme@raspberrypi:~/git/perf-tools $ uname -a
> > Linux raspberrypi 6.12.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.62-1+rpt1 (2025-12-18) aarch64 GNU/Linux
> > acme@raspberrypi:~/git/perf-tools $
> > I'll try to bisect this later, thanks for the report!
> Hi Arnaldo,
> Leo has pointed at the fix:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/metricgroup.c?h=perf-tools-next&id=c5a244bf17caf2de22f9e100832b75f72b31d3e6
> This also showed up as missing for the LTS backports:
> https://lore.kernel.org/lkml/ad95d781-7eb2-4c0c-a9e9-aaabae8eb602@kernel.org/
> so I thought it was flagged as a fix for the next PR. I don't see it in there:
> https://lore.kernel.org/lkml/20260313151434.1695228-1-acme@kernel.org/
> Could we get it in?
So, I applied the one Leo pointed out and got this:
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
1 context-switches # 3509.1 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
75 page-faults # 263181.0 faults/sec page_faults_per_second
0.28 msec task-clock # 0.0 CPUs CPUs_utilized
8,018 branch-misses # 4.2 % branch_miss_rate
191,108 branches # 670.6 M/sec branch_frequency
870,234 cpu-cycles # 3.1 GHz cycles_frequency
<not counted> instructions # nan instructions insn_per_cycle (0.00%)
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.001839832 seconds time elapsed
0.000727000 seconds user
0.000000000 seconds sys
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
root@number:~# echo 0 > /proc/sys/kernel/nmi_watchdog
Which is strange, in the past (see below) instructions and
stalled-cycles-frontend were counted in the default (no -e) 'perf stat'
set of events, now if I disable the nmi_watchdog I get 'instructions'
back, but not 'stalled-cycles-frontend':
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
1 context-switches # 6524.7 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
71 page-faults # 463256.0 faults/sec page_faults_per_second
0.15 msec task-clock # 0.0 CPUs CPUs_utilized
7,484 branch-misses # 4.0 % branch_miss_rate
186,108 branches # 1214.3 M/sec branch_frequency
810,475 cpu-cycles # 5.3 GHz cycles_frequency
893,290 instructions # 1.1 instructions insn_per_cycle
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.000382345 seconds time elapsed
0.000443000 seconds user
0.000000000 seconds sys
If I go back to v6.18 (will try a bisect to see exactly where this
happens), I get the events above, with a different printing order, but
'instructions' and 'stalled-cycles-frontend' are counted successfully:
root@number:~# perf -v
perf version 6.18.g7d0a66e4bb90
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
202,103 task-clock # 0.000 CPUs utilized
1 context-switches # 4.948 K/sec
0 cpu-migrations # 0.000 /sec
71 page-faults # 351.306 K/sec
895,485 instructions # 0.85 insn per cycle
# 0.56 stalled cycles per insn
1,048,898 cycles # 5.190 GHz
504,794 stalled-cycles-frontend # 48.13% frontend cycles idle
186,002 branches # 920.333 M/sec
7,857 branch-misses # 4.22% of all branches
1.000537221 seconds time elapsed
0.000542000 seconds user
0.000000000 seconds sys
root@number:~#
Trying bisection now.
- Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-17 19:39 ` perf stat issue with 7.0.0rc3 Arnaldo Carvalho de Melo
@ 2026-03-17 19:56 ` Arnaldo Carvalho de Melo
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 19:56 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 04:39:24PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Mar 13, 2026 at 08:41:48AM -0700, Ian Rogers wrote:
> > On Fri, Mar 13, 2026 at 8:19 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > On Fri, Mar 13, 2026 at 02:13:18PM +0100, Thomas Richter wrote:
> > > > I just discovered a strange behavior on linux 7.0.0rc3.
> > > > This is the expected output, however when I use perf version 7.0.0rc3:
>
> > > > bash-5.3# ./perf -v
> > > > perf version 7.0.rc3.g1f318b96cc84
> > > > bash-5.3# ./perf stat -- true
> > > > Error:
> > > > No supported events found.
> > > > trace.args_alignment
>
> > > And this last line is even stranger, in my case I get something else,
> > > also on x86_64:
>
> > > root@number:~# perf stat -- true
> > > Error:
> > > No supported events found.
> > > addr2line.style
> > > root@number:~#
>
> > > On an ARM machine:
>
> > > acme@raspberrypi:~/git/perf-tools $ perf stat -- true
> > > Error:
> > > No supported events found.
>
> > > acme@raspberrypi:~/git/perf-tools $ uname -a
> > > Linux raspberrypi 6.12.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.62-1+rpt1 (2025-12-18) aarch64 GNU/Linux
> > > acme@raspberrypi:~/git/perf-tools $
>
> > > I'll try to bisect this later, thanks for the report!
>
> > Hi Arnaldo,
>
> > Leo has pointed at the fix:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/metricgroup.c?h=perf-tools-next&id=c5a244bf17caf2de22f9e100832b75f72b31d3e6
> > This also showed up as missing for the LTS backports:
> > https://lore.kernel.org/lkml/ad95d781-7eb2-4c0c-a9e9-aaabae8eb602@kernel.org/
> > so I thought it was flagged as a fix for the next PR. I don't see it in there:
> > https://lore.kernel.org/lkml/20260313151434.1695228-1-acme@kernel.org/
> > Could we get it in?
>
> So, I applied the one Leo pointed out and got this:
>
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1 context-switches # 3509.1 cs/sec cs_per_second
> 0 cpu-migrations # 0.0 migrations/sec migrations_per_second
> 75 page-faults # 263181.0 faults/sec page_faults_per_second
> 0.28 msec task-clock # 0.0 CPUs CPUs_utilized
> 8,018 branch-misses # 4.2 % branch_miss_rate
> 191,108 branches # 670.6 M/sec branch_frequency
> 870,234 cpu-cycles # 3.1 GHz cycles_frequency
> <not counted> instructions # nan instructions insn_per_cycle (0.00%)
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> 1.001839832 seconds time elapsed
>
> 0.000727000 seconds user
> 0.000000000 seconds sys
>
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> root@number:~# echo 0 > /proc/sys/kernel/nmi_watchdog
>
>
> Which is strange, in the past (see below) instructions and
> stalled-cycles-frontend were counted in the default (no -e) 'perf stat'
> set of events, now if I disable the nmi_watchdog I get 'instructions'
> back, but not 'stalled-cycles-frontend':
>
>
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1 context-switches # 6524.7 cs/sec cs_per_second
> 0 cpu-migrations # 0.0 migrations/sec migrations_per_second
> 71 page-faults # 463256.0 faults/sec page_faults_per_second
> 0.15 msec task-clock # 0.0 CPUs CPUs_utilized
> 7,484 branch-misses # 4.0 % branch_miss_rate
> 186,108 branches # 1214.3 M/sec branch_frequency
> 810,475 cpu-cycles # 5.3 GHz cycles_frequency
> 893,290 instructions # 1.1 instructions insn_per_cycle
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> 1.000382345 seconds time elapsed
>
> 0.000443000 seconds user
> 0.000000000 seconds sys
>
>
> If I go back to v6.18 (will try a bisect to see exactly where this
> happens), I get the events above, with a different printing order, but
> 'instructions' and 'stalled-cycles-frontend' are counted successfully:
>
> root@number:~# perf -v
> perf version 6.18.g7d0a66e4bb90
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 202,103 task-clock # 0.000 CPUs utilized
> 1 context-switches # 4.948 K/sec
> 0 cpu-migrations # 0.000 /sec
> 71 page-faults # 351.306 K/sec
> 895,485 instructions # 0.85 insn per cycle
> # 0.56 stalled cycles per insn
> 1,048,898 cycles # 5.190 GHz
> 504,794 stalled-cycles-frontend # 48.13% frontend cycles idle
> 186,002 branches # 920.333 M/sec
> 7,857 branch-misses # 4.22% of all branches
>
> 1.000537221 seconds time elapsed
>
> 0.000542000 seconds user
> 0.000000000 seconds sys
>
>
> root@number:~#
>
> Trying bisection now.
It's related to this:
a3248b5b5427dc2126c19aa9c32f1e840b65024f is the first bad commit
commit a3248b5b5427dc2126c19aa9c32f1e840b65024f
Author: Ian Rogers <irogers@google.com>
Date: Tue Nov 11 13:21:52 2025 -0800
perf jevents: Add metric DefaultShowEvents
Some Default group metrics require their events showing for
consistency with perf's previous behavior. Add a flag to indicate when
this is the case and use it in stat-display.
root@number:~# strace -e perf_event_open perf stat sleep 1
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_CONTEXT_SWITCHES, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_CPU_MIGRATIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_PAGE_FAULTS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 5
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_TASK_CLOCK, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 7
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xc3, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 8
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 8, PERF_FLAG_FD_CLOEXEC) = 9
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 10
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 11
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 12
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xc0, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 12, PERF_FLAG_FD_CLOEXEC) = 13
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 14
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=249821, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
1 context-switches # 3356.1 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
74 page-faults # 248352.1 faults/sec page_faults_per_second
0.30 msec task-clock # 0.0 CPUs CPUs_utilized
7,566 branch-misses # 4.0 % branch_miss_rate
189,307 branches # 635.3 M/sec branch_frequency
798,790 cpu-cycles # 2.7 GHz cycles_frequency
908,746 instructions # 1.1 instructions insn_per_cycle
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.000722079 seconds time elapsed
0.000000000 seconds user
0.000714000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=249820, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
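For reference, the two stall events sit next to each other in the perf_event_open(2) legacy hardware enum, so the two configs are only an off-by-one apart. A small Python sketch of the relevant PERF_COUNT_HW_* values (transcribed from my reading of include/uapi/linux/perf_event.h, not from this thread):

```python
# PERF_COUNT_HW_* values from the perf_event_open(2) UAPI header
# (include/uapi/linux/perf_event.h); the two stall events are adjacent.
PERF_COUNT_HW = {
    0: "cpu-cycles",
    1: "instructions",
    2: "cache-references",
    3: "cache-misses",
    4: "branch-instructions",
    5: "branch-misses",
    6: "bus-cycles",
    7: "stalled-cycles-frontend",  # cf. legacy-hardware-config=7
    8: "stalled-cycles-backend",
}

def legacy_name(config):
    """Map a PERF_TYPE_HARDWARE config value to its event name."""
    return PERF_COUNT_HW.get(config, "unknown")

# config 7 is the frontend event; an off-by-one lands on the backend one
assert legacy_name(7) == "stalled-cycles-frontend"
assert legacy_name(8) == "stalled-cycles-backend"
```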
- Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
@ 2026-03-17 20:12 ` Arnaldo Carvalho de Melo
2026-03-17 20:50 ` Ian Rogers
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 20:12 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
If I instead ask just for stalled-cycles-frontend and
stalled-cycles-backend:
root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
409,276 stalled-cycles-frontend
<not supported> stalled-cycles-backend
1.000428804 seconds time elapsed
0.000439000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
tools/perf/util/evsel.c: "stalled-cycles-frontend",
⬢ [acme@toolbx perf-tools]$
This machine is:
⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
⬢ [acme@toolbx perf-tools]
And it doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but it has
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, which gets configured using
PERF_TYPE_RAW and config 0xa9 because:
root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
event=0xa9
root@number:~#
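That sysfs alias format can be decoded mechanically; a minimal Python sketch (simplified: perf's real term parsing in tools/perf/util/pmu.c also handles umask=, cmask=, and the per-PMU format/ directory describing which config bits each term occupies):

```python
def parse_sysfs_alias(text):
    """Parse a simple sysfs event alias like 'event=0xa9' into a raw
    config value. Simplified sketch: assumes the 'event' term's value
    lands directly in the low config bits, which holds for plain
    'event=' aliases like the one shown above."""
    config = 0
    for term in text.strip().split(","):
        key, _, value = term.partition("=")
        if key == "event":
            config |= int(value, 0)  # accepts 0x.. hex or decimal
    return config

# The alias above yields the PERF_TYPE_RAW config seen in the strace output:
assert parse_sysfs_alias("event=0xa9") == 0xa9
```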
But so far I couldn't explain why, in the default case, it asks for
PERF_COUNT_HW_STALLED_CYCLES_BACKEND when it should be asking for
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
- Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
@ 2026-03-17 20:50 ` Ian Rogers
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-03-17 20:50 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
>
> If I instead ask just for stalled-cycles-frontend and
> stalled-cycles-backend:
>
> root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
I think you intend for this to be system wide '-a'.
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 409,276 stalled-cycles-frontend
> <not supported> stalled-cycles-backend
>
> 1.000428804 seconds time elapsed
>
> 0.000439000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
> type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
>
> ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> tools/perf/util/evsel.c: "stalled-cycles-frontend",
> ⬢ [acme@toolbx perf-tools]$
>
> This machine is:
>
> ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> model name : AMD Ryzen 9 9950X3D 16-Core Processor
Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> ⬢ [acme@toolbx perf-tools]
>
> And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> PERF_TYPE_RAW and 0xa9 because:
>
> root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> event=0xa9
> root@number:~#
>
> But I couldn't so far explain why in the default case it is asking for
> PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
So the default events/metrics are now in json:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
Relating to the stalls there are:
```
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend,
stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
"MetricThreshold": "frontend_cycles_idle > 0.1",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Backend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
"MetricThreshold": "backend_cycles_idle > 0.2",
"DefaultShowEvents": "1"
},
```
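A quick way to see which events each Default metric pulls in is to unescape the MetricExpr and pick out the hyphenated event tokens; a rough Python sketch (a regex heuristic, not perf's real expression parser):

```python
import re

def metric_events(metric_expr):
    """Heuristically extract legacy event names from a MetricExpr.
    The JSON escapes '-' as '\\-'; undo that, then grab identifier-like
    tokens containing a '-', which skips plain words like 'max' and
    'instructions'."""
    expr = metric_expr.replace("\\-", "-")
    return sorted(set(re.findall(r"[a-z][a-z0-9-]*-[a-z0-9-]*", expr)))

expr = "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions"
assert metric_events(expr) == ["stalled-cycles-backend",
                               "stalled-cycles-frontend"]
```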
The stalled_cycles_per_instruction and backend_cycles_idle should fail
as the stalled-cycles-backend event is missing. frontend_cycles_idle
should work, I wonder if the 0 counts relate to trouble scheduling
groups of events. I'll need more verbose output to understand. Perhaps
for stalled_cycles_per_instruction, we should modify the metric to
tolerate missing events:
max(stalled\\-cycles\\-frontend if
have_event(stalled\\-cycles\\-frontend) else 0,
stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
else 0) / instructions
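As a toy illustration (plain Python, not perf's metric-expression language), the have_event() guards would let the metric degrade to whichever stall event exists:

```python
def stalled_cycles_per_instruction(counts):
    """Evaluate max(frontend, backend) / instructions, treating a
    missing (unsupported) stall event as contributing 0 -- mirroring
    the proposed have_event() guards in the metric expression."""
    frontend = counts.get("stalled-cycles-frontend", 0)
    backend = counts.get("stalled-cycles-backend", 0)
    return max(frontend, backend) / counts["instructions"]

# With only the frontend event supported (as on the AMD machine in this
# thread), the metric still produces a value instead of failing:
spi = stalled_cycles_per_instruction(
    {"stalled-cycles-frontend": 409_276, "instructions": 908_746})
assert 0.45 < spi < 0.46
```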
Thanks,
Ian
> - Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 20:50 ` Ian Rogers
@ 2026-03-18 0:37 ` Arnaldo Carvalho de Melo
2026-03-18 2:25 ` Ian Rogers
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-18 0:37 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
> > If I instead ask just for stalled-cycles-frontend and
> > stalled-cycles-backend:
> > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
> I think you intend for this to be system wide '-a'.
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 409,276 stalled-cycles-frontend
> > <not supported> stalled-cycles-backend
> >
> > 1.000428804 seconds time elapsed
> >
> > 0.000439000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
> > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
> >
> > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > ⬢ [acme@toolbx perf-tools]$
> >
> > This machine is:
> >
> > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > model name : AMD Ryzen 9 9950X3D 16-Core Processor
>
> Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
>
> > ⬢ [acme@toolbx perf-tools]
> >
> > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > PERF_TYPE_RAW and 0xa9 because:
> >
> > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > event=0xa9
> > root@number:~#
> >
> > But I couldn't so far explain why in the default case it is asking for
> > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
So you mean that it goes on to try this:
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
Yeah, it tries both:
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
The RAW one is the equivalent of PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
as I see now, having looked again at the 'strace perf stat sleep 1' output.
But in the output it also says:
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
And if I try just this one:
root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
422,404 stalled-cycles-frontend
1.000432524 seconds time elapsed
0.000438000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It works, so that line with stalled-cycles-frontend could have produced
the value, not '<not counted>', as this call succeeded:
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
Maybe the explanation is that it tries the metric, which uses both
frontend and backend; it fails at backend and then discards the
frontend?
> So the default events/metrics are now in json:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> Relating to the stalls there are:
> ```
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> "MetricExpr": "max(stalled\\-cycles\\-frontend,
> stalled\\-cycles\\-backend) / instructions",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
Error:
No supported events found.
The stalled-cycles-backend event is not supported.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
+++ exited with 1 +++
root@number:~#
> {
> "BriefDescription": "Frontend stalls per cycle",
> "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> "MetricGroup": "Default",
> "MetricName": "frontend_cycles_idle",
> "MetricThreshold": "frontend_cycles_idle > 0.1",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
881,022 cpu-cycles # 0.48 frontend_cycles_idle
422,386 stalled-cycles-frontend
1.000468505 seconds time elapsed
0.000504000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
> {
> "BriefDescription": "Backend stalls per cycle",
> "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> "MetricGroup": "Default",
> "MetricName": "backend_cycles_idle",
> "MetricThreshold": "backend_cycles_idle > 0.2",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
<not counted> cpu-cycles # nan backend_cycles_idle
<not supported> stalled-cycles-backend
1.000739264 seconds time elapsed
0.000675000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
> ```
> > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> as the stalled-cycles-backend event is missing. frontend_cycles_idle
> should work, I wonder if the 0 counts relate to trouble scheduling
> groups of events. I'll need more verbose output to understand. Perhaps
> for stalled_cycles_per_instruction, we should modify the metric to
> tolerate missing events:
>
> max(stalled\\-cycles\\-frontend if
> have_event(stalled\\-cycles\\-frontend) else 0,
> stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> else 0) / instructions
That have_event() part also has to be implemented, right?
- Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
@ 2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers
0 siblings, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-18 2:25 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 5:37 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> > On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, is asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
>
> > > If I instead ask just for stalled-cycles-frontend and
> > > stalled-cycles-backend:
>
> > > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
>
> > I think you intend for this to be system wide '-a'.
>
> > > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> > >
> > > Performance counter stats for 'sleep 1':
> > >
> > > 409,276 stalled-cycles-frontend
> > > <not supported> stalled-cycles-backend
> > >
> > > 1.000428804 seconds time elapsed
> > >
> > > 0.000439000 seconds user
> > > 0.000000000 seconds sys
> > >
> > >
> > > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > > +++ exited with 0 +++
> > > root@number:~#
> > >
> > > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend but
> > > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
> > >
> > > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > > ⬢ [acme@toolbx perf-tools]$
> > >
> > > This machine is:
> > >
> > > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > > model name : AMD Ryzen 9 9950X3D 16-Core Processor
> >
> > Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> >
> > > ⬢ [acme@toolbx perf-tools]
> > >
> > > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > > PERF_TYPE_RAW and 0xa9 because:
> > >
> > > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > > event=0xa9
> > > root@number:~#
> > >
> > > But so far I couldn't explain why in the default case it is asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
>
> So you mean that it goes on to try this:
>
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
>
> Yeah, it tries both:
>
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
perf trace? The formatting on perf_event_open is better :-)
> The RAW one is the equivalent of PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
> as I see now, having looked again at the 'strace perf stat sleep 1' output.
>
> But in the output it also says:
>
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> And if I try just this one:
>
> root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 422,404 stalled-cycles-frontend
>
> 1.000432524 seconds time elapsed
>
> 0.000438000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> It works, so that line with stalled-cycles-frontend could have produced
> the value, not '<not counted>', as this call succeeded:
>
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
>
> Maybe the explanation is that it tries the metric, which uses both
> frontend and backend; it fails at backend and then discards the
> frontend?
The metric logic tries to be smart, creating groups of events and then
sharing events between groups to avoid programming more events. I
suspect the frontend and backend are in a group, maybe the backend is
the leader. The backend fails to open, resulting in no group_fd and a
broken group for both metrics.
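This failure mode can be sketched in a few lines of Python. To be clear, this is a hypothetical simulation of the grouping behavior, not perf's actual code; the event names and the supported set match the AMD machine discussed above:

```python
# Hypothetical simulation: on this machine stalled-cycles-backend has no
# PMU mapping, so "opening" it fails the way perf_event_open() returns
# ENOENT in the strace output above.
SUPPORTED = {"cpu-cycles", "instructions", "stalled-cycles-frontend"}

def open_group(events):
    """Open events as one group; the first event acts as the leader."""
    for ev in events:
        if ev not in SUPPORTED:
            # A member (here the leader) failing leaves no usable
            # group_fd, so the whole group is marked broken and even
            # openable members end up reported as <not counted>.
            return {e: "<not supported>" if e not in SUPPORTED
                    else "<not counted>" for e in events}
    return {ev: "counted" for ev in events}

# Backend as group leader, as suspected: the frontend count is lost too.
print(open_group(["stalled-cycles-backend", "stalled-cycles-frontend"]))
# The frontend event alone opens fine, matching the single-event run.
print(open_group(["stalled-cycles-frontend"]))
```

In perf itself the group leader's fd is what gets passed as group_fd to the subsequent perf_event_open() calls, as visible in the strace output quoted above.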
> > So the default events/metrics are now in json:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> > Relating to the stalls there are:
> > ```
> > {
> > "BriefDescription": "Max front or backend stalls per instruction",
> > "MetricExpr": "max(stalled\\-cycles\\-frontend,
> > stalled\\-cycles\\-backend) / instructions",
> > "MetricGroup": "Default",
> > "MetricName": "stalled_cycles_per_instruction",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> Error:
> No supported events found.
> The stalled-cycles-backend event is not supported.
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
> +++ exited with 1 +++
> root@number:~#
>
> > {
> > "BriefDescription": "Frontend stalls per cycle",
> > "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > "MetricGroup": "Default",
> > "MetricName": "frontend_cycles_idle",
> > "MetricThreshold": "frontend_cycles_idle > 0.1",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 881,022 cpu-cycles # 0.48 frontend_cycles_idle
> 422,386 stalled-cycles-frontend
>
> 1.000468505 seconds time elapsed
>
> 0.000504000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
> > {
> > "BriefDescription": "Backend stalls per cycle",
> > "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > "MetricGroup": "Default",
> > "MetricName": "backend_cycles_idle",
> > "MetricThreshold": "backend_cycles_idle > 0.2",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> <not counted> cpu-cycles # nan backend_cycles_idle
> <not supported> stalled-cycles-backend
>
> 1.000739264 seconds time elapsed
>
> 0.000675000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> > ```
> > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> > as the stalled-cycles-backend event is missing. frontend_cycles_idle
> > should work, I wonder if the 0 counts relate to trouble scheduling
> > groups of events. I'll need more verbose output to understand. Perhaps
> > for stalled_cycles_per_instruction, we should modify the metric to
> > tolerate missing events:
> >
> > max(stalled\\-cycles\\-frontend if
> > have_event(stalled\\-cycles\\-frontend) else 0,
> > stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> > else 0) / instructions
>
> That have_event() part also has to be implemented, right?
I mistyped, has_event not have_event :-)
Thanks,
Ian
> - Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-18 2:25 ` Ian Rogers
@ 2026-03-19 1:01 ` Ian Rogers
2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers
1 sibling, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-19 1:01 UTC (permalink / raw)
To: acme; +Cc: irogers, japo, linux-kernel, linux-perf-users, namhyung, tmricht
The metric code uses the event parsing code but it generally assumes
all events are supported. Arnaldo reported AMD supporting
stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
with this is that before parsing happens the metric code tries to
share events within groups to reduce the number of events and
multiplexing. If the group mixes supported and unsupported
events, the whole group becomes broken. To avoid this situation
add has_event tests to the metrics for stalled-cycles-frontend and
stalled-cycles-backend. has_event is evaluated when parsing the
metric and its result is constant-propagated (through the if-elses) to
reduce the number of events. This means when the metric code considers
sharing the events, only supported events will be shared.
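The constant propagation can be sketched like this. It is a toy fold over a simplified expression syntax, not perf's real metric expression parser, but it shows how an "X if has_event(e) else Y" branch collapses to just the side whose event can actually be opened:

```python
# Toy constant folding of has_event() ternaries; the expression grammar
# here is illustrative only, not perf's metric expression language.
def fold(expr, supported):
    """Reduce 'A if has_event(e) else B' using the supported-event set."""
    marker = " if has_event("
    if marker not in expr:
        return expr
    then_part, rest = expr.split(marker, 1)
    event, else_part = rest.split(") else ", 1)
    if event in supported:
        return fold(then_part, supported)
    return fold(else_part, supported)

supported = {"stalled-cycles-frontend", "cpu-cycles"}

# frontend_cycles_idle keeps its expression...
print(fold("stalled-cycles-frontend / cpu-cycles"
           " if has_event(stalled-cycles-frontend) else 0", supported))
# ...while backend_cycles_idle folds to the constant 0, so no
# stalled-cycles-backend event is created or shared into a group.
print(fold("stalled-cycles-backend / cpu-cycles"
           " if has_event(stalled-cycles-backend) else 0", supported))
```

Because the fold happens before event sharing, only events that will open ever reach the grouping code.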
Note for backporting. This change updates
tools/perf/pmu-events/empty-pmu-events.c, a convenience file for builds
on systems without python present. While the metrics.json change should
backport easily, there can be conflicts on empty-pmu-events.c. In that
case the build will have left a file, test-empty-pmu-events.c, that can
be copied over empty-pmu-events.c to resolve the conflict and produce
an empty-pmu-events.c appropriate for the json in the source tree at
the time of the build.
[1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones")
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/common/common/metrics.json | 6 +-
tools/perf/pmu-events/empty-pmu-events.c | 108 +++++++++---------
2 files changed, 57 insertions(+), 57 deletions(-)
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index 0d010b3ebc6d..cefc8bfe7830 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -46,14 +46,14 @@
},
{
"BriefDescription": "Max front or backend stalls per instruction",
- "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
+ "MetricExpr": "(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions) if (has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend)) else ((stalled\\-cycles\\-frontend / instructions) if has_event(stalled\\-cycles\\-frontend) else ((stalled\\-cycles\\-backend / instructions) if has_event(stalled\\-cycles\\-backend) else 0))",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
- "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
+ "MetricExpr": "(stalled\\-cycles\\-frontend / cpu\\-cycles) if has_event(stalled\\-cycles\\-frontend) else 0",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
"MetricThreshold": "frontend_cycles_idle > 0.1",
@@ -61,7 +61,7 @@
},
{
"BriefDescription": "Backend stalls per cycle",
- "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
+ "MetricExpr": "(stalled\\-cycles\\-backend / cpu\\-cycles) if has_event(stalled\\-cycles\\-backend) else 0",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
"MetricThreshold": "backend_cycles_idle > 0.2",
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 76c395cf513c..a92dd0424f79 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1310,33 +1310,33 @@ static const char *const big_c_string =
/* offset=128375 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
/* offset=128635 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
/* offset=128866 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
-/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
-/* offset=129143 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129273 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129399 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
-/* offset=129575 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
-/* offset=129755 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
-/* offset=129859 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
-/* offset=129975 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
-/* offset=130076 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
-/* offset=130191 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130297 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130403 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
-/* offset=130551 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
-/* offset=130574 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
-/* offset=130638 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
-/* offset=130805 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130870 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130938 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
-/* offset=131010 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
-/* offset=131105 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
-/* offset=131240 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
-/* offset=131305 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131374 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131445 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
-/* offset=131468 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
-/* offset=131491 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
-/* offset=131512 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
+/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
+/* offset=129404 */ "frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129583 */ "backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129757 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
+/* offset=129933 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
+/* offset=130113 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
+/* offset=130217 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
+/* offset=130333 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
+/* offset=130434 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
+/* offset=130549 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130655 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130761 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
+/* offset=130909 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=130932 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=130996 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=131163 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=131228 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=131296 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=131368 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=131463 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=131598 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=131663 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131732 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131803 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=131826 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=131849 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=131870 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2626,22 +2626,22 @@ static const struct pmu_table_entry pmu_events__common[] = {
static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
{ 127956 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
-{ 129273 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
-{ 129575 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
-{ 129755 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
+{ 129583 }, /* backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
+{ 129933 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
+{ 130113 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
{ 128142 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
-{ 129399 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
-{ 130191 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
-{ 129143 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
+{ 129757 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 130549 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
+{ 129404 }, /* frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
{ 128866 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
-{ 130297 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
-{ 130403 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
-{ 129859 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
-{ 130076 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
-{ 129975 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
+{ 130655 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
+{ 130761 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
+{ 130217 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
+{ 130434 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
+{ 130333 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
{ 128375 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
{ 128635 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
-{ 128979 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
+{ 128979 }, /* stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
};
@@ -2714,21 +2714,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 130551 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
-{ 131240 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
-{ 131010 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
-{ 131105 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
-{ 131305 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 131374 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 130638 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
-{ 130574 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
-{ 131512 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
-{ 131445 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
-{ 131468 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
-{ 131491 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
-{ 130938 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
-{ 130805 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
-{ 130870 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130909 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 131598 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 131368 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 131463 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 131663 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 131732 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130996 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 130932 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 131870 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 131803 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 131826 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 131849 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 131296 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 131163 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 131228 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
--
2.53.0.851.ga537e3e6e9-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
@ 2026-03-24 4:19 ` Ian Rogers
1 sibling, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-24 4:19 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 7:25 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Mar 17, 2026 at 5:37 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> > > On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, it is asking for
> > > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
> >
> > > > If I instead ask just for stalled-cycles-frontend and
> > > > stalled-cycles-backend:
> >
> > > > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
> >
> > > I think you intend for this to be system wide '-a'.
> >
> > > > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > > > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > > > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> > > >
> > > > Performance counter stats for 'sleep 1':
> > > >
> > > > 409,276 stalled-cycles-frontend
> > > > <not supported> stalled-cycles-backend
> > > >
> > > > 1.000428804 seconds time elapsed
> > > >
> > > > 0.000439000 seconds user
> > > > 0.000000000 seconds sys
> > > >
> > > >
> > > > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > > > +++ exited with 0 +++
> > > > root@number:~#
> > > >
> > > > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend but
> > > > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
> > > >
> > > > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > > > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > > > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > > > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > > > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > > > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > > > ⬢ [acme@toolbx perf-tools]$
> > > >
> > > > This machine is:
> > > >
> > > > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > > > model name : AMD Ryzen 9 9950X3D 16-Core Processor
> > >
> > > Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> > >
> > > > ⬢ [acme@toolbx perf-tools]
> > > >
> > > > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > > > PERF_TYPE_RAW and 0xa9 because:
> > > >
> > > > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > > > event=0xa9
> > > > root@number:~#
> > > >
> > > > But I couldn't so far explain why in the default case it is asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
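[Editor's note: the resolution of the legacy event through the sysfs alias shown above (`event=0xa9` becoming `PERF_TYPE_RAW, config=0xa9`) can be sketched roughly as below. This is an illustrative model only, not perf's actual alias parser: real perf learns each term's bit placement from `/sys/devices/cpu/format/*`, which this sketch hard-codes for the common layout (event in bits 0-7, umask in bits 8-15).]

```python
# Illustrative sketch (not perf's real code): turn a sysfs event alias
# string such as "event=0xa9" into the raw config value that would be
# passed to perf_event_open() with PERF_TYPE_RAW.
def alias_to_raw_config(alias: str) -> int:
    """Parse a comma-separated list of term=value pairs.

    Only the 'event' and 'umask' terms are modeled; the bit layout
    (event: bits 0-7, umask: bits 8-15) is an assumption for the
    typical x86 core PMU, not read from sysfs as real perf does.
    """
    config = 0
    for term in alias.split(","):
        name, _, value = term.partition("=")
        bits = int(value, 0) if value else 1
        if name == "event":
            config |= (bits & 0xff)
        elif name == "umask":
            config |= (bits & 0xff) << 8
        else:
            raise ValueError(f"unhandled term: {name}")
    return config
```

With this model, `alias_to_raw_config("event=0xa9")` gives `0xa9`, matching the `config=0xa9` seen in the strace output for stalled-cycles-frontend.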
> >
> > So you mean that it goes on to try this:
> >
> > {
> > "BriefDescription": "Max front or backend stalls per instruction",
> > "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> > "MetricGroup": "Default",
> > "MetricName": "stalled_cycles_per_instruction",
> > "DefaultShowEvents": "1"
> > },
> >
> > Yeah, it tries both:
> >
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
>
> perf trace? The formatting on perf_event_open is better :-)
>
> > The RAW one is the equivalent to PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
> > I see now that I looked again at the 'strace perf stat sleep 1'
> >
> > But in the output it also says:
> >
> > <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
> >
> > And if I try just this one:
> >
> > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 422,404 stalled-cycles-frontend
> >
> > 1.000432524 seconds time elapsed
> >
> > 0.000438000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > It works, so that line with stalled-cycles-frontend could have produced
> > the value, not '<not counted>', as this call succeeded:
> >
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> >
> > Maybe the explanation is that it tries the metric, which uses both
> > frontend and backend, fails at backend and then discards the
> > frontend?
>
> The metric logic tries to be smart, creating groups of events and then
> sharing events between groups to avoid programming more events. I
> suspect the frontend and backend are in a group, maybe the backend is
> the leader. The backend fails to open, resulting in no group_fd and a
> broken group for both the metrics.
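[Editor's note: the group-breaking behavior Ian describes can be modeled with the toy simulation below. This is a hypothetical sketch of perf_event_open() group semantics, not perf's code: followers attach to the leader via group_fd, so an unsupported leader leaves every member of the group unopened.]

```python
# Hypothetical model (not perf internals) of opening an event group:
# the first event is the leader; followers pass the leader's fd as
# group_fd. If the leader fails with ENOENT, followers have nothing
# to attach to and the whole group is effectively broken.
def open_group(events, supported):
    """Return {event: fd or None}; the first event is the group leader."""
    fds = {}
    leader_fd = None
    for i, ev in enumerate(events):
        if ev not in supported:
            fds[ev] = None            # perf_event_open() -> ENOENT
            continue
        if i == 0:
            leader_fd = 100           # pretend fd for the leader
            fds[ev] = leader_fd
        elif leader_fd is None:
            fds[ev] = None            # no group_fd: broken group
        else:
            fds[ev] = 100 + i
    return fds
```

In this model, with the unsupported stalled-cycles-backend as leader, the supported stalled-cycles-frontend follower is also left unopened, which matches the `<not counted>` lines in the output above.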
>
> > > So the default events/metrics are now in json:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> > > Relating to the stalls there are:
> > > ```
> > > {
> > > "BriefDescription": "Max front or backend stalls per instruction",
> > > "MetricExpr": "max(stalled\\-cycles\\-frontend,
> > > stalled\\-cycles\\-backend) / instructions",
> > > "MetricGroup": "Default",
> > > "MetricName": "stalled_cycles_per_instruction",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > Error:
> > No supported events found.
> > The stalled-cycles-backend event is not supported.
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
> > +++ exited with 1 +++
> > root@number:~#
> >
> > > {
> > > "BriefDescription": "Frontend stalls per cycle",
> > > "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > > "MetricGroup": "Default",
> > > "MetricName": "frontend_cycles_idle",
> > > "MetricThreshold": "frontend_cycles_idle > 0.1",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 881,022 cpu-cycles # 0.48 frontend_cycles_idle
> > 422,386 stalled-cycles-frontend
> >
> > 1.000468505 seconds time elapsed
> >
> > 0.000504000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> > > {
> > > "BriefDescription": "Backend stalls per cycle",
> > > "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > > "MetricGroup": "Default",
> > > "MetricName": "backend_cycles_idle",
> > > "MetricThreshold": "backend_cycles_idle > 0.2",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > <not counted> cpu-cycles # nan backend_cycles_idle
> > <not supported> stalled-cycles-backend
> >
> > 1.000739264 seconds time elapsed
> >
> > 0.000675000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > > ```
> > > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> > > as the stalled-cycles-backend event is missing. frontend_cycles_idle
> > > should work, I wonder if the 0 counts relate to trouble scheduling
> > > groups of events. I'll need more verbose output to understand. Perhaps
> > > for stalled_cycles_per_instruction, we should modify the metric to
> > > tolerate missing events:
> > >
> > > max(stalled\\-cycles\\-frontend if
> > > have_event(stalled\\-cycles\\-frontend) else 0,
> > > stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> > > else 0) / instructions
> >
> > That have_event() part also has to be implemented, right?
>
> I mistyped, has_event not have_event :-)
I posted:
https://lore.kernel.org/lkml/20260319010103.834106-1-irogers@google.com/
that implements the has_event approach.
Thanks,
Ian
> Thanks,
> Ian
>
> > - Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
@ 2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
1 sibling, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-04-01 5:55 UTC (permalink / raw)
To: acme; +Cc: japo, linux-kernel, linux-perf-users, namhyung, tmricht
On Wed, Mar 18, 2026 at 6:01 PM Ian Rogers <irogers@google.com> wrote:
>
> The metric code uses the event parsing code but it generally assumes
> all events are supported. Arnaldo reported AMD supporting
> stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
> with this is that before parsing happens the metric code tries to
> share events within groups to reduce the number of events and
> multiplexing. If the group has some supported and not supported
> events, the whole group will become broken. To avoid this situation
> add has_event tests to the metrics for stalled-cycles-frontend and
> stalled-cycles-backend. has_event is evaluated when parsing the
> metric and its result is constant propagated (with if-elses) to reduce
> the number of events. This means when the metric code considers
> sharing the events, only supported events will be shared.
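[Editor's note: the constant propagation the commit message describes can be sketched as below. This is a minimal illustrative evaluator, not perf's expr parser: `has_event()` is resolved against the set of supported events at parse time, so the if/else collapses and unsupported events never reach the grouping logic.]

```python
# Minimal sketch (not perf's expr code) of folding has_event() at parse
# time for the stalled_cycles_per_instruction metric: only the branch
# whose events exist survives, so event grouping never sees an
# unsupported event at all.
def fold_stalls_metric(supported):
    """Return the surviving sub-expression for a given supported set."""
    fe = "stalled-cycles-frontend" in supported
    be = "stalled-cycles-backend" in supported
    if fe and be:
        return "max(stalled-cycles-frontend, stalled-cycles-backend) / instructions"
    if fe:
        return "stalled-cycles-frontend / instructions"
    if be:
        return "stalled-cycles-backend / instructions"
    return "0"
```

On Arnaldo's AMD machine only the frontend event exists, so the metric folds to `stalled-cycles-frontend / instructions` and no broken group is created.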
>
> Note for backporting. This change updates
> tools/perf/pmu-events/empty-pmu-events.c, a convenience file for builds
> on systems without Python present. While the metrics.json code should
> backport easily there can be conflicts on empty-pmu-events.c. In this
> case the build will have left a file test-empty-pmu-events.c that can
> be copied over empty-pmu-events.c to resolve issues and make an
> appropriate empty-pmu-events.c for the json in the source tree at the
> time of the build.
>
> [1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
>
> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
> Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones")
> Signed-off-by: Ian Rogers <irogers@google.com>
Ping.
Thanks,
Ian
> ---
> .../arch/common/common/metrics.json | 6 +-
> tools/perf/pmu-events/empty-pmu-events.c | 108 +++++++++---------
> 2 files changed, 57 insertions(+), 57 deletions(-)
>
> diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> index 0d010b3ebc6d..cefc8bfe7830 100644
> --- a/tools/perf/pmu-events/arch/common/common/metrics.json
> +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> @@ -46,14 +46,14 @@
> },
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> - "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> + "MetricExpr": "(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions) if (has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend)) else ((stalled\\-cycles\\-frontend / instructions) if has_event(stalled\\-cycles\\-frontend) else ((stalled\\-cycles\\-backend / instructions) if has_event(stalled\\-cycles\\-backend) else 0))",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
> {
> "BriefDescription": "Frontend stalls per cycle",
> - "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> + "MetricExpr": "(stalled\\-cycles\\-frontend / cpu\\-cycles) if has_event(stalled\\-cycles\\-frontend) else 0",
> "MetricGroup": "Default",
> "MetricName": "frontend_cycles_idle",
> "MetricThreshold": "frontend_cycles_idle > 0.1",
> @@ -61,7 +61,7 @@
> },
> {
> "BriefDescription": "Backend stalls per cycle",
> - "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> + "MetricExpr": "(stalled\\-cycles\\-backend / cpu\\-cycles) if has_event(stalled\\-cycles\\-backend) else 0",
> "MetricGroup": "Default",
> "MetricName": "backend_cycles_idle",
> "MetricThreshold": "backend_cycles_idle > 0.2",
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index 76c395cf513c..a92dd0424f79 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -1310,33 +1310,33 @@ static const char *const big_c_string =
> /* offset=128375 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
> /* offset=128635 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
> /* offset=128866 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
> -/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
> -/* offset=129143 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
> -/* offset=129273 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
> -/* offset=129399 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
> -/* offset=129575 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
> -/* offset=129755 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
> -/* offset=129859 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
> -/* offset=129975 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
> -/* offset=130076 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
> -/* offset=130191 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
> -/* offset=130297 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
> -/* offset=130403 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
> -/* offset=130551 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
> -/* offset=130574 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
> -/* offset=130638 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
> -/* offset=130805 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> -/* offset=130870 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> -/* offset=130938 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
> -/* offset=131010 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
> -/* offset=131105 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
> -/* offset=131240 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
> -/* offset=131305 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> -/* offset=131374 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> -/* offset=131445 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
> -/* offset=131468 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
> -/* offset=131491 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
> -/* offset=131512 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
> +/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
> +/* offset=129404 */ "frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
> +/* offset=129583 */ "backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
> +/* offset=129757 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
> +/* offset=129933 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
> +/* offset=130113 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
> +/* offset=130217 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
> +/* offset=130333 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
> +/* offset=130434 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
> +/* offset=130549 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
> +/* offset=130655 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
> +/* offset=130761 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
> +/* offset=130909 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
> +/* offset=130932 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
> +/* offset=130996 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
> +/* offset=131163 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> +/* offset=131228 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> +/* offset=131296 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
> +/* offset=131368 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
> +/* offset=131463 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
> +/* offset=131598 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
> +/* offset=131663 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> +/* offset=131732 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> +/* offset=131803 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
> +/* offset=131826 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
> +/* offset=131849 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
> +/* offset=131870 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
> ;
>
> static const struct compact_pmu_event pmu_events__common_default_core[] = {
> @@ -2626,22 +2626,22 @@ static const struct pmu_table_entry pmu_events__common[] = {
>
> static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
> { 127956 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
> -{ 129273 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
> -{ 129575 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
> -{ 129755 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
> +{ 129583 }, /* backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
> +{ 129933 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
> +{ 130113 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
> { 128142 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
> -{ 129399 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
> -{ 130191 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
> -{ 129143 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
> +{ 129757 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
> +{ 130549 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
> +{ 129404 }, /* frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
> { 128866 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
> -{ 130297 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
> -{ 130403 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
> -{ 129859 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
> -{ 130076 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
> -{ 129975 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
> +{ 130655 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
> +{ 130761 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
> +{ 130217 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
> +{ 130434 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
> +{ 130333 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
> { 128375 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
> { 128635 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
> -{ 128979 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
> +{ 128979 }, /* stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
>
> };
>
> @@ -2714,21 +2714,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
> };
>
> static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
> -{ 130551 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
> -{ 131240 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
> -{ 131010 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
> -{ 131105 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
> -{ 131305 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> -{ 131374 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> -{ 130638 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
> -{ 130574 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
> -{ 131512 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
> -{ 131445 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
> -{ 131468 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
> -{ 131491 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
> -{ 130938 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
> -{ 130805 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> -{ 130870 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> +{ 130909 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
> +{ 131598 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
> +{ 131368 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
> +{ 131463 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
> +{ 131663 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> +{ 131732 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> +{ 130996 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
> +{ 130932 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
> +{ 131870 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
> +{ 131803 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
> +{ 131826 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
> +{ 131849 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
> +{ 131296 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
> +{ 131163 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> +{ 131228 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
>
> };
>
> --
> 2.53.0.851.ga537e3e6e9-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-04-01 5:55 ` Ian Rogers
@ 2026-04-04 0:15 ` Namhyung Kim
1 sibling, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2026-04-04 0:15 UTC (permalink / raw)
To: acme, Ian Rogers; +Cc: japo, linux-kernel, linux-perf-users, tmricht
On Wed, 18 Mar 2026 18:01:03 -0700, Ian Rogers wrote:
> The metric code uses the event parsing code but it generally assumes
> all events are supported. Arnaldo reported AMD supporting
> stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
> with this is that before parsing happens the metric code tries to
> share events within groups to reduce the number of events and
> multiplexing. If the group has some supported and not supported
> events, the whole group will become broken. To avoid this situation
> add has_event tests to the metrics for stalled-cycles-frontend and
> stalled-cycles-backend. has_event is evaluated when parsing the
> metric and its result constant propagated (with if-elses) to reduce
> the number of events. This means when the metric code considers
> sharing the events, only supported events will be shared.
>
> [...]
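[Editorial note: the constant propagation described in the commit message above can be sketched as a small expression simplifier. This is a hypothetical illustration, not perf's actual parser (which lives in tools/perf/util/expr.c); the tuple-based expression encoding and the `simplify` helper are invented here for clarity.]

```python
# Sketch of how an "A if has_event(e) else B" metric expression can be
# constant-propagated once event support is known, so that unsupported
# events never reach the metric code's event-sharing stage.

def simplify(expr, supported):
    """Recursively replace has_event() with a constant and prune the
    dead branch of each if/else. Expressions are nested tuples."""
    kind = expr[0]
    if kind == "has_event":
        # has_event folds to a constant based on PMU support.
        return ("const", expr[1] in supported)
    if kind == "ifelse":
        cond = simplify(expr[1], supported)
        if cond == ("const", True):
            return simplify(expr[2], supported)   # keep the then-branch
        if cond == ("const", False):
            return simplify(expr[3], supported)   # keep the else-branch
        return ("ifelse", cond,
                simplify(expr[2], supported),
                simplify(expr[3], supported))
    if kind == "div":
        return ("div", simplify(expr[1], supported),
                       simplify(expr[2], supported))
    return expr  # "event" and "const" leaves pass through unchanged

# frontend_cycles_idle-style fallback:
# stalled-cycles-frontend / cpu-cycles if the event exists, else 0.
metric = ("ifelse", ("has_event", "stalled-cycles-frontend"),
          ("div", ("event", "stalled-cycles-frontend"),
                  ("event", "cpu-cycles")),
          ("const", 0))

# On a PMU without stalled-cycles-frontend the whole metric folds to a
# constant 0, so no unsupported event is ever requested or grouped.
print(simplify(metric, supported={"cpu-cycles"}))
```

With `stalled-cycles-frontend` supported, the same call instead returns the division node with the `has_event` guard eliminated, which is the behavior the patch relies on to keep groups free of unsupported events.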
Applied to perf-tools-next, thanks!
Best regards,
Namhyung
end of thread, other threads:[~2026-04-04 0:15 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <66d0366b-8690-4bde-aba4-1d51f278dfa1@linux.ibm.com>
[not found] ` <abQq79_ajs8zFeEI@x1>
[not found] ` <CAP-5=fWG0Q-xNGn_-_KGCopaNsWG5rUN3pYNLEh3K7+xE0OwVQ@mail.gmail.com>
2026-03-17 19:39 ` perf stat issue with 7.0.0rc3 Arnaldo Carvalho de Melo
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
2026-03-17 20:50 ` Ian Rogers
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers