* Re: perf stat issue with 7.0.0rc3
[not found] ` <CAP-5=fWG0Q-xNGn_-_KGCopaNsWG5rUN3pYNLEh3K7+xE0OwVQ@mail.gmail.com>
@ 2026-03-17 19:39 ` Arnaldo Carvalho de Melo
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 19:39 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Fri, Mar 13, 2026 at 08:41:48AM -0700, Ian Rogers wrote:
> On Fri, Mar 13, 2026 at 8:19 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > On Fri, Mar 13, 2026 at 02:13:18PM +0100, Thomas Richter wrote:
> > > I just discovered a strange behavior on linux 7.0.0rc3.
> > > This is the expected output, however when I use perf version 7.0.0rc3:
> > > bash-5.3# ./perf -v
> > > perf version 7.0.rc3.g1f318b96cc84
> > > bash-5.3# ./perf stat -- true
> > > Error:
> > > No supported events found.
> > > trace.args_alignment
> > And this last line is even stranger, in my case I get something else,
> > also on x86_64:
> > root@number:~# perf stat -- true
> > Error:
> > No supported events found.
> > addr2line.style
> > root@number:~#
> > On an ARM machine:
> > acme@raspberrypi:~/git/perf-tools $ perf stat -- true
> > Error:
> > No supported events found.
> > acme@raspberrypi:~/git/perf-tools $ uname -a
> > Linux raspberrypi 6.12.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.62-1+rpt1 (2025-12-18) aarch64 GNU/Linux
> > acme@raspberrypi:~/git/perf-tools $
> > I'll try to bisect this later, thanks for the report!
> Hi Arnaldo,
> Leo has pointed at the fix:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/metricgroup.c?h=perf-tools-next&id=c5a244bf17caf2de22f9e100832b75f72b31d3e6
> This also showed up as missing for the LTS backports:
> https://lore.kernel.org/lkml/ad95d781-7eb2-4c0c-a9e9-aaabae8eb602@kernel.org/
> so I thought it was flagged as a fix for the next PR. I don't see it in there:
> https://lore.kernel.org/lkml/20260313151434.1695228-1-acme@kernel.org/
> Could we get it in?
So, I applied the one Leo pointed out and got this:
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
1 context-switches # 3509.1 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
75 page-faults # 263181.0 faults/sec page_faults_per_second
0.28 msec task-clock # 0.0 CPUs CPUs_utilized
8,018 branch-misses # 4.2 % branch_miss_rate
191,108 branches # 670.6 M/sec branch_frequency
870,234 cpu-cycles # 3.1 GHz cycles_frequency
<not counted> instructions # nan instructions insn_per_cycle (0.00%)
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.001839832 seconds time elapsed
0.000727000 seconds user
0.000000000 seconds sys
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
root@number:~# echo 0 > /proc/sys/kernel/nmi_watchdog
Which is strange, in the past (see below) instructions and
stalled-cycles-frontend were counted in the default (no -e) 'perf stat'
set of events, now if I disable the nmi_watchdog I get 'instructions'
back, but not 'stalled-cycles-frontend':
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
1 context-switches # 6524.7 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
71 page-faults # 463256.0 faults/sec page_faults_per_second
0.15 msec task-clock # 0.0 CPUs CPUs_utilized
7,484 branch-misses # 4.0 % branch_miss_rate
186,108 branches # 1214.3 M/sec branch_frequency
810,475 cpu-cycles # 5.3 GHz cycles_frequency
893,290 instructions # 1.1 instructions insn_per_cycle
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.000382345 seconds time elapsed
0.000443000 seconds user
0.000000000 seconds sys
If I go back to v6.18 (will try a bisect to see exactly where this
happens), I get the events above, with a different printing order, but
'instructions' and 'stalled-cycles-frontend' are counted successfully:
root@number:~# perf -v
perf version 6.18.g7d0a66e4bb90
root@number:~# perf stat sleep 1
Performance counter stats for 'sleep 1':
202,103 task-clock # 0.000 CPUs utilized
1 context-switches # 4.948 K/sec
0 cpu-migrations # 0.000 /sec
71 page-faults # 351.306 K/sec
895,485 instructions # 0.85 insn per cycle
# 0.56 stalled cycles per insn
1,048,898 cycles # 5.190 GHz
504,794 stalled-cycles-frontend # 48.13% frontend cycles idle
186,002 branches # 920.333 M/sec
7,857 branch-misses # 4.22% of all branches
1.000537221 seconds time elapsed
0.000542000 seconds user
0.000000000 seconds sys
root@number:~#
Trying bisection now.
- Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-17 19:39 ` perf stat issue with 7.0.0rc3 Arnaldo Carvalho de Melo
@ 2026-03-17 19:56 ` Arnaldo Carvalho de Melo
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 19:56 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 04:39:24PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Mar 13, 2026 at 08:41:48AM -0700, Ian Rogers wrote:
> > On Fri, Mar 13, 2026 at 8:19 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > On Fri, Mar 13, 2026 at 02:13:18PM +0100, Thomas Richter wrote:
> > > > I just discovered a strange behavior on linux 7.0.0rc3.
> > > > This is the expected output, however when I use perf version 7.0.0rc3:
>
> > > > bash-5.3# ./perf -v
> > > > perf version 7.0.rc3.g1f318b96cc84
> > > > bash-5.3# ./perf stat -- true
> > > > Error:
> > > > No supported events found.
> > > > trace.args_alignment
>
> > > And this last line is even stranger, in my case I get something else,
> > > also on x86_64:
>
> > > root@number:~# perf stat -- true
> > > Error:
> > > No supported events found.
> > > addr2line.style
> > > root@number:~#
>
> > > On an ARM machine:
>
> > > acme@raspberrypi:~/git/perf-tools $ perf stat -- true
> > > Error:
> > > No supported events found.
>
> > > acme@raspberrypi:~/git/perf-tools $ uname -a
> > > Linux raspberrypi 6.12.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.62-1+rpt1 (2025-12-18) aarch64 GNU/Linux
> > > acme@raspberrypi:~/git/perf-tools $
>
> > > I'll try to bisect this later, thanks for the report!
>
> > Hi Arnaldo,
>
> > Leo has pointed at the fix:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/util/metricgroup.c?h=perf-tools-next&id=c5a244bf17caf2de22f9e100832b75f72b31d3e6
> > This also showed up as missing for the LTS backports:
> > https://lore.kernel.org/lkml/ad95d781-7eb2-4c0c-a9e9-aaabae8eb602@kernel.org/
> > so I thought it was flagged as a fix for the next PR. I don't see it in there:
> > https://lore.kernel.org/lkml/20260313151434.1695228-1-acme@kernel.org/
> > Could we get it in?
>
> So, I applied the one Leo pointed out and got this:
>
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1 context-switches # 3509.1 cs/sec cs_per_second
> 0 cpu-migrations # 0.0 migrations/sec migrations_per_second
> 75 page-faults # 263181.0 faults/sec page_faults_per_second
> 0.28 msec task-clock # 0.0 CPUs CPUs_utilized
> 8,018 branch-misses # 4.2 % branch_miss_rate
> 191,108 branches # 670.6 M/sec branch_frequency
> 870,234 cpu-cycles # 3.1 GHz cycles_frequency
> <not counted> instructions # nan instructions insn_per_cycle (0.00%)
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> 1.001839832 seconds time elapsed
>
> 0.000727000 seconds user
> 0.000000000 seconds sys
>
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> root@number:~# echo 0 > /proc/sys/kernel/nmi_watchdog
>
>
> Which is strange, in the past (see below) instructions and
> stalled-cycles-frontend were counted in the default (no -e) 'perf stat'
> set of events, now if I disable the nmi_watchdog I get 'instructions'
> back, but not 'stalled-cycles-frontend':
>
>
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1 context-switches # 6524.7 cs/sec cs_per_second
> 0 cpu-migrations # 0.0 migrations/sec migrations_per_second
> 71 page-faults # 463256.0 faults/sec page_faults_per_second
> 0.15 msec task-clock # 0.0 CPUs CPUs_utilized
> 7,484 branch-misses # 4.0 % branch_miss_rate
> 186,108 branches # 1214.3 M/sec branch_frequency
> 810,475 cpu-cycles # 5.3 GHz cycles_frequency
> 893,290 instructions # 1.1 instructions insn_per_cycle
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> 1.000382345 seconds time elapsed
>
> 0.000443000 seconds user
> 0.000000000 seconds sys
>
>
> If I go back to v6.18 (will try a bisect to see exactly where this
> happens), I get the events above, with a different printing order, but
> 'instructions' and 'stalled-cycles-frontend' are counted successfully:
>
> root@number:~# perf -v
> perf version 6.18.g7d0a66e4bb90
> root@number:~# perf stat sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 202,103 task-clock # 0.000 CPUs utilized
> 1 context-switches # 4.948 K/sec
> 0 cpu-migrations # 0.000 /sec
> 71 page-faults # 351.306 K/sec
> 895,485 instructions # 0.85 insn per cycle
> # 0.56 stalled cycles per insn
> 1,048,898 cycles # 5.190 GHz
> 504,794 stalled-cycles-frontend # 48.13% frontend cycles idle
> 186,002 branches # 920.333 M/sec
> 7,857 branch-misses # 4.22% of all branches
>
> 1.000537221 seconds time elapsed
>
> 0.000542000 seconds user
> 0.000000000 seconds sys
>
>
> root@number:~#
>
> Trying bisection now.
It's related to this:
a3248b5b5427dc2126c19aa9c32f1e840b65024f is the first bad commit
commit a3248b5b5427dc2126c19aa9c32f1e840b65024f
Author: Ian Rogers <irogers@google.com>
Date: Tue Nov 11 13:21:52 2025 -0800
perf jevents: Add metric DefaultShowEvents
Some Default group metrics require their events showing for
consistency with perf's previous behavior. Add a flag to indicate when
this is the case and use it in stat-display.
root@number:~# strace -e perf_event_open perf stat sleep 1
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_CONTEXT_SWITCHES, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_CPU_MIGRATIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_PAGE_FAULTS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 5
perf_event_open({type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_SW_TASK_CLOCK, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 7
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xc3, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 8
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 8, PERF_FLAG_FD_CLOEXEC) = 9
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 10
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 11
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 12
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xc0, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 12, PERF_FLAG_FD_CLOEXEC) = 13
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 14
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 249821, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=249821, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
1 context-switches # 3356.1 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
74 page-faults # 248352.1 faults/sec page_faults_per_second
0.30 msec task-clock # 0.0 CPUs CPUs_utilized
7,566 branch-misses # 4.0 % branch_miss_rate
189,307 branches # 635.3 M/sec branch_frequency
798,790 cpu-cycles # 2.7 GHz cycles_frequency
908,746 instructions # 1.1 instructions insn_per_cycle
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
1.000722079 seconds time elapsed
0.000000000 seconds user
0.000714000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=249820, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
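For reference, the two stall events sit next to each other in the perf_event_open(2) legacy hardware enum, so the two configs are only an off-by-one apart. A small Python sketch of the relevant PERF_COUNT_HW_* values (transcribed from my reading of include/uapi/linux/perf_event.h, not from this thread):

```python
# PERF_COUNT_HW_* values from the perf_event_open(2) UAPI header
# (include/uapi/linux/perf_event.h); the two stall events are adjacent.
PERF_COUNT_HW = {
    0: "cpu-cycles",
    1: "instructions",
    2: "cache-references",
    3: "cache-misses",
    4: "branch-instructions",
    5: "branch-misses",
    6: "bus-cycles",
    7: "stalled-cycles-frontend",  # cf. legacy-hardware-config=7
    8: "stalled-cycles-backend",
}

def legacy_name(config):
    """Map a PERF_TYPE_HARDWARE config value to its event name."""
    return PERF_COUNT_HW.get(config, "unknown")

# config 7 is the frontend event; an off-by-one lands on the backend one
assert legacy_name(7) == "stalled-cycles-frontend"
assert legacy_name(8) == "stalled-cycles-backend"
```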
- Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
@ 2026-03-17 20:12 ` Arnaldo Carvalho de Melo
2026-03-17 20:50 ` Ian Rogers
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-17 20:12 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
If I instead ask just for stalled-cycles-frontend and
stalled-cycles-backend:
root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
409,276 stalled-cycles-frontend
<not supported> stalled-cycles-backend
1.000428804 seconds time elapsed
0.000439000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
tools/perf/util/evsel.c: "stalled-cycles-frontend",
⬢ [acme@toolbx perf-tools]$
This machine is:
⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
⬢ [acme@toolbx perf-tools]
And it doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but it has
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, which gets configured using
PERF_TYPE_RAW and config 0xa9 because:
root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
event=0xa9
root@number:~#
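That sysfs alias format can be decoded mechanically; a minimal Python sketch (simplified: perf's real term parsing in tools/perf/util/pmu.c also handles umask=, cmask=, and the per-PMU format/ directory describing which config bits each term occupies):

```python
def parse_sysfs_alias(text):
    """Parse a simple sysfs event alias like 'event=0xa9' into a raw
    config value. Simplified sketch: assumes the 'event' term's value
    lands directly in the low config bits, which holds for plain
    'event=' aliases like the one shown above."""
    config = 0
    for term in text.strip().split(","):
        key, _, value = term.partition("=")
        if key == "event":
            config |= int(value, 0)  # accepts 0x.. hex or decimal
    return config

# The alias above yields the PERF_TYPE_RAW config seen in the strace output:
assert parse_sysfs_alias("event=0xa9") == 0xa9
```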
But so far I couldn't explain why, in the default case, it asks for
PERF_COUNT_HW_STALLED_CYCLES_BACKEND when it should be asking for
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
- Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
@ 2026-03-17 20:50 ` Ian Rogers
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-03-17 20:50 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
>
> If I instead ask just for stalled-cycles-frontend and
> stalled-cycles-backend:
>
> root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
I think you intend for this to be system wide '-a'.
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 409,276 stalled-cycles-frontend
> <not supported> stalled-cycles-backend
>
> 1.000428804 seconds time elapsed
>
> 0.000439000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
> type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
>
> ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> tools/perf/util/evsel.c: "stalled-cycles-frontend",
> ⬢ [acme@toolbx perf-tools]$
>
> This machine is:
>
> ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> model name : AMD Ryzen 9 9950X3D 16-Core Processor
Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> ⬢ [acme@toolbx perf-tools]
>
> And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> PERF_TYPE_RAW and 0xa9 because:
>
> root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> event=0xa9
> root@number:~#
>
> But I couldn't so far explain why in the default case it is asking for
> PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
So the default events/metrics are now in json:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
Relating to the stalls there are:
```
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend,
stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
"MetricThreshold": "frontend_cycles_idle > 0.1",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Backend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
"MetricThreshold": "backend_cycles_idle > 0.2",
"DefaultShowEvents": "1"
},
```
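A quick way to see which events each Default metric pulls in is to unescape the MetricExpr and pick out the hyphenated event tokens; a rough Python sketch (a regex heuristic, not perf's real expression parser):

```python
import re

def metric_events(metric_expr):
    """Heuristically extract legacy event names from a MetricExpr.
    The JSON escapes '-' as '\\-'; undo that, then grab identifier-like
    tokens containing a '-', which skips plain words like 'max' and
    'instructions'."""
    expr = metric_expr.replace("\\-", "-")
    return sorted(set(re.findall(r"[a-z][a-z0-9-]*-[a-z0-9-]*", expr)))

expr = "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions"
assert metric_events(expr) == ["stalled-cycles-backend",
                               "stalled-cycles-frontend"]
```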
The stalled_cycles_per_instruction and backend_cycles_idle should fail
as the stalled-cycles-backend event is missing. frontend_cycles_idle
should work, I wonder if the 0 counts relate to trouble scheduling
groups of events. I'll need more verbose output to understand. Perhaps
for stalled_cycles_per_instruction, we should modify the metric to
tolerate missing events:
max(stalled\\-cycles\\-frontend if
have_event(stalled\\-cycles\\-frontend) else 0,
stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
else 0) / instructions
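As a toy illustration (plain Python, not perf's metric-expression language), the have_event() guards would let the metric degrade to whichever stall event exists:

```python
def stalled_cycles_per_instruction(counts):
    """Evaluate max(frontend, backend) / instructions, treating a
    missing (unsupported) stall event as contributing 0 -- mirroring
    the proposed have_event() guards in the metric expression."""
    frontend = counts.get("stalled-cycles-frontend", 0)
    backend = counts.get("stalled-cycles-backend", 0)
    return max(frontend, backend) / counts["instructions"]

# With only the frontend event supported (as on the AMD machine in this
# thread), the metric still produces a value instead of failing:
spi = stalled_cycles_per_instruction(
    {"stalled-cycles-frontend": 409_276, "instructions": 908_746})
assert 0.45 < spi < 0.46
```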
Thanks,
Ian
> - Arnaldo
* Re: perf stat issue with 7.0.0rc3
2026-03-17 20:50 ` Ian Rogers
@ 2026-03-18 0:37 ` Arnaldo Carvalho de Melo
2026-03-18 2:25 ` Ian Rogers
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-18 0:37 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND; it is asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
> > If I instead ask just for stalled-cycles-frontend and
> > stalled-cycles-backend:
> > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
> I think you intend for this to be system wide '-a'.
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 409,276 stalled-cycles-frontend
> > <not supported> stalled-cycles-backend
> >
> > 1.000428804 seconds time elapsed
> >
> > 0.000439000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend, but
> > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND for stalled-cycles-backend.
> >
> > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > ⬢ [acme@toolbx perf-tools]$
> >
> > This machine is:
> >
> > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > model name : AMD Ryzen 9 9950X3D 16-Core Processor
>
> Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
>
> > ⬢ [acme@toolbx perf-tools]
> >
> > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > PERF_TYPE_RAW and 0xa9 because:
> >
> > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > event=0xa9
> > root@number:~#
> >
> > But I couldn't so far explain why in the default case it is asking for
> > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
So you mean that it goes on to try this:
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
Yeah, it tries both:
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
The RAW one is the equivalent of PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
as I see now, having looked again at the 'strace perf stat sleep 1' output.
But in the output it also says:
<not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
And if I try just this one:
root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
422,404 stalled-cycles-frontend
1.000432524 seconds time elapsed
0.000438000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
It works, so that line with stalled-cycles-frontend could have produced
the value, not '<not counted>', as this call succeeded:
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
Maybe the explanation is that it tries the metric, which uses both
frontend and backend; it fails at backend and then discards the
frontend?
> So the default events/metrics are now in json:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> Relating to the stalls there are:
> ```
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> "MetricExpr": "max(stalled\\-cycles\\-frontend,
> stalled\\-cycles\\-backend) / instructions",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
Error:
No supported events found.
The stalled-cycles-backend event is not supported.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
+++ exited with 1 +++
root@number:~#
> {
> "BriefDescription": "Frontend stalls per cycle",
> "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> "MetricGroup": "Default",
> "MetricName": "frontend_cycles_idle",
> "MetricThreshold": "frontend_cycles_idle > 0.1",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
881,022 cpu-cycles # 0.48 frontend_cycles_idle
422,386 stalled-cycles-frontend
1.000468505 seconds time elapsed
0.000504000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
> {
> "BriefDescription": "Backend stalls per cycle",
> "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> "MetricGroup": "Default",
> "MetricName": "backend_cycles_idle",
> "MetricThreshold": "backend_cycles_idle > 0.2",
> "DefaultShowEvents": "1"
> },
root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Performance counter stats for 'sleep 1':
<not counted> cpu-cycles # nan backend_cycles_idle
<not supported> stalled-cycles-backend
1.000739264 seconds time elapsed
0.000675000 seconds user
0.000000000 seconds sys
--- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
+++ exited with 0 +++
root@number:~#
> ```
> > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> as the stalled-cycles-backend event is missing. frontend_cycles_idle
> should work, I wonder if the 0 counts relate to trouble scheduling
> groups of events. I'll need more verbose output to understand. Perhaps
> for stalled_cycles_per_instruction, we should modify the metric to
> tolerate missing events:
>
> max(stalled\\-cycles\\-frontend if
> have_event(stalled\\-cycles\\-frontend) else 0,
> stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> else 0) / instructions
That have_event() part also has to be implemented, right?
- Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
@ 2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers
0 siblings, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-18 2:25 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 5:37 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> > On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, is asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
>
> > > If I instead ask just for stalled-cycles-frontend and
> > > stalled-cycles-backend:
>
> > > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
>
> > I think you intend for this to be system wide '-a'.
>
> > > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> > >
> > > Performance counter stats for 'sleep 1':
> > >
> > > 409,276 stalled-cycles-frontend
> > > <not supported> stalled-cycles-backend
> > >
> > > 1.000428804 seconds time elapsed
> > >
> > > 0.000439000 seconds user
> > > 0.000000000 seconds sys
> > >
> > >
> > > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > > +++ exited with 0 +++
> > > root@number:~#
> > >
> > > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend but
> > > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
> > >
> > > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > > ⬢ [acme@toolbx perf-tools]$
> > >
> > > This machine is:
> > >
> > > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > > model name : AMD Ryzen 9 9950X3D 16-Core Processor
> >
> > Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> >
> > > ⬢ [acme@toolbx perf-tools]
> > >
> > > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > > PERF_TYPE_RAW and 0xa9 because:
> > >
> > > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > > event=0xa9
> > > root@number:~#
> > >
> > > But so far I couldn't explain why in the default case it is asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
>
> So you mean that it goes on to try this:
>
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
>
> Yeah, it tries both:
>
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
perf trace? The formatting on perf_event_open is better :-)
> The RAW one is the equivalent of PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
> as I see now, having looked again at the 'strace perf stat sleep 1' output.
>
> But in the output it also says:
>
> <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
>
> And if I try just this one:
>
> root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 422,404 stalled-cycles-frontend
>
> 1.000432524 seconds time elapsed
>
> 0.000438000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> It works, so that line with stalled-cycles-frontend could have produced
> the value, not '<not counted>', as this call succeeded:
>
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
>
> Maybe the explanation is that it tries the metric, which uses both
> frontend and backend; it fails at backend and then discards the
> frontend?
The metric logic tries to be smart, creating groups of events and then
sharing events between groups to avoid programming more events. I
suspect the frontend and backend are in a group, maybe the backend is
the leader. The backend fails to open, resulting in no group_fd and a
broken group for both metrics.
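This failure mode can be sketched in a few lines of Python. To be clear, this is a hypothetical simulation of the grouping behavior, not perf's actual code; the event names and the supported set match the AMD machine discussed above:

```python
# Hypothetical simulation: on this machine stalled-cycles-backend has no
# PMU mapping, so "opening" it fails the way perf_event_open() returns
# ENOENT in the strace output above.
SUPPORTED = {"cpu-cycles", "instructions", "stalled-cycles-frontend"}

def open_group(events):
    """Open events as one group; the first event acts as the leader."""
    for ev in events:
        if ev not in SUPPORTED:
            # A member (here the leader) failing leaves no usable
            # group_fd, so the whole group is marked broken and even
            # openable members end up reported as <not counted>.
            return {e: "<not supported>" if e not in SUPPORTED
                    else "<not counted>" for e in events}
    return {ev: "counted" for ev in events}

# Backend as group leader, as suspected: the frontend count is lost too.
print(open_group(["stalled-cycles-backend", "stalled-cycles-frontend"]))
# The frontend event alone opens fine, matching the single-event run.
print(open_group(["stalled-cycles-frontend"]))
```

In perf itself the group leader's fd is what gets passed as group_fd to the subsequent perf_event_open() calls, as visible in the strace output quoted above.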
> > So the default events/metrics are now in json:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> > Relating to the stalls there are:
> > ```
> > {
> > "BriefDescription": "Max front or backend stalls per instruction",
> > "MetricExpr": "max(stalled\\-cycles\\-frontend,
> > stalled\\-cycles\\-backend) / instructions",
> > "MetricGroup": "Default",
> > "MetricName": "stalled_cycles_per_instruction",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> Error:
> No supported events found.
> The stalled-cycles-backend event is not supported.
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
> +++ exited with 1 +++
> root@number:~#
>
> > {
> > "BriefDescription": "Frontend stalls per cycle",
> > "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > "MetricGroup": "Default",
> > "MetricName": "frontend_cycles_idle",
> > "MetricThreshold": "frontend_cycles_idle > 0.1",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> 881,022 cpu-cycles # 0.48 frontend_cycles_idle
> 422,386 stalled-cycles-frontend
>
> 1.000468505 seconds time elapsed
>
> 0.000504000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
> > {
> > "BriefDescription": "Backend stalls per cycle",
> > "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > "MetricGroup": "Default",
> > "MetricName": "backend_cycles_idle",
> > "MetricThreshold": "backend_cycles_idle > 0.2",
> > "DefaultShowEvents": "1"
> > },
>
> root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
> perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
>
> Performance counter stats for 'sleep 1':
>
> <not counted> cpu-cycles # nan backend_cycles_idle
> <not supported> stalled-cycles-backend
>
> 1.000739264 seconds time elapsed
>
> 0.000675000 seconds user
> 0.000000000 seconds sys
>
>
> --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
> +++ exited with 0 +++
> root@number:~#
>
> > ```
> > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> > as the stalled-cycles-backend event is missing. frontend_cycles_idle
> > should work, I wonder if the 0 counts relate to trouble scheduling
> > groups of events. I'll need more verbose output to understand. Perhaps
> > for stalled_cycles_per_instruction, we should modify the metric to
> > tolerate missing events:
> >
> > max(stalled\\-cycles\\-frontend if
> > have_event(stalled\\-cycles\\-frontend) else 0,
> > stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> > else 0) / instructions
>
> That have_event() part also has to be implemented, right?
I mistyped, has_event not have_event :-)
Thanks,
Ian
> - Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-18 2:25 ` Ian Rogers
@ 2026-03-19 1:01 ` Ian Rogers
2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers
1 sibling, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-19 1:01 UTC (permalink / raw)
To: acme; +Cc: irogers, japo, linux-kernel, linux-perf-users, namhyung, tmricht
The metric code uses the event parsing code but it generally assumes
all events are supported. Arnaldo reported AMD supporting
stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
with this is that before parsing happens the metric code tries to
share events within groups to reduce the number of events and
multiplexing. If the group mixes supported and unsupported
events, the whole group becomes broken. To avoid this situation
add has_event tests to the metrics for stalled-cycles-frontend and
stalled-cycles-backend. has_event is evaluated when parsing the
metric and its result is constant-propagated (through the if-elses) to
reduce the number of events. This means when the metric code considers
sharing the events, only supported events will be shared.
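The constant propagation can be sketched like this. It is a toy fold over a simplified expression syntax, not perf's real metric expression parser, but it shows how an "X if has_event(e) else Y" branch collapses to just the side whose event can actually be opened:

```python
# Toy constant folding of has_event() ternaries; the expression grammar
# here is illustrative only, not perf's metric expression language.
def fold(expr, supported):
    """Reduce 'A if has_event(e) else B' using the supported-event set."""
    marker = " if has_event("
    if marker not in expr:
        return expr
    then_part, rest = expr.split(marker, 1)
    event, else_part = rest.split(") else ", 1)
    if event in supported:
        return fold(then_part, supported)
    return fold(else_part, supported)

supported = {"stalled-cycles-frontend", "cpu-cycles"}

# frontend_cycles_idle keeps its expression...
print(fold("stalled-cycles-frontend / cpu-cycles"
           " if has_event(stalled-cycles-frontend) else 0", supported))
# ...while backend_cycles_idle folds to the constant 0, so no
# stalled-cycles-backend event is created or shared into a group.
print(fold("stalled-cycles-backend / cpu-cycles"
           " if has_event(stalled-cycles-backend) else 0", supported))
```

Because the fold happens before event sharing, only events that will open ever reach the grouping code.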
Note for backporting. This change updates
tools/perf/pmu-events/empty-pmu-events.c, a convenience file for builds
on systems without python present. While the metrics.json change should
backport easily, there can be conflicts on empty-pmu-events.c. In that
case the build will have left a file, test-empty-pmu-events.c, that can
be copied over empty-pmu-events.c to resolve the conflict and produce
an empty-pmu-events.c appropriate for the json in the source tree at
the time of the build.
[1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones")
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/common/common/metrics.json | 6 +-
tools/perf/pmu-events/empty-pmu-events.c | 108 +++++++++---------
2 files changed, 57 insertions(+), 57 deletions(-)
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index 0d010b3ebc6d..cefc8bfe7830 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -46,14 +46,14 @@
},
{
"BriefDescription": "Max front or backend stalls per instruction",
- "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
+ "MetricExpr": "(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions) if (has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend)) else ((stalled\\-cycles\\-frontend / instructions) if has_event(stalled\\-cycles\\-frontend) else ((stalled\\-cycles\\-backend / instructions) if has_event(stalled\\-cycles\\-backend) else 0))",
"MetricGroup": "Default",
"MetricName": "stalled_cycles_per_instruction",
"DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
- "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
+ "MetricExpr": "(stalled\\-cycles\\-frontend / cpu\\-cycles) if has_event(stalled\\-cycles\\-frontend) else 0",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
"MetricThreshold": "frontend_cycles_idle > 0.1",
@@ -61,7 +61,7 @@
},
{
"BriefDescription": "Backend stalls per cycle",
- "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
+ "MetricExpr": "(stalled\\-cycles\\-backend / cpu\\-cycles) if has_event(stalled\\-cycles\\-backend) else 0",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
"MetricThreshold": "backend_cycles_idle > 0.2",
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 76c395cf513c..a92dd0424f79 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1310,33 +1310,33 @@ static const char *const big_c_string =
/* offset=128375 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
/* offset=128635 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
/* offset=128866 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
-/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
-/* offset=129143 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129273 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129399 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
-/* offset=129575 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
-/* offset=129755 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
-/* offset=129859 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
-/* offset=129975 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
-/* offset=130076 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
-/* offset=130191 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130297 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130403 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
-/* offset=130551 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
-/* offset=130574 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
-/* offset=130638 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
-/* offset=130805 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130870 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130938 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
-/* offset=131010 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
-/* offset=131105 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
-/* offset=131240 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
-/* offset=131305 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131374 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131445 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
-/* offset=131468 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
-/* offset=131491 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
-/* offset=131512 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
+/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
+/* offset=129404 */ "frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129583 */ "backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129757 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
+/* offset=129933 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
+/* offset=130113 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
+/* offset=130217 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
+/* offset=130333 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
+/* offset=130434 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
+/* offset=130549 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130655 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130761 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
+/* offset=130909 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=130932 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=130996 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=131163 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=131228 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=131296 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=131368 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=131463 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=131598 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=131663 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131732 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131803 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=131826 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=131849 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=131870 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2626,22 +2626,22 @@ static const struct pmu_table_entry pmu_events__common[] = {
static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
{ 127956 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
-{ 129273 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
-{ 129575 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
-{ 129755 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
+{ 129583 }, /* backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
+{ 129933 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
+{ 130113 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
{ 128142 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
-{ 129399 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
-{ 130191 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
-{ 129143 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
+{ 129757 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 130549 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
+{ 129404 }, /* frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
{ 128866 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
-{ 130297 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
-{ 130403 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
-{ 129859 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
-{ 130076 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
-{ 129975 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
+{ 130655 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
+{ 130761 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
+{ 130217 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
+{ 130434 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
+{ 130333 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
{ 128375 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
{ 128635 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
-{ 128979 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
+{ 128979 }, /* stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
};
@@ -2714,21 +2714,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 130551 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
-{ 131240 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
-{ 131010 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
-{ 131105 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
-{ 131305 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 131374 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 130638 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
-{ 130574 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
-{ 131512 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
-{ 131445 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
-{ 131468 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
-{ 131491 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
-{ 130938 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
-{ 130805 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
-{ 130870 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130909 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 131598 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 131368 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 131463 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 131663 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 131732 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130996 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 130932 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 131870 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 131803 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 131826 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 131849 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 131296 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 131163 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 131228 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
--
2.53.0.851.ga537e3e6e9-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: perf stat issue with 7.0.0rc3
2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
@ 2026-03-24 4:19 ` Ian Rogers
1 sibling, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-03-24 4:19 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Thomas Richter, linux-perf-users, Jan Polensky,
Linux Kernel Mailing List, Namhyung Kim
On Tue, Mar 17, 2026 at 7:25 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Mar 17, 2026 at 5:37 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Tue, Mar 17, 2026 at 01:50:21PM -0700, Ian Rogers wrote:
> > > On Tue, Mar 17, 2026 at 1:12 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > > On Tue, Mar 17, 2026 at 04:56:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > It is not trying PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, it is asking for
> > > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND instead...
> >
> > > > If I instead ask just for stalled-cycles-frontend and
> > > > stalled-cycles-backend:
> >
> > > > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend,stalled-cycles-backend sleep 1
> >
> > > I think you intend for this to be system wide '-a'.
> >
> > > > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > > > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 250619, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > > > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=250619, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> > > >
> > > > Performance counter stats for 'sleep 1':
> > > >
> > > > 409,276 stalled-cycles-frontend
> > > > <not supported> stalled-cycles-backend
> > > >
> > > > 1.000428804 seconds time elapsed
> > > >
> > > > 0.000439000 seconds user
> > > > 0.000000000 seconds sys
> > > >
> > > >
> > > > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=250618, si_uid=0} ---
> > > > +++ exited with 0 +++
> > > > root@number:~#
> > > >
> > > > It used type=PERF_TYPE_RAW, config=0xa9 for stalled-cycles-frontend but
> > > > type=PERF_TYPE_HARDWARE, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
> > > >
> > > > ⬢ [acme@toolbx perf-tools]$ git grep stalled-cycles-frontend tools
> > > > tools/bpf/bpftool/link.c: [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
> > > > tools/perf/builtin-stat.c: 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
> > > > tools/perf/pmu-events/arch/common/common/legacy-hardware.json: "EventName": "stalled-cycles-frontend",
> > > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122795 */ "stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > > tools/perf/pmu-events/empty-pmu-events.c:/* offset=122945 */ "idle-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of stalled-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000"
> > > > tools/perf/pmu-events/empty-pmu-events.c:{ 122795 }, /* stalled-cycles-frontend\000legacy hardware\000Stalled cycles during issue [This event is an alias of idle-cycles-frontend]\000legacy-hardware-config=7\000\00000\000\000\000\000\000 */
> > > > tools/perf/tests/shell/stat+std_output.sh:event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> > > > tools/perf/util/evsel.c: "stalled-cycles-frontend",
> > > > ⬢ [acme@toolbx perf-tools]$
> > > >
> > > > This machine is:
> > > >
> > > > ⬢ [acme@toolbx perf-tools]$ grep -m1 "model name" /proc/cpuinfo
> > > > model name : AMD Ryzen 9 9950X3D 16-Core Processor
> > >
> > > Lots of missing legacy events on AMD. The problem is worse with -dd and -ddd.
> > >
> > > > ⬢ [acme@toolbx perf-tools]
> > > >
> > > > And doesn't have PERF_COUNT_HW_STALLED_CYCLES_BACKEND, but has
> > > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, that gets configured using
> > > > PERF_TYPE_RAW and 0xa9 because:
> > > >
> > > > root@number:~# cat /sys/devices/cpu/events/stalled-cycles-frontend
> > > > event=0xa9
> > > > root@number:~#
> > > >
> > > > But I couldn't so far explain why in the default case it is asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_BACKEND, when it should be asking for
> > > > PERF_COUNT_HW_STALLED_CYCLES_FRONTEND or PERF_TYPE_RAW+config=0xa9...
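[Editor's note: the resolution of the legacy event through the sysfs alias shown above (`event=0xa9` becoming `PERF_TYPE_RAW, config=0xa9`) can be sketched roughly as below. This is an illustrative model only, not perf's actual alias parser: real perf learns each term's bit placement from `/sys/devices/cpu/format/*`, which this sketch hard-codes for the common layout (event in bits 0-7, umask in bits 8-15).]

```python
# Illustrative sketch (not perf's real code): turn a sysfs event alias
# string such as "event=0xa9" into the raw config value that would be
# passed to perf_event_open() with PERF_TYPE_RAW.
def alias_to_raw_config(alias: str) -> int:
    """Parse a comma-separated list of term=value pairs.

    Only the 'event' and 'umask' terms are modeled; the bit layout
    (event: bits 0-7, umask: bits 8-15) is an assumption for the
    typical x86 core PMU, not read from sysfs as real perf does.
    """
    config = 0
    for term in alias.split(","):
        name, _, value = term.partition("=")
        bits = int(value, 0) if value else 1
        if name == "event":
            config |= (bits & 0xff)
        elif name == "umask":
            config |= (bits & 0xff) << 8
        else:
            raise ValueError(f"unhandled term: {name}")
    return config
```

With this model, `alias_to_raw_config("event=0xa9")` gives `0xa9`, matching the `config=0xa9` seen in the strace output for stalled-cycles-frontend.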
> >
> > So you mean that it goes on to try this:
> >
> > {
> > "BriefDescription": "Max front or backend stalls per instruction",
> > "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> > "MetricGroup": "Default",
> > "MetricName": "stalled_cycles_per_instruction",
> > "DefaultShowEvents": "1"
> > },
> >
> > Yeah, it tries both:
> >
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 16, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
>
> perf trace? The formatting on perf_event_open is better :-)
>
> > The RAW one is the equivalent to PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
> > I see now that I looked again at the 'strace perf stat sleep 1'
> >
> > But in the output it also says:
> >
> > <not counted> stalled-cycles-frontend # nan frontend_cycles_idle (0.00%)
> >
> > And if I try just this one:
> >
> > root@number:~# strace -e perf_event_open perf stat -e stalled-cycles-frontend sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865273, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865273, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 422,404 stalled-cycles-frontend
> >
> > 1.000432524 seconds time elapsed
> >
> > 0.000438000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865272, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > It works, so that line with stalled-cycles-frontend could have produced
> > the value, not '<not counted>', as this call succeeded:
> >
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, 14, PERF_FLAG_FD_CLOEXEC) = 15
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865157, -1, -1, PERF_FLAG_FD_CLOEXEC) = 16
> >
> > Maybe the explanation is that it tries the metric, which uses both
> > frontend and backend, fails at backend and then discards the
> > frontend?
>
> The metric logic tries to be smart, creating groups of events and then
> sharing events between groups to avoid programming more events. I
> suspect the frontend and backend are in a group, maybe the backend is
> the leader. The backend fails to open, resulting in no group_fd and a
> broken group for both the metrics.
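[Editor's note: the group-breaking behavior Ian describes can be modeled with the toy simulation below. This is a hypothetical sketch of perf_event_open() group semantics, not perf's code: followers attach to the leader via group_fd, so an unsupported leader leaves every member of the group unopened.]

```python
# Hypothetical model (not perf internals) of opening an event group:
# the first event is the leader; followers pass the leader's fd as
# group_fd. If the leader fails with ENOENT, followers have nothing
# to attach to and the whole group is effectively broken.
def open_group(events, supported):
    """Return {event: fd or None}; the first event is the group leader."""
    fds = {}
    leader_fd = None
    for i, ev in enumerate(events):
        if ev not in supported:
            fds[ev] = None            # perf_event_open() -> ENOENT
            continue
        if i == 0:
            leader_fd = 100           # pretend fd for the leader
            fds[ev] = leader_fd
        elif leader_fd is None:
            fds[ev] = None            # no group_fd: broken group
        else:
            fds[ev] = 100 + i
    return fds
```

In this model, with the unsupported stalled-cycles-backend as leader, the supported stalled-cycles-frontend follower is also left unopened, which matches the `<not counted>` lines in the output above.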
>
> > > So the default events/metrics are now in json:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> > > Relating to the stalls there are:
> > > ```
> > > {
> > > "BriefDescription": "Max front or backend stalls per instruction",
> > > "MetricExpr": "max(stalled\\-cycles\\-frontend,
> > > stalled\\-cycles\\-backend) / instructions",
> > > "MetricGroup": "Default",
> > > "MetricName": "stalled_cycles_per_instruction",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M stalled_cycles_per_instruction sleep 1
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865362, -1, -1, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > Error:
> > No supported events found.
> > The stalled-cycles-backend event is not supported.
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=865362, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
> > +++ exited with 1 +++
> > root@number:~#
> >
> > > {
> > > "BriefDescription": "Frontend stalls per cycle",
> > > "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > > "MetricGroup": "Default",
> > > "MetricName": "frontend_cycles_idle",
> > > "MetricThreshold": "frontend_cycles_idle > 0.1",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M frontend_cycles_idle sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0xa9, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865414, -1, 3, PERF_FLAG_FD_CLOEXEC) = 4
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865414, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > 881,022 cpu-cycles # 0.48 frontend_cycles_idle
> > 422,386 stalled-cycles-frontend
> >
> > 1.000468505 seconds time elapsed
> >
> > 0.000504000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865413, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> > > {
> > > "BriefDescription": "Backend stalls per cycle",
> > > "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > > "MetricGroup": "Default",
> > > "MetricName": "backend_cycles_idle",
> > > "MetricThreshold": "backend_cycles_idle > 0.2",
> > > "DefaultShowEvents": "1"
> > > },
> >
> > root@number:~# strace -e perf_event_open perf stat -M backend_cycles_idle sleep 1
> > perf_event_open({type=PERF_TYPE_RAW, size=PERF_ATTR_SIZE_VER9, config=0x76, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER9, config=PERF_COUNT_HW_STALLED_CYCLES_BACKEND, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP, inherit=1, precise_ip=0 /* arbitrary skid */, ...}, 865442, -1, 3, PERF_FLAG_FD_CLOEXEC) = -1 ENOENT (No such file or directory)
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=865442, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> >
> > Performance counter stats for 'sleep 1':
> >
> > <not counted> cpu-cycles # nan backend_cycles_idle
> > <not supported> stalled-cycles-backend
> >
> > 1.000739264 seconds time elapsed
> >
> > 0.000675000 seconds user
> > 0.000000000 seconds sys
> >
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=SI_USER, si_pid=865441, si_uid=0} ---
> > +++ exited with 0 +++
> > root@number:~#
> >
> > > ```
> > > The stalled_cycles_per_instruction and backend_cycles_idle should fail
> > > as the stalled-cycles-backend event is missing. frontend_cycles_idle
> > > should work, I wonder if the 0 counts relate to trouble scheduling
> > > groups of events. I'll need more verbose output to understand. Perhaps
> > > for stalled_cycles_per_instruction, we should modify the metric to
> > > tolerate missing events:
> > >
> > > max(stalled\\-cycles\\-frontend if
> > > have_event(stalled\\-cycles\\-frontend) else 0,
> > > stalled\\-cycles\\-backend if have_event(stalled\\-cycles\\-backend)
> > > else 0) / instructions
> >
> > That have_event() part also has to be implemented, right?
>
> I mistyped, has_event not have_event :-)
I posted:
https://lore.kernel.org/lkml/20260319010103.834106-1-irogers@google.com/
that implements the has_event approach.
Thanks,
Ian
> Thanks,
> Ian
>
> > - Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
@ 2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
1 sibling, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-04-01 5:55 UTC (permalink / raw)
To: acme; +Cc: japo, linux-kernel, linux-perf-users, namhyung, tmricht
On Wed, Mar 18, 2026 at 6:01 PM Ian Rogers <irogers@google.com> wrote:
>
> The metric code uses the event parsing code but it generally assumes
> all events are supported. Arnaldo reported AMD supporting
> stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
> with this is that before parsing happens the metric code tries to
> share events within groups to reduce the number of events and
> multiplexing. If the group has some supported and not supported
> events, the whole group will become broken. To avoid this situation
> add has_event tests to the metrics for stalled-cycles-frontend and
> stalled-cycles-backend. has_event is evaluated when parsing the
> metric and its result is constant propagated (with if-elses) to reduce
> the number of events. This means when the metric code considers
> sharing the events, only supported events will be shared.
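[Editor's note: the constant propagation the commit message describes can be sketched as below. This is a minimal illustrative evaluator, not perf's expr parser: `has_event()` is resolved against the set of supported events at parse time, so the if/else collapses and unsupported events never reach the grouping logic.]

```python
# Minimal sketch (not perf's expr code) of folding has_event() at parse
# time for the stalled_cycles_per_instruction metric: only the branch
# whose events exist survives, so event grouping never sees an
# unsupported event at all.
def fold_stalls_metric(supported):
    """Return the surviving sub-expression for a given supported set."""
    fe = "stalled-cycles-frontend" in supported
    be = "stalled-cycles-backend" in supported
    if fe and be:
        return "max(stalled-cycles-frontend, stalled-cycles-backend) / instructions"
    if fe:
        return "stalled-cycles-frontend / instructions"
    if be:
        return "stalled-cycles-backend / instructions"
    return "0"
```

On Arnaldo's AMD machine only the frontend event exists, so the metric folds to `stalled-cycles-frontend / instructions` and no broken group is created.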
>
> Note for backporting. This change updates
> tools/perf/pmu-events/empty-pmu-events.c, a convenience file for builds
> on systems without Python present. While the metrics.json code should
> backport easily there can be conflicts on empty-pmu-events.c. In this
> case the build will have left a file test-empty-pmu-events.c that can
> be copied over empty-pmu-events.c to resolve issues and make an
> appropriate empty-pmu-events.c for the json in the source tree at the
> time of the build.
>
> [1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
>
> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
> Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones")
> Signed-off-by: Ian Rogers <irogers@google.com>
Ping.
Thanks,
Ian
> ---
> .../arch/common/common/metrics.json | 6 +-
> tools/perf/pmu-events/empty-pmu-events.c | 108 +++++++++---------
> 2 files changed, 57 insertions(+), 57 deletions(-)
>
> diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> index 0d010b3ebc6d..cefc8bfe7830 100644
> --- a/tools/perf/pmu-events/arch/common/common/metrics.json
> +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> @@ -46,14 +46,14 @@
> },
> {
> "BriefDescription": "Max front or backend stalls per instruction",
> - "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> + "MetricExpr": "(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions) if (has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend)) else ((stalled\\-cycles\\-frontend / instructions) if has_event(stalled\\-cycles\\-frontend) else ((stalled\\-cycles\\-backend / instructions) if has_event(stalled\\-cycles\\-backend) else 0))",
> "MetricGroup": "Default",
> "MetricName": "stalled_cycles_per_instruction",
> "DefaultShowEvents": "1"
> },
> {
> "BriefDescription": "Frontend stalls per cycle",
> - "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> + "MetricExpr": "(stalled\\-cycles\\-frontend / cpu\\-cycles) if has_event(stalled\\-cycles\\-frontend) else 0",
> "MetricGroup": "Default",
> "MetricName": "frontend_cycles_idle",
> "MetricThreshold": "frontend_cycles_idle > 0.1",
> @@ -61,7 +61,7 @@
> },
> {
> "BriefDescription": "Backend stalls per cycle",
> - "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> + "MetricExpr": "(stalled\\-cycles\\-backend / cpu\\-cycles) if has_event(stalled\\-cycles\\-backend) else 0",
> "MetricGroup": "Default",
> "MetricName": "backend_cycles_idle",
> "MetricThreshold": "backend_cycles_idle > 0.2",
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index 76c395cf513c..a92dd0424f79 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -1310,33 +1310,33 @@ static const char *const big_c_string =
> /* offset=128375 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
> /* offset=128635 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
> /* offset=128866 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
> -/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
> -/* offset=129143 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
> -/* offset=129273 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
> -/* offset=129399 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
> -/* offset=129575 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
> -/* offset=129755 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
> -/* offset=129859 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
> -/* offset=129975 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
> -/* offset=130076 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
> -/* offset=130191 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
> -/* offset=130297 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
> -/* offset=130403 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
> -/* offset=130551 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
> -/* offset=130574 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
> -/* offset=130638 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
> -/* offset=130805 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> -/* offset=130870 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> -/* offset=130938 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
> -/* offset=131010 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
> -/* offset=131105 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
> -/* offset=131240 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
> -/* offset=131305 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> -/* offset=131374 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> -/* offset=131445 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
> -/* offset=131468 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
> -/* offset=131491 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
> -/* offset=131512 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
> +/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
> +/* offset=129404 */ "frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
> +/* offset=129583 */ "backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
> +/* offset=129757 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
> +/* offset=129933 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
> +/* offset=130113 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
> +/* offset=130217 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
> +/* offset=130333 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
> +/* offset=130434 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
> +/* offset=130549 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
> +/* offset=130655 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
> +/* offset=130761 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
> +/* offset=130909 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
> +/* offset=130932 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
> +/* offset=130996 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
> +/* offset=131163 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> +/* offset=131228 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
> +/* offset=131296 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
> +/* offset=131368 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
> +/* offset=131463 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
> +/* offset=131598 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
> +/* offset=131663 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> +/* offset=131732 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
> +/* offset=131803 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
> +/* offset=131826 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
> +/* offset=131849 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
> +/* offset=131870 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
> ;
>
> static const struct compact_pmu_event pmu_events__common_default_core[] = {
> @@ -2626,22 +2626,22 @@ static const struct pmu_table_entry pmu_events__common[] = {
>
> static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
> { 127956 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
> -{ 129273 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
> -{ 129575 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
> -{ 129755 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
> +{ 129583 }, /* backend_cycles_idle\000Default\000(stalled\\-cycles\\-backend / cpu\\-cycles if has_event(stalled\\-cycles\\-backend) else 0)\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
> +{ 129933 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
> +{ 130113 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
> { 128142 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
> -{ 129399 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
> -{ 130191 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
> -{ 129143 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
> +{ 129757 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
> +{ 130549 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
> +{ 129404 }, /* frontend_cycles_idle\000Default\000(stalled\\-cycles\\-frontend / cpu\\-cycles if has_event(stalled\\-cycles\\-frontend) else 0)\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
> { 128866 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
> -{ 130297 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
> -{ 130403 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
> -{ 129859 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
> -{ 130076 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
> -{ 129975 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
> +{ 130655 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
> +{ 130761 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
> +{ 130217 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
> +{ 130434 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
> +{ 130333 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
> { 128375 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
> { 128635 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
> -{ 128979 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
> +{ 128979 }, /* stalled_cycles_per_instruction\000Default\000(max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions if has_event(stalled\\-cycles\\-frontend) & has_event(stalled\\-cycles\\-backend) else (stalled\\-cycles\\-frontend / instructions if has_event(stalled\\-cycles\\-frontend) else (stalled\\-cycles\\-backend / instructions if has_event(stalled\\-cycles\\-backend) else 0)))\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
>
> };
>
> @@ -2714,21 +2714,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
> };
>
> static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
> -{ 130551 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
> -{ 131240 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
> -{ 131010 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
> -{ 131105 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
> -{ 131305 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> -{ 131374 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> -{ 130638 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
> -{ 130574 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
> -{ 131512 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
> -{ 131445 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
> -{ 131468 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
> -{ 131491 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
> -{ 130938 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
> -{ 130805 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> -{ 130870 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> +{ 130909 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
> +{ 131598 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
> +{ 131368 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
> +{ 131463 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
> +{ 131663 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> +{ 131732 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
> +{ 130996 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
> +{ 130932 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
> +{ 131870 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
> +{ 131803 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
> +{ 131826 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
> +{ 131849 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
> +{ 131296 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
> +{ 131163 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
> +{ 131228 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
>
> };
>
> --
> 2.53.0.851.ga537e3e6e9-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-04-01 5:55 ` Ian Rogers
@ 2026-04-04 0:15 ` Namhyung Kim
1 sibling, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2026-04-04 0:15 UTC (permalink / raw)
To: acme, Ian Rogers; +Cc: japo, linux-kernel, linux-perf-users, tmricht
On Wed, 18 Mar 2026 18:01:03 -0700, Ian Rogers wrote:
> The metric code uses the event parsing code but it generally assumes
> all events are supported. Arnaldo reported AMD supporting
> stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
> with this is that before parsing happens the metric code tries to
> share events within groups to reduce the number of events and
> multiplexing. If the group has some supported and not supported
> events, the whole group will become broken. To avoid this situation
> add has_event tests to the metrics for stalled-cycles-frontend and
> stalled-cycles-backend. has_event is evaluated when parsing the
> metric and its result constant propagated (with if-elses) to reduce
> the number of events. This means when the metric code considers
> sharing the events, only supported events will be shared.
>
> [...]
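[Editorial note: the constant propagation described in the commit message above can be sketched as a small expression simplifier. This is a hypothetical illustration, not perf's actual parser (which lives in tools/perf/util/expr.c); the tuple-based expression encoding and the `simplify` helper are invented here for clarity.]

```python
# Sketch of how an "A if has_event(e) else B" metric expression can be
# constant-propagated once event support is known, so that unsupported
# events never reach the metric code's event-sharing stage.

def simplify(expr, supported):
    """Recursively replace has_event() with a constant and prune the
    dead branch of each if/else. Expressions are nested tuples."""
    kind = expr[0]
    if kind == "has_event":
        # has_event folds to a constant based on PMU support.
        return ("const", expr[1] in supported)
    if kind == "ifelse":
        cond = simplify(expr[1], supported)
        if cond == ("const", True):
            return simplify(expr[2], supported)   # keep the then-branch
        if cond == ("const", False):
            return simplify(expr[3], supported)   # keep the else-branch
        return ("ifelse", cond,
                simplify(expr[2], supported),
                simplify(expr[3], supported))
    if kind == "div":
        return ("div", simplify(expr[1], supported),
                       simplify(expr[2], supported))
    return expr  # "event" and "const" leaves pass through unchanged

# frontend_cycles_idle-style fallback:
# stalled-cycles-frontend / cpu-cycles if the event exists, else 0.
metric = ("ifelse", ("has_event", "stalled-cycles-frontend"),
          ("div", ("event", "stalled-cycles-frontend"),
                  ("event", "cpu-cycles")),
          ("const", 0))

# On a PMU without stalled-cycles-frontend the whole metric folds to a
# constant 0, so no unsupported event is ever requested or grouped.
print(simplify(metric, supported={"cpu-cycles"}))
```

With `stalled-cycles-frontend` supported, the same call instead returns the division node with the `has_event` guard eliminated, which is the behavior the patch relies on to keep groups free of unsupported events.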
Applied to perf-tools-next, thanks!
Best regards,
Namhyung
end of thread, other threads:[~2026-04-04 0:15 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <66d0366b-8690-4bde-aba4-1d51f278dfa1@linux.ibm.com>
[not found] ` <abQq79_ajs8zFeEI@x1>
[not found] ` <CAP-5=fWG0Q-xNGn_-_KGCopaNsWG5rUN3pYNLEh3K7+xE0OwVQ@mail.gmail.com>
2026-03-17 19:39 ` perf stat issue with 7.0.0rc3 Arnaldo Carvalho de Melo
2026-03-17 19:56 ` Arnaldo Carvalho de Melo
2026-03-17 20:12 ` Arnaldo Carvalho de Melo
2026-03-17 20:50 ` Ian Rogers
2026-03-18 0:37 ` Arnaldo Carvalho de Melo
2026-03-18 2:25 ` Ian Rogers
2026-03-19 1:01 ` [PATCH v1] perf metrics: Make common stalled metrics conditional on having the event Ian Rogers
2026-04-01 5:55 ` Ian Rogers
2026-04-04 0:15 ` Namhyung Kim
2026-03-24 4:19 ` perf stat issue with 7.0.0rc3 Ian Rogers