From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Ian Rogers <irogers@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>, Xu Yang <xu.yang_2@nxp.com>,
Chun-Tse Shao <ctshao@google.com>,
Thomas Richter <tmricht@linux.ibm.com>,
Sumanth Korikkar <sumanthk@linux.ibm.com>,
Collin Funk <collin.funk1@gmail.com>,
Thomas Falcon <thomas.falcon@intel.com>,
Howard Chu <howardchu95@gmail.com>,
Levi Yun <yeoreum.yun@arm.com>,
Yang Li <yang.lee@linux.alibaba.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Andi Kleen <ak@linux.intel.com>,
Weilin Wang <weilin.wang@intel.com>
Subject: Re: [PATCH v4 00/18]
Date: Wed, 12 Nov 2025 17:03:24 +0800
Message-ID: <52a04db2-ee35-4870-9fcc-1b8824d2f2f9@linux.intel.com>
In-Reply-To: <20251111212206.631711-1-irogers@google.com>
I tested this patch series on Sapphire Rapids and Arrow Lake. The topdown
metrics output looks much prettier and more reader-friendly than before,
especially on hybrid platforms. Thanks.
Sapphire Rapids:
1. sudo ./perf stat -a
^C
Performance counter stats for 'system wide':
1,742 context-switches # 4.1 cs/sec cs_per_second
420,720.12 msec cpu-clock # 224.5 CPUs CPUs_utilized
225 cpu-migrations # 0.5 migrations/sec migrations_per_second
1,463 page-faults # 3.5 faults/sec page_faults_per_second
842,434 branch-misses # 3.0 % branch_miss_rate
28,215,728 branches # 0.1 M/sec branch_frequency
373,303,824 cpu-cycles # 0.0 GHz cycles_frequency
135,738,837 instructions # 0.4 instructions insn_per_cycle
TopdownL1 # 4.4 % tma_bad_speculation
# 29.0 % tma_frontend_bound
# 58.3 % tma_backend_bound
# 8.3 % tma_retiring
TopdownL2 # 25.9 % tma_core_bound
# 32.4 % tma_memory_bound
# 2.3 % tma_heavy_operations
# 6.0 % tma_light_operations
# 4.1 % tma_branch_mispredicts
# 0.3 % tma_machine_clears
# 4.4 % tma_fetch_bandwidth
# 24.6 % tma_fetch_latency
1.873921629 seconds time elapsed
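As a quick sanity check on the new json metrics, the derived values in the system-wide Sapphire Rapids run above can be recomputed from its raw counts. This Python sketch is only an illustration: the formulas (per-second rates divided by total cpu-clock time, CPUs_utilized as cpu-clock over elapsed time, IPC as instructions over cycles) were inferred from the printed numbers rather than copied from metrics.json, and `derived()` is a hypothetical helper.

```python
# Hypothetical helper: recompute the derived values printed by
# `perf stat -a` on Sapphire Rapids from the raw counts.
# ASSUMPTION: formulas are inferred from the printed output, not
# taken from perf's metrics.json.

counts = {
    "context-switches": 1_742,
    "cpu-clock_msec": 420_720.12,
    "cpu-migrations": 225,
    "page-faults": 1_463,
    "branch-misses": 842_434,
    "branches": 28_215_728,
    "cpu-cycles": 373_303_824,
    "instructions": 135_738_837,
}
elapsed_sec = 1.873921629


def derived(c, elapsed):
    # Total CPU time summed over all CPUs, in seconds.
    cpu_sec = c["cpu-clock_msec"] / 1000.0
    return {
        "cs_per_second": c["context-switches"] / cpu_sec,
        "CPUs_utilized": cpu_sec / elapsed,
        "migrations_per_second": c["cpu-migrations"] / cpu_sec,
        "page_faults_per_second": c["page-faults"] / cpu_sec,
        "branch_miss_rate_pct": 100.0 * c["branch-misses"] / c["branches"],
        "branch_frequency_M_per_sec": c["branches"] / cpu_sec / 1e6,
        "cycles_frequency_GHz": c["cpu-cycles"] / cpu_sec / 1e9,
        "insn_per_cycle": c["instructions"] / c["cpu-cycles"],
    }


for name, value in derived(counts, elapsed_sec).items():
    print(f"{name:28s} {value:.1f}")
```

Rounded to one decimal, these match the printed 4.1 cs/sec, 224.5 CPUs, 3.0 % and 0.4 IPC. The very low "0.0 GHz" is expected here, since cycles are divided by cpu-clock time on a mostly idle system.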
2. sudo ./perf stat -- true
Performance counter stats for 'true':
0 context-switches # 0.0 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
53 page-faults # 178267.5 faults/sec page_faults_per_second
0.30 msec task-clock # 0.4 CPUs CPUs_utilized
4,977 branch-misses # 4.6 % branch_miss_rate
109,186 branches # 367.3 M/sec branch_frequency
832,970 cpu-cycles # 2.8 GHz cycles_frequency
561,263 instructions # 0.7 instructions insn_per_cycle
TopdownL1 # 11.1 % tma_bad_speculation
# 40.5 % tma_frontend_bound
# 35.2 % tma_backend_bound
# 13.3 % tma_retiring
TopdownL2 # 13.7 % tma_core_bound
# 21.5 % tma_memory_bound
# 3.1 % tma_heavy_operations
# 10.2 % tma_light_operations
# 10.5 % tma_branch_mispredicts
# 0.6 % tma_machine_clears
# 10.5 % tma_fetch_bandwidth
# 29.9 % tma_fetch_latency
0.000752150 seconds time elapsed
3. sudo ./perf stat -M TopdownL1 -- true
Performance counter stats for 'true':
5,352,744 TOPDOWN.SLOTS # 11.1 % tma_bad_speculation
# 41.5 % tma_frontend_bound
650,725 topdown-retiring # 35.4 % tma_backend_bound
2,246,053 topdown-fe-bound
1,910,194 topdown-be-bound
146,938 topdown-heavy-ops # 12.1 % tma_retiring
587,752 topdown-bad-spec
8,977 INT_MISC.UOP_DROPPING
0.000655604 seconds time elapsed
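The Level-1 percentages in run 3 can also be reproduced from the raw counts. In the sketch below, which is an assumption fitted to the printed numbers and not the metric expressions shipped with perf, each topdown-* count is normalized by the sum of the four Level-1 counts, tma_frontend_bound additionally subtracts INT_MISC.UOP_DROPPING / TOPDOWN.SLOTS, and tma_bad_speculation is taken as the remainder clamped at zero. `topdown_l1()` is a hypothetical name.

```python
# Sketch: reproduce the Sapphire Rapids TopdownL1 percentages printed by
# `perf stat -M TopdownL1 -- true` from its raw counts.
# ASSUMPTION: these formulas were chosen because they reproduce the
# printed numbers; they are a reconstruction, not perf's metric json.

slots = 5_352_744        # TOPDOWN.SLOTS
retiring = 650_725       # topdown-retiring
fe_bound = 2_246_053     # topdown-fe-bound
be_bound = 1_910_194     # topdown-be-bound
bad_spec = 587_752       # topdown-bad-spec
uop_dropping = 8_977     # INT_MISC.UOP_DROPPING


def topdown_l1(slots, retiring, fe, be, bad, uop_dropping):
    # Normalize by the sum of the four Level-1 topdown-* counts.
    total = retiring + fe + be + bad
    frontend = fe / total - uop_dropping / slots
    backend = be / total
    ret = retiring / total
    # Bad speculation is whatever is left over, never negative.
    bad_speculation = max(1.0 - (frontend + backend + ret), 0.0)
    return {
        "tma_bad_speculation": 100 * bad_speculation,
        "tma_frontend_bound": 100 * frontend,
        "tma_backend_bound": 100 * backend,
        "tma_retiring": 100 * ret,
    }


l1 = topdown_l1(slots, retiring, fe_bound, be_bound, bad_spec, uop_dropping)
for name, pct in l1.items():
    print(f"{pct:5.1f} % {name}")
```

Using the raw topdown-bad-spec share directly would give about 10.9 % rather than the printed 11.1 %, which is why the remainder form is used in this sketch.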
4. sudo ./perf stat -M TopdownL2 -- true
Performance counter stats for 'true':
5,935,368 TOPDOWN.SLOTS
651,726 topdown-retiring
2,257,767 topdown-fe-bound
1,699,144 topdown-mem-bound # 12.5 % tma_core_bound
# 28.6 % tma_memory_bound
2,443,975 topdown-be-bound
162,931 topdown-heavy-ops # 2.7 % tma_heavy_operations
# 8.2 % tma_light_operations
558,622 topdown-br-mispredict # 9.4 % tma_branch_mispredicts
# 0.5 % tma_machine_clears
1,722,420 topdown-fetch-lat # 9.0 % tma_fetch_bandwidth
# 28.9 % tma_fetch_latency
581,898 topdown-bad-spec
9,177 INT_MISC.UOP_DROPPING
0.000762976 seconds time elapsed
Arrow Lake:
1. sudo ./perf stat -a
^C
Performance counter stats for 'system wide':
355 context-switches # 8.7 cs/sec cs_per_second
40,877.75 msec cpu-clock # 24.0 CPUs CPUs_utilized
37 cpu-migrations # 0.9 migrations/sec migrations_per_second
749 page-faults # 18.3 faults/sec page_faults_per_second
80,736 cpu_core/branch-misses/ # 4.5 % branch_miss_rate
1,817,804 cpu_core/branches/ # 0.0 M/sec branch_frequency
22,099,084 cpu_core/cpu-cycles/ # 0.0 GHz cycles_frequency
8,993,043 cpu_core/instructions/ # 0.4 instructions insn_per_cycle
7,484,501 cpu_atom/branch-misses/ # 9.0 % branch_miss_rate (72.70%)
80,826,849 cpu_atom/branches/ # 2.0 M/sec branch_frequency (72.79%)
1,071,170,614 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (72.78%)
429,581,963 cpu_atom/instructions/ # 0.4 instructions insn_per_cycle (72.68%)
TopdownL1 (cpu_core) # 62.1 % tma_backend_bound
# 4.6 % tma_bad_speculation
# 27.5 % tma_frontend_bound
# 5.9 % tma_retiring
TopdownL1 (cpu_atom) # 13.5 % tma_bad_speculation (72.85%)
# 29.4 % tma_backend_bound (72.87%)
# 0.0 % tma_frontend_bound (81.91%)
# 0.0 % tma_retiring (81.76%)
1.703000770 seconds time elapsed
2. sudo ./perf stat -- true
Performance counter stats for 'true':
0 context-switches # 0.0 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
52 page-faults # 123119.2 faults/sec page_faults_per_second
0.42 msec task-clock # 0.3 CPUs CPUs_utilized
8,317 cpu_atom/branch-misses/ # 1.6 % branch_miss_rate (51.13%)
621,409 cpu_atom/branches/ # 1471.3 M/sec branch_frequency
1,670,355 cpu_atom/cpu-cycles/ # 4.0 GHz cycles_frequency
3,412,023 cpu_atom/instructions/ # 2.0 instructions insn_per_cycle
TopdownL1 (cpu_atom) # 12.9 % tma_bad_speculation
# 22.1 % tma_backend_bound (48.87%)
# 0.0 % tma_frontend_bound (48.87%)
0.001387192 seconds time elapsed
3. sudo ./perf stat -M TopdownL1
^C
Performance counter stats for 'system wide':
70,711,798 cpu_atom/TOPDOWN_BE_BOUND.ALL_P/ # 32.5 % tma_backend_bound
34,170,064 cpu_core/slots/
2,838,696 cpu_core/topdown-retiring/ # 31.9 % tma_backend_bound
# 7.6 % tma_bad_speculation
# 52.2 % tma_frontend_bound
2,596,813 cpu_core/topdown-bad-spec/
389,708 cpu_core/topdown-heavy-ops/ # 8.3 % tma_retiring
17,836,476 cpu_core/topdown-fe-bound/
10,892,767 cpu_core/topdown-be-bound/
0 cpu_atom/TOPDOWN_RETIRING.ALL/ # 0.0 % tma_retiring
27,212,830 cpu_atom/CPU_CLK_UNHALTED.CORE/
14,606,510 cpu_atom/TOPDOWN_BAD_SPECULATION.ALL_P/ # 6.7 % tma_bad_speculation
0 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 0.0 % tma_frontend_bound
0.933603501 seconds time elapsed
4. sudo ./perf stat -M TopdownL2
^C
Performance counter stats for 'system wide':
3,185,474 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ # 0.3 % tma_machine_clears
362,392,575 cpu_atom/TOPDOWN_BE_BOUND.ALL_P/ # 11.2 % tma_core_bound
# 21.1 % tma_resource_bound
134,220,848 cpu_core/slots/
7,973,945 cpu_core/topdown-retiring/
21,283,136 cpu_core/topdown-mem-bound/ # 20.3 % tma_core_bound
# 15.9 % tma_memory_bound
8,723,033 cpu_core/topdown-bad-spec/
1,312,216 cpu_core/topdown-heavy-ops/ # 1.0 % tma_heavy_operations
# 5.0 % tma_light_operations
58,866,799 cpu_core/topdown-fetch-lat/ # 7.5 % tma_fetch_bandwidth
# 43.9 % tma_fetch_latency
8,588,952 cpu_core/topdown-br-mispredict/ # 6.4 % tma_branch_mispredicts
# 0.1 % tma_machine_clears
68,870,574 cpu_core/topdown-fe-bound/
48,573,009 cpu_core/topdown-be-bound/
125,913,035 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/
106,491,449 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ # 9.5 % tma_branch_mispredicts
199,780,747 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 17.8 % tma_ifetch_latency
140,205,932 cpu_atom/CPU_CLK_UNHALTED.CORE/
109,670,746 cpu_atom/TOPDOWN_BAD_SPECULATION.ALL_P/
0 cpu_atom/TOPDOWN_FE_BOUND.ALL/
176,695,510 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ # 15.8 % tma_ifetch_bandwidth
1.463942844 seconds time elapsed
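On Arrow Lake, the cpu_atom percentages in runs 3 and 4 look consistent with dividing each TOPDOWN_* count by 8 * CPU_CLK_UNHALTED.CORE. The width of 8 slots per cycle is inferred from these numbers only, not read from the Arrow Lake metric json, and `atom_pct()` and `ATOM_SLOTS_PER_CYCLE` are hypothetical names used for this check.

```python
# Sketch: check the cpu_atom percentages from the two Arrow Lake -M runs.
# ASSUMPTION: the denominator is 8 * CPU_CLK_UNHALTED.CORE (8 slots per
# cycle). This width is inferred from the printed numbers, not taken
# from the Arrow Lake metric json.

ATOM_SLOTS_PER_CYCLE = 8  # hypothetical constant, inferred from the data


def atom_pct(event_count, clk_unhalted_core):
    # Percentage of all issue slots attributed to this event.
    return 100.0 * event_count / (ATOM_SLOTS_PER_CYCLE * clk_unhalted_core)


# Run 3 (-M TopdownL1): CPU_CLK_UNHALTED.CORE = 27,212,830
l1_clk = 27_212_830
atom_l1 = {
    "tma_backend_bound": atom_pct(70_711_798, l1_clk),
    "tma_bad_speculation": atom_pct(14_606_510, l1_clk),
}

# Run 4 (-M TopdownL2): CPU_CLK_UNHALTED.CORE = 140,205,932
l2_clk = 140_205_932
atom_l2 = {
    "tma_machine_clears": atom_pct(3_185_474, l2_clk),
    "tma_branch_mispredicts": atom_pct(106_491_449, l2_clk),
    "tma_ifetch_latency": atom_pct(199_780_747, l2_clk),
    "tma_ifetch_bandwidth": atom_pct(176_695_510, l2_clk),
}

for name, pct in {**atom_l1, **atom_l2}.items():
    print(f"{pct:5.1f} % {name}")
```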
On 11/12/2025 5:21 AM, Ian Rogers wrote:
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> 5,749 context-switches # 358.376 /sec
> 121 cpu-migrations # 7.543 /sec
> 1,806 page-faults # 112.581 /sec
> 825,965,204 instructions # 0.70 insn per cycle
> 1,180,799,101 cycles # 0.074 GHz
> 168,945,109 branches # 10.532 M/sec
> 4,629,567 branch-misses # 2.74% of all branches
> # 30.2 % tma_backend_bound
> # 7.8 % tma_bad_speculation
> # 47.1 % tma_frontend_bound
> # 14.9 % tma_retiring
> ```
>
> After:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 2,890 context-switches # 179.9 cs/sec cs_per_second
> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> TopdownL1 # 20.1 % tma_backend_bound
> # 40.5 % tma_bad_speculation (88.90%)
> # 17.2 % tma_frontend_bound (78.05%)
> # 22.2 % tma_retiring (88.89%)
>
> 1.002994394 seconds time elapsed
> ```
>
> Having the metrics in json brings greater uniformity, allows events to
> be shared by metrics, and it also allows descriptions like:
> ```
> $ perf list cs_per_second
> ...
> cs_per_second
> [Context switches per CPU second]
> ```
>
> A thorn in the side of doing this work was that the hard coded metrics
> were used by perf script with '-F metric'. This functionality didn't
> work for me (I was testing `perf record -e instructions,cycles`
> with/without leader sampling and then `perf script -F metric` but saw
> nothing but empty lines) but anyway I decided to fix it to the best of
> my ability in this series. So the script side counters were removed
> and the regular ones associated with the evsel used. The json metrics
> were all searched for ones whose events are a subset of those in the
> perf script session, and all matching metrics are printed. This is
> kind of weird as the counters are being set by the
> period of samples, but I carried the behavior forward. I suspect there
> needs to be follow up work to make this better, but what is in the
> series is superior to what is currently in the tree. Follow up work
> could include finding metrics for the machine in the perf.data rather
> than using the host, allowing multiple metrics even if the metric ids
> of the events differ, fixing pre-existing `perf stat record/report`
> issues, etc.
>
> There are a lot of stat tests that, for example, assume '-e
> instructions,cycles' will produce an IPC metric. These needed tidying,
> as the metric must now be explicitly asked for, and when doing this,
> metrics using software events were preferred to increase
> compatibility. As the test updates were numerous, they are separate
> from the patches updating the functionality, causing periods in the
> series where not all tests pass. If this is undesirable, the test fixes
> can be squashed into the functionality updates, but this will be kind
> of messy, especially as at some points in the series both the old
> metrics and the new metrics will be displayed.
>
> v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
> metric now does a system-wide calculation (Namhyung), and don't
> crash because the CPU map index couldn't be found. Regenerate
> commit messages, but the cpu-clock was always yielding 0 on my
> machine, leading to a lot of nan metric values.
>
> v3: Rebase resolving merge conflicts in
> tools/perf/pmu-events/empty-pmu-events.c by just regenerating it
> (Dapeng Mi).
> https://lore.kernel.org/lkml/20251111040417.270945-1-irogers@google.com/
>
> v2: Drop merged patches, add json to document target_cpu/core_wide and
> example to "Add care to picking the evsel for displaying a metric"
> commit message (Namhyung).
> https://lore.kernel.org/lkml/20251106231508.448793-1-irogers@google.com/
>
> v1: https://lore.kernel.org/lkml/20251024175857.808401-1-irogers@google.com/
>
> Ian Rogers (18):
> perf metricgroup: Add care to picking the evsel for displaying a metric
> perf expr: Add #target_cpu literal
> perf jevents: Add set of common metrics based on default ones
> perf jevents: Add metric DefaultShowEvents
> perf stat: Add detail -d,-dd,-ddd metrics
> perf script: Change metric format to use json metrics
> perf stat: Remove hard coded shadow metrics
> perf stat: Fix default metricgroup display on hybrid
> perf stat: Sort default events/metrics
> perf stat: Remove "unit" workarounds for metric-only
> perf test stat+json: Improve metric-only testing
> perf test stat: Ignore failures in Default[234] metricgroups
> perf test stat: Update std_output testing metric expectations
> perf test metrics: Update all metrics for possibly failing default metrics
> perf test stat: Update shadow test to use metrics
> perf test stat: Update test expectations and events
> perf test stat csv: Update test expectations and events
> perf tool_pmu: Make core_wide and target_cpu json events
>
> tools/perf/builtin-script.c | 251 ++++++++++-
> tools/perf/builtin-stat.c | 154 ++-----
> .../arch/common/common/metrics.json | 151 +++++++
> .../pmu-events/arch/common/common/tool.json | 12 +
> tools/perf/pmu-events/empty-pmu-events.c | 229 ++++++----
> tools/perf/pmu-events/jevents.py | 28 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
> tools/perf/tests/shell/lib/stat_output.sh | 2 +-
> tools/perf/tests/shell/stat+csv_output.sh | 2 +-
> tools/perf/tests/shell/stat+json_output.sh | 2 +-
> tools/perf/tests/shell/stat+shadow_stat.sh | 4 +-
> tools/perf/tests/shell/stat+std_output.sh | 4 +-
> tools/perf/tests/shell/stat.sh | 6 +-
> .../perf/tests/shell/stat_all_metricgroups.sh | 3 +
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/expr.c | 8 +-
> tools/perf/util/metricgroup.c | 92 +++-
> tools/perf/util/stat-display.c | 55 +--
> tools/perf/util/stat-shadow.c | 404 +-----------------
> tools/perf/util/stat.h | 2 +-
> tools/perf/util/tool_pmu.c | 24 +-
> tools/perf/util/tool_pmu.h | 9 +-
> 24 files changed, 769 insertions(+), 687 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>