* [PATCH v4 01/18] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 02/18] perf expr: Add #target_cpu literal Ian Rogers
` (19 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Rather than using the first evsel in the matched events, try to find
the least shared non-tool evsel. The aim is to pick an evsel that
typifies the metric within the list of metrics.
This addresses an issue where Default metric group metrics may lose
their counter value due to how the stat display hides counters for
the default event/metric output.
For a metricgroup like TopdownL1 on an Intel Alderlake, before the
change there are 4 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1
Performance counter stats for 'system wide':
7,782,334,296 cpu_core/TOPDOWN.SLOTS/ # 10.4 % tma_bad_speculation
# 19.7 % tma_frontend_bound
2,668,927,977 cpu_core/topdown-retiring/ # 35.7 % tma_backend_bound
# 34.1 % tma_retiring
803,623,987 cpu_core/topdown-bad-spec/
167,514,386 cpu_core/topdown-heavy-ops/
1,555,265,776 cpu_core/topdown-fe-bound/
2,792,733,013 cpu_core/topdown-be-bound/
279,769,310 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.2 % tma_retiring
# 15.1 % tma_bad_speculation
457,917,232 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 38.4 % tma_backend_bound
# 34.2 % tma_frontend_bound
783,519,226 cpu_atom/TOPDOWN_FE_BOUND.ALL/
10,790,192 cpu_core/INT_MISC.UOP_DROPPING/
879,845,633 cpu_atom/TOPDOWN_BE_BOUND.ALL/
```
After the change there are 6 events with metrics:
```
$ perf stat -M topdownL1 -a sleep 1
Performance counter stats for 'system wide':
2,377,551,258 cpu_core/TOPDOWN.SLOTS/ # 7.9 % tma_bad_speculation
# 36.4 % tma_frontend_bound
480,791,142 cpu_core/topdown-retiring/ # 35.5 % tma_backend_bound
186,323,991 cpu_core/topdown-bad-spec/
65,070,590 cpu_core/topdown-heavy-ops/ # 20.1 % tma_retiring
871,733,444 cpu_core/topdown-fe-bound/
848,286,598 cpu_core/topdown-be-bound/
260,936,456 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.4 % tma_retiring
# 17.6 % tma_bad_speculation
419,576,513 cpu_atom/CPU_CLK_UNHALTED.CORE/
797,132,597 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.0 % tma_frontend_bound
3,055,447 cpu_core/INT_MISC.UOP_DROPPING/
671,014,164 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 32.0 % tma_backend_bound
```
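As a rough standalone illustration (not part of the patch; the names and
use counts below are made up), the heuristic implemented by
pick_display_evsel() in the diff boils down to preferring a non-tool
event and, among candidates, the event referenced by the fewest metrics:
```
/* Toy sketch of the "least shared, non-tool first" selection. */
#include <stdio.h>
#include <stdbool.h>

struct toy_event {
	const char *name;
	int uses;      /* number of metrics whose expression references it */
	bool is_tool;  /* tool events such as duration_time */
};

static const struct toy_event *pick(const struct toy_event *evs, int n)
{
	const struct toy_event *sel = &evs[0];

	for (int i = 1; i < n; i++) {
		if ((sel->is_tool && !evs[i].is_tool) || evs[i].uses < sel->uses)
			sel = &evs[i];
	}
	return sel;
}

int main(void)
{
	/* Hypothetical use counts for a TopdownL1-style metric. */
	const struct toy_event evs[] = {
		{ "duration_time",    4, true  },
		{ "TOPDOWN.SLOTS",    4, false },
		{ "topdown-retiring", 1, false },
	};

	printf("display metric on: %s\n", pick(evs, 3)->name); /* topdown-retiring */
	return 0;
}
```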
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 48936e517803..76092ee26761 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
return ret;
}
+/* How many times will a given evsel be used in a set of metrics? */
+static int count_uses(struct list_head *metric_list, struct evsel *evsel)
+{
+ const char *metric_id = evsel__metric_id(evsel);
+ struct metric *m;
+ int uses = 0;
+
+ list_for_each_entry(m, metric_list, nd) {
+ if (hashmap__find(m->pctx->ids, metric_id, NULL))
+ uses++;
+ }
+ return uses;
+}
+
+/*
+ * Select the evsel that stat-display will use to trigger shadow/metric
+ * printing. Pick the least shared non-tool evsel, encouraging metrics to be
+ * with a hardware counter that is specific to them.
+ */
+static struct evsel *pick_display_evsel(struct list_head *metric_list,
+ struct evsel **metric_events)
+{
+ struct evsel *selected = metric_events[0];
+ size_t selected_uses;
+ bool selected_is_tool;
+
+ if (!selected)
+ return NULL;
+
+ selected_uses = count_uses(metric_list, selected);
+ selected_is_tool = evsel__is_tool(selected);
+ for (int i = 1; metric_events[i]; i++) {
+ struct evsel *candidate = metric_events[i];
+ size_t candidate_uses = count_uses(metric_list, candidate);
+
+ if ((selected_is_tool && !evsel__is_tool(candidate)) ||
+ (candidate_uses < selected_uses)) {
+ selected = candidate;
+ selected_uses = candidate_uses;
+ selected_is_tool = evsel__is_tool(selected);
+ }
+ }
+ return selected;
+}
+
static int parse_groups(struct evlist *perf_evlist,
const char *pmu, const char *str,
bool metric_no_group,
@@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
goto out;
}
- me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
+ me = metricgroup__lookup(&perf_evlist->metric_events,
+ pick_display_evsel(&metric_list, metric_events),
/*create=*/true);
expr = malloc(sizeof(struct metric_expr));
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 02/18] perf expr: Add #target_cpu literal
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
2025-11-11 21:21 ` [PATCH v4 01/18] perf metricgroup: Add care to picking the evsel for displaying a metric Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones Ian Rogers
` (18 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
To derive CPU nanoseconds, a lot of the stat-shadow metrics use either
task-clock or cpu-clock, the latter being used when
target__has_cpu() is true. Add a #target_cpu literal so that json
metrics can perform the same test.
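As a sketch of the intended use (this exact expression comes from the
common metrics json added later in this series), a metric can select
between cpu-clock and task-clock with the new literal:
```
{
  "BriefDescription": "Average CPU utilization",
  "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
  "MetricName": "CPUs_utilized",
  "ScaleUnit": "1CPUs"
}
```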
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/expr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 7fda0ff89c16..4df56f2b283d 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -409,6 +409,9 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
} else if (!strcmp("#core_wide", literal)) {
result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
? 1.0 : 0.0;
+ } else if (!strcmp("#target_cpu", literal)) {
+ result = (ctx->system_wide || ctx->user_requested_cpu_list)
+ ? 1.0 : 0.0;
} else {
pr_err("Unrecognized literal '%s'", literal);
}
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
2025-11-11 21:21 ` [PATCH v4 01/18] perf metricgroup: Add care to picking the evsel for displaying a metric Ian Rogers
2025-11-11 21:21 ` [PATCH v4 02/18] perf expr: Add #target_cpu literal Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-14 16:28 ` James Clark
2025-11-11 21:21 ` [PATCH v4 04/18] perf jevents: Add metric DefaultShowEvents Ian Rogers
` (17 subsequent siblings)
20 siblings, 1 reply; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Add support for getting a common set of metrics from a default
table. Adding the json metrics as part of the same generation
simplifies things. The metrics added are CPUs_utilized, cs_per_second,
migrations_per_second, page_faults_per_second, insn_per_cycle,
stalled_cycles_per_instruction, frontend_cycles_idle,
backend_cycles_idle, cycles_frequency, branch_frequency and
branch_miss_rate, based on the shadow metric definitions.
Following this change the default perf stat output on an Alderlake
looks like:
```
$ perf stat -a -- sleep 2
Performance counter stats for 'system wide':
0.00 msec cpu-clock # 0.000 CPUs utilized
77,739 context-switches
15,033 cpu-migrations
321,313 page-faults
14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
10,263,836,145 cpu_atom/cycles/ (35.42%)
39,138,632,894 cpu_core/cycles/ (57.60%)
2,989,658,777 cpu_atom/branches/ (42.60%)
32,170,570,388 cpu_core/branches/ (57.39%)
29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
(software) # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
# 19.6 % tma_frontend_bound (63.97%)
TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
# 49.7 % tma_retiring (63.97%)
(software) # nan faults/sec page_faults_per_second
# nan GHz cycles_frequency (42.88%)
# nan GHz cycles_frequency (69.88%)
TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
# 29.9 % tma_retiring (50.07%)
TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
(cpu_atom) # nan M/sec branch_frequency (43.09%)
# nan M/sec branch_frequency (70.07%)
# nan migrations/sec migrations_per_second
TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
(software) # 0.0 CPUs CPUs_utilized
# 1.4 instructions insn_per_cycle (43.04%)
# 3.5 instructions insn_per_cycle (69.99%)
# 1.0 % branch_miss_rate (35.46%)
# 0.5 % branch_miss_rate (65.02%)
2.005626564 seconds time elapsed
```
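Since the new metrics are in the "Default" metric group, they should
also be requestable explicitly like any other json metric (a usage
sketch, assuming the usual -M handling of metric and group names;
output omitted as it is machine dependent):
```
$ perf stat -M insn_per_cycle -a -- sleep 1
$ perf stat -M Default -a -- sleep 1
```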
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/common/common/metrics.json | 86 +++++++++++++
tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
tools/perf/pmu-events/jevents.py | 21 +++-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 31 +++--
5 files changed, 212 insertions(+), 42 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
new file mode 100644
index 000000000000..d915be51e300
--- /dev/null
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -0,0 +1,86 @@
+[
+ {
+ "BriefDescription": "Average CPU utilization",
+ "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
+ "MetricGroup": "Default",
+ "MetricName": "CPUs_utilized",
+ "ScaleUnit": "1CPUs",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Context switches per CPU second",
+ "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "cs_per_second",
+ "ScaleUnit": "1cs/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Process migrations to a new CPU per CPU second",
+ "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "migrations_per_second",
+ "ScaleUnit": "1migrations/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Page faults per CPU second",
+ "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "page_faults_per_second",
+ "ScaleUnit": "1faults/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Instructions Per Cycle",
+ "MetricExpr": "instructions / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "insn_per_cycle",
+ "MetricThreshold": "insn_per_cycle < 1",
+ "ScaleUnit": "1instructions"
+ },
+ {
+ "BriefDescription": "Max front or backend stalls per instruction",
+ "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
+ "MetricGroup": "Default",
+ "MetricName": "stalled_cycles_per_instruction"
+ },
+ {
+ "BriefDescription": "Frontend stalls per cycle",
+ "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "frontend_cycles_idle",
+ "MetricThreshold": "frontend_cycles_idle > 0.1"
+ },
+ {
+ "BriefDescription": "Backend stalls per cycle",
+ "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "backend_cycles_idle",
+ "MetricThreshold": "backend_cycles_idle > 0.2"
+ },
+ {
+ "BriefDescription": "Cycles per CPU second",
+ "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "cycles_frequency",
+ "ScaleUnit": "1GHz",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Branches per CPU second",
+ "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "branch_frequency",
+ "ScaleUnit": "1000M/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Branch miss rate",
+ "MetricExpr": "branch\\-misses / branches",
+ "MetricGroup": "Default",
+ "MetricName": "branch_miss_rate",
+ "MetricThreshold": "branch_miss_rate > 0.05",
+ "ScaleUnit": "100%"
+ }
+]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 2fdf4fbf36e2..e4d00f6b2b5d 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1303,21 +1303,32 @@ static const char *const big_c_string =
/* offset=127519 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
/* offset=127596 */ "uncore_sys_cmn_pmu\000"
/* offset=127615 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
-/* offset=127758 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
-/* offset=127780 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
-/* offset=127843 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
-/* offset=128009 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=128073 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=128140 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
-/* offset=128211 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
-/* offset=128305 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
-/* offset=128439 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
-/* offset=128503 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=128571 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=128641 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
-/* offset=128663 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
-/* offset=128685 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
-/* offset=128705 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
+/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
+/* offset=127943 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
+/* offset=128175 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
+/* offset=128434 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
+/* offset=128664 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
+/* offset=128776 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
+/* offset=128939 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
+/* offset=129068 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
+/* offset=129193 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
+/* offset=129368 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001"
+/* offset=129547 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
+/* offset=129650 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
+/* offset=129672 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
+/* offset=129735 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
+/* offset=129901 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
+/* offset=129965 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
+/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
+/* offset=130103 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
+/* offset=130197 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
+/* offset=130331 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
+/* offset=130395 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
+/* offset=130463 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
+/* offset=130533 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
+/* offset=130555 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
+/* offset=130577 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
+/* offset=130597 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2603,6 +2614,29 @@ static const struct pmu_table_entry pmu_events__common[] = {
},
};
+static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
+{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
+{ 129068 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
+{ 129368 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001 */
+{ 129547 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
+{ 127943 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
+{ 129193 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
+{ 128939 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
+{ 128664 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
+{ 128175 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
+{ 128434 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
+{ 128776 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
+
+};
+
+static const struct pmu_table_entry pmu_metrics__common[] = {
+{
+ .entries = pmu_metrics__common_default_core,
+ .num_entries = ARRAY_SIZE(pmu_metrics__common_default_core),
+ .pmu_name = { 0 /* default_core\000 */ },
+},
+};
+
static const struct compact_pmu_event pmu_events__test_soc_cpu_default_core[] = {
{ 126205 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
{ 126267 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
@@ -2664,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 127758 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
-{ 128439 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
-{ 128211 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
-{ 128305 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
-{ 128503 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 128571 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 127843 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
-{ 127780 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
-{ 128705 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
-{ 128641 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
-{ 128663 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
-{ 128685 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
-{ 128140 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
-{ 128009 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
-{ 128073 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129650 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
+{ 130331 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
+{ 130103 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
+{ 130197 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
+{ 130395 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
+{ 130463 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
+{ 129735 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
+{ 129672 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
+{ 130597 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
+{ 130533 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
+{ 130555 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
+{ 130577 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
+{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
+{ 129901 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129965 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
};
@@ -2759,7 +2793,10 @@ static const struct pmu_events_map pmu_events_map[] = {
.pmus = pmu_events__common,
.num_pmus = ARRAY_SIZE(pmu_events__common),
},
- .metric_table = {},
+ .metric_table = {
+ .pmus = pmu_metrics__common,
+ .num_pmus = ARRAY_SIZE(pmu_metrics__common),
+ },
},
{
.arch = "testarch",
@@ -3208,6 +3245,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
return map ? &map->metric_table : NULL;
}
+const struct pmu_metrics_table *pmu_metrics_table__default(void)
+{
+ int i = 0;
+
+ for (;;) {
+ const struct pmu_events_map *map = &pmu_events_map[i++];
+
+ if (!map->arch)
+ break;
+
+ if (!strcmp(map->cpuid, "common"))
+ return &map->metric_table;
+ }
+ return NULL;
+}
+
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
{
for (const struct pmu_events_map *tables = &pmu_events_map[0];
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 786a7049363f..5d3f4b44cfb7 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -755,7 +755,10 @@ static const struct pmu_events_map pmu_events_map[] = {
\t\t.pmus = pmu_events__common,
\t\t.num_pmus = ARRAY_SIZE(pmu_events__common),
\t},
-\t.metric_table = {},
+\t.metric_table = {
+\t\t.pmus = pmu_metrics__common,
+\t\t.num_pmus = ARRAY_SIZE(pmu_metrics__common),
+\t},
},
""")
else:
@@ -1237,6 +1240,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
return map ? &map->metric_table : NULL;
}
+const struct pmu_metrics_table *pmu_metrics_table__default(void)
+{
+ int i = 0;
+
+ for (;;) {
+ const struct pmu_events_map *map = &pmu_events_map[i++];
+
+ if (!map->arch)
+ break;
+
+ if (!strcmp(map->cpuid, "common"))
+ return &map->metric_table;
+ }
+ return NULL;
+}
+
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
{
for (const struct pmu_events_map *tables = &pmu_events_map[0];
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index e0535380c0b2..559265a903c8 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -127,6 +127,7 @@ int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu *pmu);
const struct pmu_events_table *perf_pmu__default_core_events_table(void);
const struct pmu_metrics_table *pmu_metrics_table__find(void);
+const struct pmu_metrics_table *pmu_metrics_table__default(void);
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid);
const struct pmu_metrics_table *find_core_metrics_table(const char *arch, const char *cpuid);
int pmu_for_each_core_event(pmu_event_iter_fn fn, void *data);
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 76092ee26761..e67e04ce01c9 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -424,10 +424,18 @@ int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metr
.fn = fn,
.data = data,
};
+ const struct pmu_metrics_table *tables[2] = {
+ table,
+ pmu_metrics_table__default(),
+ };
+
+ for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
+ int ret;
- if (table) {
- int ret = pmu_metrics_table__for_each_metric(table, fn, data);
+ if (!tables[i])
+ continue;
+ ret = pmu_metrics_table__for_each_metric(tables[i], fn, data);
if (ret)
return ret;
}
@@ -1581,19 +1589,22 @@ static int metricgroup__has_metric_or_groups_callback(const struct pmu_metric *p
bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups)
{
- const struct pmu_metrics_table *table = pmu_metrics_table__find();
+ const struct pmu_metrics_table *tables[2] = {
+ pmu_metrics_table__find(),
+ pmu_metrics_table__default(),
+ };
struct metricgroup__has_metric_data data = {
.pmu = pmu,
.metric_or_groups = metric_or_groups,
};
- if (!table)
- return false;
-
- return pmu_metrics_table__for_each_metric(table,
- metricgroup__has_metric_or_groups_callback,
- &data)
- ? true : false;
+ for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
+ if (pmu_metrics_table__for_each_metric(tables[i],
+ metricgroup__has_metric_or_groups_callback,
+ &data))
+ return true;
+ }
+ return false;
}
static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-11 21:21 ` [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones Ian Rogers
@ 2025-11-14 16:28 ` James Clark
2025-11-14 16:57 ` Ian Rogers
0 siblings, 1 reply; 33+ messages in thread
From: James Clark @ 2025-11-14 16:28 UTC (permalink / raw)
To: Ian Rogers, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Xu Yang, Chun-Tse Shao, Thomas Richter,
Sumanth Korikkar, Collin Funk, Thomas Falcon, Howard Chu,
Dapeng Mi, Levi Yun, Yang Li, linux-kernel, linux-perf-users,
Andi Kleen, Weilin Wang, Leo Yan
On 11/11/2025 9:21 pm, Ian Rogers wrote:
> Add support to getting a common set of metrics from a default
> table. It simplifies the generation to add json metrics at the same
> time. The metrics added are CPUs_utilized, cs_per_second,
> migrations_per_second, page_faults_per_second, insn_per_cycle,
> stalled_cycles_per_instruction, frontend_cycles_idle,
> backend_cycles_idle, cycles_frequency, branch_frequency and
> branch_miss_rate based on the shadow metric definitions.
>
> Following this change the default perf stat output on an alderlake
> looks like:
> ```
> $ perf stat -a -- sleep 2
>
> Performance counter stats for 'system wide':
>
> 0.00 msec cpu-clock # 0.000 CPUs utilized
> 77,739 context-switches
> 15,033 cpu-migrations
> 321,313 page-faults
> 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> 10,263,836,145 cpu_atom/cycles/ (35.42%)
> 39,138,632,894 cpu_core/cycles/ (57.60%)
> 2,989,658,777 cpu_atom/branches/ (42.60%)
> 32,170,570,388 cpu_core/branches/ (57.39%)
> 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> (software) # nan cs/sec cs_per_second
> TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> # 19.6 % tma_frontend_bound (63.97%)
> TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> # 49.7 % tma_retiring (63.97%)
> (software) # nan faults/sec page_faults_per_second
> # nan GHz cycles_frequency (42.88%)
> # nan GHz cycles_frequency (69.88%)
> TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> # 29.9 % tma_retiring (50.07%)
> TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> (cpu_atom) # nan M/sec branch_frequency (43.09%)
> # nan M/sec branch_frequency (70.07%)
> # nan migrations/sec migrations_per_second
> TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> (software) # 0.0 CPUs CPUs_utilized
> # 1.4 instructions insn_per_cycle (43.04%)
> # 3.5 instructions insn_per_cycle (69.99%)
> # 1.0 % branch_miss_rate (35.46%)
> # 0.5 % branch_miss_rate (65.02%)
>
> 2.005626564 seconds time elapsed
> ```
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> .../arch/common/common/metrics.json | 86 +++++++++++++
> tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> tools/perf/pmu-events/jevents.py | 21 +++-
> tools/perf/pmu-events/pmu-events.h | 1 +
> tools/perf/util/metricgroup.c | 31 +++--
> 5 files changed, 212 insertions(+), 42 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
> diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> new file mode 100644
> index 000000000000..d915be51e300
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> @@ -0,0 +1,86 @@
> +[
> + {
> + "BriefDescription": "Average CPU utilization",
> + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
Hi Ian,
I noticed that this metric is making "perf stat tests" fail.
"duration_time" is a tool event, and tool events don't work with "perf
stat record" anymore. The test runs the record command with the default
args, which results in this event being used and a failure.
I suppose there are three issues. The first two are unrelated to this change:
- Perf stat record continues to write out a bad perf.data file even
though it knows that tool events won't work.
For example, 'status' ends up being -1 in cmd_stat() but it's ignored
for some of the writing parts. It does decide not to print any stdout
though:
$ perf stat record -e "duration_time"
<blank>
- The other issue is obviously that tool events don't work with perf
stat record, which seems to be a regression from 6828d6929b76 ("perf
evsel: Refactor tool events")
- The third issue is that this change adds a broken tool event to the
default output of perf stat
I'm not actually sure what "perf stat record" is for? It's possible that
it's not used anymore, especially if nobody noticed that tool events
haven't been working in it for a while.
I think we're also supposed to have json output for perf stat (although
this is also broken in some obscure scenarios), so maybe perf stat
record isn't needed anymore?
Thanks
James
> + "MetricGroup": "Default",
> + "MetricName": "CPUs_utilized",
> + "ScaleUnit": "1CPUs",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Context switches per CPU second",
> + "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "cs_per_second",
> + "ScaleUnit": "1cs/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Process migrations to a new CPU per CPU second",
> + "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "migrations_per_second",
> + "ScaleUnit": "1migrations/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Page faults per CPU second",
> + "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "page_faults_per_second",
> + "ScaleUnit": "1faults/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Instructions Per Cycle",
> + "MetricExpr": "instructions / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "insn_per_cycle",
> + "MetricThreshold": "insn_per_cycle < 1",
> + "ScaleUnit": "1instructions"
> + },
> + {
> + "BriefDescription": "Max front or backend stalls per instruction",
> + "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> + "MetricGroup": "Default",
> + "MetricName": "stalled_cycles_per_instruction"
> + },
> + {
> + "BriefDescription": "Frontend stalls per cycle",
> + "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "frontend_cycles_idle",
> + "MetricThreshold": "frontend_cycles_idle > 0.1"
> + },
> + {
> + "BriefDescription": "Backend stalls per cycle",
> + "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "backend_cycles_idle",
> + "MetricThreshold": "backend_cycles_idle > 0.2"
> + },
> + {
> + "BriefDescription": "Cycles per CPU second",
> + "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "cycles_frequency",
> + "ScaleUnit": "1GHz",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Branches per CPU second",
> + "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "branch_frequency",
> + "ScaleUnit": "1000M/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Branch miss rate",
> + "MetricExpr": "branch\\-misses / branches",
> + "MetricGroup": "Default",
> + "MetricName": "branch_miss_rate",
> + "MetricThreshold": "branch_miss_rate > 0.05",
> + "ScaleUnit": "100%"
> + }
> +]
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index 2fdf4fbf36e2..e4d00f6b2b5d 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -1303,21 +1303,32 @@ static const char *const big_c_string =
> /* offset=127519 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
> /* offset=127596 */ "uncore_sys_cmn_pmu\000"
> /* offset=127615 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
> -/* offset=127758 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
> -/* offset=127780 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
> -/* offset=127843 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
> -/* offset=128009 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> -/* offset=128073 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> -/* offset=128140 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
> -/* offset=128211 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
> -/* offset=128305 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
> -/* offset=128439 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
> -/* offset=128503 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> -/* offset=128571 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> -/* offset=128641 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
> -/* offset=128663 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
> -/* offset=128685 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
> -/* offset=128705 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
> +/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
> +/* offset=127943 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
> +/* offset=128175 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
> +/* offset=128434 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
> +/* offset=128664 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
> +/* offset=128776 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
> +/* offset=128939 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
> +/* offset=129068 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
> +/* offset=129193 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
> +/* offset=129368 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001"
> +/* offset=129547 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
> +/* offset=129650 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
> +/* offset=129672 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
> +/* offset=129735 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
> +/* offset=129901 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> +/* offset=129965 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> +/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
> +/* offset=130103 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
> +/* offset=130197 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
> +/* offset=130331 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
> +/* offset=130395 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> +/* offset=130463 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> +/* offset=130533 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
> +/* offset=130555 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
> +/* offset=130577 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
> +/* offset=130597 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
> ;
>
> static const struct compact_pmu_event pmu_events__common_default_core[] = {
> @@ -2603,6 +2614,29 @@ static const struct pmu_table_entry pmu_events__common[] = {
> },
> };
>
> +static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
> +{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
> +{ 129068 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
> +{ 129368 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001 */
> +{ 129547 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
> +{ 127943 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
> +{ 129193 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
> +{ 128939 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
> +{ 128664 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
> +{ 128175 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
> +{ 128434 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
> +{ 128776 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
> +
> +};
> +
> +static const struct pmu_table_entry pmu_metrics__common[] = {
> +{
> + .entries = pmu_metrics__common_default_core,
> + .num_entries = ARRAY_SIZE(pmu_metrics__common_default_core),
> + .pmu_name = { 0 /* default_core\000 */ },
> +},
> +};
> +
> static const struct compact_pmu_event pmu_events__test_soc_cpu_default_core[] = {
> { 126205 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
> { 126267 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
> @@ -2664,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
> };
>
> static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
> -{ 127758 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
> -{ 128439 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
> -{ 128211 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
> -{ 128305 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
> -{ 128503 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> -{ 128571 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> -{ 127843 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
> -{ 127780 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
> -{ 128705 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
> -{ 128641 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
> -{ 128663 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
> -{ 128685 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
> -{ 128140 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
> -{ 128009 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> -{ 128073 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> +{ 129650 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
> +{ 130331 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
> +{ 130103 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
> +{ 130197 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
> +{ 130395 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> +{ 130463 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> +{ 129735 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
> +{ 129672 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
> +{ 130597 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
> +{ 130533 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
> +{ 130555 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
> +{ 130577 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
> +{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
> +{ 129901 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> +{ 129965 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
>
> };
>
> @@ -2759,7 +2793,10 @@ static const struct pmu_events_map pmu_events_map[] = {
> .pmus = pmu_events__common,
> .num_pmus = ARRAY_SIZE(pmu_events__common),
> },
> - .metric_table = {},
> + .metric_table = {
> + .pmus = pmu_metrics__common,
> + .num_pmus = ARRAY_SIZE(pmu_metrics__common),
> + },
> },
> {
> .arch = "testarch",
> @@ -3208,6 +3245,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
> return map ? &map->metric_table : NULL;
> }
>
> +const struct pmu_metrics_table *pmu_metrics_table__default(void)
> +{
> + int i = 0;
> +
> + for (;;) {
> + const struct pmu_events_map *map = &pmu_events_map[i++];
> +
> + if (!map->arch)
> + break;
> +
> + if (!strcmp(map->cpuid, "common"))
> + return &map->metric_table;
> + }
> + return NULL;
> +}
> +
> const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
> {
> for (const struct pmu_events_map *tables = &pmu_events_map[0];
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 786a7049363f..5d3f4b44cfb7 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -755,7 +755,10 @@ static const struct pmu_events_map pmu_events_map[] = {
> \t\t.pmus = pmu_events__common,
> \t\t.num_pmus = ARRAY_SIZE(pmu_events__common),
> \t},
> -\t.metric_table = {},
> +\t.metric_table = {
> +\t\t.pmus = pmu_metrics__common,
> +\t\t.num_pmus = ARRAY_SIZE(pmu_metrics__common),
> +\t},
> },
> """)
> else:
> @@ -1237,6 +1240,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
> return map ? &map->metric_table : NULL;
> }
>
> +const struct pmu_metrics_table *pmu_metrics_table__default(void)
> +{
> + int i = 0;
> +
> + for (;;) {
> + const struct pmu_events_map *map = &pmu_events_map[i++];
> +
> + if (!map->arch)
> + break;
> +
> + if (!strcmp(map->cpuid, "common"))
> + return &map->metric_table;
> + }
> + return NULL;
> +}
> +
> const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
> {
> for (const struct pmu_events_map *tables = &pmu_events_map[0];
> diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
> index e0535380c0b2..559265a903c8 100644
> --- a/tools/perf/pmu-events/pmu-events.h
> +++ b/tools/perf/pmu-events/pmu-events.h
> @@ -127,6 +127,7 @@ int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
> const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu *pmu);
> const struct pmu_events_table *perf_pmu__default_core_events_table(void);
> const struct pmu_metrics_table *pmu_metrics_table__find(void);
> +const struct pmu_metrics_table *pmu_metrics_table__default(void);
> const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid);
> const struct pmu_metrics_table *find_core_metrics_table(const char *arch, const char *cpuid);
> int pmu_for_each_core_event(pmu_event_iter_fn fn, void *data);
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 76092ee26761..e67e04ce01c9 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -424,10 +424,18 @@ int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metr
> .fn = fn,
> .data = data,
> };
> + const struct pmu_metrics_table *tables[2] = {
> + table,
> + pmu_metrics_table__default(),
> + };
> +
> + for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
> + int ret;
>
> - if (table) {
> - int ret = pmu_metrics_table__for_each_metric(table, fn, data);
> + if (!tables[i])
> + continue;
>
> + ret = pmu_metrics_table__for_each_metric(tables[i], fn, data);
> if (ret)
> return ret;
> }
> @@ -1581,19 +1589,22 @@ static int metricgroup__has_metric_or_groups_callback(const struct pmu_metric *p
>
> bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups)
> {
> - const struct pmu_metrics_table *table = pmu_metrics_table__find();
> + const struct pmu_metrics_table *tables[2] = {
> + pmu_metrics_table__find(),
> + pmu_metrics_table__default(),
> + };
> struct metricgroup__has_metric_data data = {
> .pmu = pmu,
> .metric_or_groups = metric_or_groups,
> };
>
> - if (!table)
> - return false;
> -
> - return pmu_metrics_table__for_each_metric(table,
> - metricgroup__has_metric_or_groups_callback,
> - &data)
> - ? true : false;
> + for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
> + if (pmu_metrics_table__for_each_metric(tables[i],
> + metricgroup__has_metric_or_groups_callback,
> + &data))
> + return true;
> + }
> + return false;
> }
>
> static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-14 16:28 ` James Clark
@ 2025-11-14 16:57 ` Ian Rogers
2025-11-15 17:52 ` Namhyung Kim
0 siblings, 1 reply; 33+ messages in thread
From: Ian Rogers @ 2025-11-14 16:57 UTC (permalink / raw)
To: James Clark
Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > Add support for getting a common set of metrics from a default
> > table. It simplifies the generation to add json metrics at the same
> > time. The metrics added are CPUs_utilized, cs_per_second,
> > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > stalled_cycles_per_instruction, frontend_cycles_idle,
> > backend_cycles_idle, cycles_frequency, branch_frequency and
> > branch_miss_rate based on the shadow metric definitions.
> >
> > Following this change the default perf stat output on an alderlake
> > looks like:
> > ```
> > $ perf stat -a -- sleep 2
> >
> > Performance counter stats for 'system wide':
> >
> > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > 77,739 context-switches
> > 15,033 cpu-migrations
> > 321,313 page-faults
> > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > 32,170,570,388 cpu_core/branches/ (57.39%)
> > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > (software) # nan cs/sec cs_per_second
> > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > # 19.6 % tma_frontend_bound (63.97%)
> > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > # 49.7 % tma_retiring (63.97%)
> > (software) # nan faults/sec page_faults_per_second
> > # nan GHz cycles_frequency (42.88%)
> > # nan GHz cycles_frequency (69.88%)
> > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > # 29.9 % tma_retiring (50.07%)
> > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > # nan M/sec branch_frequency (70.07%)
> > # nan migrations/sec migrations_per_second
> > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > (software) # 0.0 CPUs CPUs_utilized
> > # 1.4 instructions insn_per_cycle (43.04%)
> > # 3.5 instructions insn_per_cycle (69.99%)
> > # 1.0 % branch_miss_rate (35.46%)
> > # 0.5 % branch_miss_rate (65.02%)
> >
> > 2.005626564 seconds time elapsed
> > ```
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > .../arch/common/common/metrics.json | 86 +++++++++++++
> > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > tools/perf/pmu-events/jevents.py | 21 +++-
> > tools/perf/pmu-events/pmu-events.h | 1 +
> > tools/perf/util/metricgroup.c | 31 +++--
> > 5 files changed, 212 insertions(+), 42 deletions(-)
> > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> >
> > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > new file mode 100644
> > index 000000000000..d915be51e300
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > @@ -0,0 +1,86 @@
> > +[
> > + {
> > + "BriefDescription": "Average CPU utilization",
> > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
>
> Hi Ian,
>
> I noticed that this metric is making "perf stat tests" fail.
> "duration_time" is a tool event and they don't work with "perf stat
> record" anymore. The test tests the record command with the default args
> which results in this event being used and a failure.
>
> I suppose there are three issues. First two are unrelated to this change:
>
> - Perf stat record continues to write out a bad perf.data file even
> though it knows that tool events won't work.
>
> For example 'status' ends up being -1 in cmd_stat() but it's ignored
> for some of the writing parts. It does decide to not print any stdout
> though:
>
> $ perf stat record -e "duration_time"
> <blank>
>
> - The other issue is obviously that tool events don't work with perf
> stat record which seems to be a regression from 6828d6929b76 ("perf
> evsel: Refactor tool events")
>
> - The third issue is that this change adds a broken tool event to the
> default output of perf stat
>
> I'm not actually sure what "perf stat record" is for? It's possible that
> it's not used anymore, especially if nobody noticed that tool events
> haven't been working in it for a while.
>
> I think we're also supposed to have json output for perf stat (although
> this is also broken in some obscure scenarios), so maybe perf stat
> record isn't needed anymore?
Hi James,
Thanks for the report. I think this also overlaps with the fact that
perf stat metrics don't work with perf stat record, and these changes
made that the default. Let me do some follow-up work, as the perf
script work shows we can do useful things with metrics while not being
on a live perf stat - there's the obstacle that the CPUID of the host
will be used :-/
Anyway, I'll take a look and we should add a test on this. There is
one that checks the perf stat json output is okay, to some definition. One
problem is that the stat-display code is complete spaghetti. Now that
stat-shadow only handles json metrics, and perf script isn't trying to
maintain a set of shadow counters, that is a little bit improved.
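As a very rough sketch of such a test (hypothetical, not part of this
series; the file layout and the "no stale perf.data on failure" check
describe desired behaviour rather than what happens today), something
like:
```
#!/bin/bash
# Hypothetical sketch: run 'perf stat record' with the default events, which
# now pull in tool events (e.g. duration_time) via the common metrics, and
# check that it either works or fails without leaving a broken perf.data.
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"

if perf stat record -- true 2> err.txt; then
  # If recording claims success, the resulting perf.data should be readable.
  perf stat report > /dev/null
else
  # Desired behaviour on failure: no partially written perf.data left behind.
  test ! -s perf.data
fi

cd - > /dev/null
rm -rf "$tmpdir"
```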
Thanks,
Ian
> Thanks
> James
>
> > + "MetricGroup": "Default",
> > + "MetricName": "CPUs_utilized",
> > + "ScaleUnit": "1CPUs",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Context switches per CPU second",
> > + "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "cs_per_second",
> > + "ScaleUnit": "1cs/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Process migrations to a new CPU per CPU second",
> > + "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "migrations_per_second",
> > + "ScaleUnit": "1migrations/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Page faults per CPU second",
> > + "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "page_faults_per_second",
> > + "ScaleUnit": "1faults/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Instructions Per Cycle",
> > + "MetricExpr": "instructions / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "insn_per_cycle",
> > + "MetricThreshold": "insn_per_cycle < 1",
> > + "ScaleUnit": "1instructions"
> > + },
> > + {
> > + "BriefDescription": "Max front or backend stalls per instruction",
> > + "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> > + "MetricGroup": "Default",
> > + "MetricName": "stalled_cycles_per_instruction"
> > + },
> > + {
> > + "BriefDescription": "Frontend stalls per cycle",
> > + "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "frontend_cycles_idle",
> > + "MetricThreshold": "frontend_cycles_idle > 0.1"
> > + },
> > + {
> > + "BriefDescription": "Backend stalls per cycle",
> > + "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "backend_cycles_idle",
> > + "MetricThreshold": "backend_cycles_idle > 0.2"
> > + },
> > + {
> > + "BriefDescription": "Cycles per CPU second",
> > + "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "cycles_frequency",
> > + "ScaleUnit": "1GHz",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Branches per CPU second",
> > + "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "branch_frequency",
> > + "ScaleUnit": "1000M/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Branch miss rate",
> > + "MetricExpr": "branch\\-misses / branches",
> > + "MetricGroup": "Default",
> > + "MetricName": "branch_miss_rate",
> > + "MetricThreshold": "branch_miss_rate > 0.05",
> > + "ScaleUnit": "100%"
> > + }
> > +]
> > diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> > index 2fdf4fbf36e2..e4d00f6b2b5d 100644
> > --- a/tools/perf/pmu-events/empty-pmu-events.c
> > +++ b/tools/perf/pmu-events/empty-pmu-events.c
> > @@ -1303,21 +1303,32 @@ static const char *const big_c_string =
> > /* offset=127519 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
> > /* offset=127596 */ "uncore_sys_cmn_pmu\000"
> > /* offset=127615 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
> > -/* offset=127758 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
> > -/* offset=127780 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
> > -/* offset=127843 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
> > -/* offset=128009 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> > -/* offset=128073 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> > -/* offset=128140 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
> > -/* offset=128211 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
> > -/* offset=128305 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
> > -/* offset=128439 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
> > -/* offset=128503 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> > -/* offset=128571 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> > -/* offset=128641 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
> > -/* offset=128663 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
> > -/* offset=128685 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
> > -/* offset=128705 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
> > +/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
> > +/* offset=127943 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
> > +/* offset=128175 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
> > +/* offset=128434 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
> > +/* offset=128664 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
> > +/* offset=128776 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
> > +/* offset=128939 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
> > +/* offset=129068 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
> > +/* offset=129193 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
> > +/* offset=129368 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001"
> > +/* offset=129547 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
> > +/* offset=129650 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
> > +/* offset=129672 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
> > +/* offset=129735 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
> > +/* offset=129901 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> > +/* offset=129965 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
> > +/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
> > +/* offset=130103 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
> > +/* offset=130197 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
> > +/* offset=130331 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
> > +/* offset=130395 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> > +/* offset=130463 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
> > +/* offset=130533 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
> > +/* offset=130555 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
> > +/* offset=130577 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
> > +/* offset=130597 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
> > ;
> >
> > static const struct compact_pmu_event pmu_events__common_default_core[] = {
> > @@ -2603,6 +2614,29 @@ static const struct pmu_table_entry pmu_events__common[] = {
> > },
> > };
> >
> > +static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
> > +{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
> > +{ 129068 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
> > +{ 129368 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001 */
> > +{ 129547 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
> > +{ 127943 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
> > +{ 129193 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
> > +{ 128939 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
> > +{ 128664 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
> > +{ 128175 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
> > +{ 128434 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
> > +{ 128776 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
> > +
> > +};
> > +
> > +static const struct pmu_table_entry pmu_metrics__common[] = {
> > +{
> > + .entries = pmu_metrics__common_default_core,
> > + .num_entries = ARRAY_SIZE(pmu_metrics__common_default_core),
> > + .pmu_name = { 0 /* default_core\000 */ },
> > +},
> > +};
> > +
> > static const struct compact_pmu_event pmu_events__test_soc_cpu_default_core[] = {
> > { 126205 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
> > { 126267 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
> > @@ -2664,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
> > };
> >
> > static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
> > -{ 127758 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
> > -{ 128439 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
> > -{ 128211 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
> > -{ 128305 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
> > -{ 128503 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> > -{ 128571 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> > -{ 127843 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
> > -{ 127780 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
> > -{ 128705 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
> > -{ 128641 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
> > -{ 128663 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
> > -{ 128685 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
> > -{ 128140 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
> > -{ 128009 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> > -{ 128073 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> > +{ 129650 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
> > +{ 130331 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
> > +{ 130103 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
> > +{ 130197 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
> > +{ 130395 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> > +{ 130463 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
> > +{ 129735 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
> > +{ 129672 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
> > +{ 130597 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
> > +{ 130533 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
> > +{ 130555 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
> > +{ 130577 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
> > +{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
> > +{ 129901 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> > +{ 129965 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
> >
> > };
> >
> > @@ -2759,7 +2793,10 @@ static const struct pmu_events_map pmu_events_map[] = {
> > .pmus = pmu_events__common,
> > .num_pmus = ARRAY_SIZE(pmu_events__common),
> > },
> > - .metric_table = {},
> > + .metric_table = {
> > + .pmus = pmu_metrics__common,
> > + .num_pmus = ARRAY_SIZE(pmu_metrics__common),
> > + },
> > },
> > {
> > .arch = "testarch",
> > @@ -3208,6 +3245,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
> > return map ? &map->metric_table : NULL;
> > }
> >
> > +const struct pmu_metrics_table *pmu_metrics_table__default(void)
> > +{
> > + int i = 0;
> > +
> > + for (;;) {
> > + const struct pmu_events_map *map = &pmu_events_map[i++];
> > +
> > + if (!map->arch)
> > + break;
> > +
> > + if (!strcmp(map->cpuid, "common"))
> > + return &map->metric_table;
> > + }
> > + return NULL;
> > +}
> > +
> > const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
> > {
> > for (const struct pmu_events_map *tables = &pmu_events_map[0];
> > diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> > index 786a7049363f..5d3f4b44cfb7 100755
> > --- a/tools/perf/pmu-events/jevents.py
> > +++ b/tools/perf/pmu-events/jevents.py
> > @@ -755,7 +755,10 @@ static const struct pmu_events_map pmu_events_map[] = {
> > \t\t.pmus = pmu_events__common,
> > \t\t.num_pmus = ARRAY_SIZE(pmu_events__common),
> > \t},
> > -\t.metric_table = {},
> > +\t.metric_table = {
> > +\t\t.pmus = pmu_metrics__common,
> > +\t\t.num_pmus = ARRAY_SIZE(pmu_metrics__common),
> > +\t},
> > },
> > """)
> > else:
> > @@ -1237,6 +1240,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
> > return map ? &map->metric_table : NULL;
> > }
> >
> > +const struct pmu_metrics_table *pmu_metrics_table__default(void)
> > +{
> > + int i = 0;
> > +
> > + for (;;) {
> > + const struct pmu_events_map *map = &pmu_events_map[i++];
> > +
> > + if (!map->arch)
> > + break;
> > +
> > + if (!strcmp(map->cpuid, "common"))
> > + return &map->metric_table;
> > + }
> > + return NULL;
> > +}
> > +
> > const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
> > {
> > for (const struct pmu_events_map *tables = &pmu_events_map[0];
> > diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
> > index e0535380c0b2..559265a903c8 100644
> > --- a/tools/perf/pmu-events/pmu-events.h
> > +++ b/tools/perf/pmu-events/pmu-events.h
> > @@ -127,6 +127,7 @@ int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
> > const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu *pmu);
> > const struct pmu_events_table *perf_pmu__default_core_events_table(void);
> > const struct pmu_metrics_table *pmu_metrics_table__find(void);
> > +const struct pmu_metrics_table *pmu_metrics_table__default(void);
> > const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid);
> > const struct pmu_metrics_table *find_core_metrics_table(const char *arch, const char *cpuid);
> > int pmu_for_each_core_event(pmu_event_iter_fn fn, void *data);
> > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > index 76092ee26761..e67e04ce01c9 100644
> > --- a/tools/perf/util/metricgroup.c
> > +++ b/tools/perf/util/metricgroup.c
> > @@ -424,10 +424,18 @@ int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metr
> > .fn = fn,
> > .data = data,
> > };
> > + const struct pmu_metrics_table *tables[2] = {
> > + table,
> > + pmu_metrics_table__default(),
> > + };
> > +
> > + for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
> > + int ret;
> >
> > - if (table) {
> > - int ret = pmu_metrics_table__for_each_metric(table, fn, data);
> > + if (!tables[i])
> > + continue;
> >
> > + ret = pmu_metrics_table__for_each_metric(tables[i], fn, data);
> > if (ret)
> > return ret;
> > }
> > @@ -1581,19 +1589,22 @@ static int metricgroup__has_metric_or_groups_callback(const struct pmu_metric *p
> >
> > bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups)
> > {
> > - const struct pmu_metrics_table *table = pmu_metrics_table__find();
> > + const struct pmu_metrics_table *tables[2] = {
> > + pmu_metrics_table__find(),
> > + pmu_metrics_table__default(),
> > + };
> > struct metricgroup__has_metric_data data = {
> > .pmu = pmu,
> > .metric_or_groups = metric_or_groups,
> > };
> >
> > - if (!table)
> > - return false;
> > -
> > - return pmu_metrics_table__for_each_metric(table,
> > - metricgroup__has_metric_or_groups_callback,
> > - &data)
> > - ? true : false;
> > + for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
> > + if (pmu_metrics_table__for_each_metric(tables[i],
> > + metricgroup__has_metric_or_groups_callback,
> > + &data))
> > + return true;
> > + }
> > + return false;
> > }
> >
> > static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-14 16:57 ` Ian Rogers
@ 2025-11-15 17:52 ` Namhyung Kim
2025-11-16 3:29 ` Ian Rogers
0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2025-11-15 17:52 UTC (permalink / raw)
To: Ian Rogers
Cc: James Clark, Arnaldo Carvalho de Melo, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
> On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > > Add support for getting a common set of metrics from a default
> > > table. It simplifies the generation to add json metrics at the same
> > > time. The metrics added are CPUs_utilized, cs_per_second,
> > > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > > stalled_cycles_per_instruction, frontend_cycles_idle,
> > > backend_cycles_idle, cycles_frequency, branch_frequency and
> > > branch_miss_rate based on the shadow metric definitions.
> > >
> > > Following this change the default perf stat output on an alderlake
> > > looks like:
> > > ```
> > > $ perf stat -a -- sleep 2
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > > 77,739 context-switches
> > > 15,033 cpu-migrations
> > > 321,313 page-faults
> > > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > > 32,170,570,388 cpu_core/branches/ (57.39%)
> > > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > > (software) # nan cs/sec cs_per_second
> > > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > > # 19.6 % tma_frontend_bound (63.97%)
> > > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > > # 49.7 % tma_retiring (63.97%)
> > > (software) # nan faults/sec page_faults_per_second
> > > # nan GHz cycles_frequency (42.88%)
> > > # nan GHz cycles_frequency (69.88%)
> > > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > > # 29.9 % tma_retiring (50.07%)
> > > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > > # nan M/sec branch_frequency (70.07%)
> > > # nan migrations/sec migrations_per_second
> > > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > > (software) # 0.0 CPUs CPUs_utilized
> > > # 1.4 instructions insn_per_cycle (43.04%)
> > > # 3.5 instructions insn_per_cycle (69.99%)
> > > # 1.0 % branch_miss_rate (35.46%)
> > > # 0.5 % branch_miss_rate (65.02%)
> > >
> > > 2.005626564 seconds time elapsed
> > > ```
> > >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > > .../arch/common/common/metrics.json | 86 +++++++++++++
> > > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > > tools/perf/pmu-events/jevents.py | 21 +++-
> > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > tools/perf/util/metricgroup.c | 31 +++--
> > > 5 files changed, 212 insertions(+), 42 deletions(-)
> > > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> > >
> > > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > new file mode 100644
> > > index 000000000000..d915be51e300
> > > --- /dev/null
> > > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > @@ -0,0 +1,86 @@
> > > +[
> > > + {
> > > + "BriefDescription": "Average CPU utilization",
> > > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> >
> > Hi Ian,
> >
> > I noticed that this metric is making "perf stat tests" fail.
> > "duration_time" is a tool event and they don't work with "perf stat
> > record" anymore. The test tests the record command with the default args
> > which results in this event being used and a failure.
> >
> > I suppose there are three issues. First two are unrelated to this change:
> >
> > - Perf stat record continues to write out a bad perf.data file even
> > though it knows that tool events won't work.
> >
> > For example 'status' ends up being -1 in cmd_stat() but it's ignored
> > for some of the writing parts. It does decide to not print any stdout
> > though:
> >
> > $ perf stat record -e "duration_time"
> > <blank>
> >
> > - The other issue is obviously that tool events don't work with perf
> > stat record which seems to be a regression from 6828d6929b76 ("perf
> > evsel: Refactor tool events")
> >
> > - The third issue is that this change adds a broken tool event to the
> > default output of perf stat
> >
> > I'm not actually sure what "perf stat record" is for? It's possible that
> > it's not used anymore, especially if nobody noticed that tool events
> > haven't been working in it for a while.
> >
> > I think we're also supposed to have json output for perf stat (although
> > this is also broken in some obscure scenarios), so maybe perf stat
> > record isn't needed anymore?
>
> Hi James,
>
> Thanks for the report. I think this also overlaps with the fact that
> perf stat metrics don't work with perf stat record, and these changes
> made that the default. Let me do some follow-up work, as the perf
> script work shows we can do useful things with metrics while not being
> on a live perf stat - there's the obstacle that the CPUID of the host
> will be used :-/
>
> Anyway, I'll take a look and we should add a test on this. There is
> one that checks the perf stat json output is okay, to some definition. One
> problem is that the stat-display code is complete spaghetti. Now that
> stat-shadow only handles json metrics, and perf script isn't trying to
> maintain a set of shadow counters, that is a little bit improved.
I have another test failure on this. On my AMD machine, perf all
metrics test fails due to missing "LLC-loads" events.
$ sudo perf stat -M llc_miss_rate true
Error:
No supported events found.
The LLC-loads event is not supported.
Maybe we need to make some cache metrics conditional as some events are
missing.
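For illustration (a hypothetical probe, not something this series adds;
it assumes perf reports 'not supported' for such an event), the legacy
event behind a metric could be checked up front like:
```
# Hypothetical check: see whether the legacy LLC-loads event opens on this
# machine before expecting llc_miss_rate to produce a value.
if perf stat -e LLC-loads -- true 2>&1 | grep -qi 'not supported'; then
  echo "LLC-loads unsupported on this CPU; llc_miss_rate will lack events"
fi
```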
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-15 17:52 ` Namhyung Kim
@ 2025-11-16 3:29 ` Ian Rogers
2025-11-18 1:36 ` Namhyung Kim
0 siblings, 1 reply; 33+ messages in thread
From: Ian Rogers @ 2025-11-16 3:29 UTC (permalink / raw)
To: Namhyung Kim
Cc: James Clark, Arnaldo Carvalho de Melo, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Sat, Nov 15, 2025 at 9:52 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
> > On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
> > >
> > >
> > >
> > > On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > > > Add support for getting a common set of metrics from a default
> > > > table. It simplifies the generation to add json metrics at the same
> > > > time. The metrics added are CPUs_utilized, cs_per_second,
> > > > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > > > stalled_cycles_per_instruction, frontend_cycles_idle,
> > > > backend_cycles_idle, cycles_frequency, branch_frequency and
> > > > branch_miss_rate based on the shadow metric definitions.
> > > >
> > > > Following this change the default perf stat output on an alderlake
> > > > looks like:
> > > > ```
> > > > $ perf stat -a -- sleep 2
> > > >
> > > > Performance counter stats for 'system wide':
> > > >
> > > > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > > > 77,739 context-switches
> > > > 15,033 cpu-migrations
> > > > 321,313 page-faults
> > > > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > > > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > > > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > > > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > > > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > > > 32,170,570,388 cpu_core/branches/ (57.39%)
> > > > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > > > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > > > (software) # nan cs/sec cs_per_second
> > > > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > > > # 19.6 % tma_frontend_bound (63.97%)
> > > > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > > > # 49.7 % tma_retiring (63.97%)
> > > > (software) # nan faults/sec page_faults_per_second
> > > > # nan GHz cycles_frequency (42.88%)
> > > > # nan GHz cycles_frequency (69.88%)
> > > > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > > > # 29.9 % tma_retiring (50.07%)
> > > > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > > > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > > > # nan M/sec branch_frequency (70.07%)
> > > > # nan migrations/sec migrations_per_second
> > > > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > > > (software) # 0.0 CPUs CPUs_utilized
> > > > # 1.4 instructions insn_per_cycle (43.04%)
> > > > # 3.5 instructions insn_per_cycle (69.99%)
> > > > # 1.0 % branch_miss_rate (35.46%)
> > > > # 0.5 % branch_miss_rate (65.02%)
> > > >
> > > > 2.005626564 seconds time elapsed
> > > > ```
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > ---
> > > > .../arch/common/common/metrics.json | 86 +++++++++++++
> > > > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > > > tools/perf/pmu-events/jevents.py | 21 +++-
> > > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > > tools/perf/util/metricgroup.c | 31 +++--
> > > > 5 files changed, 212 insertions(+), 42 deletions(-)
> > > > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> > > >
> > > > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > new file mode 100644
> > > > index 000000000000..d915be51e300
> > > > --- /dev/null
> > > > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > @@ -0,0 +1,86 @@
> > > > +[
> > > > + {
> > > > + "BriefDescription": "Average CPU utilization",
> > > > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> > >
> > > Hi Ian,
> > >
> > > I noticed that this metric is making "perf stat tests" fail.
> > > "duration_time" is a tool event and they don't work with "perf stat
> > > record" anymore. The test tests the record command with the default args
> > > which results in this event being used and a failure.
> > >
> > > I suppose there are three issues. First two are unrelated to this change:
> > >
> > > - Perf stat record continues to write out a bad perf.data file even
> > > though it knows that tool events won't work.
> > >
> > > For example 'status' ends up being -1 in cmd_stat() but it's ignored
> > > for some of the writing parts. It does decide to not print any stdout
> > > though:
> > >
> > > $ perf stat record -e "duration_time"
> > > <blank>
> > >
> > > - The other issue is obviously that tool events don't work with perf
> > > stat record which seems to be a regression from 6828d6929b76 ("perf
> > > evsel: Refactor tool events")
> > >
> > > - The third issue is that this change adds a broken tool event to the
> > > default output of perf stat
> > >
> > > I'm not actually sure what "perf stat record" is for? It's possible that
> > > it's not used anymore, especially if nobody noticed that tool events
> > > haven't been working in it for a while.
> > >
> > > I think we're also supposed to have json output for perf stat (although
> > > this is also broken in some obscure scenarios), so maybe perf stat
> > > record isn't needed anymore?
> >
> > Hi James,
> >
> > Thanks for the report. I think this also overlaps with the fact that
> > perf stat metrics don't work with perf stat record, and these changes
> > made that the default. Let me do some follow-up work, as the perf
> > script work shows we can do useful things with metrics while not being
> > on a live perf stat - there's the obstacle that the CPUID of the host
> > will be used :-/
> >
> > Anyway, I'll take a look and we should add a test on this. There is
> > one that checks the perf stat json output is okay, to some definition. One
> > problem is that the stat-display code is complete spaghetti. Now that
> > stat-shadow only handles json metrics, and perf script isn't trying to
> > maintain a set of shadow counters, that is a little bit improved.
>
> I have another test failure on this. On my AMD machine, perf all
> metrics test fails due to missing "LLC-loads" events.
>
> $ sudo perf stat -M llc_miss_rate true
> Error:
> No supported events found.
> The LLC-loads event is not supported.
>
> Maybe we need to make some cache metrics conditional as some events are
> missing.
Maybe we can `perf list Default`, etc. if this is a problem. We have
similar unsupported events in metrics on Intel like:
```
$ perf stat -M itlb_miss_rate -a sleep 1
Performance counter stats for 'system wide':
<not supported> iTLB-loads
168,926 iTLB-load-misses
1.002287122 seconds time elapsed
```
but I've not seen failures:
```
$ perf test -v "all metrics"
103: perf all metrics test : Skip
```
Thanks,
Ian
> Thanks,
> Namhyung
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-16 3:29 ` Ian Rogers
@ 2025-11-18 1:36 ` Namhyung Kim
2025-11-18 2:28 ` Ian Rogers
0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2025-11-18 1:36 UTC (permalink / raw)
To: Ian Rogers
Cc: James Clark, Arnaldo Carvalho de Melo, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Sat, Nov 15, 2025 at 07:29:29PM -0800, Ian Rogers wrote:
> On Sat, Nov 15, 2025 at 9:52 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
> > > On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
> > > >
> > > >
> > > >
> > > > On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > > > > Add support for getting a common set of metrics from a default
> > > > > table. It simplifies the generation to add json metrics at the same
> > > > > time. The metrics added are CPUs_utilized, cs_per_second,
> > > > > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > > > > stalled_cycles_per_instruction, frontend_cycles_idle,
> > > > > backend_cycles_idle, cycles_frequency, branch_frequency and
> > > > > branch_miss_rate based on the shadow metric definitions.
> > > > >
> > > > > Following this change the default perf stat output on an alderlake
> > > > > looks like:
> > > > > ```
> > > > > $ perf stat -a -- sleep 2
> > > > >
> > > > > Performance counter stats for 'system wide':
> > > > >
> > > > > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > > > > 77,739 context-switches
> > > > > 15,033 cpu-migrations
> > > > > 321,313 page-faults
> > > > > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > > > > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > > > > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > > > > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > > > > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > > > > 32,170,570,388 cpu_core/branches/ (57.39%)
> > > > > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > > > > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > > > > (software) # nan cs/sec cs_per_second
> > > > > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > > > > # 19.6 % tma_frontend_bound (63.97%)
> > > > > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > > > > # 49.7 % tma_retiring (63.97%)
> > > > > (software) # nan faults/sec page_faults_per_second
> > > > > # nan GHz cycles_frequency (42.88%)
> > > > > # nan GHz cycles_frequency (69.88%)
> > > > > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > > > > # 29.9 % tma_retiring (50.07%)
> > > > > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > > > > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > > > > # nan M/sec branch_frequency (70.07%)
> > > > > # nan migrations/sec migrations_per_second
> > > > > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > > > > (software) # 0.0 CPUs CPUs_utilized
> > > > > # 1.4 instructions insn_per_cycle (43.04%)
> > > > > # 3.5 instructions insn_per_cycle (69.99%)
> > > > > # 1.0 % branch_miss_rate (35.46%)
> > > > > # 0.5 % branch_miss_rate (65.02%)
> > > > >
> > > > > 2.005626564 seconds time elapsed
> > > > > ```
> > > > >
> > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > ---
> > > > > .../arch/common/common/metrics.json | 86 +++++++++++++
> > > > > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > > > > tools/perf/pmu-events/jevents.py | 21 +++-
> > > > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > > > tools/perf/util/metricgroup.c | 31 +++--
> > > > > 5 files changed, 212 insertions(+), 42 deletions(-)
> > > > > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> > > > >
> > > > > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > new file mode 100644
> > > > > index 000000000000..d915be51e300
> > > > > --- /dev/null
> > > > > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > @@ -0,0 +1,86 @@
> > > > > +[
> > > > > + {
> > > > > + "BriefDescription": "Average CPU utilization",
> > > > > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> > > >
> > > > Hi Ian,
> > > >
> > > > I noticed that this metric is making "perf stat tests" fail.
> > > > "duration_time" is a tool event and they don't work with "perf stat
> > > > record" anymore. The test tests the record command with the default args
> > > > which results in this event being used and a failure.
> > > >
> > > > I suppose there are three issues. First two are unrelated to this change:
> > > >
> > > > - Perf stat record continues to write out a bad perf.data file even
> > > > though it knows that tool events won't work.
> > > >
> > > > For example 'status' ends up being -1 in cmd_stat() but it's ignored
> > > > for some of the writing parts. It does decide to not print any stdout
> > > > though:
> > > >
> > > > $ perf stat record -e "duration_time"
> > > > <blank>
> > > >
> > > > - The other issue is obviously that tool events don't work with perf
> > > > stat record which seems to be a regression from 6828d6929b76 ("perf
> > > > evsel: Refactor tool events")
> > > >
> > > > - The third issue is that this change adds a broken tool event to the
> > > > default output of perf stat
> > > >
> > > > I'm not actually sure what "perf stat record" is for? It's possible that
> > > > it's not used anymore, especially if nobody noticed that tool events
> > > > haven't been working in it for a while.
> > > >
> > > > I think we're also supposed to have json output for perf stat (although
> > > > this is also broken in some obscure scenarios), so maybe perf stat
> > > > record isn't needed anymore?
> > >
> > > Hi James,
> > >
> > > Thanks for the report. I think this also overlaps with the fact that
> > > perf stat metrics don't work with perf stat record, and these changes
> > > made that the default. Let me do some follow-up work, as the perf
> > > script work shows we can do useful things with metrics while not being
> > > on a live perf stat - there's the obstacle that the CPUID of the host
> > > will be used :-/
> > >
> > > Anyway, I'll take a look and we should add a test on this. There is
> > > one that checks the perf stat json output is okay, to some definition. One
> > > problem is that the stat-display code is complete spaghetti. Now that
> > > stat-shadow only handles json metrics, and perf script isn't trying to
> > > maintain a set of shadow counters, that is a little bit improved.
> >
> > I have another test failure on this. On my AMD machine, perf all
> > metrics test fails due to missing "LLC-loads" events.
> >
> > $ sudo perf stat -M llc_miss_rate true
> > Error:
> > No supported events found.
> > The LLC-loads event is not supported.
> >
> > Maybe we need to make some cache metrics conditional as some events are
> > missing.
>
> Maybe we can `perf list Default`, etc. if this is a problem. We have
> similar unsupported events in metrics on Intel like:
>
> ```
> $ perf stat -M itlb_miss_rate -a sleep 1
>
> Performance counter stats for 'system wide':
>
> <not supported> iTLB-loads
> 168,926 iTLB-load-misses
>
> 1.002287122 seconds time elapsed
> ```
>
> but I've not seen failures:
>
> ```
> $ perf test -v "all metrics"
> 103: perf all metrics test : Skip
> ```
$ sudo perf test -v "all metrics"
--- start ---
test child forked, pid 1347112
Testing CPUs_utilized
Testing backend_cycles_idle
Not supported events
Performance counter stats for 'system wide':
  <not counted>    cpu-cycles
  <not supported>  stalled-cycles-backend
  0.013162328 seconds time elapsed
Testing branch_frequency
Testing branch_miss_rate
Testing cs_per_second
Testing cycles_frequency
Testing frontend_cycles_idle
Testing insn_per_cycle
Testing migrations_per_second
Testing page_faults_per_second
Testing stalled_cycles_per_instruction
Testing l1d_miss_rate
Testing llc_miss_rate
Metric contains missing events
Error: No supported events found. The LLC-loads event is not supported.
Testing dtlb_miss_rate
Testing itlb_miss_rate
Testing l1i_miss_rate
Testing l1_prefetch_miss_rate
Not supported events
Performance counter stats for 'system wide':
  <not counted>    L1-dcache-prefetches
  <not supported>  L1-dcache-prefetch-misses
  0.012983559 seconds time elapsed
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing l3_read_miss_latency
Testing l1_itlb_misses
---- end(-1) ----
103: perf all metrics test : FAILED!
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-18 1:36 ` Namhyung Kim
@ 2025-11-18 2:28 ` Ian Rogers
2025-11-18 7:29 ` Namhyung Kim
0 siblings, 1 reply; 33+ messages in thread
From: Ian Rogers @ 2025-11-18 2:28 UTC (permalink / raw)
To: Namhyung Kim
Cc: James Clark, Arnaldo Carvalho de Melo, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Mon, Nov 17, 2025 at 5:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 07:29:29PM -0800, Ian Rogers wrote:
> > On Sat, Nov 15, 2025 at 9:52 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
> > > > On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > > > > > Add support for getting a common set of metrics from a default
> > > > > > table. It simplifies the generation to add json metrics at the same
> > > > > > time. The metrics added are CPUs_utilized, cs_per_second,
> > > > > > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > > > > > stalled_cycles_per_instruction, frontend_cycles_idle,
> > > > > > backend_cycles_idle, cycles_frequency, branch_frequency and
> > > > > > branch_miss_rate based on the shadow metric definitions.
> > > > > >
> > > > > > Following this change the default perf stat output on an alderlake
> > > > > > looks like:
> > > > > > ```
> > > > > > $ perf stat -a -- sleep 2
> > > > > >
> > > > > > Performance counter stats for 'system wide':
> > > > > >
> > > > > > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > > > > > 77,739 context-switches
> > > > > > 15,033 cpu-migrations
> > > > > > 321,313 page-faults
> > > > > > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > > > > > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > > > > > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > > > > > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > > > > > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > > > > > 32,170,570,388 cpu_core/branches/ (57.39%)
> > > > > > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > > > > > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > > > > > (software) # nan cs/sec cs_per_second
> > > > > > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > > > > > # 19.6 % tma_frontend_bound (63.97%)
> > > > > > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > > > > > # 49.7 % tma_retiring (63.97%)
> > > > > > (software) # nan faults/sec page_faults_per_second
> > > > > > # nan GHz cycles_frequency (42.88%)
> > > > > > # nan GHz cycles_frequency (69.88%)
> > > > > > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > > > > > # 29.9 % tma_retiring (50.07%)
> > > > > > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > > > > > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > > > > > # nan M/sec branch_frequency (70.07%)
> > > > > > # nan migrations/sec migrations_per_second
> > > > > > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > > > > > (software) # 0.0 CPUs CPUs_utilized
> > > > > > # 1.4 instructions insn_per_cycle (43.04%)
> > > > > > # 3.5 instructions insn_per_cycle (69.99%)
> > > > > > # 1.0 % branch_miss_rate (35.46%)
> > > > > > # 0.5 % branch_miss_rate (65.02%)
> > > > > >
> > > > > > 2.005626564 seconds time elapsed
> > > > > > ```
> > > > > >
> > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > ---
> > > > > > .../arch/common/common/metrics.json | 86 +++++++++++++
> > > > > > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > > > > > tools/perf/pmu-events/jevents.py | 21 +++-
> > > > > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > > > > tools/perf/util/metricgroup.c | 31 +++--
> > > > > > 5 files changed, 212 insertions(+), 42 deletions(-)
> > > > > > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > >
> > > > > > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > > new file mode 100644
> > > > > > index 000000000000..d915be51e300
> > > > > > --- /dev/null
> > > > > > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > > @@ -0,0 +1,86 @@
> > > > > > +[
> > > > > > + {
> > > > > > + "BriefDescription": "Average CPU utilization",
> > > > > > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> > > > >
> > > > > Hi Ian,
> > > > >
> > > > > I noticed that this metric is making "perf stat tests" fail.
> > > > > "duration_time" is a tool event and they don't work with "perf stat
> > > > > record" anymore. The test tests the record command with the default args
> > > > > which results in this event being used and a failure.
> > > > >
> > > > > I suppose there are three issues. First two are unrelated to this change:
> > > > >
> > > > > - Perf stat record continues to write out a bad perf.data file even
> > > > > though it knows that tool events won't work.
> > > > >
> > > > > For example 'status' ends up being -1 in cmd_stat() but it's ignored
> > > > > for some of the writing parts. It does decide to not print any stdout
> > > > > though:
> > > > >
> > > > > $ perf stat record -e "duration_time"
> > > > > <blank>
> > > > >
> > > > > - The other issue is obviously that tool events don't work with perf
> > > > > stat record which seems to be a regression from 6828d6929b76 ("perf
> > > > > evsel: Refactor tool events")
> > > > >
> > > > > - The third issue is that this change adds a broken tool event to the
> > > > > default output of perf stat
> > > > >
> > > > > I'm not actually sure what "perf stat record" is for? It's possible that
> > > > > it's not used anymore, especially if nobody noticed that tool events
> > > > > haven't been working in it for a while.
> > > > >
> > > > > I think we're also supposed to have json output for perf stat (although
> > > > > this is also broken in some obscure scenarios), so maybe perf stat
> > > > > record isn't needed anymore?
> > > >
> > > > Hi James,
> > > >
> > > > Thanks for the report. I think this is also an overlap with perf stat
> > > > metrics don't work with perf stat record, and because these changes
> > > > made that the default. Let me do some follow up work as the perf
> > > > script work shows we can do useful things with metrics while not being
> > > > on a live perf stat - there's the obstacle that the CPUID of the host
> > > > will be used :-/
> > > >
> > > > Anyway, I'll take a look and we should add a test on this. There is
> > > > one that the perf stat json output is okay, to some definition. One
> > > > problem is that the stat-display code is complete spaghetti. Now that
> > > > stat-shadow only handles json metrics, and perf script isn't trying to
> > > > maintain a set of shadow counters, that is a little bit improved.
> > >
> > > I have another test failure on this. On my AMD machine, perf all
> > > metrics test fails due to missing "LLC-loads" events.
> > >
> > > $ sudo perf stat -M llc_miss_rate true
> > > Error:
> > > No supported events found.
> > > The LLC-loads event is not supported.
> > >
> > > Maybe we need to make some cache metrics conditional as some events are
> > > missing.
> >
> > Maybe we can `perf list Default`, etc. for this is a problem. We have
> > similar unsupported events in metrics on Intel like:
> >
> > ```
> > $ perf stat -M itlb_miss_rate -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > <not supported> iTLB-loads
> > 168,926 iTLB-load-misses
> >
> > 1.002287122 seconds time elapsed
> > ```
> >
> > but I've not seen failures:
> >
> > ```
> > $ perf test -v "all metrics"
> > 103: perf all metrics test : Skip
> > ```
>
> $ sudo perf test -v "all metrics"
> --- start ---
> test child forked, pid 1347112
> Testing CPUs_utilized
> Testing backend_cycles_idle
> Not supported events
> Performance counter stats for 'system wide': <not counted> cpu-cycles <not supported> stalled-cycles-backend 0.013162328 seconds time elapsed
> Testing branch_frequency
> Testing branch_miss_rate
> Testing cs_per_second
> Testing cycles_frequency
> Testing frontend_cycles_idle
> Testing insn_per_cycle
> Testing migrations_per_second
> Testing page_faults_per_second
> Testing stalled_cycles_per_instruction
> Testing l1d_miss_rate
> Testing llc_miss_rate
> Metric contains missing events
> Error: No supported events found. The LLC-loads event is not supported.
Right, but this should match the Intel case as iTLB-loads is an
unsupported event so I'm not sure why we don't see a failure on Intel
but do on AMD given both events are legacy cache ones. I'll need to
trace through the code (or uftrace it :-) ).
Thanks,
Ian
> Testing dtlb_miss_rate
> Testing itlb_miss_rate
> Testing l1i_miss_rate
> Testing l1_prefetch_miss_rate
> Not supported events
> Performance counter stats for 'system wide': <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 0.012983559 seconds time elapsed
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> ---- end(-1) ----
> 103: perf all metrics test : FAILED!
>
> Thanks,
> Namhyung
>
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-18 2:28 ` Ian Rogers
@ 2025-11-18 7:29 ` Namhyung Kim
2025-11-18 10:57 ` James Clark
0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2025-11-18 7:29 UTC (permalink / raw)
To: Ian Rogers
Cc: James Clark, Arnaldo Carvalho de Melo, Peter Zijlstra,
Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Leo Yan
On Mon, Nov 17, 2025 at 06:28:31PM -0800, Ian Rogers wrote:
> On Mon, Nov 17, 2025 at 5:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Sat, Nov 15, 2025 at 07:29:29PM -0800, Ian Rogers wrote:
> > > On Sat, Nov 15, 2025 at 9:52 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
> > > > > On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 11/11/2025 9:21 pm, Ian Rogers wrote:
> > > > > > > Add support for getting a common set of metrics from a default
> > > > > > > table. It simplifies the generation to add json metrics at the same
> > > > > > > time. The metrics added are CPUs_utilized, cs_per_second,
> > > > > > > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > > > > > > stalled_cycles_per_instruction, frontend_cycles_idle,
> > > > > > > backend_cycles_idle, cycles_frequency, branch_frequency and
> > > > > > > branch_miss_rate based on the shadow metric definitions.
> > > > > > >
> > > > > > > Following this change the default perf stat output on an alderlake
> > > > > > > looks like:
> > > > > > > ```
> > > > > > > $ perf stat -a -- sleep 2
> > > > > > >
> > > > > > > Performance counter stats for 'system wide':
> > > > > > >
> > > > > > > 0.00 msec cpu-clock # 0.000 CPUs utilized
> > > > > > > 77,739 context-switches
> > > > > > > 15,033 cpu-migrations
> > > > > > > 321,313 page-faults
> > > > > > > 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
> > > > > > > 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
> > > > > > > 10,263,836,145 cpu_atom/cycles/ (35.42%)
> > > > > > > 39,138,632,894 cpu_core/cycles/ (57.60%)
> > > > > > > 2,989,658,777 cpu_atom/branches/ (42.60%)
> > > > > > > 32,170,570,388 cpu_core/branches/ (57.39%)
> > > > > > > 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
> > > > > > > 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
> > > > > > > (software) # nan cs/sec cs_per_second
> > > > > > > TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
> > > > > > > # 19.6 % tma_frontend_bound (63.97%)
> > > > > > > TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
> > > > > > > # 49.7 % tma_retiring (63.97%)
> > > > > > > (software) # nan faults/sec page_faults_per_second
> > > > > > > # nan GHz cycles_frequency (42.88%)
> > > > > > > # nan GHz cycles_frequency (69.88%)
> > > > > > > TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
> > > > > > > # 29.9 % tma_retiring (50.07%)
> > > > > > > TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
> > > > > > > (cpu_atom) # nan M/sec branch_frequency (43.09%)
> > > > > > > # nan M/sec branch_frequency (70.07%)
> > > > > > > # nan migrations/sec migrations_per_second
> > > > > > > TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
> > > > > > > (software) # 0.0 CPUs CPUs_utilized
> > > > > > > # 1.4 instructions insn_per_cycle (43.04%)
> > > > > > > # 3.5 instructions insn_per_cycle (69.99%)
> > > > > > > # 1.0 % branch_miss_rate (35.46%)
> > > > > > > # 0.5 % branch_miss_rate (65.02%)
> > > > > > >
> > > > > > > 2.005626564 seconds time elapsed
> > > > > > > ```
> > > > > > >
> > > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > > ---
> > > > > > > .../arch/common/common/metrics.json | 86 +++++++++++++
> > > > > > > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > > > > > > tools/perf/pmu-events/jevents.py | 21 +++-
> > > > > > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > > > > > tools/perf/util/metricgroup.c | 31 +++--
> > > > > > > 5 files changed, 212 insertions(+), 42 deletions(-)
> > > > > > > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > > >
> > > > > > > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..d915be51e300
> > > > > > > --- /dev/null
> > > > > > > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > > > > > > @@ -0,0 +1,86 @@
> > > > > > > +[
> > > > > > > + {
> > > > > > > + "BriefDescription": "Average CPU utilization",
> > > > > > > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> > > > > >
> > > > > > Hi Ian,
> > > > > >
> > > > > > I noticed that this metric is making "perf stat tests" fail.
> > > > > > "duration_time" is a tool event and they don't work with "perf stat
> > > > > > record" anymore. The test tests the record command with the default args
> > > > > > which results in this event being used and a failure.
> > > > > >
> > > > > > I suppose there are three issues. First two are unrelated to this change:
> > > > > >
> > > > > > - Perf stat record continues to write out a bad perf.data file even
> > > > > > though it knows that tool events won't work.
> > > > > >
> > > > > > For example 'status' ends up being -1 in cmd_stat() but it's ignored
> > > > > > for some of the writing parts. It does decide to not print any stdout
> > > > > > though:
> > > > > >
> > > > > > $ perf stat record -e "duration_time"
> > > > > > <blank>
> > > > > >
> > > > > > - The other issue is obviously that tool events don't work with perf
> > > > > > stat record which seems to be a regression from 6828d6929b76 ("perf
> > > > > > evsel: Refactor tool events")
> > > > > >
> > > > > > - The third issue is that this change adds a broken tool event to the
> > > > > > default output of perf stat
> > > > > >
> > > > > > I'm not actually sure what "perf stat record" is for? It's possible that
> > > > > > it's not used anymore, especially if nobody noticed that tool events
> > > > > > haven't been working in it for a while.
> > > > > >
> > > > > > I think we're also supposed to have json output for perf stat (although
> > > > > > this is also broken in some obscure scenarios), so maybe perf stat
> > > > > > record isn't needed anymore?
> > > > >
> > > > > Hi James,
> > > > >
> > > > > Thanks for the report. I think this is also an overlap with perf stat
> > > > > metrics don't work with perf stat record, and because these changes
> > > > > made that the default. Let me do some follow up work as the perf
> > > > > script work shows we can do useful things with metrics while not being
> > > > > on a live perf stat - there's the obstacle that the CPUID of the host
> > > > > will be used :-/
> > > > >
> > > > > Anyway, I'll take a look and we should add a test on this. There is
> > > > > one that the perf stat json output is okay, to some definition. One
> > > > > problem is that the stat-display code is complete spaghetti. Now that
> > > > > stat-shadow only handles json metrics, and perf script isn't trying to
> > > > > maintain a set of shadow counters, that is a little bit improved.
> > > >
> > > > I have another test failure on this. On my AMD machine, perf all
> > > > metrics test fails due to missing "LLC-loads" events.
> > > >
> > > > $ sudo perf stat -M llc_miss_rate true
> > > > Error:
> > > > No supported events found.
> > > > The LLC-loads event is not supported.
> > > >
> > > > Maybe we need to make some cache metrics conditional as some events are
> > > > missing.
> > >
> > > Maybe we can `perf list Default`, etc. for this is a problem. We have
> > > similar unsupported events in metrics on Intel like:
> > >
> > > ```
> > > $ perf stat -M itlb_miss_rate -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > <not supported> iTLB-loads
> > > 168,926 iTLB-load-misses
> > >
> > > 1.002287122 seconds time elapsed
> > > ```
> > >
> > > but I've not seen failures:
> > >
> > > ```
> > > $ perf test -v "all metrics"
> > > 103: perf all metrics test : Skip
> > > ```
> >
> > $ sudo perf test -v "all metrics"
> > --- start ---
> > test child forked, pid 1347112
> > Testing CPUs_utilized
> > Testing backend_cycles_idle
> > Not supported events
> > Performance counter stats for 'system wide': <not counted> cpu-cycles <not supported> stalled-cycles-backend 0.013162328 seconds time elapsed
> > Testing branch_frequency
> > Testing branch_miss_rate
> > Testing cs_per_second
> > Testing cycles_frequency
> > Testing frontend_cycles_idle
> > Testing insn_per_cycle
> > Testing migrations_per_second
> > Testing page_faults_per_second
> > Testing stalled_cycles_per_instruction
> > Testing l1d_miss_rate
> > Testing llc_miss_rate
> > Metric contains missing events
> > Error: No supported events found. The LLC-loads event is not supported.
>
> Right, but this should match the Intel case as iTLB-loads is an
> unsupported event so I'm not sure why we don't see a failure on Intel
> but do on AMD given both events are legacy cache ones. I'll need to
> trace through the code (or uftrace it :-) ).
^^^^^^^^^^^^^^^^^
That'd be fun! ;-)
Thanks,
Namhyung
* Re: [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones
2025-11-18 7:29 ` Namhyung Kim
@ 2025-11-18 10:57 ` James Clark
0 siblings, 0 replies; 33+ messages in thread
From: James Clark @ 2025-11-18 10:57 UTC (permalink / raw)
To: Namhyung Kim, Ian Rogers
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Xu Yang,
Chun-Tse Shao, Thomas Richter, Sumanth Korikkar, Collin Funk,
Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun, Yang Li,
linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang, Leo Yan
On 18/11/2025 7:29 am, Namhyung Kim wrote:
> On Mon, Nov 17, 2025 at 06:28:31PM -0800, Ian Rogers wrote:
>> On Mon, Nov 17, 2025 at 5:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>>
>>> On Sat, Nov 15, 2025 at 07:29:29PM -0800, Ian Rogers wrote:
>>>> On Sat, Nov 15, 2025 at 9:52 AM Namhyung Kim <namhyung@kernel.org> wrote:
>>>>>
>>>>> On Fri, Nov 14, 2025 at 08:57:39AM -0800, Ian Rogers wrote:
>>>>>> On Fri, Nov 14, 2025 at 8:28 AM James Clark <james.clark@linaro.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 11/11/2025 9:21 pm, Ian Rogers wrote:
>>>>>>>> Add support for getting a common set of metrics from a default
>>>>>>>> table. It simplifies the generation to add json metrics at the same
>>>>>>>> time. The metrics added are CPUs_utilized, cs_per_second,
>>>>>>>> migrations_per_second, page_faults_per_second, insn_per_cycle,
>>>>>>>> stalled_cycles_per_instruction, frontend_cycles_idle,
>>>>>>>> backend_cycles_idle, cycles_frequency, branch_frequency and
>>>>>>>> branch_miss_rate based on the shadow metric definitions.
>>>>>>>>
>>>>>>>> Following this change the default perf stat output on an alderlake
>>>>>>>> looks like:
>>>>>>>> ```
>>>>>>>> $ perf stat -a -- sleep 2
>>>>>>>>
>>>>>>>> Performance counter stats for 'system wide':
>>>>>>>>
>>>>>>>> 0.00 msec cpu-clock # 0.000 CPUs utilized
>>>>>>>> 77,739 context-switches
>>>>>>>> 15,033 cpu-migrations
>>>>>>>> 321,313 page-faults
>>>>>>>> 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%)
>>>>>>>> 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%)
>>>>>>>> 10,263,836,145 cpu_atom/cycles/ (35.42%)
>>>>>>>> 39,138,632,894 cpu_core/cycles/ (57.60%)
>>>>>>>> 2,989,658,777 cpu_atom/branches/ (42.60%)
>>>>>>>> 32,170,570,388 cpu_core/branches/ (57.39%)
>>>>>>>> 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%)
>>>>>>>> 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%)
>>>>>>>> (software) # nan cs/sec cs_per_second
>>>>>>>> TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation
>>>>>>>> # 19.6 % tma_frontend_bound (63.97%)
>>>>>>>> TopdownL1 (cpu_core) # 18.8 % tma_backend_bound
>>>>>>>> # 49.7 % tma_retiring (63.97%)
>>>>>>>> (software) # nan faults/sec page_faults_per_second
>>>>>>>> # nan GHz cycles_frequency (42.88%)
>>>>>>>> # nan GHz cycles_frequency (69.88%)
>>>>>>>> TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation
>>>>>>>> # 29.9 % tma_retiring (50.07%)
>>>>>>>> TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%)
>>>>>>>> (cpu_atom) # nan M/sec branch_frequency (43.09%)
>>>>>>>> # nan M/sec branch_frequency (70.07%)
>>>>>>>> # nan migrations/sec migrations_per_second
>>>>>>>> TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%)
>>>>>>>> (software) # 0.0 CPUs CPUs_utilized
>>>>>>>> # 1.4 instructions insn_per_cycle (43.04%)
>>>>>>>> # 3.5 instructions insn_per_cycle (69.99%)
>>>>>>>> # 1.0 % branch_miss_rate (35.46%)
>>>>>>>> # 0.5 % branch_miss_rate (65.02%)
>>>>>>>>
>>>>>>>> 2.005626564 seconds time elapsed
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>>>>>>> ---
>>>>>>>> .../arch/common/common/metrics.json | 86 +++++++++++++
>>>>>>>> tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
>>>>>>>> tools/perf/pmu-events/jevents.py | 21 +++-
>>>>>>>> tools/perf/pmu-events/pmu-events.h | 1 +
>>>>>>>> tools/perf/util/metricgroup.c | 31 +++--
>>>>>>>> 5 files changed, 212 insertions(+), 42 deletions(-)
>>>>>>>> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>>>>>>>>
>>>>>>>> diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..d915be51e300
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
>>>>>>>> @@ -0,0 +1,86 @@
>>>>>>>> +[
>>>>>>>> + {
>>>>>>>> + "BriefDescription": "Average CPU utilization",
>>>>>>>> + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
>>>>>>>
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>> I noticed that this metric is making "perf stat tests" fail.
>>>>>>> "duration_time" is a tool event and they don't work with "perf stat
>>>>>>> record" anymore. The test tests the record command with the default args
>>>>>>> which results in this event being used and a failure.
>>>>>>>
>>>>>>> I suppose there are three issues. First two are unrelated to this change:
>>>>>>>
>>>>>>> - Perf stat record continues to write out a bad perf.data file even
>>>>>>> though it knows that tool events won't work.
>>>>>>>
>>>>>>> For example 'status' ends up being -1 in cmd_stat() but it's ignored
>>>>>>> for some of the writing parts. It does decide to not print any stdout
>>>>>>> though:
>>>>>>>
>>>>>>> $ perf stat record -e "duration_time"
>>>>>>> <blank>
>>>>>>>
>>>>>>> - The other issue is obviously that tool events don't work with perf
>>>>>>> stat record which seems to be a regression from 6828d6929b76 ("perf
>>>>>>> evsel: Refactor tool events")
>>>>>>>
>>>>>>> - The third issue is that this change adds a broken tool event to the
>>>>>>> default output of perf stat
>>>>>>>
>>>>>>> I'm not actually sure what "perf stat record" is for? It's possible that
>>>>>>> it's not used anymore, especially if nobody noticed that tool events
>>>>>>> haven't been working in it for a while.
>>>>>>>
>>>>>>> I think we're also supposed to have json output for perf stat (although
>>>>>>> this is also broken in some obscure scenarios), so maybe perf stat
>>>>>>> record isn't needed anymore?
>>>>>>
>>>>>> Hi James,
>>>>>>
>>>>>> Thanks for the report. I think this is also an overlap with perf stat
>>>>>> metrics don't work with perf stat record, and because these changes
>>>>>> made that the default. Let me do some follow up work as the perf
>>>>>> script work shows we can do useful things with metrics while not being
>>>>>> on a live perf stat - there's the obstacle that the CPUID of the host
>>>>>> will be used :-/
>>>>>>
>>>>>> Anyway, I'll take a look and we should add a test on this. There is
>>>>>> one that the perf stat json output is okay, to some definition. One
>>>>>> problem is that the stat-display code is complete spaghetti. Now that
>>>>>> stat-shadow only handles json metrics, and perf script isn't trying to
>>>>>> maintain a set of shadow counters, that is a little bit improved.
>>>>>
>>>>> I have another test failure on this. On my AMD machine, perf all
>>>>> metrics test fails due to missing "LLC-loads" events.
>>>>>
>>>>> $ sudo perf stat -M llc_miss_rate true
>>>>> Error:
>>>>> No supported events found.
>>>>> The LLC-loads event is not supported.
>>>>>
>>>>> Maybe we need to make some cache metrics conditional as some events are
>>>>> missing.
>>>>
>>>> Maybe we can `perf list Default`, etc. for this is a problem. We have
>>>> similar unsupported events in metrics on Intel like:
>>>>
>>>> ```
>>>> $ perf stat -M itlb_miss_rate -a sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>> <not supported> iTLB-loads
>>>> 168,926 iTLB-load-misses
>>>>
>>>> 1.002287122 seconds time elapsed
>>>> ```
>>>>
>>>> but I've not seen failures:
>>>>
>>>> ```
>>>> $ perf test -v "all metrics"
>>>> 103: perf all metrics test : Skip
>>>> ```
>>>
>>> $ sudo perf test -v "all metrics"
>>> --- start ---
>>> test child forked, pid 1347112
>>> Testing CPUs_utilized
>>> Testing backend_cycles_idle
>>> Not supported events
>>> Performance counter stats for 'system wide': <not counted> cpu-cycles <not supported> stalled-cycles-backend 0.013162328 seconds time elapsed
>>> Testing branch_frequency
>>> Testing branch_miss_rate
>>> Testing cs_per_second
>>> Testing cycles_frequency
>>> Testing frontend_cycles_idle
>>> Testing insn_per_cycle
>>> Testing migrations_per_second
>>> Testing page_faults_per_second
>>> Testing stalled_cycles_per_instruction
>>> Testing l1d_miss_rate
>>> Testing llc_miss_rate
>>> Metric contains missing events
>>> Error: No supported events found. The LLC-loads event is not supported.
>>
>> Right, but this should match the Intel case as iTLB-loads is an
>> unsupported event so I'm not sure why we don't see a failure on Intel
>> but do on AMD given both events are legacy cache ones. I'll need to
>> trace through the code (or uftrace it :-) ).
> ^^^^^^^^^^^^^^^^^
> That'd be fun! ;-)
>
> Thanks,
> Namhyung
>
There is the same "LLC-loads event is not supported" issue with this
test on Arm too. (But it's from patch 5 rather than 3, just for the
avoidance of confusion).
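A quick way to check whether a given machine maps the legacy LLC-loads cache
event at all is to try opening it directly with perf_event_open(). The sketch
below is illustrative only and not part of the thread; the errno returned for
an unmapped generic cache event can vary by PMU driver, so treat the failure
message as a hint rather than a verdict.
```
/*
 * Illustrative probe, not part of the series: does the kernel map the
 * legacy LLC-loads event (LL cache / read / access) on this machine?
 * Build with: gcc -o llc_probe llc_probe.c
 */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HW_CACHE;
	/* config = cache-id | (op-id << 8) | (result-id << 16) */
	attr.config = PERF_COUNT_HW_CACHE_LL |
		      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
		      (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16);
	attr.disabled = 1;

	fd = syscall(SYS_perf_event_open, &attr, /*pid=*/0, /*cpu=*/-1,
		     /*group_fd=*/-1, /*flags=*/0);
	if (fd < 0)
		printf("LLC-loads: open failed (%s), likely unmapped on this PMU\n",
		       strerror(errno));
	else
		printf("LLC-loads: opened fine (fd=%d)\n", fd);
	return 0;
}
```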
* [PATCH v4 04/18] perf jevents: Add metric DefaultShowEvents
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (2 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 03/18] perf jevents: Add set of common metrics based on default ones Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 05/18] perf stat: Add detail -d,-dd,-ddd metrics Ian Rogers
` (16 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Some Default group metrics need their events shown for consistency
with perf's previous behavior. Add a flag to indicate when this is the
case and use it in stat-display.
As the events now come from Default metrics, remove the default
hardware and software events from perf stat.
Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
20,550 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation
# 28.1 % tma_frontend_bound
TopdownL1 (cpu_core) # 29.2 % tma_backend_bound
# 33.7 % tma_retiring
6,685 page-faults # nan faults/sec page_faults_per_second
790,091,064 cpu_atom/cpu-cycles/
# nan GHz cycles_frequency (49.83%)
2,563,918,366 cpu_core/cpu-cycles/
# nan GHz cycles_frequency
# 12.3 % tma_bad_speculation
# 14.5 % tma_retiring (50.20%)
# 33.8 % tma_frontend_bound (50.24%)
76,390,322 cpu_atom/branches/ # nan M/sec branch_frequency (60.20%)
1,015,173,047 cpu_core/branches/ # nan M/sec branch_frequency
1,325 cpu-migrations # nan migrations/sec migrations_per_second
# 39.3 % tma_backend_bound (60.17%)
0.00 msec cpu-clock # 0.000 CPUs utilized
# 0.0 CPUs CPUs_utilized
554,347,072 cpu_atom/instructions/ # 0.64 insn per cycle
# 0.6 instructions insn_per_cycle (60.14%)
5,228,931,991 cpu_core/instructions/ # 2.04 insn per cycle
# 2.0 instructions insn_per_cycle
4,308,874 cpu_atom/branch-misses/ # 5.65% of all branches
# 5.6 % branch_miss_rate (49.76%)
9,890,606 cpu_core/branch-misses/ # 0.97% of all branches
# 1.0 % branch_miss_rate
1.005477803 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 42 +------
.../arch/common/common/metrics.json | 33 ++++--
tools/perf/pmu-events/empty-pmu-events.c | 106 +++++++++---------
tools/perf/pmu-events/jevents.py | 7 +-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/metricgroup.c | 13 +++
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat-shadow.c | 2 +-
9 files changed, 102 insertions(+), 107 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3c46b92a53ab..31c762695d4b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1857,16 +1857,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
-/* Add given software event to evlist without wildcarding. */
-static int parse_software_event(struct evlist *evlist, const char *event,
- struct parse_events_error *err)
-{
- char buf[256];
-
- snprintf(buf, sizeof(buf), "software/%s,name=%s/", event, event);
- return parse_events(evlist, buf, err);
-}
-
/* Add legacy hardware/hardware-cache event to evlist for all core PMUs without wildcarding. */
static int parse_hardware_event(struct evlist *evlist, const char *event,
struct parse_events_error *err)
@@ -2011,36 +2001,10 @@ static int add_default_events(void)
stat_config.topdown_level = 1;
if (!evlist->core.nr_entries && !evsel_list->core.nr_entries) {
- /* No events so add defaults. */
- const char *sw_events[] = {
- target__has_cpu(&target) ? "cpu-clock" : "task-clock",
- "context-switches",
- "cpu-migrations",
- "page-faults",
- };
- const char *hw_events[] = {
- "instructions",
- "cycles",
- "stalled-cycles-frontend",
- "stalled-cycles-backend",
- "branches",
- "branch-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(sw_events); i++) {
- ret = parse_software_event(evlist, sw_events[i], &err);
- if (ret)
- goto out;
- }
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
-
/*
- * Add TopdownL1 metrics if they exist. To minimize
- * multiplexing, don't request threshold computation.
+ * Add Default metrics. To minimize multiplexing, don't request
+ * threshold computation, but it will be computed if the events
+ * are present.
*/
if (metricgroup__has_metric_or_groups(pmu, "Default")) {
struct evlist *metric_evlist = evlist__new();
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index d915be51e300..d6ea967a4045 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -5,7 +5,8 @@
"MetricGroup": "Default",
"MetricName": "CPUs_utilized",
"ScaleUnit": "1CPUs",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Context switches per CPU second",
@@ -13,7 +14,8 @@
"MetricGroup": "Default",
"MetricName": "cs_per_second",
"ScaleUnit": "1cs/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Process migrations to a new CPU per CPU second",
@@ -21,7 +23,8 @@
"MetricGroup": "Default",
"MetricName": "migrations_per_second",
"ScaleUnit": "1migrations/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Page faults per CPU second",
@@ -29,7 +32,8 @@
"MetricGroup": "Default",
"MetricName": "page_faults_per_second",
"ScaleUnit": "1faults/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Instructions Per Cycle",
@@ -37,27 +41,31 @@
"MetricGroup": "Default",
"MetricName": "insn_per_cycle",
"MetricThreshold": "insn_per_cycle < 1",
- "ScaleUnit": "1instructions"
+ "ScaleUnit": "1instructions",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
- "MetricName": "stalled_cycles_per_instruction"
+ "MetricName": "stalled_cycles_per_instruction",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
- "MetricThreshold": "frontend_cycles_idle > 0.1"
+ "MetricThreshold": "frontend_cycles_idle > 0.1",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Backend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
- "MetricThreshold": "backend_cycles_idle > 0.2"
+ "MetricThreshold": "backend_cycles_idle > 0.2",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Cycles per CPU second",
@@ -65,7 +73,8 @@
"MetricGroup": "Default",
"MetricName": "cycles_frequency",
"ScaleUnit": "1GHz",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Branches per CPU second",
@@ -73,7 +82,8 @@
"MetricGroup": "Default",
"MetricName": "branch_frequency",
"ScaleUnit": "1000M/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Branch miss rate",
@@ -81,6 +91,7 @@
"MetricGroup": "Default",
"MetricName": "branch_miss_rate",
"MetricThreshold": "branch_miss_rate > 0.05",
- "ScaleUnit": "100%"
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
}
]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index e4d00f6b2b5d..333a44930910 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1303,32 +1303,32 @@ static const char *const big_c_string =
/* offset=127519 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
/* offset=127596 */ "uncore_sys_cmn_pmu\000"
/* offset=127615 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
-/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
-/* offset=127943 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
-/* offset=128175 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
-/* offset=128434 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
-/* offset=128664 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
-/* offset=128776 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
-/* offset=128939 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
-/* offset=129068 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
-/* offset=129193 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
-/* offset=129368 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001"
-/* offset=129547 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
-/* offset=129650 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
-/* offset=129672 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
-/* offset=129735 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
-/* offset=129901 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=129965 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
-/* offset=130103 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
-/* offset=130197 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
-/* offset=130331 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
-/* offset=130395 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=130463 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=130533 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
-/* offset=130555 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
-/* offset=130577 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
-/* offset=130597 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
+/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011"
+/* offset=127944 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011"
+/* offset=128177 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
+/* offset=128437 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
+/* offset=128668 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
+/* offset=128781 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
+/* offset=128945 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129075 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129201 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
+/* offset=129377 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
+/* offset=129557 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
+/* offset=129661 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=129684 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=129748 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=129915 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=129980 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130048 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=130120 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=130215 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=130350 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=130415 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=130484 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=130555 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=130578 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=130601 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=130622 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2615,17 +2615,17 @@ static const struct pmu_table_entry pmu_events__common[] = {
};
static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
-{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
-{ 129068 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
-{ 129368 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\00001 */
-{ 129547 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
-{ 127943 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
-{ 129193 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
-{ 128939 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
-{ 128664 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
-{ 128175 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
-{ 128434 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
-{ 128776 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
+{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
+{ 129075 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
+{ 129377 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
+{ 129557 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
+{ 127944 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
+{ 129201 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 128945 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
+{ 128668 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
+{ 128177 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
+{ 128437 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
+{ 128781 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
};
@@ -2698,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 129650 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
-{ 130331 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
-{ 130103 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
-{ 130197 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
-{ 130395 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 130463 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 129735 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
-{ 129672 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
-{ 130597 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
-{ 130533 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
-{ 130555 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
-{ 130577 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
-{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
-{ 129901 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
-{ 129965 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129661 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 130350 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 130120 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 130215 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 130415 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130484 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 129748 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 129684 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 130622 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 130555 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 130578 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 130601 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 130048 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 129915 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 129980 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
@@ -2894,6 +2894,8 @@ static void decompress_metric(int offset, struct pmu_metric *pm)
pm->aggr_mode = *p - '0';
p++;
pm->event_grouping = *p - '0';
+ p++;
+ pm->default_show_events = *p - '0';
}
static int pmu_events_table__for_each_event_pmu(const struct pmu_events_table *table,
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 5d3f4b44cfb7..3413ee5d0227 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -58,10 +58,12 @@ _json_event_attributes = [
_json_metric_attributes = [
'metric_name', 'metric_group', 'metric_expr', 'metric_threshold',
'desc', 'long_desc', 'unit', 'compat', 'metricgroup_no_group',
- 'default_metricgroup_name', 'aggr_mode', 'event_grouping'
+ 'default_metricgroup_name', 'aggr_mode', 'event_grouping',
+ 'default_show_events'
]
# Attributes that are bools or enum int values, encoded as '0', '1',...
-_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
+_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg',
+ 'default_show_events']
def removesuffix(s: str, suffix: str) -> str:
"""Remove the suffix from a string
@@ -356,6 +358,7 @@ class JsonEvent:
self.metricgroup_no_group = jd.get('MetricgroupNoGroup')
self.default_metricgroup_name = jd.get('DefaultMetricgroupName')
self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
+ self.default_show_events = jd.get('DefaultShowEvents')
self.metric_expr = None
if 'MetricExpr' in jd:
self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 559265a903c8..d3b24014c6ff 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -74,6 +74,7 @@ struct pmu_metric {
const char *default_metricgroup_name;
enum aggr_mode_class aggr_mode;
enum metric_event_groups event_grouping;
+ bool default_show_events;
};
struct pmu_events_table;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 71f74c7036ef..3ae4ac8f9a37 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -122,6 +122,7 @@ struct evsel {
bool reset_group;
bool needs_auxtrace_mmap;
bool default_metricgroup; /* A member of the Default metricgroup */
+ bool default_show_events; /* If a default group member, show the event */
bool needs_uniquify;
struct hashmap *per_pkg_mask;
int err;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index e67e04ce01c9..25c75fdbfc52 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -152,6 +152,8 @@ struct metric {
* Should events of the metric be grouped?
*/
bool group_events;
+ /** Show events even if in the Default metric group. */
+ bool default_show_events;
/**
* Parsed events for the metric. Optional as events may be taken from a
* different metric whose group contains all the IDs necessary for this
@@ -255,6 +257,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,
m->pctx->sctx.runtime = runtime;
m->pctx->sctx.system_wide = system_wide;
m->group_events = !metric_no_group && metric__group_events(pm, metric_no_threshold);
+ m->default_show_events = pm->default_show_events;
m->metric_refs = NULL;
m->evlist = NULL;
@@ -1513,6 +1516,16 @@ static int parse_groups(struct evlist *perf_evlist,
free(metric_events);
goto out;
}
+ if (m->default_show_events) {
+ struct evsel *pos;
+
+ for (int i = 0; metric_events[i]; i++)
+ metric_events[i]->default_show_events = true;
+ evlist__for_each_entry(metric_evlist, pos) {
+ if (pos->metric_leader && pos->metric_leader->default_show_events)
+ pos->default_show_events = true;
+ }
+ }
expr->metric_threshold = m->metric_threshold;
expr->metric_unit = m->metric_unit;
expr->metric_events = metric_events;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index a67b991f4e81..4d0e353846ea 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -872,7 +872,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
out.ctx = os;
out.force_header = false;
- if (!config->metric_only && !counter->default_metricgroup) {
+ if (!config->metric_only && (!counter->default_metricgroup || counter->default_show_events)) {
abs_printout(config, os, os->id, os->aggr_nr, counter, uval, ok);
print_noise(config, os, counter, noise, /*before_metric=*/true);
@@ -880,7 +880,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
}
if (ok) {
- if (!config->metric_only && counter->default_metricgroup) {
+ if (!config->metric_only && counter->default_metricgroup && !counter->default_show_events) {
void *from = NULL;
aggr_printout(config, os, os->evsel, os->id, os->aggr_nr);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index abaf6b579bfc..4df614f8e200 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -665,7 +665,7 @@ void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
if (strcmp(name, mexp->default_metricgroup_name))
return (void *)mexp;
/* Only print the name of the metricgroup once */
- if (!header_printed) {
+ if (!header_printed && !evsel->default_show_events) {
header_printed = true;
perf_stat__print_metricgroup_header(config, evsel, ctxp,
name, out);
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 05/18] perf stat: Add detail -d,-dd,-ddd metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (3 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 04/18] perf jevents: Add metric DefaultShowEvents Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 06/18] perf script: Change metric format to use json metrics Ian Rogers
` (15 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Add json metrics for the -d, -dd and -ddd detail events and their hard
coded stat-shadow metrics. Remove the hard coded detail events as these
now come from the metrics.
Following this change a detailed perf stat output looks like:
```
$ perf stat -a -ddd -- sleep 1
Performance counter stats for 'system wide':
21,089 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 14.1 % tma_bad_speculation
# 27.3 % tma_frontend_bound (30.56%)
TopdownL1 (cpu_core) # 31.5 % tma_backend_bound
# 27.2 % tma_retiring (30.56%)
6,302 page-faults # nan faults/sec page_faults_per_second
928,495,163 cpu_atom/cpu-cycles/
# nan GHz cycles_frequency (28.41%)
1,841,409,834 cpu_core/cpu-cycles/
# nan GHz cycles_frequency (38.51%)
# 14.5 % tma_bad_speculation
# 16.0 % tma_retiring (28.41%)
# 36.8 % tma_frontend_bound (35.57%)
100,859,118 cpu_atom/branches/ # nan M/sec branch_frequency (42.73%)
572,657,734 cpu_core/branches/ # nan M/sec branch_frequency (54.43%)
1,527 cpu-migrations # nan migrations/sec migrations_per_second
# 32.7 % tma_backend_bound (42.73%)
0.00 msec cpu-clock # 0.000 CPUs utilized
# 0.0 CPUs CPUs_utilized
498,668,509 cpu_atom/instructions/ # 0.57 insn per cycle
# 0.6 instructions insn_per_cycle (42.97%)
3,281,762,225 cpu_core/instructions/ # 1.84 insn per cycle
# 1.8 instructions insn_per_cycle (62.20%)
4,919,511 cpu_atom/branch-misses/ # 5.43% of all branches
# 5.4 % branch_miss_rate (35.80%)
7,431,776 cpu_core/branch-misses/ # 1.39% of all branches
# 1.4 % branch_miss_rate (62.20%)
2,517,007 cpu_atom/LLC-loads/ # 0.1 % llc_miss_rate (28.62%)
3,931,318 cpu_core/LLC-loads/ # 40.4 % llc_miss_rate (45.98%)
14,918,674 cpu_core/L1-dcache-load-misses/ # 2.25% of all L1-dcache accesses
# nan % l1d_miss_rate (37.80%)
27,067,264 cpu_atom/L1-icache-load-misses/ # 15.92% of all L1-icache accesses
# 15.9 % l1i_miss_rate (21.47%)
116,848,994 cpu_atom/dTLB-loads/ # 0.8 % dtlb_miss_rate (21.47%)
764,870,407 cpu_core/dTLB-loads/ # 0.1 % dtlb_miss_rate (15.12%)
1.006181526 seconds time elapsed
```
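The added Default metric group names map onto the detail levels: plain
perf stat uses Default, -d adds Default2, -dd adds Default3 and -ddd adds
Default4 (see the loop over default_metricgroup_names below). The groups
should also be usable directly with -M; a sketch of such an invocation,
assuming the json above has been built in:
```
# Hypothetical invocation: request only the -dd level cache/TLB metrics.
$ perf stat -M Default3 -a -- sleep 1
```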
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 100 +++---------------
.../arch/common/common/metrics.json | 54 ++++++++++
tools/perf/pmu-events/empty-pmu-events.c | 72 +++++++------
3 files changed, 113 insertions(+), 113 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 31c762695d4b..7862094b93c8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1857,28 +1857,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
-/* Add legacy hardware/hardware-cache event to evlist for all core PMUs without wildcarding. */
-static int parse_hardware_event(struct evlist *evlist, const char *event,
- struct parse_events_error *err)
-{
- char buf[256];
- struct perf_pmu *pmu = NULL;
-
- while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
- int ret;
-
- if (perf_pmus__num_core_pmus() == 1)
- snprintf(buf, sizeof(buf), "%s/%s,name=%s/", pmu->name, event, event);
- else
- snprintf(buf, sizeof(buf), "%s/%s/", pmu->name, event);
-
- ret = parse_events(evlist, buf, err);
- if (ret)
- return ret;
- }
- return 0;
-}
-
/*
* Add default events, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -2006,22 +1984,34 @@ static int add_default_events(void)
* threshold computation, but it will be computed if the events
* are present.
*/
- if (metricgroup__has_metric_or_groups(pmu, "Default")) {
- struct evlist *metric_evlist = evlist__new();
+ const char *default_metricgroup_names[] = {
+ "Default", "Default2", "Default3", "Default4",
+ };
+
+ for (size_t i = 0; i < ARRAY_SIZE(default_metricgroup_names); i++) {
+ struct evlist *metric_evlist;
+
+ if (!metricgroup__has_metric_or_groups(pmu, default_metricgroup_names[i]))
+ continue;
+
+ if ((int)i > detailed_run)
+ break;
+ metric_evlist = evlist__new();
if (!metric_evlist) {
ret = -ENOMEM;
- goto out;
+ break;
}
- if (metricgroup__parse_groups(metric_evlist, pmu, "Default",
+ if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i],
/*metric_no_group=*/false,
/*metric_no_merge=*/false,
/*metric_no_threshold=*/true,
stat_config.user_requested_cpu_list,
stat_config.system_wide,
stat_config.hardware_aware_grouping) < 0) {
+ evlist__delete(metric_evlist);
ret = -1;
- goto out;
+ break;
}
evlist__for_each_entry(metric_evlist, evsel)
@@ -2034,62 +2024,6 @@ static int add_default_events(void)
evlist__delete(metric_evlist);
}
}
-
- /* Detailed events get appended to the event list: */
-
- if (!ret && detailed_run >= 1) {
- /*
- * Detailed stats (-d), covering the L1 and last level data
- * caches:
- */
- const char *hw_events[] = {
- "L1-dcache-loads",
- "L1-dcache-load-misses",
- "LLC-loads",
- "LLC-load-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
- if (!ret && detailed_run >= 2) {
- /*
- * Very detailed stats (-d -d), covering the instruction cache
- * and the TLB caches:
- */
- const char *hw_events[] = {
- "L1-icache-loads",
- "L1-icache-load-misses",
- "dTLB-loads",
- "dTLB-load-misses",
- "iTLB-loads",
- "iTLB-load-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
- if (!ret && detailed_run >= 3) {
- /*
- * Very, very detailed stats (-d -d -d), adding prefetch events:
- */
- const char *hw_events[] = {
- "L1-dcache-prefetches",
- "L1-dcache-prefetch-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
out:
if (!ret) {
evlist__for_each_entry(evlist, evsel) {
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index d6ea967a4045..0d010b3ebc6d 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -93,5 +93,59 @@
"MetricThreshold": "branch_miss_rate > 0.05",
"ScaleUnit": "100%",
"DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1D miss rate",
+ "MetricExpr": "L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads",
+ "MetricGroup": "Default2",
+ "MetricName": "l1d_miss_rate",
+ "MetricThreshold": "l1d_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "LLC miss rate",
+ "MetricExpr": "LLC\\-load\\-misses / LLC\\-loads",
+ "MetricGroup": "Default2",
+ "MetricName": "llc_miss_rate",
+ "MetricThreshold": "llc_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1I miss rate",
+ "MetricExpr": "L1\\-icache\\-load\\-misses / L1\\-icache\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "l1i_miss_rate",
+ "MetricThreshold": "l1i_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "dTLB miss rate",
+ "MetricExpr": "dTLB\\-load\\-misses / dTLB\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "dtlb_miss_rate",
+ "MetricThreshold": "dtlb_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "iTLB miss rate",
+ "MetricExpr": "iTLB\\-load\\-misses / iTLB\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "itlb_miss_rate",
+ "MetricThreshold": "itlb_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1 prefetch miss rate",
+ "MetricExpr": "L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches",
+ "MetricGroup": "Default4",
+ "MetricName": "l1_prefetch_miss_rate",
+ "MetricThreshold": "l1_prefetch_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
}
]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 333a44930910..7fa42f13300f 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1314,21 +1314,27 @@ static const char *const big_c_string =
/* offset=129201 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
/* offset=129377 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
/* offset=129557 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
-/* offset=129661 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
-/* offset=129684 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
-/* offset=129748 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
-/* offset=129915 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=129980 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130048 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
-/* offset=130120 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
-/* offset=130215 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
-/* offset=130350 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
-/* offset=130415 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=130484 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=130555 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
-/* offset=130578 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
-/* offset=130601 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
-/* offset=130622 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
+/* offset=129661 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
+/* offset=129777 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
+/* offset=129878 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
+/* offset=129993 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130099 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130205 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
+/* offset=130353 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=130376 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=130440 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=130607 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130672 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130740 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=130812 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=130907 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=131042 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=131107 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131176 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131247 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=131270 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=131293 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=131314 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2621,8 +2627,14 @@ static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
{ 129557 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
{ 127944 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
{ 129201 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 129993 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
{ 128945 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
{ 128668 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
+{ 130099 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
+{ 130205 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
+{ 129661 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
+{ 129878 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
+{ 129777 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
{ 128177 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
{ 128437 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
{ 128781 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
@@ -2698,21 +2710,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 129661 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
-{ 130350 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
-{ 130120 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
-{ 130215 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
-{ 130415 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 130484 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 129748 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
-{ 129684 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
-{ 130622 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
-{ 130555 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
-{ 130578 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
-{ 130601 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
-{ 130048 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
-{ 129915 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
-{ 129980 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130353 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 131042 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 130812 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 130907 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 131107 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 131176 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130440 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 130376 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 131314 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 131247 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 131270 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 131293 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 130740 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 130607 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130672 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 06/18] perf script: Change metric format to use json metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (4 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 05/18] perf stat: Add detail -d,-dd,-ddd metrics Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 07/18] perf stat: Remove hard coded shadow metrics Ian Rogers
` (14 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
The metric format option isn't properly supported. This change
improves that by making the sample events update the counts of an
evsel, which is where the shadow metric code expects to read the
values. To support printing metrics, the metrics first need to be
found. This is done on the first attempt to print a metric: every
metric is parsed and the evsels in the metric's evlist are compared
to those in perf script's evlist using the perf_event_attr type and
config. If the metric matches then it is added for printing. As an
event in perf script's evlist may have more than one metric id, or a
different leader for aggregation, only the first matched metric will
be displayed in those cases.
An example use is:
```
$ perf record -a -e '{instructions,cpu-cycles}:S' -a -- sleep 1
$ perf script -F period,metric
...
867817
metric: 0.30 insn per cycle
125394
metric: 0.04 insn per cycle
313516
metric: 0.11 insn per cycle
metric: 1.00 insn per cycle
```
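When a metric's events cannot be matched against those in the perf.data
file, the sample simply gets no metric line; the reasons are reported via
pr_debug, so running with -v surfaces them. A sketch of how that might be
checked (the message text comes from the pr_debug calls below):
```
# Show which metrics were skipped during the one-time metric matching.
$ perf script -v -F period,metric 2>&1 | grep 'Skipping metric'
```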
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-script.c | 252 ++++++++++++++++++++++++++++++++----
1 file changed, 230 insertions(+), 22 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d813adbf9889..8bab5b264f61 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,6 +33,7 @@
#include "util/path.h"
#include "util/event.h"
#include "util/mem-info.h"
+#include "util/metricgroup.h"
#include "ui/ui.h"
#include "print_binary.h"
#include "print_insn.h"
@@ -341,9 +342,6 @@ struct evsel_script {
char *filename;
FILE *fp;
u64 samples;
- /* For metric output */
- u64 val;
- int gnum;
};
static inline struct evsel_script *evsel_script(struct evsel *evsel)
@@ -2132,13 +2130,161 @@ static void script_new_line(struct perf_stat_config *config __maybe_unused,
fputs("\tmetric: ", mctx->fp);
}
-static void perf_sample__fprint_metric(struct perf_script *script,
- struct thread *thread,
+struct script_find_metrics_args {
+ struct evlist *evlist;
+ bool system_wide;
+};
+
+static struct evsel *map_metric_evsel_to_script_evsel(struct evlist *script_evlist,
+ struct evsel *metric_evsel)
+{
+ struct evsel *script_evsel;
+
+ evlist__for_each_entry(script_evlist, script_evsel) {
+ /* Skip if perf_event_attr differ. */
+ if (metric_evsel->core.attr.type != script_evsel->core.attr.type)
+ continue;
+ if (metric_evsel->core.attr.config != script_evsel->core.attr.config)
+ continue;
+ /* Skip if the script event has a metric_id that doesn't match. */
+ if (script_evsel->metric_id &&
+ strcmp(evsel__metric_id(metric_evsel), evsel__metric_id(script_evsel))) {
+ pr_debug("Skipping matching evsel due to differing metric ids '%s' vs '%s'\n",
+ evsel__metric_id(metric_evsel), evsel__metric_id(script_evsel));
+ continue;
+ }
+ return script_evsel;
+ }
+ return NULL;
+}
+
+static int script_find_metrics(const struct pmu_metric *pm,
+ const struct pmu_metrics_table *table __maybe_unused,
+ void *data)
+{
+ struct script_find_metrics_args *args = data;
+ struct evlist *script_evlist = args->evlist;
+ struct evlist *metric_evlist = evlist__new();
+ struct evsel *metric_evsel;
+ int ret = metricgroup__parse_groups(metric_evlist,
+ /*pmu=*/"all",
+ pm->metric_name,
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ /*user_requested_cpu_list=*/NULL,
+ args->system_wide,
+ /*hardware_aware_grouping=*/false);
+
+ if (ret) {
+ /* Metric parsing failed but continue the search. */
+ goto out;
+ }
+
+ /*
+ * Check the script_evlist has an entry for each metric_evlist entry. If
+ * the script evsel was already set up avoid changing data that may
+ * break it.
+ */
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ struct evsel *script_evsel =
+ map_metric_evsel_to_script_evsel(script_evlist, metric_evsel);
+ struct evsel *new_metric_leader;
+
+ if (!script_evsel) {
+ pr_debug("Skipping metric '%s' as evsel '%s' / '%s' is missing\n",
+ pm->metric_name, evsel__name(metric_evsel),
+ evsel__metric_id(metric_evsel));
+ goto out;
+ }
+
+ if (script_evsel->metric_leader == NULL)
+ continue;
+
+ if (metric_evsel->metric_leader == metric_evsel) {
+ new_metric_leader = script_evsel;
+ } else {
+ new_metric_leader =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ metric_evsel->metric_leader);
+ }
+ /* Mismatching evsel leaders. */
+ if (script_evsel->metric_leader != new_metric_leader) {
+ pr_debug("Skipping metric '%s' due to mismatching evsel metric leaders '%s' vs '%s'\n",
+ pm->metric_name, evsel__metric_id(metric_evsel),
+ evsel__metric_id(script_evsel));
+ goto out;
+ }
+ }
+ /*
+ * Metric events match those in the script evlist, copy metric evsel
+ * data into the script evlist.
+ */
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ struct evsel *script_evsel =
+ map_metric_evsel_to_script_evsel(script_evlist, metric_evsel);
+ struct metric_event *metric_me = metricgroup__lookup(&metric_evlist->metric_events,
+ metric_evsel,
+ /*create=*/false);
+
+ if (script_evsel->metric_id == NULL) {
+ script_evsel->metric_id = metric_evsel->metric_id;
+ metric_evsel->metric_id = NULL;
+ }
+
+ if (script_evsel->metric_leader == NULL) {
+ if (metric_evsel->metric_leader == metric_evsel) {
+ script_evsel->metric_leader = script_evsel;
+ } else {
+ script_evsel->metric_leader =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ metric_evsel->metric_leader);
+ }
+ }
+
+ if (metric_me) {
+ struct metric_expr *expr;
+ struct metric_event *script_me =
+ metricgroup__lookup(&script_evlist->metric_events,
+ script_evsel,
+ /*create=*/true);
+
+ if (!script_me) {
+ /*
+ * As the metric_expr is created, the only
+ * failure is a lack of memory.
+ */
+ goto out;
+ }
+ list_splice_init(&metric_me->head, &script_me->head);
+ list_for_each_entry(expr, &script_me->head, nd) {
+ for (int i = 0; expr->metric_events[i]; i++) {
+ expr->metric_events[i] =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ expr->metric_events[i]);
+ }
+ }
+ }
+ }
+ pr_debug("Found metric '%s' whose evsels match those of in the perf data\n",
+ pm->metric_name);
+ evlist__delete(metric_evlist);
+out:
+ return 0;
+}
+
+static struct aggr_cpu_id script_aggr_cpu_id_get(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return aggr_cpu_id__global(cpu, /*data=*/NULL);
+}
+
+static void perf_sample__fprint_metric(struct thread *thread,
struct evsel *evsel,
struct perf_sample *sample,
FILE *fp)
{
- struct evsel *leader = evsel__leader(evsel);
+ static bool init_metrics;
struct perf_stat_output_ctx ctx = {
.print_metric = script_print_metric,
.new_line = script_new_line,
@@ -2150,23 +2296,85 @@ static void perf_sample__fprint_metric(struct perf_script *script,
},
.force_header = false,
};
- struct evsel *ev2;
- u64 val;
+ struct perf_counts_values *count, *old_count;
+ int cpu_map_idx, thread_map_idx, aggr_idx;
+ struct evsel *pos;
+
+ if (!init_metrics) {
+ /* One time initialization of stat_config and metric data. */
+ struct script_find_metrics_args args = {
+ .evlist = evsel->evlist,
+ .system_wide = perf_thread_map__pid(evsel->core.threads, /*idx=*/0) == -1,
+
+ };
+ if (!stat_config.output)
+ stat_config.output = stdout;
+
+ if (!stat_config.aggr_map) {
+ /* TODO: currently only global aggregation is supported. */
+ assert(stat_config.aggr_mode == AGGR_GLOBAL);
+ stat_config.aggr_get_id = script_aggr_cpu_id_get;
+ stat_config.aggr_map =
+ cpu_aggr_map__new(evsel->evlist->core.user_requested_cpus,
+ aggr_cpu_id__global, /*data=*/NULL,
+ /*needs_sort=*/false);
+ }
+
+ metricgroup__for_each_metric(pmu_metrics_table__find(), script_find_metrics, &args);
+ init_metrics = true;
+ }
+
+ if (!evsel->stats) {
+ if (evlist__alloc_stats(&stat_config, evsel->evlist, /*alloc_raw=*/true) < 0)
+ return;
+ }
+ if (!evsel->stats->aggr) {
+ if (evlist__alloc_aggr_stats(evsel->evlist, stat_config.aggr_map->nr) < 0)
+ return;
+ }
- if (!evsel->stats)
- evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
- if (evsel_script(leader)->gnum++ == 0)
- perf_stat__reset_shadow_stats();
- val = sample->period * evsel->scale;
- evsel_script(evsel)->val = val;
- if (evsel_script(leader)->gnum == leader->core.nr_members) {
- for_each_group_member (ev2, leader) {
- perf_stat__print_shadow_stats(&stat_config, ev2,
- evsel_script(ev2)->val,
- sample->cpu,
- &ctx);
+ /* Update the evsel's count using the sample's data. */
+ cpu_map_idx = perf_cpu_map__idx(evsel->core.cpus, (struct perf_cpu){sample->cpu});
+ if (cpu_map_idx < 0) {
+ /* Missing CPU, check for any CPU. */
+ if (perf_cpu_map__cpu(evsel->core.cpus, /*idx=*/0).cpu == -1 ||
+ sample->cpu == (u32)-1) {
+ /* Place the counts in whichever CPU is first in the map. */
+ cpu_map_idx = 0;
+ } else {
+ pr_info("Missing CPU map entry for CPU %d\n", sample->cpu);
+ return;
+ }
+ }
+ thread_map_idx = perf_thread_map__idx(evsel->core.threads, sample->tid);
+ if (thread_map_idx < 0) {
+ /* Missing thread, check for any thread. */
+ if (perf_thread_map__pid(evsel->core.threads, /*idx=*/0) == -1 ||
+ sample->tid == (u32)-1) {
+ /* Place the counts in whichever thread is first in the map. */
+ thread_map_idx = 0;
+ } else {
+ pr_info("Missing thread map entry for thread %d\n", sample->tid);
+ return;
+ }
+ }
+ count = perf_counts(evsel->counts, cpu_map_idx, thread_map_idx);
+ old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread_map_idx);
+ count->val = old_count->val + sample->period;
+ count->run = old_count->run + 1;
+ count->ena = old_count->ena + 1;
+
+ /* Update the aggregated stats. */
+ perf_stat_process_counter(&stat_config, evsel);
+
+ /* Display all metrics. */
+ evlist__for_each_entry(evsel->evlist, pos) {
+ cpu_aggr_map__for_each_idx(aggr_idx, stat_config.aggr_map) {
+ perf_stat__print_shadow_stats(&stat_config, pos,
+ count->val,
+ aggr_idx,
+ &ctx);
}
- evsel_script(leader)->gnum = 0;
}
}
@@ -2348,7 +2556,7 @@ static void process_event(struct perf_script *script,
}
if (PRINT_FIELD(METRIC))
- perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+ perf_sample__fprint_metric(thread, evsel, sample, fp);
if (verbose > 0)
fflush(fp);
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 07/18] perf stat: Remove hard coded shadow metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (5 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 06/18] perf script: Change metric format to use json metrics Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 08/18] perf stat: Fix default metricgroup display on hybrid Ian Rogers
` (13 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Now that the metrics are encoded in common json, the hard coded
printing means the metrics are shown twice. Remove the hard coded
version.
This means that when specifying events, and those events correspond to
a hard coded metric, the metric will no longer be displayed. The
metric will still be displayed if it is explicitly requested. The ad
hoc printing in the previous approach was often found frustrating; the
new approach avoids this.
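For example, the insn-per-cycle shadow metric is no longer printed just
because instructions and cpu-cycles were specified; it appears when the
corresponding json metric is requested. A sketch, assuming the common
insn_per_cycle metric added in the earlier patches:
```
# Events alone no longer trigger the hard coded IPC shadow metric.
$ perf stat -e instructions,cpu-cycles -a -- sleep 1
# Requesting the json metric still shows it.
$ perf stat -M insn_per_cycle -a -- sleep 1
```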
The default perf stat output on an alderlake now looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
19,697 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation
# 24.9 % tma_frontend_bound
TopdownL1 (cpu_core) # 34.3 % tma_backend_bound
# 30.1 % tma_retiring
6,593 page-faults # nan faults/sec page_faults_per_second
729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%)
1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
# 19.7 % tma_bad_speculation
# 14.2 % tma_retiring (50.14%)
# 37.3 % tma_frontend_bound (50.31%)
87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%)
512,046,956 cpu_core/branches/ # nan M/sec branch_frequency
1,111 cpu-migrations # nan migrations/sec migrations_per_second
# 28.8 % tma_backend_bound (60.26%)
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%)
2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle
3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%)
7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate
1.006621701 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-script.c | 1 -
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat-shadow.c | 392 +--------------------------------
tools/perf/util/stat.h | 2 +-
4 files changed, 6 insertions(+), 393 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8bab5b264f61..cf0040bbaba9 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2371,7 +2371,6 @@ static void perf_sample__fprint_metric(struct thread *thread,
evlist__for_each_entry(evsel->evlist, pos) {
cpu_aggr_map__for_each_idx(aggr_idx, stat_config.aggr_map) {
perf_stat__print_shadow_stats(&stat_config, pos,
- count->val,
aggr_idx,
&ctx);
}
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4d0e353846ea..eabeab5e6614 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -902,7 +902,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
&num, from, &out);
} while (from != NULL);
} else {
- perf_stat__print_shadow_stats(config, counter, uval, aggr_idx, &out);
+ perf_stat__print_shadow_stats(config, counter, aggr_idx, &out);
}
} else {
pm(config, os, METRIC_THRESHOLD_UNKNOWN, /*format=*/NULL, /*unit=*/NULL, /*val=*/0);
@@ -1274,7 +1274,7 @@ static void print_metric_headers(struct perf_stat_config *config,
os.evsel = counter;
- perf_stat__print_shadow_stats(config, counter, 0, 0, &out);
+ perf_stat__print_shadow_stats(config, counter, /*aggr_idx=*/0, &out);
}
if (!config->json_output)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 4df614f8e200..afbc49e8cb31 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -20,357 +20,12 @@
struct stats walltime_nsecs_stats;
struct rusage_stats ru_stats;
-enum {
- CTX_BIT_USER = 1 << 0,
- CTX_BIT_KERNEL = 1 << 1,
- CTX_BIT_HV = 1 << 2,
- CTX_BIT_HOST = 1 << 3,
- CTX_BIT_IDLE = 1 << 4,
- CTX_BIT_MAX = 1 << 5,
-};
-
-enum stat_type {
- STAT_NONE = 0,
- STAT_NSECS,
- STAT_CYCLES,
- STAT_INSTRUCTIONS,
- STAT_STALLED_CYCLES_FRONT,
- STAT_STALLED_CYCLES_BACK,
- STAT_BRANCHES,
- STAT_BRANCH_MISS,
- STAT_CACHE_REFS,
- STAT_CACHE_MISSES,
- STAT_L1_DCACHE,
- STAT_L1_ICACHE,
- STAT_LL_CACHE,
- STAT_ITLB_CACHE,
- STAT_DTLB_CACHE,
- STAT_L1D_MISS,
- STAT_L1I_MISS,
- STAT_LL_MISS,
- STAT_DTLB_MISS,
- STAT_ITLB_MISS,
- STAT_MAX
-};
-
-static int evsel_context(const struct evsel *evsel)
-{
- int ctx = 0;
-
- if (evsel->core.attr.exclude_kernel)
- ctx |= CTX_BIT_KERNEL;
- if (evsel->core.attr.exclude_user)
- ctx |= CTX_BIT_USER;
- if (evsel->core.attr.exclude_hv)
- ctx |= CTX_BIT_HV;
- if (evsel->core.attr.exclude_host)
- ctx |= CTX_BIT_HOST;
- if (evsel->core.attr.exclude_idle)
- ctx |= CTX_BIT_IDLE;
-
- return ctx;
-}
-
void perf_stat__reset_shadow_stats(void)
{
memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
memset(&ru_stats, 0, sizeof(ru_stats));
}
-static enum stat_type evsel__stat_type(struct evsel *evsel)
-{
- /* Fake perf_hw_cache_op_id values for use with evsel__match. */
- u64 PERF_COUNT_hw_cache_l1d_miss = PERF_COUNT_HW_CACHE_L1D |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_l1i_miss = PERF_COUNT_HW_CACHE_L1I |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_ll_miss = PERF_COUNT_HW_CACHE_LL |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_dtlb_miss = PERF_COUNT_HW_CACHE_DTLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_itlb_miss = PERF_COUNT_HW_CACHE_ITLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
-
- if (evsel__is_clock(evsel))
- return STAT_NSECS;
- else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES))
- return STAT_CYCLES;
- else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS))
- return STAT_INSTRUCTIONS;
- else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
- return STAT_STALLED_CYCLES_FRONT;
- else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND))
- return STAT_STALLED_CYCLES_BACK;
- else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- return STAT_BRANCHES;
- else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES))
- return STAT_BRANCH_MISS;
- else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES))
- return STAT_CACHE_REFS;
- else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES))
- return STAT_CACHE_MISSES;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D))
- return STAT_L1_DCACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I))
- return STAT_L1_ICACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL))
- return STAT_LL_CACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB))
- return STAT_DTLB_CACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB))
- return STAT_ITLB_CACHE;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss))
- return STAT_L1D_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss))
- return STAT_L1I_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss))
- return STAT_LL_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss))
- return STAT_DTLB_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss))
- return STAT_ITLB_MISS;
- return STAT_NONE;
-}
-
-static enum metric_threshold_classify get_ratio_thresh(const double ratios[3], double val)
-{
- assert(ratios[0] > ratios[1]);
- assert(ratios[1] > ratios[2]);
-
- return val > ratios[1]
- ? (val > ratios[0] ? METRIC_THRESHOLD_BAD : METRIC_THRESHOLD_NEARLY_BAD)
- : (val > ratios[2] ? METRIC_THRESHOLD_LESS_GOOD : METRIC_THRESHOLD_GOOD);
-}
-
-static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type)
-{
- struct evsel *cur;
- int evsel_ctx = evsel_context(evsel);
- struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
-
- evlist__for_each_entry(evsel->evlist, cur) {
- struct perf_stat_aggr *aggr;
-
- /* Ignore the evsel that is being searched from. */
- if (evsel == cur)
- continue;
-
- /* Ignore evsels that are part of different groups. */
- if (evsel->core.leader->nr_members > 1 &&
- evsel->core.leader != cur->core.leader)
- continue;
- /* Ignore evsels with mismatched modifiers. */
- if (evsel_ctx != evsel_context(cur))
- continue;
- /* Ignore if not the cgroup we're looking for. */
- if (evsel->cgrp != cur->cgrp)
- continue;
- /* Ignore if not the stat we're looking for. */
- if (type != evsel__stat_type(cur))
- continue;
-
- /*
- * Except the SW CLOCK events,
- * ignore if not the PMU we're looking for.
- */
- if ((type != STAT_NSECS) && (evsel_pmu != evsel__find_pmu(cur)))
- continue;
-
- aggr = &cur->stats->aggr[aggr_idx];
- if (type == STAT_NSECS)
- return aggr->counts.val;
- return aggr->counts.val * cur->scale;
- }
- return 0.0;
-}
-
-static void print_ratio(struct perf_stat_config *config,
- const struct evsel *evsel, int aggr_idx,
- double numerator, struct perf_stat_output_ctx *out,
- enum stat_type denominator_type,
- const double thresh_ratios[3], const char *_unit)
-{
- double denominator = find_stat(evsel, aggr_idx, denominator_type);
- double ratio = 0;
- enum metric_threshold_classify thresh = METRIC_THRESHOLD_UNKNOWN;
- const char *fmt = NULL;
- const char *unit = NULL;
-
- if (numerator && denominator) {
- ratio = numerator / denominator * 100.0;
- thresh = get_ratio_thresh(thresh_ratios, ratio);
- fmt = "%7.2f%%";
- unit = _unit;
- }
- out->print_metric(config, out->ctx, thresh, fmt, unit, ratio);
-}
-
-static void print_stalled_cycles_front(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double stalled,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {50.0, 30.0, 10.0};
-
- print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, thresh_ratios,
- "frontend cycles idle");
-}
-
-static void print_stalled_cycles_back(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double stalled,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {75.0, 50.0, 20.0};
-
- print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, thresh_ratios,
- "backend cycles idle");
-}
-
-static void print_branch_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_BRANCHES, thresh_ratios,
- "of all branches");
-}
-
-static void print_l1d_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_DCACHE, thresh_ratios,
- "of all L1-dcache accesses");
-}
-
-static void print_l1i_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_ICACHE, thresh_ratios,
- "of all L1-icache accesses");
-}
-
-static void print_ll_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_LL_CACHE, thresh_ratios,
- "of all LL-cache accesses");
-}
-
-static void print_dtlb_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_DTLB_CACHE, thresh_ratios,
- "of all dTLB cache accesses");
-}
-
-static void print_itlb_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_ITLB_CACHE, thresh_ratios,
- "of all iTLB cache accesses");
-}
-
-static void print_cache_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_CACHE_REFS, thresh_ratios,
- "of all cache refs");
-}
-
-static void print_instructions(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double instructions,
- struct perf_stat_output_ctx *out)
-{
- print_metric_t print_metric = out->print_metric;
- void *ctxp = out->ctx;
- double cycles = find_stat(evsel, aggr_idx, STAT_CYCLES);
- double max_stalled = max(find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_FRONT),
- find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_BACK));
-
- if (cycles) {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%7.2f ",
- "insn per cycle", instructions / cycles);
- } else {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "insn per cycle", 0);
- }
- if (max_stalled && instructions) {
- if (out->new_line)
- out->new_line(config, ctxp);
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%7.2f ",
- "stalled cycles per insn", max_stalled / instructions);
- }
-}
-
-static void print_cycles(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double cycles,
- struct perf_stat_output_ctx *out)
-{
- double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
-
- if (cycles && nsecs) {
- double ratio = cycles / nsecs;
-
- out->print_metric(config, out->ctx, METRIC_THRESHOLD_UNKNOWN, "%8.3f",
- "GHz", ratio);
- } else {
- out->print_metric(config, out->ctx, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "GHz", 0);
- }
-}
-
-static void print_nsecs(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx __maybe_unused, double nsecs,
- struct perf_stat_output_ctx *out)
-{
- print_metric_t print_metric = out->print_metric;
- void *ctxp = out->ctx;
- double wall_time = avg_stats(&walltime_nsecs_stats);
-
- if (wall_time) {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%8.3f", "CPUs utilized",
- nsecs / (wall_time * evsel->scale));
- } else {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "CPUs utilized", 0);
- }
-}
-
static int prepare_metric(const struct metric_expr *mexp,
const struct evsel *evsel,
struct expr_parse_ctx *pctx,
@@ -682,56 +337,15 @@ void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int aggr_idx,
+ int aggr_idx,
struct perf_stat_output_ctx *out)
{
- typedef void (*stat_print_function_t)(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out);
- static const stat_print_function_t stat_print_function[STAT_MAX] = {
- [STAT_INSTRUCTIONS] = print_instructions,
- [STAT_BRANCH_MISS] = print_branch_miss,
- [STAT_L1D_MISS] = print_l1d_miss,
- [STAT_L1I_MISS] = print_l1i_miss,
- [STAT_DTLB_MISS] = print_dtlb_miss,
- [STAT_ITLB_MISS] = print_itlb_miss,
- [STAT_LL_MISS] = print_ll_miss,
- [STAT_CACHE_MISSES] = print_cache_miss,
- [STAT_STALLED_CYCLES_FRONT] = print_stalled_cycles_front,
- [STAT_STALLED_CYCLES_BACK] = print_stalled_cycles_back,
- [STAT_CYCLES] = print_cycles,
- [STAT_NSECS] = print_nsecs,
- };
print_metric_t print_metric = out->print_metric;
void *ctxp = out->ctx;
- int num = 1;
+ int num = 0;
- if (config->iostat_run) {
+ if (config->iostat_run)
iostat_print_metric(config, evsel, out);
- } else {
- stat_print_function_t fn = stat_print_function[evsel__stat_type(evsel)];
-
- if (fn)
- fn(config, evsel, aggr_idx, avg, out);
- else {
- double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
-
- if (nsecs) {
- char unit = ' ';
- char unit_buf[10] = "/sec";
- double ratio = convert_unit_double(1000000000.0 * avg / nsecs,
- &unit);
-
- if (unit != ' ')
- snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%8.3f",
- unit_buf, ratio);
- } else {
- num = 0;
- }
- }
- }
perf_stat__print_shadow_stats_metricgroup(config, evsel, aggr_idx,
&num, NULL, out);
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 230474f49315..b42da4a29c44 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -184,7 +184,7 @@ struct perf_stat_output_ctx {
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int aggr_idx,
+ int aggr_idx,
struct perf_stat_output_ctx *out);
bool perf_stat__skip_metric_event(struct evsel *evsel, u64 ena, u64 run);
void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 08/18] perf stat: Fix default metricgroup display on hybrid
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (6 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 07/18] perf stat: Remove hard coded shadow metrics Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 09/18] perf stat: Sort default events/metrics Ian Rogers
` (12 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
The logic to skip output of a default metric line was firing on
Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the
need_full_name check as it is equivalent to the different-PMU test in
the cases we care about, merge the 'if's and invert the sense of the
PMU comparison. The 'if' now effectively says: if the output matches
the last printed output then skip it.
Before:
```
TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation
# 24.3 % tma_frontend_bound
TopdownL1 (cpu_core) # 33.9 % tma_backend_bound
# 30.6 % tma_retiring
# 42.2 % tma_backend_bound
# 25.0 % tma_frontend_bound (49.81%)
# 12.8 % tma_bad_speculation
# 20.0 % tma_retiring (59.46%)
```
After:
```
TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation
# 43.7 % tma_frontend_bound
# 30.7 % tma_backend_bound
# 17.2 % tma_retiring
TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound
# 37.6 % tma_frontend_bound (49.66%)
# 18.0 % tma_bad_speculation
# 12.6 % tma_retiring (59.58%)
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-shadow.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index afbc49e8cb31..c1547128c396 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -256,11 +256,9 @@ static void perf_stat__print_metricgroup_header(struct perf_stat_config *config,
* event. Only align with other metics from
* different metric events.
*/
- if (last_name && !strcmp(last_name, name)) {
- if (!need_full_name || last_pmu != evsel->pmu) {
- out->print_metricgroup_header(config, ctxp, NULL);
- return;
- }
+ if (last_name && !strcmp(last_name, name) && last_pmu == evsel->pmu) {
+ out->print_metricgroup_header(config, ctxp, NULL);
+ return;
}
if (need_full_name && evsel->pmu)
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 09/18] perf stat: Sort default events/metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (7 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 08/18] perf stat: Fix default metricgroup display on hybrid Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 10/18] perf stat: Remove "unit" workarounds for metric-only Ian Rogers
` (11 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
To improve the readability of default events/metrics, sort the evsels
after the Default metric groups have been parsed.
Before:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
22,087 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation
# 25.8 % tma_frontend_bound
# 34.5 % tma_backend_bound
# 29.3 % tma_retiring
7,829 page-faults # nan faults/sec page_faults_per_second
880,144,270 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.10%)
1,693,081,235 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
TopdownL1 (cpu_atom) # 20.5 % tma_bad_speculation
# 13.8 % tma_retiring (50.26%)
# 34.6 % tma_frontend_bound (50.23%)
89,326,916 cpu_atom/branches/ # nan M/sec branch_frequency (60.19%)
538,123,088 cpu_core/branches/ # nan M/sec branch_frequency
1,368 cpu-migrations # nan migrations/sec migrations_per_second
# 31.1 % tma_backend_bound (60.19%)
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
485,744,856 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (59.87%)
3,093,112,283 cpu_core/instructions/ # 1.8 instructions insn_per_cycle
4,939,427 cpu_atom/branch-misses/ # 5.0 % branch_miss_rate (49.77%)
7,632,248 cpu_core/branch-misses/ # 1.4 % branch_miss_rate
1.005084693 seconds time elapsed
```
After:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
22,165 context-switches # nan cs/sec cs_per_second
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
2,260 cpu-migrations # nan migrations/sec migrations_per_second
20,476 page-faults # nan faults/sec page_faults_per_second
17,052,357 cpu_core/branch-misses/ # 1.5 % branch_miss_rate
1,120,090,590 cpu_core/branches/ # nan M/sec branch_frequency
3,402,892,275 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
6,129,236,701 cpu_core/instructions/ # 1.8 instructions insn_per_cycle
6,159,523 cpu_atom/branch-misses/ # 3.1 % branch_miss_rate (49.86%)
222,158,812 cpu_atom/branches/ # nan M/sec branch_frequency (50.25%)
1,547,610,244 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.40%)
1,304,901,260 cpu_atom/instructions/ # 0.8 instructions insn_per_cycle (50.41%)
TopdownL1 (cpu_core) # 13.7 % tma_bad_speculation
# 23.5 % tma_frontend_bound
# 33.3 % tma_backend_bound
# 29.6 % tma_retiring
TopdownL1 (cpu_atom) # 32.1 % tma_backend_bound (59.65%)
# 30.1 % tma_frontend_bound (59.51%)
# 22.3 % tma_bad_speculation
# 15.5 % tma_retiring (59.53%)
1.008405429 seconds time elapsed
```
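The key precedence used by default_evlist_evsel_cmp() below can be
modelled with a plain qsort() comparator. The following is only a
standalone sketch with invented struct/field names and arbitrary sample
data, and it omits the group-leader handling, so it is not the perf
code itself:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for an evsel; all field names are invented. */
struct fake_evsel {
	int default_metricgroup;  /* backs a Default metricgroup metric */
	int default_show_events;  /* is a default "show" event */
	int pmu_type;             /* lower values are legacy/core PMUs */
	const char *name;
};

/* Same key precedence as the list_sort comparator in the patch below. */
static int fake_evsel_cmp(const void *l, const void *r)
{
	const struct fake_evsel *lhs = l, *rhs = r;

	if (lhs->default_metricgroup != rhs->default_metricgroup)
		return lhs->default_metricgroup ? -1 : 1;
	if (lhs->default_show_events != rhs->default_show_events)
		return lhs->default_show_events ? -1 : 1;
	if (lhs->pmu_type != rhs->pmu_type)
		return lhs->pmu_type - rhs->pmu_type;
	return strcmp(lhs->name, rhs->name);
}

int main(void)
{
	/* Arbitrary example data, not real evsel flag assignments. */
	struct fake_evsel evsels[] = {
		{ 0, 0, 4, "cpu_core/instructions/" },
		{ 1, 0, 1, "cpu-clock" },
		{ 0, 0, 4, "cpu_core/branches/" },
		{ 0, 1, 1, "context-switches" },
	};

	qsort(evsels, sizeof(evsels) / sizeof(evsels[0]), sizeof(evsels[0]),
	      fake_evsel_cmp);
	for (unsigned int i = 0; i < sizeof(evsels) / sizeof(evsels[0]); i++)
		printf("%s\n", evsels[i].name);
	return 0;
}
```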
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7862094b93c8..095016b2209e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -74,6 +74,7 @@
#include "util/intel-tpebs.h"
#include "asm/bug.h"
+#include <linux/list_sort.h>
#include <linux/time64.h>
#include <linux/zalloc.h>
#include <api/fs/fs.h>
@@ -1857,6 +1858,35 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
+static int default_evlist_evsel_cmp(void *priv __maybe_unused,
+ const struct list_head *l,
+ const struct list_head *r)
+{
+ const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
+ const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
+ const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
+ const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+
+ if (evsel__leader(lhs) == evsel__leader(rhs)) {
+ /* Within the same group, respect the original order. */
+ return lhs_core->idx - rhs_core->idx;
+ }
+
+ /* Sort default metrics evsels first, and default show events before those. */
+ if (lhs->default_metricgroup != rhs->default_metricgroup)
+ return lhs->default_metricgroup ? -1 : 1;
+
+ if (lhs->default_show_events != rhs->default_show_events)
+ return lhs->default_show_events ? -1 : 1;
+
+ /* Sort by PMU type (prefers legacy types first). */
+ if (lhs->pmu != rhs->pmu)
+ return lhs->pmu->type - rhs->pmu->type;
+
+ /* Sort by name. */
+ return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
+}
+
/*
* Add default events, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -2023,6 +2053,8 @@ static int add_default_events(void)
&metric_evlist->metric_events);
evlist__delete(metric_evlist);
}
+ list_sort(/*priv=*/NULL, &evlist->core.entries, default_evlist_evsel_cmp);
+
}
out:
if (!ret) {
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 10/18] perf stat: Remove "unit" workarounds for metric-only
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (8 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 09/18] perf stat: Sort default events/metrics Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:21 ` [PATCH v4 11/18] perf test stat+json: Improve metric-only testing Ian Rogers
` (10 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Remove the code that tested the "unit" (such as KB/sec) of certain
hard-coded metric values and applied workarounds.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-display.c | 47 ++++++----------------------------
1 file changed, 8 insertions(+), 39 deletions(-)
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index eabeab5e6614..b3596f9f5cdd 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -592,42 +592,18 @@ static void print_metricgroup_header_std(struct perf_stat_config *config,
fprintf(config->output, "%*s", MGROUP_LEN - n - 1, "");
}
-/* Filter out some columns that don't work well in metrics only mode */
-
-static bool valid_only_metric(const char *unit)
-{
- if (!unit)
- return false;
- if (strstr(unit, "/sec") ||
- strstr(unit, "CPUs utilized"))
- return false;
- return true;
-}
-
-static const char *fixunit(char *buf, struct evsel *evsel,
- const char *unit)
-{
- if (!strncmp(unit, "of all", 6)) {
- snprintf(buf, 1024, "%s %s", evsel__name(evsel),
- unit);
- return buf;
- }
- return unit;
-}
-
static void print_metric_only(struct perf_stat_config *config,
void *ctx, enum metric_threshold_classify thresh,
const char *fmt, const char *unit, double val)
{
struct outstate *os = ctx;
FILE *out = os->fh;
- char buf[1024], str[1024];
+ char str[1024];
unsigned mlen = config->metric_only_len;
const char *color = metric_threshold_classify__color(thresh);
- if (!valid_only_metric(unit))
- return;
- unit = fixunit(buf, os->evsel, unit);
+ if (!unit)
+ unit = "";
if (mlen < strlen(unit))
mlen = strlen(unit) + 1;
@@ -643,16 +619,15 @@ static void print_metric_only_csv(struct perf_stat_config *config __maybe_unused
void *ctx,
enum metric_threshold_classify thresh __maybe_unused,
const char *fmt,
- const char *unit, double val)
+ const char *unit __maybe_unused, double val)
{
struct outstate *os = ctx;
FILE *out = os->fh;
char buf[64], *vals, *ends;
- char tbuf[1024];
- if (!valid_only_metric(unit))
+ if (!unit)
return;
- unit = fixunit(tbuf, os->evsel, unit);
+
snprintf(buf, sizeof(buf), fmt ?: "", val);
ends = vals = skip_spaces(buf);
while (isdigit(*ends) || *ends == '.')
@@ -670,13 +645,9 @@ static void print_metric_only_json(struct perf_stat_config *config __maybe_unuse
{
struct outstate *os = ctx;
char buf[64], *ends;
- char tbuf[1024];
const char *vals;
- if (!valid_only_metric(unit))
- return;
- unit = fixunit(tbuf, os->evsel, unit);
- if (!unit[0])
+ if (!unit || !unit[0])
return;
snprintf(buf, sizeof(buf), fmt ?: "", val);
vals = ends = skip_spaces(buf);
@@ -695,7 +666,6 @@ static void print_metric_header(struct perf_stat_config *config,
const char *unit, double val __maybe_unused)
{
struct outstate *os = ctx;
- char tbuf[1024];
/* In case of iostat, print metric header for first root port only */
if (config->iostat_run &&
@@ -705,9 +675,8 @@ static void print_metric_header(struct perf_stat_config *config,
if (os->evsel->cgrp != os->cgrp)
return;
- if (!valid_only_metric(unit))
+ if (!unit)
return;
- unit = fixunit(tbuf, os->evsel, unit);
if (config->json_output)
return;
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 11/18] perf test stat+json: Improve metric-only testing
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (9 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 10/18] perf stat: Remove "unit" workarounds for metric-only Ian Rogers
@ 2025-11-11 21:21 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 12/18] perf test stat: Ignore failures in Default[234] metricgroups Ian Rogers
` (9 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:21 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
When testing metric-only, pass a metric to perf rather than expecting
a hard-coded metric value to be generated.
Remove keys that were really metric-only units and, instead, don't
expect metric-only output to have a matching JSON key, as it encodes
metrics as {"metric_name", "metric_value"}.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/lib/perf_json_output_lint.py | 4 ++--
tools/perf/tests/shell/stat+json_output.sh | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/tests/shell/lib/perf_json_output_lint.py b/tools/perf/tests/shell/lib/perf_json_output_lint.py
index c6750ef06c0f..1369baaa0361 100644
--- a/tools/perf/tests/shell/lib/perf_json_output_lint.py
+++ b/tools/perf/tests/shell/lib/perf_json_output_lint.py
@@ -65,8 +65,6 @@ def check_json_output(expected_items):
'socket': lambda x: True,
'thread': lambda x: True,
'unit': lambda x: True,
- 'insn per cycle': lambda x: isfloat(x),
- 'GHz': lambda x: True, # FIXME: it seems unintended for --metric-only
}
input = '[\n' + ','.join(Lines) + '\n]'
for item in json.loads(input):
@@ -88,6 +86,8 @@ def check_json_output(expected_items):
f' in \'{item}\'')
for key, value in item.items():
if key not in checks:
+ if args.metric_only:
+ continue
raise RuntimeError(f'Unexpected key: key={key} value={value}')
if not checks[key](value):
raise RuntimeError(f'Check failed for: key={key} value={value}')
diff --git a/tools/perf/tests/shell/stat+json_output.sh b/tools/perf/tests/shell/stat+json_output.sh
index 98fb65274ac4..85d1ad7186c6 100755
--- a/tools/perf/tests/shell/stat+json_output.sh
+++ b/tools/perf/tests/shell/stat+json_output.sh
@@ -181,7 +181,7 @@ check_metric_only()
echo "[Skip] CPU-measurement counter facility not installed"
return
fi
- perf stat -j --metric-only -e instructions,cycles -o "${stat_output}" true
+ perf stat -j --metric-only -M page_faults_per_second -o "${stat_output}" true
$PYTHON $pythonchecker --metric-only --file "${stat_output}"
echo "[Success]"
}
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 12/18] perf test stat: Ignore failures in Default[234] metricgroups
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (10 preceding siblings ...)
2025-11-11 21:21 ` [PATCH v4 11/18] perf test stat+json: Improve metric-only testing Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 13/18] perf test stat: Update std_output testing metric expectations Ian Rogers
` (8 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
The Default[234] metric groups may contain unsupported legacy
events. Allow those metric groups to fail.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat_all_metricgroups.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/tests/shell/stat_all_metricgroups.sh b/tools/perf/tests/shell/stat_all_metricgroups.sh
index c6d61a4ac3e7..1400880ec01f 100755
--- a/tools/perf/tests/shell/stat_all_metricgroups.sh
+++ b/tools/perf/tests/shell/stat_all_metricgroups.sh
@@ -37,6 +37,9 @@ do
then
err=2 # Skip
fi
+ elif [[ "$m" == @(Default2|Default3|Default4) ]]
+ then
+ echo "Ignoring failures in $m that may contain unsupported legacy events"
else
echo "Metric group $m failed"
echo $result
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 13/18] perf test stat: Update std_output testing metric expectations
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (11 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 12/18] perf test stat: Ignore failures in Default[234] metricgroups Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 14/18] perf test metrics: Update all metrics for possibly failing default metrics Ian Rogers
` (7 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Make the expectations match the JSON metrics rather than the previous
hard-coded ones.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat+std_output.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat+std_output.sh b/tools/perf/tests/shell/stat+std_output.sh
index ec41f24299d9..9c4b92ecf448 100755
--- a/tools/perf/tests/shell/stat+std_output.sh
+++ b/tools/perf/tests/shell/stat+std_output.sh
@@ -12,8 +12,8 @@ set -e
stat_output=$(mktemp /tmp/__perf_test.stat_output.std.XXXXX)
event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
-event_metric=("CPUs utilized" "CPUs utilized" "/sec" "/sec" "/sec" "frontend cycles idle" "backend cycles idle" "GHz" "insn per cycle" "/sec" "of all branches")
-skip_metric=("stalled cycles per insn" "tma_" "retiring" "frontend_bound" "bad_speculation" "backend_bound" "TopdownL1" "percent of slots")
+event_metric=("CPUs_utilized" "CPUs_utilized" "cs/sec" "migrations/sec" "faults/sec" "frontend_cycles_idle" "backend_cycles_idle" "GHz" "insn_per_cycle" "/sec" "branch_miss_rate")
+skip_metric=("tma_" "TopdownL1")
cleanup() {
rm -f "${stat_output}"
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 14/18] perf test metrics: Update all metrics for possibly failing default metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (12 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 13/18] perf test stat: Update std_output testing metric expectations Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 15/18] perf test stat: Update shadow test to use metrics Ian Rogers
` (6 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Default metrics may use unsupported events and be ignored. These
metrics shouldn't cause metric testing to fail.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat_all_metrics.sh | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
index 6fa585a1e34c..a7edf01b3943 100755
--- a/tools/perf/tests/shell/stat_all_metrics.sh
+++ b/tools/perf/tests/shell/stat_all_metrics.sh
@@ -25,8 +25,13 @@ for m in $(perf list --raw-dump metrics); do
# No error result and metric shown.
continue
fi
- if [[ "$result" =~ "Cannot resolve IDs for" ]]
+ if [[ "$result" =~ "Cannot resolve IDs for" || "$result" =~ "No supported events found" ]]
then
+ if [[ "$m" == @(l1_prefetch_miss_rate|stalled_cycles_per_instruction) ]]
+ then
+ # Default metrics that may use unsupported events.
+ continue
+ fi
echo "Metric contains missing events"
echo $result
err=1 # Fail
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 15/18] perf test stat: Update shadow test to use metrics
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (13 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 14/18] perf test metrics: Update all metrics for possibly failing default metrics Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 16/18] perf test stat: Update test expectations and events Ian Rogers
` (5 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Previously '-e cycles,instructions' would implicitly create an IPC
metric. This now has to be explicit with '-M insn_per_cycle'.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat+shadow_stat.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh
index 8824f445d343..cabbbf17c662 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -14,7 +14,7 @@ perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
test_global_aggr()
{
- perf stat -a --no-big-num -e cycles,instructions sleep 1 2>&1 | \
+ perf stat -a --no-big-num -M insn_per_cycle sleep 1 2>&1 | \
grep -e cycles -e instructions | \
while read num evt _ ipc rest
do
@@ -53,7 +53,7 @@ test_global_aggr()
test_no_aggr()
{
- perf stat -a -A --no-big-num -e cycles,instructions sleep 1 2>&1 | \
+ perf stat -a -A --no-big-num -M insn_per_cycle sleep 1 2>&1 | \
grep ^CPU | \
while read cpu num evt _ ipc rest
do
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 16/18] perf test stat: Update test expectations and events
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (14 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 15/18] perf test stat: Update shadow test to use metrics Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 17/18] perf test stat csv: " Ian Rogers
` (4 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
test_stat_record_report and test_stat_record_script used the default
output, which triggers a bug when sending metrics. As this isn't
relevant to the tests, switch to using named software events.
Update the match in test_hybrid as the cycles event is now cpu-cycles
to work around potential ARM issues.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 8a100a7f2dc1..985adc02749e 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -18,7 +18,7 @@ test_default_stat() {
test_stat_record_report() {
echo "stat record and report test"
- if ! perf stat record -o - true | perf stat report -i - 2>&1 | \
+ if ! perf stat record -e task-clock -o - true | perf stat report -i - 2>&1 | \
grep -E -q "Performance counter stats for 'pipe':"
then
echo "stat record and report test [Failed]"
@@ -30,7 +30,7 @@ test_stat_record_report() {
test_stat_record_script() {
echo "stat record and script test"
- if ! perf stat record -o - true | perf script -i - 2>&1 | \
+ if ! perf stat record -e task-clock -o - true | perf script -i - 2>&1 | \
grep -E -q "CPU[[:space:]]+THREAD[[:space:]]+VAL[[:space:]]+ENA[[:space:]]+RUN[[:space:]]+TIME[[:space:]]+EVENT"
then
echo "stat record and script test [Failed]"
@@ -196,7 +196,7 @@ test_hybrid() {
fi
# Run default Perf stat
- cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/[uH]*| cycles[:uH]* " -c)
+ cycles_events=$(perf stat -a -- sleep 0.1 2>&1 | grep -E "/cpu-cycles/[uH]*| cpu-cycles[:uH]* " -c)
# The expectation is that default output will have a cycles events on each
# hybrid PMU. In situations with no cycles PMU events, like virtualized, this
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 17/18] perf test stat csv: Update test expectations and events
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (15 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 16/18] perf test stat: Update test expectations and events Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 21:22 ` [PATCH v4 18/18] perf tool_pmu: Make core_wide and target_cpu json events Ian Rogers
` (3 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
Explicitly use a metric rather than implicitly expecting '-e
instructions,cycles' to produce one. Use a metric built on software
events to make it more widely compatible.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/lib/stat_output.sh | 2 +-
tools/perf/tests/shell/stat+csv_output.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/lib/stat_output.sh b/tools/perf/tests/shell/lib/stat_output.sh
index c2ec7881ec1d..3c36e80fe422 100644
--- a/tools/perf/tests/shell/lib/stat_output.sh
+++ b/tools/perf/tests/shell/lib/stat_output.sh
@@ -156,7 +156,7 @@ check_metric_only()
echo "[Skip] CPU-measurement counter facility not installed"
return
fi
- perf stat --metric-only $2 -e instructions,cycles true
+ perf stat --metric-only $2 -M page_faults_per_second true
commachecker --metric-only
echo "[Success]"
}
diff --git a/tools/perf/tests/shell/stat+csv_output.sh b/tools/perf/tests/shell/stat+csv_output.sh
index 7a6f6e177402..cd6fff597091 100755
--- a/tools/perf/tests/shell/stat+csv_output.sh
+++ b/tools/perf/tests/shell/stat+csv_output.sh
@@ -44,7 +44,7 @@ function commachecker()
;; "--per-die") exp=8
;; "--per-cluster") exp=8
;; "--per-cache") exp=8
- ;; "--metric-only") exp=2
+ ;; "--metric-only") exp=1
esac
while read line
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 18/18] perf tool_pmu: Make core_wide and target_cpu json events
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (16 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 17/18] perf test stat csv: " Ian Rogers
@ 2025-11-11 21:22 ` Ian Rogers
2025-11-11 22:42 ` [PATCH v4 00/18] Namhyung Kim
` (2 subsequent siblings)
20 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 21:22 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
For the sake of better documentation, add core_wide and target_cpu to
the tool.json. When the values of system_wide and
user_requested_cpu_list are unknown, use the values from the global
stat_config.
Example output showing how '-a' modifies the values in `perf stat`:
```
$ perf stat -e core_wide,target_cpu true
Performance counter stats for 'true':
0 core_wide
0 target_cpu
0.000993787 seconds time elapsed
0.001128000 seconds user
0.000000000 seconds sys
$ perf stat -e core_wide,target_cpu -a true
Performance counter stats for 'system wide':
1 core_wide
1 target_cpu
0.002271723 seconds time elapsed
$ perf list
...
tool:
core_wide
[1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
...
target_cpu
[1 if CPUs being analyzed,0 if threads/processes. Unit: tool]
...
```
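A rough, standalone model of how the two new values behave is sketched
below. The function names and the SMT handling are simplified
illustrations rather than the tool_pmu.c implementation; in particular,
perf also checks whether an explicit CPU list covers every SMT thread
of a core, which this sketch does not:
```
#include <stdbool.h>
#include <stdio.h>

/* target_cpu: 1 if CPUs are being analyzed, 0 for threads/processes. */
static bool model_target_cpu(bool system_wide, const char *cpu_list)
{
	return system_wide || (cpu_list && *cpu_list);
}

/*
 * core_wide: 1 if SMT is off, otherwise 1 only when events are being
 * gathered on all SMT threads (approximated here as a system-wide
 * target with no explicit CPU list).
 */
static bool model_core_wide(bool smt_on, bool system_wide, const char *cpu_list)
{
	if (!smt_on)
		return true;
	return system_wide && (!cpu_list || !*cpu_list);
}

int main(void)
{
	/* 'perf stat ... true': no -a, no CPU list -> 0 and 0, as above. */
	printf("target_cpu=%d core_wide=%d\n",
	       model_target_cpu(false, NULL),
	       model_core_wide(/*smt_on=*/true, false, NULL));

	/* 'perf stat -a ... true': system wide -> 1 and 1, as above. */
	printf("target_cpu=%d core_wide=%d\n",
	       model_target_cpu(true, NULL),
	       model_core_wide(/*smt_on=*/true, true, NULL));
	return 0;
}
```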
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../pmu-events/arch/common/common/tool.json | 12 +
tools/perf/pmu-events/empty-pmu-events.c | 228 +++++++++---------
tools/perf/util/expr.c | 11 +-
tools/perf/util/stat-shadow.c | 2 +
tools/perf/util/tool_pmu.c | 24 +-
tools/perf/util/tool_pmu.h | 9 +-
6 files changed, 163 insertions(+), 123 deletions(-)
diff --git a/tools/perf/pmu-events/arch/common/common/tool.json b/tools/perf/pmu-events/arch/common/common/tool.json
index 12f2ef1813a6..14d0d60a1976 100644
--- a/tools/perf/pmu-events/arch/common/common/tool.json
+++ b/tools/perf/pmu-events/arch/common/common/tool.json
@@ -70,5 +70,17 @@
"EventName": "system_tsc_freq",
"BriefDescription": "The amount a Time Stamp Counter (TSC) increases per second",
"ConfigCode": "12"
+ },
+ {
+ "Unit": "tool",
+ "EventName": "core_wide",
+ "BriefDescription": "1 if not SMT, if SMT are events being gathered on all SMT threads 1 otherwise 0",
+ "ConfigCode": "13"
+ },
+ {
+ "Unit": "tool",
+ "EventName": "target_cpu",
+ "BriefDescription": "1 if CPUs being analyzed, 0 if threads/processes",
+ "ConfigCode": "14"
}
]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 7fa42f13300f..76c395cf513c 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1279,62 +1279,64 @@ static const char *const big_c_string =
/* offset=125889 */ "slots\000tool\000Number of functional units that in parallel can execute parts of an instruction\000config=0xa\000\00000\000\000\000\000\000"
/* offset=125999 */ "smt_on\000tool\0001 if simultaneous multithreading (aka hyperthreading) is enable otherwise 0\000config=0xb\000\00000\000\000\000\000\000"
/* offset=126106 */ "system_tsc_freq\000tool\000The amount a Time Stamp Counter (TSC) increases per second\000config=0xc\000\00000\000\000\000\000\000"
-/* offset=126205 */ "bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000"
-/* offset=126267 */ "bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000"
-/* offset=126329 */ "l3_cache_rd\000cache\000L3 cache access, read\000event=0x40\000\00000\000\000\000\000Attributable Level 3 cache access, read\000"
-/* offset=126427 */ "segment_reg_loads.any\000other\000Number of segment register loads\000event=6,period=200000,umask=0x80\000\00000\000\000\000\000\000"
-/* offset=126529 */ "dispatch_blocked.any\000other\000Memory cluster signals to block micro-op dispatch for any reason\000event=9,period=200000,umask=0x20\000\00000\000\000\000\000\000"
-/* offset=126662 */ "eist_trans\000other\000Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions\000event=0x3a,period=200000\000\00000\000\000\000\000\000"
-/* offset=126780 */ "hisi_sccl,ddrc\000"
-/* offset=126795 */ "uncore_hisi_ddrc.flux_wcmd\000uncore\000DDRC write commands\000event=2\000\00000\000\000\000\000\000"
-/* offset=126865 */ "uncore_cbox\000"
-/* offset=126877 */ "unc_cbo_xsnp_response.miss_eviction\000uncore\000A cross-core snoop resulted from L3 Eviction which misses in some processor core\000event=0x22,umask=0x81\000\00000\000\000\000\000\000"
-/* offset=127031 */ "event-hyphen\000uncore\000UNC_CBO_HYPHEN\000event=0xe0\000\00000\000\000\000\000\000"
-/* offset=127085 */ "event-two-hyph\000uncore\000UNC_CBO_TWO_HYPH\000event=0xc0\000\00000\000\000\000\000\000"
-/* offset=127143 */ "hisi_sccl,l3c\000"
-/* offset=127157 */ "uncore_hisi_l3c.rd_hit_cpipe\000uncore\000Total read hits\000event=7\000\00000\000\000\000\000\000"
-/* offset=127225 */ "uncore_imc_free_running\000"
-/* offset=127249 */ "uncore_imc_free_running.cache_miss\000uncore\000Total cache misses\000event=0x12\000\00000\000\000\000\000\000"
-/* offset=127329 */ "uncore_imc\000"
-/* offset=127340 */ "uncore_imc.cache_hits\000uncore\000Total cache hits\000event=0x34\000\00000\000\000\000\000\000"
-/* offset=127405 */ "uncore_sys_ddr_pmu\000"
-/* offset=127424 */ "sys_ddr_pmu.write_cycles\000uncore\000ddr write-cycles event\000event=0x2b\000v8\00000\000\000\000\000\000"
-/* offset=127500 */ "uncore_sys_ccn_pmu\000"
-/* offset=127519 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
-/* offset=127596 */ "uncore_sys_cmn_pmu\000"
-/* offset=127615 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
-/* offset=127758 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011"
-/* offset=127944 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011"
-/* offset=128177 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
-/* offset=128437 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
-/* offset=128668 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
-/* offset=128781 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
-/* offset=128945 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129075 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
-/* offset=129201 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
-/* offset=129377 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
-/* offset=129557 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
-/* offset=129661 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
-/* offset=129777 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
-/* offset=129878 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
-/* offset=129993 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130099 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
-/* offset=130205 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
-/* offset=130353 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
-/* offset=130376 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
-/* offset=130440 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
-/* offset=130607 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130672 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130740 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
-/* offset=130812 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
-/* offset=130907 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
-/* offset=131042 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
-/* offset=131107 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131176 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=131247 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
-/* offset=131270 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
-/* offset=131293 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
-/* offset=131314 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
+/* offset=126205 */ "core_wide\000tool\0001 if not SMT, if SMT are events being gathered on all SMT threads 1 otherwise 0\000config=0xd\000\00000\000\000\000\000\000"
+/* offset=126319 */ "target_cpu\000tool\0001 if CPUs being analyzed, 0 if threads/processes\000config=0xe\000\00000\000\000\000\000\000"
+/* offset=126403 */ "bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000"
+/* offset=126465 */ "bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000"
+/* offset=126527 */ "l3_cache_rd\000cache\000L3 cache access, read\000event=0x40\000\00000\000\000\000\000Attributable Level 3 cache access, read\000"
+/* offset=126625 */ "segment_reg_loads.any\000other\000Number of segment register loads\000event=6,period=200000,umask=0x80\000\00000\000\000\000\000\000"
+/* offset=126727 */ "dispatch_blocked.any\000other\000Memory cluster signals to block micro-op dispatch for any reason\000event=9,period=200000,umask=0x20\000\00000\000\000\000\000\000"
+/* offset=126860 */ "eist_trans\000other\000Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions\000event=0x3a,period=200000\000\00000\000\000\000\000\000"
+/* offset=126978 */ "hisi_sccl,ddrc\000"
+/* offset=126993 */ "uncore_hisi_ddrc.flux_wcmd\000uncore\000DDRC write commands\000event=2\000\00000\000\000\000\000\000"
+/* offset=127063 */ "uncore_cbox\000"
+/* offset=127075 */ "unc_cbo_xsnp_response.miss_eviction\000uncore\000A cross-core snoop resulted from L3 Eviction which misses in some processor core\000event=0x22,umask=0x81\000\00000\000\000\000\000\000"
+/* offset=127229 */ "event-hyphen\000uncore\000UNC_CBO_HYPHEN\000event=0xe0\000\00000\000\000\000\000\000"
+/* offset=127283 */ "event-two-hyph\000uncore\000UNC_CBO_TWO_HYPH\000event=0xc0\000\00000\000\000\000\000\000"
+/* offset=127341 */ "hisi_sccl,l3c\000"
+/* offset=127355 */ "uncore_hisi_l3c.rd_hit_cpipe\000uncore\000Total read hits\000event=7\000\00000\000\000\000\000\000"
+/* offset=127423 */ "uncore_imc_free_running\000"
+/* offset=127447 */ "uncore_imc_free_running.cache_miss\000uncore\000Total cache misses\000event=0x12\000\00000\000\000\000\000\000"
+/* offset=127527 */ "uncore_imc\000"
+/* offset=127538 */ "uncore_imc.cache_hits\000uncore\000Total cache hits\000event=0x34\000\00000\000\000\000\000\000"
+/* offset=127603 */ "uncore_sys_ddr_pmu\000"
+/* offset=127622 */ "sys_ddr_pmu.write_cycles\000uncore\000ddr write-cycles event\000event=0x2b\000v8\00000\000\000\000\000\000"
+/* offset=127698 */ "uncore_sys_ccn_pmu\000"
+/* offset=127717 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
+/* offset=127794 */ "uncore_sys_cmn_pmu\000"
+/* offset=127813 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
+/* offset=127956 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011"
+/* offset=128142 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011"
+/* offset=128375 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
+/* offset=128635 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
+/* offset=128866 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
+/* offset=128979 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
+/* offset=129143 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129273 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129399 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
+/* offset=129575 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011"
+/* offset=129755 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
+/* offset=129859 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
+/* offset=129975 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
+/* offset=130076 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
+/* offset=130191 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130297 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130403 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
+/* offset=130551 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=130574 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=130638 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=130805 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130870 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130938 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=131010 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=131105 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=131240 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=131305 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131374 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131445 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=131468 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=131491 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=131512 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2587,6 +2589,7 @@ static const struct compact_pmu_event pmu_events__common_software[] = {
{ 123607 }, /* task-clock\000software\000Per-task high-resolution timer based event\000config=1\000\000001e-6msec\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__common_tool[] = {
+{ 126205 }, /* core_wide\000tool\0001 if not SMT, if SMT are events being gathered on all SMT threads 1 otherwise 0\000config=0xd\000\00000\000\000\000\000\000 */
{ 125072 }, /* duration_time\000tool\000Wall clock interval time in nanoseconds\000config=1\000\00000\000\000\000\000\000 */
{ 125286 }, /* has_pmem\000tool\0001 if persistent memory installed otherwise 0\000config=4\000\00000\000\000\000\000\000 */
{ 125362 }, /* num_cores\000tool\000Number of cores. A core consists of 1 or more thread, with each thread being associated with a logical Linux CPU\000config=5\000\00000\000\000\000\000\000 */
@@ -2598,6 +2601,7 @@ static const struct compact_pmu_event pmu_events__common_tool[] = {
{ 125999 }, /* smt_on\000tool\0001 if simultaneous multithreading (aka hyperthreading) is enable otherwise 0\000config=0xb\000\00000\000\000\000\000\000 */
{ 125218 }, /* system_time\000tool\000System/kernel time in nanoseconds\000config=3\000\00000\000\000\000\000\000 */
{ 126106 }, /* system_tsc_freq\000tool\000The amount a Time Stamp Counter (TSC) increases per second\000config=0xc\000\00000\000\000\000\000\000 */
+{ 126319 }, /* target_cpu\000tool\0001 if CPUs being analyzed, 0 if threads/processes\000config=0xe\000\00000\000\000\000\000\000 */
{ 125148 }, /* user_time\000tool\000User (non-kernel) time in nanoseconds\000config=2\000\00000\000\000\000\000\000 */
};
@@ -2621,23 +2625,23 @@ static const struct pmu_table_entry pmu_events__common[] = {
};
static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
-{ 127758 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
-{ 129075 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
-{ 129377 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
-{ 129557 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
-{ 127944 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
-{ 129201 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
-{ 129993 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
-{ 128945 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
-{ 128668 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
-{ 130099 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
-{ 130205 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
-{ 129661 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
-{ 129878 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
-{ 129777 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
-{ 128177 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
-{ 128437 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
-{ 128781 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
+{ 127956 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
+{ 129273 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
+{ 129575 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000M/sec\000\000\000\000011 */
+{ 129755 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
+{ 128142 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
+{ 129399 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 130191 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
+{ 129143 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
+{ 128866 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
+{ 130297 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
+{ 130403 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
+{ 129859 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
+{ 130076 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
+{ 129975 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
+{ 128375 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
+{ 128635 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
+{ 128979 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
};
@@ -2650,29 +2654,29 @@ static const struct pmu_table_entry pmu_metrics__common[] = {
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_default_core[] = {
-{ 126205 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
-{ 126267 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
-{ 126529 }, /* dispatch_blocked.any\000other\000Memory cluster signals to block micro-op dispatch for any reason\000event=9,period=200000,umask=0x20\000\00000\000\000\000\000\000 */
-{ 126662 }, /* eist_trans\000other\000Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions\000event=0x3a,period=200000\000\00000\000\000\000\000\000 */
-{ 126329 }, /* l3_cache_rd\000cache\000L3 cache access, read\000event=0x40\000\00000\000\000\000\000Attributable Level 3 cache access, read\000 */
-{ 126427 }, /* segment_reg_loads.any\000other\000Number of segment register loads\000event=6,period=200000,umask=0x80\000\00000\000\000\000\000\000 */
+{ 126403 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
+{ 126465 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
+{ 126727 }, /* dispatch_blocked.any\000other\000Memory cluster signals to block micro-op dispatch for any reason\000event=9,period=200000,umask=0x20\000\00000\000\000\000\000\000 */
+{ 126860 }, /* eist_trans\000other\000Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions\000event=0x3a,period=200000\000\00000\000\000\000\000\000 */
+{ 126527 }, /* l3_cache_rd\000cache\000L3 cache access, read\000event=0x40\000\00000\000\000\000\000Attributable Level 3 cache access, read\000 */
+{ 126625 }, /* segment_reg_loads.any\000other\000Number of segment register loads\000event=6,period=200000,umask=0x80\000\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_hisi_sccl_ddrc[] = {
-{ 126795 }, /* uncore_hisi_ddrc.flux_wcmd\000uncore\000DDRC write commands\000event=2\000\00000\000\000\000\000\000 */
+{ 126993 }, /* uncore_hisi_ddrc.flux_wcmd\000uncore\000DDRC write commands\000event=2\000\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_hisi_sccl_l3c[] = {
-{ 127157 }, /* uncore_hisi_l3c.rd_hit_cpipe\000uncore\000Total read hits\000event=7\000\00000\000\000\000\000\000 */
+{ 127355 }, /* uncore_hisi_l3c.rd_hit_cpipe\000uncore\000Total read hits\000event=7\000\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_uncore_cbox[] = {
-{ 127031 }, /* event-hyphen\000uncore\000UNC_CBO_HYPHEN\000event=0xe0\000\00000\000\000\000\000\000 */
-{ 127085 }, /* event-two-hyph\000uncore\000UNC_CBO_TWO_HYPH\000event=0xc0\000\00000\000\000\000\000\000 */
-{ 126877 }, /* unc_cbo_xsnp_response.miss_eviction\000uncore\000A cross-core snoop resulted from L3 Eviction which misses in some processor core\000event=0x22,umask=0x81\000\00000\000\000\000\000\000 */
+{ 127229 }, /* event-hyphen\000uncore\000UNC_CBO_HYPHEN\000event=0xe0\000\00000\000\000\000\000\000 */
+{ 127283 }, /* event-two-hyph\000uncore\000UNC_CBO_TWO_HYPH\000event=0xc0\000\00000\000\000\000\000\000 */
+{ 127075 }, /* unc_cbo_xsnp_response.miss_eviction\000uncore\000A cross-core snoop resulted from L3 Eviction which misses in some processor core\000event=0x22,umask=0x81\000\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_uncore_imc[] = {
-{ 127340 }, /* uncore_imc.cache_hits\000uncore\000Total cache hits\000event=0x34\000\00000\000\000\000\000\000 */
+{ 127538 }, /* uncore_imc.cache_hits\000uncore\000Total cache hits\000event=0x34\000\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_cpu_uncore_imc_free_running[] = {
-{ 127249 }, /* uncore_imc_free_running.cache_miss\000uncore\000Total cache misses\000event=0x12\000\00000\000\000\000\000\000 */
+{ 127447 }, /* uncore_imc_free_running.cache_miss\000uncore\000Total cache misses\000event=0x12\000\00000\000\000\000\000\000 */
};
@@ -2685,46 +2689,46 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
{
.entries = pmu_events__test_soc_cpu_hisi_sccl_ddrc,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_hisi_sccl_ddrc),
- .pmu_name = { 126780 /* hisi_sccl,ddrc\000 */ },
+ .pmu_name = { 126978 /* hisi_sccl,ddrc\000 */ },
},
{
.entries = pmu_events__test_soc_cpu_hisi_sccl_l3c,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_hisi_sccl_l3c),
- .pmu_name = { 127143 /* hisi_sccl,l3c\000 */ },
+ .pmu_name = { 127341 /* hisi_sccl,l3c\000 */ },
},
{
.entries = pmu_events__test_soc_cpu_uncore_cbox,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_uncore_cbox),
- .pmu_name = { 126865 /* uncore_cbox\000 */ },
+ .pmu_name = { 127063 /* uncore_cbox\000 */ },
},
{
.entries = pmu_events__test_soc_cpu_uncore_imc,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_uncore_imc),
- .pmu_name = { 127329 /* uncore_imc\000 */ },
+ .pmu_name = { 127527 /* uncore_imc\000 */ },
},
{
.entries = pmu_events__test_soc_cpu_uncore_imc_free_running,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_uncore_imc_free_running),
- .pmu_name = { 127225 /* uncore_imc_free_running\000 */ },
+ .pmu_name = { 127423 /* uncore_imc_free_running\000 */ },
},
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 130353 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
-{ 131042 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
-{ 130812 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
-{ 130907 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
-{ 131107 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 131176 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 130440 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
-{ 130376 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
-{ 131314 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
-{ 131247 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
-{ 131270 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
-{ 131293 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
-{ 130740 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
-{ 130607 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
-{ 130672 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130551 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 131240 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 131010 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 131105 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 131305 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 131374 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130638 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 130574 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 131512 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 131445 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 131468 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 131491 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 130938 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 130805 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130870 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
@@ -2737,13 +2741,13 @@ static const struct pmu_table_entry pmu_metrics__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_events__test_soc_sys_uncore_sys_ccn_pmu[] = {
-{ 127519 }, /* sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000 */
+{ 127717 }, /* sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_sys_uncore_sys_cmn_pmu[] = {
-{ 127615 }, /* sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000 */
+{ 127813 }, /* sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000 */
};
static const struct compact_pmu_event pmu_events__test_soc_sys_uncore_sys_ddr_pmu[] = {
-{ 127424 }, /* sys_ddr_pmu.write_cycles\000uncore\000ddr write-cycles event\000event=0x2b\000v8\00000\000\000\000\000\000 */
+{ 127622 }, /* sys_ddr_pmu.write_cycles\000uncore\000ddr write-cycles event\000event=0x2b\000v8\00000\000\000\000\000\000 */
};
@@ -2751,17 +2755,17 @@ static const struct pmu_table_entry pmu_events__test_soc_sys[] = {
{
.entries = pmu_events__test_soc_sys_uncore_sys_ccn_pmu,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_sys_uncore_sys_ccn_pmu),
- .pmu_name = { 127500 /* uncore_sys_ccn_pmu\000 */ },
+ .pmu_name = { 127698 /* uncore_sys_ccn_pmu\000 */ },
},
{
.entries = pmu_events__test_soc_sys_uncore_sys_cmn_pmu,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_sys_uncore_sys_cmn_pmu),
- .pmu_name = { 127596 /* uncore_sys_cmn_pmu\000 */ },
+ .pmu_name = { 127794 /* uncore_sys_cmn_pmu\000 */ },
},
{
.entries = pmu_events__test_soc_sys_uncore_sys_ddr_pmu,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_sys_uncore_sys_ddr_pmu),
- .pmu_name = { 127405 /* uncore_sys_ddr_pmu\000 */ },
+ .pmu_name = { 127603 /* uncore_sys_ddr_pmu\000 */ },
},
};
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 4df56f2b283d..465fe2e9bbbe 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -401,17 +401,12 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
if (ev != TOOL_PMU__EVENT_NONE) {
u64 count;
- if (tool_pmu__read_event(ev, /*evsel=*/NULL, &count))
+ if (tool_pmu__read_event(ev, /*evsel=*/NULL,
+ ctx->system_wide, ctx->user_requested_cpu_list,
+ &count))
result = count;
else
pr_err("Failure to read '%s'", literal);
-
- } else if (!strcmp("#core_wide", literal)) {
- result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
- ? 1.0 : 0.0;
- } else if (!strcmp("#target_cpu", literal)) {
- result = (ctx->system_wide || ctx->user_requested_cpu_list)
- ? 1.0 : 0.0;
} else {
pr_err("Unrecognized literal '%s'", literal);
}
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index c1547128c396..b3b482e1808f 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -72,6 +72,8 @@ static int prepare_metric(const struct metric_expr *mexp,
case TOOL_PMU__EVENT_SLOTS:
case TOOL_PMU__EVENT_SMT_ON:
case TOOL_PMU__EVENT_SYSTEM_TSC_FREQ:
+ case TOOL_PMU__EVENT_CORE_WIDE:
+ case TOOL_PMU__EVENT_TARGET_CPU:
default:
pr_err("Unexpected tool event '%s'", evsel__name(metric_events[i]));
abort();
diff --git a/tools/perf/util/tool_pmu.c b/tools/perf/util/tool_pmu.c
index f075098488ba..a72c665ee644 100644
--- a/tools/perf/util/tool_pmu.c
+++ b/tools/perf/util/tool_pmu.c
@@ -6,6 +6,7 @@
#include "pmu.h"
#include "print-events.h"
#include "smt.h"
+#include "stat.h"
#include "time-utils.h"
#include "tool_pmu.h"
#include "tsc.h"
@@ -30,6 +31,8 @@ static const char *const tool_pmu__event_names[TOOL_PMU__EVENT_MAX] = {
"slots",
"smt_on",
"system_tsc_freq",
+ "core_wide",
+ "target_cpu",
};
bool tool_pmu__skip_event(const char *name __maybe_unused)
@@ -329,7 +332,11 @@ static bool has_pmem(void)
return has_pmem;
}
-bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *result)
+bool tool_pmu__read_event(enum tool_pmu_event ev,
+ struct evsel *evsel,
+ bool system_wide,
+ const char *user_requested_cpu_list,
+ u64 *result)
{
const struct cpu_topology *topology;
@@ -421,6 +428,14 @@ bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *resu
*result = arch_get_tsc_freq();
return true;
+ case TOOL_PMU__EVENT_CORE_WIDE:
+ *result = core_wide(system_wide, user_requested_cpu_list) ? 1 : 0;
+ return true;
+
+ case TOOL_PMU__EVENT_TARGET_CPU:
+ *result = system_wide || (user_requested_cpu_list != NULL) ? 1 : 0;
+ return true;
+
case TOOL_PMU__EVENT_NONE:
case TOOL_PMU__EVENT_DURATION_TIME:
case TOOL_PMU__EVENT_USER_TIME:
@@ -452,11 +467,16 @@ int evsel__tool_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread)
case TOOL_PMU__EVENT_SLOTS:
case TOOL_PMU__EVENT_SMT_ON:
case TOOL_PMU__EVENT_SYSTEM_TSC_FREQ:
+ case TOOL_PMU__EVENT_CORE_WIDE:
+ case TOOL_PMU__EVENT_TARGET_CPU:
if (evsel->prev_raw_counts)
old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
val = 0;
if (cpu_map_idx == 0 && thread == 0) {
- if (!tool_pmu__read_event(ev, evsel, &val)) {
+ if (!tool_pmu__read_event(ev, evsel,
+ stat_config.system_wide,
+ stat_config.user_requested_cpu_list,
+ &val)) {
count->lost++;
val = 0;
}
diff --git a/tools/perf/util/tool_pmu.h b/tools/perf/util/tool_pmu.h
index d642e7d73910..f1714001bc1d 100644
--- a/tools/perf/util/tool_pmu.h
+++ b/tools/perf/util/tool_pmu.h
@@ -22,6 +22,8 @@ enum tool_pmu_event {
TOOL_PMU__EVENT_SLOTS,
TOOL_PMU__EVENT_SMT_ON,
TOOL_PMU__EVENT_SYSTEM_TSC_FREQ,
+ TOOL_PMU__EVENT_CORE_WIDE,
+ TOOL_PMU__EVENT_TARGET_CPU,
TOOL_PMU__EVENT_MAX,
};
@@ -34,7 +36,12 @@ enum tool_pmu_event tool_pmu__str_to_event(const char *str);
bool tool_pmu__skip_event(const char *name);
int tool_pmu__num_skip_events(void);
-bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *result);
+bool tool_pmu__read_event(enum tool_pmu_event ev,
+ struct evsel *evsel,
+ bool system_wide,
+ const char *user_requested_cpu_list,
+ u64 *result);
+
u64 tool_pmu__cpu_slots_per_cycle(void);
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (17 preceding siblings ...)
2025-11-11 21:22 ` [PATCH v4 18/18] perf tool_pmu: Make core_wide and target_cpu json events Ian Rogers
@ 2025-11-11 22:42 ` Namhyung Kim
2025-11-11 23:13 ` Ian Rogers
2025-11-12 9:03 ` Mi, Dapeng
2025-11-12 17:56 ` Namhyung Kim
20 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2025-11-11 22:42 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang
On Tue, Nov 11, 2025 at 01:21:48PM -0800, Ian Rogers wrote:
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> 5,749 context-switches # 358.376 /sec
> 121 cpu-migrations # 7.543 /sec
> 1,806 page-faults # 112.581 /sec
> 825,965,204 instructions # 0.70 insn per cycle
> 1,180,799,101 cycles # 0.074 GHz
> 168,945,109 branches # 10.532 M/sec
> 4,629,567 branch-misses # 2.74% of all branches
> # 30.2 % tma_backend_bound
> # 7.8 % tma_bad_speculation
> # 47.1 % tma_frontend_bound
> # 14.9 % tma_retiring
> ```
>
> After:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 2,890 context-switches # 179.9 cs/sec cs_per_second
> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> TopdownL1 # 20.1 % tma_backend_bound
> # 40.5 % tma_bad_speculation (88.90%)
> # 17.2 % tma_frontend_bound (78.05%)
> # 22.2 % tma_retiring (88.89%)
>
> 1.002994394 seconds time elapsed
> ```
>
> Having the metrics in json brings greater uniformity, allows events to
> be shared by metrics, and it also allows descriptions like:
> ```
> $ perf list cs_per_second
> ...
> cs_per_second
> [Context switches per CPU second]
> ```
>
> A thorn in the side of doing this work was that the hard coded metrics
> were used by perf script with '-F metric'. This functionality didn't
> work for me (I was testing `perf record -e instructions,cycles`
> with/without leader sampling and then `perf script -F metric` but saw
> nothing but empty lines) but anyway I decided to fix it to the best of
> my ability in this series. So the script side counters were removed
> and the regular ones associated with the evsel used. The json metrics
> were all searched looking for ones that have a subset of events
> matching those in the perf script session, and all metrics are
> printed. This is kind of weird as the counters are being set by the
> period of samples, but I carried the behavior forward. I suspect there
> needs to be follow up work to make this better, but what is in the
> series is superior to what is currently in the tree. Follow up work
> could include finding metrics for the machine in the perf.data rather
> than using the host, allowing multiple metrics even if the metric ids
> of the events differ, fixing pre-existing `perf stat record/report`
> issues, etc.
>
> There is a lot of stat tests that, for example, assume '-e
> instructions,cycles' will produce an IPC metric. These things needed
> tidying as now the metric must be explicitly asked for and when doing
> this ones using software events were preferred to increase
> compatibility. As the test updates were numerous they are distinct to
> the patches updating the functionality causing periods in the series
> where not all tests are passing. If this is undesirable the test fixes
> can be squashed into the functionality updates, but this will be kind
> of messy, especially as at some points in the series both the old
> metrics and the new metrics will be displayed.
>
> v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
> metric to-done a system-wide calculation (Namhyung) and don't
> crash because of the CPU map index couldn't be found. Regenerate
> commit messages but the cpu-clock was always yielding 0 on my
> machine leading to a lot of nan metric values.
This is strange. The cpu-clock should not be 0 as long as you ran it.
Do you think it's related to the scale unit change? I tested v3 and
didn't see the problem.
Thanks,
Namhyung
>
> v3: Rebase resolving merge conflicts in
> tools/perf/pmu-events/empty-pmu-events.c by just regenerating it
> (Dapeng Mi).
> https://lore.kernel.org/lkml/20251111040417.270945-1-irogers@google.com/
>
> v2: Drop merged patches, add json to document target_cpu/core_wide and
> example to "Add care to picking the evsel for displaying a metric"
> commit message (Namhyung).
> https://lore.kernel.org/lkml/20251106231508.448793-1-irogers@google.com/
>
> v1: https://lore.kernel.org/lkml/20251024175857.808401-1-irogers@google.com/
>
> Ian Rogers (18):
> perf metricgroup: Add care to picking the evsel for displaying a
> metric
> perf expr: Add #target_cpu literal
> perf jevents: Add set of common metrics based on default ones
> perf jevents: Add metric DefaultShowEvents
> perf stat: Add detail -d,-dd,-ddd metrics
> perf script: Change metric format to use json metrics
> perf stat: Remove hard coded shadow metrics
> perf stat: Fix default metricgroup display on hybrid
> perf stat: Sort default events/metrics
> perf stat: Remove "unit" workarounds for metric-only
> perf test stat+json: Improve metric-only testing
> perf test stat: Ignore failures in Default[234] metricgroups
> perf test stat: Update std_output testing metric expectations
> perf test metrics: Update all metrics for possibly failing default
> metrics
> perf test stat: Update shadow test to use metrics
> perf test stat: Update test expectations and events
> perf test stat csv: Update test expectations and events
> perf tool_pmu: Make core_wide and target_cpu json events
>
> tools/perf/builtin-script.c | 251 ++++++++++-
> tools/perf/builtin-stat.c | 154 ++-----
> .../arch/common/common/metrics.json | 151 +++++++
> .../pmu-events/arch/common/common/tool.json | 12 +
> tools/perf/pmu-events/empty-pmu-events.c | 229 ++++++----
> tools/perf/pmu-events/jevents.py | 28 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
> tools/perf/tests/shell/lib/stat_output.sh | 2 +-
> tools/perf/tests/shell/stat+csv_output.sh | 2 +-
> tools/perf/tests/shell/stat+json_output.sh | 2 +-
> tools/perf/tests/shell/stat+shadow_stat.sh | 4 +-
> tools/perf/tests/shell/stat+std_output.sh | 4 +-
> tools/perf/tests/shell/stat.sh | 6 +-
> .../perf/tests/shell/stat_all_metricgroups.sh | 3 +
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/expr.c | 8 +-
> tools/perf/util/metricgroup.c | 92 +++-
> tools/perf/util/stat-display.c | 55 +--
> tools/perf/util/stat-shadow.c | 404 +-----------------
> tools/perf/util/stat.h | 2 +-
> tools/perf/util/tool_pmu.c | 24 +-
> tools/perf/util/tool_pmu.h | 9 +-
> 24 files changed, 769 insertions(+), 687 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
> --
> 2.51.2.1041.gc1ab5b90ca-goog
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 22:42 ` [PATCH v4 00/18] Namhyung Kim
@ 2025-11-11 23:13 ` Ian Rogers
2025-11-12 1:08 ` Namhyung Kim
2025-11-12 8:20 ` Mi, Dapeng
0 siblings, 2 replies; 33+ messages in thread
From: Ian Rogers @ 2025-11-11 23:13 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang
On Tue, Nov 11, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Nov 11, 2025 at 01:21:48PM -0800, Ian Rogers wrote:
> > Prior to this series stat-shadow would produce hard coded metrics if
> > certain events appeared in the evlist. This series produces equivalent
> > json metrics and cleans up the consequences in tests and display
> > output. A before and after of the default display output on a
> > tigerlake is:
> >
> > Before:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> > 5,749 context-switches # 358.376 /sec
> > 121 cpu-migrations # 7.543 /sec
> > 1,806 page-faults # 112.581 /sec
> > 825,965,204 instructions # 0.70 insn per cycle
> > 1,180,799,101 cycles # 0.074 GHz
> > 168,945,109 branches # 10.532 M/sec
> > 4,629,567 branch-misses # 2.74% of all branches
> > # 30.2 % tma_backend_bound
> > # 7.8 % tma_bad_speculation
> > # 47.1 % tma_frontend_bound
> > # 14.9 % tma_retiring
> > ```
> >
> > After:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 2,890 context-switches # 179.9 cs/sec cs_per_second
> > 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> > 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> > 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> > 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> > 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
> > 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> > 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> > TopdownL1 # 20.1 % tma_backend_bound
> > # 40.5 % tma_bad_speculation (88.90%)
> > # 17.2 % tma_frontend_bound (78.05%)
> > # 22.2 % tma_retiring (88.89%)
> >
> > 1.002994394 seconds time elapsed
> > ```
> >
> > Having the metrics in json brings greater uniformity, allows events to
> > be shared by metrics, and it also allows descriptions like:
> > ```
> > $ perf list cs_per_second
> > ...
> > cs_per_second
> > [Context switches per CPU second]
> > ```
> >
> > A thorn in the side of doing this work was that the hard coded metrics
> > were used by perf script with '-F metric'. This functionality didn't
> > work for me (I was testing `perf record -e instructions,cycles`
> > with/without leader sampling and then `perf script -F metric` but saw
> > nothing but empty lines) but anyway I decided to fix it to the best of
> > my ability in this series. So the script side counters were removed
> > and the regular ones associated with the evsel used. The json metrics
> > were all searched looking for ones that have a subset of events
> > matching those in the perf script session, and all metrics are
> > printed. This is kind of weird as the counters are being set by the
> > period of samples, but I carried the behavior forward. I suspect there
> > needs to be follow up work to make this better, but what is in the
> > series is superior to what is currently in the tree. Follow up work
> > could include finding metrics for the machine in the perf.data rather
> > than using the host, allowing multiple metrics even if the metric ids
> > of the events differ, fixing pre-existing `perf stat record/report`
> > issues, etc.
> >
> > There is a lot of stat tests that, for example, assume '-e
> > instructions,cycles' will produce an IPC metric. These things needed
> > tidying as now the metric must be explicitly asked for and when doing
> > this ones using software events were preferred to increase
> > compatibility. As the test updates were numerous they are distinct to
> > the patches updating the functionality causing periods in the series
> > where not all tests are passing. If this is undesirable the test fixes
> > can be squashed into the functionality updates, but this will be kind
> > of messy, especially as at some points in the series both the old
> > metrics and the new metrics will be displayed.
> >
> > v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
> > metric to-done a system-wide calculation (Namhyung) and don't
> > crash because of the CPU map index couldn't be found. Regenerate
> > commit messages but the cpu-clock was always yielding 0 on my
> > machine leading to a lot of nan metric values.
>
> This is strange. The cpu-clock should not be 0 as long as you ran it.
> Do you think it's related to the scale unit change? I tested v3 and
> didn't see the problem.
It looked like a kernel issue. The raw counts were 0 before being
scaled, and metrics always work on unscaled values. It only affects
the commit messages, where the formatting matters more than the numeric
values - which were correct for a cpu-clock of 0.
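For illustration, a minimal standalone sketch (not perf code; the values
and metric name are made up) of why an unscaled cpu-clock count of 0 turns
ratio metrics into inf/nan even though the raw counts themselves still
print correctly:
```
/*
 * Hypothetical example, not perf tooling code: metric expressions are
 * evaluated on raw (unscaled) counter values, so a cpu-clock reading
 * of 0 in a denominator yields inf (or nan for 0/0) in metrics such
 * as cs_per_second, while the counter values themselves are still 0.
 */
#include <stdio.h>

int main(void)
{
	double cpu_clock_ns = 0.0;        /* raw, unscaled cpu-clock count */
	double context_switches = 2890.0; /* raw context-switches count */

	/* cs_per_second-style expression: switches per CPU-second */
	double cs_per_second = context_switches / (cpu_clock_ns / 1e9);

	printf("cs_per_second = %f\n", cs_per_second); /* inf; nan if 0/0 */
	return 0;
}
```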
Thanks,
Ian
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 23:13 ` Ian Rogers
@ 2025-11-12 1:08 ` Namhyung Kim
2025-11-12 8:20 ` Mi, Dapeng
1 sibling, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2025-11-12 1:08 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang
On Tue, Nov 11, 2025 at 03:13:35PM -0800, Ian Rogers wrote:
> On Tue, Nov 11, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Nov 11, 2025 at 01:21:48PM -0800, Ian Rogers wrote:
> > > Prior to this series stat-shadow would produce hard coded metrics if
> > > certain events appeared in the evlist. This series produces equivalent
> > > json metrics and cleans up the consequences in tests and display
> > > output. A before and after of the default display output on a
> > > tigerlake is:
> > >
> > > Before:
> > > ```
> > > $ perf stat -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> > > 5,749 context-switches # 358.376 /sec
> > > 121 cpu-migrations # 7.543 /sec
> > > 1,806 page-faults # 112.581 /sec
> > > 825,965,204 instructions # 0.70 insn per cycle
> > > 1,180,799,101 cycles # 0.074 GHz
> > > 168,945,109 branches # 10.532 M/sec
> > > 4,629,567 branch-misses # 2.74% of all branches
> > > # 30.2 % tma_backend_bound
> > > # 7.8 % tma_bad_speculation
> > > # 47.1 % tma_frontend_bound
> > > # 14.9 % tma_retiring
> > > ```
> > >
> > > After:
> > > ```
> > > $ perf stat -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 2,890 context-switches # 179.9 cs/sec cs_per_second
> > > 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> > > 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> > > 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> > > 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> > > 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
> > > 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> > > 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> > > TopdownL1 # 20.1 % tma_backend_bound
> > > # 40.5 % tma_bad_speculation (88.90%)
> > > # 17.2 % tma_frontend_bound (78.05%)
> > > # 22.2 % tma_retiring (88.89%)
> > >
> > > 1.002994394 seconds time elapsed
> > > ```
> > >
> > > Having the metrics in json brings greater uniformity, allows events to
> > > be shared by metrics, and it also allows descriptions like:
> > > ```
> > > $ perf list cs_per_second
> > > ...
> > > cs_per_second
> > > [Context switches per CPU second]
> > > ```
> > >
> > > A thorn in the side of doing this work was that the hard coded metrics
> > > were used by perf script with '-F metric'. This functionality didn't
> > > work for me (I was testing `perf record -e instructions,cycles`
> > > with/without leader sampling and then `perf script -F metric` but saw
> > > nothing but empty lines) but anyway I decided to fix it to the best of
> > > my ability in this series. So the script side counters were removed
> > > and the regular ones associated with the evsel used. The json metrics
> > > were all searched looking for ones that have a subset of events
> > > matching those in the perf script session, and all metrics are
> > > printed. This is kind of weird as the counters are being set by the
> > > period of samples, but I carried the behavior forward. I suspect there
> > > needs to be follow up work to make this better, but what is in the
> > > series is superior to what is currently in the tree. Follow up work
> > > could include finding metrics for the machine in the perf.data rather
> > > than using the host, allowing multiple metrics even if the metric ids
> > > of the events differ, fixing pre-existing `perf stat record/report`
> > > issues, etc.
> > >
> > > There is a lot of stat tests that, for example, assume '-e
> > > instructions,cycles' will produce an IPC metric. These things needed
> > > tidying as now the metric must be explicitly asked for and when doing
> > > this ones using software events were preferred to increase
> > > compatibility. As the test updates were numerous they are distinct to
> > > the patches updating the functionality causing periods in the series
> > > where not all tests are passing. If this is undesirable the test fixes
> > > can be squashed into the functionality updates, but this will be kind
> > > of messy, especially as at some points in the series both the old
> > > metrics and the new metrics will be displayed.
> > >
> > > v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
> > > metric to-done a system-wide calculation (Namhyung) and don't
> > > crash because of the CPU map index couldn't be found. Regenerate
> > > commit messages but the cpu-clock was always yielding 0 on my
> > > machine leading to a lot of nan metric values.
> >
> > This is strange. The cpu-clock should not be 0 as long as you ran it.
> > Do you think it's related to the scale unit change? I tested v3 and
> > didn't see the problem.
>
> It looked like a kernel issue. The raw counts were 0 before being
> scaled. All metrics always work on unscaled values. It is only the
> commit messages and the formatting is more important than the numeric
> values - which were correct for a cpu-clock of 0.
Hmm.. ok. I don't see the problem when I test the series so it may be
a problem in your environment.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 23:13 ` Ian Rogers
2025-11-12 1:08 ` Namhyung Kim
@ 2025-11-12 8:20 ` Mi, Dapeng
1 sibling, 0 replies; 33+ messages in thread
From: Mi, Dapeng @ 2025-11-12 8:20 UTC (permalink / raw)
To: Ian Rogers, Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Levi Yun, Yang Li,
linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang
On 11/12/2025 7:13 AM, Ian Rogers wrote:
> On Tue, Nov 11, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
>> On Tue, Nov 11, 2025 at 01:21:48PM -0800, Ian Rogers wrote:
>>> Prior to this series stat-shadow would produce hard coded metrics if
>>> certain events appeared in the evlist. This series produces equivalent
>>> json metrics and cleans up the consequences in tests and display
>>> output. A before and after of the default display output on a
>>> tigerlake is:
>>>
>>> Before:
>>> ```
>>> $ perf stat -a sleep 1
>>>
>>> Performance counter stats for 'system wide':
>>>
>>> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
>>> 5,749 context-switches # 358.376 /sec
>>> 121 cpu-migrations # 7.543 /sec
>>> 1,806 page-faults # 112.581 /sec
>>> 825,965,204 instructions # 0.70 insn per cycle
>>> 1,180,799,101 cycles # 0.074 GHz
>>> 168,945,109 branches # 10.532 M/sec
>>> 4,629,567 branch-misses # 2.74% of all branches
>>> # 30.2 % tma_backend_bound
>>> # 7.8 % tma_bad_speculation
>>> # 47.1 % tma_frontend_bound
>>> # 14.9 % tma_retiring
>>> ```
>>>
>>> After:
>>> ```
>>> $ perf stat -a sleep 1
>>>
>>> Performance counter stats for 'system wide':
>>>
>>> 2,890 context-switches # 179.9 cs/sec cs_per_second
>>> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
>>> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
>>> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
>>> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
>>> 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
>>> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
>>> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
>>> TopdownL1 # 20.1 % tma_backend_bound
>>> # 40.5 % tma_bad_speculation (88.90%)
>>> # 17.2 % tma_frontend_bound (78.05%)
>>> # 22.2 % tma_retiring (88.89%)
>>>
>>> 1.002994394 seconds time elapsed
>>> ```
>>>
>>> Having the metrics in json brings greater uniformity, allows events to
>>> be shared by metrics, and it also allows descriptions like:
>>> ```
>>> $ perf list cs_per_second
>>> ...
>>> cs_per_second
>>> [Context switches per CPU second]
>>> ```
>>>
>>> A thorn in the side of doing this work was that the hard coded metrics
>>> were used by perf script with '-F metric'. This functionality didn't
>>> work for me (I was testing `perf record -e instructions,cycles`
>>> with/without leader sampling and then `perf script -F metric` but saw
>>> nothing but empty lines) but anyway I decided to fix it to the best of
>>> my ability in this series. So the script side counters were removed
>>> and the regular ones associated with the evsel used. The json metrics
>>> were all searched looking for ones that have a subset of events
>>> matching those in the perf script session, and all metrics are
>>> printed. This is kind of weird as the counters are being set by the
>>> period of samples, but I carried the behavior forward. I suspect there
>>> needs to be follow up work to make this better, but what is in the
>>> series is superior to what is currently in the tree. Follow up work
>>> could include finding metrics for the machine in the perf.data rather
>>> than using the host, allowing multiple metrics even if the metric ids
>>> of the events differ, fixing pre-existing `perf stat record/report`
>>> issues, etc.
>>>
>>> There is a lot of stat tests that, for example, assume '-e
>>> instructions,cycles' will produce an IPC metric. These things needed
>>> tidying as now the metric must be explicitly asked for and when doing
>>> this ones using software events were preferred to increase
>>> compatibility. As the test updates were numerous they are distinct to
>>> the patches updating the functionality causing periods in the series
>>> where not all tests are passing. If this is undesirable the test fixes
>>> can be squashed into the functionality updates, but this will be kind
>>> of messy, especially as at some points in the series both the old
>>> metrics and the new metrics will be displayed.
>>>
>>> v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
>>> metric to-done a system-wide calculation (Namhyung) and don't
>>> crash because of the CPU map index couldn't be found. Regenerate
>>> commit messages but the cpu-clock was always yielding 0 on my
>>> machine leading to a lot of nan metric values.
>> This is strange. The cpu-clock should not be 0 as long as you ran it.
>> Do you think it's related to the scale unit change? I tested v3 and
>> didn't see the problem.
> It looked like a kernel issue. The raw counts were 0 before being
> scaled. All metrics always work on unscaled values. It is only the
> commit messages and the formatting is more important than the numeric
> values - which were correct for a cpu-clock of 0.
Yes, it's a kernel issue. I also found it several days ago. I have posted a
patch to fix it. :)
https://lore.kernel.org/all/20251112080526.3971392-1-dapeng1.mi@linux.intel.com/
>
> Thanks,
> Ian
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (18 preceding siblings ...)
2025-11-11 22:42 ` [PATCH v4 00/18] Namhyung Kim
@ 2025-11-12 9:03 ` Mi, Dapeng
2025-11-12 17:56 ` Namhyung Kim
20 siblings, 0 replies; 33+ messages in thread
From: Mi, Dapeng @ 2025-11-12 9:03 UTC (permalink / raw)
To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
James Clark, Xu Yang, Chun-Tse Shao, Thomas Richter,
Sumanth Korikkar, Collin Funk, Thomas Falcon, Howard Chu,
Levi Yun, Yang Li, linux-kernel, linux-perf-users, Andi Kleen,
Weilin Wang
I tested this patch series on Sapphire Rapids and Arrow Lake; the topdown
metrics output looks much prettier and more reader-friendly (especially on
hybrid platforms) than before. Thanks.
Sapphire Rapids:
1. sudo ./perf stat -a
^C
Performance counter stats for 'system wide':
1,742 context-switches # 4.1 cs/sec cs_per_second
420,720.12 msec cpu-clock # 224.5 CPUs CPUs_utilized
225 cpu-migrations # 0.5 migrations/sec migrations_per_second
1,463 page-faults # 3.5 faults/sec page_faults_per_second
842,434 branch-misses # 3.0 % branch_miss_rate
28,215,728 branches # 0.1 M/sec branch_frequency
373,303,824 cpu-cycles # 0.0 GHz cycles_frequency
135,738,837 instructions # 0.4 instructions insn_per_cycle
TopdownL1 # 4.4 % tma_bad_speculation
# 29.0 % tma_frontend_bound
# 58.3 % tma_backend_bound
# 8.3 % tma_retiring
TopdownL2 # 25.9 % tma_core_bound
# 32.4 % tma_memory_bound
# 2.3 % tma_heavy_operations
# 6.0 % tma_light_operations
# 4.1 % tma_branch_mispredicts
# 0.3 % tma_machine_clears
# 4.4 % tma_fetch_bandwidth
# 24.6 % tma_fetch_latency
1.873921629 seconds time elapsed
2. sudo ./perf stat -- true
Performance counter stats for 'true':
0 context-switches # 0.0 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
53 page-faults # 178267.5 faults/sec page_faults_per_second
0.30 msec task-clock # 0.4 CPUs CPUs_utilized
4,977 branch-misses # 4.6 % branch_miss_rate
109,186 branches # 367.3 M/sec branch_frequency
832,970 cpu-cycles # 2.8 GHz cycles_frequency
561,263 instructions # 0.7 instructions insn_per_cycle
TopdownL1 # 11.1 % tma_bad_speculation
# 40.5 % tma_frontend_bound
# 35.2 % tma_backend_bound
# 13.3 % tma_retiring
TopdownL2 # 13.7 % tma_core_bound
# 21.5 % tma_memory_bound
# 3.1 % tma_heavy_operations
# 10.2 % tma_light_operations
# 10.5 % tma_branch_mispredicts
# 0.6 % tma_machine_clears
# 10.5 % tma_fetch_bandwidth
# 29.9 % tma_fetch_latency
0.000752150 seconds time elapsed
3. sudo ./perf stat -M TopdownL1 -- true
Performance counter stats for 'true':
5,352,744 TOPDOWN.SLOTS # 11.1 % tma_bad_speculation
# 41.5 % tma_frontend_bound
650,725 topdown-retiring # 35.4 % tma_backend_bound
2,246,053 topdown-fe-bound
1,910,194 topdown-be-bound
146,938 topdown-heavy-ops # 12.1 % tma_retiring
587,752 topdown-bad-spec
8,977 INT_MISC.UOP_DROPPING
0.000655604 seconds time elapsed
4. sudo ./perf stat -M TopdownL2 -- true
Performance counter stats for 'true':
5,935,368 TOPDOWN.SLOTS
651,726 topdown-retiring
2,257,767 topdown-fe-bound
1,699,144 topdown-mem-bound # 12.5 % tma_core_bound
# 28.6 % tma_memory_bound
2,443,975 topdown-be-bound
162,931 topdown-heavy-ops # 2.7 % tma_heavy_operations
# 8.2 % tma_light_operations
558,622 topdown-br-mispredict # 9.4 % tma_branch_mispredicts
# 0.5 % tma_machine_clears
1,722,420 topdown-fetch-lat # 9.0 % tma_fetch_bandwidth
# 28.9 % tma_fetch_latency
581,898 topdown-bad-spec
9,177 INT_MISC.UOP_DROPPING
0.000762976 seconds time elapsed
Arrow Lake:
1. sudo ./perf stat -a
^C
Performance counter stats for 'system wide':
355 context-switches # 8.7 cs/sec cs_per_second
40,877.75 msec cpu-clock # 24.0 CPUs CPUs_utilized
37 cpu-migrations # 0.9 migrations/sec migrations_per_second
749 page-faults # 18.3 faults/sec page_faults_per_second
80,736 cpu_core/branch-misses/ # 4.5 % branch_miss_rate
1,817,804 cpu_core/branches/ # 0.0 M/sec branch_frequency
22,099,084 cpu_core/cpu-cycles/ # 0.0 GHz cycles_frequency
8,993,043 cpu_core/instructions/ # 0.4 instructions insn_per_cycle
7,484,501 cpu_atom/branch-misses/ # 9.0 % branch_miss_rate (72.70%)
80,826,849 cpu_atom/branches/ # 2.0 M/sec branch_frequency (72.79%)
1,071,170,614 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (72.78%)
429,581,963 cpu_atom/instructions/ # 0.4 instructions insn_per_cycle (72.68%)
TopdownL1 (cpu_core) # 62.1 % tma_backend_bound
# 4.6 % tma_bad_speculation
# 27.5 % tma_frontend_bound
# 5.9 % tma_retiring
TopdownL1 (cpu_atom) # 13.5 % tma_bad_speculation (72.85%)
# 29.4 % tma_backend_bound (72.87%)
# 0.0 % tma_frontend_bound (81.91%)
# 0.0 % tma_retiring (81.76%)
1.703000770 seconds time elapsed
2. sudo ./perf stat -- true
Performance counter stats for 'true':
0 context-switches # 0.0 cs/sec cs_per_second
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
52 page-faults # 123119.2 faults/sec page_faults_per_second
0.42 msec task-clock # 0.3 CPUs CPUs_utilized
8,317 cpu_atom/branch-misses/ # 1.6 % branch_miss_rate (51.13%)
621,409 cpu_atom/branches/ # 1471.3 M/sec branch_frequency
1,670,355 cpu_atom/cpu-cycles/ # 4.0 GHz cycles_frequency
3,412,023 cpu_atom/instructions/ # 2.0 instructions insn_per_cycle
TopdownL1 (cpu_atom) # 12.9 % tma_bad_speculation
# 22.1 % tma_backend_bound (48.87%)
# 0.0 % tma_frontend_bound (48.87%)
0.001387192 seconds time elapsed
3. sudo ./perf stat -M TopdownL1
^C
Performance counter stats for 'system wide':
70,711,798 cpu_atom/TOPDOWN_BE_BOUND.ALL_P/ # 32.5 % tma_backend_bound
34,170,064 cpu_core/slots/
2,838,696 cpu_core/topdown-retiring/ # 31.9 % tma_backend_bound
# 7.6 % tma_bad_speculation
# 52.2 % tma_frontend_bound
2,596,813 cpu_core/topdown-bad-spec/
389,708 cpu_core/topdown-heavy-ops/ # 8.3 % tma_retiring
17,836,476 cpu_core/topdown-fe-bound/
10,892,767 cpu_core/topdown-be-bound/
0 cpu_atom/TOPDOWN_RETIRING.ALL/ # 0.0 % tma_retiring
27,212,830 cpu_atom/CPU_CLK_UNHALTED.CORE/
14,606,510 cpu_atom/TOPDOWN_BAD_SPECULATION.ALL_P/ # 6.7 % tma_bad_speculation
0 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 0.0 % tma_frontend_bound
0.933603501 seconds time elapsed
4. sudo ./perf stat -M TopdownL2
^C
Performance counter stats for 'system wide':
3,185,474 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ # 0.3 % tma_machine_clears
362,392,575 cpu_atom/TOPDOWN_BE_BOUND.ALL_P/ # 11.2 % tma_core_bound
# 21.1 % tma_resource_bound
134,220,848 cpu_core/slots/
7,973,945 cpu_core/topdown-retiring/
21,283,136 cpu_core/topdown-mem-bound/ # 20.3 % tma_core_bound
# 15.9 % tma_memory_bound
8,723,033 cpu_core/topdown-bad-spec/
1,312,216 cpu_core/topdown-heavy-ops/ # 1.0 % tma_heavy_operations
# 5.0 % tma_light_operations
58,866,799 cpu_core/topdown-fetch-lat/ # 7.5 % tma_fetch_bandwidth
# 43.9 % tma_fetch_latency
8,588,952 cpu_core/topdown-br-mispredict/ # 6.4 % tma_branch_mispredicts
# 0.1 % tma_machine_clears
68,870,574 cpu_core/topdown-fe-bound/
48,573,009 cpu_core/topdown-be-bound/
125,913,035 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/
106,491,449 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ # 9.5 % tma_branch_mispredicts
199,780,747 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 17.8 % tma_ifetch_latency
140,205,932 cpu_atom/CPU_CLK_UNHALTED.CORE/
109,670,746 cpu_atom/TOPDOWN_BAD_SPECULATION.ALL_P/
0 cpu_atom/TOPDOWN_FE_BOUND.ALL/
176,695,510 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ # 15.8 % tma_ifetch_bandwidth
1.463942844 seconds time elapsed
On 11/12/2025 5:21 AM, Ian Rogers wrote:
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> 5,749 context-switches # 358.376 /sec
> 121 cpu-migrations # 7.543 /sec
> 1,806 page-faults # 112.581 /sec
> 825,965,204 instructions # 0.70 insn per cycle
> 1,180,799,101 cycles # 0.074 GHz
> 168,945,109 branches # 10.532 M/sec
> 4,629,567 branch-misses # 2.74% of all branches
> # 30.2 % tma_backend_bound
> # 7.8 % tma_bad_speculation
> # 47.1 % tma_frontend_bound
> # 14.9 % tma_retiring
> ```
>
> After:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 2,890 context-switches # 179.9 cs/sec cs_per_second
> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> 429,978,120 branches # 26.8 M/sec branch_frequency (88.85%)
> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> TopdownL1 # 20.1 % tma_backend_bound
> # 40.5 % tma_bad_speculation (88.90%)
> # 17.2 % tma_frontend_bound (78.05%)
> # 22.2 % tma_retiring (88.89%)
>
> 1.002994394 seconds time elapsed
> ```
>
> Having the metrics in json brings greater uniformity, allows events to
> be shared by metrics, and it also allows descriptions like:
> ```
> $ perf list cs_per_second
> ...
> cs_per_second
> [Context switches per CPU second]
> ```
>
> A thorn in the side of doing this work was that the hard coded metrics
> were used by perf script with '-F metric'. This functionality didn't
> work for me (I was testing `perf record -e instructions,cycles`
> with/without leader sampling and then `perf script -F metric` but saw
> nothing but empty lines) but anyway I decided to fix it to the best of
> my ability in this series. So the script side counters were removed
> and the regular ones associated with the evsel used. The json metrics
> were all searched looking for ones that have a subset of events
> matching those in the perf script session, and all metrics are
> printed. This is kind of weird as the counters are being set by the
> period of samples, but I carried the behavior forward. I suspect there
> needs to be follow up work to make this better, but what is in the
> series is superior to what is currently in the tree. Follow up work
> could include finding metrics for the machine in the perf.data rather
> than using the host, allowing multiple metrics even if the metric ids
> of the events differ, fixing pre-existing `perf stat record/report`
> issues, etc.
>
> There is a lot of stat tests that, for example, assume '-e
> instructions,cycles' will produce an IPC metric. These things needed
> tidying as now the metric must be explicitly asked for and when doing
> this ones using software events were preferred to increase
> compatibility. As the test updates were numerous they are distinct to
> the patches updating the functionality causing periods in the series
> where not all tests are passing. If this is undesirable the test fixes
> can be squashed into the functionality updates, but this will be kind
> of messy, especially as at some points in the series both the old
> metrics and the new metrics will be displayed.
>
> v4: K/sec to M/sec on branch frequency (Namhyung), perf script -F
> metric to-done a system-wide calculation (Namhyung) and don't
> crash because of the CPU map index couldn't be found. Regenerate
> commit messages but the cpu-clock was always yielding 0 on my
> machine leading to a lot of nan metric values.
>
> v3: Rebase resolving merge conflicts in
> tools/perf/pmu-events/empty-pmu-events.c by just regenerating it
> (Dapeng Mi).
> https://lore.kernel.org/lkml/20251111040417.270945-1-irogers@google.com/
>
> v2: Drop merged patches, add json to document target_cpu/core_wide and
> example to "Add care to picking the evsel for displaying a metric"
> commit message (Namhyung).
> https://lore.kernel.org/lkml/20251106231508.448793-1-irogers@google.com/
>
> v1: https://lore.kernel.org/lkml/20251024175857.808401-1-irogers@google.com/
>
> Ian Rogers (18):
> perf metricgroup: Add care to picking the evsel for displaying a
> metric
> perf expr: Add #target_cpu literal
> perf jevents: Add set of common metrics based on default ones
> perf jevents: Add metric DefaultShowEvents
> perf stat: Add detail -d,-dd,-ddd metrics
> perf script: Change metric format to use json metrics
> perf stat: Remove hard coded shadow metrics
> perf stat: Fix default metricgroup display on hybrid
> perf stat: Sort default events/metrics
> perf stat: Remove "unit" workarounds for metric-only
> perf test stat+json: Improve metric-only testing
> perf test stat: Ignore failures in Default[234] metricgroups
> perf test stat: Update std_output testing metric expectations
> perf test metrics: Update all metrics for possibly failing default
> metrics
> perf test stat: Update shadow test to use metrics
> perf test stat: Update test expectations and events
> perf test stat csv: Update test expectations and events
> perf tool_pmu: Make core_wide and target_cpu json events
>
> tools/perf/builtin-script.c | 251 ++++++++++-
> tools/perf/builtin-stat.c | 154 ++-----
> .../arch/common/common/metrics.json | 151 +++++++
> .../pmu-events/arch/common/common/tool.json | 12 +
> tools/perf/pmu-events/empty-pmu-events.c | 229 ++++++----
> tools/perf/pmu-events/jevents.py | 28 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
> tools/perf/tests/shell/lib/stat_output.sh | 2 +-
> tools/perf/tests/shell/stat+csv_output.sh | 2 +-
> tools/perf/tests/shell/stat+json_output.sh | 2 +-
> tools/perf/tests/shell/stat+shadow_stat.sh | 4 +-
> tools/perf/tests/shell/stat+std_output.sh | 4 +-
> tools/perf/tests/shell/stat.sh | 6 +-
> .../perf/tests/shell/stat_all_metricgroups.sh | 3 +
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/expr.c | 8 +-
> tools/perf/util/metricgroup.c | 92 +++-
> tools/perf/util/stat-display.c | 55 +--
> tools/perf/util/stat-shadow.c | 404 +-----------------
> tools/perf/util/stat.h | 2 +-
> tools/perf/util/tool_pmu.c | 24 +-
> tools/perf/util/tool_pmu.h | 9 +-
> 24 files changed, 769 insertions(+), 687 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 00/18]
2025-11-11 21:21 [PATCH v4 00/18] Ian Rogers
` (19 preceding siblings ...)
2025-11-12 9:03 ` Mi, Dapeng
@ 2025-11-12 17:56 ` Namhyung Kim
20 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2025-11-12 17:56 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users, Andi Kleen, Weilin Wang,
Ian Rogers
On Tue, 11 Nov 2025 13:21:48 -0800, Ian Rogers wrote:
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> [...]
Applied to perf-tools-next, thanks!
Best regards,
Namhyung
^ permalink raw reply [flat|nested] 33+ messages in thread