* [PATCH v1 01/22] perf evsel: Remove unused metric_events variable
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 02/22] perf metricgroup: Update comment on location of metric_event list Ian Rogers
` (22 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
The metric_events exist in the metric_expr list and so this variable
has been unused for a while.
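For context, here is a minimal sketch of how a metric's evsels are
reached now, via the evlist's metric_events rblist and the metric_expr
list rather than via the evsel itself (field names assumed from
metricgroup.h; illustrative only, not a verbatim copy of the perf
sources):

```
	/* Sketch only: walk the metrics attached to an evsel via the rblist. */
	struct metric_event *me = metricgroup__lookup(&evlist->metric_events,
						      evsel, /*create=*/false);
	struct metric_expr *expr;

	if (me) {
		list_for_each_entry(expr, &me->head, nd) {
			/* The evsels backing the metric live on the expr. */
			for (int i = 0; expr->metric_events[i]; i++)
				pr_debug("%s uses %s\n", expr->metric_name,
					 evsel__name(expr->metric_events[i]));
		}
	}
```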
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/evsel.c | 2 --
tools/perf/util/evsel.h | 1 -
2 files changed, 3 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ad11cbfcbff1..67a898cda86a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -402,7 +402,6 @@ void evsel__init(struct evsel *evsel,
evsel->sample_size = __evsel__sample_size(attr->sample_type);
evsel__calc_id_pos(evsel);
evsel->cmdline_group_boundary = false;
- evsel->metric_events = NULL;
evsel->per_pkg_mask = NULL;
evsel->collect_stat = false;
evsel->group_pmu_name = NULL;
@@ -1754,7 +1753,6 @@ void evsel__exit(struct evsel *evsel)
evsel__zero_per_pkg(evsel);
hashmap__free(evsel->per_pkg_mask);
evsel->per_pkg_mask = NULL;
- zfree(&evsel->metric_events);
if (evsel__priv_destructor)
evsel__priv_destructor(evsel->priv);
perf_evsel__object.fini(evsel);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index f8de0f9a719b..71f74c7036ef 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -100,7 +100,6 @@ struct evsel {
* metric fields are similar, but needs more care as they can have
* references to other metric (evsel).
*/
- struct evsel **metric_events;
struct evsel *metric_leader;
void *handler;
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 02/22] perf metricgroup: Update comment on location of metric_event list
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
2025-10-24 17:58 ` [PATCH v1 01/22] perf evsel: Remove unused metric_events variable Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 03/22] perf metricgroup: Missed free on error path Ian Rogers
` (21 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Update comment as the stat_config no longer holds all metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Fixes: faebee18d720 ("perf stat: Move metric list from config to evlist")
---
tools/perf/util/metricgroup.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 324880b2ed8f..4be6bfc13c46 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -16,7 +16,7 @@ struct cgroup;
/**
* A node in a rblist keyed by the evsel. The global rblist of metric events
- * generally exists in perf_stat_config. The evsel is looked up in the rblist
+ * generally exists in evlist. The evsel is looked up in the rblist
* yielding a list of metric_expr.
*/
struct metric_event {
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 03/22] perf metricgroup: Missed free on error path
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
2025-10-24 17:58 ` [PATCH v1 01/22] perf evsel: Remove unused metric_events variable Ian Rogers
2025-10-24 17:58 ` [PATCH v1 02/22] perf metricgroup: Update comment on location of metric_event list Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 04/22] perf metricgroup: When copy metrics copy default information Ian Rogers
` (20 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
If an out-of-memory error occurs, the expr also needs freeing.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 595b83142d2c..c822cf5da53b 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1455,6 +1455,7 @@ static int parse_groups(struct evlist *perf_evlist,
if (!expr->metric_name) {
ret = -ENOMEM;
+ free(expr);
free(metric_events);
goto out;
}
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 04/22] perf metricgroup: When copy metrics copy default information
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (2 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 03/22] perf metricgroup: Missed free on error path Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric Ian Rogers
` (19 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
When copying metrics into a group, also copy the default information
from the original metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index c822cf5da53b..48936e517803 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1608,6 +1608,7 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
pr_debug("copying metric event for cgroup '%s': %s (idx=%d)\n",
cgrp ? cgrp->name : "root", evsel->name, evsel->core.idx);
+ new_me->is_default = old_me->is_default;
list_for_each_entry(old_expr, &old_me->head, nd) {
new_expr = malloc(sizeof(*new_expr));
if (!new_expr)
@@ -1621,6 +1622,7 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
new_expr->metric_unit = old_expr->metric_unit;
new_expr->runtime = old_expr->runtime;
+ new_expr->default_metricgroup_name = old_expr->default_metricgroup_name;
if (old_expr->metric_refs) {
/* calculate number of metric_events */
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (3 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 04/22] perf metricgroup: When copy metrics copy default information Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-11-04 4:52 ` Namhyung Kim
2025-10-24 17:58 ` [PATCH v1 06/22] perf jevents: Make all tables static Ian Rogers
` (18 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Rather than using the first evsel in the matched events, try to find
the least shared non-tool evsel. The aim is to pick the first evsel
that typifies the metric within the list of metrics.
This addresses an issue where Default metric group metrics may lose
their counter value due to how the stat displaying hides counters for
default event/metric output.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 48936e517803..76092ee26761 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
return ret;
}
+/* How many times will a given evsel be used in a set of metrics? */
+static int count_uses(struct list_head *metric_list, struct evsel *evsel)
+{
+ const char *metric_id = evsel__metric_id(evsel);
+ struct metric *m;
+ int uses = 0;
+
+ list_for_each_entry(m, metric_list, nd) {
+ if (hashmap__find(m->pctx->ids, metric_id, NULL))
+ uses++;
+ }
+ return uses;
+}
+
+/*
+ * Select the evsel that stat-display will use to trigger shadow/metric
+ * printing. Pick the least shared non-tool evsel, encouraging metrics to be
+ * with a hardware counter that is specific to them.
+ */
+static struct evsel *pick_display_evsel(struct list_head *metric_list,
+ struct evsel **metric_events)
+{
+ struct evsel *selected = metric_events[0];
+ size_t selected_uses;
+ bool selected_is_tool;
+
+ if (!selected)
+ return NULL;
+
+ selected_uses = count_uses(metric_list, selected);
+ selected_is_tool = evsel__is_tool(selected);
+ for (int i = 1; metric_events[i]; i++) {
+ struct evsel *candidate = metric_events[i];
+ size_t candidate_uses = count_uses(metric_list, candidate);
+
+ if ((selected_is_tool && !evsel__is_tool(candidate)) ||
+ (candidate_uses < selected_uses)) {
+ selected = candidate;
+ selected_uses = candidate_uses;
+ selected_is_tool = evsel__is_tool(selected);
+ }
+ }
+ return selected;
+}
+
static int parse_groups(struct evlist *perf_evlist,
const char *pmu, const char *str,
bool metric_no_group,
@@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
goto out;
}
- me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
+ me = metricgroup__lookup(&perf_evlist->metric_events,
+ pick_display_evsel(&metric_list, metric_events),
/*create=*/true);
expr = malloc(sizeof(struct metric_expr));
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-10-24 17:58 ` [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric Ian Rogers
@ 2025-11-04 4:52 ` Namhyung Kim
2025-11-04 5:28 ` Ian Rogers
0 siblings, 1 reply; 36+ messages in thread
From: Namhyung Kim @ 2025-11-04 4:52 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Fri, Oct 24, 2025 at 10:58:40AM -0700, Ian Rogers wrote:
> Rather than using the first evsel in the matched events, try to find
> the least shared non-tool evsel. The aim is to pick the first evsel
> that typifies the metric within the list of metrics.
>
> This addresses an issue where Default metric group metrics may lose
> their counter value due to how the stat displaying hides counters for
> default event/metric output.
Do you have a command line example to show impact of this change?
Thanks,
Namhyung
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 48936e517803..76092ee26761 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
> return ret;
> }
>
> +/* How many times will a given evsel be used in a set of metrics? */
> +static int count_uses(struct list_head *metric_list, struct evsel *evsel)
> +{
> + const char *metric_id = evsel__metric_id(evsel);
> + struct metric *m;
> + int uses = 0;
> +
> + list_for_each_entry(m, metric_list, nd) {
> + if (hashmap__find(m->pctx->ids, metric_id, NULL))
> + uses++;
> + }
> + return uses;
> +}
> +
> +/*
> + * Select the evsel that stat-display will use to trigger shadow/metric
> + * printing. Pick the least shared non-tool evsel, encouraging metrics to be
> + * with a hardware counter that is specific to them.
> + */
> +static struct evsel *pick_display_evsel(struct list_head *metric_list,
> + struct evsel **metric_events)
> +{
> + struct evsel *selected = metric_events[0];
> + size_t selected_uses;
> + bool selected_is_tool;
> +
> + if (!selected)
> + return NULL;
> +
> + selected_uses = count_uses(metric_list, selected);
> + selected_is_tool = evsel__is_tool(selected);
> + for (int i = 1; metric_events[i]; i++) {
> + struct evsel *candidate = metric_events[i];
> + size_t candidate_uses = count_uses(metric_list, candidate);
> +
> + if ((selected_is_tool && !evsel__is_tool(candidate)) ||
> + (candidate_uses < selected_uses)) {
> + selected = candidate;
> + selected_uses = candidate_uses;
> + selected_is_tool = evsel__is_tool(selected);
> + }
> + }
> + return selected;
> +}
> +
> static int parse_groups(struct evlist *perf_evlist,
> const char *pmu, const char *str,
> bool metric_no_group,
> @@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
> goto out;
> }
>
> - me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
> + me = metricgroup__lookup(&perf_evlist->metric_events,
> + pick_display_evsel(&metric_list, metric_events),
> /*create=*/true);
>
> expr = malloc(sizeof(struct metric_expr));
> --
> 2.51.1.821.gb6fe4d2222-goog
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-11-04 4:52 ` Namhyung Kim
@ 2025-11-04 5:28 ` Ian Rogers
2025-11-06 6:03 ` Namhyung Kim
0 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-11-04 5:28 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Mon, Nov 3, 2025 at 8:52 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Oct 24, 2025 at 10:58:40AM -0700, Ian Rogers wrote:
> > Rather than using the first evsel in the matched events, try to find
> > the least shared non-tool evsel. The aim is to pick the first evsel
> > that typifies the metric within the list of metrics.
> >
> > This addresses an issue where Default metric group metrics may lose
> > their counter value due to how the stat displaying hides counters for
> > default event/metric output.
>
> Do you have a command line example to show impact of this change?
You can just run a Topdown metricgroup on Intel to see differences,
but they are minor. The main impact is on the Default legacy metrics
as those have a counter then a metric, but without this change you get
everything grouped on the cpu-clock event and the formatting gets
broken. As --metric-only is popular when looking at a group of events
and the Default legacy metrics are added in subsequent changes, it
didn't seem right to include the output here (it would either show
output that is still somewhat broken, or output from later patches).
Thanks,
Ian
> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
> > 1 file changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > index 48936e517803..76092ee26761 100644
> > --- a/tools/perf/util/metricgroup.c
> > +++ b/tools/perf/util/metricgroup.c
> > @@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
> > return ret;
> > }
> >
> > +/* How many times will a given evsel be used in a set of metrics? */
> > +static int count_uses(struct list_head *metric_list, struct evsel *evsel)
> > +{
> > + const char *metric_id = evsel__metric_id(evsel);
> > + struct metric *m;
> > + int uses = 0;
> > +
> > + list_for_each_entry(m, metric_list, nd) {
> > + if (hashmap__find(m->pctx->ids, metric_id, NULL))
> > + uses++;
> > + }
> > + return uses;
> > +}
> > +
> > +/*
> > + * Select the evsel that stat-display will use to trigger shadow/metric
> > + * printing. Pick the least shared non-tool evsel, encouraging metrics to be
> > + * with a hardware counter that is specific to them.
> > + */
> > +static struct evsel *pick_display_evsel(struct list_head *metric_list,
> > + struct evsel **metric_events)
> > +{
> > + struct evsel *selected = metric_events[0];
> > + size_t selected_uses;
> > + bool selected_is_tool;
> > +
> > + if (!selected)
> > + return NULL;
> > +
> > + selected_uses = count_uses(metric_list, selected);
> > + selected_is_tool = evsel__is_tool(selected);
> > + for (int i = 1; metric_events[i]; i++) {
> > + struct evsel *candidate = metric_events[i];
> > + size_t candidate_uses = count_uses(metric_list, candidate);
> > +
> > + if ((selected_is_tool && !evsel__is_tool(candidate)) ||
> > + (candidate_uses < selected_uses)) {
> > + selected = candidate;
> > + selected_uses = candidate_uses;
> > + selected_is_tool = evsel__is_tool(selected);
> > + }
> > + }
> > + return selected;
> > +}
> > +
> > static int parse_groups(struct evlist *perf_evlist,
> > const char *pmu, const char *str,
> > bool metric_no_group,
> > @@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
> > goto out;
> > }
> >
> > - me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
> > + me = metricgroup__lookup(&perf_evlist->metric_events,
> > + pick_display_evsel(&metric_list, metric_events),
> > /*create=*/true);
> >
> > expr = malloc(sizeof(struct metric_expr));
> > --
> > 2.51.1.821.gb6fe4d2222-goog
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-11-04 5:28 ` Ian Rogers
@ 2025-11-06 6:03 ` Namhyung Kim
2025-11-06 6:42 ` Ian Rogers
0 siblings, 1 reply; 36+ messages in thread
From: Namhyung Kim @ 2025-11-06 6:03 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Mon, Nov 03, 2025 at 09:28:44PM -0800, Ian Rogers wrote:
> On Mon, Nov 3, 2025 at 8:52 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 24, 2025 at 10:58:40AM -0700, Ian Rogers wrote:
> > > Rather than using the first evsel in the matched events, try to find
> > > the least shared non-tool evsel. The aim is to pick the first evsel
> > > that typifies the metric within the list of metrics.
> > >
> > > This addresses an issue where Default metric group metrics may lose
> > > their counter value due to how the stat displaying hides counters for
> > > default event/metric output.
> >
> > Do you have a command line example to show impact of this change?
>
> You can just run a Topdown metricgroup on Intel to see differences,
Ok, before this change.
$ perf stat -M topdownL1 true
Performance counter stats for 'true':
7,754,275 TOPDOWN.SLOTS # 37.1 % tma_backend_bound
# 38.7 % tma_frontend_bound
# 8.8 % tma_bad_speculation
# 15.3 % tma_retiring
1,185,947 topdown-retiring
3,010,483 topdown-fe-bound
2,828,029 topdown-be-bound
729,814 topdown-bad-spec
9,987 INT_MISC.CLEARS_COUNT
221,405 IDQ.MS_UOPS
6,352 INT_MISC.UOP_DROPPING
1,212,644 UOPS_RETIRED.SLOTS
119,895 UOPS_DECODED.DEC0
60,975 cpu/UOPS_DECODED.DEC0,cmask=1/
1,639,442 UOPS_ISSUED.ANY
820,982 IDQ.MITE_UOPS
0.001172956 seconds time elapsed
0.001269000 seconds user
0.000000000 seconds sys
And with this change, it does look better.
$ perf stat -M topdownL1 true
Performance counter stats for 'true':
7,977,430 TOPDOWN.SLOTS
1,188,793 topdown-retiring
3,159,687 topdown-fe-bound
2,940,699 topdown-be-bound
688,248 topdown-bad-spec
9,749 INT_MISC.CLEARS_COUNT # 37.5 % tma_backend_bound
# 8.1 % tma_bad_speculation
219,145 IDQ.MS_UOPS # 14.9 % tma_retiring
6,188 INT_MISC.UOP_DROPPING # 39.5 % tma_frontend_bound
1,205,712 UOPS_RETIRED.SLOTS
117,505 UOPS_DECODED.DEC0
59,891 cpu/UOPS_DECODED.DEC0,cmask=1/
1,625,232 UOPS_ISSUED.ANY
805,560 IDQ.MITE_UOPS
0.001629344 seconds time elapsed
0.001672000 seconds user
0.000000000 seconds sys
> but they are minor. The main impact is on the Default legacy metrics
> as those have a counter then a metric, but without this change you get
> everything grouped on the cpu-clock event and the formatting gets broken.
Do you mean with other changes in this series? I don't see any
differences in the output just after this patch..
Before:
$ perf stat -a true
Performance counter stats for 'system wide':
19,078,719 cpu-clock # 7.256 CPUs utilized
94 context-switches # 4.927 K/sec
14 cpu-migrations # 733.802 /sec
61 page-faults # 3.197 K/sec
43,304,957 instructions # 1.10 insn per cycle
39,281,107 cycles # 2.059 GHz
5,012,071 branches # 262.705 M/sec
128,358 branch-misses # 2.56% of all branches
# 24.4 % tma_retiring
# 33.7 % tma_backend_bound
# 5.9 % tma_bad_speculation
# 36.0 % tma_frontend_bound
0.002629534 seconds time elapsed
After:
$ perf stat -a true
Performance counter stats for 'system wide':
6,201,661 cpu-clock # 3.692 CPUs utilized
24 context-switches # 3.870 K/sec
7 cpu-migrations # 1.129 K/sec
60 page-faults # 9.675 K/sec
11,458,681 instructions # 1.07 insn per cycle
10,704,978 cycles # 1.726 GHz
2,457,704 branches # 396.298 M/sec
54,553 branch-misses # 2.22% of all branches
# 21.4 % tma_retiring
# 36.1 % tma_backend_bound
# 10.2 % tma_bad_speculation
# 32.3 % tma_frontend_bound
0.001679679 seconds time elapsed
Thanks,
Namhyung
> As --metric-only is popular when looking at a group of events
> and the Default legacy metrics are added in subsequent changes, it
> didn't seem right to include the output here (it would either show
> output that is still somewhat broken, or output from later patches).
>
> Thanks,
> Ian
>
> > Thanks,
> > Namhyung
> >
> > >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > > tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
> > > 1 file changed, 47 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > > index 48936e517803..76092ee26761 100644
> > > --- a/tools/perf/util/metricgroup.c
> > > +++ b/tools/perf/util/metricgroup.c
> > > @@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
> > > return ret;
> > > }
> > >
> > > +/* How many times will a given evsel be used in a set of metrics? */
> > > +static int count_uses(struct list_head *metric_list, struct evsel *evsel)
> > > +{
> > > + const char *metric_id = evsel__metric_id(evsel);
> > > + struct metric *m;
> > > + int uses = 0;
> > > +
> > > + list_for_each_entry(m, metric_list, nd) {
> > > + if (hashmap__find(m->pctx->ids, metric_id, NULL))
> > > + uses++;
> > > + }
> > > + return uses;
> > > +}
> > > +
> > > +/*
> > > + * Select the evsel that stat-display will use to trigger shadow/metric
> > > + * printing. Pick the least shared non-tool evsel, encouraging metrics to be
> > > + * with a hardware counter that is specific to them.
> > > + */
> > > +static struct evsel *pick_display_evsel(struct list_head *metric_list,
> > > + struct evsel **metric_events)
> > > +{
> > > + struct evsel *selected = metric_events[0];
> > > + size_t selected_uses;
> > > + bool selected_is_tool;
> > > +
> > > + if (!selected)
> > > + return NULL;
> > > +
> > > + selected_uses = count_uses(metric_list, selected);
> > > + selected_is_tool = evsel__is_tool(selected);
> > > + for (int i = 1; metric_events[i]; i++) {
> > > + struct evsel *candidate = metric_events[i];
> > > + size_t candidate_uses = count_uses(metric_list, candidate);
> > > +
> > > + if ((selected_is_tool && !evsel__is_tool(candidate)) ||
> > > + (candidate_uses < selected_uses)) {
> > > + selected = candidate;
> > > + selected_uses = candidate_uses;
> > > + selected_is_tool = evsel__is_tool(selected);
> > > + }
> > > + }
> > > + return selected;
> > > +}
> > > +
> > > static int parse_groups(struct evlist *perf_evlist,
> > > const char *pmu, const char *str,
> > > bool metric_no_group,
> > > @@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
> > > goto out;
> > > }
> > >
> > > - me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
> > > + me = metricgroup__lookup(&perf_evlist->metric_events,
> > > + pick_display_evsel(&metric_list, metric_events),
> > > /*create=*/true);
> > >
> > > expr = malloc(sizeof(struct metric_expr));
> > > --
> > > 2.51.1.821.gb6fe4d2222-goog
> > >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric
2025-11-06 6:03 ` Namhyung Kim
@ 2025-11-06 6:42 ` Ian Rogers
0 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-11-06 6:42 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Wed, Nov 5, 2025 at 10:03 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Nov 03, 2025 at 09:28:44PM -0800, Ian Rogers wrote:
> > On Mon, Nov 3, 2025 at 8:52 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Oct 24, 2025 at 10:58:40AM -0700, Ian Rogers wrote:
> > > > Rather than using the first evsel in the matched events, try to find
> > > > the least shared non-tool evsel. The aim is to pick the first evsel
> > > > that typifies the metric within the list of metrics.
> > > >
> > > > This addresses an issue where Default metric group metrics may lose
> > > > their counter value due to how the stat displaying hides counters for
> > > > default event/metric output.
> > >
> > > Do you have a command line example to show impact of this change?
> >
> > You can just run a Topdown metricgroup on Intel to see differences,
>
> Ok, before this change.
>
> $ perf stat -M topdownL1 true
>
> Performance counter stats for 'true':
>
> 7,754,275 TOPDOWN.SLOTS # 37.1 % tma_backend_bound
> # 38.7 % tma_frontend_bound
> # 8.8 % tma_bad_speculation
> # 15.3 % tma_retiring
> 1,185,947 topdown-retiring
> 3,010,483 topdown-fe-bound
> 2,828,029 topdown-be-bound
> 729,814 topdown-bad-spec
> 9,987 INT_MISC.CLEARS_COUNT
> 221,405 IDQ.MS_UOPS
> 6,352 INT_MISC.UOP_DROPPING
> 1,212,644 UOPS_RETIRED.SLOTS
> 119,895 UOPS_DECODED.DEC0
> 60,975 cpu/UOPS_DECODED.DEC0,cmask=1/
> 1,639,442 UOPS_ISSUED.ANY
> 820,982 IDQ.MITE_UOPS
>
> 0.001172956 seconds time elapsed
>
> 0.001269000 seconds user
> 0.000000000 seconds sys
>
>
> And with this change, it does look better.
>
> $ perf stat -M topdownL1 true
>
> Performance counter stats for 'true':
>
> 7,977,430 TOPDOWN.SLOTS
> 1,188,793 topdown-retiring
> 3,159,687 topdown-fe-bound
> 2,940,699 topdown-be-bound
> 688,248 topdown-bad-spec
> 9,749 INT_MISC.CLEARS_COUNT # 37.5 % tma_backend_bound
> # 8.1 % tma_bad_speculation
> 219,145 IDQ.MS_UOPS # 14.9 % tma_retiring
> 6,188 INT_MISC.UOP_DROPPING # 39.5 % tma_frontend_bound
> 1,205,712 UOPS_RETIRED.SLOTS
> 117,505 UOPS_DECODED.DEC0
> 59,891 cpu/UOPS_DECODED.DEC0,cmask=1/
> 1,625,232 UOPS_ISSUED.ANY
> 805,560 IDQ.MITE_UOPS
>
> 0.001629344 seconds time elapsed
>
> 0.001672000 seconds user
> 0.000000000 seconds sys
>
> > but they are minor. The main impact is on the Default legacy metrics
> > as those have a counter then a metric, but without this change you get
> > everything grouped on the cpu-clock event and the formatting gets broken.
>
> Do you mean with other changes in this series? I don't see any
> differences in the output just after this patch..
>
> Before:
>
> $ perf stat -a true
>
> Performance counter stats for 'system wide':
>
> 19,078,719 cpu-clock # 7.256 CPUs utilized
> 94 context-switches # 4.927 K/sec
> 14 cpu-migrations # 733.802 /sec
> 61 page-faults # 3.197 K/sec
> 43,304,957 instructions # 1.10 insn per cycle
> 39,281,107 cycles # 2.059 GHz
> 5,012,071 branches # 262.705 M/sec
> 128,358 branch-misses # 2.56% of all branches
> # 24.4 % tma_retiring
> # 33.7 % tma_backend_bound
> # 5.9 % tma_bad_speculation
> # 36.0 % tma_frontend_bound
>
> 0.002629534 seconds time elapsed
>
> After:
>
> $ perf stat -a true
>
> Performance counter stats for 'system wide':
>
> 6,201,661 cpu-clock # 3.692 CPUs utilized
> 24 context-switches # 3.870 K/sec
> 7 cpu-migrations # 1.129 K/sec
> 60 page-faults # 9.675 K/sec
> 11,458,681 instructions # 1.07 insn per cycle
> 10,704,978 cycles # 1.726 GHz
> 2,457,704 branches # 396.298 M/sec
> 54,553 branch-misses # 2.22% of all branches
> # 21.4 % tma_retiring
> # 36.1 % tma_backend_bound
> # 10.2 % tma_bad_speculation
> # 32.3 % tma_frontend_bound
>
> 0.001679679 seconds time elapsed
These are the hardcoded metrics that aren't impacted by my changes to
the json metric's behavior. Patch 8 will add json for the legacy
metrics.
Thanks,
Ian
> Thanks,
> Namhyung
>
>
> > As --metric-only is popular when looking at a group of events
> > and the Default legacy metrics are added in subsequent changes, it
> > didn't seem right to include the output here (it would either show
> > output that is still somewhat broken, or output from later patches).
> >
> > Thanks,
> > Ian
> >
> > > Thanks,
> > > Namhyung
> > >
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > ---
> > > > tools/perf/util/metricgroup.c | 48 ++++++++++++++++++++++++++++++++++-
> > > > 1 file changed, 47 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > > > index 48936e517803..76092ee26761 100644
> > > > --- a/tools/perf/util/metricgroup.c
> > > > +++ b/tools/perf/util/metricgroup.c
> > > > @@ -1323,6 +1323,51 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu,
> > > > return ret;
> > > > }
> > > >
> > > > +/* How many times will a given evsel be used in a set of metrics? */
> > > > +static int count_uses(struct list_head *metric_list, struct evsel *evsel)
> > > > +{
> > > > + const char *metric_id = evsel__metric_id(evsel);
> > > > + struct metric *m;
> > > > + int uses = 0;
> > > > +
> > > > + list_for_each_entry(m, metric_list, nd) {
> > > > + if (hashmap__find(m->pctx->ids, metric_id, NULL))
> > > > + uses++;
> > > > + }
> > > > + return uses;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Select the evsel that stat-display will use to trigger shadow/metric
> > > > + * printing. Pick the least shared non-tool evsel, encouraging metrics to be
> > > > + * with a hardware counter that is specific to them.
> > > > + */
> > > > +static struct evsel *pick_display_evsel(struct list_head *metric_list,
> > > > + struct evsel **metric_events)
> > > > +{
> > > > + struct evsel *selected = metric_events[0];
> > > > + size_t selected_uses;
> > > > + bool selected_is_tool;
> > > > +
> > > > + if (!selected)
> > > > + return NULL;
> > > > +
> > > > + selected_uses = count_uses(metric_list, selected);
> > > > + selected_is_tool = evsel__is_tool(selected);
> > > > + for (int i = 1; metric_events[i]; i++) {
> > > > + struct evsel *candidate = metric_events[i];
> > > > + size_t candidate_uses = count_uses(metric_list, candidate);
> > > > +
> > > > + if ((selected_is_tool && !evsel__is_tool(candidate)) ||
> > > > + (candidate_uses < selected_uses)) {
> > > > + selected = candidate;
> > > > + selected_uses = candidate_uses;
> > > > + selected_is_tool = evsel__is_tool(selected);
> > > > + }
> > > > + }
> > > > + return selected;
> > > > +}
> > > > +
> > > > static int parse_groups(struct evlist *perf_evlist,
> > > > const char *pmu, const char *str,
> > > > bool metric_no_group,
> > > > @@ -1430,7 +1475,8 @@ static int parse_groups(struct evlist *perf_evlist,
> > > > goto out;
> > > > }
> > > >
> > > > - me = metricgroup__lookup(&perf_evlist->metric_events, metric_events[0],
> > > > + me = metricgroup__lookup(&perf_evlist->metric_events,
> > > > + pick_display_evsel(&metric_list, metric_events),
> > > > /*create=*/true);
> > > >
> > > > expr = malloc(sizeof(struct metric_expr));
> > > > --
> > > > 2.51.1.821.gb6fe4d2222-goog
> > > >
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v1 06/22] perf jevents: Make all tables static
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (4 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 05/22] perf metricgroup: Add care to picking the evsel for displaying a metric Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 07/22] perf expr: Add #target_cpu literal Ian Rogers
` (17 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
The tables created by jevents.py are only used within the pmu-events.c
file. Change the declarations of those global variables to be static
to encapsulate this.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/empty-pmu-events.c | 10 +++++-----
tools/perf/pmu-events/jevents.py | 6 +++---
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 336e3924ce84..5120fb93690e 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -2585,7 +2585,7 @@ static const struct compact_pmu_event pmu_events__common_tool[] = {
};
-const struct pmu_table_entry pmu_events__common[] = {
+static const struct pmu_table_entry pmu_events__common[] = {
{
.entries = pmu_events__common_default_core,
.num_entries = ARRAY_SIZE(pmu_events__common_default_core),
@@ -2630,7 +2630,7 @@ static const struct compact_pmu_event pmu_events__test_soc_cpu_uncore_imc_free_r
};
-const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
+static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
{
.entries = pmu_events__test_soc_cpu_default_core,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_cpu_default_core),
@@ -2682,7 +2682,7 @@ static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] =
};
-const struct pmu_table_entry pmu_metrics__test_soc_cpu[] = {
+static const struct pmu_table_entry pmu_metrics__test_soc_cpu[] = {
{
.entries = pmu_metrics__test_soc_cpu_default_core,
.num_entries = ARRAY_SIZE(pmu_metrics__test_soc_cpu_default_core),
@@ -2701,7 +2701,7 @@ static const struct compact_pmu_event pmu_events__test_soc_sys_uncore_sys_ddr_pm
};
-const struct pmu_table_entry pmu_events__test_soc_sys[] = {
+static const struct pmu_table_entry pmu_events__test_soc_sys[] = {
{
.entries = pmu_events__test_soc_sys_uncore_sys_ccn_pmu,
.num_entries = ARRAY_SIZE(pmu_events__test_soc_sys_uncore_sys_ccn_pmu),
@@ -2751,7 +2751,7 @@ struct pmu_events_map {
* Global table mapping each known CPU for the architecture to its
* table of PMU events.
*/
-const struct pmu_events_map pmu_events_map[] = {
+static const struct pmu_events_map pmu_events_map[] = {
{
.arch = "common",
.cpuid = "common",
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 1f3917cbff87..786a7049363f 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -550,7 +550,7 @@ def print_pending_events() -> None:
_args.output_file.write(f"""
}};
-const struct pmu_table_entry {_pending_events_tblname}[] = {{
+static const struct pmu_table_entry {_pending_events_tblname}[] = {{
""")
for (pmu, tbl_pmu) in sorted(pmus):
pmu_name = f"{pmu}\\000"
@@ -605,7 +605,7 @@ def print_pending_metrics() -> None:
_args.output_file.write(f"""
}};
-const struct pmu_table_entry {_pending_metrics_tblname}[] = {{
+static const struct pmu_table_entry {_pending_metrics_tblname}[] = {{
""")
for (pmu, tbl_pmu) in sorted(pmus):
pmu_name = f"{pmu}\\000"
@@ -730,7 +730,7 @@ struct pmu_events_map {
* Global table mapping each known CPU for the architecture to its
* table of PMU events.
*/
-const struct pmu_events_map pmu_events_map[] = {
+static const struct pmu_events_map pmu_events_map[] = {
""")
for arch in archs:
if arch == 'test':
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 07/22] perf expr: Add #target_cpu literal
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (5 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 06/22] perf jevents: Make all tables static Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-11-04 4:56 ` Namhyung Kim
2025-10-24 17:58 ` [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones Ian Rogers
` (16 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
For CPU nanoseconds a lot of the stat-shadow metrics use either
task-clock or cpu-clock, the latter being used when
target__has_cpu. Add a #target_cpu literal so that json metrics can
perform the same test.
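As an illustration, the common metrics json added later in this series
(patch 8) uses the literal to pick between cpu-clock and task-clock in
the same way, e.g. the CPUs_utilized entry (abbreviated here):

```
  {
    "BriefDescription": "Average CPU utilization",
    "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
    "MetricName": "CPUs_utilized",
    "ScaleUnit": "1CPUs"
  }
```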
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/expr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 7fda0ff89c16..4df56f2b283d 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -409,6 +409,9 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
} else if (!strcmp("#core_wide", literal)) {
result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
? 1.0 : 0.0;
+ } else if (!strcmp("#target_cpu", literal)) {
+ result = (ctx->system_wide || ctx->user_requested_cpu_list)
+ ? 1.0 : 0.0;
} else {
pr_err("Unrecognized literal '%s'", literal);
}
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v1 07/22] perf expr: Add #target_cpu literal
2025-10-24 17:58 ` [PATCH v1 07/22] perf expr: Add #target_cpu literal Ian Rogers
@ 2025-11-04 4:56 ` Namhyung Kim
2025-11-06 18:43 ` Ian Rogers
0 siblings, 1 reply; 36+ messages in thread
From: Namhyung Kim @ 2025-11-04 4:56 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Fri, Oct 24, 2025 at 10:58:42AM -0700, Ian Rogers wrote:
> For CPU nanoseconds a lot of the stat-shadow metrics use either
> task-clock or cpu-clock, the latter being used when
> target__has_cpu. Add a #target_cpu literal so that json metrics can
> perform the same test.
Do we have documentation for the literals and metric expressions in
general? I think it's getting complex and we should provide one.
Thanks,
Namhyung
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/util/expr.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 7fda0ff89c16..4df56f2b283d 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -409,6 +409,9 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
> } else if (!strcmp("#core_wide", literal)) {
> result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
> ? 1.0 : 0.0;
> + } else if (!strcmp("#target_cpu", literal)) {
> + result = (ctx->system_wide || ctx->user_requested_cpu_list)
> + ? 1.0 : 0.0;
> } else {
> pr_err("Unrecognized literal '%s'", literal);
> }
> --
> 2.51.1.821.gb6fe4d2222-goog
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 07/22] perf expr: Add #target_cpu literal
2025-11-04 4:56 ` Namhyung Kim
@ 2025-11-06 18:43 ` Ian Rogers
0 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-11-06 18:43 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Mon, Nov 3, 2025 at 8:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Oct 24, 2025 at 10:58:42AM -0700, Ian Rogers wrote:
> > For CPU nanoseconds a lot of the stat-shadow metrics use either
> > task-clock or cpu-clock, the latter being used when
> > target__has_cpu. Add a #target_cpu literal so that json metrics can
> > perform the same test.
>
> Do we have documentation for the literals and metric expressions in
> general? I think it's getting complex and we should provide one.
So in general all these are documented in the tool events in `perf list`:
```
$ perf list
...
tool:
duration_time
[Wall clock interval time in nanoseconds. Unit: tool]
has_pmem
[1 if persistent memory installed otherwise 0. Unit: tool]
num_cores
[Number of cores. A core consists of 1 or more thread, with each thread
being associated with a logical Linux CPU. Unit: tool]
num_cpus
[Number of logical Linux CPUs. There may be multiple such CPUs
on a core. Unit: tool]
num_cpus_online
[Number of online logical Linux CPUs. There may be multiple
such CPUs on a core. Unit: tool]
num_dies
[Number of dies. Each die has 1 or more cores. Unit: tool]
num_packages
[Number of packages. Each package has 1 or more die. Unit: tool]
smt_on
[1 if simultaneous multithreading (aka hyperthreading) is
enable otherwise 0. Unit: tool]
system_time
[System/kernel time in nanoseconds. Unit: tool]
system_tsc_freq
[The amount a Time Stamp Counter (TSC) increases per second. Unit: tool]
user_time
[User (non-kernel) time in nanoseconds. Unit: tool]
```
We haven't done that with #core_wide and I followed that pattern for
#target_cpu as they do similar things. The issue with these two
"literals", and the reason they are hard to make tool events, is that
they depend on command line options that may be processed after the
metrics themselves are processed. We could make tool versions with
some plumbing. I'll look to add that.
Thanks,
Ian
> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/perf/util/expr.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> > index 7fda0ff89c16..4df56f2b283d 100644
> > --- a/tools/perf/util/expr.c
> > +++ b/tools/perf/util/expr.c
> > @@ -409,6 +409,9 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
> > } else if (!strcmp("#core_wide", literal)) {
> > result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
> > ? 1.0 : 0.0;
> > + } else if (!strcmp("#target_cpu", literal)) {
> > + result = (ctx->system_wide || ctx->user_requested_cpu_list)
> > + ? 1.0 : 0.0;
> > } else {
> > pr_err("Unrecognized literal '%s'", literal);
> > }
> > --
> > 2.51.1.821.gb6fe4d2222-goog
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (6 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 07/22] perf expr: Add #target_cpu literal Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-11-06 6:22 ` Namhyung Kim
2025-10-24 17:58 ` [PATCH v1 09/22] perf jevents: Add metric DefaultShowEvents Ian Rogers
` (15 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Add support for getting a common set of metrics from a default
table. It simplifies the generation to add json metrics at the same
time. The metrics added are CPUs_utilized, cs_per_second,
migrations_per_second, page_faults_per_second, insn_per_cycle,
stalled_cycles_per_instruction, frontend_cycles_idle,
backend_cycles_idle, cycles_frequency, branch_frequency and
branch_miss_rate based on the shadow metric definitions.
Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
28,165,735,434 cpu-clock # 27.973 CPUs utilized
23,220 context-switches # 824.406 /sec
833 cpu-migrations # 29.575 /sec
35,293 page-faults # 1.253 K/sec
997,341,554 cpu_atom/instructions/ # 0.84 insn per cycle (35.63%)
11,197,053,736 cpu_core/instructions/ # 1.97 insn per cycle (58.21%)
1,184,871,493 cpu_atom/cycles/ # 0.042 GHz (35.64%)
5,676,692,769 cpu_core/cycles/ # 0.202 GHz (58.22%)
150,525,309 cpu_atom/branches/ # 5.344 M/sec (42.80%)
2,277,232,030 cpu_core/branches/ # 80.851 M/sec (58.21%)
5,248,575 cpu_atom/branch-misses/ # 3.49% of all branches (42.82%)
28,829,930 cpu_core/branch-misses/ # 1.27% of all branches (58.22%)
(software) # 824.4 cs/sec cs_per_second
TopdownL1 (cpu_core) # 12.6 % tma_bad_speculation
# 28.8 % tma_frontend_bound (66.57%)
TopdownL1 (cpu_core) # 25.8 % tma_backend_bound
# 32.8 % tma_retiring (66.57%)
(software) # 1253.1 faults/sec page_faults_per_second
# 0.0 GHz cycles_frequency (42.80%)
# 0.2 GHz cycles_frequency (74.92%)
TopdownL1 (cpu_atom) # 22.3 % tma_bad_speculation
# 17.2 % tma_retiring (49.95%)
TopdownL1 (cpu_atom) # 30.6 % tma_backend_bound
# 29.8 % tma_frontend_bound (49.94%)
(cpu_atom) # 6.9 K/sec branch_frequency (42.89%)
# 80.5 K/sec branch_frequency (74.93%)
# 29.6 migrations/sec migrations_per_second
# 28.0 CPUs CPUs_utilized
(cpu_atom) # 0.8 instructions insn_per_cycle (42.91%)
# 2.0 instructions insn_per_cycle (75.14%)
(cpu_atom) # 3.8 % branch_miss_rate (35.75%)
# 1.2 % branch_miss_rate (66.86%)
1.007063529 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/common/common/metrics.json | 86 +++++++++++++
tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
tools/perf/pmu-events/jevents.py | 21 +++-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/metricgroup.c | 31 +++--
5 files changed, 212 insertions(+), 42 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
new file mode 100644
index 000000000000..d1e37db18dc6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -0,0 +1,86 @@
+[
+ {
+ "BriefDescription": "Average CPU utilization",
+ "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
+ "MetricGroup": "Default",
+ "MetricName": "CPUs_utilized",
+ "ScaleUnit": "1CPUs",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Context switches per CPU second",
+ "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "cs_per_second",
+ "ScaleUnit": "1cs/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Process migrations to a new CPU per CPU second",
+ "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "migrations_per_second",
+ "ScaleUnit": "1migrations/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Page faults per CPU second",
+ "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "page_faults_per_second",
+ "ScaleUnit": "1faults/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Instructions Per Cycle",
+ "MetricExpr": "instructions / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "insn_per_cycle",
+ "MetricThreshold": "insn_per_cycle < 1",
+ "ScaleUnit": "1instructions"
+ },
+ {
+ "BriefDescription": "Max front or backend stalls per instruction",
+ "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
+ "MetricGroup": "Default",
+ "MetricName": "stalled_cycles_per_instruction"
+ },
+ {
+ "BriefDescription": "Frontend stalls per cycle",
+ "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "frontend_cycles_idle",
+ "MetricThreshold": "frontend_cycles_idle > 0.1"
+ },
+ {
+ "BriefDescription": "Backend stalls per cycle",
+ "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
+ "MetricGroup": "Default",
+ "MetricName": "backend_cycles_idle",
+ "MetricThreshold": "backend_cycles_idle > 0.2"
+ },
+ {
+ "BriefDescription": "Cycles per CPU second",
+ "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "cycles_frequency",
+ "ScaleUnit": "1GHz",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Branches per CPU second",
+ "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
+ "MetricGroup": "Default",
+ "MetricName": "branch_frequency",
+ "ScaleUnit": "1000K/sec",
+ "MetricConstraint": "NO_GROUP_EVENTS"
+ },
+ {
+ "BriefDescription": "Branch miss rate",
+ "MetricExpr": "branch\\-misses / branches",
+ "MetricGroup": "Default",
+ "MetricName": "branch_miss_rate",
+ "MetricThreshold": "branch_miss_rate > 0.05",
+ "ScaleUnit": "100%"
+ }
+]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 5120fb93690e..83a01ecc625e 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1303,21 +1303,32 @@ static const char *const big_c_string =
/* offset=127503 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
/* offset=127580 */ "uncore_sys_cmn_pmu\000"
/* offset=127599 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
-/* offset=127742 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
-/* offset=127764 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
-/* offset=127827 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
-/* offset=127993 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=128057 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=128124 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
-/* offset=128195 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
-/* offset=128289 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
-/* offset=128423 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
-/* offset=128487 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=128555 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=128625 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
-/* offset=128647 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
-/* offset=128669 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
-/* offset=128689 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
+/* offset=127742 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
+/* offset=127927 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
+/* offset=128159 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
+/* offset=128418 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
+/* offset=128648 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
+/* offset=128760 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
+/* offset=128923 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
+/* offset=129052 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
+/* offset=129177 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
+/* offset=129352 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\00001"
+/* offset=129531 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
+/* offset=129634 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
+/* offset=129656 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
+/* offset=129719 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
+/* offset=129885 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
+/* offset=129949 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
+/* offset=130016 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
+/* offset=130087 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
+/* offset=130181 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
+/* offset=130315 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
+/* offset=130379 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
+/* offset=130447 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
+/* offset=130517 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
+/* offset=130539 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
+/* offset=130561 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
+/* offset=130581 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2603,6 +2614,29 @@ static const struct pmu_table_entry pmu_events__common[] = {
},
};
+static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
+{ 127742 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
+{ 129052 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
+{ 129352 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\00001 */
+{ 129531 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
+{ 127927 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
+{ 129177 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
+{ 128923 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
+{ 128648 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
+{ 128159 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
+{ 128418 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
+{ 128760 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
+
+};
+
+static const struct pmu_table_entry pmu_metrics__common[] = {
+{
+ .entries = pmu_metrics__common_default_core,
+ .num_entries = ARRAY_SIZE(pmu_metrics__common_default_core),
+ .pmu_name = { 0 /* default_core\000 */ },
+},
+};
+
static const struct compact_pmu_event pmu_events__test_soc_cpu_default_core[] = {
{ 126189 }, /* bp_l1_btb_correct\000branch\000L1 BTB Correction\000event=0x8a\000\00000\000\000\000\000\000 */
{ 126251 }, /* bp_l2_btb_correct\000branch\000L2 BTB Correction\000event=0x8b\000\00000\000\000\000\000\000 */
@@ -2664,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 127742 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
-{ 128423 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
-{ 128195 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
-{ 128289 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
-{ 128487 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 128555 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 127827 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
-{ 127764 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
-{ 128689 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
-{ 128625 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
-{ 128647 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
-{ 128669 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
-{ 128124 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
-{ 127993 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
-{ 128057 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129634 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
+{ 130315 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
+{ 130087 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
+{ 130181 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
+{ 130379 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
+{ 130447 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
+{ 129719 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
+{ 129656 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
+{ 130581 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
+{ 130517 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
+{ 130539 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
+{ 130561 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
+{ 130016 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
+{ 129885 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129949 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
};
@@ -2759,7 +2793,10 @@ static const struct pmu_events_map pmu_events_map[] = {
.pmus = pmu_events__common,
.num_pmus = ARRAY_SIZE(pmu_events__common),
},
- .metric_table = {},
+ .metric_table = {
+ .pmus = pmu_metrics__common,
+ .num_pmus = ARRAY_SIZE(pmu_metrics__common),
+ },
},
{
.arch = "testarch",
@@ -3208,6 +3245,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
return map ? &map->metric_table : NULL;
}
+const struct pmu_metrics_table *pmu_metrics_table__default(void)
+{
+ int i = 0;
+
+ for (;;) {
+ const struct pmu_events_map *map = &pmu_events_map[i++];
+
+ if (!map->arch)
+ break;
+
+ if (!strcmp(map->cpuid, "common"))
+ return &map->metric_table;
+ }
+ return NULL;
+}
+
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
{
for (const struct pmu_events_map *tables = &pmu_events_map[0];
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 786a7049363f..5d3f4b44cfb7 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -755,7 +755,10 @@ static const struct pmu_events_map pmu_events_map[] = {
\t\t.pmus = pmu_events__common,
\t\t.num_pmus = ARRAY_SIZE(pmu_events__common),
\t},
-\t.metric_table = {},
+\t.metric_table = {
+\t\t.pmus = pmu_metrics__common,
+\t\t.num_pmus = ARRAY_SIZE(pmu_metrics__common),
+\t},
},
""")
else:
@@ -1237,6 +1240,22 @@ const struct pmu_metrics_table *pmu_metrics_table__find(void)
return map ? &map->metric_table : NULL;
}
+const struct pmu_metrics_table *pmu_metrics_table__default(void)
+{
+ int i = 0;
+
+ for (;;) {
+ const struct pmu_events_map *map = &pmu_events_map[i++];
+
+ if (!map->arch)
+ break;
+
+ if (!strcmp(map->cpuid, "common"))
+ return &map->metric_table;
+ }
+ return NULL;
+}
+
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid)
{
for (const struct pmu_events_map *tables = &pmu_events_map[0];
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index e0535380c0b2..559265a903c8 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -127,6 +127,7 @@ int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu *pmu);
const struct pmu_events_table *perf_pmu__default_core_events_table(void);
const struct pmu_metrics_table *pmu_metrics_table__find(void);
+const struct pmu_metrics_table *pmu_metrics_table__default(void);
const struct pmu_events_table *find_core_events_table(const char *arch, const char *cpuid);
const struct pmu_metrics_table *find_core_metrics_table(const char *arch, const char *cpuid);
int pmu_for_each_core_event(pmu_event_iter_fn fn, void *data);
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 76092ee26761..e67e04ce01c9 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -424,10 +424,18 @@ int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metr
.fn = fn,
.data = data,
};
+ const struct pmu_metrics_table *tables[2] = {
+ table,
+ pmu_metrics_table__default(),
+ };
+
+ for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
+ int ret;
- if (table) {
- int ret = pmu_metrics_table__for_each_metric(table, fn, data);
+ if (!tables[i])
+ continue;
+ ret = pmu_metrics_table__for_each_metric(tables[i], fn, data);
if (ret)
return ret;
}
@@ -1581,19 +1589,22 @@ static int metricgroup__has_metric_or_groups_callback(const struct pmu_metric *p
bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups)
{
- const struct pmu_metrics_table *table = pmu_metrics_table__find();
+ const struct pmu_metrics_table *tables[2] = {
+ pmu_metrics_table__find(),
+ pmu_metrics_table__default(),
+ };
struct metricgroup__has_metric_data data = {
.pmu = pmu,
.metric_or_groups = metric_or_groups,
};
- if (!table)
- return false;
-
- return pmu_metrics_table__for_each_metric(table,
- metricgroup__has_metric_or_groups_callback,
- &data)
- ? true : false;
+ for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
+ if (pmu_metrics_table__for_each_metric(tables[i],
+ metricgroup__has_metric_or_groups_callback,
+ &data))
+ return true;
+ }
+ return false;
}
static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones
2025-10-24 17:58 ` [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones Ian Rogers
@ 2025-11-06 6:22 ` Namhyung Kim
2025-11-06 18:05 ` Ian Rogers
0 siblings, 1 reply; 36+ messages in thread
From: Namhyung Kim @ 2025-11-06 6:22 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Fri, Oct 24, 2025 at 10:58:43AM -0700, Ian Rogers wrote:
> Add support to getting a common set of metrics from a default
> table. It simplifies the generation to add json metrics at the same
> time. The metrics added are CPUs_utilized, cs_per_second,
> migrations_per_second, page_faults_per_second, insn_per_cycle,
> stalled_cycles_per_instruction, frontend_cycles_idle,
> backend_cycles_idle, cycles_frequency, branch_frequency and
> branch_miss_rate based on the shadow metric definitions.
>
> Following this change the default perf stat output on an alderlake looks like:
> ```
> $ perf stat -a -- sleep 1
>
> Performance counter stats for 'system wide':
>
> 28,165,735,434 cpu-clock # 27.973 CPUs utilized
> 23,220 context-switches # 824.406 /sec
> 833 cpu-migrations # 29.575 /sec
> 35,293 page-faults # 1.253 K/sec
> 997,341,554 cpu_atom/instructions/ # 0.84 insn per cycle (35.63%)
> 11,197,053,736 cpu_core/instructions/ # 1.97 insn per cycle (58.21%)
> 1,184,871,493 cpu_atom/cycles/ # 0.042 GHz (35.64%)
> 5,676,692,769 cpu_core/cycles/ # 0.202 GHz (58.22%)
> 150,525,309 cpu_atom/branches/ # 5.344 M/sec (42.80%)
> 2,277,232,030 cpu_core/branches/ # 80.851 M/sec (58.21%)
> 5,248,575 cpu_atom/branch-misses/ # 3.49% of all branches (42.82%)
> 28,829,930 cpu_core/branch-misses/ # 1.27% of all branches (58.22%)
> (software) # 824.4 cs/sec cs_per_second
> TopdownL1 (cpu_core) # 12.6 % tma_bad_speculation
> # 28.8 % tma_frontend_bound (66.57%)
> TopdownL1 (cpu_core) # 25.8 % tma_backend_bound
> # 32.8 % tma_retiring (66.57%)
> (software) # 1253.1 faults/sec page_faults_per_second
> # 0.0 GHz cycles_frequency (42.80%)
> # 0.2 GHz cycles_frequency (74.92%)
> TopdownL1 (cpu_atom) # 22.3 % tma_bad_speculation
> # 17.2 % tma_retiring (49.95%)
> TopdownL1 (cpu_atom) # 30.6 % tma_backend_bound
> # 29.8 % tma_frontend_bound (49.94%)
> (cpu_atom) # 6.9 K/sec branch_frequency (42.89%)
> # 80.5 K/sec branch_frequency (74.93%)
> # 29.6 migrations/sec migrations_per_second
> # 28.0 CPUs CPUs_utilized
> (cpu_atom) # 0.8 instructions insn_per_cycle (42.91%)
> # 2.0 instructions insn_per_cycle (75.14%)
> (cpu_atom) # 3.8 % branch_miss_rate (35.75%)
> # 1.2 % branch_miss_rate (66.86%)
>
> 1.007063529 seconds time elapsed
> ```
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> .../arch/common/common/metrics.json | 86 +++++++++++++
> tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> tools/perf/pmu-events/jevents.py | 21 +++-
> tools/perf/pmu-events/pmu-events.h | 1 +
> tools/perf/util/metricgroup.c | 31 +++--
> 5 files changed, 212 insertions(+), 42 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
> diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> new file mode 100644
> index 000000000000..d1e37db18dc6
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> @@ -0,0 +1,86 @@
> +[
> + {
> + "BriefDescription": "Average CPU utilization",
> + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> + "MetricGroup": "Default",
> + "MetricName": "CPUs_utilized",
> + "ScaleUnit": "1CPUs",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Context switches per CPU second",
> + "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "cs_per_second",
> + "ScaleUnit": "1cs/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Process migrations to a new CPU per CPU second",
> + "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "migrations_per_second",
> + "ScaleUnit": "1migrations/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Page faults per CPU second",
> + "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "page_faults_per_second",
> + "ScaleUnit": "1faults/sec",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Instructions Per Cycle",
> + "MetricExpr": "instructions / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "insn_per_cycle",
> + "MetricThreshold": "insn_per_cycle < 1",
> + "ScaleUnit": "1instructions"
> + },
> + {
> + "BriefDescription": "Max front or backend stalls per instruction",
> + "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> + "MetricGroup": "Default",
> + "MetricName": "stalled_cycles_per_instruction"
> + },
> + {
> + "BriefDescription": "Frontend stalls per cycle",
> + "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "frontend_cycles_idle",
> + "MetricThreshold": "frontend_cycles_idle > 0.1"
> + },
> + {
> + "BriefDescription": "Backend stalls per cycle",
> + "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> + "MetricGroup": "Default",
> + "MetricName": "backend_cycles_idle",
> + "MetricThreshold": "backend_cycles_idle > 0.2"
> + },
> + {
> + "BriefDescription": "Cycles per CPU second",
> + "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "cycles_frequency",
> + "ScaleUnit": "1GHz",
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Branches per CPU second",
> + "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> + "MetricGroup": "Default",
> + "MetricName": "branch_frequency",
> + "ScaleUnit": "1000K/sec",
Wouldn't it be "1000M/sec"?
> + "MetricConstraint": "NO_GROUP_EVENTS"
> + },
> + {
> + "BriefDescription": "Branch miss rate",
> + "MetricExpr": "branch\\-misses / branches",
> + "MetricGroup": "Default",
> + "MetricName": "branch_miss_rate",
> + "MetricThreshold": "branch_miss_rate > 0.05",
Is MetricThreshold evaluated before scaling?
Thanks,
Namhyung
> + "ScaleUnit": "100%"
> + }
> +]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones
2025-11-06 6:22 ` Namhyung Kim
@ 2025-11-06 18:05 ` Ian Rogers
0 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-11-06 18:05 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Wed, Nov 5, 2025 at 10:22 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Oct 24, 2025 at 10:58:43AM -0700, Ian Rogers wrote:
> > Add support to getting a common set of metrics from a default
> > table. It simplifies the generation to add json metrics at the same
> > time. The metrics added are CPUs_utilized, cs_per_second,
> > migrations_per_second, page_faults_per_second, insn_per_cycle,
> > stalled_cycles_per_instruction, frontend_cycles_idle,
> > backend_cycles_idle, cycles_frequency, branch_frequency and
> > branch_miss_rate based on the shadow metric definitions.
> >
> > Following this change the default perf stat output on an alderlake looks like:
> > ```
> > $ perf stat -a -- sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 28,165,735,434 cpu-clock # 27.973 CPUs utilized
> > 23,220 context-switches # 824.406 /sec
> > 833 cpu-migrations # 29.575 /sec
> > 35,293 page-faults # 1.253 K/sec
> > 997,341,554 cpu_atom/instructions/ # 0.84 insn per cycle (35.63%)
> > 11,197,053,736 cpu_core/instructions/ # 1.97 insn per cycle (58.21%)
> > 1,184,871,493 cpu_atom/cycles/ # 0.042 GHz (35.64%)
> > 5,676,692,769 cpu_core/cycles/ # 0.202 GHz (58.22%)
> > 150,525,309 cpu_atom/branches/ # 5.344 M/sec (42.80%)
> > 2,277,232,030 cpu_core/branches/ # 80.851 M/sec (58.21%)
> > 5,248,575 cpu_atom/branch-misses/ # 3.49% of all branches (42.82%)
> > 28,829,930 cpu_core/branch-misses/ # 1.27% of all branches (58.22%)
> > (software) # 824.4 cs/sec cs_per_second
> > TopdownL1 (cpu_core) # 12.6 % tma_bad_speculation
> > # 28.8 % tma_frontend_bound (66.57%)
> > TopdownL1 (cpu_core) # 25.8 % tma_backend_bound
> > # 32.8 % tma_retiring (66.57%)
> > (software) # 1253.1 faults/sec page_faults_per_second
> > # 0.0 GHz cycles_frequency (42.80%)
> > # 0.2 GHz cycles_frequency (74.92%)
> > TopdownL1 (cpu_atom) # 22.3 % tma_bad_speculation
> > # 17.2 % tma_retiring (49.95%)
> > TopdownL1 (cpu_atom) # 30.6 % tma_backend_bound
> > # 29.8 % tma_frontend_bound (49.94%)
> > (cpu_atom) # 6.9 K/sec branch_frequency (42.89%)
> > # 80.5 K/sec branch_frequency (74.93%)
> > # 29.6 migrations/sec migrations_per_second
> > # 28.0 CPUs CPUs_utilized
> > (cpu_atom) # 0.8 instructions insn_per_cycle (42.91%)
> > # 2.0 instructions insn_per_cycle (75.14%)
> > (cpu_atom) # 3.8 % branch_miss_rate (35.75%)
> > # 1.2 % branch_miss_rate (66.86%)
> >
> > 1.007063529 seconds time elapsed
> > ```
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > .../arch/common/common/metrics.json | 86 +++++++++++++
> > tools/perf/pmu-events/empty-pmu-events.c | 115 +++++++++++++-----
> > tools/perf/pmu-events/jevents.py | 21 +++-
> > tools/perf/pmu-events/pmu-events.h | 1 +
> > tools/perf/util/metricgroup.c | 31 +++--
> > 5 files changed, 212 insertions(+), 42 deletions(-)
> > create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
> >
> > diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
> > new file mode 100644
> > index 000000000000..d1e37db18dc6
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/common/common/metrics.json
> > @@ -0,0 +1,86 @@
> > +[
> > + {
> > + "BriefDescription": "Average CPU utilization",
> > + "MetricExpr": "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)",
> > + "MetricGroup": "Default",
> > + "MetricName": "CPUs_utilized",
> > + "ScaleUnit": "1CPUs",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Context switches per CPU second",
> > + "MetricExpr": "(software@context\\-switches\\,name\\=context\\-switches@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "cs_per_second",
> > + "ScaleUnit": "1cs/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Process migrations to a new CPU per CPU second",
> > + "MetricExpr": "(software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "migrations_per_second",
> > + "ScaleUnit": "1migrations/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Page faults per CPU second",
> > + "MetricExpr": "(software@page\\-faults\\,name\\=page\\-faults@ * 1e9) / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "page_faults_per_second",
> > + "ScaleUnit": "1faults/sec",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Instructions Per Cycle",
> > + "MetricExpr": "instructions / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "insn_per_cycle",
> > + "MetricThreshold": "insn_per_cycle < 1",
> > + "ScaleUnit": "1instructions"
> > + },
> > + {
> > + "BriefDescription": "Max front or backend stalls per instruction",
> > + "MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
> > + "MetricGroup": "Default",
> > + "MetricName": "stalled_cycles_per_instruction"
> > + },
> > + {
> > + "BriefDescription": "Frontend stalls per cycle",
> > + "MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "frontend_cycles_idle",
> > + "MetricThreshold": "frontend_cycles_idle > 0.1"
> > + },
> > + {
> > + "BriefDescription": "Backend stalls per cycle",
> > + "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> > + "MetricGroup": "Default",
> > + "MetricName": "backend_cycles_idle",
> > + "MetricThreshold": "backend_cycles_idle > 0.2"
> > + },
> > + {
> > + "BriefDescription": "Cycles per CPU second",
> > + "MetricExpr": "cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "cycles_frequency",
> > + "ScaleUnit": "1GHz",
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Branches per CPU second",
> > + "MetricExpr": "branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)",
> > + "MetricGroup": "Default",
> > + "MetricName": "branch_frequency",
> > + "ScaleUnit": "1000K/sec",
>
> Wouldn't it be "1000M/sec" ?
Agreed. Will fix in v2. The existing logic multiplies by 1e9 in one
place and then divides by 1e3 in another. It would be good if we could
pick better units based on the metric value, but I'll leave that for
another day.
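For illustration, plugging the cpu_core numbers from the commit message
into the expression (a rough sketch; cpu-clock counts nanoseconds and the
leading number of ScaleUnit is a multiplier applied before printing):
```
raw       = branches / cpu-clock = 2,277,232,030 / 28,165,735,434 ns ~= 0.0808 per ns
true rate = raw * 1e9 ~= 80.8e6 branches/sec
"1000K/sec": prints 0.0808 * 1000 = 80.8 "K/sec"  (a factor of 1e3 too small)
"1000M/sec": prints 0.0808 * 1000 = 80.8 "M/sec"  (matches the legacy "80.851 M/sec")
```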
> > + "MetricConstraint": "NO_GROUP_EVENTS"
> > + },
> > + {
> > + "BriefDescription": "Branch miss rate",
> > + "MetricExpr": "branch\\-misses / branches",
> > + "MetricGroup": "Default",
> > + "MetricName": "branch_miss_rate",
> > + "MetricThreshold": "branch_miss_rate > 0.05",
>
> Is MetricThreshold evaluated before scaling?
Yep, the threshold is evaluated on the unscaled value. That's primarily
so it can share most of the events/calculation with the metric being
created. Fwiw, the 5% here comes from the existing stat-shadow metric
threshold.
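As a concrete sketch with the cpu_atom counts quoted above:
```
branch_miss_rate = branch-misses / branches = 5,248,575 / 150,525,309 ~= 0.035
threshold: 0.035 > 0.05 is false, so the metric is not flagged (unscaled ratio)
display:   0.035 * 100 = 3.5 %   (ScaleUnit "100%" only affects printing)
```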
Thanks,
Ian
> Thanks,
> Namhyung
>
>
> > + "ScaleUnit": "100%"
> > + }
> > +]
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v1 09/22] perf jevents: Add metric DefaultShowEvents
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (7 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 08/22] perf jevents: Add set of common metrics based on default ones Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 10/22] perf stat: Add detail -d,-dd,-ddd metrics Ian Rogers
` (14 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Some Default group metrics require their events to be shown for
consistency with perf's previous behavior. Add a flag to indicate when
this is the case and use it in stat-display.
As the events now come from the Default metrics, remove the default
hardware and software events from perf stat.
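For reference, the new flag is just one more attribute on the common
metrics in metrics.json, e.g. (excerpt of the change below):
```
  {
    "BriefDescription": "Instructions Per Cycle",
    "MetricExpr": "instructions / cpu\\-cycles",
    "MetricGroup": "Default",
    "MetricName": "insn_per_cycle",
    "MetricThreshold": "insn_per_cycle < 1",
    "ScaleUnit": "1instructions",
    "DefaultShowEvents": "1"
  }
```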
Following this change the default perf stat output on an alderlake looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
20,759 context-switches # 735.7 cs/sec cs_per_second
TopdownL1 (cpu_core) # 7.8 % tma_bad_speculation
# 34.8 % tma_frontend_bound
TopdownL1 (cpu_core) # 39.0 % tma_backend_bound
# 18.4 % tma_retiring
769 page-faults # 27.3 faults/sec page_faults_per_second
531,102,439 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (49.80%)
785,144,850 cpu_core/cpu-cycles/ # 0.0 GHz cycles_frequency
# 17.6 % tma_bad_speculation
# 14.4 % tma_retiring (50.20%)
# 37.0 % tma_backend_bound
# 31.0 % tma_frontend_bound (50.37%)
47,631,924 cpu_atom/branches/ # 1.7 K/sec branch_frequency (60.31%)
138,036,825 cpu_core/branches/ # 4.9 K/sec branch_frequency
779 cpu-migrations # 27.6 migrations/sec migrations_per_second
28,218,162,085 cpu-clock # 28.0 CPUs CPUs_utilized
522,230,152 cpu_atom/cpu-cycles/ # 0.5 instructions insn_per_cycle (60.12%)
785,133,103 cpu_core/cpu-cycles/ # 1.0 instructions insn_per_cycle
2,541,997 cpu_atom/branch-misses/ # 5.5 % branch_miss_rate (49.63%)
3,106,064 cpu_core/branch-misses/ # 2.3 % branch_miss_rate
1.007489028 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 42 +------
.../arch/common/common/metrics.json | 33 ++++--
tools/perf/pmu-events/empty-pmu-events.c | 106 +++++++++---------
tools/perf/pmu-events/jevents.py | 7 +-
tools/perf/pmu-events/pmu-events.h | 1 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/metricgroup.c | 13 +++
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat-shadow.c | 2 +-
9 files changed, 102 insertions(+), 107 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3c3188a57016..9c7d63614cab 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1853,16 +1853,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
-/* Add given software event to evlist without wildcarding. */
-static int parse_software_event(struct evlist *evlist, const char *event,
- struct parse_events_error *err)
-{
- char buf[256];
-
- snprintf(buf, sizeof(buf), "software/%s,name=%s/", event, event);
- return parse_events(evlist, buf, err);
-}
-
/* Add legacy hardware/hardware-cache event to evlist for all core PMUs without wildcarding. */
static int parse_hardware_event(struct evlist *evlist, const char *event,
struct parse_events_error *err)
@@ -2007,36 +1997,10 @@ static int add_default_events(void)
stat_config.topdown_level = 1;
if (!evlist->core.nr_entries && !evsel_list->core.nr_entries) {
- /* No events so add defaults. */
- const char *sw_events[] = {
- target__has_cpu(&target) ? "cpu-clock" : "task-clock",
- "context-switches",
- "cpu-migrations",
- "page-faults",
- };
- const char *hw_events[] = {
- "instructions",
- "cycles",
- "stalled-cycles-frontend",
- "stalled-cycles-backend",
- "branches",
- "branch-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(sw_events); i++) {
- ret = parse_software_event(evlist, sw_events[i], &err);
- if (ret)
- goto out;
- }
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
-
/*
- * Add TopdownL1 metrics if they exist. To minimize
- * multiplexing, don't request threshold computation.
+ * Add Default metrics. To minimize multiplexing, don't request
+ * threshold computation, but it will be computed if the events
+ * are present.
*/
if (metricgroup__has_metric_or_groups(pmu, "Default")) {
struct evlist *metric_evlist = evlist__new();
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index d1e37db18dc6..017bbdede3d7 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -5,7 +5,8 @@
"MetricGroup": "Default",
"MetricName": "CPUs_utilized",
"ScaleUnit": "1CPUs",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Context switches per CPU second",
@@ -13,7 +14,8 @@
"MetricGroup": "Default",
"MetricName": "cs_per_second",
"ScaleUnit": "1cs/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Process migrations to a new CPU per CPU second",
@@ -21,7 +23,8 @@
"MetricGroup": "Default",
"MetricName": "migrations_per_second",
"ScaleUnit": "1migrations/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Page faults per CPU second",
@@ -29,7 +32,8 @@
"MetricGroup": "Default",
"MetricName": "page_faults_per_second",
"ScaleUnit": "1faults/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Instructions Per Cycle",
@@ -37,27 +41,31 @@
"MetricGroup": "Default",
"MetricName": "insn_per_cycle",
"MetricThreshold": "insn_per_cycle < 1",
- "ScaleUnit": "1instructions"
+ "ScaleUnit": "1instructions",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Max front or backend stalls per instruction",
"MetricExpr": "max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions",
"MetricGroup": "Default",
- "MetricName": "stalled_cycles_per_instruction"
+ "MetricName": "stalled_cycles_per_instruction",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Frontend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-frontend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "frontend_cycles_idle",
- "MetricThreshold": "frontend_cycles_idle > 0.1"
+ "MetricThreshold": "frontend_cycles_idle > 0.1",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Backend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
- "MetricThreshold": "backend_cycles_idle > 0.2"
+ "MetricThreshold": "backend_cycles_idle > 0.2",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Cycles per CPU second",
@@ -65,7 +73,8 @@
"MetricGroup": "Default",
"MetricName": "cycles_frequency",
"ScaleUnit": "1GHz",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Branches per CPU second",
@@ -73,7 +82,8 @@
"MetricGroup": "Default",
"MetricName": "branch_frequency",
"ScaleUnit": "1000K/sec",
- "MetricConstraint": "NO_GROUP_EVENTS"
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "DefaultShowEvents": "1"
},
{
"BriefDescription": "Branch miss rate",
@@ -81,6 +91,7 @@
"MetricGroup": "Default",
"MetricName": "branch_miss_rate",
"MetricThreshold": "branch_miss_rate > 0.05",
- "ScaleUnit": "100%"
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
}
]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 83a01ecc625e..71464b1d8afe 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1303,32 +1303,32 @@ static const char *const big_c_string =
/* offset=127503 */ "sys_ccn_pmu.read_cycles\000uncore\000ccn read-cycles event\000config=0x2c\0000x01\00000\000\000\000\000\000"
/* offset=127580 */ "uncore_sys_cmn_pmu\000"
/* offset=127599 */ "sys_cmn_pmu.hnf_cache_miss\000uncore\000Counts total cache misses in first lookup result (high priority)\000eventid=1,type=5\000(434|436|43c|43a).*\00000\000\000\000\000\000"
-/* offset=127742 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001"
-/* offset=127927 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001"
-/* offset=128159 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001"
-/* offset=128418 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001"
-/* offset=128648 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000"
-/* offset=128760 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000"
-/* offset=128923 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000"
-/* offset=129052 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000"
-/* offset=129177 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001"
-/* offset=129352 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\00001"
-/* offset=129531 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000"
-/* offset=129634 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000"
-/* offset=129656 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000"
-/* offset=129719 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000"
-/* offset=129885 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=129949 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000"
-/* offset=130016 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000"
-/* offset=130087 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000"
-/* offset=130181 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000"
-/* offset=130315 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000"
-/* offset=130379 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=130447 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000"
-/* offset=130517 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\00000"
-/* offset=130539 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\00000"
-/* offset=130561 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\00000"
-/* offset=130581 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000"
+/* offset=127742 */ "CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011"
+/* offset=127928 */ "cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011"
+/* offset=128161 */ "migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011"
+/* offset=128421 */ "page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011"
+/* offset=128652 */ "insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001"
+/* offset=128765 */ "stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001"
+/* offset=128929 */ "frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129059 */ "backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001"
+/* offset=129185 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
+/* offset=129361 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\000011"
+/* offset=129541 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
+/* offset=129645 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=129668 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=129732 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=129899 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=129964 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=130104 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=130199 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=130334 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=130399 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=130468 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=130539 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=130562 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=130585 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=130606 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2615,17 +2615,17 @@ static const struct pmu_table_entry pmu_events__common[] = {
};
static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
-{ 127742 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\00001 */
-{ 129052 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\00000 */
-{ 129352 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\00001 */
-{ 129531 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\00000 */
-{ 127927 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\00001 */
-{ 129177 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\00001 */
-{ 128923 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\00000 */
-{ 128648 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\00000 */
-{ 128159 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\00001 */
-{ 128418 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\00001 */
-{ 128760 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\00000 */
+{ 127742 }, /* CPUs_utilized\000Default\000(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)\000\000Average CPU utilization\000\0001CPUs\000\000\000\000011 */
+{ 129059 }, /* backend_cycles_idle\000Default\000stalled\\-cycles\\-backend / cpu\\-cycles\000backend_cycles_idle > 0.2\000Backend stalls per cycle\000\000\000\000\000\000001 */
+{ 129361 }, /* branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\000011 */
+{ 129541 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
+{ 127928 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
+{ 129185 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 128929 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
+{ 128652 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
+{ 128161 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
+{ 128421 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
+{ 128765 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
};
@@ -2698,21 +2698,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 129634 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\00000 */
-{ 130315 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\00000 */
-{ 130087 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\00000 */
-{ 130181 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\00000 */
-{ 130379 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 130447 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\00000 */
-{ 129719 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\00000 */
-{ 129656 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\00000 */
-{ 130581 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\00000 */
-{ 130517 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\00000 */
-{ 130539 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\00000 */
-{ 130561 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\00000 */
-{ 130016 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\00000 */
-{ 129885 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
-{ 129949 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\00000 */
+{ 129645 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 130334 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 130104 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 130199 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 130399 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130468 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 129732 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 129668 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 130606 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 130539 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 130562 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 130585 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 129899 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 129964 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
@@ -2894,6 +2894,8 @@ static void decompress_metric(int offset, struct pmu_metric *pm)
pm->aggr_mode = *p - '0';
p++;
pm->event_grouping = *p - '0';
+ p++;
+ pm->default_show_events = *p - '0';
}
static int pmu_events_table__for_each_event_pmu(const struct pmu_events_table *table,
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 5d3f4b44cfb7..3413ee5d0227 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -58,10 +58,12 @@ _json_event_attributes = [
_json_metric_attributes = [
'metric_name', 'metric_group', 'metric_expr', 'metric_threshold',
'desc', 'long_desc', 'unit', 'compat', 'metricgroup_no_group',
- 'default_metricgroup_name', 'aggr_mode', 'event_grouping'
+ 'default_metricgroup_name', 'aggr_mode', 'event_grouping',
+ 'default_show_events'
]
# Attributes that are bools or enum int values, encoded as '0', '1',...
-_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
+_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg',
+ 'default_show_events']
def removesuffix(s: str, suffix: str) -> str:
"""Remove the suffix from a string
@@ -356,6 +358,7 @@ class JsonEvent:
self.metricgroup_no_group = jd.get('MetricgroupNoGroup')
self.default_metricgroup_name = jd.get('DefaultMetricgroupName')
self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
+ self.default_show_events = jd.get('DefaultShowEvents')
self.metric_expr = None
if 'MetricExpr' in jd:
self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 559265a903c8..d3b24014c6ff 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -74,6 +74,7 @@ struct pmu_metric {
const char *default_metricgroup_name;
enum aggr_mode_class aggr_mode;
enum metric_event_groups event_grouping;
+ bool default_show_events;
};
struct pmu_events_table;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 71f74c7036ef..3ae4ac8f9a37 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -122,6 +122,7 @@ struct evsel {
bool reset_group;
bool needs_auxtrace_mmap;
bool default_metricgroup; /* A member of the Default metricgroup */
+ bool default_show_events; /* If a default group member, show the event */
bool needs_uniquify;
struct hashmap *per_pkg_mask;
int err;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index e67e04ce01c9..25c75fdbfc52 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -152,6 +152,8 @@ struct metric {
* Should events of the metric be grouped?
*/
bool group_events;
+ /** Show events even if in the Default metric group. */
+ bool default_show_events;
/**
* Parsed events for the metric. Optional as events may be taken from a
* different metric whose group contains all the IDs necessary for this
@@ -255,6 +257,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,
m->pctx->sctx.runtime = runtime;
m->pctx->sctx.system_wide = system_wide;
m->group_events = !metric_no_group && metric__group_events(pm, metric_no_threshold);
+ m->default_show_events = pm->default_show_events;
m->metric_refs = NULL;
m->evlist = NULL;
@@ -1513,6 +1516,16 @@ static int parse_groups(struct evlist *perf_evlist,
free(metric_events);
goto out;
}
+ if (m->default_show_events) {
+ struct evsel *pos;
+
+ for (int i = 0; metric_events[i]; i++)
+ metric_events[i]->default_show_events = true;
+ evlist__for_each_entry(metric_evlist, pos) {
+ if (pos->metric_leader && pos->metric_leader->default_show_events)
+ pos->default_show_events = true;
+ }
+ }
expr->metric_threshold = m->metric_threshold;
expr->metric_unit = m->metric_unit;
expr->metric_events = metric_events;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index a67b991f4e81..4d0e353846ea 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -872,7 +872,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
out.ctx = os;
out.force_header = false;
- if (!config->metric_only && !counter->default_metricgroup) {
+ if (!config->metric_only && (!counter->default_metricgroup || counter->default_show_events)) {
abs_printout(config, os, os->id, os->aggr_nr, counter, uval, ok);
print_noise(config, os, counter, noise, /*before_metric=*/true);
@@ -880,7 +880,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
}
if (ok) {
- if (!config->metric_only && counter->default_metricgroup) {
+ if (!config->metric_only && counter->default_metricgroup && !counter->default_show_events) {
void *from = NULL;
aggr_printout(config, os, os->evsel, os->id, os->aggr_nr);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index abaf6b579bfc..4df614f8e200 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -665,7 +665,7 @@ void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
if (strcmp(name, mexp->default_metricgroup_name))
return (void *)mexp;
/* Only print the name of the metricgroup once */
- if (!header_printed) {
+ if (!header_printed && !evsel->default_show_events) {
header_printed = true;
perf_stat__print_metricgroup_header(config, evsel, ctxp,
name, out);
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 10/22] perf stat: Add detail -d,-dd,-ddd metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (8 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 09/22] perf jevents: Add metric DefaultShowEvents Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 11/22] perf script: Change metric format to use json metrics Ian Rogers
` (13 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Add metrics corresponding to the -d, -dd and -ddd detail events and
their hard coded stat-shadow metrics. Remove the hard coded event
lists from builtin-stat.c as these events now come from the metrics.
Following this change, a detailed perf stat output looks like:
```
$ perf stat -a -ddd -- sleep 1
Performance counter stats for 'system wide':
18,446 context-switches # 653.0 cs/sec cs_per_second
TopdownL1 (cpu_core) # 6.8 % tma_bad_speculation
# 37.0 % tma_frontend_bound (30.32%)
TopdownL1 (cpu_core) # 40.1 % tma_backend_bound
# 16.1 % tma_retiring (30.32%)
177 page-faults # 6.3 faults/sec page_faults_per_second
472,170,922 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (28.57%)
656,868,742 cpu_core/cpu-cycles/ # 0.0 GHz cycles_frequency (38.24%)
# 22.2 % tma_bad_speculation
# 12.2 % tma_retiring (28.55%)
# 32.4 % tma_backend_bound
# 33.1 % tma_frontend_bound (35.71%)
43,583,604 cpu_atom/branches/ # 1.5 K/sec branch_frequency (42.85%)
87,140,541 cpu_core/branches/ # 3.1 K/sec branch_frequency (54.09%)
493 cpu-migrations # 17.5 migrations/sec migrations_per_second
28,247,893,219 cpu-clock # 28.0 CPUs CPUs_utilized
445,297,600 cpu_atom/cpu-cycles/ # 0.4 instructions insn_per_cycle (42.87%)
642,323,993 cpu_core/cpu-cycles/ # 0.8 instructions insn_per_cycle (62.01%)
2,126,311 cpu_atom/branch-misses/ # 6.8 % branch_miss_rate (35.73%)
2,172,775 cpu_core/branch-misses/ # 2.5 % branch_miss_rate (62.36%)
1,855,042 cpu_atom/LLC-loads/ # 0.0 % llc_miss_rate (28.56%)
2,671,549 cpu_core/LLC-loads/ # 32.5 % llc_miss_rate (46.31%)
8,440,231 cpu_core/L1-dcache-load-misses/ # nan % l1d_miss_rate (37.99%)
10,823,925 cpu_atom/L1-icache-load-misses/ # 19.0 % l1i_miss_rate (21.43%)
22,602,344 cpu_atom/dTLB-loads/ # 2.0 % dtlb_miss_rate (21.44%)
136,524,528 cpu_core/dTLB-loads/ # 0.3 % dtlb_miss_rate (15.06%)
1.007665494 seconds time elapsed
```
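With this change the -d levels select additional json metric groups
rather than appending hard coded event lists. As a rough sketch of
the mapping implied by the new default_metricgroup_names table and
the metrics added below (commands only, output elided):
```
$ perf stat -a -- sleep 1          # Default metrics only
$ perf stat -a -d -- sleep 1       # + Default2 (L1D/LLC miss rates)
$ perf stat -a -dd -- sleep 1      # + Default3 (L1I/dTLB/iTLB miss rates)
$ perf stat -a -ddd -- sleep 1     # + Default4 (L1 prefetch miss rate)
```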
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 100 +++---------------
.../arch/common/common/metrics.json | 54 ++++++++++
tools/perf/pmu-events/empty-pmu-events.c | 72 +++++++------
3 files changed, 113 insertions(+), 113 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9c7d63614cab..c00d84a04593 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1853,28 +1853,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
-/* Add legacy hardware/hardware-cache event to evlist for all core PMUs without wildcarding. */
-static int parse_hardware_event(struct evlist *evlist, const char *event,
- struct parse_events_error *err)
-{
- char buf[256];
- struct perf_pmu *pmu = NULL;
-
- while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
- int ret;
-
- if (perf_pmus__num_core_pmus() == 1)
- snprintf(buf, sizeof(buf), "%s/%s,name=%s/", pmu->name, event, event);
- else
- snprintf(buf, sizeof(buf), "%s/%s/", pmu->name, event);
-
- ret = parse_events(evlist, buf, err);
- if (ret)
- return ret;
- }
- return 0;
-}
-
/*
* Add default events, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -2002,22 +1980,34 @@ static int add_default_events(void)
* threshold computation, but it will be computed if the events
* are present.
*/
- if (metricgroup__has_metric_or_groups(pmu, "Default")) {
- struct evlist *metric_evlist = evlist__new();
+ const char *default_metricgroup_names[] = {
+ "Default", "Default2", "Default3", "Default4",
+ };
+
+ for (size_t i = 0; i < ARRAY_SIZE(default_metricgroup_names); i++) {
+ struct evlist *metric_evlist;
+
+ if (!metricgroup__has_metric_or_groups(pmu, default_metricgroup_names[i]))
+ continue;
+
+ if ((int)i > detailed_run)
+ break;
+ metric_evlist = evlist__new();
if (!metric_evlist) {
ret = -ENOMEM;
- goto out;
+ break;
}
- if (metricgroup__parse_groups(metric_evlist, pmu, "Default",
+ if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i],
/*metric_no_group=*/false,
/*metric_no_merge=*/false,
/*metric_no_threshold=*/true,
stat_config.user_requested_cpu_list,
stat_config.system_wide,
stat_config.hardware_aware_grouping) < 0) {
+ evlist__delete(metric_evlist);
ret = -1;
- goto out;
+ break;
}
evlist__for_each_entry(metric_evlist, evsel)
@@ -2030,62 +2020,6 @@ static int add_default_events(void)
evlist__delete(metric_evlist);
}
}
-
- /* Detailed events get appended to the event list: */
-
- if (!ret && detailed_run >= 1) {
- /*
- * Detailed stats (-d), covering the L1 and last level data
- * caches:
- */
- const char *hw_events[] = {
- "L1-dcache-loads",
- "L1-dcache-load-misses",
- "LLC-loads",
- "LLC-load-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
- if (!ret && detailed_run >= 2) {
- /*
- * Very detailed stats (-d -d), covering the instruction cache
- * and the TLB caches:
- */
- const char *hw_events[] = {
- "L1-icache-loads",
- "L1-icache-load-misses",
- "dTLB-loads",
- "dTLB-load-misses",
- "iTLB-loads",
- "iTLB-load-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
- if (!ret && detailed_run >= 3) {
- /*
- * Very, very detailed stats (-d -d -d), adding prefetch events:
- */
- const char *hw_events[] = {
- "L1-dcache-prefetches",
- "L1-dcache-prefetch-misses",
- };
-
- for (size_t i = 0; i < ARRAY_SIZE(hw_events); i++) {
- ret = parse_hardware_event(evlist, hw_events[i], &err);
- if (ret)
- goto out;
- }
- }
out:
if (!ret) {
evlist__for_each_entry(evlist, evsel) {
diff --git a/tools/perf/pmu-events/arch/common/common/metrics.json b/tools/perf/pmu-events/arch/common/common/metrics.json
index 017bbdede3d7..89d1d9f61014 100644
--- a/tools/perf/pmu-events/arch/common/common/metrics.json
+++ b/tools/perf/pmu-events/arch/common/common/metrics.json
@@ -93,5 +93,59 @@
"MetricThreshold": "branch_miss_rate > 0.05",
"ScaleUnit": "100%",
"DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1D miss rate",
+ "MetricExpr": "L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads",
+ "MetricGroup": "Default2",
+ "MetricName": "l1d_miss_rate",
+ "MetricThreshold": "l1d_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "LLC miss rate",
+ "MetricExpr": "LLC\\-load\\-misses / LLC\\-loads",
+ "MetricGroup": "Default2",
+ "MetricName": "llc_miss_rate",
+ "MetricThreshold": "llc_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1I miss rate",
+ "MetricExpr": "L1\\-icache\\-load\\-misses / L1\\-icache\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "l1i_miss_rate",
+ "MetricThreshold": "l1i_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "dTLB miss rate",
+ "MetricExpr": "dTLB\\-load\\-misses / dTLB\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "dtlb_miss_rate",
+ "MetricThreshold": "dtlb_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "iTLB miss rate",
+ "MetricExpr": "iTLB\\-load\\-misses / iTLB\\-loads",
+ "MetricGroup": "Default3",
+ "MetricName": "itlb_miss_rate",
+ "MetricThreshold": "itlb_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
+ },
+ {
+ "BriefDescription": "L1 prefetch miss rate",
+ "MetricExpr": "L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches",
+ "MetricGroup": "Default4",
+ "MetricName": "l1_prefetch_miss_rate",
+ "MetricThreshold": "l1_prefetch_miss_rate > 0.05",
+ "ScaleUnit": "100%",
+ "DefaultShowEvents": "1"
}
]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 71464b1d8afe..e882c645fbbe 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -1314,21 +1314,27 @@ static const char *const big_c_string =
/* offset=129185 */ "cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011"
/* offset=129361 */ "branch_frequency\000Default\000branches / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Branches per CPU second\000\0001000K/sec\000\000\000\000011"
/* offset=129541 */ "branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001"
-/* offset=129645 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
-/* offset=129668 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
-/* offset=129732 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
-/* offset=129899 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=129964 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
-/* offset=130032 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
-/* offset=130104 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
-/* offset=130199 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
-/* offset=130334 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
-/* offset=130399 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=130468 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
-/* offset=130539 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
-/* offset=130562 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
-/* offset=130585 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
-/* offset=130606 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
+/* offset=129645 */ "l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001"
+/* offset=129761 */ "llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001"
+/* offset=129862 */ "l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001"
+/* offset=129977 */ "dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130083 */ "itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001"
+/* offset=130189 */ "l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001"
+/* offset=130337 */ "CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000"
+/* offset=130360 */ "IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000"
+/* offset=130424 */ "Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000"
+/* offset=130591 */ "dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130656 */ "icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000"
+/* offset=130724 */ "cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000"
+/* offset=130796 */ "DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000"
+/* offset=130891 */ "DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000"
+/* offset=131026 */ "DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000"
+/* offset=131091 */ "DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131160 */ "DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000"
+/* offset=131231 */ "M1\000\000ipc + M2\000\000\000\000\000\000\000\000000"
+/* offset=131254 */ "M2\000\000ipc + M1\000\000\000\000\000\000\000\000000"
+/* offset=131277 */ "M3\000\0001 / M3\000\000\000\000\000\000\000\000000"
+/* offset=131298 */ "L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000"
;
static const struct compact_pmu_event pmu_events__common_default_core[] = {
@@ -2621,8 +2627,14 @@ static const struct compact_pmu_event pmu_metrics__common_default_core[] = {
{ 129541 }, /* branch_miss_rate\000Default\000branch\\-misses / branches\000branch_miss_rate > 0.05\000Branch miss rate\000\000100%\000\000\000\000001 */
{ 127928 }, /* cs_per_second\000Default\000software@context\\-switches\\,name\\=context\\-switches@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Context switches per CPU second\000\0001cs/sec\000\000\000\000011 */
{ 129185 }, /* cycles_frequency\000Default\000cpu\\-cycles / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Cycles per CPU second\000\0001GHz\000\000\000\000011 */
+{ 129977 }, /* dtlb_miss_rate\000Default3\000dTLB\\-load\\-misses / dTLB\\-loads\000dtlb_miss_rate > 0.05\000dTLB miss rate\000\000100%\000\000\000\000001 */
{ 128929 }, /* frontend_cycles_idle\000Default\000stalled\\-cycles\\-frontend / cpu\\-cycles\000frontend_cycles_idle > 0.1\000Frontend stalls per cycle\000\000\000\000\000\000001 */
{ 128652 }, /* insn_per_cycle\000Default\000instructions / cpu\\-cycles\000insn_per_cycle < 1\000Instructions Per Cycle\000\0001instructions\000\000\000\000001 */
+{ 130083 }, /* itlb_miss_rate\000Default3\000iTLB\\-load\\-misses / iTLB\\-loads\000itlb_miss_rate > 0.05\000iTLB miss rate\000\000100%\000\000\000\000001 */
+{ 130189 }, /* l1_prefetch_miss_rate\000Default4\000L1\\-dcache\\-prefetch\\-misses / L1\\-dcache\\-prefetches\000l1_prefetch_miss_rate > 0.05\000L1 prefetch miss rate\000\000100%\000\000\000\000001 */
+{ 129645 }, /* l1d_miss_rate\000Default2\000L1\\-dcache\\-load\\-misses / L1\\-dcache\\-loads\000l1d_miss_rate > 0.05\000L1D miss rate\000\000100%\000\000\000\000001 */
+{ 129862 }, /* l1i_miss_rate\000Default3\000L1\\-icache\\-load\\-misses / L1\\-icache\\-loads\000l1i_miss_rate > 0.05\000L1I miss rate\000\000100%\000\000\000\000001 */
+{ 129761 }, /* llc_miss_rate\000Default2\000LLC\\-load\\-misses / LLC\\-loads\000llc_miss_rate > 0.05\000LLC miss rate\000\000100%\000\000\000\000001 */
{ 128161 }, /* migrations_per_second\000Default\000software@cpu\\-migrations\\,name\\=cpu\\-migrations@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Process migrations to a new CPU per CPU second\000\0001migrations/sec\000\000\000\000011 */
{ 128421 }, /* page_faults_per_second\000Default\000software@page\\-faults\\,name\\=page\\-faults@ * 1e9 / (software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@)\000\000Page faults per CPU second\000\0001faults/sec\000\000\000\000011 */
{ 128765 }, /* stalled_cycles_per_instruction\000Default\000max(stalled\\-cycles\\-frontend, stalled\\-cycles\\-backend) / instructions\000\000Max front or backend stalls per instruction\000\000\000\000\000\000001 */
@@ -2698,21 +2710,21 @@ static const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
};
static const struct compact_pmu_event pmu_metrics__test_soc_cpu_default_core[] = {
-{ 129645 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
-{ 130334 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
-{ 130104 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
-{ 130199 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
-{ 130399 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 130468 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
-{ 129732 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
-{ 129668 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
-{ 130606 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
-{ 130539 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
-{ 130562 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
-{ 130585 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
-{ 130032 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
-{ 129899 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
-{ 129964 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130337 }, /* CPI\000\0001 / IPC\000\000\000\000\000\000\000\000000 */
+{ 131026 }, /* DCache_L2_All\000\000DCache_L2_All_Hits + DCache_L2_All_Miss\000\000\000\000\000\000\000\000000 */
+{ 130796 }, /* DCache_L2_All_Hits\000\000l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit\000\000\000\000\000\000\000\000000 */
+{ 130891 }, /* DCache_L2_All_Miss\000\000max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss\000\000\000\000\000\000\000\000000 */
+{ 131091 }, /* DCache_L2_Hits\000\000d_ratio(DCache_L2_All_Hits, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 131160 }, /* DCache_L2_Misses\000\000d_ratio(DCache_L2_All_Miss, DCache_L2_All)\000\000\000\000\000\000\000\000000 */
+{ 130424 }, /* Frontend_Bound_SMT\000\000idq_uops_not_delivered.core / (4 * (cpu_clk_unhalted.thread / 2 * (1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk)))\000\000\000\000\000\000\000\000000 */
+{ 130360 }, /* IPC\000group1\000inst_retired.any / cpu_clk_unhalted.thread\000\000\000\000\000\000\000\000000 */
+{ 131298 }, /* L1D_Cache_Fill_BW\000\00064 * l1d.replacement / 1e9 / duration_time\000\000\000\000\000\000\000\000000 */
+{ 131231 }, /* M1\000\000ipc + M2\000\000\000\000\000\000\000\000000 */
+{ 131254 }, /* M2\000\000ipc + M1\000\000\000\000\000\000\000\000000 */
+{ 131277 }, /* M3\000\0001 / M3\000\000\000\000\000\000\000\000000 */
+{ 130724 }, /* cache_miss_cycles\000group1\000dcache_miss_cpi + icache_miss_cycles\000\000\000\000\000\000\000\000000 */
+{ 130591 }, /* dcache_miss_cpi\000\000l1d\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
+{ 130656 }, /* icache_miss_cycles\000\000l1i\\-loads\\-misses / inst_retired.any\000\000\000\000\000\000\000\000000 */
};
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 11/22] perf script: Change metric format to use json metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (9 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 10/22] perf stat: Add detail -d,-dd,-ddd metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 12/22] perf stat: Remove hard coded shadow metrics Ian Rogers
` (12 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
The metric format option isn't properly supported. Improve it by
having sample events update the counts of an evsel, which is where
the shadow metric code expects to read the values from. To print
metrics, the metrics first need to be found. This is done on the
first attempt to print a metric: every metric is parsed and the
evsels in the metric's evlist are compared to those in perf script
using the perf_event_attr type and config. If the metric matches
then it is added for printing. As an event in perf script's evlist
may have more than one metric id, or a different leader for
aggregation, only the first matched metric is displayed in those
cases.
An example use is:
```
$ perf record -a -e '{instructions,cpu-cycles}:S' -a -- sleep 1
$ perf script -F period,metric
...
867817
metric: 0.30 insn per cycle
125394
metric: 0.04 insn per cycle
313516
metric: 0.11 insn per cycle
metric: 1.00 insn per cycle
```
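As matching is done on the perf_event_attr type and config, the same
should work for events backing the common json metrics, for example
(a sketch, output not shown; the branch_miss_rate metric is assumed
to be matched against the recorded branches/branch-misses events):
```
$ perf record -a -e '{branches,branch-misses}:S' -- sleep 1
$ perf script -F period,metric
```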
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-script.c | 239 ++++++++++++++++++++++++++++++++----
1 file changed, 217 insertions(+), 22 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8124fcb51da9..e24c3d9e01a8 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,6 +33,7 @@
#include "util/path.h"
#include "util/event.h"
#include "util/mem-info.h"
+#include "util/metricgroup.h"
#include "ui/ui.h"
#include "print_binary.h"
#include "print_insn.h"
@@ -341,9 +342,6 @@ struct evsel_script {
char *filename;
FILE *fp;
u64 samples;
- /* For metric output */
- u64 val;
- int gnum;
};
static inline struct evsel_script *evsel_script(struct evsel *evsel)
@@ -2132,13 +2130,161 @@ static void script_new_line(struct perf_stat_config *config __maybe_unused,
fputs("\tmetric: ", mctx->fp);
}
-static void perf_sample__fprint_metric(struct perf_script *script,
- struct thread *thread,
+struct script_find_metrics_args {
+ struct evlist *evlist;
+ bool system_wide;
+};
+
+static struct evsel *map_metric_evsel_to_script_evsel(struct evlist *script_evlist,
+ struct evsel *metric_evsel)
+{
+ struct evsel *script_evsel;
+
+ evlist__for_each_entry(script_evlist, script_evsel) {
+ /* Skip if perf_event_attr differ. */
+ if (metric_evsel->core.attr.type != script_evsel->core.attr.type)
+ continue;
+ if (metric_evsel->core.attr.config != script_evsel->core.attr.config)
+ continue;
+ /* Skip if the script event has a metric_id that doesn't match. */
+ if (script_evsel->metric_id &&
+ strcmp(evsel__metric_id(metric_evsel), evsel__metric_id(script_evsel))) {
+ pr_debug("Skipping matching evsel due to differing metric ids '%s' vs '%s'\n",
+ evsel__metric_id(metric_evsel), evsel__metric_id(script_evsel));
+ continue;
+ }
+ return script_evsel;
+ }
+ return NULL;
+}
+
+static int script_find_metrics(const struct pmu_metric *pm,
+ const struct pmu_metrics_table *table __maybe_unused,
+ void *data)
+{
+ struct script_find_metrics_args *args = data;
+ struct evlist *script_evlist = args->evlist;
+ struct evlist *metric_evlist = evlist__new();
+ struct evsel *metric_evsel;
+ int ret = metricgroup__parse_groups(metric_evlist,
+ /*pmu=*/"all",
+ pm->metric_name,
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ /*user_requested_cpu_list=*/NULL,
+ args->system_wide,
+ /*hardware_aware_grouping=*/false);
+
+ if (ret) {
+ /* Metric parsing failed but continue the search. */
+ goto out;
+ }
+
+ /*
+ * Check the script_evlist has an entry for each metric_evlist entry. If
+ * the script evsel was already set up avoid changing data that may
+ * break it.
+ */
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ struct evsel *script_evsel =
+ map_metric_evsel_to_script_evsel(script_evlist, metric_evsel);
+ struct evsel *new_metric_leader;
+
+ if (!script_evsel) {
+ pr_debug("Skipping metric '%s' as evsel '%s' / '%s' is missing\n",
+ pm->metric_name, evsel__name(metric_evsel),
+ evsel__metric_id(metric_evsel));
+ goto out;
+ }
+
+ if (script_evsel->metric_leader == NULL)
+ continue;
+
+ if (metric_evsel->metric_leader == metric_evsel) {
+ new_metric_leader = script_evsel;
+ } else {
+ new_metric_leader =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ metric_evsel->metric_leader);
+ }
+ /* Mismatching evsel leaders. */
+ if (script_evsel->metric_leader != new_metric_leader) {
+ pr_debug("Skipping metric '%s' due to mismatching evsel metric leaders '%s' vs '%s'\n",
+ pm->metric_name, evsel__metric_id(metric_evsel),
+ evsel__metric_id(script_evsel));
+ goto out;
+ }
+ }
+ /*
+ * Metric events match those in the script evlist, copy metric evsel
+ * data into the script evlist.
+ */
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ struct evsel *script_evsel =
+ map_metric_evsel_to_script_evsel(script_evlist, metric_evsel);
+ struct metric_event *metric_me = metricgroup__lookup(&metric_evlist->metric_events,
+ metric_evsel,
+ /*create=*/false);
+
+ if (script_evsel->metric_id == NULL) {
+ script_evsel->metric_id = metric_evsel->metric_id;
+ metric_evsel->metric_id = NULL;
+ }
+
+ if (script_evsel->metric_leader == NULL) {
+ if (metric_evsel->metric_leader == metric_evsel) {
+ script_evsel->metric_leader = script_evsel;
+ } else {
+ script_evsel->metric_leader =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ metric_evsel->metric_leader);
+ }
+ }
+
+ if (metric_me) {
+ struct metric_expr *expr;
+ struct metric_event *script_me =
+ metricgroup__lookup(&script_evlist->metric_events,
+ script_evsel,
+ /*create=*/true);
+
+ if (!script_me) {
+ /*
+ * As the metric_expr is created, the only
+ * failure is a lack of memory.
+ */
+ goto out;
+ }
+ list_splice_init(&metric_me->head, &script_me->head);
+ list_for_each_entry(expr, &script_me->head, nd) {
+ for (int i = 0; expr->metric_events[i]; i++) {
+ expr->metric_events[i] =
+ map_metric_evsel_to_script_evsel(script_evlist,
+ expr->metric_events[i]);
+ }
+ }
+ }
+ }
+ pr_debug("Found metric '%s' whose evsels match those of in the perf data\n",
+ pm->metric_name);
+ evlist__delete(metric_evlist);
+out:
+ return 0;
+}
+
+static struct aggr_cpu_id script_aggr_cpu_id_get(struct perf_stat_config *config __maybe_unused,
+ struct perf_cpu cpu)
+{
+ return aggr_cpu_id__global(cpu, /*data=*/NULL);
+}
+
+static void perf_sample__fprint_metric(struct thread *thread,
struct evsel *evsel,
struct perf_sample *sample,
FILE *fp)
{
- struct evsel *leader = evsel__leader(evsel);
+ static bool init_metrics;
struct perf_stat_output_ctx ctx = {
.print_metric = script_print_metric,
.new_line = script_new_line,
@@ -2150,23 +2296,72 @@ static void perf_sample__fprint_metric(struct perf_script *script,
},
.force_header = false,
};
- struct evsel *ev2;
- u64 val;
+ struct perf_counts_values *count, *old_count;
+ int cpu_map_idx, thread_map_idx, aggr_idx;
+ struct evsel *pos;
+
+ if (!init_metrics) {
+ /* One time initialization of stat_config and metric data. */
+ struct script_find_metrics_args args = {
+ .evlist = evsel->evlist,
+ /* TODO: Determine system-wide based on evlist.. */
+ .system_wide = true,
+ };
+ if (!stat_config.output)
+ stat_config.output = stdout;
+
+ if (!stat_config.aggr_map) {
+ /* TODO: currently only global aggregation is supported. */
+ assert(stat_config.aggr_mode == AGGR_GLOBAL);
+ stat_config.aggr_get_id = script_aggr_cpu_id_get;
+ stat_config.aggr_map =
+ cpu_aggr_map__new(evsel->evlist->core.user_requested_cpus,
+ aggr_cpu_id__global, /*data=*/NULL,
+ /*needs_sort=*/false);
+ }
- if (!evsel->stats)
- evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
- if (evsel_script(leader)->gnum++ == 0)
- perf_stat__reset_shadow_stats();
- val = sample->period * evsel->scale;
- evsel_script(evsel)->val = val;
- if (evsel_script(leader)->gnum == leader->core.nr_members) {
- for_each_group_member (ev2, leader) {
- perf_stat__print_shadow_stats(&stat_config, ev2,
- evsel_script(ev2)->val,
- sample->cpu,
- &ctx);
+ metricgroup__for_each_metric(pmu_metrics_table__find(), script_find_metrics, &args);
+ init_metrics = true;
+ }
+
+ if (!evsel->stats) {
+ if (evlist__alloc_stats(&stat_config, evsel->evlist, /*alloc_raw=*/true) < 0)
+ return;
+ }
+ if (!evsel->stats->aggr) {
+ if (evlist__alloc_aggr_stats(evsel->evlist, stat_config.aggr_map->nr) < 0)
+ return;
+ }
+
+ /* Update the evsel's count using the sample's data. */
+ cpu_map_idx = perf_cpu_map__idx(evsel->core.cpus, (struct perf_cpu){sample->cpu});
+ thread_map_idx = perf_thread_map__idx(evsel->core.threads, sample->tid);
+ if (thread_map_idx < 0) {
+ /* Missing thread, check for any thread. */
+ if (perf_thread_map__pid(evsel->core.threads, /*idx=*/0) == -1) {
+ thread_map_idx = 0;
+ } else {
+ pr_info("Missing thread map entry for thread %d\n", sample->tid);
+ return;
+ }
+ }
+ count = perf_counts(evsel->counts, cpu_map_idx, thread_map_idx);
+ old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread_map_idx);
+ count->val = old_count->val + sample->period;
+ count->run = old_count->run + 1;
+ count->ena = old_count->ena + 1;
+
+ /* Update the aggregated stats. */
+ perf_stat_process_counter(&stat_config, evsel);
+
+ /* Display all metrics. */
+ evlist__for_each_entry(evsel->evlist, pos) {
+ cpu_aggr_map__for_each_idx(aggr_idx, stat_config.aggr_map) {
+ perf_stat__print_shadow_stats(&stat_config, pos,
+ count->val,
+ aggr_idx,
+ &ctx);
}
- evsel_script(leader)->gnum = 0;
}
}
@@ -2348,7 +2543,7 @@ static void process_event(struct perf_script *script,
}
if (PRINT_FIELD(METRIC))
- perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+ perf_sample__fprint_metric(thread, evsel, sample, fp);
if (verbose > 0)
fflush(fp);
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 12/22] perf stat: Remove hard coded shadow metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (10 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 11/22] perf script: Change metric format to use json metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 13/22] perf stat: Fix default metricgroup display on hybrid Ian Rogers
` (11 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Now that the metrics are encoded in common json, the hard coded
printing means the metrics are shown twice. Remove the hard coded
version.
This means that when events are specified and those events correspond
to a hard coded metric, the metric will no longer be displayed
automatically; it will only be displayed if the metric itself is
requested. The ad hoc printing of the previous approach was often
found frustrating, and the new approach avoids that.
The default perf stat output on an Alderlake now looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
7,932 context-switches # 281.7 cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation
# 17.3 % tma_frontend_bound
TopdownL1 (cpu_core) # 37.3 % tma_backend_bound
# 35.2 % tma_retiring
5,901 page-faults # 209.5 faults/sec page_faults_per_second
418,955,116 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (49.77%)
1,113,933,476 cpu_core/cpu-cycles/ # 0.0 GHz cycles_frequency
# 14.6 % tma_bad_speculation
# 8.5 % tma_retiring (50.17%)
# 41.8 % tma_backend_bound
# 35.1 % tma_frontend_bound (50.31%)
32,196,918 cpu_atom/branches/ # 1.1 K/sec branch_frequency (60.24%)
445,404,717 cpu_core/branches/ # 15.8 K/sec branch_frequency
235 cpu-migrations # 8.3 migrations/sec migrations_per_second
28,160,951,165 cpu-clock # 28.0 CPUs CPUs_utilized
382,285,763 cpu_atom/cpu-cycles/ # 0.4 instructions insn_per_cycle (60.18%)
1,114,029,255 cpu_core/cpu-cycles/ # 2.3 instructions insn_per_cycle
1,768,727 cpu_atom/branches-misses/ # 6.5 % branch_miss_rate (49.68%)
4,505,904 cpu_core/branches-misses/ # 1.0 % branch_miss_rate
1.007137632 seconds time elapsed
```
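Where the previous behavior is wanted for manually specified events,
the corresponding json metric can be requested explicitly, for
example (a sketch using metric names from the common metrics added
earlier in this series):
```
$ perf stat -M insn_per_cycle,branch_miss_rate -a -- sleep 1
```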
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-script.c | 1 -
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat-shadow.c | 392 +--------------------------------
tools/perf/util/stat.h | 2 +-
4 files changed, 6 insertions(+), 393 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e24c3d9e01a8..6da8bfe1e7aa 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2358,7 +2358,6 @@ static void perf_sample__fprint_metric(struct thread *thread,
evlist__for_each_entry(evsel->evlist, pos) {
cpu_aggr_map__for_each_idx(aggr_idx, stat_config.aggr_map) {
perf_stat__print_shadow_stats(&stat_config, pos,
- count->val,
aggr_idx,
&ctx);
}
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4d0e353846ea..eabeab5e6614 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -902,7 +902,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
&num, from, &out);
} while (from != NULL);
} else {
- perf_stat__print_shadow_stats(config, counter, uval, aggr_idx, &out);
+ perf_stat__print_shadow_stats(config, counter, aggr_idx, &out);
}
} else {
pm(config, os, METRIC_THRESHOLD_UNKNOWN, /*format=*/NULL, /*unit=*/NULL, /*val=*/0);
@@ -1274,7 +1274,7 @@ static void print_metric_headers(struct perf_stat_config *config,
os.evsel = counter;
- perf_stat__print_shadow_stats(config, counter, 0, 0, &out);
+ perf_stat__print_shadow_stats(config, counter, /*aggr_idx=*/0, &out);
}
if (!config->json_output)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 4df614f8e200..afbc49e8cb31 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -20,357 +20,12 @@
struct stats walltime_nsecs_stats;
struct rusage_stats ru_stats;
-enum {
- CTX_BIT_USER = 1 << 0,
- CTX_BIT_KERNEL = 1 << 1,
- CTX_BIT_HV = 1 << 2,
- CTX_BIT_HOST = 1 << 3,
- CTX_BIT_IDLE = 1 << 4,
- CTX_BIT_MAX = 1 << 5,
-};
-
-enum stat_type {
- STAT_NONE = 0,
- STAT_NSECS,
- STAT_CYCLES,
- STAT_INSTRUCTIONS,
- STAT_STALLED_CYCLES_FRONT,
- STAT_STALLED_CYCLES_BACK,
- STAT_BRANCHES,
- STAT_BRANCH_MISS,
- STAT_CACHE_REFS,
- STAT_CACHE_MISSES,
- STAT_L1_DCACHE,
- STAT_L1_ICACHE,
- STAT_LL_CACHE,
- STAT_ITLB_CACHE,
- STAT_DTLB_CACHE,
- STAT_L1D_MISS,
- STAT_L1I_MISS,
- STAT_LL_MISS,
- STAT_DTLB_MISS,
- STAT_ITLB_MISS,
- STAT_MAX
-};
-
-static int evsel_context(const struct evsel *evsel)
-{
- int ctx = 0;
-
- if (evsel->core.attr.exclude_kernel)
- ctx |= CTX_BIT_KERNEL;
- if (evsel->core.attr.exclude_user)
- ctx |= CTX_BIT_USER;
- if (evsel->core.attr.exclude_hv)
- ctx |= CTX_BIT_HV;
- if (evsel->core.attr.exclude_host)
- ctx |= CTX_BIT_HOST;
- if (evsel->core.attr.exclude_idle)
- ctx |= CTX_BIT_IDLE;
-
- return ctx;
-}
-
void perf_stat__reset_shadow_stats(void)
{
memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
memset(&ru_stats, 0, sizeof(ru_stats));
}
-static enum stat_type evsel__stat_type(struct evsel *evsel)
-{
- /* Fake perf_hw_cache_op_id values for use with evsel__match. */
- u64 PERF_COUNT_hw_cache_l1d_miss = PERF_COUNT_HW_CACHE_L1D |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_l1i_miss = PERF_COUNT_HW_CACHE_L1I |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_ll_miss = PERF_COUNT_HW_CACHE_LL |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_dtlb_miss = PERF_COUNT_HW_CACHE_DTLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
- u64 PERF_COUNT_hw_cache_itlb_miss = PERF_COUNT_HW_CACHE_ITLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
-
- if (evsel__is_clock(evsel))
- return STAT_NSECS;
- else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES))
- return STAT_CYCLES;
- else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS))
- return STAT_INSTRUCTIONS;
- else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
- return STAT_STALLED_CYCLES_FRONT;
- else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND))
- return STAT_STALLED_CYCLES_BACK;
- else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- return STAT_BRANCHES;
- else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES))
- return STAT_BRANCH_MISS;
- else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES))
- return STAT_CACHE_REFS;
- else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES))
- return STAT_CACHE_MISSES;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D))
- return STAT_L1_DCACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I))
- return STAT_L1_ICACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL))
- return STAT_LL_CACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB))
- return STAT_DTLB_CACHE;
- else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB))
- return STAT_ITLB_CACHE;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss))
- return STAT_L1D_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss))
- return STAT_L1I_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss))
- return STAT_LL_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss))
- return STAT_DTLB_MISS;
- else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss))
- return STAT_ITLB_MISS;
- return STAT_NONE;
-}
-
-static enum metric_threshold_classify get_ratio_thresh(const double ratios[3], double val)
-{
- assert(ratios[0] > ratios[1]);
- assert(ratios[1] > ratios[2]);
-
- return val > ratios[1]
- ? (val > ratios[0] ? METRIC_THRESHOLD_BAD : METRIC_THRESHOLD_NEARLY_BAD)
- : (val > ratios[2] ? METRIC_THRESHOLD_LESS_GOOD : METRIC_THRESHOLD_GOOD);
-}
-
-static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type)
-{
- struct evsel *cur;
- int evsel_ctx = evsel_context(evsel);
- struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
-
- evlist__for_each_entry(evsel->evlist, cur) {
- struct perf_stat_aggr *aggr;
-
- /* Ignore the evsel that is being searched from. */
- if (evsel == cur)
- continue;
-
- /* Ignore evsels that are part of different groups. */
- if (evsel->core.leader->nr_members > 1 &&
- evsel->core.leader != cur->core.leader)
- continue;
- /* Ignore evsels with mismatched modifiers. */
- if (evsel_ctx != evsel_context(cur))
- continue;
- /* Ignore if not the cgroup we're looking for. */
- if (evsel->cgrp != cur->cgrp)
- continue;
- /* Ignore if not the stat we're looking for. */
- if (type != evsel__stat_type(cur))
- continue;
-
- /*
- * Except the SW CLOCK events,
- * ignore if not the PMU we're looking for.
- */
- if ((type != STAT_NSECS) && (evsel_pmu != evsel__find_pmu(cur)))
- continue;
-
- aggr = &cur->stats->aggr[aggr_idx];
- if (type == STAT_NSECS)
- return aggr->counts.val;
- return aggr->counts.val * cur->scale;
- }
- return 0.0;
-}
-
-static void print_ratio(struct perf_stat_config *config,
- const struct evsel *evsel, int aggr_idx,
- double numerator, struct perf_stat_output_ctx *out,
- enum stat_type denominator_type,
- const double thresh_ratios[3], const char *_unit)
-{
- double denominator = find_stat(evsel, aggr_idx, denominator_type);
- double ratio = 0;
- enum metric_threshold_classify thresh = METRIC_THRESHOLD_UNKNOWN;
- const char *fmt = NULL;
- const char *unit = NULL;
-
- if (numerator && denominator) {
- ratio = numerator / denominator * 100.0;
- thresh = get_ratio_thresh(thresh_ratios, ratio);
- fmt = "%7.2f%%";
- unit = _unit;
- }
- out->print_metric(config, out->ctx, thresh, fmt, unit, ratio);
-}
-
-static void print_stalled_cycles_front(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double stalled,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {50.0, 30.0, 10.0};
-
- print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, thresh_ratios,
- "frontend cycles idle");
-}
-
-static void print_stalled_cycles_back(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double stalled,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {75.0, 50.0, 20.0};
-
- print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, thresh_ratios,
- "backend cycles idle");
-}
-
-static void print_branch_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_BRANCHES, thresh_ratios,
- "of all branches");
-}
-
-static void print_l1d_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_DCACHE, thresh_ratios,
- "of all L1-dcache accesses");
-}
-
-static void print_l1i_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_ICACHE, thresh_ratios,
- "of all L1-icache accesses");
-}
-
-static void print_ll_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_LL_CACHE, thresh_ratios,
- "of all LL-cache accesses");
-}
-
-static void print_dtlb_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_DTLB_CACHE, thresh_ratios,
- "of all dTLB cache accesses");
-}
-
-static void print_itlb_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_ITLB_CACHE, thresh_ratios,
- "of all iTLB cache accesses");
-}
-
-static void print_cache_miss(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out)
-{
- const double thresh_ratios[3] = {20.0, 10.0, 5.0};
-
- print_ratio(config, evsel, aggr_idx, misses, out, STAT_CACHE_REFS, thresh_ratios,
- "of all cache refs");
-}
-
-static void print_instructions(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double instructions,
- struct perf_stat_output_ctx *out)
-{
- print_metric_t print_metric = out->print_metric;
- void *ctxp = out->ctx;
- double cycles = find_stat(evsel, aggr_idx, STAT_CYCLES);
- double max_stalled = max(find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_FRONT),
- find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_BACK));
-
- if (cycles) {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%7.2f ",
- "insn per cycle", instructions / cycles);
- } else {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "insn per cycle", 0);
- }
- if (max_stalled && instructions) {
- if (out->new_line)
- out->new_line(config, ctxp);
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%7.2f ",
- "stalled cycles per insn", max_stalled / instructions);
- }
-}
-
-static void print_cycles(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double cycles,
- struct perf_stat_output_ctx *out)
-{
- double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
-
- if (cycles && nsecs) {
- double ratio = cycles / nsecs;
-
- out->print_metric(config, out->ctx, METRIC_THRESHOLD_UNKNOWN, "%8.3f",
- "GHz", ratio);
- } else {
- out->print_metric(config, out->ctx, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "GHz", 0);
- }
-}
-
-static void print_nsecs(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx __maybe_unused, double nsecs,
- struct perf_stat_output_ctx *out)
-{
- print_metric_t print_metric = out->print_metric;
- void *ctxp = out->ctx;
- double wall_time = avg_stats(&walltime_nsecs_stats);
-
- if (wall_time) {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%8.3f", "CPUs utilized",
- nsecs / (wall_time * evsel->scale));
- } else {
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, /*fmt=*/NULL,
- "CPUs utilized", 0);
- }
-}
-
static int prepare_metric(const struct metric_expr *mexp,
const struct evsel *evsel,
struct expr_parse_ctx *pctx,
@@ -682,56 +337,15 @@ void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int aggr_idx,
+ int aggr_idx,
struct perf_stat_output_ctx *out)
{
- typedef void (*stat_print_function_t)(struct perf_stat_config *config,
- const struct evsel *evsel,
- int aggr_idx, double misses,
- struct perf_stat_output_ctx *out);
- static const stat_print_function_t stat_print_function[STAT_MAX] = {
- [STAT_INSTRUCTIONS] = print_instructions,
- [STAT_BRANCH_MISS] = print_branch_miss,
- [STAT_L1D_MISS] = print_l1d_miss,
- [STAT_L1I_MISS] = print_l1i_miss,
- [STAT_DTLB_MISS] = print_dtlb_miss,
- [STAT_ITLB_MISS] = print_itlb_miss,
- [STAT_LL_MISS] = print_ll_miss,
- [STAT_CACHE_MISSES] = print_cache_miss,
- [STAT_STALLED_CYCLES_FRONT] = print_stalled_cycles_front,
- [STAT_STALLED_CYCLES_BACK] = print_stalled_cycles_back,
- [STAT_CYCLES] = print_cycles,
- [STAT_NSECS] = print_nsecs,
- };
print_metric_t print_metric = out->print_metric;
void *ctxp = out->ctx;
- int num = 1;
+ int num = 0;
- if (config->iostat_run) {
+ if (config->iostat_run)
iostat_print_metric(config, evsel, out);
- } else {
- stat_print_function_t fn = stat_print_function[evsel__stat_type(evsel)];
-
- if (fn)
- fn(config, evsel, aggr_idx, avg, out);
- else {
- double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
-
- if (nsecs) {
- char unit = ' ';
- char unit_buf[10] = "/sec";
- double ratio = convert_unit_double(1000000000.0 * avg / nsecs,
- &unit);
-
- if (unit != ' ')
- snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
- print_metric(config, ctxp, METRIC_THRESHOLD_UNKNOWN, "%8.3f",
- unit_buf, ratio);
- } else {
- num = 0;
- }
- }
- }
perf_stat__print_shadow_stats_metricgroup(config, evsel, aggr_idx,
&num, NULL, out);
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 34f30a295f89..53e4aa411e5f 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -184,7 +184,7 @@ struct perf_stat_output_ctx {
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int aggr_idx,
+ int aggr_idx,
struct perf_stat_output_ctx *out);
bool perf_stat__skip_metric_event(struct evsel *evsel, u64 ena, u64 run);
void *perf_stat__print_shadow_stats_metricgroup(struct perf_stat_config *config,
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 13/22] perf stat: Fix default metricgroup display on hybrid
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (11 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 12/22] perf stat: Remove hard coded shadow metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 14/22] perf stat: Sort default events/metrics Ian Rogers
` (10 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
The logic to skip output of a default metric line was firing on
Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the
need_full_name check as it is equivalent to the different PMU test in
the cases we care about, merge the 'if's and flip the sense of the
PMU comparison. The 'if' now effectively says: if the output matches
the last printed output then skip it.
Before:
```
TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation
# 24.3 % tma_frontend_bound
TopdownL1 (cpu_core) # 33.9 % tma_backend_bound
# 30.6 % tma_retiring
# 42.2 % tma_backend_bound
# 25.0 % tma_frontend_bound (49.81%)
# 12.8 % tma_bad_speculation
# 20.0 % tma_retiring (59.46%)
```
After:
```
TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation
# 43.7 % tma_frontend_bound
# 30.7 % tma_backend_bound
# 17.2 % tma_retiring
TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound
# 37.6 % tma_frontend_bound (49.66%)
# 18.0 % tma_bad_speculation
# 12.6 % tma_retiring (59.58%)
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-shadow.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index afbc49e8cb31..c1547128c396 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -256,11 +256,9 @@ static void perf_stat__print_metricgroup_header(struct perf_stat_config *config,
* event. Only align with other metics from
* different metric events.
*/
- if (last_name && !strcmp(last_name, name)) {
- if (!need_full_name || last_pmu != evsel->pmu) {
- out->print_metricgroup_header(config, ctxp, NULL);
- return;
- }
+ if (last_name && !strcmp(last_name, name) && last_pmu == evsel->pmu) {
+ out->print_metricgroup_header(config, ctxp, NULL);
+ return;
}
if (need_full_name && evsel->pmu)
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 14/22] perf stat: Sort default events/metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (12 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 13/22] perf stat: Fix default metricgroup display on hybrid Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 15/22] perf stat: Remove "unit" workarounds for metric-only Ian Rogers
` (9 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
To improve the readability of default events/metrics, sort the evsels
after the Default metric groups have been parsed.
Before:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
21,194 context-switches # 752.1 cs/sec cs_per_second
TopdownL1 (cpu_core) # 9.4 % tma_bad_speculation
# 25.0 % tma_frontend_bound
# 37.0 % tma_backend_bound
# 28.7 % tma_retiring
6,371 page-faults # 226.1 faults/sec page_faults_per_second
734,456,525 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (49.77%)
1,679,085,181 cpu_core/cpu-cycles/ # 0.1 GHz cycles_frequency
TopdownL1 (cpu_atom) # 19.2 % tma_bad_speculation
# 15.1 % tma_retiring (50.15%)
# 32.9 % tma_backend_bound
# 32.9 % tma_frontend_bound (50.34%)
86,758,824 cpu_atom/branches/ # 3.1 K/sec branch_frequency (60.26%)
524,281,539 cpu_core/branches/ # 18.6 K/sec branch_frequency
1,458 cpu-migrations # 51.7 migrations/sec migrations_per_second
28,178,124,975 cpu-clock # 28.0 CPUs CPUs_utilized
776,037,182 cpu_atom/cpu-cycles/ # 0.6 instructions insn_per_cycle (60.18%)
1,679,168,140 cpu_core/cpu-cycles/ # 1.8 instructions insn_per_cycle
4,045,615 cpu_atom/branches-misses/ # 5.3 % branch_miss_rate (49.65%)
6,656,795 cpu_core/branches-misses/ # 1.3 % branch_miss_rate
1.007340329 seconds time elapsed
```
After:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
25,701 context-switches # 911.8 cs/sec cs_per_second
28,187,404,943 cpu-clock # 28.0 CPUs CPUs_utilized
2,053 cpu-migrations # 72.8 migrations/sec migrations_per_second
12,390 page-faults # 439.6 faults/sec page_faults_per_second
592,082,798 cpu_core/branches/ # 21.0 K/sec branch_frequency
7,762,204 cpu_core/branches-misses/ # 1.3 % branch_miss_rate
1,925,833,804 cpu_core/cpu-cycles/ # 0.1 GHz cycles_frequency
1,925,848,650 cpu_core/cpu-cycles/ # 1.7 instructions insn_per_cycle
95,449,119 cpu_atom/branches/ # 3.4 K/sec branch_frequency (59.78%)
4,278,932 cpu_atom/branches-misses/ # 4.3 % branch_miss_rate (50.26%)
980,441,753 cpu_atom/cpu-cycles/ # 0.0 GHz cycles_frequency (50.34%)
1,091,626,599 cpu_atom/cpu-cycles/ # 0.6 instructions insn_per_cycle (50.37%)
TopdownL1 (cpu_core) # 9.1 % tma_bad_speculation
# 27.3 % tma_frontend_bound
# 35.7 % tma_backend_bound
# 27.9 % tma_retiring
TopdownL1 (cpu_atom) # 31.1 % tma_backend_bound
# 34.3 % tma_frontend_bound (49.74%)
# 24.1 % tma_bad_speculation
# 10.5 % tma_retiring (59.57%)
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c00d84a04593..4d15eabb4927 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -74,6 +74,7 @@
#include "util/intel-tpebs.h"
#include "asm/bug.h"
+#include <linux/list_sort.h>
#include <linux/time64.h>
#include <linux/zalloc.h>
#include <api/fs/fs.h>
@@ -1853,6 +1854,35 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
return 0;
}
+static int default_evlist_evsel_cmp(void *priv __maybe_unused,
+ const struct list_head *l,
+ const struct list_head *r)
+{
+ const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
+ const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
+ const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
+ const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+
+ if (evsel__leader(lhs) == evsel__leader(rhs)) {
+ /* Within the same group, respect the original order. */
+ return lhs_core->idx - rhs_core->idx;
+ }
+
+ /* Sort default metrics evsels first, and default show events before those. */
+ if (lhs->default_metricgroup != rhs->default_metricgroup)
+ return lhs->default_metricgroup ? -1 : 1;
+
+ if (lhs->default_show_events != rhs->default_show_events)
+ return lhs->default_show_events ? -1 : 1;
+
+ /* Sort by PMU type (prefers legacy types first). */
+ if (lhs->pmu != rhs->pmu)
+ return lhs->pmu->type - rhs->pmu->type;
+
+ /* Sort by name. */
+ return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
+}
+
/*
* Add default events, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
@@ -2019,6 +2049,8 @@ static int add_default_events(void)
&metric_evlist->metric_events);
evlist__delete(metric_evlist);
}
+ list_sort(/*priv=*/NULL, &evlist->core.entries, default_evlist_evsel_cmp);
+
}
out:
if (!ret) {
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 15/22] perf stat: Remove "unit" workarounds for metric-only
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (13 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 14/22] perf stat: Sort default events/metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 16/22] perf test stat+json: Improve metric-only testing Ian Rogers
` (8 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Remove code that tested the "unit" (such as KB/sec) of certain hard
coded metric values and applied workarounds.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-display.c | 47 ++++++----------------------------
1 file changed, 8 insertions(+), 39 deletions(-)
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index eabeab5e6614..b3596f9f5cdd 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -592,42 +592,18 @@ static void print_metricgroup_header_std(struct perf_stat_config *config,
fprintf(config->output, "%*s", MGROUP_LEN - n - 1, "");
}
-/* Filter out some columns that don't work well in metrics only mode */
-
-static bool valid_only_metric(const char *unit)
-{
- if (!unit)
- return false;
- if (strstr(unit, "/sec") ||
- strstr(unit, "CPUs utilized"))
- return false;
- return true;
-}
-
-static const char *fixunit(char *buf, struct evsel *evsel,
- const char *unit)
-{
- if (!strncmp(unit, "of all", 6)) {
- snprintf(buf, 1024, "%s %s", evsel__name(evsel),
- unit);
- return buf;
- }
- return unit;
-}
-
static void print_metric_only(struct perf_stat_config *config,
void *ctx, enum metric_threshold_classify thresh,
const char *fmt, const char *unit, double val)
{
struct outstate *os = ctx;
FILE *out = os->fh;
- char buf[1024], str[1024];
+ char str[1024];
unsigned mlen = config->metric_only_len;
const char *color = metric_threshold_classify__color(thresh);
- if (!valid_only_metric(unit))
- return;
- unit = fixunit(buf, os->evsel, unit);
+ if (!unit)
+ unit = "";
if (mlen < strlen(unit))
mlen = strlen(unit) + 1;
@@ -643,16 +619,15 @@ static void print_metric_only_csv(struct perf_stat_config *config __maybe_unused
void *ctx,
enum metric_threshold_classify thresh __maybe_unused,
const char *fmt,
- const char *unit, double val)
+ const char *unit __maybe_unused, double val)
{
struct outstate *os = ctx;
FILE *out = os->fh;
char buf[64], *vals, *ends;
- char tbuf[1024];
- if (!valid_only_metric(unit))
+ if (!unit)
return;
- unit = fixunit(tbuf, os->evsel, unit);
+
snprintf(buf, sizeof(buf), fmt ?: "", val);
ends = vals = skip_spaces(buf);
while (isdigit(*ends) || *ends == '.')
@@ -670,13 +645,9 @@ static void print_metric_only_json(struct perf_stat_config *config __maybe_unuse
{
struct outstate *os = ctx;
char buf[64], *ends;
- char tbuf[1024];
const char *vals;
- if (!valid_only_metric(unit))
- return;
- unit = fixunit(tbuf, os->evsel, unit);
- if (!unit[0])
+ if (!unit || !unit[0])
return;
snprintf(buf, sizeof(buf), fmt ?: "", val);
vals = ends = skip_spaces(buf);
@@ -695,7 +666,6 @@ static void print_metric_header(struct perf_stat_config *config,
const char *unit, double val __maybe_unused)
{
struct outstate *os = ctx;
- char tbuf[1024];
/* In case of iostat, print metric header for first root port only */
if (config->iostat_run &&
@@ -705,9 +675,8 @@ static void print_metric_header(struct perf_stat_config *config,
if (os->evsel->cgrp != os->cgrp)
return;
- if (!valid_only_metric(unit))
+ if (!unit)
return;
- unit = fixunit(tbuf, os->evsel, unit);
if (config->json_output)
return;
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 16/22] perf test stat+json: Improve metric-only testing
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (14 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 15/22] perf stat: Remove "unit" workarounds for metric-only Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 17/22] perf test stat: Ignore failures in Default[234] metricgroups Ian Rogers
` (7 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
When testing metric-only, pass a metric to perf rather than expecting
a hard coded metric value to be generated.
Remove keys that were really metric-only units and instead don't
expect metric-only output to have a matching json key, as it encodes
metrics as {"metric_name", "metric_value"}.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/lib/perf_json_output_lint.py | 4 ++--
tools/perf/tests/shell/stat+json_output.sh | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/tests/shell/lib/perf_json_output_lint.py b/tools/perf/tests/shell/lib/perf_json_output_lint.py
index c6750ef06c0f..1369baaa0361 100644
--- a/tools/perf/tests/shell/lib/perf_json_output_lint.py
+++ b/tools/perf/tests/shell/lib/perf_json_output_lint.py
@@ -65,8 +65,6 @@ def check_json_output(expected_items):
'socket': lambda x: True,
'thread': lambda x: True,
'unit': lambda x: True,
- 'insn per cycle': lambda x: isfloat(x),
- 'GHz': lambda x: True, # FIXME: it seems unintended for --metric-only
}
input = '[\n' + ','.join(Lines) + '\n]'
for item in json.loads(input):
@@ -88,6 +86,8 @@ def check_json_output(expected_items):
f' in \'{item}\'')
for key, value in item.items():
if key not in checks:
+ if args.metric_only:
+ continue
raise RuntimeError(f'Unexpected key: key={key} value={value}')
if not checks[key](value):
raise RuntimeError(f'Check failed for: key={key} value={value}')
diff --git a/tools/perf/tests/shell/stat+json_output.sh b/tools/perf/tests/shell/stat+json_output.sh
index 98fb65274ac4..85d1ad7186c6 100755
--- a/tools/perf/tests/shell/stat+json_output.sh
+++ b/tools/perf/tests/shell/stat+json_output.sh
@@ -181,7 +181,7 @@ check_metric_only()
echo "[Skip] CPU-measurement counter facility not installed"
return
fi
- perf stat -j --metric-only -e instructions,cycles -o "${stat_output}" true
+ perf stat -j --metric-only -M page_faults_per_second -o "${stat_output}" true
$PYTHON $pythonchecker --metric-only --file "${stat_output}"
echo "[Success]"
}
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 17/22] perf test stat: Ignore failures in Default[234] metricgroups
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (15 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 16/22] perf test stat+json: Improve metric-only testing Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 18/22] perf test stat: Update std_output testing metric expectations Ian Rogers
` (6 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
The Default[234] metric groups may contain unsupported legacy
events. Allow those metric groups to fail.
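For example (a sketch, using one of the metric group names added
earlier in this series), a run such as the following may fail on
machines lacking some legacy events and should no longer fail the test:
```
# may fail where e.g. stalled-cycles-* events are unsupported;
# the test now ignores such failures for Default2/3/4
$ perf stat -M Default2 true
```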
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat_all_metricgroups.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/tests/shell/stat_all_metricgroups.sh b/tools/perf/tests/shell/stat_all_metricgroups.sh
index c6d61a4ac3e7..1400880ec01f 100755
--- a/tools/perf/tests/shell/stat_all_metricgroups.sh
+++ b/tools/perf/tests/shell/stat_all_metricgroups.sh
@@ -37,6 +37,9 @@ do
then
err=2 # Skip
fi
+ elif [[ "$m" == @(Default2|Default3|Default4) ]]
+ then
+ echo "Ignoring failures in $m that may contain unsupported legacy events"
else
echo "Metric group $m failed"
echo $result
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 18/22] perf test stat: Update std_output testing metric expectations
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (16 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 17/22] perf test stat: Ignore failures in Default[234] metricgroups Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 19/22] perf test metrics: Update all metrics for possibly failing default metrics Ian Rogers
` (5 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Make the expectations match json metrics rather than the previous hard
coded ones.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat+std_output.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat+std_output.sh b/tools/perf/tests/shell/stat+std_output.sh
index ec41f24299d9..9c4b92ecf448 100755
--- a/tools/perf/tests/shell/stat+std_output.sh
+++ b/tools/perf/tests/shell/stat+std_output.sh
@@ -12,8 +12,8 @@ set -e
stat_output=$(mktemp /tmp/__perf_test.stat_output.std.XXXXX)
event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
-event_metric=("CPUs utilized" "CPUs utilized" "/sec" "/sec" "/sec" "frontend cycles idle" "backend cycles idle" "GHz" "insn per cycle" "/sec" "of all branches")
-skip_metric=("stalled cycles per insn" "tma_" "retiring" "frontend_bound" "bad_speculation" "backend_bound" "TopdownL1" "percent of slots")
+event_metric=("CPUs_utilized" "CPUs_utilized" "cs/sec" "migrations/sec" "faults/sec" "frontend_cycles_idle" "backend_cycles_idle" "GHz" "insn_per_cycle" "/sec" "branch_miss_rate")
+skip_metric=("tma_" "TopdownL1")
cleanup() {
rm -f "${stat_output}"
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 19/22] perf test metrics: Update all metrics for possibly failing default metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (17 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 18/22] perf test stat: Update std_output testing metric expectations Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 20/22] perf test stat: Update shadow test to use metrics Ian Rogers
` (4 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Default metrics may use unsupported events and be ignored. These
metrics shouldn't cause metric testing to fail.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat_all_metrics.sh | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
index 6fa585a1e34c..a7edf01b3943 100755
--- a/tools/perf/tests/shell/stat_all_metrics.sh
+++ b/tools/perf/tests/shell/stat_all_metrics.sh
@@ -25,8 +25,13 @@ for m in $(perf list --raw-dump metrics); do
# No error result and metric shown.
continue
fi
- if [[ "$result" =~ "Cannot resolve IDs for" ]]
+ if [[ "$result" =~ "Cannot resolve IDs for" || "$result" =~ "No supported events found" ]]
then
+ if [[ "$m" == @(l1_prefetch_miss_rate|stalled_cycles_per_instruction) ]]
+ then
+ # Default metrics that may use unsupported events.
+ continue
+ fi
echo "Metric contains missing events"
echo $result
err=1 # Fail
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 20/22] perf test stat: Update shadow test to use metrics
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (18 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 19/22] perf test metrics: Update all metrics for possibly failing default metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 21/22] perf test stat: Update test expectations and events Ian Rogers
` (3 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Previously '-e cycles,instructions' would implicitly create an IPC
metric. This now has to be explicit with '-M insn_per_cycle'.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat+shadow_stat.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh
index 8824f445d343..cabbbf17c662 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -14,7 +14,7 @@ perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
test_global_aggr()
{
- perf stat -a --no-big-num -e cycles,instructions sleep 1 2>&1 | \
+ perf stat -a --no-big-num -M insn_per_cycle sleep 1 2>&1 | \
grep -e cycles -e instructions | \
while read num evt _ ipc rest
do
@@ -53,7 +53,7 @@ test_global_aggr()
test_no_aggr()
{
- perf stat -a -A --no-big-num -e cycles,instructions sleep 1 2>&1 | \
+ perf stat -a -A --no-big-num -M insn_per_cycle sleep 1 2>&1 | \
grep ^CPU | \
while read cpu num evt _ ipc rest
do
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 21/22] perf test stat: Update test expectations and events
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (19 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 20/22] perf test stat: Update shadow test to use metrics Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-24 17:58 ` [PATCH v1 22/22] perf test stat csv: " Ian Rogers
` (2 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
test_stat_record_report and test_stat_record_script used default
output which triggers a bug when sending metrics. As this isn't
relevant to the test, switch to using named software events.
Update the match in test_hybrid as the cycles event is now cpu-cycles
to work around potential ARM issues.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/stat.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 8a100a7f2dc1..985adc02749e 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -18,7 +18,7 @@ test_default_stat() {
test_stat_record_report() {
echo "stat record and report test"
- if ! perf stat record -o - true | perf stat report -i - 2>&1 | \
+ if ! perf stat record -e task-clock -o - true | perf stat report -i - 2>&1 | \
grep -E -q "Performance counter stats for 'pipe':"
then
echo "stat record and report test [Failed]"
@@ -30,7 +30,7 @@ test_stat_record_report() {
test_stat_record_script() {
echo "stat record and script test"
- if ! perf stat record -o - true | perf script -i - 2>&1 | \
+ if ! perf stat record -e task-clock -o - true | perf script -i - 2>&1 | \
grep -E -q "CPU[[:space:]]+THREAD[[:space:]]+VAL[[:space:]]+ENA[[:space:]]+RUN[[:space:]]+TIME[[:space:]]+EVENT"
then
echo "stat record and script test [Failed]"
@@ -196,7 +196,7 @@ test_hybrid() {
fi
# Run default Perf stat
- cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/[uH]*| cycles[:uH]* " -c)
+ cycles_events=$(perf stat -a -- sleep 0.1 2>&1 | grep -E "/cpu-cycles/[uH]*| cpu-cycles[:uH]* " -c)
# The expectation is that default output will have a cycles events on each
# hybrid PMU. In situations with no cycles PMU events, like virtualized, this
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v1 22/22] perf test stat csv: Update test expectations and events
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (20 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 21/22] perf test stat: Update test expectations and events Ian Rogers
@ 2025-10-24 17:58 ` Ian Rogers
2025-10-30 20:51 ` [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
2025-11-04 4:47 ` Namhyung Kim
23 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-10-24 17:58 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users
Explicitly use a metric rather than implicitly expecting '-e
instructions,cycles' to produce a metric. Use a metric with software
events to make it more compatible.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/shell/lib/stat_output.sh | 2 +-
tools/perf/tests/shell/stat+csv_output.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/lib/stat_output.sh b/tools/perf/tests/shell/lib/stat_output.sh
index c2ec7881ec1d..3c36e80fe422 100644
--- a/tools/perf/tests/shell/lib/stat_output.sh
+++ b/tools/perf/tests/shell/lib/stat_output.sh
@@ -156,7 +156,7 @@ check_metric_only()
echo "[Skip] CPU-measurement counter facility not installed"
return
fi
- perf stat --metric-only $2 -e instructions,cycles true
+ perf stat --metric-only $2 -M page_faults_per_second true
commachecker --metric-only
echo "[Success]"
}
diff --git a/tools/perf/tests/shell/stat+csv_output.sh b/tools/perf/tests/shell/stat+csv_output.sh
index 7a6f6e177402..cd6fff597091 100755
--- a/tools/perf/tests/shell/stat+csv_output.sh
+++ b/tools/perf/tests/shell/stat+csv_output.sh
@@ -44,7 +44,7 @@ function commachecker()
;; "--per-die") exp=8
;; "--per-cluster") exp=8
;; "--per-cache") exp=8
- ;; "--metric-only") exp=2
+ ;; "--metric-only") exp=1
esac
while read line
--
2.51.1.821.gb6fe4d2222-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v1 00/22] Switch the default perf stat metrics to json
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (21 preceding siblings ...)
2025-10-24 17:58 ` [PATCH v1 22/22] perf test stat csv: " Ian Rogers
@ 2025-10-30 20:51 ` Ian Rogers
2025-11-03 17:05 ` Ian Rogers
2025-11-04 4:47 ` Namhyung Kim
23 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-10-30 20:51 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
On Fri, Oct 24, 2025 at 10:59 AM Ian Rogers <irogers@google.com> wrote:
>
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> 5,749 context-switches # 358.376 /sec
> 121 cpu-migrations # 7.543 /sec
> 1,806 page-faults # 112.581 /sec
> 825,965,204 instructions # 0.70 insn per cycle
> 1,180,799,101 cycles # 0.074 GHz
> 168,945,109 branches # 10.532 M/sec
> 4,629,567 branch-misses # 2.74% of all branches
> # 30.2 % tma_backend_bound
> # 7.8 % tma_bad_speculation
> # 47.1 % tma_frontend_bound
> # 14.9 % tma_retiring
> ```
>
> After:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 2,890 context-switches # 179.9 cs/sec cs_per_second
> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> 429,978,120 branches # 26.8 K/sec branch_frequency (88.85%)
> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> TopdownL1 # 20.1 % tma_backend_bound
> # 40.5 % tma_bad_speculation (88.90%)
> # 17.2 % tma_frontend_bound (78.05%)
> # 22.2 % tma_retiring (88.89%)
>
> 1.002994394 seconds time elapsed
> ```
>
> Having the metrics in json brings greater uniformity, allows events to
> be shared by metrics, and it also allows descriptions like:
> ```
> $ perf list cs_per_second
> ...
> cs_per_second
> [Context switches per CPU second]
> ```
>
> A thorn in the side of doing this work was that the hard coded metrics
> were used by perf script with '-F metric'. This functionality didn't
> work for me (I was testing `perf record -e instructions,cycles` and
> then `perf script -F metric` but saw nothing but empty lines) but
> anyway I decided to fix it to the best of my ability in this
> series. So the script side counters were removed and the regular ones
> associated with the evsel used. The json metrics were all searched
> looking for ones that have a subset of events matching those in the
> perf script session, and all metrics are printed. This is kind of
> weird as the counters are being set by the period of samples, but I
> carried the behavior forward. I suspect there needs to be follow up
> work to make this better, but what is in the series is superior to
> what is currently in the tree. Follow up work could include finding
> metrics for the machine in the perf.data rather than using the host,
> allowing multiple metrics even if the metric ids of the events differ,
> fixing pre-existing `perf stat record/report` issues, etc.
>
> There is a lot of stat tests that, for example, assume '-e
> instructions,cycles' will produce an IPC metric. These things needed
> tidying as now the metric must be explicitly asked for and when doing
> this ones using software events were preferred to increase
> compatibility. As the test updates were numerous they are distinct to
> the patches updating the functionality causing periods in the series
> where not all tests are passing. If this is undesirable the test fixes
> can be squashed into the functionality updates.
Hi,
no comments on this series yet, please help! I'd like to land this
work and then rebase the python metric generation work [1] on it. The
metric generation work is largely independent of everything else but
there are collisions in the json Makefile/Build files.
Thanks,
Ian
[1]
* Foundations: https://lore.kernel.org/lkml/20240228175617.4049201-1-irogers@google.com/
* AMD: https://lore.kernel.org/lkml/20240229001537.4158049-1-irogers@google.com/
* Intel: https://lore.kernel.org/lkml/20240229001806.4158429-1-irogers@google.com/
* ARM: https://lore.kernel.org/lkml/20240229001325.4157655-1-irogers@google.com/
> Ian Rogers (22):
> perf evsel: Remove unused metric_events variable
> perf metricgroup: Update comment on location of metric_event list
> perf metricgroup: Missed free on error path
> perf metricgroup: When copy metrics copy default information
> perf metricgroup: Add care to picking the evsel for displaying a
> metric
> perf jevents: Make all tables static
> perf expr: Add #target_cpu literal
> perf jevents: Add set of common metrics based on default ones
> perf jevents: Add metric DefaultShowEvents
> perf stat: Add detail -d,-dd,-ddd metrics
> perf script: Change metric format to use json metrics
> perf stat: Remove hard coded shadow metrics
> perf stat: Fix default metricgroup display on hybrid
> perf stat: Sort default events/metrics
> perf stat: Remove "unit" workarounds for metric-only
> perf test stat+json: Improve metric-only testing
> perf test stat: Ignore failures in Default[234] metricgroups
> perf test stat: Update std_output testing metric expectations
> perf test metrics: Update all metrics for possibly failing default
> metrics
> perf test stat: Update shadow test to use metrics
> perf test stat: Update test expectations and events
> perf test stat csv: Update test expectations and events
>
> tools/perf/builtin-script.c | 238 ++++++++++-
> tools/perf/builtin-stat.c | 154 ++-----
> .../arch/common/common/metrics.json | 151 +++++++
> tools/perf/pmu-events/empty-pmu-events.c | 139 ++++--
> tools/perf/pmu-events/jevents.py | 34 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
> tools/perf/tests/shell/lib/stat_output.sh | 2 +-
> tools/perf/tests/shell/stat+csv_output.sh | 2 +-
> tools/perf/tests/shell/stat+json_output.sh | 2 +-
> tools/perf/tests/shell/stat+shadow_stat.sh | 4 +-
> tools/perf/tests/shell/stat+std_output.sh | 4 +-
> tools/perf/tests/shell/stat.sh | 6 +-
> .../perf/tests/shell/stat_all_metricgroups.sh | 3 +
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +-
> tools/perf/util/evsel.c | 2 -
> tools/perf/util/evsel.h | 2 +-
> tools/perf/util/expr.c | 3 +
> tools/perf/util/metricgroup.c | 95 ++++-
> tools/perf/util/metricgroup.h | 2 +-
> tools/perf/util/stat-display.c | 55 +--
> tools/perf/util/stat-shadow.c | 402 +-----------------
> tools/perf/util/stat.h | 2 +-
> 23 files changed, 672 insertions(+), 643 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
> --
> 2.51.1.821.gb6fe4d2222-goog
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 00/22] Switch the default perf stat metrics to json
2025-10-30 20:51 ` [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
@ 2025-11-03 17:05 ` Ian Rogers
0 siblings, 0 replies; 36+ messages in thread
From: Ian Rogers @ 2025-11-03 17:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Xu Yang, Chun-Tse Shao,
Thomas Richter, Sumanth Korikkar, Collin Funk, Thomas Falcon,
Howard Chu, Dapeng Mi, Levi Yun, Yang Li, linux-kernel,
linux-perf-users, Andi Kleen, Weilin Wang
On Thu, Oct 30, 2025 at 1:51 PM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Oct 24, 2025 at 10:59 AM Ian Rogers <irogers@google.com> wrote:
> >
> > Prior to this series stat-shadow would produce hard coded metrics if
> > certain events appeared in the evlist. This series produces equivalent
> > json metrics and cleans up the consequences in tests and display
> > output. A before and after of the default display output on a
> > tigerlake is:
> >
> > Before:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> > 5,749 context-switches # 358.376 /sec
> > 121 cpu-migrations # 7.543 /sec
> > 1,806 page-faults # 112.581 /sec
> > 825,965,204 instructions # 0.70 insn per cycle
> > 1,180,799,101 cycles # 0.074 GHz
> > 168,945,109 branches # 10.532 M/sec
> > 4,629,567 branch-misses # 2.74% of all branches
> > # 30.2 % tma_backend_bound
> > # 7.8 % tma_bad_speculation
> > # 47.1 % tma_frontend_bound
> > # 14.9 % tma_retiring
> > ```
> >
> > After:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 2,890 context-switches # 179.9 cs/sec cs_per_second
> > 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> > 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> > 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> > 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> > 429,978,120 branches # 26.8 K/sec branch_frequency (88.85%)
> > 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> > 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> > TopdownL1 # 20.1 % tma_backend_bound
> > # 40.5 % tma_bad_speculation (88.90%)
> > # 17.2 % tma_frontend_bound (78.05%)
> > # 22.2 % tma_retiring (88.89%)
> >
> > 1.002994394 seconds time elapsed
> > ```
> >
> > Having the metrics in json brings greater uniformity, allows events to
> > be shared by metrics, and it also allows descriptions like:
> > ```
> > $ perf list cs_per_second
> > ...
> > cs_per_second
> > [Context switches per CPU second]
> > ```
> >
> > A thorn in the side of doing this work was that the hard coded metrics
> > were used by perf script with '-F metric'. This functionality didn't
> > work for me (I was testing `perf record -e instructions,cycles` and
> > then `perf script -F metric` but saw nothing but empty lines) but
> > anyway I decided to fix it to the best of my ability in this
> > series. So the script side counters were removed and the regular ones
> > associated with the evsel used. The json metrics were all searched
> > looking for ones that have a subset of events matching those in the
> > perf script session, and all metrics are printed. This is kind of
> > weird as the counters are being set by the period of samples, but I
> > carried the behavior forward. I suspect there needs to be follow up
> > work to make this better, but what is in the series is superior to
> > what is currently in the tree. Follow up work could include finding
> > metrics for the machine in the perf.data rather than using the host,
> > allowing multiple metrics even if the metric ids of the events differ,
> > fixing pre-existing `perf stat record/report` issues, etc.
> >
> > There is a lot of stat tests that, for example, assume '-e
> > instructions,cycles' will produce an IPC metric. These things needed
> > tidying as now the metric must be explicitly asked for and when doing
> > this ones using software events were preferred to increase
> > compatibility. As the test updates were numerous they are distinct to
> > the patches updating the functionality causing periods in the series
> > where not all tests are passing. If this is undesirable the test fixes
> > can be squashed into the functionality updates.
>
> Hi,
>
> no comments on this series yet, please help! I'd like to land this
> work and then rebase the python generating metric work [1] on it. The
> metric generation work is largely independent of everything else but
> there are collisions in the json Makefile/Build files.
Just to also add that the default perf stat output in perf-tools-next
looks like this on an Alderlake:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
0 cpu-clock # 0.000 CPUs utilized
19,362 context-switches
874 cpu-migrations
10,194 page-faults
633,489,938 cpu_atom/instructions/ # 0.69 insn per cycle (87.25%)
3,738,623,788 cpu_core/instructions/ # 2.05 insn per cycle
923,779,727 cpu_atom/cycles/ (87.28%)
1,821,165,755 cpu_core/cycles/
102,969,608 cpu_atom/branches/ (87.41%)
594,784,374 cpu_core/branches/
4,376,709 cpu_atom/branch-misses/ # 4.25% of all branches (87.66%)
7,886,194 cpu_core/branch-misses/ # 1.33% of all branches
# 10.4 % tma_bad_speculation
# 21.5 % tma_frontend_bound
# 34.5 % tma_backend_bound
# 33.5 % tma_retiring
# 17.7 % tma_bad_speculation
# 17.8 % tma_retiring (87.64%)
# 33.4 % tma_backend_bound
# 31.1 % tma_frontend_bound (87.67%)
1.004970242 seconds time elapsed
```
and this with the series:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
21,198 context-switches # nan cs/sec cs_per_second
0 cpu-clock # 0.0 CPUs CPUs_utilized
989 cpu-migrations # nan migrations/sec migrations_per_second
6,642 page-faults # nan faults/sec page_faults_per_second
6,966,308 cpu_core/branch-misses/ # 1.3 % branch_miss_rate
517,064,969 cpu_core/branches/ # nan K/sec branch_frequency
1,602,405,292 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
3,012,408,051 cpu_core/instructions/ # 1.9 instructions insn_per_cycle
4,727,342 cpu_atom/branch-misses/ # 4.8 % branch_miss_rate (49.79%)
94,075,578 cpu_atom/branches/ # nan K/sec branch_frequency (50.14%)
922,932,356 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.36%)
513,356,622 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (50.36%)
TopdownL1 (cpu_core) # 10.4 % tma_bad_speculation
# 24.0 % tma_frontend_bound
# 35.2 % tma_backend_bound
# 30.4 % tma_retiring
TopdownL1 (cpu_atom) # 36.1 % tma_backend_bound (59.76%)
# 38.7 % tma_frontend_bound (59.57%)
# 8.8 % tma_bad_speculation
# 16.4 % tma_retiring (59.57%)
1.006937573 seconds time elapsed
```
That is, the TopdownL1 default group name is missing in the current
tree, etc. So just fixing the default perf stat output would be a good
reason to land this. The output quoted at the top, which is also
broken, is from a non-hybrid Tigerlake system.
Thanks,
Ian
> [1]
> * Foundations: https://lore.kernel.org/lkml/20240228175617.4049201-1-irogers@google.com/
> * AMD: https://lore.kernel.org/lkml/20240229001537.4158049-1-irogers@google.com/
> * Intel: https://lore.kernel.org/lkml/20240229001806.4158429-1-irogers@google.com/
> * ARM: https://lore.kernel.org/lkml/20240229001325.4157655-1-irogers@google.com/
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 00/22] Switch the default perf stat metrics to json
2025-10-24 17:58 [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
` (22 preceding siblings ...)
2025-10-30 20:51 ` [PATCH v1 00/22] Switch the default perf stat metrics to json Ian Rogers
@ 2025-11-04 4:47 ` Namhyung Kim
2025-11-04 5:09 ` Ian Rogers
23 siblings, 1 reply; 36+ messages in thread
From: Namhyung Kim @ 2025-11-04 4:47 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
Hi Ian,
On Fri, Oct 24, 2025 at 10:58:35AM -0700, Ian Rogers wrote:
> Prior to this series stat-shadow would produce hard coded metrics if
> certain events appeared in the evlist. This series produces equivalent
> json metrics and cleans up the consequences in tests and display
> output. A before and after of the default display output on a
> tigerlake is:
>
> Before:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> 5,749 context-switches # 358.376 /sec
> 121 cpu-migrations # 7.543 /sec
> 1,806 page-faults # 112.581 /sec
> 825,965,204 instructions # 0.70 insn per cycle
> 1,180,799,101 cycles # 0.074 GHz
> 168,945,109 branches # 10.532 M/sec
> 4,629,567 branch-misses # 2.74% of all branches
> # 30.2 % tma_backend_bound
> # 7.8 % tma_bad_speculation
> # 47.1 % tma_frontend_bound
> # 14.9 % tma_retiring
> ```
>
> After:
> ```
> $ perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 2,890 context-switches # 179.9 cs/sec cs_per_second
> 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> 429,978,120 branches # 26.8 K/sec branch_frequency (88.85%)
> 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> TopdownL1 # 20.1 % tma_backend_bound
> # 40.5 % tma_bad_speculation (88.90%)
> # 17.2 % tma_frontend_bound (78.05%)
> # 22.2 % tma_retiring (88.89%)
>
> 1.002994394 seconds time elapsed
> ```
While this looks nicer, I worry about the changes in the output. And I'm
curious why only the "After" output shows the multiplexing percent.
>
> Having the metrics in json brings greater uniformity, allows events to
> be shared by metrics, and it also allows descriptions like:
> ```
> $ perf list cs_per_second
> ...
> cs_per_second
> [Context switches per CPU second]
> ```
>
> A thorn in the side of doing this work was that the hard coded metrics
> were used by perf script with '-F metric'. This functionality didn't
> work for me (I was testing `perf record -e instructions,cycles` and
> then `perf script -F metric` but saw nothing but empty lines)
The documentation says:
With the metric option perf script can compute metrics for
sampling periods, similar to perf stat. This requires
specifying a group with multiple events defining metrics with the :S option
for perf record. perf will sample on the first event, and
print computed metrics for all the events in the group. Please note
that the metric computed is averaged over the whole sampling
period (since the last sample), not just for the sample point.
So I guess it should have 'S' modifiers in a group.
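E.g. something along these lines (untested sketch of the documented
usage):
```
# record sampling on a group with the 'S' (sample read) modifier
$ perf record -e '{cycles,instructions}:S' -o perf.data -- sleep 1
# then ask perf script to print the computed metrics
$ perf script -i perf.data -F metric
```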
> but anyway I decided to fix it to the best of my ability in this
> series. So the script side counters were removed and the regular ones
> associated with the evsel used. The json metrics were all searched
> looking for ones that have a subset of events matching those in the
> perf script session, and all metrics are printed. This is kind of
> weird as the counters are being set by the period of samples, but I
> carried the behavior forward. I suspect there needs to be follow up
> work to make this better, but what is in the series is superior to
> what is currently in the tree. Follow up work could include finding
> metrics for the machine in the perf.data rather than using the host,
> allowing multiple metrics even if the metric ids of the events differ,
> fixing pre-existing `perf stat record/report` issues, etc.
>
> There is a lot of stat tests that, for example, assume '-e
> instructions,cycles' will produce an IPC metric. These things needed
> tidying as now the metric must be explicitly asked for and when doing
> this ones using software events were preferred to increase
> compatibility. As the test updates were numerous they are distinct to
> the patches updating the functionality causing periods in the series
> where not all tests are passing. If this is undesirable the test fixes
> can be squashed into the functionality updates.
Hmm.. how many of them? I think it'd be better to have the test changes at
the same time so that we can ensure the test success count after the change.
Can the test changes be squashed into one or two commits?
Thanks,
Namhyung
>
> Ian Rogers (22):
> perf evsel: Remove unused metric_events variable
> perf metricgroup: Update comment on location of metric_event list
> perf metricgroup: Missed free on error path
> perf metricgroup: When copy metrics copy default information
> perf metricgroup: Add care to picking the evsel for displaying a
> metric
> perf jevents: Make all tables static
> perf expr: Add #target_cpu literal
> perf jevents: Add set of common metrics based on default ones
> perf jevents: Add metric DefaultShowEvents
> perf stat: Add detail -d,-dd,-ddd metrics
> perf script: Change metric format to use json metrics
> perf stat: Remove hard coded shadow metrics
> perf stat: Fix default metricgroup display on hybrid
> perf stat: Sort default events/metrics
> perf stat: Remove "unit" workarounds for metric-only
> perf test stat+json: Improve metric-only testing
> perf test stat: Ignore failures in Default[234] metricgroups
> perf test stat: Update std_output testing metric expectations
> perf test metrics: Update all metrics for possibly failing default
> metrics
> perf test stat: Update shadow test to use metrics
> perf test stat: Update test expectations and events
> perf test stat csv: Update test expectations and events
>
> tools/perf/builtin-script.c | 238 ++++++++++-
> tools/perf/builtin-stat.c | 154 ++-----
> .../arch/common/common/metrics.json | 151 +++++++
> tools/perf/pmu-events/empty-pmu-events.c | 139 ++++--
> tools/perf/pmu-events/jevents.py | 34 +-
> tools/perf/pmu-events/pmu-events.h | 2 +
> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
> tools/perf/tests/shell/lib/stat_output.sh | 2 +-
> tools/perf/tests/shell/stat+csv_output.sh | 2 +-
> tools/perf/tests/shell/stat+json_output.sh | 2 +-
> tools/perf/tests/shell/stat+shadow_stat.sh | 4 +-
> tools/perf/tests/shell/stat+std_output.sh | 4 +-
> tools/perf/tests/shell/stat.sh | 6 +-
> .../perf/tests/shell/stat_all_metricgroups.sh | 3 +
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +-
> tools/perf/util/evsel.c | 2 -
> tools/perf/util/evsel.h | 2 +-
> tools/perf/util/expr.c | 3 +
> tools/perf/util/metricgroup.c | 95 ++++-
> tools/perf/util/metricgroup.h | 2 +-
> tools/perf/util/stat-display.c | 55 +--
> tools/perf/util/stat-shadow.c | 402 +-----------------
> tools/perf/util/stat.h | 2 +-
> 23 files changed, 672 insertions(+), 643 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/metrics.json
>
> --
> 2.51.1.821.gb6fe4d2222-goog
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 00/22] Switch the default perf stat metrics to json
2025-11-04 4:47 ` Namhyung Kim
@ 2025-11-04 5:09 ` Ian Rogers
2025-11-06 5:29 ` Namhyung Kim
0 siblings, 1 reply; 36+ messages in thread
From: Ian Rogers @ 2025-11-04 5:09 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Mon, Nov 3, 2025 at 8:47 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Fri, Oct 24, 2025 at 10:58:35AM -0700, Ian Rogers wrote:
> > Prior to this series stat-shadow would produce hard coded metrics if
> > certain events appeared in the evlist. This series produces equivalent
> > json metrics and cleans up the consequences in tests and display
> > output. A before and after of the default display output on a
> > tigerlake is:
> >
> > Before:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> > 5,749 context-switches # 358.376 /sec
> > 121 cpu-migrations # 7.543 /sec
> > 1,806 page-faults # 112.581 /sec
> > 825,965,204 instructions # 0.70 insn per cycle
> > 1,180,799,101 cycles # 0.074 GHz
> > 168,945,109 branches # 10.532 M/sec
> > 4,629,567 branch-misses # 2.74% of all branches
> > # 30.2 % tma_backend_bound
> > # 7.8 % tma_bad_speculation
> > # 47.1 % tma_frontend_bound
> > # 14.9 % tma_retiring
> > ```
> >
> > After:
> > ```
> > $ perf stat -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 2,890 context-switches # 179.9 cs/sec cs_per_second
> > 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> > 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> > 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> > 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> > 429,978,120 branches # 26.8 K/sec branch_frequency (88.85%)
> > 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> > 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> > TopdownL1 # 20.1 % tma_backend_bound
> > # 40.5 % tma_bad_speculation (88.90%)
> > # 17.2 % tma_frontend_bound (78.05%)
> > # 22.2 % tma_retiring (88.89%)
> >
> > 1.002994394 seconds time elapsed
> > ```
>
> While this looks nicer, I worry about the changes in the output. And I'm
> curious why only the "After" output shows the multiplexing percent.
>
> >
> > Having the metrics in json brings greater uniformity, allows events to
> > be shared by metrics, and it also allows descriptions like:
> > ```
> > $ perf list cs_per_second
> > ...
> > cs_per_second
> > [Context switches per CPU second]
> > ```
> >
> > A thorn in the side of doing this work was that the hard coded metrics
> > were used by perf script with '-F metric'. This functionality didn't
> > work for me (I was testing `perf record -e instructions,cycles` and
> > then `perf script -F metric` but saw nothing but empty lines)
>
> The documentation says:
>
> With the metric option perf script can compute metrics for
> sampling periods, similar to perf stat. This requires
> specifying a group with multiple events defining metrics with the :S option
> for perf record. perf will sample on the first event, and
> print computed metrics for all the events in the group. Please note
> that the metric computed is averaged over the whole sampling
> period (since the last sample), not just for the sample point.
>
> So I guess it should have 'S' modifiers in a group.
Thanks Namhyung. Yes, this is the silly behavior where a leader sample
group is both treated as an event and then has its constituent parts
turned into individual events with the period set to the leader sample
read counts. Most recently this behavior was disabled by struct
perf_tool's dont_split_sample_group in the case of perf inject as it
causes events to be processed multiple times. The perf script behavior
doesn't rely anywhere on the grouping of the leader sample events, and
even with it the metric format option doesn't work either - I'll save
pasting a screen full of blank lines here.
> > but anyway I decided to fix it to the best of my ability in this
> > series. So the script side counters were removed and the regular ones
> > associated with the evsel used. The json metrics were all searched
> > looking for ones that have a subset of events matching those in the
> > perf script session, and all metrics are printed. This is kind of
> > weird as the counters are being set by the period of samples, but I
> > carried the behavior forward. I suspect there needs to be follow up
> > work to make this better, but what is in the series is superior to
> > what is currently in the tree. Follow up work could include finding
> > metrics for the machine in the perf.data rather than using the host,
> > allowing multiple metrics even if the metric ids of the events differ,
> > fixing pre-existing `perf stat record/report` issues, etc.
> >
> > There is a lot of stat tests that, for example, assume '-e
> > instructions,cycles' will produce an IPC metric. These things needed
> > tidying as now the metric must be explicitly asked for and when doing
> > this ones using software events were preferred to increase
> > compatibility. As the test updates were numerous they are distinct to
> > the patches updating the functionality causing periods in the series
> > where not all tests are passing. If this is undesirable the test fixes
> > can be squashed into the functionality updates.
>
> Hmm.. how many of them? I think it'd be better to have the test changes at
> the same time so that we can ensure the test success count after the change.
> Can the test changes be squashed into one or two commits?
So the patches are below. The first set are all cleanups:
> > Ian Rogers (22):
> > perf evsel: Remove unused metric_events variable
> > perf metricgroup: Update comment on location of metric_event list
> > perf metricgroup: Missed free on error path
> > perf metricgroup: When copy metrics copy default information
> > perf metricgroup: Add care to picking the evsel for displaying a
> > metric
> > perf jevents: Make all tables static
Then there is the addition of the legacy metrics as json:
> > perf expr: Add #target_cpu literal
> > perf jevents: Add set of common metrics based on default ones
> > perf jevents: Add metric DefaultShowEvents
> > perf stat: Add detail -d,-dd,-ddd metrics
Then there is the change to make perf script metric format work:
> > perf script: Change metric format to use json metrics
Then there is a cleanup patch:
> > perf stat: Remove hard coded shadow metrics
Then there are fixes to perf stat's already broken output:
> > perf stat: Fix default metricgroup display on hybrid
> > perf stat: Sort default events/metrics
> > perf stat: Remove "unit" workarounds for metric-only
Then there are 7 patches updating test expectations. Each patch deals
with a separate test to make the resolution clear.
> > perf test stat+json: Improve metric-only testing
> > perf test stat: Ignore failures in Default[234] metricgroups
> > perf test stat: Update std_output testing metric expectations
> > perf test metrics: Update all metrics for possibly failing default
> > metrics
> > perf test stat: Update shadow test to use metrics
> > perf test stat: Update test expectations and events
> > perf test stat csv: Update test expectations and events
The patch "perf jevents: Add set of common metrics based on default
ones" most impacts the output but we don't want to verify the default
stat output with the hardcoded metrics that are removed in "perf stat:
Remove hard coded shadow metrics". Having a test for both hard coded
and json metrics in an intermediate state makes little sense, and the
default output is impacted by the 3 patches fixing it and removing
workarounds.
It is possible to squash things together but I think something is lost
in doing so, hence presenting it this way.
Thanks,
Ian
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v1 00/22] Switch the default perf stat metrics to json
2025-11-04 5:09 ` Ian Rogers
@ 2025-11-06 5:29 ` Namhyung Kim
0 siblings, 0 replies; 36+ messages in thread
From: Namhyung Kim @ 2025-11-06 5:29 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
Xu Yang, Chun-Tse Shao, Thomas Richter, Sumanth Korikkar,
Collin Funk, Thomas Falcon, Howard Chu, Dapeng Mi, Levi Yun,
Yang Li, linux-kernel, linux-perf-users
On Mon, Nov 03, 2025 at 09:09:14PM -0800, Ian Rogers wrote:
> On Mon, Nov 3, 2025 at 8:47 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Fri, Oct 24, 2025 at 10:58:35AM -0700, Ian Rogers wrote:
> > > Prior to this series stat-shadow would produce hard coded metrics if
> > > certain events appeared in the evlist. This series produces equivalent
> > > json metrics and cleans up the consequences in tests and display
> > > output. A before and after of the default display output on a
> > > tigerlake is:
> > >
> > > Before:
> > > ```
> > > $ perf stat -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 16,041,816,418 cpu-clock # 15.995 CPUs utilized
> > > 5,749 context-switches # 358.376 /sec
> > > 121 cpu-migrations # 7.543 /sec
> > > 1,806 page-faults # 112.581 /sec
> > > 825,965,204 instructions # 0.70 insn per cycle
> > > 1,180,799,101 cycles # 0.074 GHz
> > > 168,945,109 branches # 10.532 M/sec
> > > 4,629,567 branch-misses # 2.74% of all branches
> > > # 30.2 % tma_backend_bound
> > > # 7.8 % tma_bad_speculation
> > > # 47.1 % tma_frontend_bound
> > > # 14.9 % tma_retiring
> > > ```
> > >
> > > After:
> > > ```
> > > $ perf stat -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 2,890 context-switches # 179.9 cs/sec cs_per_second
> > > 16,061,923,339 cpu-clock # 16.0 CPUs CPUs_utilized
> > > 43 cpu-migrations # 2.7 migrations/sec migrations_per_second
> > > 5,645 page-faults # 351.5 faults/sec page_faults_per_second
> > > 5,708,413 branch-misses # 1.4 % branch_miss_rate (88.83%)
> > > 429,978,120 branches # 26.8 K/sec branch_frequency (88.85%)
> > > 1,626,915,897 cpu-cycles # 0.1 GHz cycles_frequency (88.84%)
> > > 2,556,805,534 instructions # 1.5 instructions insn_per_cycle (88.86%)
> > > TopdownL1 # 20.1 % tma_backend_bound
> > > # 40.5 % tma_bad_speculation (88.90%)
> > > # 17.2 % tma_frontend_bound (78.05%)
> > > # 22.2 % tma_retiring (88.89%)
> > >
> > > 1.002994394 seconds time elapsed
> > > ```
> >
> > While this looks nicer, I worry about the changes in the output. And I'm
> > curious why only the "After" output shows the multiplexing percent.
> >
> > >
> > > Having the metrics in json brings greater uniformity, allows events to
> > > be shared by metrics, and it also allows descriptions like:
> > > ```
> > > $ perf list cs_per_second
> > > ...
> > > cs_per_second
> > > [Context switches per CPU second]
> > > ```
> > >
> > > A thorn in the side of doing this work was that the hard coded metrics
> > > were used by perf script with '-F metric'. This functionality didn't
> > > work for me (I was testing `perf record -e instructions,cycles` and
> > > then `perf script -F metric` but saw nothing but empty lines)
> >
> > The documentation says:
> >
> > With the metric option perf script can compute metrics for
> > sampling periods, similar to perf stat. This requires
> > specifying a group with multiple events defining metrics with the :S option
> > for perf record. perf will sample on the first event, and
> > print computed metrics for all the events in the group. Please note
> > that the metric computed is averaged over the whole sampling
> > period (since the last sample), not just for the sample point.
> >
> > So I guess it should have 'S' modifiers in a group.
>
> Thanks Namhyung. Yes, this is the silly behavior where a leader sample
> event is treated as an event itself, but its constituent parts are then
> turned into individual events with their periods set from the leader
> sample read counts. Most recently this behavior was disabled via struct
> perf_tool's dont_split_sample_group in the case of perf inject, as it
> causes events to be processed multiple times. The perf script behavior
> doesn't rely anywhere on the grouping of the leader sample events, and
> even with such a group the metric format option doesn't work either -
> I'll save pasting a screen full of blank lines here.
Right, it seems to have been broken at some point.
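For reference, the flow the documentation above describes would be
roughly the sketch below (the event names are only an example and I
haven't verified the output here):
```
# Sample on a group leader so the members' counts ride along with each sample
$ perf record -e '{cycles,instructions}:S' -a -- sleep 1

# Per the documentation this should print metrics computed over each
# sampling period; per the discussion above it currently prints blank lines
$ perf script -F metric
```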
>
> > > but anyway I decided to fix it to the best of my ability in this
> > > series. So the script side counters were removed and the regular ones
> > > associated with the evsel were used instead. The json metrics were all
> > > searched, looking for ones whose events are a subset of those in the
> > > perf script session, and all such metrics are printed. This is kind of
> > > weird as the counters are being set from the period of samples, but I
> > > carried the behavior forward. I suspect there needs to be follow up
> > > work to make this better, but what is in the series is superior to
> > > what is currently in the tree. Follow up work could include finding
> > > metrics for the machine in the perf.data rather than using the host,
> > > allowing multiple metrics even if the metric ids of the events differ,
> > > fixing pre-existing `perf stat record/report` issues, etc.
> > >
> > > There are a lot of stat tests that, for example, assume '-e
> > > instructions,cycles' will produce an IPC metric. These things needed
> > > tidying as now the metric must be explicitly asked for, and when doing
> > > this, metrics using software events were preferred to increase
> > > compatibility. As the test updates were numerous they are kept
> > > distinct from the patches updating the functionality, causing periods
> > > in the series where not all tests pass. If this is undesirable the
> > > test fixes can be squashed into the functionality updates.
> >
> > Hmm.. how many of them? I think it'd be better to have the test changes
> > at the same time so that we can ensure the test success count after the
> > change. Can the test changes be squashed into one or two commits?
>
> So the patches are below. The first set is all clean up:
>
> > > Ian Rogers (22):
> > > perf evsel: Remove unused metric_events variable
> > > perf metricgroup: Update comment on location of metric_event list
> > > perf metricgroup: Missed free on error path
> > > perf metricgroup: When copy metrics copy default information
> > > perf metricgroup: Add care to picking the evsel for displaying a
> > > metric
> > > perf jevents: Make all tables static
I've applied most of this part to perf-tools-next, will take a look at
others later.
Thanks,
Namhyung
>
> Then there is the addition of the legacy metrics as json:
>
> > > perf expr: Add #target_cpu literal
> > > perf jevents: Add set of common metrics based on default ones
> > > perf jevents: Add metric DefaultShowEvents
> > > perf stat: Add detail -d,-dd,-ddd metrics
>
> Then there is the change to make perf script metric format work:
>
> > > perf script: Change metric format to use json metrics
>
> Then there is a clean up patch:
>
> > > perf stat: Remove hard coded shadow metrics
>
> Then there are fixes to perf stat's already broken output:
>
> > > perf stat: Fix default metricgroup display on hybrid
> > > perf stat: Sort default events/metrics
> > > perf stat: Remove "unit" workarounds for metric-only
>
> Then there are 7 patches updating test expectations. Each patch deals
> with a separate test to make the resolution clear.
>
> > > perf test stat+json: Improve metric-only testing
> > > perf test stat: Ignore failures in Default[234] metricgroups
> > > perf test stat: Update std_output testing metric expectations
> > > perf test metrics: Update all metrics for possibly failing default
> > > metrics
> > > perf test stat: Update shadow test to use metrics
> > > perf test stat: Update test expectations and events
> > > perf test stat csv: Update test expectations and events
>
> The patch "perf jevents: Add set of common metrics based on default
> ones" most impacts the output but we don't want to verify the default
> stat output with the hardcoded metrics that are removed in "perf stat:
> Remove hard coded shadow metrics". Having a test for both hard coded
> and json metrics in an intermediate state makes little sense and the
> default output is impacting by the 3 patches fixing it and removing
> workarounds.
>
> It is possible to squash things together but I think something is lost
> in doing so, hence presenting it this way.
>
> Thanks,
> Ian
^ permalink raw reply [flat|nested] 36+ messages in thread