* [PATCH v1 00/51] shadow metric clean up and improvements
From: Ian Rogers @ 2023-02-19 9:27 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Recently the shadow stat metrics broke due to repeated aggregation, and
a quick fix was applied:
https://lore.kernel.org/lkml/20230209064447.83733-1-irogers@google.com/
This is the longer-term fix, and it comes with some extras. To avoid
having to fix issues in the hard coded metrics, the topdown, SMI cost
and transaction flags are moved into json metrics. A side effect of
this is that TopdownL1 metrics will now be displayed, when supported,
if no "perf stat" events are specified.
Another fix included here is for event grouping as raised in:
https://lore.kernel.org/lkml/CA+icZUU_ew7pzWJJZLbj1xsU6MQTPrj8tkFfDhNdTDRQfGUBMQ@mail.gmail.com/
Metrics are now tagged with NMI and SMT flags, meaning that a metric's
events shouldn't be grouped if the NMI watchdog or SMT is enabled.
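As an illustrative sketch (the metric name and expression are made up;
the constraint values follow the NO_GROUP_EVENTS_NMI and
NO_GROUP_EVENTS_SMT encoding introduced later in the series), a tagged
metric in the vendor json looks like:

  {
      "MetricName": "example_metric",
      "MetricExpr": "EVENT1 / EVENT2",
      "MetricConstraint": "NO_GROUP_EVENTS_NMI"
  }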
Given the two issues, the metrics are re-generated and the patches
also include the latest Intel vendor events. The changes to the metric
generation code can be seen in:
https://github.com/intel/perfmon/pull/56
Hard coded metrics support thresholds; the patches add this ability to
json metrics so that the hard coded metrics can be removed. The
remaining hard coded metrics are migrated to look up counters from the
evlist/aggregation count. Finally, the saved_value logic is removed,
which should fix the aggregation issues.
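For illustration, a json metric carrying a threshold might look as
follows (a hypothetical metric; MetricThreshold is the field parsed by
the jevents changes in this series):

  {
      "MetricName": "example_bound",
      "MetricExpr": "STALL_SLOTS / TOTAL_SLOTS",
      "MetricThreshold": "example_bound > 0.2",
      "ScaleUnit": "100%"
  }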
Some related fix-ups and code clean-ups are included in the changes, in
particular to aid the code's readability and to keep the topdown
documentation in sync.
Ian Rogers (51):
perf tools: Ensure evsel name is initialized
perf metrics: Improve variable names
perf pmu-events: Remove aggr_mode from pmu_event
perf pmu-events: Change aggr_mode to be an enum
perf pmu-events: Change deprecated to be a bool
perf pmu-events: Change perpkg to be a bool
perf expr: Make the online topology accessible globally
perf pmu-events: Make the metric_constraint an enum
perf pmu-events: Don't '\0' terminate enum values
perf vendor events intel: Refresh alderlake events
perf vendor events intel: Refresh alderlake-n metrics
perf vendor events intel: Refresh broadwell metrics
perf vendor events intel: Refresh broadwellde metrics
perf vendor events intel: Refresh broadwellx metrics
perf vendor events intel: Refresh cascadelakex events
perf vendor events intel: Add graniterapids events
perf vendor events intel: Refresh haswell metrics
perf vendor events intel: Refresh haswellx metrics
perf vendor events intel: Refresh icelake events
perf vendor events intel: Refresh icelakex metrics
perf vendor events intel: Refresh ivybridge metrics
perf vendor events intel: Refresh ivytown metrics
perf vendor events intel: Refresh jaketown events
perf vendor events intel: Refresh knightslanding events
perf vendor events intel: Refresh sandybridge events
perf vendor events intel: Refresh sapphirerapids events
perf vendor events intel: Refresh silvermont events
perf vendor events intel: Refresh skylake events
perf vendor events intel: Refresh skylakex metrics
perf vendor events intel: Refresh tigerlake events
perf vendor events intel: Refresh westmereep-dp events
perf jevents: Add rand support to metrics
perf jevent: Parse metric thresholds
perf pmu-events: Test parsing metric thresholds with the fake PMU
perf list: Support for printing metric thresholds
perf metric: Compute and print threshold values
perf expr: More explicit NAN handling
perf metric: Add --metric-no-threshold option
perf stat: Add TopdownL1 metric as a default if present
perf stat: Implement --topdown using json metrics
perf stat: Remove topdown event special handling
perf doc: Refresh topdown documentation
perf stat: Remove hard coded transaction events
perf stat: Use metrics for --smi-cost
perf stat: Remove perf_stat_evsel_id
perf stat: Move enums from header
perf stat: Hide runtime_stat
perf stat: Add cpu_aggr_map for loop
perf metric: Directly use counts rather than saved_value
perf stat: Use counts rather than saved_value
perf stat: Remove saved_value/runtime_stat
tools/perf/Documentation/perf-stat.txt | 27 +-
tools/perf/Documentation/topdown.txt | 70 +-
tools/perf/arch/powerpc/util/header.c | 2 +-
tools/perf/arch/x86/util/evlist.c | 6 +-
tools/perf/arch/x86/util/topdown.c | 78 +-
tools/perf/arch/x86/util/topdown.h | 1 -
tools/perf/builtin-list.c | 13 +-
tools/perf/builtin-script.c | 9 +-
tools/perf/builtin-stat.c | 233 +-
.../arch/x86/alderlake/adl-metrics.json | 3190 ++++++++++-------
.../pmu-events/arch/x86/alderlake/cache.json | 36 +-
.../arch/x86/alderlake/floating-point.json | 27 +
.../arch/x86/alderlake/frontend.json | 9 +
.../pmu-events/arch/x86/alderlake/memory.json | 3 +-
.../arch/x86/alderlake/pipeline.json | 14 +-
.../arch/x86/alderlake/uncore-other.json | 28 +-
.../arch/x86/alderlaken/adln-metrics.json | 811 +++--
.../arch/x86/broadwell/bdw-metrics.json | 1439 ++++----
.../arch/x86/broadwellde/bdwde-metrics.json | 1405 ++++----
.../arch/x86/broadwellx/bdx-metrics.json | 1626 +++++----
.../arch/x86/broadwellx/uncore-cache.json | 74 +-
.../x86/broadwellx/uncore-interconnect.json | 64 +-
.../arch/x86/broadwellx/uncore-other.json | 4 +-
.../arch/x86/cascadelakex/cache.json | 24 +-
.../arch/x86/cascadelakex/clx-metrics.json | 2198 ++++++------
.../arch/x86/cascadelakex/frontend.json | 8 +-
.../arch/x86/cascadelakex/pipeline.json | 16 +
.../arch/x86/cascadelakex/uncore-memory.json | 18 +-
.../arch/x86/cascadelakex/uncore-other.json | 120 +-
.../arch/x86/cascadelakex/uncore-power.json | 8 +-
.../arch/x86/graniterapids/cache.json | 54 +
.../arch/x86/graniterapids/frontend.json | 10 +
.../arch/x86/graniterapids/memory.json | 174 +
.../arch/x86/graniterapids/other.json | 29 +
.../arch/x86/graniterapids/pipeline.json | 102 +
.../x86/graniterapids/virtual-memory.json | 26 +
.../arch/x86/haswell/hsw-metrics.json | 1220 ++++---
.../arch/x86/haswellx/hsx-metrics.json | 1397 ++++----
.../pmu-events/arch/x86/icelake/cache.json | 16 +
.../arch/x86/icelake/floating-point.json | 31 +
.../arch/x86/icelake/icl-metrics.json | 1932 +++++-----
.../pmu-events/arch/x86/icelake/pipeline.json | 23 +-
.../arch/x86/icelake/uncore-other.json | 56 +
.../arch/x86/icelakex/icx-metrics.json | 2153 +++++------
.../arch/x86/icelakex/uncore-memory.json | 2 +-
.../arch/x86/icelakex/uncore-other.json | 4 +-
.../arch/x86/ivybridge/ivb-metrics.json | 1270 ++++---
.../arch/x86/ivytown/ivt-metrics.json | 1311 ++++---
.../pmu-events/arch/x86/jaketown/cache.json | 6 +-
.../arch/x86/jaketown/floating-point.json | 2 +-
.../arch/x86/jaketown/frontend.json | 12 +-
.../arch/x86/jaketown/jkt-metrics.json | 602 ++--
.../arch/x86/jaketown/pipeline.json | 2 +-
.../arch/x86/jaketown/uncore-cache.json | 22 +-
.../x86/jaketown/uncore-interconnect.json | 74 +-
.../arch/x86/jaketown/uncore-memory.json | 4 +-
.../arch/x86/jaketown/uncore-other.json | 22 +-
.../arch/x86/jaketown/uncore-power.json | 8 +-
.../arch/x86/knightslanding/cache.json | 94 +-
.../arch/x86/knightslanding/pipeline.json | 8 +-
.../arch/x86/knightslanding/uncore-other.json | 8 +-
tools/perf/pmu-events/arch/x86/mapfile.csv | 29 +-
.../arch/x86/sandybridge/cache.json | 8 +-
.../arch/x86/sandybridge/floating-point.json | 2 +-
.../arch/x86/sandybridge/frontend.json | 12 +-
.../arch/x86/sandybridge/pipeline.json | 2 +-
.../arch/x86/sandybridge/snb-metrics.json | 601 ++--
.../arch/x86/sapphirerapids/cache.json | 24 +-
.../x86/sapphirerapids/floating-point.json | 32 +
.../arch/x86/sapphirerapids/frontend.json | 8 +
.../arch/x86/sapphirerapids/pipeline.json | 19 +-
.../arch/x86/sapphirerapids/spr-metrics.json | 2283 ++++++------
.../arch/x86/sapphirerapids/uncore-other.json | 60 +
.../arch/x86/silvermont/frontend.json | 2 +-
.../arch/x86/silvermont/pipeline.json | 2 +-
.../pmu-events/arch/x86/skylake/cache.json | 25 +-
.../pmu-events/arch/x86/skylake/frontend.json | 8 +-
.../pmu-events/arch/x86/skylake/other.json | 1 +
.../pmu-events/arch/x86/skylake/pipeline.json | 16 +
.../arch/x86/skylake/skl-metrics.json | 1877 ++++++----
.../arch/x86/skylake/uncore-other.json | 1 +
.../pmu-events/arch/x86/skylakex/cache.json | 8 +-
.../arch/x86/skylakex/frontend.json | 8 +-
.../arch/x86/skylakex/pipeline.json | 16 +
.../arch/x86/skylakex/skx-metrics.json | 2097 +++++------
.../arch/x86/skylakex/uncore-memory.json | 2 +-
.../arch/x86/skylakex/uncore-other.json | 96 +-
.../arch/x86/skylakex/uncore-power.json | 6 +-
.../arch/x86/tigerlake/floating-point.json | 31 +
.../arch/x86/tigerlake/pipeline.json | 18 +
.../arch/x86/tigerlake/tgl-metrics.json | 1942 +++++-----
.../arch/x86/tigerlake/uncore-other.json | 28 +-
.../arch/x86/westmereep-dp/cache.json | 2 +-
.../x86/westmereep-dp/virtual-memory.json | 2 +-
tools/perf/pmu-events/jevents.py | 58 +-
tools/perf/pmu-events/metric.py | 8 +-
tools/perf/pmu-events/pmu-events.h | 35 +-
tools/perf/tests/expand-cgroup.c | 3 +-
tools/perf/tests/expr.c | 7 +-
tools/perf/tests/parse-metric.c | 21 +-
tools/perf/tests/pmu-events.c | 49 +-
tools/perf/util/cpumap.h | 3 +
tools/perf/util/cputopo.c | 14 +
tools/perf/util/cputopo.h | 5 +
tools/perf/util/evsel.h | 2 +-
tools/perf/util/expr.c | 16 +-
tools/perf/util/expr.y | 12 +-
tools/perf/util/metricgroup.c | 178 +-
tools/perf/util/metricgroup.h | 5 +-
tools/perf/util/pmu.c | 17 +-
tools/perf/util/print-events.h | 1 +
tools/perf/util/smt.c | 11 +-
tools/perf/util/smt.h | 12 +-
tools/perf/util/stat-display.c | 117 +-
tools/perf/util/stat-shadow.c | 1287 ++-----
tools/perf/util/stat.c | 74 -
tools/perf/util/stat.h | 96 +-
tools/perf/util/synthetic-events.c | 2 +-
tools/perf/util/topdown.c | 68 +-
tools/perf/util/topdown.h | 11 +-
120 files changed, 18025 insertions(+), 15590 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 01/51] perf tools: Ensure evsel name is initialized
From: Ian Rogers @ 2023-02-19 9:27 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Use the evsel__name accessor, as otherwise the name may be NULL,
resulting in a segv. This was observed with the perf stat shell test.
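In outline (a sketch of the accessor's guarantee; the real
evsel__name() derives a proper default name from the event attributes):

  const char *name = evsel__name(evsel); /* never returns NULL */
  size_t len = strlen(name); /* safe even when evsel->name is unset */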
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/synthetic-events.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 9ab9308ee80c..6def01036eb5 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -2004,7 +2004,7 @@ int perf_event__synthesize_event_update_name(struct perf_tool *tool, struct evse
perf_event__handler_t process)
{
struct perf_record_event_update *ev;
- size_t len = strlen(evsel->name);
+ size_t len = strlen(evsel__name(evsel));
int err;
ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->core.id[0]);
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 02/51] perf metrics: Improve variable names
From: Ian Rogers @ 2023-02-19 9:27 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
has_constraint implies the NMI_WATCHDOG_CONSTRAINT, and when the
constraint is detected it causes events not to be grouped. Most of the
code cares about whether events are grouped or not, so rename
has_constraint to group_events, inverting the sense.
Also drop "group" from metricgroup___watchdog_constraint_hint's name,
as the warning is specific to a metric rather than a metric group, and
make the warning message agree with this.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 45 +++++++++++++++++------------------
1 file changed, 22 insertions(+), 23 deletions(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index f3559be95541..b2aa6e049804 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -136,10 +136,9 @@ struct metric {
/** Optional null terminated array of referenced metrics. */
struct metric_ref *metric_refs;
/**
- * Is there a constraint on the group of events? In which case the
- * events won't be grouped.
+ * Should events of the metric be grouped?
*/
- bool has_constraint;
+ bool group_events;
/**
* Parsed events for the metric. Optional as events may be taken from a
* different metric whose group contains all the IDs necessary for this
@@ -148,12 +147,12 @@ struct metric {
struct evlist *evlist;
};
-static void metricgroup___watchdog_constraint_hint(const char *name, bool foot)
+static void metric__watchdog_constraint_hint(const char *name, bool foot)
{
static bool violate_nmi_constraint;
if (!foot) {
- pr_warning("Splitting metric group %s into standalone metrics.\n", name);
+ pr_warning("Not grouping metric %s's events.\n", name);
violate_nmi_constraint = true;
return;
}
@@ -167,18 +166,18 @@ static void metricgroup___watchdog_constraint_hint(const char *name, bool foot)
" echo 1 > /proc/sys/kernel/nmi_watchdog\n");
}
-static bool metricgroup__has_constraint(const struct pmu_metric *pm)
+static bool metric__group_events(const struct pmu_metric *pm)
{
if (!pm->metric_constraint)
- return false;
+ return true;
if (!strcmp(pm->metric_constraint, "NO_NMI_WATCHDOG") &&
sysctl__nmi_watchdog_enabled()) {
- metricgroup___watchdog_constraint_hint(pm->metric_name, false);
- return true;
+ metric__watchdog_constraint_hint(pm->metric_name, /*foot=*/false);
+ return false;
}
- return false;
+ return true;
}
static void metric__free(struct metric *m)
@@ -227,7 +226,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,
}
m->pctx->sctx.runtime = runtime;
m->pctx->sctx.system_wide = system_wide;
- m->has_constraint = metric_no_group || metricgroup__has_constraint(pm);
+ m->group_events = !metric_no_group && metric__group_events(pm);
m->metric_refs = NULL;
m->evlist = NULL;
@@ -637,7 +636,7 @@ static int decode_all_metric_ids(struct evlist *perf_evlist, const char *modifie
static int metricgroup__build_event_string(struct strbuf *events,
const struct expr_parse_ctx *ctx,
const char *modifier,
- bool has_constraint)
+ bool group_events)
{
struct hashmap_entry *cur;
size_t bkt;
@@ -662,7 +661,7 @@ static int metricgroup__build_event_string(struct strbuf *events,
}
/* Separate events with commas and open the group if necessary. */
if (no_group) {
- if (!has_constraint) {
+ if (group_events) {
ret = strbuf_addch(events, '{');
RETURN_IF_NON_ZERO(ret);
}
@@ -716,7 +715,7 @@ static int metricgroup__build_event_string(struct strbuf *events,
RETURN_IF_NON_ZERO(ret);
}
}
- if (!no_group && !has_constraint) {
+ if (!no_group && group_events) {
ret = strbuf_addf(events, "}:W");
RETURN_IF_NON_ZERO(ret);
}
@@ -1252,7 +1251,7 @@ static int metricgroup__add_metric_list(const char *list, bool metric_no_group,
* Warn about nmi_watchdog if any parsed metrics had the
* NO_NMI_WATCHDOG constraint.
*/
- metricgroup___watchdog_constraint_hint(NULL, true);
+ metric__watchdog_constraint_hint(NULL, /*foot=*/true);
/* No metrics. */
if (count == 0)
return -EINVAL;
@@ -1295,7 +1294,7 @@ static void find_tool_events(const struct list_head *metric_list,
}
/**
- * build_combined_expr_ctx - Make an expr_parse_ctx with all has_constraint
+ * build_combined_expr_ctx - Make an expr_parse_ctx with all !group_events
* metric IDs, as the IDs are held in a set,
* duplicates will be removed.
* @metric_list: List to take metrics from.
@@ -1315,7 +1314,7 @@ static int build_combined_expr_ctx(const struct list_head *metric_list,
return -ENOMEM;
list_for_each_entry(m, metric_list, nd) {
- if (m->has_constraint && !m->modifier) {
+ if (!m->group_events && !m->modifier) {
hashmap__for_each_entry(m->pctx->ids, cur, bkt) {
dup = strdup(cur->pkey);
if (!dup) {
@@ -1342,14 +1341,14 @@ static int build_combined_expr_ctx(const struct list_head *metric_list,
* @fake_pmu: used when testing metrics not supported by the current CPU.
* @ids: the event identifiers parsed from a metric.
* @modifier: any modifiers added to the events.
- * @has_constraint: false if events should be placed in a weak group.
+ * @group_events: should events be placed in a weak group.
* @tool_events: entries set true if the tool event of index could be present in
* the overall list of metrics.
* @out_evlist: the created list of events.
*/
static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
struct expr_parse_ctx *ids, const char *modifier,
- bool has_constraint, const bool tool_events[PERF_TOOL_MAX],
+ bool group_events, const bool tool_events[PERF_TOOL_MAX],
struct evlist **out_evlist)
{
struct parse_events_error parse_error;
@@ -1393,7 +1392,7 @@ static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
}
}
ret = metricgroup__build_event_string(&events, ids, modifier,
- has_constraint);
+ group_events);
if (ret)
return ret;
@@ -1458,7 +1457,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
if (!ret && combined && hashmap__size(combined->ids)) {
ret = parse_ids(metric_no_merge, fake_pmu, combined,
/*modifier=*/NULL,
- /*has_constraint=*/true,
+ /*group_events=*/false,
tool_events,
&combined_evlist);
}
@@ -1476,7 +1475,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
struct metric *n;
struct metric_expr *expr;
- if (combined_evlist && m->has_constraint) {
+ if (combined_evlist && !m->group_events) {
metric_evlist = combined_evlist;
} else if (!metric_no_merge) {
/*
@@ -1507,7 +1506,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
}
if (!metric_evlist) {
ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier,
- m->has_constraint, tool_events, &m->evlist);
+ m->group_events, tool_events, &m->evlist);
if (ret)
goto out;
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 03/51] perf pmu-events: Remove aggr_mode from pmu_event
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
aggr_mode is used on Power to set a flag for metrics; for pmu_event it
is unused, so remove it there.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 2 +-
tools/perf/pmu-events/pmu-events.h | 1 -
tools/perf/tests/pmu-events.c | 6 ------
3 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 2bcd07ce609f..db8b92de113e 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -44,7 +44,7 @@ _json_event_attributes = [
# Seems useful, put it early.
'event',
# Short things in alphabetical order.
- 'aggr_mode', 'compat', 'deprecated', 'perpkg', 'unit',
+ 'compat', 'deprecated', 'perpkg', 'unit',
# Longer things (the last won't be iterated over during decompress).
'long_desc'
]
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index b7d4a66b8ad2..cee8b83792f8 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -22,7 +22,6 @@ struct pmu_event {
const char *pmu;
const char *unit;
const char *perpkg;
- const char *aggr_mode;
const char *deprecated;
};
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index accf44b3d968..9b4c94ba5460 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -331,12 +331,6 @@ static int compare_pmu_events(const struct pmu_event *e1, const struct pmu_event
return -1;
}
- if (!is_same(e1->aggr_mode, e2->aggr_mode)) {
- pr_debug2("testing event e1 %s: mismatched aggr_mode, %s vs %s\n",
- e1->name, e1->aggr_mode, e2->aggr_mode);
- return -1;
- }
-
if (!is_same(e1->deprecated, e2->deprecated)) {
pr_debug2("testing event e1 %s: mismatched deprecated, %s vs %s\n",
e1->name, e1->deprecated, e2->deprecated);
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 04/51] perf pmu-events: Change aggr_mode to be an enum
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Rather than use a string to encode aggr_mode, use an enum value.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/arch/powerpc/util/header.c | 2 +-
tools/perf/pmu-events/jevents.py | 17 +++++++++++------
tools/perf/pmu-events/pmu-events.h | 2 +-
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index 78eef77d8a8d..c8d0dc775e5d 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -45,6 +45,6 @@ int arch_get_runtimeparam(const struct pmu_metric *pm)
int count;
char path[PATH_MAX] = "/devices/hv_24x7/interface/";
- atoi(pm->aggr_mode) == PerChip ? strcat(path, "sockets") : strcat(path, "coresperchip");
+ strcat(path, pm->aggr_mode == PerChip ? "sockets" : "coresperchip");
return sysfs__read_int(path, &count) < 0 ? 1 : count;
}
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index db8b92de113e..2b08d7c18f4b 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -678,10 +678,13 @@ static void decompress_event(int offset, struct pmu_event *pe)
{
\tconst char *p = &big_c_string[offset];
""")
+ enum_attributes = ['aggr_mode']
for attr in _json_event_attributes:
- _args.output_file.write(f"""
-\tpe->{attr} = (*p == '\\0' ? NULL : p);
-""")
+ _args.output_file.write(f'\n\tpe->{attr} = ')
+ if attr in enum_attributes:
+ _args.output_file.write("(*p == '\\0' ? 0 : *p - '0');\n")
+ else:
+ _args.output_file.write("(*p == '\\0' ? NULL : p);\n")
if attr == _json_event_attributes[-1]:
continue
_args.output_file.write('\twhile (*p++);')
@@ -692,9 +695,11 @@ static void decompress_metric(int offset, struct pmu_metric *pm)
\tconst char *p = &big_c_string[offset];
""")
for attr in _json_metric_attributes:
- _args.output_file.write(f"""
-\tpm->{attr} = (*p == '\\0' ? NULL : p);
-""")
+ _args.output_file.write(f'\n\tpm->{attr} = ')
+ if attr in enum_attributes:
+ _args.output_file.write("(*p == '\\0' ? 0 : *p - '0');\n")
+ else:
+ _args.output_file.write("(*p == '\\0' ? NULL : p);\n")
if attr == _json_metric_attributes[-1]:
continue
_args.output_file.write('\twhile (*p++);')
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index cee8b83792f8..7225efc4e4df 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -31,10 +31,10 @@ struct pmu_metric {
const char *metric_expr;
const char *unit;
const char *compat;
- const char *aggr_mode;
const char *metric_constraint;
const char *desc;
const char *long_desc;
+ enum aggr_mode_class aggr_mode;
};
struct pmu_events_table;
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 05/51] perf pmu-events: Change deprecated to be a bool
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Switch from the string encoding, where NULL implicitly meant false, to
a more natural bool.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 2 +-
tools/perf/pmu-events/pmu-events.h | 4 +++-
tools/perf/tests/pmu-events.c | 4 ++--
tools/perf/util/pmu.c | 10 ++++------
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 2b08d7c18f4b..35ca34eca74a 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -678,7 +678,7 @@ static void decompress_event(int offset, struct pmu_event *pe)
{
\tconst char *p = &big_c_string[offset];
""")
- enum_attributes = ['aggr_mode']
+ enum_attributes = ['aggr_mode', 'deprecated']
for attr in _json_event_attributes:
_args.output_file.write(f'\n\tpe->{attr} = ')
if attr in enum_attributes:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 7225efc4e4df..2434bc7cf92d 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -2,6 +2,8 @@
#ifndef PMU_EVENTS_H
#define PMU_EVENTS_H
+#include <stdbool.h>
+
struct perf_pmu;
enum aggr_mode_class {
@@ -22,7 +24,7 @@ struct pmu_event {
const char *pmu;
const char *unit;
const char *perpkg;
- const char *deprecated;
+ bool deprecated;
};
struct pmu_metric {
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 9b4c94ba5460..937804c84e29 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -331,8 +331,8 @@ static int compare_pmu_events(const struct pmu_event *e1, const struct pmu_event
return -1;
}
- if (!is_same(e1->deprecated, e2->deprecated)) {
- pr_debug2("testing event e1 %s: mismatched deprecated, %s vs %s\n",
+ if (e1->deprecated != e2->deprecated) {
+ pr_debug2("testing event e1 %s: mismatched deprecated, %d vs %d\n",
e1->name, e1->deprecated, e2->deprecated);
return -1;
}
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index c256b29defad..80644e25a568 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -331,14 +331,15 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
int num;
char newval[256];
char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
- *deprecated = NULL, *pmu_name = NULL;
+ *pmu_name = NULL;
+ bool deprecated = false;
if (pe) {
long_desc = (char *)pe->long_desc;
topic = (char *)pe->topic;
unit = (char *)pe->unit;
perpkg = (char *)pe->perpkg;
- deprecated = (char *)pe->deprecated;
+ deprecated = pe->deprecated;
pmu_name = (char *)pe->pmu;
}
@@ -351,7 +352,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
alias->unit[0] = '\0';
alias->per_pkg = false;
alias->snapshot = false;
- alias->deprecated = false;
+ alias->deprecated = deprecated;
ret = parse_events_terms(&alias->terms, val);
if (ret) {
@@ -405,9 +406,6 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
alias->str = strdup(newval);
alias->pmu_name = pmu_name ? strdup(pmu_name) : NULL;
- if (deprecated)
- alias->deprecated = true;
-
if (!perf_pmu_merge_alias(alias, list))
list_add_tail(&alias->list, list);
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 06/51] perf pmu-events: Change perpkg to be a bool
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Switch from the string encoding, where NULL implicitly meant false, to
a more natural bool. The only value of 'PerPkg' in the event json is
'1'.
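For reference, the flag appears in the event json like this (the event
name here is hypothetical):

  { "EventName": "UNC_EXAMPLE_EVENT.COUNT", "PerPkg": "1" }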
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 2 +-
tools/perf/pmu-events/pmu-events.h | 2 +-
tools/perf/tests/pmu-events.c | 4 ++--
tools/perf/util/pmu.c | 11 ++++-------
4 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 35ca34eca74a..2da55408398f 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -678,7 +678,7 @@ static void decompress_event(int offset, struct pmu_event *pe)
{
\tconst char *p = &big_c_string[offset];
""")
- enum_attributes = ['aggr_mode', 'deprecated']
+ enum_attributes = ['aggr_mode', 'deprecated', 'perpkg']
for attr in _json_event_attributes:
_args.output_file.write(f'\n\tpe->{attr} = ')
if attr in enum_attributes:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 2434bc7cf92d..4d236bb32fd3 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -23,7 +23,7 @@ struct pmu_event {
const char *long_desc;
const char *pmu;
const char *unit;
- const char *perpkg;
+ bool perpkg;
bool deprecated;
};
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 937804c84e29..521557c396bc 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -325,8 +325,8 @@ static int compare_pmu_events(const struct pmu_event *e1, const struct pmu_event
return -1;
}
- if (!is_same(e1->perpkg, e2->perpkg)) {
- pr_debug2("testing event e1 %s: mismatched perpkg, %s vs %s\n",
+ if (e1->perpkg != e2->perpkg) {
+ pr_debug2("testing event e1 %s: mismatched perpkg, %d vs %d\n",
e1->name, e1->perpkg, e2->perpkg);
return -1;
}
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 80644e25a568..43b6182d96b7 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -328,17 +328,15 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
struct parse_events_term *term;
struct perf_pmu_alias *alias;
int ret;
- int num;
char newval[256];
- char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
- *pmu_name = NULL;
- bool deprecated = false;
+ char *long_desc = NULL, *topic = NULL, *unit = NULL, *pmu_name = NULL;
+ bool deprecated = false, perpkg = false;
if (pe) {
long_desc = (char *)pe->long_desc;
topic = (char *)pe->topic;
unit = (char *)pe->unit;
- perpkg = (char *)pe->perpkg;
+ perpkg = pe->perpkg;
deprecated = pe->deprecated;
pmu_name = (char *)pe->pmu;
}
@@ -350,7 +348,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
INIT_LIST_HEAD(&alias->terms);
alias->scale = 1.0;
alias->unit[0] = '\0';
- alias->per_pkg = false;
+ alias->per_pkg = perpkg;
alias->snapshot = false;
alias->deprecated = deprecated;
@@ -402,7 +400,6 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
return -1;
snprintf(alias->unit, sizeof(alias->unit), "%s", unit);
}
- alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
alias->str = strdup(newval);
alias->pmu_name = pmu_name ? strdup(pmu_name) : NULL;
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 07/51] perf expr: Make the online topology accessible globally
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Knowing the topology of online CPUs is useful for more than just expr
literals. Move it to a global function that caches the value. An
additional upside is that this may avoid computing the CPU topology in
some situations.
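Callers then follow this pattern (a sketch based on the diff below);
note there is no matching cpu_topology__delete(), as the cached
topology lives for the remainder of the process:

  const struct cpu_topology *topology = online_topology();
  /* e.g. read topology->core_cpus_lists; do not delete the topology */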
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/expr.c | 7 ++-----
tools/perf/util/cputopo.c | 14 ++++++++++++++
tools/perf/util/cputopo.h | 5 +++++
tools/perf/util/expr.c | 16 ++++++----------
tools/perf/util/smt.c | 11 +++++------
tools/perf/util/smt.h | 12 ++++++------
6 files changed, 38 insertions(+), 27 deletions(-)
diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index a9eb1ed6bd63..cbf0e0c74906 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -154,13 +154,10 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
/* Only EVENT1 or EVENT2 need be measured depending on the value of smt_on. */
{
- struct cpu_topology *topology = cpu_topology__new();
- bool smton = smt_on(topology);
+ bool smton = smt_on();
bool corewide = core_wide(/*system_wide=*/false,
- /*user_requested_cpus=*/false,
- topology);
+ /*user_requested_cpus=*/false);
- cpu_topology__delete(topology);
expr__ctx_clear(ctx);
TEST_ASSERT_VAL("find ids",
expr__find_ids("EVENT1 if #smt_on else EVENT2",
diff --git a/tools/perf/util/cputopo.c b/tools/perf/util/cputopo.c
index e08797c3cdbc..ca1d833a0c26 100644
--- a/tools/perf/util/cputopo.c
+++ b/tools/perf/util/cputopo.c
@@ -238,6 +238,20 @@ static bool has_die_topology(void)
return true;
}
+const struct cpu_topology *online_topology(void)
+{
+ static const struct cpu_topology *topology;
+
+ if (!topology) {
+ topology = cpu_topology__new();
+ if (!topology) {
+ pr_err("Error creating CPU topology");
+ abort();
+ }
+ }
+ return topology;
+}
+
struct cpu_topology *cpu_topology__new(void)
{
struct cpu_topology *tp = NULL;
diff --git a/tools/perf/util/cputopo.h b/tools/perf/util/cputopo.h
index 969e5920a00e..8d42f6102954 100644
--- a/tools/perf/util/cputopo.h
+++ b/tools/perf/util/cputopo.h
@@ -56,6 +56,11 @@ struct hybrid_topology {
struct hybrid_topology_node nodes[];
};
+/*
+ * The topology for online CPUs, lazily created.
+ */
+const struct cpu_topology *online_topology(void);
+
struct cpu_topology *cpu_topology__new(void);
void cpu_topology__delete(struct cpu_topology *tp);
/* Determine from the core list whether SMT was enabled. */
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index c1da20b868db..d46a1878bc9e 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -402,7 +402,7 @@ double arch_get_tsc_freq(void)
double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx)
{
- static struct cpu_topology *topology;
+ const struct cpu_topology *topology;
double result = NAN;
if (!strcmp("#num_cpus", literal)) {
@@ -421,31 +421,27 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
* these strings gives an indication of the number of packages, dies,
* etc.
*/
- if (!topology) {
- topology = cpu_topology__new();
- if (!topology) {
- pr_err("Error creating CPU topology");
- goto out;
- }
- }
if (!strcasecmp("#smt_on", literal)) {
- result = smt_on(topology) ? 1.0 : 0.0;
+ result = smt_on() ? 1.0 : 0.0;
goto out;
}
if (!strcmp("#core_wide", literal)) {
- result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list, topology)
+ result = core_wide(ctx->system_wide, ctx->user_requested_cpu_list)
? 1.0 : 0.0;
goto out;
}
if (!strcmp("#num_packages", literal)) {
+ topology = online_topology();
result = topology->package_cpus_lists;
goto out;
}
if (!strcmp("#num_dies", literal)) {
+ topology = online_topology();
result = topology->die_cpus_lists;
goto out;
}
if (!strcmp("#num_cores", literal)) {
+ topology = online_topology();
result = topology->core_cpus_lists;
goto out;
}
diff --git a/tools/perf/util/smt.c b/tools/perf/util/smt.c
index 994e9e418227..650e804d0adc 100644
--- a/tools/perf/util/smt.c
+++ b/tools/perf/util/smt.c
@@ -4,7 +4,7 @@
#include "cputopo.h"
#include "smt.h"
-bool smt_on(const struct cpu_topology *topology)
+bool smt_on(void)
{
static bool cached;
static bool cached_result;
@@ -16,22 +16,21 @@ bool smt_on(const struct cpu_topology *topology)
if (sysfs__read_int("devices/system/cpu/smt/active", &fs_value) >= 0)
cached_result = (fs_value == 1);
else
- cached_result = cpu_topology__smt_on(topology);
+ cached_result = cpu_topology__smt_on(online_topology());
cached = true;
return cached_result;
}
-bool core_wide(bool system_wide, const char *user_requested_cpu_list,
- const struct cpu_topology *topology)
+bool core_wide(bool system_wide, const char *user_requested_cpu_list)
{
/* If not everything running on a core is being recorded then we can't use core_wide. */
if (!system_wide)
return false;
/* Cheap case that SMT is disabled and therefore we're inherently core_wide. */
- if (!smt_on(topology))
+ if (!smt_on())
return true;
- return cpu_topology__core_wide(topology, user_requested_cpu_list);
+ return cpu_topology__core_wide(online_topology(), user_requested_cpu_list);
}
diff --git a/tools/perf/util/smt.h b/tools/perf/util/smt.h
index ae9095f2c38c..01441fd2c0a2 100644
--- a/tools/perf/util/smt.h
+++ b/tools/perf/util/smt.h
@@ -2,16 +2,16 @@
#ifndef __SMT_H
#define __SMT_H 1
-struct cpu_topology;
-
-/* Returns true if SMT (aka hyperthreading) is enabled. */
-bool smt_on(const struct cpu_topology *topology);
+/*
+ * Returns true if SMT (aka hyperthreading) is enabled. Determined via sysfs or
+ * the online topology.
+ */
+bool smt_on(void);
/*
* Returns true when system wide and all SMT threads for a core are in the
* user_requested_cpus map.
*/
-bool core_wide(bool system_wide, const char *user_requested_cpu_list,
- const struct cpu_topology *topology);
+bool core_wide(bool system_wide, const char *user_requested_cpu_list);
#endif /* __SMT_H */
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 08/51] perf pmu-events: Make the metric_constraint an enum
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Rename metric_constraint to event_grouping to better explain what the
variable is used for. Switch to an enum encoding instead of a string.
Rather than just the absence of constraint/grouping information or
"NO_NMI_WATCHDOG", have 4 enum values. The values encode whether or not
to group, plus two cases where the behavior depends on either the NMI
watchdog being enabled or SMT being enabled.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 20 ++++++++++++++++----
tools/perf/pmu-events/pmu-events.h | 25 ++++++++++++++++++++++++-
tools/perf/util/metricgroup.c | 19 ++++++++++++-------
3 files changed, 52 insertions(+), 12 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 2da55408398f..dc0c56dccb5e 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -51,8 +51,8 @@ _json_event_attributes = [
# Attributes that are in pmu_metric rather than pmu_event.
_json_metric_attributes = [
- 'metric_name', 'metric_group', 'metric_constraint', 'metric_expr', 'desc',
- 'long_desc', 'unit', 'compat', 'aggr_mode'
+ 'metric_name', 'metric_group', 'metric_expr', 'desc',
+ 'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
]
def removesuffix(s: str, suffix: str) -> str:
@@ -204,6 +204,18 @@ class JsonEvent:
}
return aggr_mode_to_enum[aggr_mode]
+ def convert_metric_constraint(metric_constraint: str) -> Optional[str]:
+ """Returns the metric_event_groups enum value associated with the JSON string."""
+ if not metric_constraint:
+ return None
+ metric_constraint_to_enum = {
+ 'NO_GROUP_EVENTS': '1',
+ 'NO_GROUP_EVENTS_NMI': '2',
+ 'NO_NMI_WATCHDOG': '2',
+ 'NO_GROUP_EVENTS_SMT': '3',
+ }
+ return metric_constraint_to_enum[metric_constraint]
+
def lookup_msr(num: str) -> Optional[str]:
"""Converts the msr number, or first in a list to the appropriate event field."""
if not num:
@@ -288,7 +300,7 @@ class JsonEvent:
self.deprecated = jd.get('Deprecated')
self.metric_name = jd.get('MetricName')
self.metric_group = jd.get('MetricGroup')
- self.metric_constraint = jd.get('MetricConstraint')
+ self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
self.metric_expr = None
if 'MetricExpr' in jd:
self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
@@ -678,7 +690,7 @@ static void decompress_event(int offset, struct pmu_event *pe)
{
\tconst char *p = &big_c_string[offset];
""")
- enum_attributes = ['aggr_mode', 'deprecated', 'perpkg']
+ enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
for attr in _json_event_attributes:
_args.output_file.write(f'\n\tpe->{attr} = ')
if attr in enum_attributes:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 4d236bb32fd3..57a38e3e5c32 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -11,6 +11,29 @@ enum aggr_mode_class {
PerCore
};
+/**
+ * enum metric_event_groups - How events within a pmu_metric should be grouped.
+ */
+enum metric_event_groups {
+ /**
+ * @MetricGroupEvents: Default, group events within the metric.
+ */
+ MetricGroupEvents = 0,
+ /**
+ * @MetricNoGroupEvents: Don't group events for the metric.
+ */
+ MetricNoGroupEvents = 1,
+ /**
+ * @MetricNoGroupEventsNmi: Don't group events for the metric if the NMI
+ * watchdog is enabled.
+ */
+ MetricNoGroupEventsNmi = 2,
+ /**
+ * @MetricNoGroupEventsSmt: Don't group events for the metric if SMT is
+ * enabled.
+ */
+ MetricNoGroupEventsSmt = 3,
+};
/*
* Describe each PMU event. Each CPU has a table of PMU events.
*/
@@ -33,10 +56,10 @@ struct pmu_metric {
const char *metric_expr;
const char *unit;
const char *compat;
- const char *metric_constraint;
const char *desc;
const char *long_desc;
enum aggr_mode_class aggr_mode;
+ enum metric_event_groups event_grouping;
};
struct pmu_events_table;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index b2aa6e049804..868fc9c35606 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -13,6 +13,7 @@
#include "pmu.h"
#include "pmu-hybrid.h"
#include "print-events.h"
+#include "smt.h"
#include "expr.h"
#include "rblist.h"
#include <string.h>
@@ -168,16 +169,20 @@ static void metric__watchdog_constraint_hint(const char *name, bool foot)
static bool metric__group_events(const struct pmu_metric *pm)
{
- if (!pm->metric_constraint)
- return true;
-
- if (!strcmp(pm->metric_constraint, "NO_NMI_WATCHDOG") &&
- sysctl__nmi_watchdog_enabled()) {
+ switch (pm->event_grouping) {
+ case MetricNoGroupEvents:
+ return false;
+ case MetricNoGroupEventsNmi:
+ if (!sysctl__nmi_watchdog_enabled())
+ return true;
metric__watchdog_constraint_hint(pm->metric_name, /*foot=*/false);
return false;
+ case MetricNoGroupEventsSmt:
+ return !smt_on();
+ case MetricGroupEvents:
+ default:
+ return true;
}
-
- return true;
}
static void metric__free(struct metric *m)
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 09/51] perf pmu-events: Don't '\0' terminate enum values
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Encoding enum values like '1\0' wastes a byte; they can be encoded as
'1' with no '\0' terminator, provided the zero case is encoded as '0'
rather than an empty string. This also removes a branch when
decompressing.
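As a sketch of the string table layout (field contents hypothetical),
for a bool/enum attribute such as deprecated:

  /*
   * before: "event=0x3c\0" "1\0" ... decoded by
   *   pe->deprecated = (*p == '\0' ? 0 : *p - '0'); while (*p++);
   * after:  "event=0x3c\0" "1" ... decoded by
   *   pe->deprecated = *p - '0'; p++;
   */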
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index dc0c56dccb5e..e82dff3a1228 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -54,6 +54,8 @@ _json_metric_attributes = [
'metric_name', 'metric_group', 'metric_expr', 'desc',
'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
]
+# Attributes that are bools or enum int values, encoded as '0', '1',...
+_json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
def removesuffix(s: str, suffix: str) -> str:
"""Remove the suffix from a string
@@ -360,7 +362,10 @@ class JsonEvent:
# Convert parsed metric expressions into a string. Slashes
# must be doubled in the file.
x = x.ToPerfJson().replace('\\', '\\\\')
- s += f'{x}\\000' if x else '\\000'
+ if attr in _json_enum_attributes:
+ s += x if x else '0'
+ else:
+ s += f'{x}\\000' if x else '\\000'
return s
def to_c_string(self, metric: bool) -> str:
@@ -690,16 +695,18 @@ static void decompress_event(int offset, struct pmu_event *pe)
{
\tconst char *p = &big_c_string[offset];
""")
- enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
for attr in _json_event_attributes:
_args.output_file.write(f'\n\tpe->{attr} = ')
- if attr in enum_attributes:
- _args.output_file.write("(*p == '\\0' ? 0 : *p - '0');\n")
+ if attr in _json_enum_attributes:
+ _args.output_file.write("*p - '0';\n")
else:
_args.output_file.write("(*p == '\\0' ? NULL : p);\n")
if attr == _json_event_attributes[-1]:
continue
- _args.output_file.write('\twhile (*p++);')
+ if attr in _json_enum_attributes:
+ _args.output_file.write('\tp++;')
+ else:
+ _args.output_file.write('\twhile (*p++);')
_args.output_file.write("""}
static void decompress_metric(int offset, struct pmu_metric *pm)
@@ -708,13 +715,16 @@ static void decompress_metric(int offset, struct pmu_metric *pm)
""")
for attr in _json_metric_attributes:
_args.output_file.write(f'\n\tpm->{attr} = ')
- if attr in enum_attributes:
- _args.output_file.write("(*p == '\\0' ? 0 : *p - '0');\n")
+ if attr in _json_enum_attributes:
+ _args.output_file.write("*p - '0';\n")
else:
_args.output_file.write("(*p == '\\0' ? NULL : p);\n")
if attr == _json_metric_attributes[-1]:
continue
- _args.output_file.write('\twhile (*p++);')
+ if attr in _json_enum_attributes:
+ _args.output_file.write('\tp++;')
+ else:
+ _args.output_file.write('\twhile (*p++);')
_args.output_file.write("""}
int pmu_events_table_for_each_event(const struct pmu_events_table *table,
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 11/51] perf vendor events intel: Refresh alderlake-n metrics
From: Ian Rogers @ 2023-02-19 9:28 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Update the alderlake-n events from 1.16 to 1.18 (no event changes) and
regenerate the metrics. Generation was done using
https://github.com/intel/perfmon. Notable changes: the TMA metrics are
updated to version 4.5; TMA info metrics are renamed from their node
names to lower case names prefixed by tma_info_; MetricThreshold
expressions are added; and the smi_cost metric group is added,
replicating existing hard coded metrics in stat-shadow.
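For example, an info metric whose node name was "IPC" now appears as
"tma_info_ipc" (names shown to illustrate the renaming pattern).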
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/x86/alderlaken/adln-metrics.json | 811 ++++++++++--------
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
2 files changed, 454 insertions(+), 359 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
index 9ab1d5bcf4a2..5078c468480f 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -1,583 +1,678 @@
[
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
- "MetricExpr": "TOPDOWN_FE_BOUND.ALL / SLOTS",
- "MetricGroup": "TopdownL1",
- "MetricName": "tma_frontend_bound",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
- "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_LATENCY / SLOTS",
- "MetricGroup": "TopdownL2;tma_frontend_bound_group",
- "MetricName": "tma_frontend_latency",
+ "BriefDescription": "C10 residency percent per package",
+ "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C10_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to instruction cache misses.",
- "MetricExpr": "TOPDOWN_FE_BOUND.ICACHE / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_latency_group",
- "MetricName": "tma_icache",
+ "BriefDescription": "C1 residency percent per core",
+ "MetricExpr": "cstate_core@c1\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C1_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to Instruction Table Lookaside Buffer (ITLB) misses.",
- "MetricExpr": "TOPDOWN_FE_BOUND.ITLB / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_latency_group",
- "MetricName": "tma_itlb",
+ "BriefDescription": "C2 residency percent per package",
+ "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C2_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend",
- "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_DETECT / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_latency_group",
- "MetricName": "tma_branch_detect",
- "PublicDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend. Includes BACLEARS due to all branch types including conditional and unconditional jumps, returns, and indirect branches.",
+ "BriefDescription": "C3 residency percent per package",
+ "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C3_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS, which occurs when the Branch Target Buffer (BTB) predicts a taken branch.",
- "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_RESTEER / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_latency_group",
- "MetricName": "tma_branch_resteer",
+ "BriefDescription": "C6 residency percent per core",
+ "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C6_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
- "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH / SLOTS",
- "MetricGroup": "TopdownL2;tma_frontend_bound_group",
- "MetricName": "tma_frontend_bandwidth",
+ "BriefDescription": "C6 residency percent per package",
+ "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C6_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS).",
- "MetricExpr": "TOPDOWN_FE_BOUND.CISC / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_bandwidth_group",
- "MetricName": "tma_cisc",
+ "BriefDescription": "C7 residency percent per core",
+ "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C7_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to decode stalls.",
- "MetricExpr": "TOPDOWN_FE_BOUND.DECODE / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_bandwidth_group",
- "MetricName": "tma_decode",
+ "BriefDescription": "C7 residency percent per package",
+ "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C7_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes.",
- "MetricExpr": "TOPDOWN_FE_BOUND.PREDECODE / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_bandwidth_group",
- "MetricName": "tma_predecode",
+ "BriefDescription": "C8 residency percent per package",
+ "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C8_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized.",
- "MetricExpr": "TOPDOWN_FE_BOUND.OTHER / SLOTS",
- "MetricGroup": "TopdownL3;tma_frontend_bandwidth_group",
- "MetricName": "tma_other_fb",
+ "BriefDescription": "C9 residency percent per package",
+ "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C9_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
- "MetricExpr": "(SLOTS - (TOPDOWN_FE_BOUND.ALL + TOPDOWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / SLOTS",
- "MetricGroup": "TopdownL1",
- "MetricName": "tma_bad_speculation",
- "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
+ "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
+ "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
+ "MetricGroup": "smi",
+ "MetricName": "smi_cycles",
+ "MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to branch mispredicts.",
- "MetricExpr": "TOPDOWN_BAD_SPECULATION.MISPREDICT / SLOTS",
- "MetricGroup": "TopdownL2;tma_bad_speculation_group",
- "MetricName": "tma_branch_mispredicts",
- "ScaleUnit": "100%"
+ "BriefDescription": "Number of SMI interrupts.",
+ "MetricExpr": "msr@smi@",
+ "MetricGroup": "smi",
+ "MetricName": "smi_num",
+ "ScaleUnit": "1SMI#"
},
{
- "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a machine clear (nuke) of any kind including memory ordering and memory disambiguation.",
- "MetricExpr": "TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS / SLOTS",
- "MetricGroup": "TopdownL2;tma_bad_speculation_group",
- "MetricName": "tma_machine_clears",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions.",
+ "MetricExpr": "TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_alloc_restriction",
+ "MetricThreshold": "tma_alloc_restriction > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear (slow nuke).",
- "MetricExpr": "TOPDOWN_BAD_SPECULATION.NUKE / SLOTS",
- "MetricGroup": "TopdownL3;tma_machine_clears_group",
- "MetricName": "tma_nuke",
+ "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_slots",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_backend_bound",
+ "MetricThreshold": "tma_backend_bound > 0.1",
+ "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to SMC. ",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.SMC / MACHINE_CLEARS.SLOW)",
- "MetricGroup": "TopdownL4;tma_nuke_group",
- "MetricName": "tma_smc",
+ "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "MetricExpr": "tma_backend_bound",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_backend_bound_aux",
+ "MetricThreshold": "tma_backend_bound_aux > 0.2",
+ "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory ordering. ",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.SLOW)",
- "MetricGroup": "TopdownL4;tma_nuke_group",
- "MetricName": "tma_memory_ordering",
+ "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
+ "MetricExpr": "(tma_info_slots - (TOPDOWN_FE_BOUND.ALL + TOPDOWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / tma_info_slots",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_bad_speculation",
+ "MetricThreshold": "tma_bad_speculation > 0.15",
+ "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to FP assists. ",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.FP_ASSIST / MACHINE_CLEARS.SLOW)",
- "MetricGroup": "TopdownL4;tma_nuke_group",
- "MetricName": "tma_fp_assist",
+ "BriefDescription": "Counts the number of uops that are not from the microsequencer.",
+ "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS) / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricName": "tma_base",
+ "MetricThreshold": "tma_base > 0.6",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory disambiguation. ",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.DISAMBIGUATION / MACHINE_CLEARS.SLOW)",
- "MetricGroup": "TopdownL4;tma_nuke_group",
- "MetricName": "tma_disambiguation",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend",
+ "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_DETECT / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_latency_group",
+ "MetricName": "tma_branch_detect",
+ "MetricThreshold": "tma_branch_detect > 0.05",
+ "PublicDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend. Includes BACLEARS due to all branch types including conditional and unconditional jumps, returns, and indirect branches.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to page faults. ",
- "MetricExpr": "tma_nuke * (MACHINE_CLEARS.PAGE_FAULT / MACHINE_CLEARS.SLOW)",
- "MetricGroup": "TopdownL4;tma_nuke_group",
- "MetricName": "tma_page_fault",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to branch mispredicts.",
+ "MetricExpr": "TOPDOWN_BAD_SPECULATION.MISPREDICT / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
+ "MetricName": "tma_branch_mispredicts",
+ "MetricThreshold": "tma_branch_mispredicts > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear classified as a fast nuke due to memory ordering, memory disambiguation and memory renaming.",
- "MetricExpr": "TOPDOWN_BAD_SPECULATION.FASTNUKE / SLOTS",
- "MetricGroup": "TopdownL3;tma_machine_clears_group",
- "MetricName": "tma_fast_nuke",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS, which occurs when the Branch Target Buffer (BTB) predicts a taken branch.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_RESTEER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_latency_group",
+ "MetricName": "tma_branch_resteer",
+ "MetricThreshold": "tma_branch_resteer > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
- "MetricExpr": "TOPDOWN_BE_BOUND.ALL / SLOTS",
- "MetricGroup": "TopdownL1",
- "MetricName": "tma_backend_bound",
- "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS).",
+ "MetricExpr": "TOPDOWN_FE_BOUND.CISC / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_bandwidth_group",
+ "MetricName": "tma_cisc",
+ "MetricThreshold": "tma_cisc > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles due to backend bound stalls that are core execution bound and not attributed to outstanding demand load or store stalls. ",
+ "BriefDescription": "Counts the number of cycles due to backend bound stalls that are core execution bound and not attributed to outstanding demand load or store stalls.",
"MetricExpr": "max(0, tma_backend_bound - tma_load_store_bound)",
- "MetricGroup": "TopdownL2;tma_backend_bound_group",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
+ "MetricThreshold": "tma_core_bound > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads. ",
- "MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / CLKS + tma_store_bound)",
- "MetricGroup": "TopdownL2;tma_backend_bound_group",
- "MetricName": "tma_load_store_bound",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to decode stalls.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.DECODE / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_bandwidth_group",
+ "MetricName": "tma_decode",
+ "MetricThreshold": "tma_decode > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles the core is stalled due to store buffer full.",
- "MetricExpr": "tma_st_buffer",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_store_bound",
+ "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory disambiguation.",
+ "MetricExpr": "tma_nuke * (MACHINE_CLEARS.DISAMBIGUATION / MACHINE_CLEARS.SLOW)",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
+ "MetricName": "tma_disambiguation",
+ "MetricThreshold": "tma_disambiguation > 0.02",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a load block.",
- "MetricExpr": "LD_HEAD.L1_BOUND_AT_RET / CLKS",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_l1_bound",
+ "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_dram_bound",
+ "MetricThreshold": "tma_dram_bound > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
- "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / CLKS",
- "MetricGroup": "TopdownL4;tma_l1_bound_group",
- "MetricName": "tma_store_fwd",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear classified as a fast nuke due to memory ordering, memory disambiguation and memory renaming.",
+ "MetricExpr": "TOPDOWN_BAD_SPECULATION.FASTNUKE / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
+ "MetricName": "tma_fast_nuke",
+ "MetricThreshold": "tma_fast_nuke > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a first level TLB miss.",
- "MetricExpr": "LD_HEAD.DTLB_MISS_AT_RET / CLKS",
- "MetricGroup": "TopdownL4;tma_l1_bound_group",
- "MetricName": "tma_stlb_hit",
+ "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to FP assists.",
+ "MetricExpr": "tma_nuke * (MACHINE_CLEARS.FP_ASSIST / MACHINE_CLEARS.SLOW)",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
+ "MetricName": "tma_fp_assist",
+ "MetricThreshold": "tma_fp_assist > 0.02",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a second level TLB miss requiring a page walk.",
- "MetricExpr": "LD_HEAD.PGWALK_AT_RET / CLKS",
- "MetricGroup": "TopdownL4;tma_l1_bound_group",
- "MetricName": "tma_stlb_miss",
+ "BriefDescription": "Counts the number of floating point operations per uop with all default weighting.",
+ "MetricExpr": "UOPS_RETIRED.FPDIV / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
+ "MetricName": "tma_fp_uops",
+ "MetricThreshold": "tma_fp_uops > 0.2",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a number of other load blocks.",
- "MetricExpr": "LD_HEAD.OTHER_AT_RET / CLKS",
- "MetricGroup": "TopdownL4;tma_l1_bound_group",
- "MetricName": "tma_other_l1",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
+ "MetricName": "tma_frontend_bandwidth",
+ "MetricThreshold": "tma_frontend_bandwidth > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_l2_bound",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_slots",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_frontend_bound",
+ "MetricThreshold": "tma_frontend_bound > 0.2",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_l3_bound",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_LATENCY / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
+ "MetricName": "tma_frontend_latency",
+ "MetricThreshold": "tma_frontend_latency > 0.15",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_dram_bound",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to instruction cache misses.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.ICACHE / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_latency_group",
+ "MetricName": "tma_icache",
+ "MetricThreshold": "tma_icache > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hits in the L2, LLC, DRAM or MMIO (Non-DRAM) but could not be correctly attributed or cycles in which the load miss is waiting on a request buffer.",
- "MetricExpr": "max(0, tma_load_store_bound - (tma_store_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_dram_bound))",
- "MetricGroup": "TopdownL3;tma_load_store_bound_group",
- "MetricName": "tma_other_load_store",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percentage of total non-speculative loads with a address aliasing block",
+ "MetricExpr": "100 * LD_BLOCKS.4K_ALIAS / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricName": "tma_info_address_alias_blocks"
},
{
- "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
- "MetricExpr": "tma_backend_bound",
- "MetricGroup": "TopdownL1",
- "MetricName": "tma_backend_bound_aux",
- "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation. ",
- "ScaleUnit": "100%"
+ "BriefDescription": "Ratio of all branches which mispredict",
+ "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.ALL_BRANCHES",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_branch_mispredict_ratio"
},
{
- "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
- "MetricExpr": "tma_backend_bound",
- "MetricGroup": "TopdownL2;tma_backend_bound_aux_group",
- "MetricName": "tma_resource_bound",
- "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. ",
- "ScaleUnit": "100%"
+ "BriefDescription": "Ratio between Mispredicted branches and unknown branches",
+ "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BACLEARS.ANY",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_branch_mispredict_to_unknown_branch_ratio"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops.",
- "MetricExpr": "TOPDOWN_BE_BOUND.MEM_SCHEDULER / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_mem_scheduler",
- "ScaleUnit": "100%"
+ "BriefDescription": "",
+ "MetricExpr": "CPU_CLK_UNHALTED.CORE",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_clks"
},
{
- "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to store buffer full",
- "MetricExpr": "tma_mem_scheduler * (MEM_SCHEDULER_BLOCK.ST_BUF / MEM_SCHEDULER_BLOCK.ALL)",
- "MetricGroup": "TopdownL4;tma_mem_scheduler_group",
- "MetricName": "tma_st_buffer",
- "ScaleUnit": "100%"
+ "BriefDescription": "",
+ "MetricExpr": "CPU_CLK_UNHALTED.CORE_P",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_clks_p"
},
{
- "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to load buffer full",
- "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.LD_BUF / MEM_SCHEDULER_BLOCK.ALL",
- "MetricGroup": "TopdownL4;tma_mem_scheduler_group",
- "MetricName": "tma_ld_buffer",
- "ScaleUnit": "100%"
+ "BriefDescription": "Cycles Per Instruction",
+ "MetricExpr": "tma_info_clks / INST_RETIRED.ANY",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_cpi"
},
{
- "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to RSV full relative ",
- "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.RSV / MEM_SCHEDULER_BLOCK.ALL",
- "MetricGroup": "TopdownL4;tma_mem_scheduler_group",
- "MetricName": "tma_rsv",
- "ScaleUnit": "100%"
+ "BriefDescription": "Average CPU Utilization",
+ "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_cpu_utilization"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls, which can be due to FIQ or IEC reservation stalls in which the integer, floating point or SIMD scheduler is not able to accept uops.",
- "MetricExpr": "TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_non_mem_scheduler",
- "ScaleUnit": "100%"
+ "BriefDescription": "Cycle cost per DRAM hit",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_LOAD_UOPS_RETIRED.DRAM_HIT",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_cycles_per_demand_load_dram_hit"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls).",
- "MetricExpr": "TOPDOWN_BE_BOUND.REGISTER / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_register",
- "ScaleUnit": "100%"
+ "BriefDescription": "Cycle cost per L2 hit",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_LOAD_UOPS_RETIRED.L2_HIT",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_cycles_per_demand_load_l2_hit"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls).",
- "MetricExpr": "TOPDOWN_BE_BOUND.REORDER_BUFFER / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_reorder_buffer",
- "ScaleUnit": "100%"
+ "BriefDescription": "Cycle cost per LLC hit",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_LOAD_UOPS_RETIRED.L3_HIT",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_cycles_per_demand_load_l3_hit"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions.",
- "MetricExpr": "TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_alloc_restriction",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percentage of all uops which are FPDiv uops",
+ "MetricExpr": "100 * UOPS_RETIRED.FPDIV / UOPS_RETIRED.ALL",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_fpdiv_uop_ratio"
},
{
- "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ), jump execution unit (JEU), or microcode sequencer (MS).",
- "MetricExpr": "TOPDOWN_BE_BOUND.SERIALIZATION / SLOTS",
- "MetricGroup": "TopdownL3;tma_resource_bound_group",
- "MetricName": "tma_serialization",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percentage of all uops which are IDiv uops",
+ "MetricExpr": "100 * UOPS_RETIRED.IDIV / UOPS_RETIRED.ALL",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_idiv_uop_ratio"
},
{
- "BriefDescription": "Counts the numer of issue slots that result in retirement slots. ",
- "MetricExpr": "TOPDOWN_RETIRING.ALL / SLOTS",
- "MetricGroup": "TopdownL1",
- "MetricName": "tma_retiring",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percent of instruction miss cost that hit in DRAM",
+ "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_inst_miss_cost_dramhit_percent"
},
{
- "BriefDescription": "Counts the number of uops that are not from the microsequencer. ",
- "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS) / SLOTS",
- "MetricGroup": "TopdownL2;tma_retiring_group",
- "MetricName": "tma_base",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percent of instruction miss cost that hit in the L2",
+ "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_inst_miss_cost_l2hit_percent"
},
{
- "BriefDescription": "Counts the number of floating point operations per uop with all default weighting.",
- "MetricExpr": "UOPS_RETIRED.FPDIV / SLOTS",
- "MetricGroup": "TopdownL3;tma_base_group",
- "MetricName": "tma_fp_uops",
- "ScaleUnit": "100%"
+ "BriefDescription": "Percent of instruction miss cost that hit in the L3",
+ "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / MEM_BOUND_STALLS.IFETCH",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_inst_miss_cost_l3hit_percent"
},
{
- "BriefDescription": "Counts the number of uops retired excluding ms and fp div uops.",
- "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS - UOPS_RETIRED.FPDIV) / SLOTS",
- "MetricGroup": "TopdownL3;tma_base_group",
- "MetricName": "tma_other_ret",
- "ScaleUnit": "100%"
+ "BriefDescription": "Instructions per Branch (lower number means higher occurance rate)",
+ "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipbranch"
},
{
- "BriefDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS)",
- "MetricExpr": "UOPS_RETIRED.MS / SLOTS",
- "MetricGroup": "TopdownL2;tma_retiring_group",
- "MetricName": "tma_ms_uops",
- "PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS). This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
- "ScaleUnit": "100%"
+ "BriefDescription": "Instructions Per Cycle",
+ "MetricExpr": "INST_RETIRED.ANY / tma_info_clks",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipc"
},
{
- "BriefDescription": "",
- "MetricExpr": "CPU_CLK_UNHALTED.CORE",
- "MetricName": "CLKS"
+ "BriefDescription": "Instruction per (near) call (lower number means higher occurance rate)",
+ "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.CALL",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipcall"
},
{
- "BriefDescription": "",
- "MetricExpr": "CPU_CLK_UNHALTED.CORE_P",
- "MetricName": "CLKS_P"
+ "BriefDescription": "Instructions per Far Branch",
+ "MetricExpr": "INST_RETIRED.ANY / (BR_INST_RETIRED.FAR_BRANCH / 2)",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipfarbranch"
},
{
- "BriefDescription": "",
- "MetricExpr": "5 * CLKS",
- "MetricName": "SLOTS"
+ "BriefDescription": "Instructions per Load",
+ "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipload"
},
{
- "BriefDescription": "Instructions Per Cycle",
- "MetricExpr": "INST_RETIRED.ANY / CLKS",
- "MetricName": "IPC"
+ "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction",
+ "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipmispredict"
},
{
- "BriefDescription": "Cycles Per Instruction",
- "MetricExpr": "CLKS / INST_RETIRED.ANY",
- "MetricName": "CPI"
+ "BriefDescription": "Instructions per Store",
+ "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_ipstore"
},
{
- "BriefDescription": "Uops Per Instruction",
- "MetricExpr": "UOPS_RETIRED.ALL / INST_RETIRED.ANY",
- "MetricName": "UPI"
+ "BriefDescription": "Fraction of cycles spent in Kernel mode",
+ "MetricExpr": "cpu@CPU_CLK_UNHALTED.CORE@k / CPU_CLK_UNHALTED.CORE",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_kernel_utilization"
},
{
- "BriefDescription": "Percentage of total non-speculative loads with a store forward or unknown store address block",
- "MetricExpr": "100 * LD_BLOCKS.DATA_UNKNOWN / MEM_UOPS_RETIRED.ALL_LOADS",
- "MetricName": "Store_Fwd_Blocks"
+ "BriefDescription": "Percentage of total non-speculative loads that are splits",
+ "MetricExpr": "100 * MEM_UOPS_RETIRED.SPLIT_LOADS / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricName": "tma_info_load_splits"
},
{
- "BriefDescription": "Percentage of total non-speculative loads with a address aliasing block",
- "MetricExpr": "100 * LD_BLOCKS.4K_ALIAS / MEM_UOPS_RETIRED.ALL_LOADS",
- "MetricName": "Address_Alias_Blocks"
+ "BriefDescription": "load ops retired per 1000 instruction",
+ "MetricExpr": "1e3 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_memloadpki"
},
{
- "BriefDescription": "Percentage of total non-speculative loads that are splits",
- "MetricExpr": "100 * MEM_UOPS_RETIRED.SPLIT_LOADS / MEM_UOPS_RETIRED.ALL_LOADS",
- "MetricName": "Load_Splits"
+ "BriefDescription": "Percentage of all uops which are ucode ops",
+ "MetricExpr": "100 * UOPS_RETIRED.MS / UOPS_RETIRED.ALL",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_microcode_uop_ratio"
},
{
- "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
- "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
- "MetricName": "IpBranch"
+ "BriefDescription": "",
+ "MetricExpr": "5 * tma_info_clks",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_slots"
},
{
- "BriefDescription": "Instruction per (near) call (lower number means higher occurrence rate)",
- "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.CALL",
- "MetricName": "IpCall"
+ "BriefDescription": "Percentage of total non-speculative loads with a store forward or unknown store address block",
+ "MetricExpr": "100 * LD_BLOCKS.DATA_UNKNOWN / MEM_UOPS_RETIRED.ALL_LOADS",
+ "MetricName": "tma_info_store_fwd_blocks"
},
{
- "BriefDescription": "Instructions per Load",
- "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
- "MetricName": "IpLoad"
+ "BriefDescription": "Average Frequency Utilization relative nominal frequency",
+ "MetricExpr": "tma_info_clks / CPU_CLK_UNHALTED.REF_TSC",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_turbo_utilization"
},
{
- "BriefDescription": "Instructions per Store",
- "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
- "MetricName": "IpStore"
+ "BriefDescription": "Uops Per Instruction",
+ "MetricExpr": "UOPS_RETIRED.ALL / INST_RETIRED.ANY",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_upi"
},
{
- "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction",
- "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
- "MetricName": "IpMispredict"
+ "BriefDescription": "Percentage of all uops which are x87 uops",
+ "MetricExpr": "100 * UOPS_RETIRED.X87 / UOPS_RETIRED.ALL",
+ "MetricGroup": " ",
+ "MetricName": "tma_info_x87_uop_ratio"
},
{
- "BriefDescription": "Instructions per Far Branch",
- "MetricExpr": "INST_RETIRED.ANY / (BR_INST_RETIRED.FAR_BRANCH / 2)",
- "MetricName": "IpFarBranch"
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to Instruction Table Lookaside Buffer (ITLB) misses.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.ITLB / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_latency_group",
+ "MetricName": "tma_itlb",
+ "MetricThreshold": "tma_itlb > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Ratio of all branches which mispredict",
- "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.ALL_BRANCHES",
- "MetricName": "Branch_Mispredict_Ratio"
+ "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a load block.",
+ "MetricExpr": "LD_HEAD.L1_BOUND_AT_RET / tma_info_clks",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_l1_bound",
+ "MetricThreshold": "tma_l1_bound > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Ratio between Mispredicted branches and unknown branches",
- "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BACLEARS.ANY",
- "MetricName": "Branch_Mispredict_to_Unknown_Branch_Ratio"
+ "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_l2_bound",
+ "MetricThreshold": "tma_l2_bound > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percentage of all uops which are ucode ops",
- "MetricExpr": "100 * UOPS_RETIRED.MS / UOPS_RETIRED.ALL",
- "MetricName": "Microcode_Uop_Ratio"
+ "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
+ "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_clks - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_l3_bound",
+ "MetricThreshold": "tma_l3_bound > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percentage of all uops which are FPDiv uops",
- "MetricExpr": "100 * UOPS_RETIRED.FPDIV / UOPS_RETIRED.ALL",
- "MetricName": "FPDiv_Uop_Ratio"
+ "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to load buffer full",
+ "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.LD_BUF / MEM_SCHEDULER_BLOCK.ALL",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
+ "MetricName": "tma_ld_buffer",
+ "MetricThreshold": "tma_ld_buffer > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percentage of all uops which are IDiv uops",
- "MetricExpr": "100 * UOPS_RETIRED.IDIV / UOPS_RETIRED.ALL",
- "MetricName": "IDiv_Uop_Ratio"
+ "BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads.",
+ "MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / tma_info_clks + tma_store_bound)",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
+ "MetricName": "tma_load_store_bound",
+ "MetricThreshold": "tma_load_store_bound > 0.2",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percentage of all uops which are x87 uops",
- "MetricExpr": "100 * UOPS_RETIRED.X87 / UOPS_RETIRED.ALL",
- "MetricName": "X87_Uop_Ratio"
+ "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a machine clear (nuke) of any kind including memory ordering and memory disambiguation.",
+ "MetricExpr": "TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
+ "MetricName": "tma_machine_clears",
+ "MetricThreshold": "tma_machine_clears > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Average Frequency Utilization relative nominal frequency",
- "MetricExpr": "CLKS / CPU_CLK_UNHALTED.REF_TSC",
- "MetricName": "Turbo_Utilization"
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops.",
+ "MetricExpr": "TOPDOWN_BE_BOUND.MEM_SCHEDULER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_mem_scheduler",
+ "MetricThreshold": "tma_mem_scheduler > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Fraction of cycles spent in Kernel mode",
- "MetricExpr": "cpu@CPU_CLK_UNHALTED.CORE@k / CPU_CLK_UNHALTED.CORE",
- "MetricName": "Kernel_Utilization"
+ "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory ordering.",
+ "MetricExpr": "tma_nuke * (MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.SLOW)",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
+ "MetricName": "tma_memory_ordering",
+ "MetricThreshold": "tma_memory_ordering > 0.02",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Average CPU Utilization",
- "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
- "MetricName": "CPU_Utilization"
+ "BriefDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS)",
+ "MetricExpr": "UOPS_RETIRED.MS / tma_info_slots",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricName": "tma_ms_uops",
+ "MetricThreshold": "tma_ms_uops > 0.05",
+ "PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS). This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Cycle cost per L2 hit",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_LOAD_UOPS_RETIRED.L2_HIT",
- "MetricName": "Cycles_per_Demand_Load_L2_Hit"
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls, which can be due to FIQ or IEC reservation stalls in which the integer, floating point or SIMD scheduler is not able to accept uops.",
+ "MetricExpr": "TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_non_mem_scheduler",
+ "MetricThreshold": "tma_non_mem_scheduler > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Cycle cost per LLC hit",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_LOAD_UOPS_RETIRED.L3_HIT",
- "MetricName": "Cycles_per_Demand_Load_L3_Hit"
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear (slow nuke).",
+ "MetricExpr": "TOPDOWN_BAD_SPECULATION.NUKE / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
+ "MetricName": "tma_nuke",
+ "MetricThreshold": "tma_nuke > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Cycle cost per DRAM hit",
- "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_LOAD_UOPS_RETIRED.DRAM_HIT",
- "MetricName": "Cycles_per_Demand_Load_DRAM_Hit"
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.OTHER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_bandwidth_group",
+ "MetricName": "tma_other_fb",
+ "MetricThreshold": "tma_other_fb > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percent of instruction miss cost that hit in the L2",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / MEM_BOUND_STALLS.IFETCH",
- "MetricName": "Inst_Miss_Cost_L2Hit_Percent"
+ "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a number of other load blocks.",
+ "MetricExpr": "LD_HEAD.OTHER_AT_RET / tma_info_clks",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+ "MetricName": "tma_other_l1",
+ "MetricThreshold": "tma_other_l1 > 0.05",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percent of instruction miss cost that hit in the L3",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / MEM_BOUND_STALLS.IFETCH",
- "MetricName": "Inst_Miss_Cost_L3Hit_Percent"
+ "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hits in the L2, LLC, DRAM or MMIO (Non-DRAM) but could not be correctly attributed or cycles in which the load miss is waiting on a request buffer.",
+ "MetricExpr": "max(0, tma_load_store_bound - (tma_store_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_dram_bound))",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_other_load_store",
+ "MetricThreshold": "tma_other_load_store > 0.1",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Percent of instruction miss cost that hit in DRAM",
- "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / MEM_BOUND_STALLS.IFETCH",
- "MetricName": "Inst_Miss_Cost_DRAMHit_Percent"
+ "BriefDescription": "Counts the number of uops retired excluding ms and fp div uops.",
+ "MetricExpr": "(TOPDOWN_RETIRING.ALL - UOPS_RETIRED.MS - UOPS_RETIRED.FPDIV) / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
+ "MetricName": "tma_other_ret",
+ "MetricThreshold": "tma_other_ret > 0.3",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "load ops retired per 1000 instruction",
- "MetricExpr": "1e3 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
- "MetricName": "MemLoadPKI"
+ "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to page faults.",
+ "MetricExpr": "tma_nuke * (MACHINE_CLEARS.PAGE_FAULT / MACHINE_CLEARS.SLOW)",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
+ "MetricName": "tma_page_fault",
+ "MetricThreshold": "tma_page_fault > 0.02",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "C1 residency percent per core",
- "MetricExpr": "cstate_core@c1\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C1_Core_Residency",
+ "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes.",
+ "MetricExpr": "TOPDOWN_FE_BOUND.PREDECODE / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_frontend_bandwidth_group",
+ "MetricName": "tma_predecode",
+ "MetricThreshold": "tma_predecode > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C6 residency percent per core",
- "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C6_Core_Residency",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls).",
+ "MetricExpr": "TOPDOWN_BE_BOUND.REGISTER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_register",
+ "MetricThreshold": "tma_register > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C7 residency percent per core",
- "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C7_Core_Residency",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls).",
+ "MetricExpr": "TOPDOWN_BE_BOUND.REORDER_BUFFER / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_reorder_buffer",
+ "MetricThreshold": "tma_reorder_buffer > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C2 residency percent per package",
- "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C2_Pkg_Residency",
+ "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
+ "MetricExpr": "tma_backend_bound",
+ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
+ "MetricName": "tma_resource_bound",
+ "MetricThreshold": "tma_resource_bound > 0.2",
+ "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C3 residency percent per package",
- "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C3_Pkg_Residency",
+ "BriefDescription": "Counts the numer of issue slots that result in retirement slots.",
+ "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_slots",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_retiring",
+ "MetricThreshold": "tma_retiring > 0.75",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C6 residency percent per package",
- "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C6_Pkg_Residency",
+ "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to RSV full relative",
+ "MetricExpr": "tma_mem_scheduler * MEM_SCHEDULER_BLOCK.RSV / MEM_SCHEDULER_BLOCK.ALL",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
+ "MetricName": "tma_rsv",
+ "MetricThreshold": "tma_rsv > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C7 residency percent per package",
- "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C7_Pkg_Residency",
+ "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ), jump execution unit (JEU), or microcode sequencer (MS).",
+ "MetricExpr": "TOPDOWN_BE_BOUND.SERIALIZATION / tma_info_slots",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
+ "MetricName": "tma_serialization",
+ "MetricThreshold": "tma_serialization > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C8 residency percent per package",
- "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C8_Pkg_Residency",
+ "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to SMC.",
+ "MetricExpr": "tma_nuke * (MACHINE_CLEARS.SMC / MACHINE_CLEARS.SLOW)",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
+ "MetricName": "tma_smc",
+ "MetricThreshold": "tma_smc > 0.02",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C9 residency percent per package",
- "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C9_Pkg_Residency",
+ "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to store buffer full",
+ "MetricExpr": "tma_store_bound",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
+ "MetricName": "tma_st_buffer",
+ "MetricThreshold": "tma_st_buffer > 0.05",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C10 residency percent per package",
- "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C10_Pkg_Residency",
+ "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a first level TLB miss.",
+ "MetricExpr": "LD_HEAD.DTLB_MISS_AT_RET / tma_info_clks",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+ "MetricName": "tma_stlb_hit",
+ "MetricThreshold": "tma_stlb_hit > 0.05",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a second level TLB miss requiring a page walk.",
+ "MetricExpr": "LD_HEAD.PGWALK_AT_RET / tma_info_clks",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+ "MetricName": "tma_stlb_miss",
+ "MetricThreshold": "tma_stlb_miss > 0.05",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "Counts the number of cycles the core is stalled due to store buffer full.",
+ "MetricExpr": "tma_mem_scheduler * (MEM_SCHEDULER_BLOCK.ST_BUF / MEM_SCHEDULER_BLOCK.ALL)",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_load_store_bound_group",
+ "MetricName": "tma_store_bound",
+ "MetricThreshold": "tma_store_bound > 0.1",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
+ "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_clks",
+ "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+ "MetricName": "tma_store_fwd",
+ "MetricThreshold": "tma_store_fwd > 0.05",
"ScaleUnit": "100%"
}
]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 4bcccab07ea9..cad51223d0ea 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -1,6 +1,6 @@
Family-model,Version,Filename,EventType
GenuineIntel-6-(97|9A|B7|BA|BF),v1.18,alderlake,core
-GenuineIntel-6-BE,v1.16,alderlaken,core
+GenuineIntel-6-BE,v1.18,alderlaken,core
GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
GenuineIntel-6-(3D|47),v26,broadwell,core
GenuineIntel-6-56,v7,broadwellde,core
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 16/51] perf vendor events intel: Add graniterapids events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (9 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 11/51] perf vendor events intel: Refresh alderlake-n metrics Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 24/51] perf vendor events intel: Refresh knightslanding events Ian Rogers
` (25 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Add version 1.00 of the graniterapids events from
https://github.com/intel/perfmon.
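As a sanity check, once the mapfile entry is in place the new events
should be resolvable by name on a Granite Rapids system; a minimal,
hypothetical invocation (event name taken from cache.json below, the
workload is only an example):
  # Count L3 misses for a short workload using the new event name.
  $ perf stat -e LONGEST_LAT_CACHE.MISS -- sleep 1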
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/x86/graniterapids/cache.json | 54 ++++++
.../arch/x86/graniterapids/frontend.json | 10 +
.../arch/x86/graniterapids/memory.json | 174 ++++++++++++++++++
.../arch/x86/graniterapids/other.json | 29 +++
.../arch/x86/graniterapids/pipeline.json | 102 ++++++++++
.../x86/graniterapids/virtual-memory.json | 26 +++
tools/perf/pmu-events/arch/x86/mapfile.csv | 1 +
7 files changed, 396 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/cache.json b/tools/perf/pmu-events/arch/x86/graniterapids/cache.json
new file mode 100644
index 000000000000..56212827870c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/cache.json
@@ -0,0 +1,54 @@
+[
+ {
+ "BriefDescription": "L2 code requests",
+ "EventCode": "0x24",
+ "EventName": "L2_RQSTS.ALL_CODE_RD",
+ "PublicDescription": "Counts the total number of L2 code requests.",
+ "SampleAfterValue": "200003",
+ "UMask": "0xe4"
+ },
+ {
+ "BriefDescription": "Demand Data Read access L2 cache",
+ "EventCode": "0x24",
+ "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD",
+ "PublicDescription": "Counts Demand Data Read requests accessing the L2 cache. These requests may hit or miss L2 cache. True-miss exclude misses that were merged with ongoing L2 misses. An access is counted once.",
+ "SampleAfterValue": "200003",
+ "UMask": "0xe1"
+ },
+ {
+ "BriefDescription": "Core-originated cacheable requests that missed L3 (Except hardware prefetches to the L3)",
+ "EventCode": "0x2e",
+ "EventName": "LONGEST_LAT_CACHE.MISS",
+ "PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches to the L1 and L2. It does not include hardware prefetches to the L3, and may not count other types of requests to the L3.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x41"
+ },
+ {
+ "BriefDescription": "Core-originated cacheable requests that refer to L3 (Except hardware prefetches to the L3)",
+ "EventCode": "0x2e",
+ "EventName": "LONGEST_LAT_CACHE.REFERENCE",
+ "PublicDescription": "Counts core-originated cacheable requests to the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches to the L1 and L2. It does not include hardware prefetches to the L3, and may not count other types of requests to the L3.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x4f"
+ },
+ {
+ "BriefDescription": "Retired load instructions.",
+ "Data_LA": "1",
+ "EventCode": "0xd0",
+ "EventName": "MEM_INST_RETIRED.ALL_LOADS",
+ "PEBS": "1",
+ "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x81"
+ },
+ {
+ "BriefDescription": "Retired store instructions.",
+ "Data_LA": "1",
+ "EventCode": "0xd0",
+ "EventName": "MEM_INST_RETIRED.ALL_STORES",
+ "PEBS": "1",
+ "PublicDescription": "Counts all retired store instructions.",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x82"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json b/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
new file mode 100644
index 000000000000..dfd9c5ea1584
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
@@ -0,0 +1,10 @@
+[
+ {
+ "BriefDescription": "This event counts a subset of the Topdown Slots event that were no operation was delivered to the back-end pipeline due to instruction fetch limitations when the back-end could have accepted more operations. Common examples include instruction cache misses or x86 instruction decode limitations.",
+ "EventCode": "0x9c",
+ "EventName": "IDQ_BUBBLES.CORE",
+ "PublicDescription": "This event counts a subset of the Topdown Slots event that were no operation was delivered to the back-end pipeline due to instruction fetch limitations when the back-end could have accepted more operations. Common examples include instruction cache misses or x86 instruction decode limitations.\nThe count may be distributed among unhalted logical processors (hyper-threads) who share the same physical core, in processors that support Intel Hyper-Threading Technology. Software can use this event as the nominator for the Frontend Bound metric (or top-level category) of the Top-down Microarchitecture Analysis method.",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x1"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/memory.json b/tools/perf/pmu-events/arch/x86/graniterapids/memory.json
new file mode 100644
index 000000000000..1c0e0e86e58e
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/memory.json
@@ -0,0 +1,174 @@
+[
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x80",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "1009",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x10",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "20011",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x100",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "503",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x20",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "100007",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x4",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x200",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "101",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x40",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "2003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8",
+ "MSRIndex": "0x3F6",
+ "MSRValue": "0x8",
+ "PEBS": "2",
+ "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles. Reported latency may be longer than just the memory latency.",
+ "SampleAfterValue": "50021",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Retired memory store access operations. A PDist event for PEBS Store Latency Facility.",
+ "Data_LA": "1",
+ "EventCode": "0xcd",
+ "EventName": "MEM_TRANS_RETIRED.STORE_SAMPLE",
+ "PEBS": "2",
+ "PublicDescription": "Counts Retired memory accesses with at least 1 store operation. This PEBS event is the precisely-distributed (PDist) trigger covering all stores uops for sampling by the PEBS Store Latency Facility. The facility is described in Intel SDM Volume 3 section 19.9.8",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x2"
+ },
+ {
+ "BriefDescription": "Counts demand data reads that were not supplied by the local socket's L1, L2, or L3 caches.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.L3_MISS",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x3FBFC00001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the local socket's L1, L2, or L3 caches.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_RFO.L3_MISS",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x3F3FC00002",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Number of times an RTM execution aborted.",
+ "EventCode": "0xc9",
+ "EventName": "RTM_RETIRED.ABORTED",
+ "PublicDescription": "Counts the number of times RTM abort was triggered.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x4"
+ },
+ {
+ "BriefDescription": "Number of times an RTM execution successfully committed",
+ "EventCode": "0xc9",
+ "EventName": "RTM_RETIRED.COMMIT",
+ "PublicDescription": "Counts the number of times RTM commit succeeded.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x2"
+ },
+ {
+ "BriefDescription": "Number of times an RTM execution started.",
+ "EventCode": "0xc9",
+ "EventName": "RTM_RETIRED.START",
+ "PublicDescription": "Counts the number of times we entered an RTM region. Does not count nested transactions.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional reads",
+ "EventCode": "0x54",
+ "EventName": "TX_MEM.ABORT_CAPACITY_READ",
+ "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional reads",
+ "SampleAfterValue": "100003",
+ "UMask": "0x80"
+ },
+ {
+ "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional writes.",
+ "EventCode": "0x54",
+ "EventName": "TX_MEM.ABORT_CAPACITY_WRITE",
+ "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional writes.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x2"
+ },
+ {
+ "BriefDescription": "Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address",
+ "EventCode": "0x54",
+ "EventName": "TX_MEM.ABORT_CONFLICT",
+ "PublicDescription": "Counts the number of times a TSX line had a cache conflict.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/other.json b/tools/perf/pmu-events/arch/x86/graniterapids/other.json
new file mode 100644
index 000000000000..5e799bae03ea
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/other.json
@@ -0,0 +1,29 @@
+[
+ {
+ "BriefDescription": "Counts demand data reads that have any type of response.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts demand data reads that were supplied by DRAM attached to this socket, unless in Sub NUMA Cluster(SNC) Mode. In SNC Mode counts only those DRAM accesses that are controlled by the close SNC Cluster.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.LOCAL_DRAM",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x104000001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x3F3FFC0002",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
new file mode 100644
index 000000000000..d6aafb258708
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
@@ -0,0 +1,102 @@
+[
+ {
+ "BriefDescription": "All branch instructions retired.",
+ "EventCode": "0xc4",
+ "EventName": "BR_INST_RETIRED.ALL_BRANCHES",
+ "PEBS": "1",
+ "PublicDescription": "Counts all branch instructions retired.",
+ "SampleAfterValue": "400009"
+ },
+ {
+ "BriefDescription": "All mispredicted branch instructions retired.",
+ "EventCode": "0xc5",
+ "EventName": "BR_MISP_RETIRED.ALL_BRANCHES",
+ "PEBS": "1",
+ "PublicDescription": "Counts all the retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor incorrectly predicts the destination of the branch. When the misprediction is discovered at execution, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.",
+ "SampleAfterValue": "400009"
+ },
+ {
+ "BriefDescription": "Reference cycles when the core is not in halt state.",
+ "EventName": "CPU_CLK_UNHALTED.REF_TSC",
+ "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x3"
+ },
+ {
+ "BriefDescription": "Reference cycles when the core is not in halt state.",
+ "EventCode": "0x3c",
+ "EventName": "CPU_CLK_UNHALTED.REF_TSC_P",
+ "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Core cycles when the thread is not in halt state",
+ "EventName": "CPU_CLK_UNHALTED.THREAD",
+ "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x2"
+ },
+ {
+ "BriefDescription": "Thread cycles when thread is not in halt state",
+ "EventCode": "0x3c",
+ "EventName": "CPU_CLK_UNHALTED.THREAD_P",
+ "PublicDescription": "This is an architectural event that counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. For this reason, this event may have a changing ratio with regards to wall clock time.",
+ "SampleAfterValue": "2000003"
+ },
+ {
+ "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+ "EventName": "INST_RETIRED.ANY",
+ "PEBS": "1",
+ "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "Number of instructions retired. General Counter - architectural event",
+ "EventCode": "0xc0",
+ "EventName": "INST_RETIRED.ANY_P",
+ "PEBS": "1",
+ "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
+ "SampleAfterValue": "2000003"
+ },
+ {
+ "BriefDescription": "Loads blocked due to overlapping with a preceding store that cannot be forwarded.",
+ "EventCode": "0x03",
+ "EventName": "LD_BLOCKS.STORE_FORWARD",
+ "PublicDescription": "Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store. Note: See the table of not supported store forwards in the Optimization Guide.",
+ "SampleAfterValue": "100003",
+ "UMask": "0x82"
+ },
+ {
+ "BriefDescription": "This event counts a subset of the Topdown Slots event that were not consumed by the back-end pipeline due to lack of back-end resources, as a result of memory subsystem delays, execution units limitations, or other conditions.",
+ "EventCode": "0xa4",
+ "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS",
+ "PublicDescription": "This event counts a subset of the Topdown Slots event that were not consumed by the back-end pipeline due to lack of back-end resources, as a result of memory subsystem delays, execution units limitations, or other conditions.\nThe count is distributed among unhalted logical processors (hyper-threads) who share the same physical core, in processors that support Intel Hyper-Threading Technology. Software can use this event as the nominator for the Backend Bound metric (or top-level category) of the Top-down Microarchitecture Analysis method.",
+ "SampleAfterValue": "10000003",
+ "UMask": "0x2"
+ },
+ {
+ "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+ "EventName": "TOPDOWN.SLOTS",
+ "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
+ "SampleAfterValue": "10000003",
+ "UMask": "0x4"
+ },
+ {
+ "BriefDescription": "TMA slots available for an unhalted logical processor. General counter - architectural event",
+ "EventCode": "0xa4",
+ "EventName": "TOPDOWN.SLOTS_P",
+ "PublicDescription": "Counts the number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.",
+ "SampleAfterValue": "10000003",
+ "UMask": "0x1"
+ },
+ {
+ "BriefDescription": "This event counts a subset of the Topdown Slots event that are utilized by operations that eventually get retired (committed) by the processor pipeline. Usually, this event positively correlates with higher performance for example, as measured by the instructions-per-cycle metric.",
+ "EventCode": "0xc2",
+ "EventName": "UOPS_RETIRED.SLOTS",
+ "PublicDescription": "This event counts a subset of the Topdown Slots event that are utilized by operations that eventually get retired (committed) by the processor pipeline. Usually, this event positively correlates with higher performance for example, as measured by the instructions-per-cycle metric.\nSoftware can use this event as the nominator for the Retiring metric (or top-level category) of the Top-down Microarchitecture Analysis method.",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x2"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json b/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
new file mode 100644
index 000000000000..8784c97b7534
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
@@ -0,0 +1,26 @@
+[
+ {
+ "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)",
+ "EventCode": "0x12",
+ "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
+ "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data loads. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+ "SampleAfterValue": "100003",
+ "UMask": "0xe"
+ },
+ {
+ "BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)",
+ "EventCode": "0x13",
+ "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
+ "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data stores. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+ "SampleAfterValue": "100003",
+ "UMask": "0xe"
+ },
+ {
+ "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)",
+ "EventCode": "0x11",
+ "EventName": "ITLB_MISSES.WALK_COMPLETED",
+ "PublicDescription": "Counts completed page walks (all page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
+ "SampleAfterValue": "100003",
+ "UMask": "0xe"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 793076e00188..1677ec22e2e3 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -9,6 +9,7 @@ GenuineIntel-6-55-[56789ABCDEF],v1.17,cascadelakex,core
GenuineIntel-6-9[6C],v1.03,elkhartlake,core
GenuineIntel-6-5[CF],v13,goldmont,core
GenuineIntel-6-7A,v1.01,goldmontplus,core
+GenuineIntel-6-A[DE],v1.00,graniterapids,core
GenuineIntel-6-(3C|45|46),v32,haswell,core
GenuineIntel-6-3F,v26,haswellx,core
GenuineIntel-6-(7D|7E|A7),v1.15,icelake,core
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 24/51] perf vendor events intel: Refresh knightslanding events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (10 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 16/51] perf vendor events intel: Add graniterapids events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 25/51] perf vendor events intel: Refresh sandybridge events Ian Rogers
` (24 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Update the knightslanding events from v9 to v10. Generation was done
using https://github.com/intel/perfmon.
The most notable change is spelling corrections in the event
descriptions.
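As a spot check (illustrative command; output elided), the corrected
wording is what "perf list" will now print, e.g. for one of the fixed
recycle queue events:

  $ perf list --long-desc | grep -A1 -i recycleq.ld_block_st_forward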
Signed-off-by: Ian Rogers <irogers@google.com>
---
.../arch/x86/knightslanding/cache.json | 94 +++++++++----------
.../arch/x86/knightslanding/pipeline.json | 8 +-
.../arch/x86/knightslanding/uncore-other.json | 8 +-
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
4 files changed, 56 insertions(+), 56 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/cache.json b/tools/perf/pmu-events/arch/x86/knightslanding/cache.json
index 01aea3d2832e..d9876cb06b08 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/cache.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/cache.json
@@ -6,7 +6,7 @@
"SampleAfterValue": "200003"
},
{
- "BriefDescription": "Counts the number of core cycles the fetch stalls because of an icache miss. This is a cummulative count of core cycles the fetch stalled for all icache misses.",
+ "BriefDescription": "Counts the number of core cycles the fetch stalls because of an icache miss. This is a cumulative count of core cycles the fetch stalled for all icache misses.",
"EventCode": "0x86",
"EventName": "FETCH_STALL.ICACHE_FILL_PENDING_CYCLES",
"PublicDescription": "This event counts the number of core cycles the fetch stalls because of an icache miss. This is a cumulative count of cycles the NIP stalled for all icache misses.",
@@ -28,7 +28,7 @@
"UMask": "0x4f"
},
{
- "BriefDescription": "Counts the number of MEC requests from the L2Q that reference a cache line (cacheable requests) exlcuding SW prefetches filling only to L2 cache and L1 evictions (automatically exlcudes L2HWP, UC, WC) that were rejected - Multiple repeated rejects should be counted multiple times",
+ "BriefDescription": "Counts the number of MEC requests from the L2Q that reference a cache line (cacheable requests) excluding SW prefetches filling only to L2 cache and L1 evictions (automatically exlcudes L2HWP, UC, WC) that were rejected - Multiple repeated rejects should be counted multiple times",
"EventCode": "0x30",
"EventName": "L2_REQUESTS_REJECT.ALL",
"SampleAfterValue": "200003"
@@ -108,7 +108,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand code reads and prefetch code read requests that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Demand code reads and prefetch code read requests that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_CODE_RD.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -135,7 +135,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand code reads and prefetch code read requests that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Demand code reads and prefetch code read requests that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_CODE_RD.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -198,7 +198,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand code reads and prefetch code read requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Demand code reads and prefetch code read requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_CODE_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -216,7 +216,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_DATA_RD.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -243,7 +243,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_DATA_RD.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -306,7 +306,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Demand cacheable data and L1 prefetch data read requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_DATA_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -324,7 +324,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Prefetch requests that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts any Prefetch requests that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_PF_L2.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -351,7 +351,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Prefetch requests that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts any Prefetch requests that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_PF_L2.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -405,7 +405,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Prefetch requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts any Prefetch requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_PF_L2.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -423,7 +423,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Read request that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts any Read request that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_READ.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -450,7 +450,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Read request that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts any Read request that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_READ.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -513,7 +513,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any Read request that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts any Read request that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_READ.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -531,7 +531,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any request that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts any request that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_REQUEST.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -558,7 +558,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any request that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts any request that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_REQUEST.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -621,7 +621,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts any request that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts any request that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_REQUEST.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -639,7 +639,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data write requests that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Demand cacheable data write requests that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_RFO.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -666,7 +666,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data write requests that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Demand cacheable data write requests that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_RFO.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -729,7 +729,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data write requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Demand cacheable data write requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.ANY_RFO.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -747,7 +747,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Bus locks and split lock requests that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Bus locks and split lock requests that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.BUS_LOCKS.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -774,7 +774,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Bus locks and split lock requests that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Bus locks and split lock requests that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.BUS_LOCKS.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -837,7 +837,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Bus locks and split lock requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Bus locks and split lock requests that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.BUS_LOCKS.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -855,7 +855,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts demand code reads and prefetch code reads that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts demand code reads and prefetch code reads that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -882,7 +882,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts demand code reads and prefetch code reads that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts demand code reads and prefetch code reads that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -945,7 +945,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts demand code reads and prefetch code reads that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts demand code reads and prefetch code reads that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1035,7 +1035,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts demand cacheable data and L1 prefetch data reads that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts demand cacheable data and L1 prefetch data reads that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1053,7 +1053,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data writes that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Demand cacheable data writes that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1080,7 +1080,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data writes that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Demand cacheable data writes that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1143,7 +1143,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Demand cacheable data writes that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Demand cacheable data writes that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.DEMAND_RFO.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1170,7 +1170,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PARTIAL_READS.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1197,7 +1197,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PARTIAL_READS.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1260,7 +1260,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Partial reads (UC or WC and is valid only for Outstanding response type). that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PARTIAL_READS.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1287,7 +1287,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Partial writes (UC or WT or WP and should be programmed on PMC1) that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Partial writes (UC or WT or WP and should be programmed on PMC1) that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PARTIAL_WRITES.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a7",
@@ -1314,7 +1314,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Partial writes (UC or WT or WP and should be programmed on PMC1) that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Partial writes (UC or WT or WP and should be programmed on PMC1) that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PARTIAL_WRITES.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a7",
@@ -1386,7 +1386,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L1 data HW prefetches that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts L1 data HW prefetches that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L1_DATA_RD.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1413,7 +1413,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L1 data HW prefetches that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts L1 data HW prefetches that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L1_DATA_RD.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1476,7 +1476,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L1 data HW prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts L1 data HW prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L1_DATA_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1494,7 +1494,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L2 code HW prefetches that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts L2 code HW prefetches that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L2_CODE_RD.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1521,7 +1521,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L2 code HW prefetches that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts L2 code HW prefetches that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L2_CODE_RD.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1566,7 +1566,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L2 code HW prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts L2 code HW prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L2_CODE_RD.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1602,7 +1602,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts L2 data RFO prefetches (includes PREFETCHW instruction) that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts L2 data RFO prefetches (includes PREFETCHW instruction) that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1683,7 +1683,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Software Prefetches that accounts for reponses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
+ "BriefDescription": "Counts Software Prefetches that accounts for responses from snoop request hit with data forwarded from it Far(not in the same quadrant as the request)-other tile L2 in E/F/M state. Valid only in SNC4 Cluster mode.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_SOFTWARE.L2_HIT_FAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1710,7 +1710,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Software Prefetches that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts Software Prefetches that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_SOFTWARE.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1773,7 +1773,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts Software Prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts Software Prefetches that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.PF_SOFTWARE.OUTSTANDING",
"MSRIndex": "0x1a6",
@@ -1818,7 +1818,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts UC code reads (valid only for Outstanding response type) that accounts for reponses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
+ "BriefDescription": "Counts UC code reads (valid only for Outstanding response type) that accounts for responses from snoop request hit with data forwarded from its Near-other tile L2 in E/F/M state",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.UC_CODE_READS.L2_HIT_NEAR_TILE",
"MSRIndex": "0x1a6,0x1a7",
@@ -1881,7 +1881,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts UC code reads (valid only for Outstanding response type) that are outstanding, per weighted cycle, from the time of the request to when any response is received. The oustanding response should be programmed only on PMC0.",
+ "BriefDescription": "Counts UC code reads (valid only for Outstanding response type) that are outstanding, per weighted cycle, from the time of the request to when any response is received. The outstanding response should be programmed only on PMC0.",
"EventCode": "0xB7",
"EventName": "OFFCORE_RESPONSE.UC_CODE_READS.OUTSTANDING",
"MSRIndex": "0x1a6",
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
index 1b803fa38641..3dc532107ead 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json
@@ -254,14 +254,14 @@
"UMask": "0x80"
},
{
- "BriefDescription": "Counts the number of occurences a retired load gets blocked because its address overlaps with a store whose data is not ready",
+ "BriefDescription": "Counts the number of occurrences a retired load gets blocked because its address overlaps with a store whose data is not ready",
"EventCode": "0x03",
"EventName": "RECYCLEQ.LD_BLOCK_STD_NOTREADY",
"SampleAfterValue": "200003",
"UMask": "0x2"
},
{
- "BriefDescription": "Counts the number of occurences a retired load gets blocked because its address partially overlaps with a store",
+ "BriefDescription": "Counts the number of occurrences a retired load gets blocked because its address partially overlaps with a store",
"Data_LA": "1",
"EventCode": "0x03",
"EventName": "RECYCLEQ.LD_BLOCK_ST_FORWARD",
@@ -270,7 +270,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Counts the number of occurences a retired load that is a cache line split. Each split should be counted only once.",
+ "BriefDescription": "Counts the number of occurrences a retired load that is a cache line split. Each split should be counted only once.",
"Data_LA": "1",
"EventCode": "0x03",
"EventName": "RECYCLEQ.LD_SPLITS",
@@ -293,7 +293,7 @@
"UMask": "0x20"
},
{
- "BriefDescription": "Counts the number of occurences a retired store that is a cache line split. Each split should be counted only once.",
+ "BriefDescription": "Counts the number of occurrences a retired store that is a cache line split. Each split should be counted only once.",
"EventCode": "0x03",
"EventName": "RECYCLEQ.ST_SPLITS",
"PublicDescription": "This event counts the number of retired store that experienced a cache line boundary split(Precise Event). Note that each spilt should be counted only once.",
diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
index 3abd9c3fdc48..491cb37ddab0 100644
--- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
+++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
@@ -1084,7 +1084,7 @@
"Unit": "CHA"
},
{
- "BriefDescription": "Cache Lookups. Counts the number of times the LLC was accessed. Writeback transactions from L2 to the LLC This includes all write transactions -- both Cachable and UC.",
+ "BriefDescription": "Cache Lookups. Counts the number of times the LLC was accessed. Writeback transactions from L2 to the LLC This includes all write transactions -- both Cacheable and UC.",
"EventCode": "0x37",
"EventName": "UNC_H_CACHE_LINES_VICTIMIZED.E_STATE",
"PerPkg": "1",
@@ -1843,7 +1843,7 @@
"Unit": "CHA"
},
{
- "BriefDescription": "Counts cycles source throttling is adderted - horizontal",
+ "BriefDescription": "Counts cycles source throttling is asserted - horizontal",
"EventCode": "0xA5",
"EventName": "UNC_H_FAST_ASSERTED.HORZ",
"PerPkg": "1",
@@ -1851,7 +1851,7 @@
"Unit": "CHA"
},
{
- "BriefDescription": "Counts cycles source throttling is adderted - vertical",
+ "BriefDescription": "Counts cycles source throttling is asserted - vertical",
"EventCode": "0xA5",
"EventName": "UNC_H_FAST_ASSERTED.VERT",
"PerPkg": "1",
@@ -2929,7 +2929,7 @@
"Unit": "CHA"
},
{
- "BriefDescription": "Cache Lookups. Counts the number of times the LLC was accessed. Writeback transactions from L2 to the LLC This includes all write transactions -- both Cachable and UC.",
+ "BriefDescription": "Cache Lookups. Counts the number of times the LLC was accessed. Writeback transactions from L2 to the LLC This includes all write transactions -- both Cacheable and UC.",
"EventCode": "0x34",
"EventName": "UNC_H_SF_LOOKUP.WRITE",
"PerPkg": "1",
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index afe811f154d7..41bd13baa265 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -17,7 +17,7 @@ GenuineIntel-6-6[AC],v1.18,icelakex,core
GenuineIntel-6-3A,v23,ivybridge,core
GenuineIntel-6-3E,v22,ivytown,core
GenuineIntel-6-2D,v22,jaketown,core
-GenuineIntel-6-(57|85),v9,knightslanding,core
+GenuineIntel-6-(57|85),v10,knightslanding,core
GenuineIntel-6-A[AC],v1.00,meteorlake,core
GenuineIntel-6-1[AEF],v3,nehalemep,core
GenuineIntel-6-2E,v3,nehalemex,core
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 25/51] perf vendor events intel: Refresh sandybridge events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (11 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 24/51] perf vendor events intel: Refresh knightslanding events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 27/51] perf vendor events intel: Refresh silvermont events Ian Rogers
` (23 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Update the sandybridge events from v17 to v18. Generation was done
using https://github.com/intel/perfmon.
Notable changes are improved event descriptions, TMA metrics updated
to version 4.5, TMA info metrics renamed from their node name to be
lower case and prefixed by tma_info_, the addition of MetricThreshold
expressions, and the addition of the smi_cost metric group,
replicating the existing hard coded metrics in stat-shadow.
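For example (excerpted from the snb-metrics.json diff below), the
former CLKS info metric becomes:

  {
      "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
      "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
      "MetricGroup": "Pipeline",
      "MetricName": "tma_info_clks"
  }

and threshold expressions such as
'"MetricThreshold": "tma_backend_bound > 0.2"' are attached to the
metrics, so the thresholds no longer need to be hard coded in
stat-shadow.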
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
.../arch/x86/sandybridge/cache.json | 8 +-
.../arch/x86/sandybridge/floating-point.json | 2 +-
.../arch/x86/sandybridge/frontend.json | 12 +-
.../arch/x86/sandybridge/pipeline.json | 2 +-
.../arch/x86/sandybridge/snb-metrics.json | 601 ++++++++++--------
6 files changed, 344 insertions(+), 283 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 41bd13baa265..02715cbe4fd8 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -21,7 +21,7 @@ GenuineIntel-6-(57|85),v10,knightslanding,core
GenuineIntel-6-A[AC],v1.00,meteorlake,core
GenuineIntel-6-1[AEF],v3,nehalemep,core
GenuineIntel-6-2E,v3,nehalemex,core
-GenuineIntel-6-2A,v17,sandybridge,core
+GenuineIntel-6-2A,v18,sandybridge,core
GenuineIntel-6-(8F|CF),v1.09,sapphirerapids,core
GenuineIntel-6-(37|4A|4C|4D|5A),v14,silvermont,core
GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/cache.json b/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
index 65696ea2a581..4e5572ee7dfe 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
@@ -37,7 +37,7 @@
"UMask": "0x5"
},
{
- "BriefDescription": "Cycles a demand request was blocked due to Fill Buffers inavailability.",
+ "BriefDescription": "Cycles a demand request was blocked due to Fill Buffers unavailability.",
"CounterMask": "1",
"EventCode": "0x48",
"EventName": "L1D_PEND_MISS.FB_FULL",
@@ -45,7 +45,7 @@
"UMask": "0x2"
},
{
- "BriefDescription": "L1D miss oustandings duration in cycles.",
+ "BriefDescription": "L1D miss outstanding duration in cycles.",
"EventCode": "0x48",
"EventName": "L1D_PEND_MISS.PENDING",
"SampleAfterValue": "2000003",
@@ -493,7 +493,7 @@
"UMask": "0x8"
},
{
- "BriefDescription": "Cacheable and noncachaeble code read requests.",
+ "BriefDescription": "Cacheable and noncacheable code read requests.",
"EventCode": "0xB0",
"EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD",
"SampleAfterValue": "100003",
@@ -898,7 +898,7 @@
"UMask": "0x1"
},
{
- "BriefDescription": "COREWB & ANY_RESPONSE",
+ "BriefDescription": "OFFCORE_RESPONSE.COREWB.ANY_RESPONSE",
"EventCode": "0xB7, 0xBB",
"EventName": "OFFCORE_RESPONSE.COREWB.ANY_RESPONSE",
"MSRIndex": "0x1a6,0x1a7",
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json b/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
index 8c2a246adef9..79e8f403c426 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
@@ -64,7 +64,7 @@
"UMask": "0x20"
},
{
- "BriefDescription": "Number of FP Computational Uops Executed this cycle. The number of FADD, FSUB, FCOM, FMULs, integer MULsand IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. This event does not distinguish an FADD used in the middle of a transcendental flow from a s.",
+ "BriefDescription": "Number of FP Computational Uops Executed this cycle. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. This event does not distinguish an FADD used in the middle of a transcendental flow from a s.",
"EventCode": "0x10",
"EventName": "FP_COMP_OPS_EXE.X87",
"SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json b/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
index 69ab8d215f84..700716b42f1a 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
@@ -134,7 +134,7 @@
"UMask": "0x4"
},
{
- "BriefDescription": "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.MS_CYCLES",
@@ -143,7 +143,7 @@
"UMask": "0x30"
},
{
- "BriefDescription": "Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.MS_DSB_CYCLES",
@@ -151,7 +151,7 @@
"UMask": "0x10"
},
{
- "BriefDescription": "Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy.",
"CounterMask": "1",
"EdgeDetect": "1",
"EventCode": "0x79",
@@ -160,14 +160,14 @@
"UMask": "0x10"
},
{
- "BriefDescription": "Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.",
"EventCode": "0x79",
"EventName": "IDQ.MS_DSB_UOPS",
"SampleAfterValue": "2000003",
"UMask": "0x10"
},
{
- "BriefDescription": "Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.",
"EventCode": "0x79",
"EventName": "IDQ.MS_MITE_UOPS",
"SampleAfterValue": "2000003",
@@ -183,7 +183,7 @@
"UMask": "0x30"
},
{
- "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy.",
+ "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.",
"EventCode": "0x79",
"EventName": "IDQ.MS_UOPS",
"SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
index 53ab5993e8b0..54454e5e262c 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
@@ -509,7 +509,7 @@
"BriefDescription": "Cases when loads get true Block-on-Store blocking code preventing store forwarding.",
"EventCode": "0x03",
"EventName": "LD_BLOCKS.STORE_FORWARD",
- "PublicDescription": "This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceeding smaller uncompleted store. See the table of not supported store forwards in the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual. The penalty for blocked store forwarding is that the load must wait for the store to complete before it can be issued.",
+ "PublicDescription": "This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. See the table of not supported store forwards in the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual. The penalty for blocked store forwarding is that the load must wait for the store to complete before it can be issued.",
"SampleAfterValue": "100003",
"UMask": "0x2"
},
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
index a7b3c835b03d..4a99fe515f4b 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
@@ -1,449 +1,510 @@
[
{
- "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
- "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / SLOTS",
- "MetricGroup": "PGO;TopdownL1;tma_L1_group",
- "MetricName": "tma_frontend_bound",
- "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Machine_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues",
- "MetricExpr": "4 * min(CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE) / SLOTS",
- "MetricGroup": "Frontend;TopdownL2;tma_L2_group;tma_frontend_bound_group",
- "MetricName": "tma_fetch_latency",
- "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
+ "BriefDescription": "C2 residency percent per package",
+ "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C2_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses",
- "MetricExpr": "(12 * ITLB_MISSES.STLB_HIT + ITLB_MISSES.WALK_DURATION) / CLKS",
- "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_fetch_latency_group",
- "MetricName": "tma_itlb_misses",
- "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: ITLB_MISSES.WALK_COMPLETED",
+ "BriefDescription": "C3 residency percent per core",
+ "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C3_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers",
- "MetricExpr": "12 * (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY) / CLKS",
- "MetricGroup": "FetchLat;TopdownL3;tma_fetch_latency_group",
- "MetricName": "tma_branch_resteers",
- "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers. Branch Resteers estimates the Frontend delay in fetching operations from corrected path; following all sorts of miss-predicted branches. For example; branchy code with lots of miss-predictions might get categorized under Branch Resteers. Note the value of this node may overlap with its siblings. Sample with: BR_MISP_RETIRED.ALL_BRANCHES",
+ "BriefDescription": "C3 residency percent per package",
+ "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C3_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines",
- "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / CLKS",
- "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_fetch_latency_group",
- "MetricName": "tma_dsb_switches",
- "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty.",
+ "BriefDescription": "C6 residency percent per core",
+ "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C6_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)",
- "MetricExpr": "ILD_STALL.LCP / CLKS",
- "MetricGroup": "FetchLat;TopdownL3;tma_fetch_latency_group",
- "MetricName": "tma_lcp",
- "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs.",
+ "BriefDescription": "C6 residency percent per package",
+ "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C6_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)",
- "MetricExpr": "3 * IDQ.MS_SWITCHES / CLKS",
- "MetricGroup": "FetchLat;MicroSeq;TopdownL3;tma_fetch_latency_group",
- "MetricName": "tma_ms_switches",
- "PublicDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS). Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled natively by the execution pipeline; and must be performed by microcode (small programs injected into the execution stream). Switching to the MS too often can negatively impact performance. The MS is designated to deliver long uop flows required by CISC instructions like CPUID; or uncommon conditions like Floating Point Assists when dealing with Denormals. Sample with: IDQ.MS_SWITCHES",
+ "BriefDescription": "C7 residency percent per core",
+ "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C7_Core_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
- "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
- "MetricGroup": "FetchBW;Frontend;TopdownL2;tma_L2_group;tma_frontend_bound_group",
- "MetricName": "tma_fetch_bandwidth",
- "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend.",
+ "BriefDescription": "C7 residency percent per package",
+ "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
+ "MetricGroup": "Power",
+ "MetricName": "C7_Pkg_Residency",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
- "MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / 2 if #SMT_on else INT_MISC.RECOVERY_CYCLES)) / SLOTS",
- "MetricGroup": "TopdownL1;tma_L1_group",
- "MetricName": "tma_bad_speculation",
- "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
- "ScaleUnit": "100%"
+ "BriefDescription": "Uncore frequency per die [GHZ]",
+ "MetricExpr": "tma_info_socket_clks / #num_dies / duration_time / 1e9",
+ "MetricGroup": "SoC",
+ "MetricName": "UNCORE_FREQ"
},
{
- "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
- "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation",
- "MetricGroup": "BadSpec;BrMispredicts;TopdownL2;tma_L2_group;tma_bad_speculation_group",
- "MetricName": "tma_branch_mispredicts",
- "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES",
+ "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
+ "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
+ "MetricGroup": "smi",
+ "MetricName": "smi_cycles",
+ "MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears",
- "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
- "MetricGroup": "BadSpec;MachineClears;TopdownL2;tma_L2_group;tma_bad_speculation_group",
- "MetricName": "tma_machine_clears",
- "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT",
- "ScaleUnit": "100%"
+ "BriefDescription": "Number of SMI interrupts.",
+ "MetricExpr": "msr@smi@",
+ "MetricGroup": "smi",
+ "MetricName": "smi_num",
+ "ScaleUnit": "1SMI#"
},
{
"BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+ "MetricConstraint": "NO_GROUP_EVENTS_NMI",
"MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)",
- "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
+ "MetricThreshold": "tma_backend_bound > 0.2",
"PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
- "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_L1D_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_DISPATCH) + cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=1@ - cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=3@ if IPC > 1.8 else (cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=2@ - RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else RESOURCE_STALLS.SB)) * tma_backend_bound",
- "MetricGroup": "Backend;TopdownL2;tma_L2_group;tma_backend_bound_group",
- "MetricName": "tma_memory_bound",
- "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses",
- "MetricExpr": "(7 * DTLB_LOAD_MISSES.STLB_HIT + DTLB_LOAD_MISSES.WALK_DURATION) / CLKS",
- "MetricGroup": "MemoryTLB;TopdownL4;tma_l1_bound_group",
- "MetricName": "tma_dtlb_load",
- "PublicDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Translation Look-aside Buffers) are processor caches for recently used entries out of the Page Tables that are used to map virtual- to physical-addresses by the operating system. This metric approximates the potential delay of demand loads missing the first-level data TLB (assuming worst case scenario with back to back misses to different pages). This includes hitting in the second-level TLB (STLB) as well as performing a hardware page walk on an STLB miss. Sample with: MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
- "MetricExpr": "MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS) * CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS",
- "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_memory_bound_group",
- "MetricName": "tma_l3_bound",
- "PublicDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core. Avoiding cache misses (i.e. L2 misses/L3 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_UOPS_RETIRED.L3_HIT_PS",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
- "MetricExpr": "(1 - MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS",
- "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_memory_bound_group",
- "MetricName": "tma_dram_bound",
- "PublicDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads. Better caching can improve the latency and increase performance. Sample with: MEM_LOAD_UOPS_RETIRED.L3_MISS_PS",
- "ScaleUnit": "100%"
- },
- {
- "BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM)",
- "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=6@) / CLKS",
- "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_dram_bound_group",
- "MetricName": "tma_mem_bandwidth",
- "PublicDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM). The underlying heuristic assumes that a similar off-core traffic is generated by all IA cores. This metric does not aggregate non-data-read requests by this logical processor; requests from other IA Logical Processors/Physical Cores/sockets; or other non-IA devices like GPU; hence the maximum external memory bandwidth limits may or may not be approached when this metric is flagged (see Uncore counters for that).",
+ "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+ "MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / 2 if #SMT_on else INT_MISC.RECOVERY_CYCLES)) / tma_info_slots",
+ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricName": "tma_bad_speculation",
+ "MetricThreshold": "tma_bad_speculation > 0.15",
+ "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM)",
- "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / CLKS - tma_mem_bandwidth",
- "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_dram_bound_group",
- "MetricName": "tma_mem_latency",
- "PublicDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM). This metric does not aggregate requests from other Logical Processors/Physical Cores/sockets (see Uncore counters for that).",
+ "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation",
+ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
+ "MetricName": "tma_branch_mispredicts",
+ "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
+ "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_branch_misprediction_cost, tma_mispredicts_resteers",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write",
- "MetricExpr": "RESOURCE_STALLS.SB / CLKS",
- "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_memory_bound_group",
- "MetricName": "tma_store_bound",
- "PublicDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write. Even though store accesses do not typically stall out-of-order CPUs; there are few cases where stores can lead to actual stalls. This metric will be flagged should RFO stores be a bottleneck. Sample with: MEM_UOPS_RETIRED.ALL_STORES_PS",
+ "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers",
+ "MetricExpr": "12 * (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY) / tma_info_clks",
+ "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group",
+ "MetricName": "tma_branch_resteers",
+ "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+ "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers. Branch Resteers estimates the Frontend delay in fetching operations from corrected path; following all sorts of miss-predicted branches. For example; branchy code with lots of miss-predictions might get categorized under Branch Resteers. Note the value of this node may overlap with its siblings. Sample with: BR_MISP_RETIRED.ALL_BRANCHES",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
"MetricExpr": "tma_backend_bound - tma_memory_bound",
- "MetricGroup": "Backend;Compute;TopdownL2;tma_L2_group;tma_backend_bound_group",
+ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
"MetricName": "tma_core_bound",
+ "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2",
"PublicDescription": "This metric represents fraction of slots where Core non-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric represents fraction of cycles where the Divider unit was active",
- "MetricExpr": "ARITH.FPU_DIV_ACTIVE / CORE_CLKS",
- "MetricGroup": "TopdownL3;tma_core_bound_group",
+ "MetricExpr": "ARITH.FPU_DIV_ACTIVE / tma_info_core_clks",
+ "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group",
"MetricName": "tma_divider",
+ "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
"PublicDescription": "This metric represents fraction of cycles where the Divider unit was active. Divide and square root instructions are performed by the Divider unit and can take considerably longer latency than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDER_UOPS",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)",
- "MetricExpr": "((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_DISPATCH) + cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=1@ - cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=3@ if IPC > 1.8 else (cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=2@ - RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else RESOURCE_STALLS.SB)) - RESOURCE_STALLS.SB - min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_L1D_PENDING)) / CLKS",
- "MetricGroup": "PortsUtil;TopdownL3;tma_core_bound_group",
- "MetricName": "tma_ports_utilization",
- "PublicDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency among contiguous instructions would manifest in this metric - such cases are often referred to as low Instruction Level Parallelism (ILP). (2) Contention on some hardware execution unit other than Divider. For example; when there are too many multiply operations.",
+ "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
+ "MetricConstraint": "NO_GROUP_EVENTS_SMT",
+ "MetricExpr": "(1 - MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * CYCLE_ACTIVITY.STALLS_L2_PENDING / tma_info_clks",
+ "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+ "MetricName": "tma_dram_bound",
+ "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+ "PublicDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads. Better caching can improve the latency and increase performance. Sample with: MEM_LOAD_UOPS_RETIRED.L3_MISS_PS",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
- "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / SLOTS",
- "MetricGroup": "TopdownL1;tma_L1_group",
- "MetricName": "tma_retiring",
- "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
+ "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines",
+ "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / tma_info_clks",
+ "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB",
+ "MetricName": "tma_dsb_switches",
+ "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+ "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Related metrics: tma_fetch_bandwidth, tma_info_dsb_coverage, tma_lcp",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)",
- "MetricExpr": "tma_retiring - tma_heavy_operations",
- "MetricGroup": "Retire;TopdownL2;tma_L2_group;tma_retiring_group",
- "MetricName": "tma_light_operations",
- "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
+ "BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses",
+ "MetricExpr": "(7 * DTLB_LOAD_MISSES.STLB_HIT + DTLB_LOAD_MISSES.WALK_DURATION) / tma_info_clks",
+ "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group",
+ "MetricName": "tma_dtlb_load",
+ "MetricThreshold": "tma_dtlb_load > 0.1",
+ "PublicDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Translation Look-aside Buffers) are processor caches for recently used entries out of the Page Tables that are used to map virtual- to physical-addresses by the operating system. This metric approximates the potential delay of demand loads missing the first-level data TLB (assuming worst case scenario with back to back misses to different pages). This includes hitting in the second-level TLB (STLB) as well as performing a hardware page walk on an STLB miss. Sample with: MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS. Related metrics: tma_dtlb_store",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
+ "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
+ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB",
+ "MetricName": "tma_fetch_bandwidth",
+ "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_ipc / 4 > 0.35",
+ "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues",
+ "MetricExpr": "4 * min(CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE) / tma_info_slots",
+ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group",
+ "MetricName": "tma_fetch_latency",
+ "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15",
+ "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
"MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
- "MetricGroup": "HPC;TopdownL3;tma_light_operations_group",
+ "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
"MetricName": "tma_fp_arith",
+ "MetricThreshold": "tma_fp_arith > 0.2 & tma_light_operations > 0.6",
"PublicDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may exceed its parent due to use of \"Uops\" CountDomain and FMA double-counting.",
"ScaleUnit": "100%"
},
- {
- "BriefDescription": "This metric serves as an approximation of legacy x87 usage",
- "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS * FP_COMP_OPS_EXE.X87 / UOPS_DISPATCHED.THREAD",
- "MetricGroup": "Compute;TopdownL4;tma_fp_arith_group",
- "MetricName": "tma_x87_use",
- "PublicDescription": "This metric serves as an approximation of legacy x87 usage. It accounts for instructions beyond X87 FP arithmetic operations; hence may be used as a thermometer to avoid X87 high usage and preferably upgrade to modern ISA. See Tip under Tuning Hint.",
- "ScaleUnit": "100%"
- },
{
"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired",
"MetricExpr": "(FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE) / UOPS_DISPATCHED.THREAD",
- "MetricGroup": "Compute;Flops;TopdownL4;tma_fp_arith_group",
+ "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P",
"MetricName": "tma_fp_scalar",
- "PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired. May overcount due to FMA double counting.",
+ "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)",
+ "PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired. May overcount due to FMA double counting. Related metrics: tma_fp_vector, tma_fp_vector_512b, tma_port_6, tma_ports_utilized_2",
"ScaleUnit": "100%"
},
{
"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths",
"MetricExpr": "(FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE) / UOPS_DISPATCHED.THREAD",
- "MetricGroup": "Compute;Flops;TopdownL4;tma_fp_arith_group",
+ "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P",
"MetricName": "tma_fp_vector",
- "PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths. May overcount due to FMA double counting.",
+ "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)",
+ "PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths. May overcount due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector_512b, tma_port_6, tma_ports_utilized_2",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+ "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / tma_info_slots",
+ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+ "MetricName": "tma_frontend_bound",
+ "MetricThreshold": "tma_frontend_bound > 0.15",
+ "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or microcoded sequences",
+ "BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences",
"MetricExpr": "tma_microcode_sequencer",
- "MetricGroup": "Retire;TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
"MetricName": "tma_heavy_operations",
- "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or microcoded sequences. This highly-correlates with the uop length of these instructions/sequences.",
+ "MetricThreshold": "tma_heavy_operations > 0.1",
+ "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit",
- "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / UOPS_ISSUED.ANY * IDQ.MS_UOPS / SLOTS",
- "MetricGroup": "MicroSeq;TopdownL3;tma_heavy_operations_group",
- "MetricName": "tma_microcode_sequencer",
- "PublicDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit. The MS is used for CISC instructions not supported by the default decoders (like repeat move strings; or CPUID); or by microcode assists used to address some operation modes (like in Floating Point assists). These cases can often be avoided. Sample with: IDQ.MS_UOPS",
- "ScaleUnit": "100%"
+ "BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
+ "MetricExpr": "tma_info_turbo_utilization * TSC / 1e9 / duration_time",
+ "MetricGroup": "Power;Summary",
+ "MetricName": "tma_info_average_frequency"
},
{
- "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
- "MetricExpr": "INST_RETIRED.ANY / CLKS",
- "MetricGroup": "Ret;Summary",
- "MetricName": "IPC"
+ "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
+ "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
+ "MetricGroup": "Pipeline",
+ "MetricName": "tma_info_clks"
},
{
- "BriefDescription": "Uops Per Instruction",
- "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
- "MetricGroup": "Pipeline;Ret;Retire",
- "MetricName": "UPI"
+ "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
+ "MetricExpr": "(CPU_CLK_UNHALTED.THREAD / 2 * (1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK) if #core_wide < 1 else (CPU_CLK_UNHALTED.THREAD_ANY / 2 if #SMT_on else tma_info_clks))",
+ "MetricGroup": "SMT",
+ "MetricName": "tma_info_core_clks"
+ },
+ {
+ "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
+ "MetricExpr": "INST_RETIRED.ANY / tma_info_core_clks",
+ "MetricGroup": "Ret;SMT;TmaL1;tma_L1_group",
+ "MetricName": "tma_info_coreipc"
},
{
"BriefDescription": "Cycles Per Instruction (per Logical Processor)",
- "MetricExpr": "1 / IPC",
+ "MetricExpr": "1 / tma_info_ipc",
"MetricGroup": "Mem;Pipeline",
- "MetricName": "CPI"
+ "MetricName": "tma_info_cpi"
},
{
- "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
- "MetricGroup": "Pipeline",
- "MetricName": "CLKS"
+ "BriefDescription": "Average CPU Utilization",
+ "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
+ "MetricGroup": "HPC;Summary",
+ "MetricName": "tma_info_cpu_utilization"
},
{
- "BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)",
- "MetricExpr": "4 * CORE_CLKS",
- "MetricGroup": "tma_L1_group",
- "MetricName": "SLOTS"
+ "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
+ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_REQUESTS.ALL) / 1e6 / duration_time / 1e3",
+ "MetricGroup": "HPC;Mem;MemoryBW;SoC;tma_issueBW",
+ "MetricName": "tma_info_dram_bw_use",
+ "PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Related metrics: tma_mem_bandwidth"
+ },
+ {
+ "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
+ "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
+ "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB",
+ "MetricName": "tma_info_dsb_coverage",
+ "MetricThreshold": "tma_info_dsb_coverage < 0.7 & tma_info_ipc / 4 > 0.35",
+ "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache). Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_lcp"
},
{
"BriefDescription": "The ratio of Executed- by Issued-Uops",
"MetricExpr": "UOPS_DISPATCHED.THREAD / UOPS_ISSUED.ANY",
"MetricGroup": "Cor;Pipeline",
- "MetricName": "Execute_per_Issue",
+ "MetricName": "tma_info_execute_per_issue",
"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggest high rate of \"execute\" at rename stage."
},
- {
- "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
- "MetricExpr": "INST_RETIRED.ANY / CORE_CLKS",
- "MetricGroup": "Ret;SMT;tma_L1_group",
- "MetricName": "CoreIPC"
- },
{
"BriefDescription": "Floating Point Operations Per Cycle",
- "MetricExpr": "(FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * (FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE) + 8 * SIMD_FP_256.PACKED_SINGLE) / CORE_CLKS",
+ "MetricExpr": "(FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * (FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE) + 8 * SIMD_FP_256.PACKED_SINGLE) / tma_info_core_clks",
"MetricGroup": "Flops;Ret",
- "MetricName": "FLOPc"
+ "MetricName": "tma_info_flopc"
+ },
+ {
+ "BriefDescription": "Giga Floating Point Operations Per Second",
+ "MetricExpr": "(FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * (FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE) + 8 * SIMD_FP_256.PACKED_SINGLE) / 1e9 / duration_time",
+ "MetricGroup": "Cor;Flops;HPC",
+ "MetricName": "tma_info_gflops",
+ "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
},
{
"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
"MetricExpr": "UOPS_DISPATCHED.THREAD / (cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)",
"MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
- "MetricName": "ILP"
+ "MetricName": "tma_info_ilp"
},
{
- "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
- "MetricExpr": "(CPU_CLK_UNHALTED.THREAD / 2 * (1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK) if #core_wide < 1 else (CPU_CLK_UNHALTED.THREAD_ANY / 2 if #SMT_on else CLKS))",
- "MetricGroup": "SMT",
- "MetricName": "CORE_CLKS"
+ "BriefDescription": "Total number of retired Instructions",
+ "MetricExpr": "INST_RETIRED.ANY",
+ "MetricGroup": "Summary;TmaL1;tma_L1_group",
+ "MetricName": "tma_info_instructions",
+ "PublicDescription": "Total number of retired Instructions. Sample with: INST_RETIRED.PREC_DIST"
},
{
- "BriefDescription": "Total number of retired Instructions Sample with: INST_RETIRED.PREC_DIST",
- "MetricExpr": "INST_RETIRED.ANY",
- "MetricGroup": "Summary;tma_L1_group",
- "MetricName": "Instructions"
+ "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
+ "MetricExpr": "INST_RETIRED.ANY / tma_info_clks",
+ "MetricGroup": "Ret;Summary",
+ "MetricName": "tma_info_ipc"
},
{
- "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
- "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
- "MetricGroup": "Pipeline;Ret",
- "MetricName": "Retire"
+ "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
+ "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
+ "MetricGroup": "Branches;OS",
+ "MetricName": "tma_info_ipfarbranch",
+ "MetricThreshold": "tma_info_ipfarbranch < 1e6"
},
{
- "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
- "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
- "MetricGroup": "DSB;Fed;FetchBW",
- "MetricName": "DSB_Coverage"
+ "BriefDescription": "Cycles Per Instruction for the Operating System (OS) Kernel mode",
+ "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k",
+ "MetricGroup": "OS",
+ "MetricName": "tma_info_kernel_cpi"
},
{
- "BriefDescription": "Average CPU Utilization",
- "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
- "MetricGroup": "HPC;Summary",
- "MetricName": "CPU_Utilization"
+ "BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
+ "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
+ "MetricGroup": "OS",
+ "MetricName": "tma_info_kernel_utilization",
+ "MetricThreshold": "tma_info_kernel_utilization > 0.05"
},
{
- "BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
- "MetricExpr": "Turbo_Utilization * TSC / 1e9 / duration_time",
- "MetricGroup": "Power;Summary",
- "MetricName": "Average_Frequency"
+ "BriefDescription": "Average number of parallel requests to external memory",
+ "MetricExpr": "UNC_ARB_TRK_OCCUPANCY.ALL / UNC_ARB_TRK_OCCUPANCY.CYCLES_WITH_ANY_REQUEST",
+ "MetricGroup": "Mem;SoC",
+ "MetricName": "tma_info_mem_parallel_requests",
+ "PublicDescription": "Average number of parallel requests to external memory. Accounts for all requests"
},
{
- "BriefDescription": "Giga Floating Point Operations Per Second",
- "MetricExpr": "(FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * (FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE) + 8 * SIMD_FP_256.PACKED_SINGLE) / 1e9 / duration_time",
- "MetricGroup": "Cor;Flops;HPC",
- "MetricName": "GFLOPs",
- "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
+ "BriefDescription": "Average latency of all requests to external memory (in Uncore cycles)",
+ "MetricExpr": "UNC_ARB_TRK_OCCUPANCY.ALL / UNC_ARB_TRK_REQUESTS.ALL",
+ "MetricGroup": "Mem;SoC",
+ "MetricName": "tma_info_mem_request_latency"
},
{
- "BriefDescription": "Average Frequency Utilization relative nominal frequency",
- "MetricExpr": "CLKS / CPU_CLK_UNHALTED.REF_TSC",
- "MetricGroup": "Power",
- "MetricName": "Turbo_Utilization"
+ "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+ "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
+ "MetricGroup": "Pipeline;Ret",
+ "MetricName": "tma_info_retire"
+ },
+ {
+ "BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)",
+ "MetricExpr": "4 * tma_info_core_clks",
+ "MetricGroup": "TmaL1;tma_L1_group",
+ "MetricName": "tma_info_slots"
},
{
"BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #SMT_on else 0)",
"MetricGroup": "SMT",
- "MetricName": "SMT_2T_Utilization"
+ "MetricName": "tma_info_smt_2t_utilization"
},
{
- "BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
- "MetricGroup": "OS",
- "MetricName": "Kernel_Utilization"
+ "BriefDescription": "Socket actual clocks when any core is active on that socket",
+ "MetricExpr": "UNC_CLOCK.SOCKET",
+ "MetricGroup": "SoC",
+ "MetricName": "tma_info_socket_clks"
},
{
- "BriefDescription": "Cycles Per Instruction for the Operating System (OS) Kernel mode",
- "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k",
- "MetricGroup": "OS",
- "MetricName": "Kernel_CPI"
+ "BriefDescription": "Average Frequency Utilization relative nominal frequency",
+ "MetricExpr": "tma_info_clks / CPU_CLK_UNHALTED.REF_TSC",
+ "MetricGroup": "Power",
+ "MetricName": "tma_info_turbo_utilization"
},
{
- "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
- "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_REQUESTS.ALL) / 1e6 / duration_time / 1e3",
- "MetricGroup": "HPC;Mem;MemoryBW;SoC",
- "MetricName": "DRAM_BW_Use"
+ "BriefDescription": "Uops Per Instruction",
+ "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
+ "MetricGroup": "Pipeline;Ret;Retire",
+ "MetricName": "tma_info_uoppi",
+ "MetricThreshold": "tma_info_uoppi > 1.05"
},
{
- "BriefDescription": "Average latency of all requests to external memory (in Uncore cycles)",
- "MetricExpr": "MEM_Parallel_Requests",
- "MetricGroup": "Mem;SoC",
- "MetricName": "MEM_Request_Latency"
+ "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses",
+ "MetricExpr": "(12 * ITLB_MISSES.STLB_HIT + ITLB_MISSES.WALK_DURATION) / tma_info_clks",
+ "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group",
+ "MetricName": "tma_itlb_misses",
+ "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+ "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: ITLB_MISSES.WALK_COMPLETED",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Average number of parallel requests to external memory. Accounts for all requests",
- "MetricExpr": "UNC_ARB_TRK_OCCUPANCY.ALL / UNC_ARB_TRK_REQUESTS.ALL",
- "MetricGroup": "Mem;SoC",
- "MetricName": "MEM_Parallel_Requests"
+ "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
+ "MetricConstraint": "NO_GROUP_EVENTS_SMT",
+ "MetricExpr": "MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS) * CYCLE_ACTIVITY.STALLS_L2_PENDING / tma_info_clks",
+ "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+ "MetricName": "tma_l3_bound",
+ "MetricThreshold": "tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+ "PublicDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core. Avoiding cache misses (i.e. L2 misses/L3 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_UOPS_RETIRED.L3_HIT_PS",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Socket actual clocks when any core is active on that socket",
- "MetricExpr": "UNC_CLOCK.SOCKET",
- "MetricGroup": "SoC",
- "MetricName": "Socket_CLKS"
+ "BriefDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)",
+ "MetricExpr": "ILD_STALL.LCP / tma_info_clks",
+ "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB",
+ "MetricName": "tma_lcp",
+ "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+ "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_dsb_coverage",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
- "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
- "MetricGroup": "Branches;OS",
- "MetricName": "IpFarBranch"
+ "BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)",
+ "MetricExpr": "tma_retiring - tma_heavy_operations",
+ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
+ "MetricName": "tma_light_operations",
+ "MetricThreshold": "tma_light_operations > 0.6",
+ "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "Uncore frequency per die [GHZ]",
- "MetricExpr": "Socket_CLKS / #num_dies / duration_time / 1e9",
- "MetricGroup": "SoC",
- "MetricName": "UNCORE_FREQ"
+ "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
+ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
+ "MetricName": "tma_machine_clears",
+ "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+ "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
+ "ScaleUnit": "100%"
},
{
- "BriefDescription": "C3 residency percent per core",
- "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C3_Core_Residency",
+ "BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM)",
+ "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=6@) / tma_info_clks",
+ "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW",
+ "MetricName": "tma_mem_bandwidth",
+ "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+ "PublicDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM). The underlying heuristic assumes that a similar off-core traffic is generated by all IA cores. This metric does not aggregate non-data-read requests by this logical processor; requests from other IA Logical Processors/Physical Cores/sockets; or other non-IA devices like GPU; hence the maximum external memory bandwidth limits may or may not be approached when this metric is flagged (see Uncore counters for that). Related metrics: tma_info_dram_bw_use",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C6 residency percent per core",
- "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C6_Core_Residency",
+ "BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM)",
+ "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_clks - tma_mem_bandwidth",
+ "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat",
+ "MetricName": "tma_mem_latency",
+ "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+ "PublicDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM). This metric does not aggregate requests from other Logical Processors/Physical Cores/sockets (see Uncore counters for that). Related metrics: ",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C7 residency percent per core",
- "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C7_Core_Residency",
+ "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_L1D_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_DISPATCH) + cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=1@ - (cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=3@ if tma_info_ipc > 1.8 else cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) * tma_backend_bound",
+ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+ "MetricName": "tma_memory_bound",
+ "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+ "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C2 residency percent per package",
- "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C2_Pkg_Residency",
+ "BriefDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit",
+ "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / UOPS_ISSUED.ANY * IDQ.MS_UOPS / tma_info_slots",
+ "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operations_group;tma_issueMC;tma_issueMS",
+ "MetricName": "tma_microcode_sequencer",
+ "MetricThreshold": "tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1",
+ "PublicDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit. The MS is used for CISC instructions not supported by the default decoders (like repeat move strings; or CPUID); or by microcode assists used to address some operation modes (like in Floating Point assists). These cases can often be avoided. Sample with: IDQ.MS_UOPS. Related metrics: tma_clears_resteers, tma_l1_bound, tma_machine_clears, tma_ms_switches",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C3 residency percent per package",
- "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C3_Pkg_Residency",
+ "BriefDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)",
+ "MetricExpr": "3 * IDQ.MS_SWITCHES / tma_info_clks",
+ "MetricGroup": "FetchLat;MicroSeq;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueMC;tma_issueMS;tma_issueMV;tma_issueSO",
+ "MetricName": "tma_ms_switches",
+ "MetricThreshold": "tma_ms_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+ "PublicDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS). Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled natively by the execution pipeline; and must be performed by microcode (small programs injected into the execution stream). Switching to the MS too often can negatively impact performance. The MS is designated to deliver long uop flows required by CISC instructions like CPUID; or uncommon conditions like Floating Point Assists when dealing with Denormals. Sample with: IDQ.MS_SWITCHES. Related metrics: tma_clears_resteers, tma_l1_bound, tma_machine_clears, tma_microcode_sequencer, tma_mixing_vectors, tma_serializing_operation",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C6 residency percent per package",
- "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C6_Pkg_Residency",
+ "BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)",
+ "MetricConstraint": "NO_GROUP_EVENTS",
+ "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_DISPATCH) + cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=1@ - (cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=3@ if tma_info_ipc > 1.8 else cpu@UOPS_DISPATCHED.THREAD\\,cmask\\=2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB - RESOURCE_STALLS.SB - min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_L1D_PENDING)) / tma_info_clks",
+ "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group",
+ "MetricName": "tma_ports_utilization",
+ "MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
+ "PublicDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency among contiguous instructions would manifest in this metric - such cases are often referred to as low Instruction Level Parallelism (ILP). (2) Contention on some hardware execution unit other than Divider. For example; when there are too many multiply operations.",
"ScaleUnit": "100%"
},
{
- "BriefDescription": "C7 residency percent per package",
- "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
- "MetricGroup": "Power",
- "MetricName": "C7_Pkg_Residency",
+ "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+ "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / tma_info_slots",
+ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+ "MetricName": "tma_retiring",
+ "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+ "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.RETIRE_SLOTS",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write",
+ "MetricExpr": "RESOURCE_STALLS.SB / tma_info_clks",
+ "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+ "MetricName": "tma_store_bound",
+ "MetricThreshold": "tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+ "PublicDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write. Even though store accesses do not typically stall out-of-order CPUs; there are few cases where stores can lead to actual stalls. This metric will be flagged should RFO stores be a bottleneck. Sample with: MEM_UOPS_RETIRED.ALL_STORES_PS",
+ "ScaleUnit": "100%"
+ },
+ {
+ "BriefDescription": "This metric serves as an approximation of legacy x87 usage",
+ "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS * FP_COMP_OPS_EXE.X87 / UOPS_DISPATCHED.THREAD",
+ "MetricGroup": "Compute;TopdownL4;tma_L4_group;tma_fp_arith_group",
+ "MetricName": "tma_x87_use",
+ "MetricThreshold": "tma_x87_use > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)",
+ "PublicDescription": "This metric serves as an approximation of legacy x87 usage. It accounts for instructions beyond X87 FP arithmetic operations; hence may be used as a thermometer to avoid X87 high usage and preferably upgrade to modern ISA. See Tip under Tuning Hint.",
"ScaleUnit": "100%"
}
]
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 27/51] perf vendor events intel: Refresh silvermont events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (12 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 25/51] perf vendor events intel: Refresh sandybridge events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 31/51] perf vendor events intel: Refresh westmereep-dp events Ian Rogers
` (22 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Update the silvermont events from v14 to v15. Generation was done using
https://github.com/intel/perfmon.
The most notable change is in corrections to event descriptions.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
tools/perf/pmu-events/arch/x86/silvermont/frontend.json | 2 +-
tools/perf/pmu-events/arch/x86/silvermont/pipeline.json | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 1f611a7dbdda..d1d40d0f2b2c 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -23,7 +23,7 @@ GenuineIntel-6-1[AEF],v3,nehalemep,core
GenuineIntel-6-2E,v3,nehalemex,core
GenuineIntel-6-2A,v18,sandybridge,core
GenuineIntel-6-(8F|CF),v1.11,sapphirerapids,core
-GenuineIntel-6-(37|4A|4C|4D|5A),v14,silvermont,core
+GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core
GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
GenuineIntel-6-55-[01234],v1.28,skylakex,core
GenuineIntel-6-86,v1.20,snowridgex,core
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/frontend.json b/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
index c35da10f7133..cd6ed3f59e26 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
@@ -11,7 +11,7 @@
"BriefDescription": "Counts the number of JCC baclears",
"EventCode": "0xE6",
"EventName": "BACLEARS.COND",
- "PublicDescription": "The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. The BACLEARS.COND event counts the number of JCC (Jump on Condtional Code) baclears.",
+ "PublicDescription": "The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. The BACLEARS.COND event counts the number of JCC (Jump on Conditional Code) baclears.",
"SampleAfterValue": "200003",
"UMask": "0x10"
},
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
index 59f6116a7eae..2d4214bf9e39 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
@@ -228,7 +228,7 @@
"BriefDescription": "Counts the number of cycles when no uops are allocated, the IQ is empty, and no other condition is blocking allocation.",
"EventCode": "0xCA",
"EventName": "NO_ALLOC_CYCLES.NOT_DELIVERED",
- "PublicDescription": "The NO_ALLOC_CYCLES.NOT_DELIVERED event is used to measure front-end inefficiencies, i.e. when front-end of the machine is not delivering micro-ops to the back-end and the back-end is not stalled. This event can be used to identify if the machine is truly front-end bound. When this event occurs, it is an indication that the front-end of the machine is operating at less than its theoretical peak performance. Background: We can think of the processor pipeline as being divided into 2 broader parts: Front-end and Back-end. Front-end is responsible for fetching the instruction, decoding into micro-ops (uops) in machine understandable format and putting them into a micro-op queue to be consumed by back end. The back-end then takes these micro-ops, allocates the required resources. When all resources are ready, micro-ops are executed. If the back-end is not ready to accept micro-ops from the front-end, then we do not want to count these as front-end bottlenecks. However, whenever we have bottlenecks in the back-end, we will have allocation unit stalls and eventually forcing the front-end to wait until the back-end is ready to receive more UOPS. This event counts the cycles only when back-end is requesting more uops and front-end is not able to provide them. Some examples of conditions that cause front-end efficiencies are: Icache misses, ITLB misses, and decoder restrictions that limit the the front-end bandwidth.",
+ "PublicDescription": "The NO_ALLOC_CYCLES.NOT_DELIVERED event is used to measure front-end inefficiencies, i.e. when front-end of the machine is not delivering micro-ops to the back-end and the back-end is not stalled. This event can be used to identify if the machine is truly front-end bound. When this event occurs, it is an indication that the front-end of the machine is operating at less than its theoretical peak performance. Background: We can think of the processor pipeline as being divided into 2 broader parts: Front-end and Back-end. Front-end is responsible for fetching the instruction, decoding into micro-ops (uops) in machine understandable format and putting them into a micro-op queue to be consumed by back end. The back-end then takes these micro-ops, allocates the required resources. When all resources are ready, micro-ops are executed. If the back-end is not ready to accept micro-ops from the front-end, then we do not want to count these as front-end bottlenecks. However, whenever we have bottlenecks in the back-end, we will have allocation unit stalls and eventually forcing the front-end to wait until the back-end is ready to receive more UOPS. This event counts the cycles only when back-end is requesting more uops and front-end is not able to provide them. Some examples of conditions that cause front-end efficiencies are: Icache misses, ITLB misses, and decoder restrictions that limit the front-end bandwidth.",
"SampleAfterValue": "200003",
"UMask": "0x50"
},
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 31/51] perf vendor events intel: Refresh westmereep-dp events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (13 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 27/51] perf vendor events intel: Refresh silvermont events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 32/51] perf jevents: Add rand support to metrics Ian Rogers
` (21 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Update the westmereep-dp events from v3 to v4. Generation was done
using https://github.com/intel/perfmon.
The most notable change is in corrections to event descriptions.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json | 2 +-
.../perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index bc2c4e756f44..1c6eef118e61 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -28,7 +28,7 @@ GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v54,skylake,core
GenuineIntel-6-55-[01234],v1.29,skylakex,core
GenuineIntel-6-86,v1.20,snowridgex,core
GenuineIntel-6-8[CD],v1.10,tigerlake,core
-GenuineIntel-6-2C,v3,westmereep-dp,core
+GenuineIntel-6-2C,v4,westmereep-dp,core
GenuineIntel-6-25,v3,westmereep-sp,core
GenuineIntel-6-2F,v3,westmereex,core
AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
index 5c897da3cd6b..4dae735fb636 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
@@ -182,7 +182,7 @@
"UMask": "0x20"
},
{
- "BriefDescription": "L2 lines alloacated",
+ "BriefDescription": "L2 lines allocated",
"EventCode": "0xF1",
"EventName": "L2_LINES_IN.ANY",
"SampleAfterValue": "100000",
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
index ef635bff1522..f75084309041 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
@@ -56,7 +56,7 @@
"UMask": "0x80"
},
{
- "BriefDescription": "DTLB misses casued by low part of address",
+ "BriefDescription": "DTLB misses caused by low part of address",
"EventCode": "0x49",
"EventName": "DTLB_MISSES.PDE_MISS",
"SampleAfterValue": "200000",
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 32/51] perf jevents: Add rand support to metrics
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (14 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 31/51] perf vendor events intel: Refresh westmereep-dp events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 33/51] perf jevent: Parse metric thresholds Ian Rogers
` (20 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
rand (reverse and, python's __rand__) is needed when parsing metric
thresholds where the left operand of '&' is a plain number rather than
an Expression. Update the documentation on operator precedence to
clarify how the simple expression parser differs from python wrt
binary/logical operators.
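For illustration, a minimal sketch of the reflected operator (the class
here is illustrative, not the full metric.py Expression):

class Expression:
    def __and__(self, other):
        return ('&', self, other)

    def __rand__(self, other):
        # Reached when the left operand is a plain int/float:
        # int.__and__ returns NotImplemented for an Expression, so
        # python retries the reflected operation on the right operand.
        return ('&', other, self)

print(1 & Expression())  # falls back to Expression.__rand__(1)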
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/metric.py | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index 77ea6ff98538..8ec0ba884673 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -44,6 +44,9 @@ class Expression:
def __and__(self, other: Union[int, float, 'Expression']) -> 'Operator':
return Operator('&', self, other)
+ def __rand__(self, other: Union[int, float, 'Expression']) -> 'Operator':
+ return Operator('&', other, self)
+
def __lt__(self, other: Union[int, float, 'Expression']) -> 'Operator':
return Operator('<', self, other)
@@ -88,7 +91,10 @@ def _Constify(val: Union[bool, int, float, Expression]) -> Expression:
# Simple lookup for operator precedence, used to avoid unnecessary
-# brackets. Precedence matches that of python and the simple expression parser.
+# brackets. Precedence matches that of the simple expression parser
+# but differs from python where comparisons are lower precedence than
+# the bitwise &, ^, | but not the logical versions that the expression
+# parser doesn't have.
_PRECEDENCE = {
'|': 0,
'^': 1,
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 33/51] perf jevent: Parse metric thresholds
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (15 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 32/51] perf jevents: Add rand support to metrics Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 34/51] perf pmu-events: Test parsing metric thresholds with the fake PMU Ian Rogers
` (19 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Parse the metric threshold and add it to the pmu-events.c file. The
threshold expression itself isn't parsed and simplified the way
MetricExpr is, as that path goes through python's parser, which would
apply the wrong operator precedence.
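The precedence clash is easy to see with python's ast module (the
threshold string below is just a representative example):

import ast

# Python binds '&' tighter than comparisons, so this parses as a
# chained comparison over (0.1 & tma_frontend_bound), not as
# (a > 0.1) & (b > 0.15) the way perf's expression parser reads it.
tree = ast.parse('tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15',
                 mode='eval')
print(ast.dump(tree.body))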
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/pmu-events/jevents.py | 7 ++++++-
tools/perf/pmu-events/pmu-events.h | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index e82dff3a1228..40b9e626fc15 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -51,7 +51,7 @@ _json_event_attributes = [
# Attributes that are in pmu_metric rather than pmu_event.
_json_metric_attributes = [
- 'metric_name', 'metric_group', 'metric_expr', 'desc',
+ 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold', 'desc',
'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping'
]
# Attributes that are bools or enum int values, encoded as '0', '1',...
@@ -306,6 +306,9 @@ class JsonEvent:
self.metric_expr = None
if 'MetricExpr' in jd:
self.metric_expr = metric.ParsePerfJson(jd['MetricExpr']).Simplify()
+ # Note, the metric formula for the threshold isn't parsed as the &
+ # and > have incorrect precedence.
+ self.metric_threshold = jd.get('MetricThreshold')
arch_std = jd.get('ArchStdEvent')
if precise and self.desc and '(Precise Event)' not in self.desc:
@@ -362,6 +365,8 @@ class JsonEvent:
# Convert parsed metric expressions into a string. Slashes
# must be doubled in the file.
x = x.ToPerfJson().replace('\\', '\\\\')
+ if metric and x and attr == 'metric_threshold':
+ x = x.replace('\\', '\\\\')
if attr in _json_enum_attributes:
s += x if x else '0'
else:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 57a38e3e5c32..b7dff8f1021f 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -54,6 +54,7 @@ struct pmu_metric {
const char *metric_name;
const char *metric_group;
const char *metric_expr;
+ const char *metric_threshold;
const char *unit;
const char *compat;
const char *desc;
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 34/51] perf pmu-events: Test parsing metric thresholds with the fake PMU
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (16 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 33/51] perf jevent: Parse metric thresholds Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 35/51] perf list: Support for printing metric thresholds Ian Rogers
` (18 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Test the correctness of metric thresholds by parsing them all with the
fake PMU logic.
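Once applied, the new case can be run on its own, e.g. (assuming the
usual perf test name filtering):

  $ perf test -v 'Parsing of metric thresholds'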
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/pmu-events.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 521557c396bc..db2fed0c6993 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -1021,12 +1021,34 @@ static int test__parsing_fake(struct test_suite *test __maybe_unused,
return pmu_for_each_sys_metric(test__parsing_fake_callback, NULL);
}
+static int test__parsing_threshold_callback(const struct pmu_metric *pm,
+ const struct pmu_metrics_table *table __maybe_unused,
+ void *data __maybe_unused)
+{
+ if (!pm->metric_threshold)
+ return 0;
+ return metric_parse_fake(pm->metric_name, pm->metric_threshold);
+}
+
+static int test__parsing_threshold(struct test_suite *test __maybe_unused,
+ int subtest __maybe_unused)
+{
+ int err = 0;
+
+ err = pmu_for_each_core_metric(test__parsing_threshold_callback, NULL);
+ if (err)
+ return err;
+
+ return pmu_for_each_sys_metric(test__parsing_threshold_callback, NULL);
+}
+
static struct test_case pmu_events_tests[] = {
TEST_CASE("PMU event table sanity", pmu_event_table),
TEST_CASE("PMU event map aliases", aliases),
TEST_CASE_REASON("Parsing of PMU event table metrics", parsing,
"some metrics failed"),
TEST_CASE("Parsing of PMU event table metrics with fake PMUs", parsing_fake),
+ TEST_CASE("Parsing of metric thresholds with fake PMUs", parsing_threshold),
{ .name = NULL, }
};
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 35/51] perf list: Support for printing metric thresholds
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (17 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 34/51] perf pmu-events: Test parsing metric thresholds with the fake PMU Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 36/51] perf metric: Compute and print threshold values Ian Rogers
` (17 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Add the metric threshold to the json output by default. For regular
output, only print it when the --details flag is given.
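With the patch, a metric entry in the json output gains the new field;
illustrative output (values taken from the ivybridge-style metrics
earlier in the series):

{
        "MetricName": "tma_info_kernel_utilization",
        "MetricGroup": "OS",
        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
        "MetricThreshold": "tma_info_kernel_utilization > 0.05"
}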
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-list.c | 13 ++++++++++++-
tools/perf/util/metricgroup.c | 3 +++
tools/perf/util/print-events.h | 1 +
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 791f513ae5b4..76e1d31a68ee 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -168,6 +168,7 @@ static void default_print_metric(void *ps,
const char *desc,
const char *long_desc,
const char *expr,
+ const char *threshold,
const char *unit __maybe_unused)
{
struct print_state *print_state = ps;
@@ -227,6 +228,11 @@ static void default_print_metric(void *ps,
wordwrap(expr, 8, pager_get_columns(), 0);
printf("]\n");
}
+ if (threshold && print_state->detailed) {
+ printf("%*s", 8, "[");
+ wordwrap(threshold, 8, pager_get_columns(), 0);
+ printf("]\n");
+ }
}
struct json_print_state {
@@ -367,7 +373,7 @@ static void json_print_event(void *ps, const char *pmu_name, const char *topic,
static void json_print_metric(void *ps __maybe_unused, const char *group,
const char *name, const char *desc,
const char *long_desc, const char *expr,
- const char *unit)
+ const char *threshold, const char *unit)
{
struct json_print_state *print_state = ps;
bool need_sep = false;
@@ -388,6 +394,11 @@ static void json_print_metric(void *ps __maybe_unused, const char *group,
fix_escape_printf(&buf, "%s\t\"MetricExpr\": \"%S\"", need_sep ? ",\n" : "", expr);
need_sep = true;
}
+ if (threshold) {
+ fix_escape_printf(&buf, "%s\t\"MetricThreshold\": \"%S\"", need_sep ? ",\n" : "",
+ threshold);
+ need_sep = true;
+ }
if (unit) {
fix_escape_printf(&buf, "%s\t\"ScaleUnit\": \"%S\"", need_sep ? ",\n" : "", unit);
need_sep = true;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 868fc9c35606..b1d56a73223d 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -368,6 +368,7 @@ struct mep {
const char *metric_desc;
const char *metric_long_desc;
const char *metric_expr;
+ const char *metric_threshold;
const char *metric_unit;
};
@@ -447,6 +448,7 @@ static int metricgroup__add_to_mep_groups(const struct pmu_metric *pm,
me->metric_desc = pm->desc;
me->metric_long_desc = pm->long_desc;
me->metric_expr = pm->metric_expr;
+ me->metric_threshold = pm->metric_threshold;
me->metric_unit = pm->unit;
}
}
@@ -522,6 +524,7 @@ void metricgroup__print(const struct print_callbacks *print_cb, void *print_stat
me->metric_desc,
me->metric_long_desc,
me->metric_expr,
+ me->metric_threshold,
me->metric_unit);
next = rb_next(node);
rblist__remove_node(&groups, node);
diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
index 716dcf4b4859..e75a3d7e3fe3 100644
--- a/tools/perf/util/print-events.h
+++ b/tools/perf/util/print-events.h
@@ -23,6 +23,7 @@ struct print_callbacks {
const char *desc,
const char *long_desc,
const char *expr,
+ const char *threshold,
const char *unit);
};
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 36/51] perf metric: Compute and print threshold values
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (18 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 35/51] perf list: Support for printing metric thresholds Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 37/51] perf expr: More explicit NAN handling Ian Rogers
` (16 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Compute the threshold metric and use it to color the metric value as
red or green. The threshold expression is used to generate the set of
events as the threshold may require additional events. A later patch
makes this behavior optional with a --metric-no-threshold flag.
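The color decision itself is small; a python sketch mirroring the
generic_metric() logic below (fpclassify(x) == FP_ZERO corresponds to
x == 0.0):

import math

def threshold_color(threshold):
    # A successfully computed threshold expression evaluating to zero
    # means the metric is within bounds (green); any other value,
    # including NaN, flags it (red).
    return 'green' if threshold == 0.0 else 'red'

print(threshold_color(0.0))       # green
print(threshold_color(0.3))       # red
print(threshold_color(math.nan))  # red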
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++---
tools/perf/util/metricgroup.h | 1 +
tools/perf/util/stat-shadow.c | 24 ++++++++++++++++--------
3 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index b1d56a73223d..d83885697125 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -129,6 +129,8 @@ struct metric {
const char *modifier;
/** The expression to parse, for example, "instructions/cycles". */
const char *metric_expr;
+ /** Optional threshold expression where zero value is green, otherwise red. */
+ const char *metric_threshold;
/**
* The "ScaleUnit" that scales and adds a unit to the metric during
* output.
@@ -222,6 +224,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,
goto out_err;
}
m->metric_expr = pm->metric_expr;
+ m->metric_threshold = pm->metric_threshold;
m->metric_unit = pm->unit;
m->pctx->sctx.user_requested_cpu_list = NULL;
if (user_requested_cpu_list) {
@@ -901,6 +904,7 @@ static int __add_metric(struct list_head *metric_list,
const struct visited_metric *vm;
int ret;
bool is_root = !root_metric;
+ const char *expr;
struct visited_metric visited_node = {
.name = pm->metric_name,
.parent = visited,
@@ -963,16 +967,29 @@ static int __add_metric(struct list_head *metric_list,
* For both the parent and referenced metrics, we parse
* all the metric's IDs and add it to the root context.
*/
- if (expr__find_ids(pm->metric_expr, NULL, root_metric->pctx) < 0) {
+ ret = 0;
+ expr = pm->metric_expr;
+ if (is_root && pm->metric_threshold) {
+ /*
+ * Threshold expressions are built off the actual metric. Switch
+ * to use that in case of additional necessary events. Change
+ * the visited node name to avoid this being flagged as
+ * recursion.
+ */
+ assert(strstr(pm->metric_threshold, pm->metric_name));
+ expr = pm->metric_threshold;
+ visited_node.name = "__threshold__";
+ }
+ if (expr__find_ids(expr, NULL, root_metric->pctx) < 0) {
/* Broken metric. */
ret = -EINVAL;
- } else {
+ }
+ if (!ret) {
/* Resolve referenced metrics. */
ret = resolve_metric(metric_list, modifier, metric_no_group,
user_requested_cpu_list, system_wide,
root_metric, &visited_node, table);
}
-
if (ret) {
if (is_root)
metric__free(root_metric);
@@ -1554,6 +1571,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
free(metric_events);
goto out;
}
+ expr->metric_threshold = m->metric_threshold;
expr->metric_unit = m->metric_unit;
expr->metric_events = metric_events;
expr->runtime = m->pctx->sctx.runtime;
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 84030321a057..32eb3a5381fb 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -47,6 +47,7 @@ struct metric_expr {
const char *metric_expr;
/** The name of the meric such as "IPC". */
const char *metric_name;
+ const char *metric_threshold;
/**
* The "ScaleUnit" that scales and adds a unit to the metric during
* output. For example, "6.4e-05MiB" means to scale the resulting metric
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 806b32156459..a41f186c6ec8 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -777,6 +777,7 @@ static int prepare_metric(struct evsel **metric_events,
static void generic_metric(struct perf_stat_config *config,
const char *metric_expr,
+ const char *metric_threshold,
struct evsel **metric_events,
struct metric_ref *metric_refs,
char *name,
@@ -789,9 +790,10 @@ static void generic_metric(struct perf_stat_config *config,
{
print_metric_t print_metric = out->print_metric;
struct expr_parse_ctx *pctx;
- double ratio, scale;
+ double ratio, scale, threshold;
int i;
void *ctxp = out->ctx;
+ const char *color = NULL;
pctx = expr__ctx_new();
if (!pctx)
@@ -811,6 +813,12 @@ static void generic_metric(struct perf_stat_config *config,
char *unit;
char metric_bf[64];
+ if (metric_threshold &&
+ expr__parse(&threshold, pctx, metric_threshold) == 0) {
+ color = fpclassify(threshold) == FP_ZERO
+ ? PERF_COLOR_GREEN : PERF_COLOR_RED;
+ }
+
if (metric_unit && metric_name) {
if (perf_pmu__convert_scale(metric_unit,
&unit, &scale) >= 0) {
@@ -823,22 +831,22 @@ static void generic_metric(struct perf_stat_config *config,
scnprintf(metric_bf, sizeof(metric_bf),
"%s %s", unit, metric_name);
- print_metric(config, ctxp, NULL, "%8.1f",
+ print_metric(config, ctxp, color, "%8.1f",
metric_bf, ratio);
} else {
- print_metric(config, ctxp, NULL, "%8.2f",
+ print_metric(config, ctxp, color, "%8.2f",
metric_name ?
metric_name :
out->force_header ? name : "",
ratio);
}
} else {
- print_metric(config, ctxp, NULL, NULL,
+ print_metric(config, ctxp, color, /*unit=*/NULL,
out->force_header ?
(metric_name ? metric_name : name) : "", 0);
}
} else {
- print_metric(config, ctxp, NULL, NULL,
+ print_metric(config, ctxp, color, /*unit=*/NULL,
out->force_header ?
(metric_name ? metric_name : name) : "", 0);
}
@@ -1214,9 +1222,9 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
list_for_each_entry (mexp, &me->head, nd) {
if (num++ > 0)
out->new_line(config, ctxp);
- generic_metric(config, mexp->metric_expr, mexp->metric_events,
- mexp->metric_refs, evsel->name, mexp->metric_name,
- mexp->metric_unit, mexp->runtime,
+ generic_metric(config, mexp->metric_expr, mexp->metric_threshold,
+ mexp->metric_events, mexp->metric_refs, evsel->name,
+ mexp->metric_name, mexp->metric_unit, mexp->runtime,
map_idx, out, st);
}
}
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 37/51] perf expr: More explicit NAN handling
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (19 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 36/51] perf metric: Compute and print threshold values Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 38/51] perf metric: Add --metric-no-threshold option Ian Rogers
` (15 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Comparison and logical operations on NAN won't ensure the result is
NAN. Ensure NANs are propagated so that threshold expressions like
"tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15" don't yield a
number when the components are NAN.
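The underlying issue is easy to demonstrate (python shown for brevity;
C comparisons on NAN behave the same way, quietly yielding 0):

import math

a = b = math.nan                 # metrics that could not be computed
print((a > 0.1) & (b > 0.15))    # False - a definite answer from NaNs
# With the isnan() guards in expr.y the result stays NaN instead:
val = math.nan if math.isnan(a) or math.isnan(b) else \
      float((a > 0.1) & (b > 0.15))
print(val)                       # nan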
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/expr.y | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y
index 635e562350c5..250e444bf032 100644
--- a/tools/perf/util/expr.y
+++ b/tools/perf/util/expr.y
@@ -127,7 +127,11 @@ static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,
if (!compute_ids || (is_const(LHS.val) && is_const(RHS.val))) { \
assert(LHS.ids == NULL); \
assert(RHS.ids == NULL); \
- RESULT.val = (long)LHS.val OP (long)RHS.val; \
+ if (isnan(LHS.val) || isnan(RHS.val)) { \
+ RESULT.val = NAN; \
+ } else { \
+ RESULT.val = (long)LHS.val OP (long)RHS.val; \
+ } \
RESULT.ids = NULL; \
} else { \
RESULT = union_expr(LHS, RHS); \
@@ -137,7 +141,11 @@ static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,
if (!compute_ids || (is_const(LHS.val) && is_const(RHS.val))) { \
assert(LHS.ids == NULL); \
assert(RHS.ids == NULL); \
- RESULT.val = LHS.val OP RHS.val; \
+ if (isnan(LHS.val) || isnan(RHS.val)) { \
+ RESULT.val = NAN; \
+ } else { \
+ RESULT.val = LHS.val OP RHS.val; \
+ } \
RESULT.ids = NULL; \
} else { \
RESULT = union_expr(LHS, RHS); \
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 38/51] perf metric: Add --metric-no-threshold option
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (20 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 37/51] perf expr: More explicit NAN handling Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present Ian Rogers
` (14 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Thresholds may need additional events; this can impact things like
sharing groups of events to avoid multiplexing. Add a flag to make the
threshold calculations optional. The threshold will still be computed
if no additional events are necessary.
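Illustrative usage (any metric or metric group works):

  $ perf stat --metric-no-threshold -M TopdownL1 -a sleep 1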
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 4 +++
tools/perf/tests/expand-cgroup.c | 3 +-
tools/perf/tests/parse-metric.c | 1 -
tools/perf/tests/pmu-events.c | 4 +--
tools/perf/util/metricgroup.c | 62 ++++++++++++++++++++------------
tools/perf/util/metricgroup.h | 3 +-
tools/perf/util/stat-shadow.c | 3 +-
tools/perf/util/stat.h | 1 +
8 files changed, 49 insertions(+), 32 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5d18a5a6f662..5e13171a7bba 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1256,6 +1256,8 @@ static struct option stat_options[] = {
"don't group metric events, impacts multiplexing"),
OPT_BOOLEAN(0, "metric-no-merge", &stat_config.metric_no_merge,
"don't try to share events between metrics in a group"),
+ OPT_BOOLEAN(0, "metric-no-threshold", &stat_config.metric_no_threshold,
+ "don't try to share events between metrics in a group "),
OPT_BOOLEAN(0, "topdown", &topdown_run,
"measure top-down statistics"),
OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
@@ -1852,6 +1854,7 @@ static int add_default_attributes(void)
return metricgroup__parse_groups(evsel_list, "transaction",
stat_config.metric_no_group,
stat_config.metric_no_merge,
+ stat_config.metric_no_threshold,
stat_config.user_requested_cpu_list,
stat_config.system_wide,
&stat_config.metric_events);
@@ -2519,6 +2522,7 @@ int cmd_stat(int argc, const char **argv)
metricgroup__parse_groups(evsel_list, metrics,
stat_config.metric_no_group,
stat_config.metric_no_merge,
+ stat_config.metric_no_threshold,
stat_config.user_requested_cpu_list,
stat_config.system_wide,
&stat_config.metric_events);
diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c
index 672a27f37060..ec340880a848 100644
--- a/tools/perf/tests/expand-cgroup.c
+++ b/tools/perf/tests/expand-cgroup.c
@@ -187,8 +187,7 @@ static int expand_metric_events(void)
rblist__init(&metric_events);
pme_test = find_core_metrics_table("testarch", "testcpu");
- ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str,
- false, false, &metric_events);
+ ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str, &metric_events);
if (ret < 0) {
pr_debug("failed to parse '%s' metric\n", metric_str);
goto out;
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 9fec6040950c..132c9b945a42 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -98,7 +98,6 @@ static int __compute_metric(const char *name, struct value *vals,
/* Parse the metric into metric_events list. */
pme_test = find_core_metrics_table("testarch", "testcpu");
err = metricgroup__parse_groups_test(evlist, pme_test, name,
- false, false,
&metric_events);
if (err)
goto out;
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index db2fed0c6993..50b99a0f8f59 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -846,9 +846,7 @@ static int test__parsing_callback(const struct pmu_metric *pm,
perf_evlist__set_maps(&evlist->core, cpus, NULL);
runtime_stat__init(&st);
- err = metricgroup__parse_groups_test(evlist, table, pm->metric_name,
- false, false,
- &metric_events);
+ err = metricgroup__parse_groups_test(evlist, table, pm->metric_name, &metric_events);
if (err) {
if (!strcmp(pm->metric_name, "M1") || !strcmp(pm->metric_name, "M2") ||
!strcmp(pm->metric_name, "M3")) {
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index d83885697125..afb6f2fdc24e 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -771,6 +771,7 @@ struct metricgroup_add_iter_data {
int *ret;
bool *has_match;
bool metric_no_group;
+ bool metric_no_threshold;
const char *user_requested_cpu_list;
bool system_wide;
struct metric *root_metric;
@@ -786,6 +787,7 @@ static int add_metric(struct list_head *metric_list,
const struct pmu_metric *pm,
const char *modifier,
bool metric_no_group,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct metric *root_metric,
@@ -813,6 +815,7 @@ static int add_metric(struct list_head *metric_list,
static int resolve_metric(struct list_head *metric_list,
const char *modifier,
bool metric_no_group,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct metric *root_metric,
@@ -861,8 +864,8 @@ static int resolve_metric(struct list_head *metric_list,
*/
for (i = 0; i < pending_cnt; i++) {
ret = add_metric(metric_list, &pending[i].pm, modifier, metric_no_group,
- user_requested_cpu_list, system_wide, root_metric, visited,
- table);
+ metric_no_threshold, user_requested_cpu_list, system_wide,
+ root_metric, visited, table);
if (ret)
break;
}
@@ -879,6 +882,7 @@ static int resolve_metric(struct list_head *metric_list,
* @metric_no_group: Should events written to events be grouped "{}" or
* global. Grouping is the default but due to multiplexing the
* user may override.
+ * @metric_no_threshold: Should threshold expressions be ignored?
* @runtime: A special argument for the parser only known at runtime.
* @user_requested_cpu_list: Command line specified CPUs to record on.
* @system_wide: Are events for all processes recorded.
@@ -894,6 +898,7 @@ static int __add_metric(struct list_head *metric_list,
const struct pmu_metric *pm,
const char *modifier,
bool metric_no_group,
+ bool metric_no_threshold,
int runtime,
const char *user_requested_cpu_list,
bool system_wide,
@@ -974,10 +979,12 @@ static int __add_metric(struct list_head *metric_list,
* Threshold expressions are built off the actual metric. Switch
* to use that in case of additional necessary events. Change
* the visited node name to avoid this being flagged as
- * recursion.
+ * recursion. If the threshold events are disabled, just use the
+ * metric's name as a reference. This allows metric threshold
+ * computation if there are sufficient events.
*/
assert(strstr(pm->metric_threshold, pm->metric_name));
- expr = pm->metric_threshold;
+ expr = metric_no_threshold ? pm->metric_name : pm->metric_threshold;
visited_node.name = "__threshold__";
}
if (expr__find_ids(expr, NULL, root_metric->pctx) < 0) {
@@ -987,8 +994,8 @@ static int __add_metric(struct list_head *metric_list,
if (!ret) {
/* Resolve referenced metrics. */
ret = resolve_metric(metric_list, modifier, metric_no_group,
- user_requested_cpu_list, system_wide,
- root_metric, &visited_node, table);
+ metric_no_threshold, user_requested_cpu_list,
+ system_wide, root_metric, &visited_node, table);
}
if (ret) {
if (is_root)
@@ -1035,6 +1042,7 @@ static int add_metric(struct list_head *metric_list,
const struct pmu_metric *pm,
const char *modifier,
bool metric_no_group,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct metric *root_metric,
@@ -1046,9 +1054,9 @@ static int add_metric(struct list_head *metric_list,
pr_debug("metric expr %s for %s\n", pm->metric_expr, pm->metric_name);
if (!strstr(pm->metric_expr, "?")) {
- ret = __add_metric(metric_list, pm, modifier, metric_no_group, 0,
- user_requested_cpu_list, system_wide, root_metric,
- visited, table);
+ ret = __add_metric(metric_list, pm, modifier, metric_no_group,
+ metric_no_threshold, 0, user_requested_cpu_list,
+ system_wide, root_metric, visited, table);
} else {
int j, count;
@@ -1060,9 +1068,9 @@ static int add_metric(struct list_head *metric_list,
*/
for (j = 0; j < count && !ret; j++)
- ret = __add_metric(metric_list, pm, modifier, metric_no_group, j,
- user_requested_cpu_list, system_wide,
- root_metric, visited, table);
+ ret = __add_metric(metric_list, pm, modifier, metric_no_group,
+ metric_no_threshold, j, user_requested_cpu_list,
+ system_wide, root_metric, visited, table);
}
return ret;
@@ -1079,8 +1087,8 @@ static int metricgroup__add_metric_sys_event_iter(const struct pmu_metric *pm,
return 0;
ret = add_metric(d->metric_list, pm, d->modifier, d->metric_no_group,
- d->user_requested_cpu_list, d->system_wide,
- d->root_metric, d->visited, d->table);
+ d->metric_no_threshold, d->user_requested_cpu_list,
+ d->system_wide, d->root_metric, d->visited, d->table);
if (ret)
goto out;
@@ -1124,6 +1132,7 @@ struct metricgroup__add_metric_data {
const char *modifier;
const char *user_requested_cpu_list;
bool metric_no_group;
+ bool metric_no_threshold;
bool system_wide;
bool has_match;
};
@@ -1141,8 +1150,9 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
data->has_match = true;
ret = add_metric(data->list, pm, data->modifier, data->metric_no_group,
- data->user_requested_cpu_list, data->system_wide,
- /*root_metric=*/NULL, /*visited_metrics=*/NULL, table);
+ data->metric_no_threshold, data->user_requested_cpu_list,
+ data->system_wide, /*root_metric=*/NULL,
+ /*visited_metrics=*/NULL, table);
}
return ret;
}
@@ -1163,7 +1173,7 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
* architecture perf is running upon.
*/
static int metricgroup__add_metric(const char *metric_name, const char *modifier,
- bool metric_no_group,
+ bool metric_no_group, bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct list_head *metric_list,
@@ -1179,6 +1189,7 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
.metric_name = metric_name,
.modifier = modifier,
.metric_no_group = metric_no_group,
+ .metric_no_threshold = metric_no_threshold,
.user_requested_cpu_list = user_requested_cpu_list,
.system_wide = system_wide,
.has_match = false,
@@ -1241,6 +1252,7 @@ static int metricgroup__add_metric(const char *metric_name, const char *modifier
* architecture perf is running upon.
*/
static int metricgroup__add_metric_list(const char *list, bool metric_no_group,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide, struct list_head *metric_list,
const struct pmu_metrics_table *table)
@@ -1259,7 +1271,8 @@ static int metricgroup__add_metric_list(const char *list, bool metric_no_group,
*modifier++ = '\0';
ret = metricgroup__add_metric(metric_name, modifier,
- metric_no_group, user_requested_cpu_list,
+ metric_no_group, metric_no_threshold,
+ user_requested_cpu_list,
system_wide, metric_list, table);
if (ret == -EINVAL)
pr_err("Cannot find metric or group `%s'\n", metric_name);
@@ -1449,6 +1462,7 @@ static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
static int parse_groups(struct evlist *perf_evlist, const char *str,
bool metric_no_group,
bool metric_no_merge,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct perf_pmu *fake_pmu,
@@ -1463,7 +1477,7 @@ static int parse_groups(struct evlist *perf_evlist, const char *str,
if (metric_events_list->nr_entries == 0)
metricgroup__rblist_init(metric_events_list);
- ret = metricgroup__add_metric_list(str, metric_no_group,
+ ret = metricgroup__add_metric_list(str, metric_no_group, metric_no_threshold,
user_requested_cpu_list,
system_wide, &metric_list, table);
if (ret)
@@ -1598,6 +1612,7 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
const char *str,
bool metric_no_group,
bool metric_no_merge,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct rblist *metric_events)
@@ -1608,18 +1623,19 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
return -EINVAL;
return parse_groups(perf_evlist, str, metric_no_group, metric_no_merge,
- user_requested_cpu_list, system_wide,
+ metric_no_threshold, user_requested_cpu_list, system_wide,
/*fake_pmu=*/NULL, metric_events, table);
}
int metricgroup__parse_groups_test(struct evlist *evlist,
const struct pmu_metrics_table *table,
const char *str,
- bool metric_no_group,
- bool metric_no_merge,
struct rblist *metric_events)
{
- return parse_groups(evlist, str, metric_no_group, metric_no_merge,
+ return parse_groups(evlist, str,
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/false,
/*user_requested_cpu_list=*/NULL,
/*system_wide=*/false,
&perf_pmu__fake, metric_events, table);
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 32eb3a5381fb..8d50052c5b4c 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -70,14 +70,13 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
const char *str,
bool metric_no_group,
bool metric_no_merge,
+ bool metric_no_threshold,
const char *user_requested_cpu_list,
bool system_wide,
struct rblist *metric_events);
int metricgroup__parse_groups_test(struct evlist *evlist,
const struct pmu_metrics_table *table,
const char *str,
- bool metric_no_group,
- bool metric_no_merge,
struct rblist *metric_events);
void metricgroup__print(const struct print_callbacks *print_cb, void *print_state);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index a41f186c6ec8..77483eeda0d8 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -814,7 +814,8 @@ static void generic_metric(struct perf_stat_config *config,
char metric_bf[64];
if (metric_threshold &&
- expr__parse(&threshold, pctx, metric_threshold) == 0) {
+ expr__parse(&threshold, pctx, metric_threshold) == 0 &&
+ !isnan(threshold)) {
color = fpclassify(threshold) == FP_ZERO
? PERF_COLOR_GREEN : PERF_COLOR_RED;
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b1c29156c560..cf2d8aa445f3 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -159,6 +159,7 @@ struct perf_stat_config {
bool no_csv_summary;
bool metric_no_group;
bool metric_no_merge;
+ bool metric_no_threshold;
bool stop_read_counter;
bool iostat_run;
char *user_requested_cpu_list;
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (21 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 38/51] perf metric: Add --metric-no-threshold option Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-27 19:12 ` Liang, Kan
2023-02-19 9:28 ` [PATCH v1 40/51] perf stat: Implement --topdown using json metrics Ian Rogers
` (13 subsequent siblings)
36 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
When there are no events and on Intel, the topdown events will be
added by default if present. Displaying the metrics associated with
these events requires special handling in stat-shadow.c. To more easily
update these metrics, use the json metric version via the TopdownL1
group. This makes the handling less platform specific.
Modify the metricgroup__has_metric code to also cover metric groups.
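For example, on a CPU whose json metrics include the TopdownL1 group, a
plain:
% perf stat -a sleep 2
should now also display the tma_retiring, tma_bad_speculation,
tma_frontend_bound and tma_backend_bound metrics alongside the default
events (illustrative; the exact metrics depend on the vendor events).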
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/arch/x86/util/evlist.c | 6 +++---
tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
tools/perf/arch/x86/util/topdown.h | 1 -
tools/perf/builtin-stat.c | 14 ++++++++++++++
tools/perf/util/metricgroup.c | 6 ++----
5 files changed, 19 insertions(+), 38 deletions(-)
diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index cb59ce9b9638..8a7ae4162563 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
struct perf_event_attr *attrs,
size_t nr_attrs)
{
- if (nr_attrs)
- return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
+ if (!nr_attrs)
+ return 0;
- return topdown_parse_events(evlist);
+ return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
}
struct evsel *arch_evlist__leader(struct list_head *list)
diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index 54810f9acd6f..eb3a7d9652ab 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -9,11 +9,6 @@
#include "topdown.h"
#include "evsel.h"
-#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
-#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
-#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
-#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
-
/* Check whether there is a PMU which supports the perf metrics. */
bool topdown_sys_has_perf_metrics(void)
{
@@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
return pmu_name;
}
-
-int topdown_parse_events(struct evlist *evlist)
-{
- const char *topdown_events;
- const char *pmu_name;
-
- if (!topdown_sys_has_perf_metrics())
- return 0;
-
- pmu_name = arch_get_topdown_pmu_name(evlist, false);
-
- if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
- if (!strcmp(pmu_name, "cpu_core"))
- topdown_events = TOPDOWN_L2_EVENTS_CORE;
- else
- topdown_events = TOPDOWN_L2_EVENTS;
- } else {
- if (!strcmp(pmu_name, "cpu_core"))
- topdown_events = TOPDOWN_L1_EVENTS_CORE;
- else
- topdown_events = TOPDOWN_L1_EVENTS;
- }
-
- return parse_event(evlist, topdown_events);
-}
diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
index 7eb81f042838..46bf9273e572 100644
--- a/tools/perf/arch/x86/util/topdown.h
+++ b/tools/perf/arch/x86/util/topdown.h
@@ -3,6 +3,5 @@
#define _TOPDOWN_H 1
bool topdown_sys_has_perf_metrics(void);
-int topdown_parse_events(struct evlist *evlist);
#endif
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5e13171a7bba..796e98e453f6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
if (!evsel_list->core.nr_entries) {
+ /* No events so add defaults. */
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
@@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
}
if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
return -1;
+ /*
+ * Add TopdownL1 metrics if they exist. To minimize
+ * multiplexing, don't request threshold computation.
+ */
+ if (metricgroup__has_metric("TopdownL1") &&
+ metricgroup__parse_groups(evsel_list, "TopdownL1",
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events) < 0)
+ return -1;
/* Platform specific attrs */
if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
return -1;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index afb6f2fdc24e..64a35f2787dc 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
{
const char *metric = vdata;
- if (!pm->metric_expr)
- return 0;
-
- if (match_metric(pm->metric_name, metric))
+ if (match_metric(pm->metric_name, metric) ||
+ match_metric(pm->metric_group, metric))
return 1;
return 0;
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 40/51] perf stat: Implement --topdown using json metrics
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (22 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 41/51] perf stat: Remove topdown event special handling Ian Rogers
` (12 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Request the topdown metrics of a given level via the metric group
'TopdownL<level>' rather than through specific events. As more topdown
levels are supported this way, such as 6 on Intel Ice Lake, default to
showing just the level 1 metrics. This can be overridden using
'--td-level'. Rather than determining the maximum topdown level from
sysfs, use the metric group names. Remove some now unused topdown
code.
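Illustrative usage (the level 2 invocation assumes a TopdownL2 metric
group exists for the CPU):
% perf stat --topdown -a -I1000
% perf stat --topdown --td-level=2 -a -I1000
The first command reports the level 1 tma_* metrics, the second the
level 2 breakdown.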
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/arch/x86/util/topdown.c | 48 +-----------
tools/perf/builtin-stat.c | 118 +++++------------------------
tools/perf/util/metricgroup.c | 31 ++++++++
tools/perf/util/metricgroup.h | 1 +
tools/perf/util/topdown.c | 68 +----------------
tools/perf/util/topdown.h | 11 +--
6 files changed, 58 insertions(+), 219 deletions(-)
diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index eb3a7d9652ab..9ad5e5c7bd27 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -1,11 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
-#include <stdio.h>
#include "api/fs/fs.h"
+#include "util/evsel.h"
#include "util/pmu.h"
#include "util/topdown.h"
-#include "util/evlist.h"
-#include "util/debug.h"
-#include "util/pmu-hybrid.h"
#include "topdown.h"
#include "evsel.h"
@@ -33,30 +30,6 @@ bool topdown_sys_has_perf_metrics(void)
return has_perf_metrics;
}
-/*
- * Check whether we can use a group for top down.
- * Without a group may get bad results due to multiplexing.
- */
-bool arch_topdown_check_group(bool *warn)
-{
- int n;
-
- if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
- return false;
- if (n > 0) {
- *warn = true;
- return false;
- }
- return true;
-}
-
-void arch_topdown_group_warn(void)
-{
- fprintf(stderr,
- "nmi_watchdog enabled with topdown. May give wrong results.\n"
- "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
-}
-
#define TOPDOWN_SLOTS 0x0400
/*
@@ -65,7 +38,6 @@ void arch_topdown_group_warn(void)
* Only Topdown metric supports sample-read. The slots
* event must be the leader of the topdown group.
*/
-
bool arch_topdown_sample_read(struct evsel *leader)
{
if (!evsel__sys_has_perf_metrics(leader))
@@ -76,21 +48,3 @@ bool arch_topdown_sample_read(struct evsel *leader)
return false;
}
-
-const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
-{
- const char *pmu_name;
-
- if (!perf_pmu__has_hybrid())
- return "cpu";
-
- if (!evlist->hybrid_pmu_name) {
- if (warn)
- pr_warning("WARNING: default to use cpu_core topdown events\n");
- evlist->hybrid_pmu_name = perf_pmu__hybrid_type_to_pmu("core");
- }
-
- pmu_name = evlist->hybrid_pmu_name;
-
- return pmu_name;
-}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 796e98e453f6..bdb1ef4fc6ad 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -124,39 +124,6 @@ static const char * transaction_limited_attrs = {
"}"
};
-static const char * topdown_attrs[] = {
- "topdown-total-slots",
- "topdown-slots-retired",
- "topdown-recovery-bubbles",
- "topdown-fetch-bubbles",
- "topdown-slots-issued",
- NULL,
-};
-
-static const char *topdown_metric_attrs[] = {
- "slots",
- "topdown-retiring",
- "topdown-bad-spec",
- "topdown-fe-bound",
- "topdown-be-bound",
- NULL,
-};
-
-static const char *topdown_metric_L2_attrs[] = {
- "slots",
- "topdown-retiring",
- "topdown-bad-spec",
- "topdown-fe-bound",
- "topdown-be-bound",
- "topdown-heavy-ops",
- "topdown-br-mispredict",
- "topdown-fetch-lat",
- "topdown-mem-bound",
- NULL,
-};
-
-#define TOPDOWN_MAX_LEVEL 2
-
static const char *smi_cost_attrs = {
"{"
"msr/aperf/,"
@@ -1914,86 +1881,41 @@ static int add_default_attributes(void)
}
if (topdown_run) {
- const char **metric_attrs = topdown_metric_attrs;
- unsigned int max_level = 1;
- char *str = NULL;
- bool warn = false;
- const char *pmu_name = arch_get_topdown_pmu_name(evsel_list, true);
+ unsigned int max_level = metricgroups__topdown_max_level();
+ char str[] = "TopdownL1";
if (!force_metric_only)
stat_config.metric_only = true;
- if (pmu_have_event(pmu_name, topdown_metric_L2_attrs[5])) {
- metric_attrs = topdown_metric_L2_attrs;
- max_level = 2;
+ if (!max_level) {
+ pr_err("Topdown requested but the topdown metric groups aren't present.\n"
+ "(See perf list the metric groups have names like TopdownL1)");
+ return -1;
}
-
if (stat_config.topdown_level > max_level) {
pr_err("Invalid top-down metrics level. The max level is %u.\n", max_level);
return -1;
} else if (!stat_config.topdown_level)
- stat_config.topdown_level = max_level;
+ stat_config.topdown_level = 1;
- if (topdown_filter_events(metric_attrs, &str, 1, pmu_name) < 0) {
- pr_err("Out of memory\n");
- return -1;
- }
-
- if (metric_attrs[0] && str) {
- if (!stat_config.interval && !stat_config.metric_only) {
- fprintf(stat_config.output,
- "Topdown accuracy may decrease when measuring long periods.\n"
- "Please print the result regularly, e.g. -I1000\n");
- }
- goto setup_metrics;
- }
-
- zfree(&str);
-
- if (stat_config.aggr_mode != AGGR_GLOBAL &&
- stat_config.aggr_mode != AGGR_CORE) {
- pr_err("top down event configuration requires --per-core mode\n");
- return -1;
- }
- stat_config.aggr_mode = AGGR_CORE;
- if (nr_cgroups || !target__has_cpu(&target)) {
- pr_err("top down event configuration requires system-wide mode (-a)\n");
- return -1;
- }
-
- if (topdown_filter_events(topdown_attrs, &str,
- arch_topdown_check_group(&warn),
- pmu_name) < 0) {
- pr_err("Out of memory\n");
- return -1;
+ if (!stat_config.interval && !stat_config.metric_only) {
+ fprintf(stat_config.output,
+ "Topdown accuracy may decrease when measuring long periods.\n"
+ "Please print the result regularly, e.g. -I1000\n");
}
-
- if (topdown_attrs[0] && str) {
- struct parse_events_error errinfo;
- if (warn)
- arch_topdown_group_warn();
-setup_metrics:
- parse_events_error__init(&errinfo);
- err = parse_events(evsel_list, str, &errinfo);
- if (err) {
- fprintf(stderr,
- "Cannot set up top down events %s: %d\n",
- str, err);
- parse_events_error__print(&errinfo, str);
- parse_events_error__exit(&errinfo);
- free(str);
- return -1;
- }
- parse_events_error__exit(&errinfo);
- } else {
- fprintf(stderr, "System does not support topdown\n");
+ str[8] = stat_config.topdown_level + '0';
+ if (metricgroup__parse_groups(evsel_list, str,
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events) < 0)
return -1;
- }
- free(str);
}
if (!stat_config.topdown_level)
- stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
+ stat_config.topdown_level = 1;
if (!evsel_list->core.nr_entries) {
/* No events so add defaults. */
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 64a35f2787dc..de6dd527a2ba 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1665,6 +1665,37 @@ bool metricgroup__has_metric(const char *metric)
(void *)metric) ? true : false;
}
+static int metricgroup__topdown_max_level_callback(const struct pmu_metric *pm,
+ const struct pmu_metrics_table *table __maybe_unused,
+ void *data)
+{
+ unsigned int *max_level = data;
+ unsigned int level;
+ const char *p = strstr(pm->metric_group, "TopdownL");
+
+ if (!p || p[8] == '\0')
+ return 0;
+
+ level = p[8] - '0';
+ if (level > *max_level)
+ *max_level = level;
+
+ return 0;
+}
+
+unsigned int metricgroups__topdown_max_level(void)
+{
+ unsigned int max_level = 0;
+ const struct pmu_metrics_table *table = pmu_metrics_table__find();
+
+ if (!table)
+ return 0;
+
+ pmu_metrics_table_for_each_metric(table, metricgroup__topdown_max_level_callback,
+ &max_level);
+ return max_level;
+}
+
int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
struct rblist *new_metric_events,
struct rblist *old_metric_events)
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 8d50052c5b4c..77472e35705e 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -81,6 +81,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
void metricgroup__print(const struct print_callbacks *print_cb, void *print_state);
bool metricgroup__has_metric(const char *metric);
+unsigned int metricgroups__topdown_max_level(void);
int arch_get_runtimeparam(const struct pmu_metric *pm);
void metricgroup__rblist_exit(struct rblist *metric_events);
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index 1090841550f7..18fd5fed5d1a 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -1,74 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
-#include <stdio.h>
-#include "pmu.h"
-#include "pmu-hybrid.h"
#include "topdown.h"
-
-int topdown_filter_events(const char **attr, char **str, bool use_group,
- const char *pmu_name)
-{
- int off = 0;
- int i;
- int len = 0;
- char *s;
- bool is_hybrid = perf_pmu__is_hybrid(pmu_name);
-
- for (i = 0; attr[i]; i++) {
- if (pmu_have_event(pmu_name, attr[i])) {
- if (is_hybrid)
- len += strlen(attr[i]) + strlen(pmu_name) + 3;
- else
- len += strlen(attr[i]) + 1;
- attr[i - off] = attr[i];
- } else
- off++;
- }
- attr[i - off] = NULL;
-
- *str = malloc(len + 1 + 2);
- if (!*str)
- return -1;
- s = *str;
- if (i - off == 0) {
- *s = 0;
- return 0;
- }
- if (use_group)
- *s++ = '{';
- for (i = 0; attr[i]; i++) {
- if (!is_hybrid)
- strcpy(s, attr[i]);
- else
- sprintf(s, "%s/%s/", pmu_name, attr[i]);
- s += strlen(s);
- *s++ = ',';
- }
- if (use_group) {
- s[-1] = '}';
- *s = 0;
- } else
- s[-1] = 0;
- return 0;
-}
-
-__weak bool arch_topdown_check_group(bool *warn)
-{
- *warn = false;
- return false;
-}
-
-__weak void arch_topdown_group_warn(void)
-{
-}
+#include <linux/kernel.h>
__weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
{
return false;
}
-
-__weak const char *arch_get_topdown_pmu_name(struct evlist *evlist
- __maybe_unused,
- bool warn __maybe_unused)
-{
- return "cpu";
-}
diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
index f9531528c559..1996c5fedcd7 100644
--- a/tools/perf/util/topdown.h
+++ b/tools/perf/util/topdown.h
@@ -1,14 +1,11 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef TOPDOWN_H
#define TOPDOWN_H 1
-#include "evsel.h"
-#include "evlist.h"
-bool arch_topdown_check_group(bool *warn);
-void arch_topdown_group_warn(void);
+#include <stdbool.h>
+
+struct evsel;
+
bool arch_topdown_sample_read(struct evsel *leader);
-const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn);
-int topdown_filter_events(const char **attr, char **str, bool use_group,
- const char *pmu_name);
#endif
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 41/51] perf stat: Remove topdown event special handling
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (23 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 40/51] perf stat: Implement --topdown using json metrics Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 42/51] perf doc: Refresh topdown documentation Ian Rogers
` (11 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Now that the events are computed from json metrics, the hard coded
logic can be removed.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-shadow.c | 346 ----------------------------------
tools/perf/util/stat.c | 13 --
tools/perf/util/stat.h | 26 ---
3 files changed, 385 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 77483eeda0d8..5189756bf16d 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -241,45 +241,6 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
update_runtime_stat(st, STAT_TRANSACTION, map_idx, count, &rsd);
else if (perf_stat_evsel__is(counter, ELISION_START))
update_runtime_stat(st, STAT_ELISION, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
- update_runtime_stat(st, STAT_TOPDOWN_TOTAL_SLOTS,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
- update_runtime_stat(st, STAT_TOPDOWN_SLOTS_ISSUED,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
- update_runtime_stat(st, STAT_TOPDOWN_SLOTS_RETIRED,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
- update_runtime_stat(st, STAT_TOPDOWN_FETCH_BUBBLES,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
- update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
- update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
- update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
- update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
- update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_HEAVY_OPS))
- update_runtime_stat(st, STAT_TOPDOWN_HEAVY_OPS,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_BR_MISPREDICT))
- update_runtime_stat(st, STAT_TOPDOWN_BR_MISPREDICT,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_LAT))
- update_runtime_stat(st, STAT_TOPDOWN_FETCH_LAT,
- map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TOPDOWN_MEM_BOUND))
- update_runtime_stat(st, STAT_TOPDOWN_MEM_BOUND,
- map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
map_idx, count, &rsd);
@@ -524,156 +485,6 @@ static void print_ll_cache_misses(struct perf_stat_config *config,
out->print_metric(config, out->ctx, color, "%7.2f%%", "of all LL-cache accesses", ratio);
}
-/*
- * High level "TopDown" CPU core pipe line bottleneck break down.
- *
- * Basic concept following
- * Yasin, A Top Down Method for Performance analysis and Counter architecture
- * ISPASS14
- *
- * The CPU pipeline is divided into 4 areas that can be bottlenecks:
- *
- * Frontend -> Backend -> Retiring
- * BadSpeculation in addition means out of order execution that is thrown away
- * (for example branch mispredictions)
- * Frontend is instruction decoding.
- * Backend is execution, like computation and accessing data in memory
- * Retiring is good execution that is not directly bottlenecked
- *
- * The formulas are computed in slots.
- * A slot is an entry in the pipeline each for the pipeline width
- * (for example a 4-wide pipeline has 4 slots for each cycle)
- *
- * Formulas:
- * BadSpeculation = ((SlotsIssued - SlotsRetired) + RecoveryBubbles) /
- * TotalSlots
- * Retiring = SlotsRetired / TotalSlots
- * FrontendBound = FetchBubbles / TotalSlots
- * BackendBound = 1.0 - BadSpeculation - Retiring - FrontendBound
- *
- * The kernel provides the mapping to the low level CPU events and any scaling
- * needed for the CPU pipeline width, for example:
- *
- * TotalSlots = Cycles * 4
- *
- * The scaling factor is communicated in the sysfs unit.
- *
- * In some cases the CPU may not be able to measure all the formulas due to
- * missing events. In this case multiple formulas are combined, as possible.
- *
- * Full TopDown supports more levels to sub-divide each area: for example
- * BackendBound into computing bound and memory bound. For now we only
- * support Level 1 TopDown.
- */
-
-static double sanitize_val(double x)
-{
- if (x < 0 && x >= -0.02)
- return 0.0;
- return x;
-}
-
-static double td_total_slots(int map_idx, struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, map_idx, rsd);
-}
-
-static double td_bad_spec(int map_idx, struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- double bad_spec = 0;
- double total_slots;
- double total;
-
- total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, map_idx, rsd) -
- runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, map_idx, rsd) +
- runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, map_idx, rsd);
-
- total_slots = td_total_slots(map_idx, st, rsd);
- if (total_slots)
- bad_spec = total / total_slots;
- return sanitize_val(bad_spec);
-}
-
-static double td_retiring(int map_idx, struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- double retiring = 0;
- double total_slots = td_total_slots(map_idx, st, rsd);
- double ret_slots = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED,
- map_idx, rsd);
-
- if (total_slots)
- retiring = ret_slots / total_slots;
- return retiring;
-}
-
-static double td_fe_bound(int map_idx, struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- double fe_bound = 0;
- double total_slots = td_total_slots(map_idx, st, rsd);
- double fetch_bub = runtime_stat_avg(st, STAT_TOPDOWN_FETCH_BUBBLES,
- map_idx, rsd);
-
- if (total_slots)
- fe_bound = fetch_bub / total_slots;
- return fe_bound;
-}
-
-static double td_be_bound(int map_idx, struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- double sum = (td_fe_bound(map_idx, st, rsd) +
- td_bad_spec(map_idx, st, rsd) +
- td_retiring(map_idx, st, rsd));
- if (sum == 0)
- return 0;
- return sanitize_val(1.0 - sum);
-}
-
-/*
- * Kernel reports metrics multiplied with slots. To get back
- * the ratios we need to recreate the sum.
- */
-
-static double td_metric_ratio(int map_idx, enum stat_type type,
- struct runtime_stat *stat,
- struct runtime_stat_data *rsd)
-{
- double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, map_idx, rsd) +
- runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, map_idx, rsd) +
- runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, map_idx, rsd) +
- runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, map_idx, rsd);
- double d = runtime_stat_avg(stat, type, map_idx, rsd);
-
- if (sum)
- return d / sum;
- return 0;
-}
-
-/*
- * ... but only if most of the values are actually available.
- * We allow two missing.
- */
-
-static bool full_td(int map_idx, struct runtime_stat *stat,
- struct runtime_stat_data *rsd)
-{
- int c = 0;
-
- if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, map_idx, rsd) > 0)
- c++;
- if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, map_idx, rsd) > 0)
- c++;
- if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, map_idx, rsd) > 0)
- c++;
- if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, map_idx, rsd) > 0)
- c++;
- return c >= 2;
-}
-
static void print_smi_cost(struct perf_stat_config *config, int map_idx,
struct perf_stat_output_ctx *out,
struct runtime_stat *st,
@@ -885,7 +696,6 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
void *ctxp = out->ctx;
print_metric_t print_metric = out->print_metric;
double total, ratio = 0.0, total2;
- const char *color = NULL;
struct runtime_stat_data rsd = {
.ctx = evsel_context(evsel),
.cgrp = evsel->cgrp,
@@ -1044,162 +854,6 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
avg / (ratio * evsel->scale));
else
print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
- double fe_bound = td_fe_bound(map_idx, st, &rsd);
-
- if (fe_bound > 0.2)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
- fe_bound * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
- double retiring = td_retiring(map_idx, st, &rsd);
-
- if (retiring > 0.7)
- color = PERF_COLOR_GREEN;
- print_metric(config, ctxp, color, "%8.1f%%", "retiring",
- retiring * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
- double bad_spec = td_bad_spec(map_idx, st, &rsd);
-
- if (bad_spec > 0.1)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
- bad_spec * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
- double be_bound = td_be_bound(map_idx, st, &rsd);
- const char *name = "backend bound";
- static int have_recovery_bubbles = -1;
-
- /* In case the CPU does not support topdown-recovery-bubbles */
- if (have_recovery_bubbles < 0)
- have_recovery_bubbles = pmu_have_event("cpu",
- "topdown-recovery-bubbles");
- if (!have_recovery_bubbles)
- name = "backend bound/bad spec";
-
- if (be_bound > 0.2)
- color = PERF_COLOR_RED;
- if (td_total_slots(map_idx, st, &rsd) > 0)
- print_metric(config, ctxp, color, "%8.1f%%", name,
- be_bound * 100.);
- else
- print_metric(config, ctxp, NULL, NULL, name, 0);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
- full_td(map_idx, st, &rsd)) {
- double retiring = td_metric_ratio(map_idx,
- STAT_TOPDOWN_RETIRING, st,
- &rsd);
- if (retiring > 0.7)
- color = PERF_COLOR_GREEN;
- print_metric(config, ctxp, color, "%8.1f%%", "Retiring",
- retiring * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
- full_td(map_idx, st, &rsd)) {
- double fe_bound = td_metric_ratio(map_idx,
- STAT_TOPDOWN_FE_BOUND, st,
- &rsd);
- if (fe_bound > 0.2)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Frontend Bound",
- fe_bound * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
- full_td(map_idx, st, &rsd)) {
- double be_bound = td_metric_ratio(map_idx,
- STAT_TOPDOWN_BE_BOUND, st,
- &rsd);
- if (be_bound > 0.2)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Backend Bound",
- be_bound * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
- full_td(map_idx, st, &rsd)) {
- double bad_spec = td_metric_ratio(map_idx,
- STAT_TOPDOWN_BAD_SPEC, st,
- &rsd);
- if (bad_spec > 0.1)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Bad Speculation",
- bad_spec * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_HEAVY_OPS) &&
- full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) {
- double retiring = td_metric_ratio(map_idx,
- STAT_TOPDOWN_RETIRING, st,
- &rsd);
- double heavy_ops = td_metric_ratio(map_idx,
- STAT_TOPDOWN_HEAVY_OPS, st,
- &rsd);
- double light_ops = retiring - heavy_ops;
-
- if (retiring > 0.7 && heavy_ops > 0.1)
- color = PERF_COLOR_GREEN;
- print_metric(config, ctxp, color, "%8.1f%%", "Heavy Operations",
- heavy_ops * 100.);
- if (retiring > 0.7 && light_ops > 0.6)
- color = PERF_COLOR_GREEN;
- else
- color = NULL;
- print_metric(config, ctxp, color, "%8.1f%%", "Light Operations",
- light_ops * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_BR_MISPREDICT) &&
- full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) {
- double bad_spec = td_metric_ratio(map_idx,
- STAT_TOPDOWN_BAD_SPEC, st,
- &rsd);
- double br_mis = td_metric_ratio(map_idx,
- STAT_TOPDOWN_BR_MISPREDICT, st,
- &rsd);
- double m_clears = bad_spec - br_mis;
-
- if (bad_spec > 0.1 && br_mis > 0.05)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Branch Mispredict",
- br_mis * 100.);
- if (bad_spec > 0.1 && m_clears > 0.05)
- color = PERF_COLOR_RED;
- else
- color = NULL;
- print_metric(config, ctxp, color, "%8.1f%%", "Machine Clears",
- m_clears * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_LAT) &&
- full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) {
- double fe_bound = td_metric_ratio(map_idx,
- STAT_TOPDOWN_FE_BOUND, st,
- &rsd);
- double fetch_lat = td_metric_ratio(map_idx,
- STAT_TOPDOWN_FETCH_LAT, st,
- &rsd);
- double fetch_bw = fe_bound - fetch_lat;
-
- if (fe_bound > 0.2 && fetch_lat > 0.15)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Fetch Latency",
- fetch_lat * 100.);
- if (fe_bound > 0.2 && fetch_bw > 0.1)
- color = PERF_COLOR_RED;
- else
- color = NULL;
- print_metric(config, ctxp, color, "%8.1f%%", "Fetch Bandwidth",
- fetch_bw * 100.);
- } else if (perf_stat_evsel__is(evsel, TOPDOWN_MEM_BOUND) &&
- full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) {
- double be_bound = td_metric_ratio(map_idx,
- STAT_TOPDOWN_BE_BOUND, st,
- &rsd);
- double mem_bound = td_metric_ratio(map_idx,
- STAT_TOPDOWN_MEM_BOUND, st,
- &rsd);
- double core_bound = be_bound - mem_bound;
-
- if (be_bound > 0.2 && mem_bound > 0.2)
- color = PERF_COLOR_RED;
- print_metric(config, ctxp, color, "%8.1f%%", "Memory Bound",
- mem_bound * 100.);
- if (be_bound > 0.2 && core_bound > 0.1)
- color = PERF_COLOR_RED;
- else
- color = NULL;
- print_metric(config, ctxp, color, "%8.1f%%", "Core Bound",
- core_bound * 100.);
} else if (runtime_stat_n(st, STAT_NSECS, map_idx, &rsd) != 0) {
char unit = ' ';
char unit_buf[10] = "/sec";
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 534d36d26fc3..0b8c91ca13cd 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -91,19 +91,6 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
ID(TRANSACTION_START, cpu/tx-start/),
ID(ELISION_START, cpu/el-start/),
ID(CYCLES_IN_TX_CP, cpu/cycles-ct/),
- ID(TOPDOWN_TOTAL_SLOTS, topdown-total-slots),
- ID(TOPDOWN_SLOTS_ISSUED, topdown-slots-issued),
- ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
- ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
- ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
- ID(TOPDOWN_RETIRING, topdown-retiring),
- ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
- ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
- ID(TOPDOWN_BE_BOUND, topdown-be-bound),
- ID(TOPDOWN_HEAVY_OPS, topdown-heavy-ops),
- ID(TOPDOWN_BR_MISPREDICT, topdown-br-mispredict),
- ID(TOPDOWN_FETCH_LAT, topdown-fetch-lat),
- ID(TOPDOWN_MEM_BOUND, topdown-mem-bound),
ID(SMI_NUM, msr/smi/),
ID(APERF, msr/aperf/),
};
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index cf2d8aa445f3..42af350a96d9 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -25,19 +25,6 @@ enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__TRANSACTION_START,
PERF_STAT_EVSEL_ID__ELISION_START,
PERF_STAT_EVSEL_ID__CYCLES_IN_TX_CP,
- PERF_STAT_EVSEL_ID__TOPDOWN_TOTAL_SLOTS,
- PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_ISSUED,
- PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
- PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
- PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
- PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
- PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
- PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
- PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
- PERF_STAT_EVSEL_ID__TOPDOWN_HEAVY_OPS,
- PERF_STAT_EVSEL_ID__TOPDOWN_BR_MISPREDICT,
- PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_LAT,
- PERF_STAT_EVSEL_ID__TOPDOWN_MEM_BOUND,
PERF_STAT_EVSEL_ID__SMI_NUM,
PERF_STAT_EVSEL_ID__APERF,
PERF_STAT_EVSEL_ID__MAX,
@@ -108,19 +95,6 @@ enum stat_type {
STAT_CYCLES_IN_TX,
STAT_TRANSACTION,
STAT_ELISION,
- STAT_TOPDOWN_TOTAL_SLOTS,
- STAT_TOPDOWN_SLOTS_ISSUED,
- STAT_TOPDOWN_SLOTS_RETIRED,
- STAT_TOPDOWN_FETCH_BUBBLES,
- STAT_TOPDOWN_RECOVERY_BUBBLES,
- STAT_TOPDOWN_RETIRING,
- STAT_TOPDOWN_BAD_SPEC,
- STAT_TOPDOWN_FE_BOUND,
- STAT_TOPDOWN_BE_BOUND,
- STAT_TOPDOWN_HEAVY_OPS,
- STAT_TOPDOWN_BR_MISPREDICT,
- STAT_TOPDOWN_FETCH_LAT,
- STAT_TOPDOWN_MEM_BOUND,
STAT_SMI_NUM,
STAT_APERF,
STAT_MAX
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 42/51] perf doc: Refresh topdown documentation
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (24 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 41/51] perf stat: Remove topdown event special handling Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 43/51] perf stat: Remove hard coded transaction events Ian Rogers
` (10 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
perf stat now supports --topdown for any platform with the TopdownL1
metric group, including Intel before Icelake. Tweak the documentation
to reflect this.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/Documentation/perf-stat.txt | 27 +++++-----
tools/perf/Documentation/topdown.txt | 70 +++++++++++---------------
2 files changed, 44 insertions(+), 53 deletions(-)
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 18abdc1dce05..29bdcfa93f04 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -394,10 +394,10 @@ See perf list output for the possible metrics and metricgroups.
Do not aggregate counts across all monitored CPUs.
--topdown::
-Print complete top-down metrics supported by the CPU. This allows to
-determine bottle necks in the CPU pipeline for CPU bound workloads,
-by breaking the cycles consumed down into frontend bound, backend bound,
-bad speculation and retiring.
+Print top-down metrics supported by the CPU. This allows one to
+determine bottlenecks in the CPU pipeline for CPU bound workloads, by
+breaking the cycles consumed down into frontend bound, backend bound,
+bad speculation and retiring.
Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the bottle
@@ -430,15 +430,18 @@ CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
--td-level::
-Print the top-down statistics that equal to or lower than the input level.
-It allows users to print the interested top-down metrics level instead of
-the complete top-down metrics.
+Print the top-down statistics that equal the input level. It allows
+users to print the top-down metrics level of interest instead of the
+level 1 top-down metrics.
+
+As the higher levels gather more metrics and use more counters they
+will be less accurate. By convention a metric can be examined by
+appending '_group' to it and this will increase accuracy compared to
+gathering all metrics for a level. For example, level 1 analysis may
+highlight 'tma_frontend_bound'. This metric may be drilled into with
+'tma_frontend_bound_group' with
+'perf stat -M tma_frontend_bound_group...'.
-The availability of the top-down metrics level depends on the hardware. For
-example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
-supports both L1 and L2 top-down metrics.
-
-Default: 0 means the max level that the current hardware support.
Error out if the input is higher than the supported max level.
--no-merge::
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index a15b93fdcf50..ae0aee86844f 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -1,46 +1,35 @@
-Using TopDown metrics in user space
------------------------------------
+Using TopDown metrics
+---------------------
-Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
-methodology to break down CPU pipeline execution into 4 bottlenecks:
-frontend bound, backend bound, bad speculation, retiring.
+TopDown metrics break apart performance bottlenecks. Starting at level
+1 it is typical to get metrics on retiring, bad speculation, frontend
+bound, and backend bound. Higher levels provide more detail into the
+level 1 bottlenecks, such as at level 2: core bound, memory bound,
+heavy operations, light operations, branch mispredicts, machine
+clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
-For more details on Topdown see [1][5]
+perf stat --topdown implements this using available metrics that vary
+per architecture.
-Traditionally this was implemented by events in generic counters
-and specific formulas to compute the bottlenecks.
-
-perf stat --topdown implements this.
-
-Full Top Down includes more levels that can break down the
-bottlenecks further. This is not directly implemented in perf,
-but available in other tools that can run on top of perf,
-such as toplev[2] or vtune[3]
+% perf stat -a --topdown -I1000
+# time % tma_retiring % tma_backend_bound % tma_frontend_bound % tma_bad_speculation
+ 1.001141351 11.5 34.9 46.9 6.7
+ 2.006141972 13.4 28.1 50.4 8.1
+ 3.010162040 12.9 28.1 51.1 8.0
+ 4.014009311 12.5 28.6 51.8 7.2
+ 5.017838554 11.8 33.0 48.0 7.2
+ 5.704818971 14.0 27.5 51.3 7.3
+...
-New Topdown features in Ice Lake
-===============================
+New Topdown features in Intel Ice Lake
+======================================
With Ice Lake CPUs the TopDown metrics are directly available as
fixed counters and do not require generic counters. This allows
to collect TopDown always in addition to other events.
-% perf stat -a --topdown -I1000
-# time retiring bad speculation frontend bound backend bound
- 1.001281330 23.0% 15.3% 29.6% 32.1%
- 2.003009005 5.0% 6.8% 46.6% 41.6%
- 3.004646182 6.7% 6.7% 46.0% 40.6%
- 4.006326375 5.0% 6.4% 47.6% 41.0%
- 5.007991804 5.1% 6.3% 46.3% 42.3%
- 6.009626773 6.2% 7.1% 47.3% 39.3%
- 7.011296356 4.7% 6.7% 46.2% 42.4%
- 8.012951831 4.7% 6.7% 47.5% 41.1%
-...
-
-This also enables measuring TopDown per thread/process instead
-of only per core.
-
-Using TopDown through RDPMC in applications on Ice Lake
-======================================================
+Using TopDown through RDPMC in applications on Intel Ice Lake
+=============================================================
For more fine grained measurements it can be useful to
access the new directly from user space. This is more complicated,
@@ -301,8 +290,8 @@ This "opens" a new measurement period.
A program using RDPMC for TopDown should schedule such a reset
regularly, as in every few seconds.
-Limits on Ice Lake
-==================
+Limits on Intel Ice Lake
+========================
Four pseudo TopDown metric events are exposed for the end-users,
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
@@ -318,8 +307,8 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
-Extension on Sapphire Rapids Server
-===================================
+Extension on Intel Sapphire Rapids Server
+=========================================
The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
@@ -338,7 +327,6 @@ other four level 2 metrics by subtracting corresponding metrics as below.
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
-[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
-[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[2] https://sites.google.com/site/analysismethods/yasin-pubs
+[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
-[5] https://sites.google.com/site/analysismethods/yasin-pubs
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 43/51] perf stat: Remove hard coded transaction events
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (25 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 42/51] perf doc: Refresh topdown documentation Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 44/51] perf stat: Use metrics for --smi-cost Ian Rogers
` (9 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
The metric group "transaction" is now present for Intel architectures,
so the legacy hard coded approach won't be used. Remove the associated
logic.
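For example, on a CPU with the transaction metric group, the following
two invocations are now equivalent (illustrative workload):
% perf stat -T -a sleep 1
% perf stat -M transaction -a sleep 1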
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 59 ++++++-----------------------------
tools/perf/util/stat-shadow.c | 48 +---------------------------
tools/perf/util/stat.c | 4 ---
tools/perf/util/stat.h | 7 -----
4 files changed, 11 insertions(+), 107 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bdb1ef4fc6ad..e6b60b058257 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -100,30 +100,6 @@
static void print_counters(struct timespec *ts, int argc, const char **argv);
-/* Default events used for perf stat -T */
-static const char *transaction_attrs = {
- "task-clock,"
- "{"
- "instructions,"
- "cycles,"
- "cpu/cycles-t/,"
- "cpu/tx-start/,"
- "cpu/el-start/,"
- "cpu/cycles-ct/"
- "}"
-};
-
-/* More limited version when the CPU does not have all events. */
-static const char * transaction_limited_attrs = {
- "task-clock,"
- "{"
- "instructions,"
- "cycles,"
- "cpu/cycles-t/,"
- "cpu/tx-start/"
- "}"
-};
-
static const char *smi_cost_attrs = {
"{"
"msr/aperf/,"
@@ -1811,37 +1787,22 @@ static int add_default_attributes(void)
return 0;
if (transaction_run) {
- struct parse_events_error errinfo;
/* Handle -T as -M transaction. Once platform specific metrics
* support has been added to the json files, all architectures
* will use this approach. To determine transaction support
* on an architecture test for such a metric name.
*/
- if (metricgroup__has_metric("transaction")) {
- return metricgroup__parse_groups(evsel_list, "transaction",
- stat_config.metric_no_group,
- stat_config.metric_no_merge,
- stat_config.metric_no_threshold,
- stat_config.user_requested_cpu_list,
- stat_config.system_wide,
- &stat_config.metric_events);
- }
-
- parse_events_error__init(&errinfo);
- if (pmu_have_event("cpu", "cycles-ct") &&
- pmu_have_event("cpu", "el-start"))
- err = parse_events(evsel_list, transaction_attrs,
- &errinfo);
- else
- err = parse_events(evsel_list,
- transaction_limited_attrs,
- &errinfo);
- if (err) {
- fprintf(stderr, "Cannot set up transaction events\n");
- parse_events_error__print(&errinfo, transaction_attrs);
+ if (!metricgroup__has_metric("transaction")) {
+ pr_err("Missing transaction metrics");
+ return -1;
}
- parse_events_error__exit(&errinfo);
- return err ? -1 : 0;
+ return metricgroup__parse_groups(evsel_list, "transaction",
+ stat_config.metric_no_group,
+ stat_config.metric_no_merge,
+ stat_config.metric_no_threshold,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events);
}
if (smi_cost) {
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 5189756bf16d..3cfe4b4eb3de 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -235,12 +235,6 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
update_runtime_stat(st, STAT_NSECS, map_idx, count_ns, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_runtime_stat(st, STAT_CYCLES, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
- update_runtime_stat(st, STAT_CYCLES_IN_TX, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, TRANSACTION_START))
- update_runtime_stat(st, STAT_TRANSACTION, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, ELISION_START))
- update_runtime_stat(st, STAT_ELISION, map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
map_idx, count, &rsd);
@@ -695,7 +689,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
{
void *ctxp = out->ctx;
print_metric_t print_metric = out->print_metric;
- double total, ratio = 0.0, total2;
+ double total, ratio = 0.0;
struct runtime_stat_data rsd = {
.ctx = evsel_context(evsel),
.cgrp = evsel->cgrp,
@@ -808,46 +802,6 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
} else {
print_metric(config, ctxp, NULL, NULL, "Ghz", 0);
}
- } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) {
- total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd);
-
- if (total)
- print_metric(config, ctxp, NULL,
- "%7.2f%%", "transactional cycles",
- 100.0 * (avg / total));
- else
- print_metric(config, ctxp, NULL, NULL, "transactional cycles",
- 0);
- } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) {
- total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd);
- total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd);
-
- if (total2 < avg)
- total2 = avg;
- if (total)
- print_metric(config, ctxp, NULL, "%7.2f%%", "aborted cycles",
- 100.0 * ((total2-avg) / total));
- else
- print_metric(config, ctxp, NULL, NULL, "aborted cycles", 0);
- } else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) {
- total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd);
-
- if (avg)
- ratio = total / avg;
-
- if (runtime_stat_n(st, STAT_CYCLES_IN_TX, map_idx, &rsd) != 0)
- print_metric(config, ctxp, NULL, "%8.0f",
- "cycles / transaction", ratio);
- else
- print_metric(config, ctxp, NULL, NULL, "cycles / transaction",
- 0);
- } else if (perf_stat_evsel__is(evsel, ELISION_START)) {
- total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd);
-
- if (avg)
- ratio = total / avg;
-
- print_metric(config, ctxp, NULL, "%8.0f", "cycles / elision", ratio);
} else if (evsel__is_clock(evsel)) {
if ((ratio = avg_stats(&walltime_nsecs_stats)) != 0)
print_metric(config, ctxp, NULL, "%8.3f", "CPUs utilized",
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 0b8c91ca13cd..b5b18d457254 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -87,10 +87,6 @@ bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id)
#define ID(id, name) [PERF_STAT_EVSEL_ID__##id] = #name
static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
ID(NONE, x),
- ID(CYCLES_IN_TX, cpu/cycles-t/),
- ID(TRANSACTION_START, cpu/tx-start/),
- ID(ELISION_START, cpu/el-start/),
- ID(CYCLES_IN_TX_CP, cpu/cycles-ct/),
ID(SMI_NUM, msr/smi/),
ID(APERF, msr/aperf/),
};
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 42af350a96d9..c5fe847dd344 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -21,10 +21,6 @@ struct stats {
enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__NONE = 0,
- PERF_STAT_EVSEL_ID__CYCLES_IN_TX,
- PERF_STAT_EVSEL_ID__TRANSACTION_START,
- PERF_STAT_EVSEL_ID__ELISION_START,
- PERF_STAT_EVSEL_ID__CYCLES_IN_TX_CP,
PERF_STAT_EVSEL_ID__SMI_NUM,
PERF_STAT_EVSEL_ID__APERF,
PERF_STAT_EVSEL_ID__MAX,
@@ -92,9 +88,6 @@ enum stat_type {
STAT_LL_CACHE,
STAT_ITLB_CACHE,
STAT_DTLB_CACHE,
- STAT_CYCLES_IN_TX,
- STAT_TRANSACTION,
- STAT_ELISION,
STAT_SMI_NUM,
STAT_APERF,
STAT_MAX
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 44/51] perf stat: Use metrics for --smi-cost
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (26 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 43/51] perf stat: Remove hard coded transaction events Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 45/51] perf stat: Remove perf_stat_evsel_id Ian Rogers
` (8 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Rather than parsing events for --smi-cost, use the json metric group
'smi'.
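As a rough sketch of the user-visible effect (a hypothetical
invocation; assumes a CPU exposing the json "smi" metric group and the
freeze_on_smi control):

  # Previously parsed msr/aperf/, msr/smi/ and cycles directly; now
  # roughly equivalent to requesting the metric group with
  # metric-only output:
  perf stat --smi-cost -a sleep 1
  perf stat --metric-only -M smi -a sleep 1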
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 34 +++++++++++-----------------------
tools/perf/util/stat-shadow.c | 30 ------------------------------
tools/perf/util/stat.c | 2 --
tools/perf/util/stat.h | 4 ----
4 files changed, 11 insertions(+), 59 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e6b60b058257..9c1fbf154ee3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -100,14 +100,6 @@
static void print_counters(struct timespec *ts, int argc, const char **argv);
-static const char *smi_cost_attrs = {
- "{"
- "msr/aperf/,"
- "msr/smi/,"
- "cycles"
- "}"
-};
-
static struct evlist *evsel_list;
static bool all_counters_use_bpf = true;
@@ -1666,7 +1658,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
*/
static int add_default_attributes(void)
{
- int err;
struct perf_event_attr default_attrs0[] = {
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
@@ -1806,11 +1797,10 @@ static int add_default_attributes(void)
}
if (smi_cost) {
- struct parse_events_error errinfo;
int smi;
if (sysfs__read_int(FREEZE_ON_SMI_PATH, &smi) < 0) {
- fprintf(stderr, "freeze_on_smi is not supported.\n");
+ pr_err("freeze_on_smi is not supported.");
return -1;
}
@@ -1822,23 +1812,21 @@ static int add_default_attributes(void)
smi_reset = true;
}
- if (!pmu_have_event("msr", "aperf") ||
- !pmu_have_event("msr", "smi")) {
- fprintf(stderr, "To measure SMI cost, it needs "
- "msr/aperf/, msr/smi/ and cpu/cycles/ support\n");
+ if (!metricgroup__has_metric("smi")) {
+ pr_err("Missing smi metrics");
return -1;
}
+
if (!force_metric_only)
stat_config.metric_only = true;
- parse_events_error__init(&errinfo);
- err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
- if (err) {
- parse_events_error__print(&errinfo, smi_cost_attrs);
- fprintf(stderr, "Cannot set up SMI cost events\n");
- }
- parse_events_error__exit(&errinfo);
- return err ? -1 : 0;
+ return metricgroup__parse_groups(evsel_list, "smi",
+ stat_config.metric_no_group,
+ stat_config.metric_no_merge,
+ stat_config.metric_no_threshold,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events);
}
if (topdown_run) {
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 3cfe4b4eb3de..d14fa531ee27 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -255,10 +255,6 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
update_runtime_stat(st, STAT_DTLB_CACHE, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
update_runtime_stat(st, STAT_ITLB_CACHE, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, SMI_NUM))
- update_runtime_stat(st, STAT_SMI_NUM, map_idx, count, &rsd);
- else if (perf_stat_evsel__is(counter, APERF))
- update_runtime_stat(st, STAT_APERF, map_idx, count, &rsd);
if (counter->collect_stat) {
v = saved_value_lookup(counter, map_idx, true, STAT_NONE, 0, st,
@@ -479,30 +475,6 @@ static void print_ll_cache_misses(struct perf_stat_config *config,
out->print_metric(config, out->ctx, color, "%7.2f%%", "of all LL-cache accesses", ratio);
}
-static void print_smi_cost(struct perf_stat_config *config, int map_idx,
- struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
- struct runtime_stat_data *rsd)
-{
- double smi_num, aperf, cycles, cost = 0.0;
- const char *color = NULL;
-
- smi_num = runtime_stat_avg(st, STAT_SMI_NUM, map_idx, rsd);
- aperf = runtime_stat_avg(st, STAT_APERF, map_idx, rsd);
- cycles = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd);
-
- if ((cycles == 0) || (aperf == 0))
- return;
-
- if (smi_num)
- cost = (aperf - cycles) / aperf * 100.00;
-
- if (cost > 10)
- color = PERF_COLOR_RED;
- out->print_metric(config, out->ctx, color, "%8.1f%%", "SMI cycles%", cost);
- out->print_metric(config, out->ctx, NULL, "%4.0f", "SMI#", smi_num);
-}
-
static int prepare_metric(struct evsel **metric_events,
struct metric_ref *metric_refs,
struct expr_parse_ctx *pctx,
@@ -819,8 +791,6 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
if (unit != ' ')
snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio);
- } else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
- print_smi_cost(config, map_idx, out, st, &rsd);
} else {
num = 0;
}
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index b5b18d457254..d51d7457f12d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -87,8 +87,6 @@ bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id)
#define ID(id, name) [PERF_STAT_EVSEL_ID__##id] = #name
static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
ID(NONE, x),
- ID(SMI_NUM, msr/smi/),
- ID(APERF, msr/aperf/),
};
#undef ID
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index c5fe847dd344..9af4af3bc3f2 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -21,8 +21,6 @@ struct stats {
enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__NONE = 0,
- PERF_STAT_EVSEL_ID__SMI_NUM,
- PERF_STAT_EVSEL_ID__APERF,
PERF_STAT_EVSEL_ID__MAX,
};
@@ -88,8 +86,6 @@ enum stat_type {
STAT_LL_CACHE,
STAT_ITLB_CACHE,
STAT_DTLB_CACHE,
- STAT_SMI_NUM,
- STAT_APERF,
STAT_MAX
};
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 45/51] perf stat: Remove perf_stat_evsel_id
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (27 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 44/51] perf stat: Use metrics for --smi-cost Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 46/51] perf stat: Move enums from header Ian Rogers
` (7 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
perf_stat_evsel_id was used to identify hard coded metrics. Earlier
patches in this series migrated those metrics to json metrics and
removed the last users of the ids, so the id values are no longer
necessary.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat.c | 31 -------------------------------
tools/perf/util/stat.h | 12 ------------
2 files changed, 43 deletions(-)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index d51d7457f12d..8d83d2f4a082 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -77,36 +77,6 @@ double rel_stddev_stats(double stddev, double avg)
return pct;
}
-bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id)
-{
- struct perf_stat_evsel *ps = evsel->stats;
-
- return ps->id == id;
-}
-
-#define ID(id, name) [PERF_STAT_EVSEL_ID__##id] = #name
-static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
- ID(NONE, x),
-};
-#undef ID
-
-static void perf_stat_evsel_id_init(struct evsel *evsel)
-{
- struct perf_stat_evsel *ps = evsel->stats;
- int i;
-
- /* ps->id is 0 hence PERF_STAT_EVSEL_ID__NONE by default */
-
- for (i = 0; i < PERF_STAT_EVSEL_ID__MAX; i++) {
- if (!strcmp(evsel__name(evsel), id_str[i]) ||
- (strstr(evsel__name(evsel), id_str[i]) && evsel->pmu_name
- && strstr(evsel__name(evsel), evsel->pmu_name))) {
- ps->id = i;
- break;
- }
- }
-}
-
static void evsel__reset_aggr_stats(struct evsel *evsel)
{
struct perf_stat_evsel *ps = evsel->stats;
@@ -166,7 +136,6 @@ static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
return -ENOMEM;
}
- perf_stat_evsel_id_init(evsel);
evsel__reset_stat_priv(evsel);
return 0;
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 9af4af3bc3f2..df6068a3f7bb 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -19,11 +19,6 @@ struct stats {
u64 max, min;
};
-enum perf_stat_evsel_id {
- PERF_STAT_EVSEL_ID__NONE = 0,
- PERF_STAT_EVSEL_ID__MAX,
-};
-
/* hold aggregated event info */
struct perf_stat_aggr {
/* aggregated values */
@@ -40,8 +35,6 @@ struct perf_stat_aggr {
struct perf_stat_evsel {
/* used for repeated runs */
struct stats res_stats;
- /* evsel id for quick check */
- enum perf_stat_evsel_id id;
/* number of allocated 'aggr' */
int nr_aggr;
/* aggregated event values */
@@ -187,11 +180,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
struct evsel;
struct evlist;
-bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
-
-#define perf_stat_evsel__is(evsel, id) \
- __perf_stat_evsel__is(evsel, PERF_STAT_EVSEL_ID__ ## id)
-
extern struct runtime_stat rt_stat;
extern struct stats walltime_nsecs_stats;
extern struct rusage_stats ru_stats;
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 46/51] perf stat: Move enums from header
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (28 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 45/51] perf stat: Remove perf_stat_evsel_id Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 47/51] perf stat: Hide runtime_stat Ian Rogers
` (6 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
The enums are only used in stat-shadow.c, so narrow their scope by
moving them to the C file.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/stat-shadow.c | 25 +++++++++++++++++++++++++
tools/perf/util/stat.h | 27 ---------------------------
2 files changed, 25 insertions(+), 27 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index d14fa531ee27..fc948a7e83b7 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -29,6 +29,31 @@ struct runtime_stat rt_stat;
struct stats walltime_nsecs_stats;
struct rusage_stats ru_stats;
+enum {
+ CTX_BIT_USER = 1 << 0,
+ CTX_BIT_KERNEL = 1 << 1,
+ CTX_BIT_HV = 1 << 2,
+ CTX_BIT_HOST = 1 << 3,
+ CTX_BIT_IDLE = 1 << 4,
+ CTX_BIT_MAX = 1 << 5,
+};
+
+enum stat_type {
+ STAT_NONE = 0,
+ STAT_NSECS,
+ STAT_CYCLES,
+ STAT_STALLED_CYCLES_FRONT,
+ STAT_STALLED_CYCLES_BACK,
+ STAT_BRANCHES,
+ STAT_CACHEREFS,
+ STAT_L1_DCACHE,
+ STAT_L1_ICACHE,
+ STAT_LL_CACHE,
+ STAT_ITLB_CACHE,
+ STAT_DTLB_CACHE,
+ STAT_MAX
+};
+
struct saved_value {
struct rb_node rb_node;
struct evsel *evsel;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index df6068a3f7bb..215c0f5c4db7 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -55,33 +55,6 @@ enum aggr_mode {
AGGR_MAX
};
-enum {
- CTX_BIT_USER = 1 << 0,
- CTX_BIT_KERNEL = 1 << 1,
- CTX_BIT_HV = 1 << 2,
- CTX_BIT_HOST = 1 << 3,
- CTX_BIT_IDLE = 1 << 4,
- CTX_BIT_MAX = 1 << 5,
-};
-
-#define NUM_CTX CTX_BIT_MAX
-
-enum stat_type {
- STAT_NONE = 0,
- STAT_NSECS,
- STAT_CYCLES,
- STAT_STALLED_CYCLES_FRONT,
- STAT_STALLED_CYCLES_BACK,
- STAT_BRANCHES,
- STAT_CACHEREFS,
- STAT_L1_DCACHE,
- STAT_L1_ICACHE,
- STAT_LL_CACHE,
- STAT_ITLB_CACHE,
- STAT_DTLB_CACHE,
- STAT_MAX
-};
-
struct runtime_stat {
struct rblist value_list;
};
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 47/51] perf stat: Hide runtime_stat
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (29 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 46/51] perf stat: Move enums from header Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 48/51] perf stat: Add cpu_aggr_map for loop Ian Rogers
` (5 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
runtime_stat is only shared for the sake of tests, which don't care
about its value. Move the definition into stat-shadow.c and have the
tests use the global version as well.
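After this change the shadow-stat entry points no longer take a
struct runtime_stat argument. A minimal sketch of a caller, using only
the signatures updated by this patch:

  perf_stat__init_shadow_stats();        /* set up the hidden rt_stat */
  perf_stat__update_shadow_stats(evsel, count, map_idx);
  /* ... print/aggregate ... */
  perf_stat__reset_shadow_per_stat();    /* clear between intervals */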
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-script.c | 6 +-
tools/perf/builtin-stat.c | 4 +-
tools/perf/tests/parse-metric.c | 19 ++--
tools/perf/tests/pmu-events.c | 8 +-
tools/perf/util/stat-display.c | 5 +-
tools/perf/util/stat-shadow.c | 165 +++++++++++++-------------------
tools/perf/util/stat.c | 2 +-
tools/perf/util/stat.h | 17 +---
8 files changed, 90 insertions(+), 136 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a792214d1af8..e9b5387161df 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2074,8 +2074,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
val = sample->period * evsel->scale;
perf_stat__update_shadow_stats(evsel,
val,
- sample->cpu,
- &rt_stat);
+ sample->cpu);
evsel_script(evsel)->val = val;
if (evsel_script(leader)->gnum == leader->core.nr_members) {
for_each_group_member (ev2, leader) {
@@ -2083,8 +2082,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
evsel_script(ev2)->val,
sample->cpu,
&ctx,
- NULL,
- &rt_stat);
+ NULL);
}
evsel_script(leader)->gnum = 0;
}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9c1fbf154ee3..619387459914 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -434,7 +434,7 @@ static void process_interval(void)
clock_gettime(CLOCK_MONOTONIC, &ts);
diff_timespec(&rs, &ts, &ref_time);
- perf_stat__reset_shadow_per_stat(&rt_stat);
+ perf_stat__reset_shadow_per_stat();
evlist__reset_aggr_stats(evsel_list);
if (read_counters(&rs) == 0)
@@ -910,7 +910,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
evlist__copy_prev_raw_counts(evsel_list);
evlist__reset_prev_raw_counts(evsel_list);
evlist__reset_aggr_stats(evsel_list);
- perf_stat__reset_shadow_per_stat(&rt_stat);
+ perf_stat__reset_shadow_per_stat();
} else {
update_stats(&walltime_nsecs_stats, t1 - t0);
update_rusage_stats(&ru_stats, &stat_config.ru_data);
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 132c9b945a42..37e3371d978e 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -30,8 +30,7 @@ static u64 find_value(const char *name, struct value *values)
return 0;
}
-static void load_runtime_stat(struct runtime_stat *st, struct evlist *evlist,
- struct value *vals)
+static void load_runtime_stat(struct evlist *evlist, struct value *vals)
{
struct evsel *evsel;
u64 count;
@@ -39,14 +38,14 @@ static void load_runtime_stat(struct runtime_stat *st, struct evlist *evlist,
perf_stat__reset_shadow_stats();
evlist__for_each_entry(evlist, evsel) {
count = find_value(evsel->name, vals);
- perf_stat__update_shadow_stats(evsel, count, 0, st);
+ perf_stat__update_shadow_stats(evsel, count, 0);
if (!strcmp(evsel->name, "duration_time"))
update_stats(&walltime_nsecs_stats, count);
}
}
static double compute_single(struct rblist *metric_events, struct evlist *evlist,
- struct runtime_stat *st, const char *name)
+ const char *name)
{
struct metric_expr *mexp;
struct metric_event *me;
@@ -58,7 +57,7 @@ static double compute_single(struct rblist *metric_events, struct evlist *evlist
list_for_each_entry (mexp, &me->head, nd) {
if (strcmp(mexp->metric_name, name))
continue;
- return test_generic_metric(mexp, 0, st);
+ return test_generic_metric(mexp, 0);
}
}
}
@@ -74,7 +73,6 @@ static int __compute_metric(const char *name, struct value *vals,
};
const struct pmu_metrics_table *pme_test;
struct perf_cpu_map *cpus;
- struct runtime_stat st;
struct evlist *evlist;
int err;
@@ -93,7 +91,6 @@ static int __compute_metric(const char *name, struct value *vals,
}
perf_evlist__set_maps(&evlist->core, cpus, NULL);
- runtime_stat__init(&st);
/* Parse the metric into metric_events list. */
pme_test = find_core_metrics_table("testarch", "testcpu");
@@ -107,18 +104,17 @@ static int __compute_metric(const char *name, struct value *vals,
goto out;
/* Load the runtime stats with given numbers for events. */
- load_runtime_stat(&st, evlist, vals);
+ load_runtime_stat(evlist, vals);
/* And execute the metric */
if (name1 && ratio1)
- *ratio1 = compute_single(&metric_events, evlist, &st, name1);
+ *ratio1 = compute_single(&metric_events, evlist, name1);
if (name2 && ratio2)
- *ratio2 = compute_single(&metric_events, evlist, &st, name2);
+ *ratio2 = compute_single(&metric_events, evlist, name2);
out:
/* ... cleanup. */
metricgroup__rblist_exit(&metric_events);
- runtime_stat__exit(&st);
evlist__free_stats(evlist);
perf_cpu_map__put(cpus);
evlist__delete(evlist);
@@ -300,6 +296,7 @@ static int test_metric_group(void)
static int test__parse_metric(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
+ perf_stat__init_shadow_stats();
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 50b99a0f8f59..122e74c282a7 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -816,7 +816,6 @@ static int test__parsing_callback(const struct pmu_metric *pm,
int k;
struct evlist *evlist;
struct perf_cpu_map *cpus;
- struct runtime_stat st;
struct evsel *evsel;
struct rblist metric_events = {
.nr_entries = 0,
@@ -844,7 +843,6 @@ static int test__parsing_callback(const struct pmu_metric *pm,
}
perf_evlist__set_maps(&evlist->core, cpus, NULL);
- runtime_stat__init(&st);
err = metricgroup__parse_groups_test(evlist, table, pm->metric_name, &metric_events);
if (err) {
@@ -867,7 +865,7 @@ static int test__parsing_callback(const struct pmu_metric *pm,
k = 1;
perf_stat__reset_shadow_stats();
evlist__for_each_entry(evlist, evsel) {
- perf_stat__update_shadow_stats(evsel, k, 0, &st);
+ perf_stat__update_shadow_stats(evsel, k, 0);
if (!strcmp(evsel->name, "duration_time"))
update_stats(&walltime_nsecs_stats, k);
k++;
@@ -881,7 +879,7 @@ static int test__parsing_callback(const struct pmu_metric *pm,
list_for_each_entry (mexp, &me->head, nd) {
if (strcmp(mexp->metric_name, pm->metric_name))
continue;
- pr_debug("Result %f\n", test_generic_metric(mexp, 0, &st));
+ pr_debug("Result %f\n", test_generic_metric(mexp, 0));
err = 0;
(*failures)--;
goto out_err;
@@ -896,7 +894,6 @@ static int test__parsing_callback(const struct pmu_metric *pm,
/* ... cleanup. */
metricgroup__rblist_exit(&metric_events);
- runtime_stat__exit(&st);
evlist__free_stats(evlist);
perf_cpu_map__put(cpus);
evlist__delete(evlist);
@@ -908,6 +905,7 @@ static int test__parsing(struct test_suite *test __maybe_unused,
{
int failures = 0;
+ perf_stat__init_shadow_stats();
pmu_for_each_core_metric(test__parsing_callback, &failures);
pmu_for_each_sys_metric(test__parsing_callback, &failures);
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 1b5cb20efd23..6c065d0624c3 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -729,7 +729,7 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
if (ok) {
perf_stat__print_shadow_stats(config, counter, uval, map_idx,
- &out, &config->metric_events, &rt_stat);
+ &out, &config->metric_events);
} else {
pm(config, os, /*color=*/NULL, /*format=*/NULL, /*unit=*/"", /*val=*/0);
}
@@ -1089,8 +1089,7 @@ static void print_metric_headers(struct perf_stat_config *config,
perf_stat__print_shadow_stats(config, counter, 0,
0,
&out,
- &config->metric_events,
- &rt_stat);
+ &config->metric_events);
}
if (!config->json_output)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index fc948a7e83b7..f80be6abac90 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -25,10 +25,13 @@
* AGGR_THREAD: Not supported?
*/
-struct runtime_stat rt_stat;
struct stats walltime_nsecs_stats;
struct rusage_stats ru_stats;
+static struct runtime_stat {
+ struct rblist value_list;
+} rt_stat;
+
enum {
CTX_BIT_USER = 1 << 0,
CTX_BIT_KERNEL = 1 << 1,
@@ -125,7 +128,6 @@ static struct saved_value *saved_value_lookup(struct evsel *evsel,
bool create,
enum stat_type type,
int ctx,
- struct runtime_stat *st,
struct cgroup *cgrp)
{
struct rblist *rblist;
@@ -138,7 +140,7 @@ static struct saved_value *saved_value_lookup(struct evsel *evsel,
.cgrp = cgrp,
};
- rblist = &st->value_list;
+ rblist = &rt_stat.value_list;
/* don't use context info for clock events */
if (type == STAT_NSECS)
@@ -156,9 +158,9 @@ static struct saved_value *saved_value_lookup(struct evsel *evsel,
return NULL;
}
-void runtime_stat__init(struct runtime_stat *st)
+void perf_stat__init_shadow_stats(void)
{
- struct rblist *rblist = &st->value_list;
+ struct rblist *rblist = &rt_stat.value_list;
rblist__init(rblist);
rblist->node_cmp = saved_value_cmp;
@@ -166,16 +168,6 @@ void runtime_stat__init(struct runtime_stat *st)
rblist->node_delete = saved_value_delete;
}
-void runtime_stat__exit(struct runtime_stat *st)
-{
- rblist__exit(&st->value_list);
-}
-
-void perf_stat__init_shadow_stats(void)
-{
- runtime_stat__init(&rt_stat);
-}
-
static int evsel_context(struct evsel *evsel)
{
int ctx = 0;
@@ -194,12 +186,12 @@ static int evsel_context(struct evsel *evsel)
return ctx;
}
-static void reset_stat(struct runtime_stat *st)
+void perf_stat__reset_shadow_per_stat(void)
{
struct rblist *rblist;
struct rb_node *pos, *next;
- rblist = &st->value_list;
+ rblist = &rt_stat.value_list;
next = rb_first_cached(&rblist->entries);
while (next) {
pos = next;
@@ -212,28 +204,22 @@ static void reset_stat(struct runtime_stat *st)
void perf_stat__reset_shadow_stats(void)
{
- reset_stat(&rt_stat);
+ perf_stat__reset_shadow_per_stat();
memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
memset(&ru_stats, 0, sizeof(ru_stats));
}
-void perf_stat__reset_shadow_per_stat(struct runtime_stat *st)
-{
- reset_stat(st);
-}
-
struct runtime_stat_data {
int ctx;
struct cgroup *cgrp;
};
-static void update_runtime_stat(struct runtime_stat *st,
- enum stat_type type,
+static void update_runtime_stat(enum stat_type type,
int map_idx, u64 count,
struct runtime_stat_data *rsd)
{
struct saved_value *v = saved_value_lookup(NULL, map_idx, true, type,
- rsd->ctx, st, rsd->cgrp);
+ rsd->ctx, rsd->cgrp);
if (v)
update_stats(&v->stats, count);
@@ -245,7 +231,7 @@ static void update_runtime_stat(struct runtime_stat *st,
* instruction rates, etc:
*/
void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
- int map_idx, struct runtime_stat *st)
+ int map_idx)
{
u64 count_ns = count;
struct saved_value *v;
@@ -253,43 +239,42 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
.ctx = evsel_context(counter),
.cgrp = counter->cgrp,
};
-
count *= counter->scale;
if (evsel__is_clock(counter))
- update_runtime_stat(st, STAT_NSECS, map_idx, count_ns, &rsd);
+ update_runtime_stat(STAT_NSECS, map_idx, count_ns, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_runtime_stat(st, STAT_CYCLES, map_idx, count, &rsd);
+ update_runtime_stat(STAT_CYCLES, map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
- update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
+ update_runtime_stat(STAT_STALLED_CYCLES_FRONT,
map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
- update_runtime_stat(st, STAT_STALLED_CYCLES_BACK,
+ update_runtime_stat(STAT_STALLED_CYCLES_BACK,
map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_runtime_stat(st, STAT_BRANCHES, map_idx, count, &rsd);
+ update_runtime_stat(STAT_BRANCHES, map_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
- update_runtime_stat(st, STAT_CACHEREFS, map_idx, count, &rsd);
+ update_runtime_stat(STAT_CACHEREFS, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
- update_runtime_stat(st, STAT_L1_DCACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_L1_DCACHE, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
- update_runtime_stat(st, STAT_L1_ICACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_L1_ICACHE, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL))
- update_runtime_stat(st, STAT_LL_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_LL_CACHE, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
- update_runtime_stat(st, STAT_DTLB_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_DTLB_CACHE, map_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
- update_runtime_stat(st, STAT_ITLB_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_ITLB_CACHE, map_idx, count, &rsd);
if (counter->collect_stat) {
- v = saved_value_lookup(counter, map_idx, true, STAT_NONE, 0, st,
+ v = saved_value_lookup(counter, map_idx, true, STAT_NONE, 0,
rsd.cgrp);
update_stats(&v->stats, count);
if (counter->metric_leader)
v->metric_total += count;
} else if (counter->metric_leader && !counter->merged_stat) {
v = saved_value_lookup(counter->metric_leader,
- map_idx, true, STAT_NONE, 0, st, rsd.cgrp);
+ map_idx, true, STAT_NONE, 0, rsd.cgrp);
v->metric_total += count;
v->metric_other++;
}
@@ -322,26 +307,24 @@ static const char *get_ratio_color(enum grc_type type, double ratio)
return color;
}
-static double runtime_stat_avg(struct runtime_stat *st,
- enum stat_type type, int map_idx,
+static double runtime_stat_avg(enum stat_type type, int map_idx,
struct runtime_stat_data *rsd)
{
struct saved_value *v;
- v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, st, rsd->cgrp);
+ v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, rsd->cgrp);
if (!v)
return 0.0;
return avg_stats(&v->stats);
}
-static double runtime_stat_n(struct runtime_stat *st,
- enum stat_type type, int map_idx,
+static double runtime_stat_n(enum stat_type type, int map_idx,
struct runtime_stat_data *rsd)
{
struct saved_value *v;
- v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, st, rsd->cgrp);
+ v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, rsd->cgrp);
if (!v)
return 0.0;
@@ -351,13 +334,12 @@ static double runtime_stat_n(struct runtime_stat *st,
static void print_stalled_cycles_frontend(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_CYCLES, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -374,13 +356,12 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config,
static void print_stalled_cycles_backend(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_CYCLES, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -393,13 +374,12 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config,
static void print_branch_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_BRANCHES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_BRANCHES, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -412,13 +392,12 @@ static void print_branch_misses(struct perf_stat_config *config,
static void print_l1_dcache_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_L1_DCACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_L1_DCACHE, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -431,13 +410,12 @@ static void print_l1_dcache_misses(struct perf_stat_config *config,
static void print_l1_icache_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_L1_ICACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_L1_ICACHE, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -449,13 +427,12 @@ static void print_l1_icache_misses(struct perf_stat_config *config,
static void print_dtlb_cache_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_DTLB_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_DTLB_CACHE, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -467,13 +444,12 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config,
static void print_itlb_cache_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_ITLB_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_ITLB_CACHE, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -485,13 +461,12 @@ static void print_itlb_cache_misses(struct perf_stat_config *config,
static void print_ll_cache_misses(struct perf_stat_config *config,
int map_idx, double avg,
struct perf_stat_output_ctx *out,
- struct runtime_stat *st,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(st, STAT_LL_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_LL_CACHE, map_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -503,8 +478,7 @@ static void print_ll_cache_misses(struct perf_stat_config *config,
static int prepare_metric(struct evsel **metric_events,
struct metric_ref *metric_refs,
struct expr_parse_ctx *pctx,
- int map_idx,
- struct runtime_stat *st)
+ int map_idx)
{
double scale;
char *n;
@@ -543,7 +517,7 @@ static int prepare_metric(struct evsel **metric_events,
}
} else {
v = saved_value_lookup(metric_events[i], map_idx, false,
- STAT_NONE, 0, st,
+ STAT_NONE, 0,
metric_events[i]->cgrp);
if (!v)
break;
@@ -587,8 +561,7 @@ static void generic_metric(struct perf_stat_config *config,
const char *metric_unit,
int runtime,
int map_idx,
- struct perf_stat_output_ctx *out,
- struct runtime_stat *st)
+ struct perf_stat_output_ctx *out)
{
print_metric_t print_metric = out->print_metric;
struct expr_parse_ctx *pctx;
@@ -605,7 +578,7 @@ static void generic_metric(struct perf_stat_config *config,
pctx->sctx.user_requested_cpu_list = strdup(config->user_requested_cpu_list);
pctx->sctx.runtime = runtime;
pctx->sctx.system_wide = config->system_wide;
- i = prepare_metric(metric_events, metric_refs, pctx, map_idx, st);
+ i = prepare_metric(metric_events, metric_refs, pctx, map_idx);
if (i < 0) {
expr__ctx_free(pctx);
return;
@@ -657,7 +630,7 @@ static void generic_metric(struct perf_stat_config *config,
expr__ctx_free(pctx);
}
-double test_generic_metric(struct metric_expr *mexp, int map_idx, struct runtime_stat *st)
+double test_generic_metric(struct metric_expr *mexp, int map_idx)
{
struct expr_parse_ctx *pctx;
double ratio = 0.0;
@@ -666,7 +639,7 @@ double test_generic_metric(struct metric_expr *mexp, int map_idx, struct runtime
if (!pctx)
return NAN;
- if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, map_idx, st) < 0)
+ if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, map_idx) < 0)
goto out;
if (expr__parse(&ratio, pctx, mexp->metric_expr))
@@ -681,8 +654,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
double avg, int map_idx,
struct perf_stat_output_ctx *out,
- struct rblist *metric_events,
- struct runtime_stat *st)
+ struct rblist *metric_events)
{
void *ctxp = out->ctx;
print_metric_t print_metric = out->print_metric;
@@ -697,7 +669,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
if (config->iostat_run) {
iostat_print_metric(config, evsel, out);
} else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
- total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_CYCLES, map_idx, &rsd);
if (total) {
ratio = avg / total;
@@ -707,10 +679,9 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0);
}
- total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_STALLED_CYCLES_FRONT, map_idx, &rsd);
- total = max(total, runtime_stat_avg(st,
- STAT_STALLED_CYCLES_BACK,
+ total = max(total, runtime_stat_avg(STAT_STALLED_CYCLES_BACK,
map_idx, &rsd));
if (total && avg) {
@@ -721,8 +692,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
ratio);
}
} else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
- if (runtime_stat_n(st, STAT_BRANCHES, map_idx, &rsd) != 0)
- print_branch_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_BRANCHES, map_idx, &rsd) != 0)
+ print_branch_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all branches", 0);
} else if (
@@ -731,8 +702,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(st, STAT_L1_DCACHE, map_idx, &rsd) != 0)
- print_l1_dcache_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_L1_DCACHE, map_idx, &rsd) != 0)
+ print_l1_dcache_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0);
} else if (
@@ -741,8 +712,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(st, STAT_L1_ICACHE, map_idx, &rsd) != 0)
- print_l1_icache_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_L1_ICACHE, map_idx, &rsd) != 0)
+ print_l1_icache_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0);
} else if (
@@ -751,8 +722,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(st, STAT_DTLB_CACHE, map_idx, &rsd) != 0)
- print_dtlb_cache_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_DTLB_CACHE, map_idx, &rsd) != 0)
+ print_dtlb_cache_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0);
} else if (
@@ -761,8 +732,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(st, STAT_ITLB_CACHE, map_idx, &rsd) != 0)
- print_itlb_cache_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_ITLB_CACHE, map_idx, &rsd) != 0)
+ print_itlb_cache_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0);
} else if (
@@ -771,27 +742,27 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(st, STAT_LL_CACHE, map_idx, &rsd) != 0)
- print_ll_cache_misses(config, map_idx, avg, out, st, &rsd);
+ if (runtime_stat_n(STAT_LL_CACHE, map_idx, &rsd) != 0)
+ print_ll_cache_misses(config, map_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0);
} else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
- total = runtime_stat_avg(st, STAT_CACHEREFS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_CACHEREFS, map_idx, &rsd);
if (total)
ratio = avg * 100 / total;
- if (runtime_stat_n(st, STAT_CACHEREFS, map_idx, &rsd) != 0)
+ if (runtime_stat_n(STAT_CACHEREFS, map_idx, &rsd) != 0)
print_metric(config, ctxp, NULL, "%8.3f %%",
"of all cache refs", ratio);
else
print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0);
} else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
- print_stalled_cycles_frontend(config, map_idx, avg, out, st, &rsd);
+ print_stalled_cycles_frontend(config, map_idx, avg, out, &rsd);
} else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
- print_stalled_cycles_backend(config, map_idx, avg, out, st, &rsd);
+ print_stalled_cycles_backend(config, map_idx, avg, out, &rsd);
} else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
- total = runtime_stat_avg(st, STAT_NSECS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_NSECS, map_idx, &rsd);
if (total) {
ratio = avg / total;
@@ -805,11 +776,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
avg / (ratio * evsel->scale));
else
print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
- } else if (runtime_stat_n(st, STAT_NSECS, map_idx, &rsd) != 0) {
+ } else if (runtime_stat_n(STAT_NSECS, map_idx, &rsd) != 0) {
char unit = ' ';
char unit_buf[10] = "/sec";
- total = runtime_stat_avg(st, STAT_NSECS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_NSECS, map_idx, &rsd);
if (total)
ratio = convert_unit_double(1000000000.0 * avg / total, &unit);
@@ -829,7 +800,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
generic_metric(config, mexp->metric_expr, mexp->metric_threshold,
mexp->metric_events, mexp->metric_refs, evsel->name,
mexp->metric_name, mexp->metric_unit, mexp->runtime,
- map_idx, out, st);
+ map_idx, out);
}
}
if (num == 0)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8d83d2f4a082..0d7538670d67 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -659,7 +659,7 @@ static void evsel__update_shadow_stats(struct evsel *evsel)
for (i = 0; i < ps->nr_aggr; i++) {
struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
- perf_stat__update_shadow_stats(evsel, aggr_counts->val, i, &rt_stat);
+ perf_stat__update_shadow_stats(evsel, aggr_counts->val, i);
}
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 215c0f5c4db7..09975e098bd0 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -55,10 +55,6 @@ enum aggr_mode {
AGGR_MAX
};
-struct runtime_stat {
- struct rblist value_list;
-};
-
struct rusage_stats {
struct stats ru_utime_usec_stat;
struct stats ru_stime_usec_stat;
@@ -153,7 +149,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
struct evsel;
struct evlist;
-extern struct runtime_stat rt_stat;
extern struct stats walltime_nsecs_stats;
extern struct rusage_stats ru_stats;
@@ -162,13 +157,10 @@ typedef void (*print_metric_t)(struct perf_stat_config *config,
const char *fmt, double val);
typedef void (*new_line_t)(struct perf_stat_config *config, void *ctx);
-void runtime_stat__init(struct runtime_stat *st);
-void runtime_stat__exit(struct runtime_stat *st);
void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
-void perf_stat__reset_shadow_per_stat(struct runtime_stat *st);
-void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
- int map_idx, struct runtime_stat *st);
+void perf_stat__reset_shadow_per_stat(void);
+void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, int map_idx);
struct perf_stat_output_ctx {
void *ctx;
print_metric_t print_metric;
@@ -180,8 +172,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
double avg, int map_idx,
struct perf_stat_output_ctx *out,
- struct rblist *metric_events,
- struct runtime_stat *st);
+ struct rblist *metric_events);
int evlist__alloc_stats(struct perf_stat_config *config,
struct evlist *evlist, bool alloc_raw);
@@ -220,5 +211,5 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
struct target *_target, struct timespec *ts, int argc, const char **argv);
struct metric_expr;
-double test_generic_metric(struct metric_expr *mexp, int map_idx, struct runtime_stat *st);
+double test_generic_metric(struct metric_expr *mexp, int map_idx);
#endif
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v1 48/51] perf stat: Add cpu_aggr_map for loop
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (30 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 47/51] perf stat: Hide runtime_stat Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 49/51] perf metric: Directly use counts rather than saved_value Ian Rogers
` (4 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Rename variables, add a comment, and add a cpu_aggr_map__for_each_idx
macro to aid the readability of the stat-display code. In particular,
try to ensure that aggr_idx is used consistently, to differentiate it
from other kinds of index.
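A minimal before/after sketch of the loop style this introduces (names
taken from this patch):

  int aggr_idx;

  /* Before: open-coded index loop. */
  for (aggr_idx = 0; aggr_idx < config->aggr_map->nr; aggr_idx++)
          print_counter_aggrdata(config, counter, aggr_idx, os);

  /* After: the helper names the iteration idiom. */
  cpu_aggr_map__for_each_idx(aggr_idx, config->aggr_map)
          print_counter_aggrdata(config, counter, aggr_idx, os);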
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/cpumap.h | 3 +
tools/perf/util/stat-display.c | 112 +++++++++++++++--------------
tools/perf/util/stat-shadow.c | 128 ++++++++++++++++-----------------
tools/perf/util/stat.c | 8 +--
tools/perf/util/stat.h | 6 +-
5 files changed, 132 insertions(+), 125 deletions(-)
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index c2f5824a3a22..e3426541e0aa 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -35,6 +35,9 @@ struct cpu_aggr_map {
struct aggr_cpu_id map[];
};
+#define cpu_aggr_map__for_each_idx(idx, aggr_map) \
+ for ((idx) = 0; (idx) < aggr_map->nr; (idx)++)
+
struct perf_record_cpu_map_data;
bool perf_record_cpu_map_data__test_bit(int i, const struct perf_record_cpu_map_data *data);
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 6c065d0624c3..e6035ecbeee8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -183,7 +183,7 @@ static void print_cgroup(struct perf_stat_config *config, struct cgroup *cgrp)
}
static void print_aggr_id_std(struct perf_stat_config *config,
- struct evsel *evsel, struct aggr_cpu_id id, int nr)
+ struct evsel *evsel, struct aggr_cpu_id id, int aggr_nr)
{
FILE *output = config->output;
int idx = config->aggr_mode;
@@ -225,11 +225,11 @@ static void print_aggr_id_std(struct perf_stat_config *config,
return;
}
- fprintf(output, "%-*s %*d ", aggr_header_lens[idx], buf, 4, nr);
+ fprintf(output, "%-*s %*d ", aggr_header_lens[idx], buf, 4, aggr_nr);
}
static void print_aggr_id_csv(struct perf_stat_config *config,
- struct evsel *evsel, struct aggr_cpu_id id, int nr)
+ struct evsel *evsel, struct aggr_cpu_id id, int aggr_nr)
{
FILE *output = config->output;
const char *sep = config->csv_sep;
@@ -237,19 +237,19 @@ static void print_aggr_id_csv(struct perf_stat_config *config,
switch (config->aggr_mode) {
case AGGR_CORE:
fprintf(output, "S%d-D%d-C%d%s%d%s",
- id.socket, id.die, id.core, sep, nr, sep);
+ id.socket, id.die, id.core, sep, aggr_nr, sep);
break;
case AGGR_DIE:
fprintf(output, "S%d-D%d%s%d%s",
- id.socket, id.die, sep, nr, sep);
+ id.socket, id.die, sep, aggr_nr, sep);
break;
case AGGR_SOCKET:
fprintf(output, "S%d%s%d%s",
- id.socket, sep, nr, sep);
+ id.socket, sep, aggr_nr, sep);
break;
case AGGR_NODE:
fprintf(output, "N%d%s%d%s",
- id.node, sep, nr, sep);
+ id.node, sep, aggr_nr, sep);
break;
case AGGR_NONE:
if (evsel->percore && !config->percore_show_thread) {
@@ -275,26 +275,26 @@ static void print_aggr_id_csv(struct perf_stat_config *config,
}
static void print_aggr_id_json(struct perf_stat_config *config,
- struct evsel *evsel, struct aggr_cpu_id id, int nr)
+ struct evsel *evsel, struct aggr_cpu_id id, int aggr_nr)
{
FILE *output = config->output;
switch (config->aggr_mode) {
case AGGR_CORE:
fprintf(output, "\"core\" : \"S%d-D%d-C%d\", \"aggregate-number\" : %d, ",
- id.socket, id.die, id.core, nr);
+ id.socket, id.die, id.core, aggr_nr);
break;
case AGGR_DIE:
fprintf(output, "\"die\" : \"S%d-D%d\", \"aggregate-number\" : %d, ",
- id.socket, id.die, nr);
+ id.socket, id.die, aggr_nr);
break;
case AGGR_SOCKET:
fprintf(output, "\"socket\" : \"S%d\", \"aggregate-number\" : %d, ",
- id.socket, nr);
+ id.socket, aggr_nr);
break;
case AGGR_NODE:
fprintf(output, "\"node\" : \"N%d\", \"aggregate-number\" : %d, ",
- id.node, nr);
+ id.node, aggr_nr);
break;
case AGGR_NONE:
if (evsel->percore && !config->percore_show_thread) {
@@ -319,14 +319,14 @@ static void print_aggr_id_json(struct perf_stat_config *config,
}
static void aggr_printout(struct perf_stat_config *config,
- struct evsel *evsel, struct aggr_cpu_id id, int nr)
+ struct evsel *evsel, struct aggr_cpu_id id, int aggr_nr)
{
if (config->json_output)
- print_aggr_id_json(config, evsel, id, nr);
+ print_aggr_id_json(config, evsel, id, aggr_nr);
else if (config->csv_output)
- print_aggr_id_csv(config, evsel, id, nr);
+ print_aggr_id_csv(config, evsel, id, aggr_nr);
else
- print_aggr_id_std(config, evsel, id, nr);
+ print_aggr_id_std(config, evsel, id, aggr_nr);
}
struct outstate {
@@ -335,7 +335,7 @@ struct outstate {
bool first;
const char *prefix;
int nfields;
- int nr;
+ int aggr_nr;
struct aggr_cpu_id id;
struct evsel *evsel;
struct cgroup *cgrp;
@@ -355,7 +355,7 @@ static void do_new_line_std(struct perf_stat_config *config,
fputc('\n', os->fh);
if (os->prefix)
fputs(os->prefix, os->fh);
- aggr_printout(config, os->evsel, os->id, os->nr);
+ aggr_printout(config, os->evsel, os->id, os->aggr_nr);
if (config->aggr_mode == AGGR_NONE)
fprintf(os->fh, " ");
fprintf(os->fh, " ");
@@ -396,7 +396,7 @@ static void new_line_csv(struct perf_stat_config *config, void *ctx)
fputc('\n', os->fh);
if (os->prefix)
fprintf(os->fh, "%s", os->prefix);
- aggr_printout(config, os->evsel, os->id, os->nr);
+ aggr_printout(config, os->evsel, os->id, os->aggr_nr);
for (i = 0; i < os->nfields; i++)
fputs(config->csv_sep, os->fh);
}
@@ -444,7 +444,7 @@ static void new_line_json(struct perf_stat_config *config, void *ctx)
fputs("\n{", os->fh);
if (os->prefix)
fprintf(os->fh, "%s", os->prefix);
- aggr_printout(config, os->evsel, os->id, os->nr);
+ aggr_printout(config, os->evsel, os->id, os->aggr_nr);
}
/* Filter out some columns that don't work well in metrics only mode */
@@ -645,10 +645,10 @@ static void print_counter_value(struct perf_stat_config *config,
}
static void abs_printout(struct perf_stat_config *config,
- struct aggr_cpu_id id, int nr,
+ struct aggr_cpu_id id, int aggr_nr,
struct evsel *evsel, double avg, bool ok)
{
- aggr_printout(config, evsel, id, nr);
+ aggr_printout(config, evsel, id, aggr_nr);
print_counter_value(config, evsel, avg, ok);
print_cgroup(config, evsel->cgrp);
}
@@ -678,7 +678,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
}
static void printout(struct perf_stat_config *config, struct outstate *os,
- double uval, u64 run, u64 ena, double noise, int map_idx)
+ double uval, u64 run, u64 ena, double noise, int aggr_idx)
{
struct perf_stat_output_ctx out;
print_metric_t pm;
@@ -721,14 +721,14 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
out.force_header = false;
if (!config->metric_only) {
- abs_printout(config, os->id, os->nr, counter, uval, ok);
+ abs_printout(config, os->id, os->aggr_nr, counter, uval, ok);
print_noise(config, counter, noise, /*before_metric=*/true);
print_running(config, run, ena, /*before_metric=*/true);
}
if (ok) {
- perf_stat__print_shadow_stats(config, counter, uval, map_idx,
+ perf_stat__print_shadow_stats(config, counter, uval, aggr_idx,
&out, &config->metric_events);
} else {
pm(config, os, /*color=*/NULL, /*format=*/NULL, /*unit=*/"", /*val=*/0);
@@ -833,20 +833,20 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
}
static void print_counter_aggrdata(struct perf_stat_config *config,
- struct evsel *counter, int s,
+ struct evsel *counter, int aggr_idx,
struct outstate *os)
{
FILE *output = config->output;
u64 ena, run, val;
double uval;
struct perf_stat_evsel *ps = counter->stats;
- struct perf_stat_aggr *aggr = &ps->aggr[s];
- struct aggr_cpu_id id = config->aggr_map->map[s];
+ struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx];
+ struct aggr_cpu_id id = config->aggr_map->map[aggr_idx];
double avg = aggr->counts.val;
bool metric_only = config->metric_only;
os->id = id;
- os->nr = aggr->nr;
+ os->aggr_nr = aggr->nr;
os->evsel = counter;
/* Skip already merged uncore/hybrid events */
@@ -874,7 +874,7 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
uval = val * counter->scale;
- printout(config, os, uval, run, ena, avg, s);
+ printout(config, os, uval, run, ena, avg, aggr_idx);
if (!metric_only)
fputc('\n', output);
@@ -925,7 +925,7 @@ static void print_aggr(struct perf_stat_config *config,
struct outstate *os)
{
struct evsel *counter;
- int s;
+ int aggr_idx;
if (!config->aggr_map || !config->aggr_get_id)
return;
@@ -934,11 +934,11 @@ static void print_aggr(struct perf_stat_config *config,
* With metric_only everything is on a single line.
* Without each counter has its own line.
*/
- for (s = 0; s < config->aggr_map->nr; s++) {
- print_metric_begin(config, evlist, os, s);
+ cpu_aggr_map__for_each_idx(aggr_idx, config->aggr_map) {
+ print_metric_begin(config, evlist, os, aggr_idx);
evlist__for_each_entry(evlist, counter) {
- print_counter_aggrdata(config, counter, s, os);
+ print_counter_aggrdata(config, counter, aggr_idx, os);
}
print_metric_end(config, os);
}
@@ -949,7 +949,7 @@ static void print_aggr_cgroup(struct perf_stat_config *config,
struct outstate *os)
{
struct evsel *counter, *evsel;
- int s;
+ int aggr_idx;
if (!config->aggr_map || !config->aggr_get_id)
return;
@@ -960,14 +960,14 @@ static void print_aggr_cgroup(struct perf_stat_config *config,
os->cgrp = evsel->cgrp;
- for (s = 0; s < config->aggr_map->nr; s++) {
- print_metric_begin(config, evlist, os, s);
+ cpu_aggr_map__for_each_idx(aggr_idx, config->aggr_map) {
+ print_metric_begin(config, evlist, os, aggr_idx);
evlist__for_each_entry(evlist, counter) {
if (counter->cgrp != os->cgrp)
continue;
- print_counter_aggrdata(config, counter, s, os);
+ print_counter_aggrdata(config, counter, aggr_idx, os);
}
print_metric_end(config, os);
}
@@ -977,14 +977,14 @@ static void print_aggr_cgroup(struct perf_stat_config *config,
static void print_counter(struct perf_stat_config *config,
struct evsel *counter, struct outstate *os)
{
- int s;
+ int aggr_idx;
/* AGGR_THREAD doesn't have config->aggr_get_id */
if (!config->aggr_map)
return;
- for (s = 0; s < config->aggr_map->nr; s++) {
- print_counter_aggrdata(config, counter, s, os);
+ cpu_aggr_map__for_each_idx(aggr_idx, config->aggr_map) {
+ print_counter_aggrdata(config, counter, aggr_idx, os);
}
}
@@ -1003,23 +1003,23 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
u64 ena, run, val;
double uval;
struct perf_stat_evsel *ps = counter->stats;
- int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
+ int aggr_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
- if (counter_idx < 0)
+ if (aggr_idx < 0)
continue;
os->evsel = counter;
os->id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
if (first) {
- print_metric_begin(config, evlist, os, counter_idx);
+ print_metric_begin(config, evlist, os, aggr_idx);
first = false;
}
- val = ps->aggr[counter_idx].counts.val;
- ena = ps->aggr[counter_idx].counts.ena;
- run = ps->aggr[counter_idx].counts.run;
+ val = ps->aggr[aggr_idx].counts.val;
+ ena = ps->aggr[aggr_idx].counts.ena;
+ run = ps->aggr[aggr_idx].counts.run;
uval = val * counter->scale;
- printout(config, os, uval, run, ena, 1.0, counter_idx);
+ printout(config, os, uval, run, ena, 1.0, aggr_idx);
}
if (!first)
print_metric_end(config, os);
@@ -1338,7 +1338,7 @@ static void print_percore(struct perf_stat_config *config,
bool metric_only = config->metric_only;
FILE *output = config->output;
struct cpu_aggr_map *core_map;
- int s, c, i;
+ int aggr_idx, core_map_len = 0;
if (!config->aggr_map || !config->aggr_get_id)
return;
@@ -1346,18 +1346,22 @@ static void print_percore(struct perf_stat_config *config,
if (config->percore_show_thread)
return print_counter(config, counter, os);
+ /*
+ * core_map will hold the aggr_cpu_id for the cores that have been
+ * printed so that each core is printed just once.
+ */
core_map = cpu_aggr_map__empty_new(config->aggr_map->nr);
if (core_map == NULL) {
fprintf(output, "Cannot allocate per-core aggr map for display\n");
return;
}
- for (s = 0, c = 0; s < config->aggr_map->nr; s++) {
- struct perf_cpu curr_cpu = config->aggr_map->map[s].cpu;
+ cpu_aggr_map__for_each_idx(aggr_idx, config->aggr_map) {
+ struct perf_cpu curr_cpu = config->aggr_map->map[aggr_idx].cpu;
struct aggr_cpu_id core_id = aggr_cpu_id__core(curr_cpu, NULL);
bool found = false;
- for (i = 0; i < c; i++) {
+ for (int i = 0; i < core_map_len; i++) {
if (aggr_cpu_id__equal(&core_map->map[i], &core_id)) {
found = true;
break;
@@ -1366,9 +1370,9 @@ static void print_percore(struct perf_stat_config *config,
if (found)
continue;
- print_counter_aggrdata(config, counter, s, os);
+ print_counter_aggrdata(config, counter, aggr_idx, os);
- core_map->map[c++] = core_id;
+ core_map->map[core_map_len++] = core_id;
}
free(core_map);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index f80be6abac90..7b48e2bd3ba1 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -231,7 +231,7 @@ static void update_runtime_stat(enum stat_type type,
* instruction rates, etc:
*/
void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
- int map_idx)
+ int aggr_idx)
{
u64 count_ns = count;
struct saved_value *v;
@@ -242,39 +242,39 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
count *= counter->scale;
if (evsel__is_clock(counter))
- update_runtime_stat(STAT_NSECS, map_idx, count_ns, &rsd);
+ update_runtime_stat(STAT_NSECS, aggr_idx, count_ns, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_runtime_stat(STAT_CYCLES, map_idx, count, &rsd);
+ update_runtime_stat(STAT_CYCLES, aggr_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
update_runtime_stat(STAT_STALLED_CYCLES_FRONT,
- map_idx, count, &rsd);
+ aggr_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
update_runtime_stat(STAT_STALLED_CYCLES_BACK,
- map_idx, count, &rsd);
+ aggr_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_runtime_stat(STAT_BRANCHES, map_idx, count, &rsd);
+ update_runtime_stat(STAT_BRANCHES, aggr_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
- update_runtime_stat(STAT_CACHEREFS, map_idx, count, &rsd);
+ update_runtime_stat(STAT_CACHEREFS, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
- update_runtime_stat(STAT_L1_DCACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_L1_DCACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
- update_runtime_stat(STAT_L1_ICACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_L1_ICACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL))
- update_runtime_stat(STAT_LL_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_LL_CACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
- update_runtime_stat(STAT_DTLB_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_DTLB_CACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
- update_runtime_stat(STAT_ITLB_CACHE, map_idx, count, &rsd);
+ update_runtime_stat(STAT_ITLB_CACHE, aggr_idx, count, &rsd);
if (counter->collect_stat) {
- v = saved_value_lookup(counter, map_idx, true, STAT_NONE, 0,
+ v = saved_value_lookup(counter, aggr_idx, true, STAT_NONE, 0,
rsd.cgrp);
update_stats(&v->stats, count);
if (counter->metric_leader)
v->metric_total += count;
} else if (counter->metric_leader && !counter->merged_stat) {
v = saved_value_lookup(counter->metric_leader,
- map_idx, true, STAT_NONE, 0, rsd.cgrp);
+ aggr_idx, true, STAT_NONE, 0, rsd.cgrp);
v->metric_total += count;
v->metric_other++;
}
@@ -307,24 +307,24 @@ static const char *get_ratio_color(enum grc_type type, double ratio)
return color;
}
-static double runtime_stat_avg(enum stat_type type, int map_idx,
+static double runtime_stat_avg(enum stat_type type, int aggr_idx,
struct runtime_stat_data *rsd)
{
struct saved_value *v;
- v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, rsd->cgrp);
+ v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
if (!v)
return 0.0;
return avg_stats(&v->stats);
}
-static double runtime_stat_n(enum stat_type type, int map_idx,
+static double runtime_stat_n(enum stat_type type, int aggr_idx,
struct runtime_stat_data *rsd)
{
struct saved_value *v;
- v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, rsd->cgrp);
+ v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
if (!v)
return 0.0;
@@ -332,14 +332,14 @@ static double runtime_stat_n(enum stat_type type, int map_idx,
}
static void print_stalled_cycles_frontend(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_CYCLES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_CYCLES, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -354,14 +354,14 @@ static void print_stalled_cycles_frontend(struct perf_stat_config *config,
}
static void print_stalled_cycles_backend(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_CYCLES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_CYCLES, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -372,14 +372,14 @@ static void print_stalled_cycles_backend(struct perf_stat_config *config,
}
static void print_branch_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_BRANCHES, map_idx, rsd);
+ total = runtime_stat_avg(STAT_BRANCHES, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -390,14 +390,14 @@ static void print_branch_misses(struct perf_stat_config *config,
}
static void print_l1_dcache_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_L1_DCACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_L1_DCACHE, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -408,14 +408,14 @@ static void print_l1_dcache_misses(struct perf_stat_config *config,
}
static void print_l1_icache_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_L1_ICACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_L1_ICACHE, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -425,14 +425,14 @@ static void print_l1_icache_misses(struct perf_stat_config *config,
}
static void print_dtlb_cache_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_DTLB_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_DTLB_CACHE, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -442,14 +442,14 @@ static void print_dtlb_cache_misses(struct perf_stat_config *config,
}
static void print_itlb_cache_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_ITLB_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_ITLB_CACHE, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -459,14 +459,14 @@ static void print_itlb_cache_misses(struct perf_stat_config *config,
}
static void print_ll_cache_misses(struct perf_stat_config *config,
- int map_idx, double avg,
+ int aggr_idx, double avg,
struct perf_stat_output_ctx *out,
struct runtime_stat_data *rsd)
{
double total, ratio = 0.0;
const char *color;
- total = runtime_stat_avg(STAT_LL_CACHE, map_idx, rsd);
+ total = runtime_stat_avg(STAT_LL_CACHE, aggr_idx, rsd);
if (total)
ratio = avg / total * 100.0;
@@ -478,7 +478,7 @@ static void print_ll_cache_misses(struct perf_stat_config *config,
static int prepare_metric(struct evsel **metric_events,
struct metric_ref *metric_refs,
struct expr_parse_ctx *pctx,
- int map_idx)
+ int aggr_idx)
{
double scale;
char *n;
@@ -516,7 +516,7 @@ static int prepare_metric(struct evsel **metric_events,
abort();
}
} else {
- v = saved_value_lookup(metric_events[i], map_idx, false,
+ v = saved_value_lookup(metric_events[i], aggr_idx, false,
STAT_NONE, 0,
metric_events[i]->cgrp);
if (!v)
@@ -560,7 +560,7 @@ static void generic_metric(struct perf_stat_config *config,
const char *metric_name,
const char *metric_unit,
int runtime,
- int map_idx,
+ int aggr_idx,
struct perf_stat_output_ctx *out)
{
print_metric_t print_metric = out->print_metric;
@@ -578,7 +578,7 @@ static void generic_metric(struct perf_stat_config *config,
pctx->sctx.user_requested_cpu_list = strdup(config->user_requested_cpu_list);
pctx->sctx.runtime = runtime;
pctx->sctx.system_wide = config->system_wide;
- i = prepare_metric(metric_events, metric_refs, pctx, map_idx);
+ i = prepare_metric(metric_events, metric_refs, pctx, aggr_idx);
if (i < 0) {
expr__ctx_free(pctx);
return;
@@ -630,7 +630,7 @@ static void generic_metric(struct perf_stat_config *config,
expr__ctx_free(pctx);
}
-double test_generic_metric(struct metric_expr *mexp, int map_idx)
+double test_generic_metric(struct metric_expr *mexp, int aggr_idx)
{
struct expr_parse_ctx *pctx;
double ratio = 0.0;
@@ -639,7 +639,7 @@ double test_generic_metric(struct metric_expr *mexp, int map_idx)
if (!pctx)
return NAN;
- if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, map_idx) < 0)
+ if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, aggr_idx) < 0)
goto out;
if (expr__parse(&ratio, pctx, mexp->metric_expr))
@@ -652,7 +652,7 @@ double test_generic_metric(struct metric_expr *mexp, int map_idx)
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int map_idx,
+ double avg, int aggr_idx,
struct perf_stat_output_ctx *out,
struct rblist *metric_events)
{
@@ -669,7 +669,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
if (config->iostat_run) {
iostat_print_metric(config, evsel, out);
} else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
- total = runtime_stat_avg(STAT_CYCLES, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_CYCLES, aggr_idx, &rsd);
if (total) {
ratio = avg / total;
@@ -679,10 +679,10 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0);
}
- total = runtime_stat_avg(STAT_STALLED_CYCLES_FRONT, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_STALLED_CYCLES_FRONT, aggr_idx, &rsd);
total = max(total, runtime_stat_avg(STAT_STALLED_CYCLES_BACK,
- map_idx, &rsd));
+ aggr_idx, &rsd));
if (total && avg) {
out->new_line(config, ctxp);
@@ -692,8 +692,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
ratio);
}
} else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
- if (runtime_stat_n(STAT_BRANCHES, map_idx, &rsd) != 0)
- print_branch_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_BRANCHES, aggr_idx, &rsd) != 0)
+ print_branch_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all branches", 0);
} else if (
@@ -702,8 +702,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(STAT_L1_DCACHE, map_idx, &rsd) != 0)
- print_l1_dcache_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_L1_DCACHE, aggr_idx, &rsd) != 0)
+ print_l1_dcache_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0);
} else if (
@@ -712,8 +712,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(STAT_L1_ICACHE, map_idx, &rsd) != 0)
- print_l1_icache_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_L1_ICACHE, aggr_idx, &rsd) != 0)
+ print_l1_icache_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0);
} else if (
@@ -722,8 +722,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(STAT_DTLB_CACHE, map_idx, &rsd) != 0)
- print_dtlb_cache_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_DTLB_CACHE, aggr_idx, &rsd) != 0)
+ print_dtlb_cache_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0);
} else if (
@@ -732,8 +732,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(STAT_ITLB_CACHE, map_idx, &rsd) != 0)
- print_itlb_cache_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_ITLB_CACHE, aggr_idx, &rsd) != 0)
+ print_itlb_cache_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0);
} else if (
@@ -742,27 +742,27 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_stat_n(STAT_LL_CACHE, map_idx, &rsd) != 0)
- print_ll_cache_misses(config, map_idx, avg, out, &rsd);
+ if (runtime_stat_n(STAT_LL_CACHE, aggr_idx, &rsd) != 0)
+ print_ll_cache_misses(config, aggr_idx, avg, out, &rsd);
else
print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0);
} else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
- total = runtime_stat_avg(STAT_CACHEREFS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_CACHEREFS, aggr_idx, &rsd);
if (total)
ratio = avg * 100 / total;
- if (runtime_stat_n(STAT_CACHEREFS, map_idx, &rsd) != 0)
+ if (runtime_stat_n(STAT_CACHEREFS, aggr_idx, &rsd) != 0)
print_metric(config, ctxp, NULL, "%8.3f %%",
"of all cache refs", ratio);
else
print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0);
} else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
- print_stalled_cycles_frontend(config, map_idx, avg, out, &rsd);
+ print_stalled_cycles_frontend(config, aggr_idx, avg, out, &rsd);
} else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
- print_stalled_cycles_backend(config, map_idx, avg, out, &rsd);
+ print_stalled_cycles_backend(config, aggr_idx, avg, out, &rsd);
} else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
- total = runtime_stat_avg(STAT_NSECS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_NSECS, aggr_idx, &rsd);
if (total) {
ratio = avg / total;
@@ -776,11 +776,11 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
avg / (ratio * evsel->scale));
else
print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
- } else if (runtime_stat_n(STAT_NSECS, map_idx, &rsd) != 0) {
+ } else if (runtime_stat_n(STAT_NSECS, aggr_idx, &rsd) != 0) {
char unit = ' ';
char unit_buf[10] = "/sec";
- total = runtime_stat_avg(STAT_NSECS, map_idx, &rsd);
+ total = runtime_stat_avg(STAT_NSECS, aggr_idx, &rsd);
if (total)
ratio = convert_unit_double(1000000000.0 * avg / total, &unit);
@@ -800,7 +800,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
generic_metric(config, mexp->metric_expr, mexp->metric_threshold,
mexp->metric_events, mexp->metric_refs, evsel->name,
mexp->metric_name, mexp->metric_unit, mexp->runtime,
- map_idx, out);
+ aggr_idx, out);
}
}
if (num == 0)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 0d7538670d67..83dc4c1f4b12 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -651,15 +651,15 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
static void evsel__update_shadow_stats(struct evsel *evsel)
{
struct perf_stat_evsel *ps = evsel->stats;
- int i;
+ int aggr_idx;
if (ps->aggr == NULL)
return;
- for (i = 0; i < ps->nr_aggr; i++) {
- struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
+ for (aggr_idx = 0; aggr_idx < ps->nr_aggr; aggr_idx++) {
+ struct perf_counts_values *aggr_counts = &ps->aggr[aggr_idx].counts;
- perf_stat__update_shadow_stats(evsel, aggr_counts->val, i);
+ perf_stat__update_shadow_stats(evsel, aggr_counts->val, aggr_idx);
}
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 09975e098bd0..b01c828c3799 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -160,7 +160,7 @@ typedef void (*new_line_t)(struct perf_stat_config *config, void *ctx);
void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
void perf_stat__reset_shadow_per_stat(void);
-void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, int map_idx);
+void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, int aggr_idx);
struct perf_stat_output_ctx {
void *ctx;
print_metric_t print_metric;
@@ -170,7 +170,7 @@ struct perf_stat_output_ctx {
void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct evsel *evsel,
- double avg, int map_idx,
+ double avg, int aggr_idx,
struct perf_stat_output_ctx *out,
struct rblist *metric_events);
@@ -211,5 +211,5 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
struct target *_target, struct timespec *ts, int argc, const char **argv);
struct metric_expr;
-double test_generic_metric(struct metric_expr *mexp, int map_idx);
+double test_generic_metric(struct metric_expr *mexp, int aggr_idx);
#endif
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 49/51] perf metric: Directly use counts rather than saved_value
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (31 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 48/51] perf stat: Add cpu_aggr_map for loop Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 50/51] perf stat: Use " Ian Rogers
` (3 subsequent siblings)
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Bugs with double aggregation have been introduced because counts are
aggregated once for the counters and then aggregated again via
saved_value. Remove the use of saved_value for the generic metric
case. Update the parse-metric and pmu-events tests to set aggregate
counts rather than saved_value counts.
Signed-off-by: Ian Rogers <irogers@google.com>
---
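For reference, the heart of the change in prepare_metric() is to read
the per-aggregation count directly from the evsel's stats, roughly as
below. This is only a sketch of the new lookup; the tool-event path
and error handling are omitted, and the field names are those used in
the diff that follows:

	/* Read the aggregated count for this aggregation index. */
	struct perf_stat_evsel *ps = metric_events[i]->stats;
	struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx];
	/*
	 * If the event was scaled during stat gathering, reverse the
	 * scale before computing the metric.
	 */
	double val = aggr->counts.val * (1.0 / metric_events[i]->scale);
	int source_count = evsel__source_count(metric_events[i]);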
tools/perf/tests/parse-metric.c | 4 +--
tools/perf/tests/pmu-events.c | 4 +--
tools/perf/util/stat-shadow.c | 56 +++++++++++----------------------
3 files changed, 23 insertions(+), 41 deletions(-)
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 37e3371d978e..b9b8a48289c4 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -35,10 +35,10 @@ static void load_runtime_stat(struct evlist *evlist, struct value *vals)
struct evsel *evsel;
u64 count;
- perf_stat__reset_shadow_stats();
+ evlist__alloc_aggr_stats(evlist, 1);
evlist__for_each_entry(evlist, evsel) {
count = find_value(evsel->name, vals);
- perf_stat__update_shadow_stats(evsel, count, 0);
+ evsel->stats->aggr->counts.val = count;
if (!strcmp(evsel->name, "duration_time"))
update_stats(&walltime_nsecs_stats, count);
}
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 122e74c282a7..4ec2a4ca1a82 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -863,9 +863,9 @@ static int test__parsing_callback(const struct pmu_metric *pm,
* zero when subtracted and so try to make them unique.
*/
k = 1;
- perf_stat__reset_shadow_stats();
+ evlist__alloc_aggr_stats(evlist, 1);
evlist__for_each_entry(evlist, evsel) {
- perf_stat__update_shadow_stats(evsel, k, 0);
+ evsel->stats->aggr->counts.val = k;
if (!strcmp(evsel->name, "duration_time"))
update_stats(&walltime_nsecs_stats, k);
k++;
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 7b48e2bd3ba1..eba98520cea2 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -234,7 +234,6 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
int aggr_idx)
{
u64 count_ns = count;
- struct saved_value *v;
struct runtime_stat_data rsd = {
.ctx = evsel_context(counter),
.cgrp = counter->cgrp,
@@ -265,19 +264,6 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
update_runtime_stat(STAT_DTLB_CACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
update_runtime_stat(STAT_ITLB_CACHE, aggr_idx, count, &rsd);
-
- if (counter->collect_stat) {
- v = saved_value_lookup(counter, aggr_idx, true, STAT_NONE, 0,
- rsd.cgrp);
- update_stats(&v->stats, count);
- if (counter->metric_leader)
- v->metric_total += count;
- } else if (counter->metric_leader && !counter->merged_stat) {
- v = saved_value_lookup(counter->metric_leader,
- aggr_idx, true, STAT_NONE, 0, rsd.cgrp);
- v->metric_total += count;
- v->metric_other++;
- }
}
/* used for get_ratio_color() */
@@ -480,18 +466,17 @@ static int prepare_metric(struct evsel **metric_events,
struct expr_parse_ctx *pctx,
int aggr_idx)
{
- double scale;
- char *n;
- int i, j, ret;
+ int i;
for (i = 0; metric_events[i]; i++) {
- struct saved_value *v;
- struct stats *stats;
- u64 metric_total = 0;
- int source_count;
+ char *n;
+ double val;
+ int source_count = 0;
if (evsel__is_tool(metric_events[i])) {
- source_count = 1;
+ struct stats *stats;
+ double scale;
+
switch (metric_events[i]->tool_event) {
case PERF_TOOL_DURATION_TIME:
stats = &walltime_nsecs_stats;
@@ -515,35 +500,32 @@ static int prepare_metric(struct evsel **metric_events,
pr_err("Unknown tool event '%s'", evsel__name(metric_events[i]));
abort();
}
+ val = avg_stats(stats) * scale;
+ source_count = 1;
} else {
- v = saved_value_lookup(metric_events[i], aggr_idx, false,
- STAT_NONE, 0,
- metric_events[i]->cgrp);
- if (!v)
+ struct perf_stat_evsel *ps = metric_events[i]->stats;
+ struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx];
+
+ if (!aggr)
break;
- stats = &v->stats;
+
/*
* If an event was scaled during stat gathering, reverse
* the scale before computing the metric.
*/
- scale = 1.0 / metric_events[i]->scale;
-
+ val = aggr->counts.val * (1.0 / metric_events[i]->scale);
source_count = evsel__source_count(metric_events[i]);
-
- if (v->metric_other)
- metric_total = v->metric_total * scale;
}
n = strdup(evsel__metric_id(metric_events[i]));
if (!n)
return -ENOMEM;
- expr__add_id_val_source_count(pctx, n,
- metric_total ? : avg_stats(stats) * scale,
- source_count);
+ expr__add_id_val_source_count(pctx, n, val, source_count);
}
- for (j = 0; metric_refs && metric_refs[j].metric_name; j++) {
- ret = expr__add_ref(pctx, &metric_refs[j]);
+ for (int j = 0; metric_refs && metric_refs[j].metric_name; j++) {
+ int ret = expr__add_ref(pctx, &metric_refs[j]);
+
if (ret)
return ret;
}
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 50/51] perf stat: Use counts rather than saved_value
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (32 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 49/51] perf metric: Directly use counts rather than saved_value Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-24 22:48 ` Namhyung Kim
2023-02-19 9:28 ` [PATCH v1 51/51] perf stat: Remove saved_value/runtime_stat Ian Rogers
` (2 subsequent siblings)
36 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
Switch the hard coded metrics to use the aggregate value rather than
the one from saved_value. When computing a metric like IPC, the
aggregate count comes from the instructions counter, then cycles is
looked up and, if present, IPC is computed. Rather than looking the
value up in the saved_value rbtree, search the counter's evlist for
the desired counter.
A new helper evsel__stat_type is used both to quickly find a metric
function and to identify when a counter is the one being sought. So
that both total and miss counts can be sought, the stat_type enum is
expanded. The ratio functions are rewritten to share a common helper,
with the color ratio thresholds passed in directly rather than
selected via an enum value.
Signed-off-by: Ian Rogers <irogers@google.com>
---
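For reference, the evlist search that replaces the saved_value rbtree
lookup boils down to the following. This is only a sketch condensed
from find_stat() in the diff below; the group, modifier, cgroup and
STAT_NSECS special-case checks are elided here:

	static double find_stat(const struct evsel *evsel, int aggr_idx,
				enum stat_type type)
	{
		const struct evsel *cur;

		evlist__for_each_entry(evsel->evlist, cur) {
			/* Skip the evsel being searched from and other types. */
			if (cur == evsel || evsel__stat_type(cur) != type)
				continue;
			/* group/modifier/cgroup checks elided, see below */
			return cur->stats->aggr[aggr_idx].counts.val * cur->scale;
		}
		return 0.0;
	}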
tools/perf/util/evsel.h | 2 +-
tools/perf/util/stat-shadow.c | 534 +++++++++++++++++-----------------
2 files changed, 270 insertions(+), 266 deletions(-)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 24cb807ef6ce..814a49ebb7e3 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -436,7 +436,7 @@ static inline bool evsel__is_bpf_output(struct evsel *evsel)
return evsel__match(evsel, SOFTWARE, SW_BPF_OUTPUT);
}
-static inline bool evsel__is_clock(struct evsel *evsel)
+static inline bool evsel__is_clock(const struct evsel *evsel)
{
return evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index eba98520cea2..9d22cde09dc9 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -45,15 +45,23 @@ enum stat_type {
STAT_NONE = 0,
STAT_NSECS,
STAT_CYCLES,
+ STAT_INSTRUCTIONS,
STAT_STALLED_CYCLES_FRONT,
STAT_STALLED_CYCLES_BACK,
STAT_BRANCHES,
- STAT_CACHEREFS,
+ STAT_BRANCH_MISS,
+ STAT_CACHE_REFS,
+ STAT_CACHE_MISSES,
STAT_L1_DCACHE,
STAT_L1_ICACHE,
STAT_LL_CACHE,
STAT_ITLB_CACHE,
STAT_DTLB_CACHE,
+ STAT_L1D_MISS,
+ STAT_L1I_MISS,
+ STAT_LL_MISS,
+ STAT_DTLB_MISS,
+ STAT_ITLB_MISS,
STAT_MAX
};
@@ -168,7 +176,7 @@ void perf_stat__init_shadow_stats(void)
rblist->node_delete = saved_value_delete;
}
-static int evsel_context(struct evsel *evsel)
+static int evsel_context(const struct evsel *evsel)
{
int ctx = 0;
@@ -253,7 +261,7 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
update_runtime_stat(STAT_BRANCHES, aggr_idx, count, &rsd);
else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
- update_runtime_stat(STAT_CACHEREFS, aggr_idx, count, &rsd);
+ update_runtime_stat(STAT_CACHE_REFS, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
update_runtime_stat(STAT_L1_DCACHE, aggr_idx, count, &rsd);
else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
@@ -266,199 +274,283 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
update_runtime_stat(STAT_ITLB_CACHE, aggr_idx, count, &rsd);
}
-/* used for get_ratio_color() */
-enum grc_type {
- GRC_STALLED_CYCLES_FE,
- GRC_STALLED_CYCLES_BE,
- GRC_CACHE_MISSES,
- GRC_MAX_NR
-};
+static enum stat_type evsel__stat_type(const struct evsel *evsel)
+{
+ /* Fake perf_hw_cache_op_id values for use with evsel__match. */
+ u64 PERF_COUNT_hw_cache_l1d_miss = PERF_COUNT_HW_CACHE_L1D |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
+ u64 PERF_COUNT_hw_cache_l1i_miss = PERF_COUNT_HW_CACHE_L1I |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
+ u64 PERF_COUNT_hw_cache_ll_miss = PERF_COUNT_HW_CACHE_LL |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
+ u64 PERF_COUNT_hw_cache_dtlb_miss = PERF_COUNT_HW_CACHE_DTLB |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
+ u64 PERF_COUNT_hw_cache_itlb_miss = PERF_COUNT_HW_CACHE_ITLB |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16);
+
+ if (evsel__is_clock(evsel))
+ return STAT_NSECS;
+ else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES))
+ return STAT_CYCLES;
+ else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS))
+ return STAT_INSTRUCTIONS;
+ else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
+ return STAT_STALLED_CYCLES_FRONT;
+ else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND))
+ return STAT_STALLED_CYCLES_BACK;
+ else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS))
+ return STAT_BRANCHES;
+ else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES))
+ return STAT_BRANCH_MISS;
+ else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES))
+ return STAT_CACHE_REFS;
+ else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES))
+ return STAT_CACHE_MISSES;
+ else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D))
+ return STAT_L1_DCACHE;
+ else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I))
+ return STAT_L1_ICACHE;
+ else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL))
+ return STAT_LL_CACHE;
+ else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB))
+ return STAT_DTLB_CACHE;
+ else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB))
+ return STAT_ITLB_CACHE;
+ else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss))
+ return STAT_L1D_MISS;
+ else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss))
+ return STAT_L1I_MISS;
+ else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss))
+ return STAT_LL_MISS;
+ else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss))
+ return STAT_DTLB_MISS;
+ else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss))
+ return STAT_ITLB_MISS;
+ return STAT_NONE;
+}
-static const char *get_ratio_color(enum grc_type type, double ratio)
+static const char *get_ratio_color(const double ratios[3], double val)
{
- static const double grc_table[GRC_MAX_NR][3] = {
- [GRC_STALLED_CYCLES_FE] = { 50.0, 30.0, 10.0 },
- [GRC_STALLED_CYCLES_BE] = { 75.0, 50.0, 20.0 },
- [GRC_CACHE_MISSES] = { 20.0, 10.0, 5.0 },
- };
const char *color = PERF_COLOR_NORMAL;
- if (ratio > grc_table[type][0])
+ if (val > ratios[0])
color = PERF_COLOR_RED;
- else if (ratio > grc_table[type][1])
+ else if (val > ratios[1])
color = PERF_COLOR_MAGENTA;
- else if (ratio > grc_table[type][2])
+ else if (val > ratios[2])
color = PERF_COLOR_YELLOW;
return color;
}
-static double runtime_stat_avg(enum stat_type type, int aggr_idx,
- struct runtime_stat_data *rsd)
+static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type)
{
- struct saved_value *v;
-
- v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
- if (!v)
- return 0.0;
-
- return avg_stats(&v->stats);
+ const struct evsel *cur;
+ int evsel_ctx = evsel_context(evsel);
+
+ evlist__for_each_entry(evsel->evlist, cur) {
+ struct perf_stat_aggr *aggr;
+
+ /* Ignore the evsel that is being searched from. */
+ if (evsel == cur)
+ continue;
+
+ /* Ignore evsels that are part of different groups. */
+ if (evsel->core.leader->nr_members &&
+ evsel->core.leader != cur->core.leader)
+ continue;
+ /* Ignore evsels with mismatched modifiers. */
+ if (evsel_ctx != evsel_context(cur))
+ continue;
+ /* Ignore if not the cgroup we're looking for. */
+ if (evsel->cgrp != cur->cgrp)
+ continue;
+ /* Ignore if not the stat we're looking for. */
+ if (type != evsel__stat_type(cur))
+ continue;
+
+ aggr = &cur->stats->aggr[aggr_idx];
+ if (type == STAT_NSECS)
+ return aggr->counts.val;
+ return aggr->counts.val * cur->scale;
+ }
+ return 0.0;
}
-static double runtime_stat_n(enum stat_type type, int aggr_idx,
- struct runtime_stat_data *rsd)
+static void print_ratio(struct perf_stat_config *config,
+ const struct evsel *evsel, int aggr_idx,
+ double numerator, struct perf_stat_output_ctx *out,
+ enum stat_type denominator_type,
+ const double color_ratios[3], const char *unit)
{
- struct saved_value *v;
+ double denominator = find_stat(evsel, aggr_idx, denominator_type);
- v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
- if (!v)
- return 0.0;
+ if (numerator && denominator) {
+ double ratio = numerator / denominator * 100.0;
+ const char *color = get_ratio_color(color_ratios, ratio);
- return v->stats.n;
+ out->print_metric(config, out->ctx, color, "%7.2f%%", unit, ratio);
+ } else
+ out->print_metric(config, out->ctx, NULL, NULL, unit, 0);
}
-static void print_stalled_cycles_frontend(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_stalled_cycles_front(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double stalled,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
-
- total = runtime_stat_avg(STAT_CYCLES, aggr_idx, rsd);
-
- if (total)
- ratio = avg / total * 100.0;
+ static const double color_ratios[3] = {50.0, 30.0, 10.0};
- color = get_ratio_color(GRC_STALLED_CYCLES_FE, ratio);
-
- if (ratio)
- out->print_metric(config, out->ctx, color, "%7.2f%%", "frontend cycles idle",
- ratio);
- else
- out->print_metric(config, out->ctx, NULL, NULL, "frontend cycles idle", 0);
+ print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, color_ratios,
+ "frontend cycles idle");
}
-static void print_stalled_cycles_backend(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_stalled_cycles_back(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double stalled,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
-
- total = runtime_stat_avg(STAT_CYCLES, aggr_idx, rsd);
-
- if (total)
- ratio = avg / total * 100.0;
+ static const double color_ratios[3] = {75.0, 50.0, 20.0};
- color = get_ratio_color(GRC_STALLED_CYCLES_BE, ratio);
-
- out->print_metric(config, out->ctx, color, "%7.2f%%", "backend cycles idle", ratio);
+ print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, color_ratios,
+ "backend cycles idle");
}
-static void print_branch_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_branch_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
-
- total = runtime_stat_avg(STAT_BRANCHES, aggr_idx, rsd);
-
- if (total)
- ratio = avg / total * 100.0;
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
-
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all branches", ratio);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_BRANCHES, color_ratios,
+ "of all branches");
}
-static void print_l1_dcache_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_l1d_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
-
- total = runtime_stat_avg(STAT_L1_DCACHE, aggr_idx, rsd);
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- if (total)
- ratio = avg / total * 100.0;
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_DCACHE, color_ratios,
+ "of all L1-dcache accesses");
+}
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
+static void print_l1i_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
+{
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all L1-dcache accesses", ratio);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_ICACHE, color_ratios,
+ "of all L1-icache accesses");
}
-static void print_l1_icache_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_ll_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- total = runtime_stat_avg(STAT_L1_ICACHE, aggr_idx, rsd);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_LL_CACHE, color_ratios,
+ "of all L1-icache accesses");
+}
- if (total)
- ratio = avg / total * 100.0;
+static void print_dtlb_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
+{
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all L1-icache accesses", ratio);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_DTLB_CACHE, color_ratios,
+ "of all dTLB cache accesses");
}
-static void print_dtlb_cache_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_itlb_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- total = runtime_stat_avg(STAT_DTLB_CACHE, aggr_idx, rsd);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_ITLB_CACHE, color_ratios,
+ "of all iTLB cache accesses");
+}
- if (total)
- ratio = avg / total * 100.0;
+static void print_cache_miss(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out)
+{
+ static const double color_ratios[3] = {20.0, 10.0, 5.0};
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all dTLB cache accesses", ratio);
+ print_ratio(config, evsel, aggr_idx, misses, out, STAT_CACHE_REFS, color_ratios,
+ "of all cache refs");
}
-static void print_itlb_cache_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_instructions(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double instructions,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
+ print_metric_t print_metric = out->print_metric;
+ void *ctxp = out->ctx;
+ double cycles = find_stat(evsel, aggr_idx, STAT_CYCLES);
+ double max_stalled = max(find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_FRONT),
+ find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_BACK));
+
+ if (cycles) {
+ print_metric(config, ctxp, NULL, "%7.2f ", "insn per cycle",
+ instructions / cycles);
+ } else
+ print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0);
+
+ if (max_stalled && instructions) {
+ out->new_line(config, ctxp);
+ print_metric(config, ctxp, NULL, "%7.2f ", "stalled cycles per insn",
+ max_stalled / instructions);
+ }
+}
- total = runtime_stat_avg(STAT_ITLB_CACHE, aggr_idx, rsd);
+static void print_cycles(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double cycles,
+ struct perf_stat_output_ctx *out)
+{
+ double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
- if (total)
- ratio = avg / total * 100.0;
+ if (cycles && nsecs) {
+ double ratio = cycles / nsecs;
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all iTLB cache accesses", ratio);
+ out->print_metric(config, out->ctx, NULL, "%8.3f", "GHz", ratio);
+ } else
+ out->print_metric(config, out->ctx, NULL, NULL, "GHz", 0);
}
-static void print_ll_cache_misses(struct perf_stat_config *config,
- int aggr_idx, double avg,
- struct perf_stat_output_ctx *out,
- struct runtime_stat_data *rsd)
+static void print_nsecs(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx __maybe_unused, double nsecs,
+ struct perf_stat_output_ctx *out)
{
- double total, ratio = 0.0;
- const char *color;
-
- total = runtime_stat_avg(STAT_LL_CACHE, aggr_idx, rsd);
-
- if (total)
- ratio = avg / total * 100.0;
+ print_metric_t print_metric = out->print_metric;
+ void *ctxp = out->ctx;
+ double wall_time = avg_stats(&walltime_nsecs_stats);
- color = get_ratio_color(GRC_CACHE_MISSES, ratio);
- out->print_metric(config, out->ctx, color, "%7.2f%%", "of all LL-cache accesses", ratio);
+ if (wall_time) {
+ print_metric(config, ctxp, NULL, "%8.3f", "CPUs utilized",
+ nsecs / (wall_time * evsel->scale));
+ } else
+ print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
}
static int prepare_metric(struct evsel **metric_events,
@@ -638,139 +730,51 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
struct perf_stat_output_ctx *out,
struct rblist *metric_events)
{
- void *ctxp = out->ctx;
- print_metric_t print_metric = out->print_metric;
- double total, ratio = 0.0;
- struct runtime_stat_data rsd = {
- .ctx = evsel_context(evsel),
- .cgrp = evsel->cgrp,
+ typedef void (*stat_print_function_t)(struct perf_stat_config *config,
+ const struct evsel *evsel,
+ int aggr_idx, double misses,
+ struct perf_stat_output_ctx *out);
+ static const stat_print_function_t stat_print_function[STAT_MAX] = {
+ [STAT_INSTRUCTIONS] = print_instructions,
+ [STAT_BRANCH_MISS] = print_branch_miss,
+ [STAT_L1D_MISS] = print_l1d_miss,
+ [STAT_L1I_MISS] = print_l1i_miss,
+ [STAT_DTLB_MISS] = print_dtlb_miss,
+ [STAT_ITLB_MISS] = print_itlb_miss,
+ [STAT_LL_MISS] = print_ll_miss,
+ [STAT_CACHE_MISSES] = print_cache_miss,
+ [STAT_STALLED_CYCLES_FRONT] = print_stalled_cycles_front,
+ [STAT_STALLED_CYCLES_BACK] = print_stalled_cycles_back,
+ [STAT_CYCLES] = print_cycles,
+ [STAT_NSECS] = print_nsecs,
};
+ print_metric_t print_metric = out->print_metric;
+ void *ctxp = out->ctx;
struct metric_event *me;
int num = 1;
if (config->iostat_run) {
iostat_print_metric(config, evsel, out);
- } else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
- total = runtime_stat_avg(STAT_CYCLES, aggr_idx, &rsd);
-
- if (total) {
- ratio = avg / total;
- print_metric(config, ctxp, NULL, "%7.2f ",
- "insn per cycle", ratio);
- } else {
- print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0);
- }
-
- total = runtime_stat_avg(STAT_STALLED_CYCLES_FRONT, aggr_idx, &rsd);
-
- total = max(total, runtime_stat_avg(STAT_STALLED_CYCLES_BACK,
- aggr_idx, &rsd));
-
- if (total && avg) {
- out->new_line(config, ctxp);
- ratio = total / avg;
- print_metric(config, ctxp, NULL, "%7.2f ",
- "stalled cycles per insn",
- ratio);
- }
- } else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
- if (runtime_stat_n(STAT_BRANCHES, aggr_idx, &rsd) != 0)
- print_branch_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all branches", 0);
- } else if (
- evsel->core.attr.type == PERF_TYPE_HW_CACHE &&
- evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_L1D |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-
- if (runtime_stat_n(STAT_L1_DCACHE, aggr_idx, &rsd) != 0)
- print_l1_dcache_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0);
- } else if (
- evsel->core.attr.type == PERF_TYPE_HW_CACHE &&
- evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_L1I |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-
- if (runtime_stat_n(STAT_L1_ICACHE, aggr_idx, &rsd) != 0)
- print_l1_icache_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0);
- } else if (
- evsel->core.attr.type == PERF_TYPE_HW_CACHE &&
- evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_DTLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-
- if (runtime_stat_n(STAT_DTLB_CACHE, aggr_idx, &rsd) != 0)
- print_dtlb_cache_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0);
- } else if (
- evsel->core.attr.type == PERF_TYPE_HW_CACHE &&
- evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_ITLB |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-
- if (runtime_stat_n(STAT_ITLB_CACHE, aggr_idx, &rsd) != 0)
- print_itlb_cache_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0);
- } else if (
- evsel->core.attr.type == PERF_TYPE_HW_CACHE &&
- evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_LL |
- ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
- ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-
- if (runtime_stat_n(STAT_LL_CACHE, aggr_idx, &rsd) != 0)
- print_ll_cache_misses(config, aggr_idx, avg, out, &rsd);
- else
- print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0);
- } else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
- total = runtime_stat_avg(STAT_CACHEREFS, aggr_idx, &rsd);
-
- if (total)
- ratio = avg * 100 / total;
-
- if (runtime_stat_n(STAT_CACHEREFS, aggr_idx, &rsd) != 0)
- print_metric(config, ctxp, NULL, "%8.3f %%",
- "of all cache refs", ratio);
- else
- print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0);
- } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
- print_stalled_cycles_frontend(config, aggr_idx, avg, out, &rsd);
- } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
- print_stalled_cycles_backend(config, aggr_idx, avg, out, &rsd);
- } else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
- total = runtime_stat_avg(STAT_NSECS, aggr_idx, &rsd);
-
- if (total) {
- ratio = avg / total;
- print_metric(config, ctxp, NULL, "%8.3f", "GHz", ratio);
- } else {
- print_metric(config, ctxp, NULL, NULL, "Ghz", 0);
- }
- } else if (evsel__is_clock(evsel)) {
- if ((ratio = avg_stats(&walltime_nsecs_stats)) != 0)
- print_metric(config, ctxp, NULL, "%8.3f", "CPUs utilized",
- avg / (ratio * evsel->scale));
- else
- print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
- } else if (runtime_stat_n(STAT_NSECS, aggr_idx, &rsd) != 0) {
- char unit = ' ';
- char unit_buf[10] = "/sec";
-
- total = runtime_stat_avg(STAT_NSECS, aggr_idx, &rsd);
- if (total)
- ratio = convert_unit_double(1000000000.0 * avg / total, &unit);
-
- if (unit != ' ')
- snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
- print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio);
} else {
- num = 0;
+ stat_print_function_t fn = stat_print_function[evsel__stat_type(evsel)];
+
+ if (fn)
+ fn(config, evsel, aggr_idx, avg, out);
+ else {
+ double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS);
+
+ if (nsecs) {
+ char unit = ' ';
+ char unit_buf[10] = "/sec";
+ double ratio = convert_unit_double(1000000000.0 * avg / nsecs,
+ &unit);
+
+ if (unit != ' ')
+ snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
+ print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio);
+ } else
+ num = 0;
+ }
}
if ((me = metricgroup__lookup(metric_events, evsel, false)) != NULL) {
--
2.39.2.637.g21b0678d19-goog
* [PATCH v1 51/51] perf stat: Remove saved_value/runtime_stat
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (33 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 50/51] perf stat: Use " Ian Rogers
@ 2023-02-19 9:28 ` Ian Rogers
2023-02-19 11:17 ` [PATCH v1 00/51] shadow metric clean up and improvements Arnaldo Carvalho de Melo
2023-02-27 22:04 ` Liang, Kan
36 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 9:28 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
As saved_value/runtime_stat are only written to and not read, remove
the associated logic and writes.
Signed-off-by: Ian Rogers <irogers@google.com>
---
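After this the only global state left in stat-shadow.c is
walltime_nsecs_stats and ru_stats. For anyone following along, the
surviving 'struct stats' is just a running-stats accumulator; below is
a simplified sketch of the running mean it maintains (the real type in
tools/perf/util/stat.h also tracks variance, min and max):

	/* Simplified running-mean sketch of 'struct stats'. */
	struct stats {
		double mean;
		unsigned long long n;
	};

	static void update_stats(struct stats *stats, unsigned long long val)
	{
		double delta = val - stats->mean;

		stats->n++;
		stats->mean += delta / stats->n;
	}

	static double avg_stats(struct stats *stats)
	{
		return stats->mean;	/* read side, e.g. "CPUs utilized" */
	}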
tools/perf/builtin-script.c | 5 -
tools/perf/builtin-stat.c | 6 -
tools/perf/tests/parse-metric.c | 1 -
tools/perf/tests/pmu-events.c | 1 -
tools/perf/util/stat-shadow.c | 198 --------------------------------
tools/perf/util/stat.c | 24 ----
tools/perf/util/stat.h | 4 -
7 files changed, 239 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e9b5387161df..522226114263 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2072,9 +2072,6 @@ static void perf_sample__fprint_metric(struct perf_script *script,
if (evsel_script(leader)->gnum++ == 0)
perf_stat__reset_shadow_stats();
val = sample->period * evsel->scale;
- perf_stat__update_shadow_stats(evsel,
- val,
- sample->cpu);
evsel_script(evsel)->val = val;
if (evsel_script(leader)->gnum == leader->core.nr_members) {
for_each_group_member (ev2, leader) {
@@ -2792,8 +2789,6 @@ static int __cmd_script(struct perf_script *script)
signal(SIGINT, sig_handler);
- perf_stat__init_shadow_stats();
-
/* override event processing functions */
if (script->show_task_events) {
script->tool.comm = process_comm_event;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 619387459914..d70b1ec88594 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -424,7 +424,6 @@ static void process_counters(void)
perf_stat_merge_counters(&stat_config, evsel_list);
perf_stat_process_percore(&stat_config, evsel_list);
- perf_stat_process_shadow_stats(&stat_config, evsel_list);
}
static void process_interval(void)
@@ -434,7 +433,6 @@ static void process_interval(void)
clock_gettime(CLOCK_MONOTONIC, &ts);
diff_timespec(&rs, &ts, &ref_time);
- perf_stat__reset_shadow_per_stat();
evlist__reset_aggr_stats(evsel_list);
if (read_counters(&rs) == 0)
@@ -910,7 +908,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
evlist__copy_prev_raw_counts(evsel_list);
evlist__reset_prev_raw_counts(evsel_list);
evlist__reset_aggr_stats(evsel_list);
- perf_stat__reset_shadow_per_stat();
} else {
update_stats(&walltime_nsecs_stats, t1 - t0);
update_rusage_stats(&ru_stats, &stat_config.ru_data);
@@ -2132,8 +2129,6 @@ static int __cmd_report(int argc, const char **argv)
input_name = "perf.data";
}
- perf_stat__init_shadow_stats();
-
perf_stat.data.path = input_name;
perf_stat.data.mode = PERF_DATA_MODE_READ;
@@ -2413,7 +2408,6 @@ int cmd_stat(int argc, const char **argv)
&stat_config.metric_events);
zfree(&metrics);
}
- perf_stat__init_shadow_stats();
if (add_default_attributes())
goto out;
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index b9b8a48289c4..c43b056f9fa3 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -296,7 +296,6 @@ static int test_metric_group(void)
static int test__parse_metric(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
- perf_stat__init_shadow_stats();
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 4ec2a4ca1a82..6ccd413b5983 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -905,7 +905,6 @@ static int test__parsing(struct test_suite *test __maybe_unused,
{
int failures = 0;
- perf_stat__init_shadow_stats();
pmu_for_each_core_metric(test__parsing_callback, &failures);
pmu_for_each_sys_metric(test__parsing_callback, &failures);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 9d22cde09dc9..ef85f1ae1ab2 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -16,22 +16,9 @@
#include "iostat.h"
#include "util/hashmap.h"
-/*
- * AGGR_GLOBAL: Use CPU 0
- * AGGR_SOCKET: Use first CPU of socket
- * AGGR_DIE: Use first CPU of die
- * AGGR_CORE: Use first CPU of core
- * AGGR_NONE: Use matching CPU
- * AGGR_THREAD: Not supported?
- */
-
struct stats walltime_nsecs_stats;
struct rusage_stats ru_stats;
-static struct runtime_stat {
- struct rblist value_list;
-} rt_stat;
-
enum {
CTX_BIT_USER = 1 << 0,
CTX_BIT_KERNEL = 1 << 1,
@@ -65,117 +52,6 @@ enum stat_type {
STAT_MAX
};
-struct saved_value {
- struct rb_node rb_node;
- struct evsel *evsel;
- enum stat_type type;
- int ctx;
- int map_idx; /* cpu or thread map index */
- struct cgroup *cgrp;
- struct stats stats;
- u64 metric_total;
- int metric_other;
-};
-
-static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
-{
- struct saved_value *a = container_of(rb_node,
- struct saved_value,
- rb_node);
- const struct saved_value *b = entry;
-
- if (a->map_idx != b->map_idx)
- return a->map_idx - b->map_idx;
-
- /*
- * Previously the rbtree was used to link generic metrics.
- * The keys were evsel/cpu. Now the rbtree is extended to support
- * per-thread shadow stats. For shadow stats case, the keys
- * are cpu/type/ctx/stat (evsel is NULL). For generic metrics
- * case, the keys are still evsel/cpu (type/ctx/stat are 0 or NULL).
- */
- if (a->type != b->type)
- return a->type - b->type;
-
- if (a->ctx != b->ctx)
- return a->ctx - b->ctx;
-
- if (a->cgrp != b->cgrp)
- return (char *)a->cgrp < (char *)b->cgrp ? -1 : +1;
-
- if (a->evsel == b->evsel)
- return 0;
- if ((char *)a->evsel < (char *)b->evsel)
- return -1;
- return +1;
-}
-
-static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused,
- const void *entry)
-{
- struct saved_value *nd = malloc(sizeof(struct saved_value));
-
- if (!nd)
- return NULL;
- memcpy(nd, entry, sizeof(struct saved_value));
- return &nd->rb_node;
-}
-
-static void saved_value_delete(struct rblist *rblist __maybe_unused,
- struct rb_node *rb_node)
-{
- struct saved_value *v;
-
- BUG_ON(!rb_node);
- v = container_of(rb_node, struct saved_value, rb_node);
- free(v);
-}
-
-static struct saved_value *saved_value_lookup(struct evsel *evsel,
- int map_idx,
- bool create,
- enum stat_type type,
- int ctx,
- struct cgroup *cgrp)
-{
- struct rblist *rblist;
- struct rb_node *nd;
- struct saved_value dm = {
- .map_idx = map_idx,
- .evsel = evsel,
- .type = type,
- .ctx = ctx,
- .cgrp = cgrp,
- };
-
- rblist = &rt_stat.value_list;
-
- /* don't use context info for clock events */
- if (type == STAT_NSECS)
- dm.ctx = 0;
-
- nd = rblist__find(rblist, &dm);
- if (nd)
- return container_of(nd, struct saved_value, rb_node);
- if (create) {
- rblist__add_node(rblist, &dm);
- nd = rblist__find(rblist, &dm);
- if (nd)
- return container_of(nd, struct saved_value, rb_node);
- }
- return NULL;
-}
-
-void perf_stat__init_shadow_stats(void)
-{
- struct rblist *rblist = &rt_stat.value_list;
-
- rblist__init(rblist);
- rblist->node_cmp = saved_value_cmp;
- rblist->node_new = saved_value_new;
- rblist->node_delete = saved_value_delete;
-}
-
static int evsel_context(const struct evsel *evsel)
{
int ctx = 0;
@@ -194,86 +70,12 @@ static int evsel_context(const struct evsel *evsel)
return ctx;
}
-void perf_stat__reset_shadow_per_stat(void)
-{
- struct rblist *rblist;
- struct rb_node *pos, *next;
-
- rblist = &rt_stat.value_list;
- next = rb_first_cached(&rblist->entries);
- while (next) {
- pos = next;
- next = rb_next(pos);
- memset(&container_of(pos, struct saved_value, rb_node)->stats,
- 0,
- sizeof(struct stats));
- }
-}
-
void perf_stat__reset_shadow_stats(void)
{
- perf_stat__reset_shadow_per_stat();
memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
memset(&ru_stats, 0, sizeof(ru_stats));
}
-struct runtime_stat_data {
- int ctx;
- struct cgroup *cgrp;
-};
-
-static void update_runtime_stat(enum stat_type type,
- int map_idx, u64 count,
- struct runtime_stat_data *rsd)
-{
- struct saved_value *v = saved_value_lookup(NULL, map_idx, true, type,
- rsd->ctx, rsd->cgrp);
-
- if (v)
- update_stats(&v->stats, count);
-}
-
-/*
- * Update various tracking values we maintain to print
- * more semantic information such as miss/hit ratios,
- * instruction rates, etc:
- */
-void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
- int aggr_idx)
-{
- u64 count_ns = count;
- struct runtime_stat_data rsd = {
- .ctx = evsel_context(counter),
- .cgrp = counter->cgrp,
- };
- count *= counter->scale;
-
- if (evsel__is_clock(counter))
- update_runtime_stat(STAT_NSECS, aggr_idx, count_ns, &rsd);
- else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_runtime_stat(STAT_CYCLES, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
- update_runtime_stat(STAT_STALLED_CYCLES_FRONT,
- aggr_idx, count, &rsd);
- else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
- update_runtime_stat(STAT_STALLED_CYCLES_BACK,
- aggr_idx, count, &rsd);
- else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_runtime_stat(STAT_BRANCHES, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
- update_runtime_stat(STAT_CACHE_REFS, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
- update_runtime_stat(STAT_L1_DCACHE, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
- update_runtime_stat(STAT_L1_ICACHE, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL))
- update_runtime_stat(STAT_LL_CACHE, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
- update_runtime_stat(STAT_DTLB_CACHE, aggr_idx, count, &rsd);
- else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
- update_runtime_stat(STAT_ITLB_CACHE, aggr_idx, count, &rsd);
-}
-
static enum stat_type evsel__stat_type(const struct evsel *evsel)
{
/* Fake perf_hw_cache_op_id values for use with evsel__match. */
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 83dc4c1f4b12..4abfd87c5352 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -648,30 +648,6 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
evsel__process_percore(evsel);
}
-static void evsel__update_shadow_stats(struct evsel *evsel)
-{
- struct perf_stat_evsel *ps = evsel->stats;
- int aggr_idx;
-
- if (ps->aggr == NULL)
- return;
-
- for (aggr_idx = 0; aggr_idx < ps->nr_aggr; aggr_idx++) {
- struct perf_counts_values *aggr_counts = &ps->aggr[aggr_idx].counts;
-
- perf_stat__update_shadow_stats(evsel, aggr_counts->val, aggr_idx);
- }
-}
-
-void perf_stat_process_shadow_stats(struct perf_stat_config *config __maybe_unused,
- struct evlist *evlist)
-{
- struct evsel *evsel;
-
- evlist__for_each_entry(evlist, evsel)
- evsel__update_shadow_stats(evsel);
-}
-
int perf_event__process_stat_event(struct perf_session *session,
union perf_event *event)
{
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b01c828c3799..41204547b76b 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -157,10 +157,7 @@ typedef void (*print_metric_t)(struct perf_stat_config *config,
const char *fmt, double val);
typedef void (*new_line_t)(struct perf_stat_config *config, void *ctx);
-void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
-void perf_stat__reset_shadow_per_stat(void);
-void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, int aggr_idx);
struct perf_stat_output_ctx {
void *ctx;
print_metric_t print_metric;
@@ -189,7 +186,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
-void perf_stat_process_shadow_stats(struct perf_stat_config *config, struct evlist *evlist);
struct perf_tool;
union perf_event;
--
2.39.2.637.g21b0678d19-goog
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (34 preceding siblings ...)
2023-02-19 9:28 ` [PATCH v1 51/51] perf stat: Remove saved_value/runtime_stat Ian Rogers
@ 2023-02-19 11:17 ` Arnaldo Carvalho de Melo
2023-02-19 15:43 ` Ian Rogers
2023-02-27 22:04 ` Liang, Kan
36 siblings, 1 reply; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-02-19 11:17 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
Zhengjun Xing, Sandipan Das, James Clark, Kajol Jain, John Garry,
Kan Liang, Adrian Hunter, Andrii Nakryiko, Eduard Zingerman,
Suzuki Poulouse, Leo Yan, Florian Fischer, Ravi Bangoria,
Jing Zhang, Sean Christopherson, Athira Rajeev, linux-kernel,
linux-perf-users, linux-stm32, linux-arm-kernel, Perry Taylor,
Caleb Biggers, Stephane Eranian
Em Sun, Feb 19, 2023 at 01:27:57AM -0800, Ian Rogers escreveu:
> Recently the shadow stat metrics broke due to repeated aggregation and
> a quick fix was applied:
> https://lore.kernel.org/lkml/20230209064447.83733-1-irogers@google.com/
> This is the longer fix but one that comes with some extras. To avoid
> fixing issues for hard coded metrics, the topdown, SMI cost and
> transaction flags are moved into json metrics. A side effect of this
> is that TopdownL1 metrics will now be displayed when supported, if no
> "perf stat" events are specified.
>
> Another fix included here is for event grouping as raised in:
> https://lore.kernel.org/lkml/CA+icZUU_ew7pzWJJZLbj1xsU6MQTPrj8tkFfDhNdTDRQfGUBMQ@mail.gmail.com/
> Metrics are now tagged with NMI and SMT flags, meaning that the events
> shouldn't be grouped if the NMI watchdog is enabled or SMT is enabled.
>
> Given the two issues, the metrics are re-generated and the patches
> also include the latest Intel vendor events. The changes to the metric
> generation code can be seen in:
> https://github.com/intel/perfmon/pull/56
>
> Hard coded metrics support thresholds, the patches add this ability to
> json metrics so that the hard coded metrics can be removed. Migrate
> remaining hard coded metrics to looking up counters from the
> evlist/aggregation count. Finally, get rid of the saved_value logic
> and thereby look to fix the aggregation issues.
>
> Some related fix ups and code clean ups are included in the changes,
> in particular to aid with the code's readability and to keep topdown
> documentation in sync.
That is great work, but it won't have a reasonable amount of time
sitting in linux-next to make it into 6.3.
I have just applied it locally for the usual set of tests; I'll
report the results back here.
- Arnaldo
> Ian Rogers (51):
> perf tools: Ensure evsel name is initialized
> perf metrics: Improve variable names
> perf pmu-events: Remove aggr_mode from pmu_event
> perf pmu-events: Change aggr_mode to be an enum
> perf pmu-events: Change deprecated to be a bool
> perf pmu-events: Change perpkg to be a bool
> perf expr: Make the online topology accessible globally
> perf pmu-events: Make the metric_constraint an enum
> perf pmu-events: Don't '\0' terminate enum values
> perf vendor events intel: Refresh alderlake events
> perf vendor events intel: Refresh alderlake-n metrics
> perf vendor events intel: Refresh broadwell metrics
> perf vendor events intel: Refresh broadwellde metrics
> perf vendor events intel: Refresh broadwellx metrics
> perf vendor events intel: Refresh cascadelakex events
> perf vendor events intel: Add graniterapids events
> perf vendor events intel: Refresh haswell metrics
> perf vendor events intel: Refresh haswellx metrics
> perf vendor events intel: Refresh icelake events
> perf vendor events intel: Refresh icelakex metrics
> perf vendor events intel: Refresh ivybridge metrics
> perf vendor events intel: Refresh ivytown metrics
> perf vendor events intel: Refresh jaketown events
> perf vendor events intel: Refresh knightslanding events
> perf vendor events intel: Refresh sandybridge events
> perf vendor events intel: Refresh sapphirerapids events
> perf vendor events intel: Refresh silvermont events
> perf vendor events intel: Refresh skylake events
> perf vendor events intel: Refresh skylakex metrics
> perf vendor events intel: Refresh tigerlake events
> perf vendor events intel: Refresh westmereep-dp events
> perf jevents: Add rand support to metrics
> perf jevent: Parse metric thresholds
> perf pmu-events: Test parsing metric thresholds with the fake PMU
> perf list: Support for printing metric thresholds
> perf metric: Compute and print threshold values
> perf expr: More explicit NAN handling
> perf metric: Add --metric-no-threshold option
> perf stat: Add TopdownL1 metric as a default if present
> perf stat: Implement --topdown using json metrics
> perf stat: Remove topdown event special handling
> perf doc: Refresh topdown documentation
> perf stat: Remove hard coded transaction events
> perf stat: Use metrics for --smi-cost
> perf stat: Remove perf_stat_evsel_id
> perf stat: Move enums from header
> perf stat: Hide runtime_stat
> perf stat: Add cpu_aggr_map for loop
> perf metric: Directly use counts rather than saved_value
> perf stat: Use counts rather than saved_value
> perf stat: Remove saved_value/runtime_stat
>
> tools/perf/Documentation/perf-stat.txt | 27 +-
> tools/perf/Documentation/topdown.txt | 70 +-
> tools/perf/arch/powerpc/util/header.c | 2 +-
> tools/perf/arch/x86/util/evlist.c | 6 +-
> tools/perf/arch/x86/util/topdown.c | 78 +-
> tools/perf/arch/x86/util/topdown.h | 1 -
> tools/perf/builtin-list.c | 13 +-
> tools/perf/builtin-script.c | 9 +-
> tools/perf/builtin-stat.c | 233 +-
> .../arch/x86/alderlake/adl-metrics.json | 3190 ++++++++++-------
> .../pmu-events/arch/x86/alderlake/cache.json | 36 +-
> .../arch/x86/alderlake/floating-point.json | 27 +
> .../arch/x86/alderlake/frontend.json | 9 +
> .../pmu-events/arch/x86/alderlake/memory.json | 3 +-
> .../arch/x86/alderlake/pipeline.json | 14 +-
> .../arch/x86/alderlake/uncore-other.json | 28 +-
> .../arch/x86/alderlaken/adln-metrics.json | 811 +++--
> .../arch/x86/broadwell/bdw-metrics.json | 1439 ++++----
> .../arch/x86/broadwellde/bdwde-metrics.json | 1405 ++++----
> .../arch/x86/broadwellx/bdx-metrics.json | 1626 +++++----
> .../arch/x86/broadwellx/uncore-cache.json | 74 +-
> .../x86/broadwellx/uncore-interconnect.json | 64 +-
> .../arch/x86/broadwellx/uncore-other.json | 4 +-
> .../arch/x86/cascadelakex/cache.json | 24 +-
> .../arch/x86/cascadelakex/clx-metrics.json | 2198 ++++++------
> .../arch/x86/cascadelakex/frontend.json | 8 +-
> .../arch/x86/cascadelakex/pipeline.json | 16 +
> .../arch/x86/cascadelakex/uncore-memory.json | 18 +-
> .../arch/x86/cascadelakex/uncore-other.json | 120 +-
> .../arch/x86/cascadelakex/uncore-power.json | 8 +-
> .../arch/x86/graniterapids/cache.json | 54 +
> .../arch/x86/graniterapids/frontend.json | 10 +
> .../arch/x86/graniterapids/memory.json | 174 +
> .../arch/x86/graniterapids/other.json | 29 +
> .../arch/x86/graniterapids/pipeline.json | 102 +
> .../x86/graniterapids/virtual-memory.json | 26 +
> .../arch/x86/haswell/hsw-metrics.json | 1220 ++++---
> .../arch/x86/haswellx/hsx-metrics.json | 1397 ++++----
> .../pmu-events/arch/x86/icelake/cache.json | 16 +
> .../arch/x86/icelake/floating-point.json | 31 +
> .../arch/x86/icelake/icl-metrics.json | 1932 +++++-----
> .../pmu-events/arch/x86/icelake/pipeline.json | 23 +-
> .../arch/x86/icelake/uncore-other.json | 56 +
> .../arch/x86/icelakex/icx-metrics.json | 2153 +++++------
> .../arch/x86/icelakex/uncore-memory.json | 2 +-
> .../arch/x86/icelakex/uncore-other.json | 4 +-
> .../arch/x86/ivybridge/ivb-metrics.json | 1270 ++++---
> .../arch/x86/ivytown/ivt-metrics.json | 1311 ++++---
> .../pmu-events/arch/x86/jaketown/cache.json | 6 +-
> .../arch/x86/jaketown/floating-point.json | 2 +-
> .../arch/x86/jaketown/frontend.json | 12 +-
> .../arch/x86/jaketown/jkt-metrics.json | 602 ++--
> .../arch/x86/jaketown/pipeline.json | 2 +-
> .../arch/x86/jaketown/uncore-cache.json | 22 +-
> .../x86/jaketown/uncore-interconnect.json | 74 +-
> .../arch/x86/jaketown/uncore-memory.json | 4 +-
> .../arch/x86/jaketown/uncore-other.json | 22 +-
> .../arch/x86/jaketown/uncore-power.json | 8 +-
> .../arch/x86/knightslanding/cache.json | 94 +-
> .../arch/x86/knightslanding/pipeline.json | 8 +-
> .../arch/x86/knightslanding/uncore-other.json | 8 +-
> tools/perf/pmu-events/arch/x86/mapfile.csv | 29 +-
> .../arch/x86/sandybridge/cache.json | 8 +-
> .../arch/x86/sandybridge/floating-point.json | 2 +-
> .../arch/x86/sandybridge/frontend.json | 12 +-
> .../arch/x86/sandybridge/pipeline.json | 2 +-
> .../arch/x86/sandybridge/snb-metrics.json | 601 ++--
> .../arch/x86/sapphirerapids/cache.json | 24 +-
> .../x86/sapphirerapids/floating-point.json | 32 +
> .../arch/x86/sapphirerapids/frontend.json | 8 +
> .../arch/x86/sapphirerapids/pipeline.json | 19 +-
> .../arch/x86/sapphirerapids/spr-metrics.json | 2283 ++++++------
> .../arch/x86/sapphirerapids/uncore-other.json | 60 +
> .../arch/x86/silvermont/frontend.json | 2 +-
> .../arch/x86/silvermont/pipeline.json | 2 +-
> .../pmu-events/arch/x86/skylake/cache.json | 25 +-
> .../pmu-events/arch/x86/skylake/frontend.json | 8 +-
> .../pmu-events/arch/x86/skylake/other.json | 1 +
> .../pmu-events/arch/x86/skylake/pipeline.json | 16 +
> .../arch/x86/skylake/skl-metrics.json | 1877 ++++++----
> .../arch/x86/skylake/uncore-other.json | 1 +
> .../pmu-events/arch/x86/skylakex/cache.json | 8 +-
> .../arch/x86/skylakex/frontend.json | 8 +-
> .../arch/x86/skylakex/pipeline.json | 16 +
> .../arch/x86/skylakex/skx-metrics.json | 2097 +++++------
> .../arch/x86/skylakex/uncore-memory.json | 2 +-
> .../arch/x86/skylakex/uncore-other.json | 96 +-
> .../arch/x86/skylakex/uncore-power.json | 6 +-
> .../arch/x86/tigerlake/floating-point.json | 31 +
> .../arch/x86/tigerlake/pipeline.json | 18 +
> .../arch/x86/tigerlake/tgl-metrics.json | 1942 +++++-----
> .../arch/x86/tigerlake/uncore-other.json | 28 +-
> .../arch/x86/westmereep-dp/cache.json | 2 +-
> .../x86/westmereep-dp/virtual-memory.json | 2 +-
> tools/perf/pmu-events/jevents.py | 58 +-
> tools/perf/pmu-events/metric.py | 8 +-
> tools/perf/pmu-events/pmu-events.h | 35 +-
> tools/perf/tests/expand-cgroup.c | 3 +-
> tools/perf/tests/expr.c | 7 +-
> tools/perf/tests/parse-metric.c | 21 +-
> tools/perf/tests/pmu-events.c | 49 +-
> tools/perf/util/cpumap.h | 3 +
> tools/perf/util/cputopo.c | 14 +
> tools/perf/util/cputopo.h | 5 +
> tools/perf/util/evsel.h | 2 +-
> tools/perf/util/expr.c | 16 +-
> tools/perf/util/expr.y | 12 +-
> tools/perf/util/metricgroup.c | 178 +-
> tools/perf/util/metricgroup.h | 5 +-
> tools/perf/util/pmu.c | 17 +-
> tools/perf/util/print-events.h | 1 +
> tools/perf/util/smt.c | 11 +-
> tools/perf/util/smt.h | 12 +-
> tools/perf/util/stat-display.c | 117 +-
> tools/perf/util/stat-shadow.c | 1287 ++-----
> tools/perf/util/stat.c | 74 -
> tools/perf/util/stat.h | 96 +-
> tools/perf/util/synthetic-events.c | 2 +-
> tools/perf/util/topdown.c | 68 +-
> tools/perf/util/topdown.h | 11 +-
> 120 files changed, 18025 insertions(+), 15590 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
>
> --
> 2.39.2.637.g21b0678d19-goog
>
--
- Arnaldo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-19 11:17 ` [PATCH v1 00/51] shadow metric clean up and improvements Arnaldo Carvalho de Melo
@ 2023-02-19 15:43 ` Ian Rogers
2023-02-21 17:44 ` Ian Rogers
0 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-19 15:43 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
Zhengjun Xing, Sandipan Das, James Clark, Kajol Jain, John Garry,
Kan Liang, Adrian Hunter, Andrii Nakryiko, Eduard Zingerman,
Suzuki Poulouse, Leo Yan, Florian Fischer, Ravi Bangoria,
Jing Zhang, Sean Christopherson, Athira Rajeev, LKML,
linux-perf-users, moderated list:ARM/STM32 ARCHITECTURE,
Linux ARM, Perry Taylor, Caleb Biggers, Stephane Eranian
On Sun, Feb 19, 2023, 3:17 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Sun, Feb 19, 2023 at 01:27:57AM -0800, Ian Rogers escreveu:
> > [...]
>
> That is great work, but it won't have a reasonable amount of time
> sitting in linux-next to make it into 6.3.
>
> I have just applied it locally for the usual set of tests; I'll
> report the results back here.
Ugh. I'm guessing it won't be useful if I point out more things broken
with the current workaround, like metrics with --repeat :-/
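A reproducer for that is along these lines, with the metric and the
workload being arbitrary examples:

  perf stat -M IPC --repeat 3 true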
Thanks,
Ian
> - Arnaldo
>
> [...]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-19 15:43 ` Ian Rogers
@ 2023-02-21 17:44 ` Ian Rogers
2023-02-22 13:47 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-21 17:44 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Stephane Eranian
Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
Zhengjun Xing, Sandipan Das, James Clark, Kajol Jain, John Garry,
Kan Liang, Adrian Hunter, Andrii Nakryiko, Eduard Zingerman,
Suzuki Poulouse, Leo Yan, Florian Fischer, Ravi Bangoria,
Jing Zhang, Sean Christopherson, Athira Rajeev, LKML,
linux-perf-users, moderated list:ARM/STM32 ARCHITECTURE,
Linux ARM, Perry Taylor, Caleb Biggers
On Sun, Feb 19, 2023 at 7:43 AM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Feb 19, 2023, 3:17 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >
> > Em Sun, Feb 19, 2023 at 01:27:57AM -0800, Ian Rogers escreveu:
> > > [...]
> >
> > That is great work, but it won't have a reasonable amount of time
> > sitting in linux-next to make it into 6.3.
> >
> > I have just applied it locally for the usual set of tests; I'll
> > report the results back here.
>
>
> Ugh. I'm guessing it won't be useful if I point out more things broken
> with the current workaround, like metrics with --repeat :-/
>
> Thanks,
> Ian
So currently the flow of patches is:
1) initial testing - acme tmp.perf/core
2) things staged for next release - acme perf/core (perhaps this
should be called perf-next)
3) linux-wide next-release testing - linux-next
4) release - linus/master
I wonder if there should be a perf-next-next branch to work around the
"sitting time" problem. Otherwise anybody who touches code in these 51
patches will create a merge conflict. Given the aggregation issues
we're likely to see changes in this code and so conflicts are likely
to happen.
The patch flow with perf-next-next would be:
1) initial testing - tmp.perf-next-next
2) things acquiring sitting time and where developers work - perf-next-next
3) things staged for the next release - perf-next
4) as 3 above
5) as 4 above
With linux-next picking up acme perf/core (aka perf-next) daily, it
isn't clear whether we should work off of perf/core or linux-next, as
they are so in sync. This process means we've lost a sitting place for
developer patches and we're going to feel the pain in terms of merge
conflicts on the list, difficulty building off of the latest work
without cherry-picking from the list, etc.
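For concreteness, the developer side of that flow might look like the
following, with the remote and branch names as proposed above and the
commands purely illustrative:

  git fetch acme
  # develop on top of the branch where patches sit and bake
  git checkout -b my-topic acme/perf-next-next
  # maintainer side, once a series has had enough sitting time:
  git checkout perf-next && git merge perf-next-next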
Thanks,
Ian
> > - Arnaldo
> >
> > [...]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-21 17:44 ` Ian Rogers
@ 2023-02-22 13:47 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-02-22 13:47 UTC (permalink / raw)
To: Ian Rogers
Cc: Stephane Eranian, Peter Zijlstra, Ingo Molnar, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Namhyung Kim, Maxime Coquelin,
Alexandre Torgue, Zhengjun Xing, Sandipan Das, James Clark,
Kajol Jain, John Garry, Kan Liang, Adrian Hunter, Andrii Nakryiko,
Eduard Zingerman, Suzuki Poulouse, Leo Yan, Florian Fischer,
Ravi Bangoria, Jing Zhang, Sean Christopherson, Athira Rajeev,
LKML, linux-perf-users, moderated list:ARM/STM32 ARCHITECTURE,
Linux ARM, Perry Taylor, Caleb Biggers
Em Tue, Feb 21, 2023 at 09:44:36AM -0800, Ian Rogers escreveu:
> On Sun, Feb 19, 2023 at 7:43 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Sun, Feb 19, 2023, 3:17 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > >
> > > Em Sun, Feb 19, 2023 at 01:27:57AM -0800, Ian Rogers escreveu:
> > > > [...]
> > >
> > > That is great work, but it won't have a reasonable amount of time
> > > sitting in linux-next to make it into 6.3.
> > >
> > > I have just applied it locally for the usual set of tests; I'll
> > > report the results back here.
> >
> >
> > Ugh. I'm guessing it won't be useful if I point out more things broken
> > with the current workaround, like metrics with --repeat :-/
> >
> > Thanks,
> > Ian
>
> So currently the flow of patches is:
>
> 1) initial testing - acme tmp.perf/core
> 2) things staged for next release - acme perf/core (perhaps this
> should be called perf-next)
Yeah, perf-tools-next probably.
> 3) linux-wide next-release testing - linux-next
> 4) release - linus/master
>
> I wonder if there should be a perf-next-next branch to work around the
> "sitting time" problem. Otherwise anybody who touches code in these 51
> patches will create a merge conflict. Given the aggregation issues
> we're likely to see changes in this code and so conflicts are likely
> to happen.
>
> The patch flow with perf-next-next would be:
>
> 1) initial testing - tmp.perf-next-next
> 2) things acquiring sitting time and where developers work - perf-next-next
> 3) things staged for the next release - perf-next
> 4) as 3 above
> 5) as 4 above
>
> With linux-next picking up acme perf/core (aka perf-next) daily, it
> isn't clear whether we should work off of perf/core or linux-next, as
> they are so in sync. This process means we've lost a sitting place for
> developer patches and we're going to feel the pain in terms of merge
> conflicts on the list, difficulty building off of the latest work
> without cherry-picking from the list, etc.
I'll send perf/core to Linus today, and then your 51 patches will appear
in perf-tools-next, where we'll do what was done before in perf/core
(a name that remained only for historical reasons).
I'll make perf/urgent become 'perf-tools', and when the merge window
closes, we turn 'perf-tools-next' into 'perf-tools'.
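(In git terms that would presumably be something like:

  git branch -m perf/urgent perf-tools        # rename the urgent branch
  # once the merge window closes:
  git branch -f perf-tools perf-tools-next    # perf-tools-next becomes perf-tools

with the exact mechanics being a guess.)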
- Arnaldo
> Thanks,
> Ian
>
> [...]
> > > > .../arch/x86/westmereep-dp/cache.json | 2 +-
> > > > .../x86/westmereep-dp/virtual-memory.json | 2 +-
> > > > tools/perf/pmu-events/jevents.py | 58 +-
> > > > tools/perf/pmu-events/metric.py | 8 +-
> > > > tools/perf/pmu-events/pmu-events.h | 35 +-
> > > > tools/perf/tests/expand-cgroup.c | 3 +-
> > > > tools/perf/tests/expr.c | 7 +-
> > > > tools/perf/tests/parse-metric.c | 21 +-
> > > > tools/perf/tests/pmu-events.c | 49 +-
> > > > tools/perf/util/cpumap.h | 3 +
> > > > tools/perf/util/cputopo.c | 14 +
> > > > tools/perf/util/cputopo.h | 5 +
> > > > tools/perf/util/evsel.h | 2 +-
> > > > tools/perf/util/expr.c | 16 +-
> > > > tools/perf/util/expr.y | 12 +-
> > > > tools/perf/util/metricgroup.c | 178 +-
> > > > tools/perf/util/metricgroup.h | 5 +-
> > > > tools/perf/util/pmu.c | 17 +-
> > > > tools/perf/util/print-events.h | 1 +
> > > > tools/perf/util/smt.c | 11 +-
> > > > tools/perf/util/smt.h | 12 +-
> > > > tools/perf/util/stat-display.c | 117 +-
> > > > tools/perf/util/stat-shadow.c | 1287 ++-----
> > > > tools/perf/util/stat.c | 74 -
> > > > tools/perf/util/stat.h | 96 +-
> > > > tools/perf/util/synthetic-events.c | 2 +-
> > > > tools/perf/util/topdown.c | 68 +-
> > > > tools/perf/util/topdown.h | 11 +-
> > > > 120 files changed, 18025 insertions(+), 15590 deletions(-)
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> > > > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
> > > >
> > > > --
> > > > 2.39.2.637.g21b0678d19-goog
> > > >
> > >
> > > --
> > >
> > > - Arnaldo
--
- Arnaldo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 50/51] perf stat: Use counts rather than saved_value
2023-02-19 9:28 ` [PATCH v1 50/51] perf stat: Use " Ian Rogers
@ 2023-02-24 22:48 ` Namhyung Kim
2023-02-25 5:47 ` Ian Rogers
0 siblings, 1 reply; 50+ messages in thread
From: Namhyung Kim @ 2023-02-24 22:48 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Maxime Coquelin,
Alexandre Torgue, Zhengjun Xing, Sandipan Das, James Clark,
Kajol Jain, John Garry, Kan Liang, Adrian Hunter, Andrii Nakryiko,
Eduard Zingerman, Suzuki Poulouse, Leo Yan, Florian Fischer,
Ravi Bangoria, Jing Zhang, Sean Christopherson, Athira Rajeev,
linux-kernel, linux-perf-users, linux-stm32, linux-arm-kernel,
Perry Taylor, Caleb Biggers, Stephane Eranian
On Sun, Feb 19, 2023 at 01:28:47AM -0800, Ian Rogers wrote:
> Switch the hard coded metrics to use the aggregate value rather than
> the value from saved_value. When computing a metric like IPC, the
> aggregate count comes from instructions, then cycles is looked up and,
> if present, IPC is computed. Rather than looking up from the
> saved_value rbtree, search the counter's evlist for the desired counter.
>
> A new helper evsel__stat_type is used to both quickly find a metric
> function and to identify when a counter is the one being sought. So
> that both total and miss counts can be sought, the stat_type enum is
> expanded. The ratio functions are rewritten to share a common helper
> with the ratios being directly passed rather than computed from an
> enum value.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
[SNIP]
> -static double runtime_stat_avg(enum stat_type type, int aggr_idx,
> - struct runtime_stat_data *rsd)
> +static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type)
> {
> - struct saved_value *v;
> -
> - v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
> - if (!v)
> - return 0.0;
> -
> - return avg_stats(&v->stats);
> + const struct evsel *cur;
> + int evsel_ctx = evsel_context(evsel);
> +
> + evlist__for_each_entry(evsel->evlist, cur) {
> + struct perf_stat_aggr *aggr;
> +
> + /* Ignore the evsel that is being searched from. */
> + if (evsel == cur)
> + continue;
> +
> + /* Ignore evsels that are part of different groups. */
> + if (evsel->core.leader->nr_members &&
> + evsel->core.leader != cur->core.leader)
The evsel->nr_members is somewhat confusing in that it counts the
leader itself as a member. I'm not sure it resets nr_members to 0 for
standalone events. You'd better check that nr_members is greater than
1 for group events.
Thanks,
Namhyung
> + continue;
> + /* Ignore evsels with mismatched modifiers. */
> + if (evsel_ctx != evsel_context(cur))
> + continue;
> + /* Ignore if not the cgroup we're looking for. */
> + if (evsel->cgrp != cur->cgrp)
> + continue;
> + /* Ignore if not the stat we're looking for. */
> + if (type != evsel__stat_type(cur))
> + continue;
> +
> + aggr = &cur->stats->aggr[aggr_idx];
> + if (type == STAT_NSECS)
> + return aggr->counts.val;
> + return aggr->counts.val * cur->scale;
> + }
> + return 0.0;
> }
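(For illustration: a minimal sketch of how one of the hard coded
metrics, e.g. IPC, could consume find_stat() after this change.
print_ipc() is a hypothetical caller; only find_stat() and STAT_NSECS
appear in the hunk above, and STAT_CYCLES is assumed to be one of the
expanded stat_type values.)

  /* Sketch: assumes perf's evsel types and <stdio.h>. */
  static void print_ipc(struct evsel *evsel, int aggr_idx, double instructions)
  {
          /* Look up the aggregated cycles count for the same aggr_idx. */
          double cycles = find_stat(evsel, aggr_idx, STAT_CYCLES);

          if (cycles != 0.0)
                  printf("%.2f  insn per cycle\n", instructions / cycles);
  }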
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 50/51] perf stat: Use counts rather than saved_value
2023-02-24 22:48 ` Namhyung Kim
@ 2023-02-25 5:47 ` Ian Rogers
0 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-25 5:47 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Maxime Coquelin,
Alexandre Torgue, Zhengjun Xing, Sandipan Das, James Clark,
Kajol Jain, John Garry, Kan Liang, Adrian Hunter, Andrii Nakryiko,
Eduard Zingerman, Suzuki Poulouse, Leo Yan, Florian Fischer,
Ravi Bangoria, Jing Zhang, Sean Christopherson, Athira Rajeev,
linux-kernel, linux-perf-users, linux-stm32, linux-arm-kernel,
Perry Taylor, Caleb Biggers, Stephane Eranian
On Fri, Feb 24, 2023 at 2:48 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sun, Feb 19, 2023 at 01:28:47AM -0800, Ian Rogers wrote:
> > Switch the hard coded metrics to use the aggregate value rather than
> > the value from saved_value. When computing a metric like IPC, the
> > aggregate count comes from instructions, then cycles is looked up and,
> > if present, IPC is computed. Rather than looking up from the
> > saved_value rbtree, search the counter's evlist for the desired counter.
> >
> > A new helper evsel__stat_type is used to both quickly find a metric
> > function and to identify when a counter is the one being sought. So
> > that both total and miss counts can be sought, the stat_type enum is
> > expanded. The ratio functions are rewritten to share a common helper
> > with the ratios being directly passed rather than computed from an
> > enum value.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> [SNIP]
> > -static double runtime_stat_avg(enum stat_type type, int aggr_idx,
> > - struct runtime_stat_data *rsd)
> > +static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type)
> > {
> > - struct saved_value *v;
> > -
> > - v = saved_value_lookup(NULL, aggr_idx, false, type, rsd->ctx, rsd->cgrp);
> > - if (!v)
> > - return 0.0;
> > -
> > - return avg_stats(&v->stats);
> > + const struct evsel *cur;
> > + int evsel_ctx = evsel_context(evsel);
> > +
> > + evlist__for_each_entry(evsel->evlist, cur) {
> > + struct perf_stat_aggr *aggr;
> > +
> > + /* Ignore the evsel that is being searched from. */
> > + if (evsel == cur)
> > + continue;
> > +
> > + /* Ignore evsels that are part of different groups. */
> > + if (evsel->core.leader->nr_members &&
> > + evsel->core.leader != cur->core.leader)
>
> The evsel->nr_members is somewhat confusing in that it counts the
> leader itself as a member. I'm not sure it resets nr_members to 0 for
> standalone events. You'd better check that nr_members is greater than
> 1 for group events.
Agreed. The code is correct: nr_members is only set when the group is
closed by the call to parse_events_set_leader, and standalone events
don't close a group and so keep an nr_members of 0. But I agree that's
confusing.
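To make that concrete, a small sketch (evsel__is_grouped() is a
hypothetical helper, not something in the tree):

  /*
   * nr_members counts the leader itself: a closed group {a,b} has
   * nr_members == 2, while a standalone event keeps the default of 0,
   * so the check Namhyung suggests would be:
   */
  static bool evsel__is_grouped(const struct evsel *evsel)
  {
          return evsel->core.leader->nr_members > 1;
  }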
I'm actually looking at a related bug where telling metrics not to
group events breaks the topdown events that must be grouped with
slots.
One thing that bugs me is the libperf evsel idx variable. When an
evsel is added to an evlist, the idx is the number of elements in the
evlist at that point. However, we reorganize the list in parse-events
and so the idx is just a hopefully unique value in the list. In places
in parse-events we use the idx to compute the length of the evlist by
subtracting the first idx from the last and adding 1. Removing the idx
isn't straightforward though, as later on it is used for mmaps.
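The arithmetic in question looks like this (a sketch, with
illustrative variable names):

  /* Length of an evlist span recovered from idx values; only holds
   * while the idx values remain consecutive. */
  int nr_events = last->core.idx - first->core.idx + 1;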
Thanks,
Ian
> Thanks,
> Namhyung
>
>
> > + continue;
> > + /* Ignore evsels with mismatched modifiers. */
> > + if (evsel_ctx != evsel_context(cur))
> > + continue;
> > + /* Ignore if not the cgroup we're looking for. */
> > + if (evsel->cgrp != cur->cgrp)
> > + continue;
> > + /* Ignore if not the stat we're looking for. */
> > + if (type != evsel__stat_type(cur))
> > + continue;
> > +
> > + aggr = &cur->stats->aggr[aggr_idx];
> > + if (type == STAT_NSECS)
> > + return aggr->counts.val;
> > + return aggr->counts.val * cur->scale;
> > + }
> > + return 0.0;
> > }
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-19 9:28 ` [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present Ian Rogers
@ 2023-02-27 19:12 ` Liang, Kan
2023-02-27 19:33 ` Ian Rogers
0 siblings, 1 reply; 50+ messages in thread
From: Liang, Kan @ 2023-02-27 19:12 UTC (permalink / raw)
To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian
On 2023-02-19 4:28 a.m., Ian Rogers wrote:
> When there are no events and on Intel, the topdown events will be
> added by default if present. Displaying the metrics associated with
> these requires special handling in stat-shadow.c. To more easily update
> these metrics, use the json metric version via the TopdownL1
> group. This makes the handling less platform specific.
>
> Modify the metricgroup__has_metric code to also cover metric groups.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/arch/x86/util/evlist.c | 6 +++---
> tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
> tools/perf/arch/x86/util/topdown.h | 1 -
> tools/perf/builtin-stat.c | 14 ++++++++++++++
> tools/perf/util/metricgroup.c | 6 ++----
> 5 files changed, 19 insertions(+), 38 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> index cb59ce9b9638..8a7ae4162563 100644
> --- a/tools/perf/arch/x86/util/evlist.c
> +++ b/tools/perf/arch/x86/util/evlist.c
> @@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
> struct perf_event_attr *attrs,
> size_t nr_attrs)
> {
> - if (nr_attrs)
> - return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> + if (!nr_attrs)
> + return 0;
>
> - return topdown_parse_events(evlist);
> + return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> }
>
> struct evsel *arch_evlist__leader(struct list_head *list)
> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
> index 54810f9acd6f..eb3a7d9652ab 100644
> --- a/tools/perf/arch/x86/util/topdown.c
> +++ b/tools/perf/arch/x86/util/topdown.c
> @@ -9,11 +9,6 @@
> #include "topdown.h"
> #include "evsel.h"
>
> -#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
> -#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
> -#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
> -#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
> -
> /* Check whether there is a PMU which supports the perf metrics. */
> bool topdown_sys_has_perf_metrics(void)
> {
> @@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
>
> return pmu_name;
> }
> -
> -int topdown_parse_events(struct evlist *evlist)
> -{
> - const char *topdown_events;
> - const char *pmu_name;
> -
> - if (!topdown_sys_has_perf_metrics())
> - return 0;
> -
> - pmu_name = arch_get_topdown_pmu_name(evlist, false);
> -
> - if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
> - if (!strcmp(pmu_name, "cpu_core"))
> - topdown_events = TOPDOWN_L2_EVENTS_CORE;
> - else
> - topdown_events = TOPDOWN_L2_EVENTS;
> - } else {
> - if (!strcmp(pmu_name, "cpu_core"))
> - topdown_events = TOPDOWN_L1_EVENTS_CORE;
> - else
> - topdown_events = TOPDOWN_L1_EVENTS;
> - }
> -
> - return parse_event(evlist, topdown_events);
> -}
> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
> index 7eb81f042838..46bf9273e572 100644
> --- a/tools/perf/arch/x86/util/topdown.h
> +++ b/tools/perf/arch/x86/util/topdown.h
> @@ -3,6 +3,5 @@
> #define _TOPDOWN_H 1
>
> bool topdown_sys_has_perf_metrics(void);
> -int topdown_parse_events(struct evlist *evlist);
>
> #endif
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 5e13171a7bba..796e98e453f6 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
> stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
>
> if (!evsel_list->core.nr_entries) {
> + /* No events so add defaults. */
> if (target__has_cpu(&target))
> default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
>
> @@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
> }
> if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
> return -1;
> + /*
> + * Add TopdownL1 metrics if they exist. To minimize
> + * multiplexing, don't request threshold computation.
> + */
> + if (metricgroup__has_metric("TopdownL1") &&
> + metricgroup__parse_groups(evsel_list, "TopdownL1",
> + /*metric_no_group=*/false,
> + /*metric_no_merge=*/false,
> + /*metric_no_threshold=*/true,
> + stat_config.user_requested_cpu_list,
> + stat_config.system_wide,
> + &stat_config.metric_events) < 0)
Does the metricgroup__* function check the existence of the events on
the machine? If not, it may not be reliable to only check the event list.
The existing code supports both L1 and L2 Topdown for SPR. But this
patch seems to remove the L2 Topdown support for SPR.
The TopdownL1/L2 metrics are added only for the big core with the perf
stat default. That's because perf_metrics is a dedicated register,
which should not impact other events (which use GP counters). But this
patch doesn't seem to check the CPU type, so it may bring extra
multiplexing for the perf stat default on an ATOM platform.
Thanks,
Kan
> + return -1;
> /* Platform specific attrs */
> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> return -1;
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index afb6f2fdc24e..64a35f2787dc 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
> {
> const char *metric = vdata;
>
> - if (!pm->metric_expr)
> - return 0;
> -
> - if (match_metric(pm->metric_name, metric))
> + if (match_metric(pm->metric_name, metric) ||
> + match_metric(pm->metric_group, metric))
> return 1;
>
> return 0;
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-27 19:12 ` Liang, Kan
@ 2023-02-27 19:33 ` Ian Rogers
2023-02-27 20:12 ` Liang, Kan
0 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-27 19:33 UTC (permalink / raw)
To: Liang, Kan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers, Stephane Eranian
On Mon, Feb 27, 2023 at 11:12 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2023-02-19 4:28 a.m., Ian Rogers wrote:
> > When there are no events and on Intel, the topdown events will be
> > added by default if present. Displaying the metrics associated with
> > these requires special handling in stat-shadow.c. To more easily update
> > these metrics, use the json metric version via the TopdownL1
> > group. This makes the handling less platform specific.
> >
> > Modify the metricgroup__has_metric code to also cover metric groups.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/perf/arch/x86/util/evlist.c | 6 +++---
> > tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
> > tools/perf/arch/x86/util/topdown.h | 1 -
> > tools/perf/builtin-stat.c | 14 ++++++++++++++
> > tools/perf/util/metricgroup.c | 6 ++----
> > 5 files changed, 19 insertions(+), 38 deletions(-)
> >
> > diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> > index cb59ce9b9638..8a7ae4162563 100644
> > --- a/tools/perf/arch/x86/util/evlist.c
> > +++ b/tools/perf/arch/x86/util/evlist.c
> > @@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
> > struct perf_event_attr *attrs,
> > size_t nr_attrs)
> > {
> > - if (nr_attrs)
> > - return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> > + if (!nr_attrs)
> > + return 0;
> >
> > - return topdown_parse_events(evlist);
> > + return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> > }
> >
> > struct evsel *arch_evlist__leader(struct list_head *list)
> > diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
> > index 54810f9acd6f..eb3a7d9652ab 100644
> > --- a/tools/perf/arch/x86/util/topdown.c
> > +++ b/tools/perf/arch/x86/util/topdown.c
> > @@ -9,11 +9,6 @@
> > #include "topdown.h"
> > #include "evsel.h"
> >
> > -#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
> > -#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
> > -#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
> > -#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
> > -
> > /* Check whether there is a PMU which supports the perf metrics. */
> > bool topdown_sys_has_perf_metrics(void)
> > {
> > @@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
> >
> > return pmu_name;
> > }
> > -
> > -int topdown_parse_events(struct evlist *evlist)
> > -{
> > - const char *topdown_events;
> > - const char *pmu_name;
> > -
> > - if (!topdown_sys_has_perf_metrics())
> > - return 0;
> > -
> > - pmu_name = arch_get_topdown_pmu_name(evlist, false);
> > -
> > - if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
> > - if (!strcmp(pmu_name, "cpu_core"))
> > - topdown_events = TOPDOWN_L2_EVENTS_CORE;
> > - else
> > - topdown_events = TOPDOWN_L2_EVENTS;
> > - } else {
> > - if (!strcmp(pmu_name, "cpu_core"))
> > - topdown_events = TOPDOWN_L1_EVENTS_CORE;
> > - else
> > - topdown_events = TOPDOWN_L1_EVENTS;
> > - }
> > -
> > - return parse_event(evlist, topdown_events);
> > -}
> > diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
> > index 7eb81f042838..46bf9273e572 100644
> > --- a/tools/perf/arch/x86/util/topdown.h
> > +++ b/tools/perf/arch/x86/util/topdown.h
> > @@ -3,6 +3,5 @@
> > #define _TOPDOWN_H 1
> >
> > bool topdown_sys_has_perf_metrics(void);
> > -int topdown_parse_events(struct evlist *evlist);
> >
> > #endif
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 5e13171a7bba..796e98e453f6 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
> > stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
> >
> > if (!evsel_list->core.nr_entries) {
> > + /* No events so add defaults. */
> > if (target__has_cpu(&target))
> > default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
> >
> > @@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
> > }
> > if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
> > return -1;
> > + /*
> > + * Add TopdownL1 metrics if they exist. To minimize
> > + * multiplexing, don't request threshold computation.
> > + */
> > + if (metricgroup__has_metric("TopdownL1") &&
> > + metricgroup__parse_groups(evsel_list, "TopdownL1",
> > + /*metric_no_group=*/false,
> > + /*metric_no_merge=*/false,
> > + /*metric_no_threshold=*/true,
> > + stat_config.user_requested_cpu_list,
> > + stat_config.system_wide,
> > + &stat_config.metric_events) < 0)
>
> Does the metricgroup__* function check the existence of the events on
> the machine? If not, it may not be reliable to only check the event list.
>
> The existing code supports both L1 and L2 Topdown for SPR. But this
> patch seems to remove the L2 Topdown support for SPR.
>
> The TopdownL1/L2 metrics are added only for the big core with the perf
> stat default. That's because perf_metrics is a dedicated register,
> which should not impact other events (which use GP counters). But this
> patch doesn't seem to check the CPU type, so it may bring extra
> multiplexing for the perf stat default on an ATOM platform.
>
> Thanks,
> Kan
Hi Kan,
The TopdownL2 metrics are present for SPR. The code changes the
default to L1 because, with the json topdown metrics, the maximum
topdown level (previously the default) is L6, and nobody really wants
to see that. The --topdown option is no longer limited to Icelake+
processors; any processor with the TopdownL1 metric group will work,
as --topdown has just become a shortcut to that group.
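For example (illustrative invocations, assuming a CPU whose event
list carries the TopdownL1 group), these should now select the same
metrics:

  perf stat --topdown -a sleep 1
  perf stat -M TopdownL1 -a sleep 1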
There may be additional multiplexing, but also, in the old code events
from different groups could be used to calculate a bogus metric. There
are also additional events, as the previous metrics don't agree with
those in the TMA spreadsheet. If there is multiplexing from this
change on SPR, the TMA json metrics do try to avoid it, so I think the
right path through this is to fix the json metrics.
Thanks,
Ian
> > + return -1;
> > /* Platform specific attrs */
> > if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> > return -1;
> > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > index afb6f2fdc24e..64a35f2787dc 100644
> > --- a/tools/perf/util/metricgroup.c
> > +++ b/tools/perf/util/metricgroup.c
> > @@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
> > {
> > const char *metric = vdata;
> >
> > - if (!pm->metric_expr)
> > - return 0;
> > -
> > - if (match_metric(pm->metric_name, metric))
> > + if (match_metric(pm->metric_name, metric) ||
> > + match_metric(pm->metric_group, metric))
> > return 1;
> >
> > return 0;
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-27 19:33 ` Ian Rogers
@ 2023-02-27 20:12 ` Liang, Kan
2023-02-28 6:27 ` Ian Rogers
0 siblings, 1 reply; 50+ messages in thread
From: Liang, Kan @ 2023-02-27 20:12 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers, Stephane Eranian
On 2023-02-27 2:33 p.m., Ian Rogers wrote:
> On Mon, Feb 27, 2023 at 11:12 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2023-02-19 4:28 a.m., Ian Rogers wrote:
>>> When there are no events and on Intel, the topdown events will be
>>> added by default if present. Displaying the metrics associated with
>>> these requires special handling in stat-shadow.c. To more easily update
>>> these metrics, use the json metric version via the TopdownL1
>>> group. This makes the handling less platform specific.
>>>
>>> Modify the metricgroup__has_metric code to also cover metric groups.
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>> tools/perf/arch/x86/util/evlist.c | 6 +++---
>>> tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
>>> tools/perf/arch/x86/util/topdown.h | 1 -
>>> tools/perf/builtin-stat.c | 14 ++++++++++++++
>>> tools/perf/util/metricgroup.c | 6 ++----
>>> 5 files changed, 19 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>>> index cb59ce9b9638..8a7ae4162563 100644
>>> --- a/tools/perf/arch/x86/util/evlist.c
>>> +++ b/tools/perf/arch/x86/util/evlist.c
>>> @@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
>>> struct perf_event_attr *attrs,
>>> size_t nr_attrs)
>>> {
>>> - if (nr_attrs)
>>> - return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
>>> + if (!nr_attrs)
>>> + return 0;
>>>
>>> - return topdown_parse_events(evlist);
>>> + return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
>>> }
>>>
>>> struct evsel *arch_evlist__leader(struct list_head *list)
>>> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
>>> index 54810f9acd6f..eb3a7d9652ab 100644
>>> --- a/tools/perf/arch/x86/util/topdown.c
>>> +++ b/tools/perf/arch/x86/util/topdown.c
>>> @@ -9,11 +9,6 @@
>>> #include "topdown.h"
>>> #include "evsel.h"
>>>
>>> -#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
>>> -#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
>>> -#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
>>> -#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
>>> -
>>> /* Check whether there is a PMU which supports the perf metrics. */
>>> bool topdown_sys_has_perf_metrics(void)
>>> {
>>> @@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
>>>
>>> return pmu_name;
>>> }
>>> -
>>> -int topdown_parse_events(struct evlist *evlist)
>>> -{
>>> - const char *topdown_events;
>>> - const char *pmu_name;
>>> -
>>> - if (!topdown_sys_has_perf_metrics())
>>> - return 0;
>>> -
>>> - pmu_name = arch_get_topdown_pmu_name(evlist, false);
>>> -
>>> - if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
>>> - if (!strcmp(pmu_name, "cpu_core"))
>>> - topdown_events = TOPDOWN_L2_EVENTS_CORE;
>>> - else
>>> - topdown_events = TOPDOWN_L2_EVENTS;
>>> - } else {
>>> - if (!strcmp(pmu_name, "cpu_core"))
>>> - topdown_events = TOPDOWN_L1_EVENTS_CORE;
>>> - else
>>> - topdown_events = TOPDOWN_L1_EVENTS;
>>> - }
>>> -
>>> - return parse_event(evlist, topdown_events);
>>> -}
>>> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
>>> index 7eb81f042838..46bf9273e572 100644
>>> --- a/tools/perf/arch/x86/util/topdown.h
>>> +++ b/tools/perf/arch/x86/util/topdown.h
>>> @@ -3,6 +3,5 @@
>>> #define _TOPDOWN_H 1
>>>
>>> bool topdown_sys_has_perf_metrics(void);
>>> -int topdown_parse_events(struct evlist *evlist);
>>>
>>> #endif
>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>> index 5e13171a7bba..796e98e453f6 100644
>>> --- a/tools/perf/builtin-stat.c
>>> +++ b/tools/perf/builtin-stat.c
>>> @@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
>>> stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
>>>
>>> if (!evsel_list->core.nr_entries) {
>>> + /* No events so add defaults. */
>>> if (target__has_cpu(&target))
>>> default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
>>>
>>> @@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
>>> }
>>> if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
>>> return -1;
>>> + /*
>>> + * Add TopdownL1 metrics if they exist. To minimize
>>> + * multiplexing, don't request threshold computation.
>>> + */
>>> + if (metricgroup__has_metric("TopdownL1") &&
>>> + metricgroup__parse_groups(evsel_list, "TopdownL1",
>>> + /*metric_no_group=*/false,
>>> + /*metric_no_merge=*/false,
>>> + /*metric_no_threshold=*/true,
>>> + stat_config.user_requested_cpu_list,
>>> + stat_config.system_wide,
>>> + &stat_config.metric_events) < 0)
>>
>> Does the metricgroup__* function check the existence of the events on
>> the machine? If not, it may not be reliable to only check the event list.
>>
>> The existing code supports both L1 and L2 Topdown for SPR. But this
>> patch seems to remove the L2 Topdown support for SPR.
>>
>> The TopdownL1/L2 metrics are added only for the big core with the perf
>> stat default. That's because perf_metrics is a dedicated register,
>> which should not impact other events (which use GP counters). But this
>> patch doesn't seem to check the CPU type, so it may bring extra
>> multiplexing for the perf stat default on an ATOM platform.
>>
>> Thanks,
>> Kan
>
> Hi Kan,
>
> The TopdownL2 metrics are present for SPR. The code changes the
> default to L1 because, with the json topdown metrics, the maximum
> topdown level (previously the default) is L6, and nobody really wants
> to see that. The --topdown option is no longer limited to Icelake+
> processors; any processor with the TopdownL1 metric group will work,
> as --topdown has just become a shortcut to that group.
This patch also seems to change the perf stat default. The current perf
stat default shows both L1 and L2 for SPR. If that's the case, it's a
user-visible change. What's the output of "perf stat sleep 1" with
this patch on SPR?
>
> There may be additional multiplexing, but also, in the old code events
> from different groups could be used to calculate a bogus metric. There
> are also additional events, as the previous metrics don't agree with
> those in the TMA spreadsheet. If there is multiplexing from this
> change on SPR, the TMA json metrics do try to avoid it, so I think the
> right path through this is to fix the json metrics.
For the perf stat default, there should be no multiplexing.
Also, it looks like this patch and the following several patches remove
the existence check of an event (pmu_have_event()). That may not be a
good idea. Those events/features are usually enumerated, which means
they may not be available in some cases. For example, we don't have
perf metrics support on KVM. I don't think the current JSON metrics
check the CPUID enumeration. If so, the patch may bring problems in a VM.
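For reference, the guard in the removed topdown_parse_events() (quoted
earlier in the thread) had this shape, checking enumeration before any
topdown events were parsed:

  if (!topdown_sys_has_perf_metrics())
          return 0;
  ...
  if (pmu_have_event(pmu_name, "topdown-heavy-ops"))
          ...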
Thanks,
Kan
>
> Thanks,
> Ian
>
>>> + return -1;
>>> /* Platform specific attrs */
>>> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
>>> return -1;
>>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
>>> index afb6f2fdc24e..64a35f2787dc 100644
>>> --- a/tools/perf/util/metricgroup.c
>>> +++ b/tools/perf/util/metricgroup.c
>>> @@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
>>> {
>>> const char *metric = vdata;
>>>
>>> - if (!pm->metric_expr)
>>> - return 0;
>>> -
>>> - if (match_metric(pm->metric_name, metric))
>>> + if (match_metric(pm->metric_name, metric) ||
>>> + match_metric(pm->metric_group, metric))
>>> return 1;
>>>
>>> return 0;
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
` (35 preceding siblings ...)
2023-02-19 11:17 ` [PATCH v1 00/51] shadow metric clean up and improvements Arnaldo Carvalho de Melo
@ 2023-02-27 22:04 ` Liang, Kan
2023-02-28 6:21 ` Ian Rogers
36 siblings, 1 reply; 50+ messages in thread
From: Liang, Kan @ 2023-02-27 22:04 UTC (permalink / raw)
To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian
On 2023-02-19 4:27 a.m., Ian Rogers wrote:
> Recently the shadow stat metrics broke due to repeated aggregation and
> a quick fix was applied:
> https://lore.kernel.org/lkml/20230209064447.83733-1-irogers@google.com/
> This is the longer fix but one that comes with some extras. To avoid
> fixing issues for hard coded metrics, the topdown, SMI cost and
> transaction flags are moved into json metrics. A side effect of this
> is that TopdownL1 metrics will now be displayed when supported, if no
> "perf stat" events are specified.
>
> Another fix included here is for event grouping as raised in:
> https://lore.kernel.org/lkml/CA+icZUU_ew7pzWJJZLbj1xsU6MQTPrj8tkFfDhNdTDRQfGUBMQ@mail.gmail.com/
> Metrics are now tagged with NMI and SMT flags, meaning that the events
> shouldn't be grouped if the NMI watchdog is enabled or SMT is enabled.
>
> Given the two issues, the metrics are re-generated and the patches
> also include the latest Intel vendor events. The changes to the metric
> generation code can be seen in:
> https://github.com/intel/perfmon/pull/56
>
> Hard coded metrics support thresholds, the patches add this ability to
> json metrics so that the hard coded metrics can be removed. Migrate
> remaining hard coded metrics to looking up counters from the
> evlist/aggregation count. Finally, get rid of the saved_value logic
> and thereby look to fix the aggregation issues.
>
> Some related fix ups and code clean ups are included in the changes,
> in particular to aid with the code's readability and to keep topdown
> documentation in sync.
>
> Ian Rogers (51):
Thanks Ian for the clean up and improvements. Patches 1-38 look
good to me.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
I like the idea of utilizing the json metrics. But the changes in the
later patches seem to change the current user-visible behavior in some
cases.
Thanks,
Kan
> perf tools: Ensure evsel name is initialized
> perf metrics: Improve variable names
> perf pmu-events: Remove aggr_mode from pmu_event
> perf pmu-events: Change aggr_mode to be an enum
> perf pmu-events: Change deprecated to be a bool
> perf pmu-events: Change perpkg to be a bool
> perf expr: Make the online topology accessible globally
> perf pmu-events: Make the metric_constraint an enum
> perf pmu-events: Don't '\0' terminate enum values
> perf vendor events intel: Refresh alderlake events
> perf vendor events intel: Refresh alderlake-n metrics
> perf vendor events intel: Refresh broadwell metrics
> perf vendor events intel: Refresh broadwellde metrics
> perf vendor events intel: Refresh broadwellx metrics
> perf vendor events intel: Refresh cascadelakex events
> perf vendor events intel: Add graniterapids events
> perf vendor events intel: Refresh haswell metrics
> perf vendor events intel: Refresh haswellx metrics
> perf vendor events intel: Refresh icelake events
> perf vendor events intel: Refresh icelakex metrics
> perf vendor events intel: Refresh ivybridge metrics
> perf vendor events intel: Refresh ivytown metrics
> perf vendor events intel: Refresh jaketown events
> perf vendor events intel: Refresh knightslanding events
> perf vendor events intel: Refresh sandybridge events
> perf vendor events intel: Refresh sapphirerapids events
> perf vendor events intel: Refresh silvermont events
> perf vendor events intel: Refresh skylake events
> perf vendor events intel: Refresh skylakex metrics
> perf vendor events intel: Refresh tigerlake events
> perf vendor events intel: Refresh westmereep-dp events
> perf jevents: Add rand support to metrics
> perf jevent: Parse metric thresholds
> perf pmu-events: Test parsing metric thresholds with the fake PMU
> perf list: Support for printing metric thresholds
> perf metric: Compute and print threshold values
> perf expr: More explicit NAN handling
> perf metric: Add --metric-no-threshold option
> perf stat: Add TopdownL1 metric as a default if present
> perf stat: Implement --topdown using json metrics
> perf stat: Remove topdown event special handling
> perf doc: Refresh topdown documentation
> perf stat: Remove hard coded transaction events
> perf stat: Use metrics for --smi-cost
> perf stat: Remove perf_stat_evsel_id
> perf stat: Move enums from header
> perf stat: Hide runtime_stat
> perf stat: Add cpu_aggr_map for loop
> perf metric: Directly use counts rather than saved_value
> perf stat: Use counts rather than saved_value
> perf stat: Remove saved_value/runtime_stat
>
> tools/perf/Documentation/perf-stat.txt | 27 +-
> tools/perf/Documentation/topdown.txt | 70 +-
> tools/perf/arch/powerpc/util/header.c | 2 +-
> tools/perf/arch/x86/util/evlist.c | 6 +-
> tools/perf/arch/x86/util/topdown.c | 78 +-
> tools/perf/arch/x86/util/topdown.h | 1 -
> tools/perf/builtin-list.c | 13 +-
> tools/perf/builtin-script.c | 9 +-
> tools/perf/builtin-stat.c | 233 +-
> .../arch/x86/alderlake/adl-metrics.json | 3190 ++++++++++-------
> .../pmu-events/arch/x86/alderlake/cache.json | 36 +-
> .../arch/x86/alderlake/floating-point.json | 27 +
> .../arch/x86/alderlake/frontend.json | 9 +
> .../pmu-events/arch/x86/alderlake/memory.json | 3 +-
> .../arch/x86/alderlake/pipeline.json | 14 +-
> .../arch/x86/alderlake/uncore-other.json | 28 +-
> .../arch/x86/alderlaken/adln-metrics.json | 811 +++--
> .../arch/x86/broadwell/bdw-metrics.json | 1439 ++++----
> .../arch/x86/broadwellde/bdwde-metrics.json | 1405 ++++----
> .../arch/x86/broadwellx/bdx-metrics.json | 1626 +++++----
> .../arch/x86/broadwellx/uncore-cache.json | 74 +-
> .../x86/broadwellx/uncore-interconnect.json | 64 +-
> .../arch/x86/broadwellx/uncore-other.json | 4 +-
> .../arch/x86/cascadelakex/cache.json | 24 +-
> .../arch/x86/cascadelakex/clx-metrics.json | 2198 ++++++------
> .../arch/x86/cascadelakex/frontend.json | 8 +-
> .../arch/x86/cascadelakex/pipeline.json | 16 +
> .../arch/x86/cascadelakex/uncore-memory.json | 18 +-
> .../arch/x86/cascadelakex/uncore-other.json | 120 +-
> .../arch/x86/cascadelakex/uncore-power.json | 8 +-
> .../arch/x86/graniterapids/cache.json | 54 +
> .../arch/x86/graniterapids/frontend.json | 10 +
> .../arch/x86/graniterapids/memory.json | 174 +
> .../arch/x86/graniterapids/other.json | 29 +
> .../arch/x86/graniterapids/pipeline.json | 102 +
> .../x86/graniterapids/virtual-memory.json | 26 +
> .../arch/x86/haswell/hsw-metrics.json | 1220 ++++---
> .../arch/x86/haswellx/hsx-metrics.json | 1397 ++++----
> .../pmu-events/arch/x86/icelake/cache.json | 16 +
> .../arch/x86/icelake/floating-point.json | 31 +
> .../arch/x86/icelake/icl-metrics.json | 1932 +++++-----
> .../pmu-events/arch/x86/icelake/pipeline.json | 23 +-
> .../arch/x86/icelake/uncore-other.json | 56 +
> .../arch/x86/icelakex/icx-metrics.json | 2153 +++++------
> .../arch/x86/icelakex/uncore-memory.json | 2 +-
> .../arch/x86/icelakex/uncore-other.json | 4 +-
> .../arch/x86/ivybridge/ivb-metrics.json | 1270 ++++---
> .../arch/x86/ivytown/ivt-metrics.json | 1311 ++++---
> .../pmu-events/arch/x86/jaketown/cache.json | 6 +-
> .../arch/x86/jaketown/floating-point.json | 2 +-
> .../arch/x86/jaketown/frontend.json | 12 +-
> .../arch/x86/jaketown/jkt-metrics.json | 602 ++--
> .../arch/x86/jaketown/pipeline.json | 2 +-
> .../arch/x86/jaketown/uncore-cache.json | 22 +-
> .../x86/jaketown/uncore-interconnect.json | 74 +-
> .../arch/x86/jaketown/uncore-memory.json | 4 +-
> .../arch/x86/jaketown/uncore-other.json | 22 +-
> .../arch/x86/jaketown/uncore-power.json | 8 +-
> .../arch/x86/knightslanding/cache.json | 94 +-
> .../arch/x86/knightslanding/pipeline.json | 8 +-
> .../arch/x86/knightslanding/uncore-other.json | 8 +-
> tools/perf/pmu-events/arch/x86/mapfile.csv | 29 +-
> .../arch/x86/sandybridge/cache.json | 8 +-
> .../arch/x86/sandybridge/floating-point.json | 2 +-
> .../arch/x86/sandybridge/frontend.json | 12 +-
> .../arch/x86/sandybridge/pipeline.json | 2 +-
> .../arch/x86/sandybridge/snb-metrics.json | 601 ++--
> .../arch/x86/sapphirerapids/cache.json | 24 +-
> .../x86/sapphirerapids/floating-point.json | 32 +
> .../arch/x86/sapphirerapids/frontend.json | 8 +
> .../arch/x86/sapphirerapids/pipeline.json | 19 +-
> .../arch/x86/sapphirerapids/spr-metrics.json | 2283 ++++++------
> .../arch/x86/sapphirerapids/uncore-other.json | 60 +
> .../arch/x86/silvermont/frontend.json | 2 +-
> .../arch/x86/silvermont/pipeline.json | 2 +-
> .../pmu-events/arch/x86/skylake/cache.json | 25 +-
> .../pmu-events/arch/x86/skylake/frontend.json | 8 +-
> .../pmu-events/arch/x86/skylake/other.json | 1 +
> .../pmu-events/arch/x86/skylake/pipeline.json | 16 +
> .../arch/x86/skylake/skl-metrics.json | 1877 ++++++----
> .../arch/x86/skylake/uncore-other.json | 1 +
> .../pmu-events/arch/x86/skylakex/cache.json | 8 +-
> .../arch/x86/skylakex/frontend.json | 8 +-
> .../arch/x86/skylakex/pipeline.json | 16 +
> .../arch/x86/skylakex/skx-metrics.json | 2097 +++++------
> .../arch/x86/skylakex/uncore-memory.json | 2 +-
> .../arch/x86/skylakex/uncore-other.json | 96 +-
> .../arch/x86/skylakex/uncore-power.json | 6 +-
> .../arch/x86/tigerlake/floating-point.json | 31 +
> .../arch/x86/tigerlake/pipeline.json | 18 +
> .../arch/x86/tigerlake/tgl-metrics.json | 1942 +++++-----
> .../arch/x86/tigerlake/uncore-other.json | 28 +-
> .../arch/x86/westmereep-dp/cache.json | 2 +-
> .../x86/westmereep-dp/virtual-memory.json | 2 +-
> tools/perf/pmu-events/jevents.py | 58 +-
> tools/perf/pmu-events/metric.py | 8 +-
> tools/perf/pmu-events/pmu-events.h | 35 +-
> tools/perf/tests/expand-cgroup.c | 3 +-
> tools/perf/tests/expr.c | 7 +-
> tools/perf/tests/parse-metric.c | 21 +-
> tools/perf/tests/pmu-events.c | 49 +-
> tools/perf/util/cpumap.h | 3 +
> tools/perf/util/cputopo.c | 14 +
> tools/perf/util/cputopo.h | 5 +
> tools/perf/util/evsel.h | 2 +-
> tools/perf/util/expr.c | 16 +-
> tools/perf/util/expr.y | 12 +-
> tools/perf/util/metricgroup.c | 178 +-
> tools/perf/util/metricgroup.h | 5 +-
> tools/perf/util/pmu.c | 17 +-
> tools/perf/util/print-events.h | 1 +
> tools/perf/util/smt.c | 11 +-
> tools/perf/util/smt.h | 12 +-
> tools/perf/util/stat-display.c | 117 +-
> tools/perf/util/stat-shadow.c | 1287 ++-----
> tools/perf/util/stat.c | 74 -
> tools/perf/util/stat.h | 96 +-
> tools/perf/util/synthetic-events.c | 2 +-
> tools/perf/util/topdown.c | 68 +-
> tools/perf/util/topdown.h | 11 +-
> 120 files changed, 18025 insertions(+), 15590 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/51] shadow metric clean up and improvements
2023-02-27 22:04 ` Liang, Kan
@ 2023-02-28 6:21 ` Ian Rogers
0 siblings, 0 replies; 50+ messages in thread
From: Ian Rogers @ 2023-02-28 6:21 UTC (permalink / raw)
To: Liang, Kan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers, Stephane Eranian
On Mon, Feb 27, 2023 at 2:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2023-02-19 4:27 a.m., Ian Rogers wrote:
> > Recently the shadow stat metrics broke due to repeated aggregation and
> > a quick fix was applied:
> > https://lore.kernel.org/lkml/20230209064447.83733-1-irogers@google.com/
> > This is the longer fix but one that comes with some extras. To avoid
> > fixing issues for hard coded metrics, the topdown, SMI cost and
> > transaction flags are moved into json metrics. A side effect of this
> > is that TopdownL1 metrics will now be displayed when supported, if no
> > "perf stat" events are specified.
> >
> > Another fix included here is for event grouping as raised in:
> > https://lore.kernel.org/lkml/CA+icZUU_ew7pzWJJZLbj1xsU6MQTPrj8tkFfDhNdTDRQfGUBMQ@mail.gmail.com/
> > Metrics are now tagged with NMI and SMT flags, meaning that the events
> > shouldn't be grouped if the NMI watchdog is enabled or SMT is enabled.
> >
> > Given the two issues, the metrics are re-generated and the patches
> > also include the latest Intel vendor events. The changes to the metric
> > generation code can be seen in:
> > https://github.com/intel/perfmon/pull/56
> >
> > Hard coded metrics support thresholds, the patches add this ability to
> > json metrics so that the hard coded metrics can be removed. Migrate
> > remaining hard coded metrics to looking up counters from the
> > evlist/aggregation count. Finally, get rid of the saved_value logic
> > and thereby look to fix the aggregation issues.
> >
> > Some related fix ups and code clean ups are included in the changes,
> > in particular to aid with the code's readability and to keep topdown
> > documentation in sync.
> >
> > Ian Rogers (51):
>
> Thanks Ian for the clean up and improvements. Patches 1-38 look
> good to me.
>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
>
> I like the idea of utilizing the json metrics. But the changes in the
> later patches seem to change the current user-visible behavior in some
> cases.
Thanks Kan, I'll put some comments on the previous thread wrt behavior change.
Ian
> Thanks,
> Kan
>
> > perf tools: Ensure evsel name is initialized
> > perf metrics: Improve variable names
> > perf pmu-events: Remove aggr_mode from pmu_event
> > perf pmu-events: Change aggr_mode to be an enum
> > perf pmu-events: Change deprecated to be a bool
> > perf pmu-events: Change perpkg to be a bool
> > perf expr: Make the online topology accessible globally
> > perf pmu-events: Make the metric_constraint an enum
> > perf pmu-events: Don't '\0' terminate enum values
> > perf vendor events intel: Refresh alderlake events
> > perf vendor events intel: Refresh alderlake-n metrics
> > perf vendor events intel: Refresh broadwell metrics
> > perf vendor events intel: Refresh broadwellde metrics
> > perf vendor events intel: Refresh broadwellx metrics
> > perf vendor events intel: Refresh cascadelakex events
> > perf vendor events intel: Add graniterapids events
> > perf vendor events intel: Refresh haswell metrics
> > perf vendor events intel: Refresh haswellx metrics
> > perf vendor events intel: Refresh icelake events
> > perf vendor events intel: Refresh icelakex metrics
> > perf vendor events intel: Refresh ivybridge metrics
> > perf vendor events intel: Refresh ivytown metrics
> > perf vendor events intel: Refresh jaketown events
> > perf vendor events intel: Refresh knightslanding events
> > perf vendor events intel: Refresh sandybridge events
> > perf vendor events intel: Refresh sapphirerapids events
> > perf vendor events intel: Refresh silvermont events
> > perf vendor events intel: Refresh skylake events
> > perf vendor events intel: Refresh skylakex metrics
> > perf vendor events intel: Refresh tigerlake events
> > perf vendor events intel: Refresh westmereep-dp events
> > perf jevents: Add rand support to metrics
> > perf jevent: Parse metric thresholds
> > perf pmu-events: Test parsing metric thresholds with the fake PMU
> > perf list: Support for printing metric thresholds
> > perf metric: Compute and print threshold values
> > perf expr: More explicit NAN handling
> > perf metric: Add --metric-no-threshold option
> > perf stat: Add TopdownL1 metric as a default if present
> > perf stat: Implement --topdown using json metrics
> > perf stat: Remove topdown event special handling
> > perf doc: Refresh topdown documentation
> > perf stat: Remove hard coded transaction events
> > perf stat: Use metrics for --smi-cost
> > perf stat: Remove perf_stat_evsel_id
> > perf stat: Move enums from header
> > perf stat: Hide runtime_stat
> > perf stat: Add cpu_aggr_map for loop
> > perf metric: Directly use counts rather than saved_value
> > perf stat: Use counts rather than saved_value
> > perf stat: Remove saved_value/runtime_stat
> >
> > tools/perf/Documentation/perf-stat.txt | 27 +-
> > tools/perf/Documentation/topdown.txt | 70 +-
> > tools/perf/arch/powerpc/util/header.c | 2 +-
> > tools/perf/arch/x86/util/evlist.c | 6 +-
> > tools/perf/arch/x86/util/topdown.c | 78 +-
> > tools/perf/arch/x86/util/topdown.h | 1 -
> > tools/perf/builtin-list.c | 13 +-
> > tools/perf/builtin-script.c | 9 +-
> > tools/perf/builtin-stat.c | 233 +-
> > .../arch/x86/alderlake/adl-metrics.json | 3190 ++++++++++-------
> > .../pmu-events/arch/x86/alderlake/cache.json | 36 +-
> > .../arch/x86/alderlake/floating-point.json | 27 +
> > .../arch/x86/alderlake/frontend.json | 9 +
> > .../pmu-events/arch/x86/alderlake/memory.json | 3 +-
> > .../arch/x86/alderlake/pipeline.json | 14 +-
> > .../arch/x86/alderlake/uncore-other.json | 28 +-
> > .../arch/x86/alderlaken/adln-metrics.json | 811 +++--
> > .../arch/x86/broadwell/bdw-metrics.json | 1439 ++++----
> > .../arch/x86/broadwellde/bdwde-metrics.json | 1405 ++++----
> > .../arch/x86/broadwellx/bdx-metrics.json | 1626 +++++----
> > .../arch/x86/broadwellx/uncore-cache.json | 74 +-
> > .../x86/broadwellx/uncore-interconnect.json | 64 +-
> > .../arch/x86/broadwellx/uncore-other.json | 4 +-
> > .../arch/x86/cascadelakex/cache.json | 24 +-
> > .../arch/x86/cascadelakex/clx-metrics.json | 2198 ++++++------
> > .../arch/x86/cascadelakex/frontend.json | 8 +-
> > .../arch/x86/cascadelakex/pipeline.json | 16 +
> > .../arch/x86/cascadelakex/uncore-memory.json | 18 +-
> > .../arch/x86/cascadelakex/uncore-other.json | 120 +-
> > .../arch/x86/cascadelakex/uncore-power.json | 8 +-
> > .../arch/x86/graniterapids/cache.json | 54 +
> > .../arch/x86/graniterapids/frontend.json | 10 +
> > .../arch/x86/graniterapids/memory.json | 174 +
> > .../arch/x86/graniterapids/other.json | 29 +
> > .../arch/x86/graniterapids/pipeline.json | 102 +
> > .../x86/graniterapids/virtual-memory.json | 26 +
> > .../arch/x86/haswell/hsw-metrics.json | 1220 ++++---
> > .../arch/x86/haswellx/hsx-metrics.json | 1397 ++++----
> > .../pmu-events/arch/x86/icelake/cache.json | 16 +
> > .../arch/x86/icelake/floating-point.json | 31 +
> > .../arch/x86/icelake/icl-metrics.json | 1932 +++++-----
> > .../pmu-events/arch/x86/icelake/pipeline.json | 23 +-
> > .../arch/x86/icelake/uncore-other.json | 56 +
> > .../arch/x86/icelakex/icx-metrics.json | 2153 +++++------
> > .../arch/x86/icelakex/uncore-memory.json | 2 +-
> > .../arch/x86/icelakex/uncore-other.json | 4 +-
> > .../arch/x86/ivybridge/ivb-metrics.json | 1270 ++++---
> > .../arch/x86/ivytown/ivt-metrics.json | 1311 ++++---
> > .../pmu-events/arch/x86/jaketown/cache.json | 6 +-
> > .../arch/x86/jaketown/floating-point.json | 2 +-
> > .../arch/x86/jaketown/frontend.json | 12 +-
> > .../arch/x86/jaketown/jkt-metrics.json | 602 ++--
> > .../arch/x86/jaketown/pipeline.json | 2 +-
> > .../arch/x86/jaketown/uncore-cache.json | 22 +-
> > .../x86/jaketown/uncore-interconnect.json | 74 +-
> > .../arch/x86/jaketown/uncore-memory.json | 4 +-
> > .../arch/x86/jaketown/uncore-other.json | 22 +-
> > .../arch/x86/jaketown/uncore-power.json | 8 +-
> > .../arch/x86/knightslanding/cache.json | 94 +-
> > .../arch/x86/knightslanding/pipeline.json | 8 +-
> > .../arch/x86/knightslanding/uncore-other.json | 8 +-
> > tools/perf/pmu-events/arch/x86/mapfile.csv | 29 +-
> > .../arch/x86/sandybridge/cache.json | 8 +-
> > .../arch/x86/sandybridge/floating-point.json | 2 +-
> > .../arch/x86/sandybridge/frontend.json | 12 +-
> > .../arch/x86/sandybridge/pipeline.json | 2 +-
> > .../arch/x86/sandybridge/snb-metrics.json | 601 ++--
> > .../arch/x86/sapphirerapids/cache.json | 24 +-
> > .../x86/sapphirerapids/floating-point.json | 32 +
> > .../arch/x86/sapphirerapids/frontend.json | 8 +
> > .../arch/x86/sapphirerapids/pipeline.json | 19 +-
> > .../arch/x86/sapphirerapids/spr-metrics.json | 2283 ++++++------
> > .../arch/x86/sapphirerapids/uncore-other.json | 60 +
> > .../arch/x86/silvermont/frontend.json | 2 +-
> > .../arch/x86/silvermont/pipeline.json | 2 +-
> > .../pmu-events/arch/x86/skylake/cache.json | 25 +-
> > .../pmu-events/arch/x86/skylake/frontend.json | 8 +-
> > .../pmu-events/arch/x86/skylake/other.json | 1 +
> > .../pmu-events/arch/x86/skylake/pipeline.json | 16 +
> > .../arch/x86/skylake/skl-metrics.json | 1877 ++++++----
> > .../arch/x86/skylake/uncore-other.json | 1 +
> > .../pmu-events/arch/x86/skylakex/cache.json | 8 +-
> > .../arch/x86/skylakex/frontend.json | 8 +-
> > .../arch/x86/skylakex/pipeline.json | 16 +
> > .../arch/x86/skylakex/skx-metrics.json | 2097 +++++------
> > .../arch/x86/skylakex/uncore-memory.json | 2 +-
> > .../arch/x86/skylakex/uncore-other.json | 96 +-
> > .../arch/x86/skylakex/uncore-power.json | 6 +-
> > .../arch/x86/tigerlake/floating-point.json | 31 +
> > .../arch/x86/tigerlake/pipeline.json | 18 +
> > .../arch/x86/tigerlake/tgl-metrics.json | 1942 +++++-----
> > .../arch/x86/tigerlake/uncore-other.json | 28 +-
> > .../arch/x86/westmereep-dp/cache.json | 2 +-
> > .../x86/westmereep-dp/virtual-memory.json | 2 +-
> > tools/perf/pmu-events/jevents.py | 58 +-
> > tools/perf/pmu-events/metric.py | 8 +-
> > tools/perf/pmu-events/pmu-events.h | 35 +-
> > tools/perf/tests/expand-cgroup.c | 3 +-
> > tools/perf/tests/expr.c | 7 +-
> > tools/perf/tests/parse-metric.c | 21 +-
> > tools/perf/tests/pmu-events.c | 49 +-
> > tools/perf/util/cpumap.h | 3 +
> > tools/perf/util/cputopo.c | 14 +
> > tools/perf/util/cputopo.h | 5 +
> > tools/perf/util/evsel.h | 2 +-
> > tools/perf/util/expr.c | 16 +-
> > tools/perf/util/expr.y | 12 +-
> > tools/perf/util/metricgroup.c | 178 +-
> > tools/perf/util/metricgroup.h | 5 +-
> > tools/perf/util/pmu.c | 17 +-
> > tools/perf/util/print-events.h | 1 +
> > tools/perf/util/smt.c | 11 +-
> > tools/perf/util/smt.h | 12 +-
> > tools/perf/util/stat-display.c | 117 +-
> > tools/perf/util/stat-shadow.c | 1287 ++-----
> > tools/perf/util/stat.c | 74 -
> > tools/perf/util/stat.h | 96 +-
> > tools/perf/util/synthetic-events.c | 2 +-
> > tools/perf/util/topdown.c | 68 +-
> > tools/perf/util/topdown.h | 11 +-
> > 120 files changed, 18025 insertions(+), 15590 deletions(-)
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> > create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
> >
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-27 20:12 ` Liang, Kan
@ 2023-02-28 6:27 ` Ian Rogers
2023-02-28 14:15 ` Liang, Kan
0 siblings, 1 reply; 50+ messages in thread
From: Ian Rogers @ 2023-02-28 6:27 UTC (permalink / raw)
To: Liang, Kan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers, Stephane Eranian
On Mon, Feb 27, 2023 at 12:13 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2023-02-27 2:33 p.m., Ian Rogers wrote:
> > On Mon, Feb 27, 2023 at 11:12 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2023-02-19 4:28 a.m., Ian Rogers wrote:
> >>> When there are no events and on Intel, the topdown events will be
> >>> added by default if present. Displaying the metrics associated with
> >>> these events requires special handling in stat-shadow.c. To more
> >>> easily update these metrics, use the json metric version via the
> >>> TopdownL1 group. This makes the handling less platform specific.
> >>>
> >>> Modify the metricgroup__has_metric code to also cover metric groups.
> >>>
> >>> Signed-off-by: Ian Rogers <irogers@google.com>
> >>> ---
> >>> tools/perf/arch/x86/util/evlist.c | 6 +++---
> >>> tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
> >>> tools/perf/arch/x86/util/topdown.h | 1 -
> >>> tools/perf/builtin-stat.c | 14 ++++++++++++++
> >>> tools/perf/util/metricgroup.c | 6 ++----
> >>> 5 files changed, 19 insertions(+), 38 deletions(-)
> >>>
> >>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> >>> index cb59ce9b9638..8a7ae4162563 100644
> >>> --- a/tools/perf/arch/x86/util/evlist.c
> >>> +++ b/tools/perf/arch/x86/util/evlist.c
> >>> @@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
> >>> struct perf_event_attr *attrs,
> >>> size_t nr_attrs)
> >>> {
> >>> - if (nr_attrs)
> >>> - return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> >>> + if (!nr_attrs)
> >>> + return 0;
> >>>
> >>> - return topdown_parse_events(evlist);
> >>> + return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> >>> }
> >>>
> >>> struct evsel *arch_evlist__leader(struct list_head *list)
> >>> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
> >>> index 54810f9acd6f..eb3a7d9652ab 100644
> >>> --- a/tools/perf/arch/x86/util/topdown.c
> >>> +++ b/tools/perf/arch/x86/util/topdown.c
> >>> @@ -9,11 +9,6 @@
> >>> #include "topdown.h"
> >>> #include "evsel.h"
> >>>
> >>> -#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
> >>> -#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
> >>> -#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
> >>> -#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
> >>> -
> >>> /* Check whether there is a PMU which supports the perf metrics. */
> >>> bool topdown_sys_has_perf_metrics(void)
> >>> {
> >>> @@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
> >>>
> >>> return pmu_name;
> >>> }
> >>> -
> >>> -int topdown_parse_events(struct evlist *evlist)
> >>> -{
> >>> - const char *topdown_events;
> >>> - const char *pmu_name;
> >>> -
> >>> - if (!topdown_sys_has_perf_metrics())
> >>> - return 0;
> >>> -
> >>> - pmu_name = arch_get_topdown_pmu_name(evlist, false);
> >>> -
> >>> - if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
> >>> - if (!strcmp(pmu_name, "cpu_core"))
> >>> - topdown_events = TOPDOWN_L2_EVENTS_CORE;
> >>> - else
> >>> - topdown_events = TOPDOWN_L2_EVENTS;
> >>> - } else {
> >>> - if (!strcmp(pmu_name, "cpu_core"))
> >>> - topdown_events = TOPDOWN_L1_EVENTS_CORE;
> >>> - else
> >>> - topdown_events = TOPDOWN_L1_EVENTS;
> >>> - }
> >>> -
> >>> - return parse_event(evlist, topdown_events);
> >>> -}
> >>> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
> >>> index 7eb81f042838..46bf9273e572 100644
> >>> --- a/tools/perf/arch/x86/util/topdown.h
> >>> +++ b/tools/perf/arch/x86/util/topdown.h
> >>> @@ -3,6 +3,5 @@
> >>> #define _TOPDOWN_H 1
> >>>
> >>> bool topdown_sys_has_perf_metrics(void);
> >>> -int topdown_parse_events(struct evlist *evlist);
> >>>
> >>> #endif
> >>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >>> index 5e13171a7bba..796e98e453f6 100644
> >>> --- a/tools/perf/builtin-stat.c
> >>> +++ b/tools/perf/builtin-stat.c
> >>> @@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
> >>> stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
> >>>
> >>> if (!evsel_list->core.nr_entries) {
> >>> + /* No events so add defaults. */
> >>> if (target__has_cpu(&target))
> >>> default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
> >>>
> >>> @@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
> >>> }
> >>> if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
> >>> return -1;
> >>> + /*
> >>> + * Add TopdownL1 metrics if they exist. To minimize
> >>> + * multiplexing, don't request threshold computation.
> >>> + */
> >>> + if (metricgroup__has_metric("TopdownL1") &&
> >>> + metricgroup__parse_groups(evsel_list, "TopdownL1",
> >>> + /*metric_no_group=*/false,
> >>> + /*metric_no_merge=*/false,
> >>> + /*metric_no_threshold=*/true,
> >>> + stat_config.user_requested_cpu_list,
> >>> + stat_config.system_wide,
> >>> + &stat_config.metric_events) < 0)
> >>
> >> Does the metricgroup__* function check the existence of the events on
> >> the machine? If not, it may not be reliable to only check the event list.
> >>
> >> The existing code supports both L1 and L2 Topdown for SPR. But this
> >> patch seems to remove the L2 Topdown support for SPR.
> >>
> >> The TopdownL1/L2 metric is added only for the big core with the perf
> >> stat default. That's because perf_metrics is a dedicated register,
> >> which should not impact other events (which use GP counters). But this
> >> patch doesn't seem to check the CPU type. It may bring extra
> >> multiplexing for the perf stat default on an ATOM platform.
> >>
> >> Thanks,
> >> Kan
> >
> > Hi Kan,
> >
> > The TopdownL2 metrics are present for SPR. The code changes the default
> > to L1 because, with json topdown, the maximum topdown level (the
> > previous default) is L6, and nobody really wants to see that. The
> > --topdown option is no longer limited to Icelake+ processors; any
> > processor with the TopdownL1 metricgroup will work, as --topdown has
> > just become a shortcut to that.
>
> This patch also seems to change the perf stat default. The current perf
> stat default shows both L1 and L2 for SPR. If that's the case, it should
> be a user visible change. What's the output of "perf stat sleep 1" with
> this patch on SPR?
I'll need to find an SPR. As mentioned above, the change from L2 to L1
is because the old behavior was to print the max topdown level, and with
json metrics that is level 6. We could make the behavior level 2 if L2
topdown events are present. I didn't do this as it would mean SPR didn't
align with everything else.
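
If we did want that, a rough sketch (untested, just reusing the calls
from this patch) could pick the metric group at setup time:

  /* Untested sketch: prefer TopdownL2 when its json metric group exists. */
  const char *topdown_group = metricgroup__has_metric("TopdownL2")
          ? "TopdownL2" : "TopdownL1";

  if (metricgroup__has_metric(topdown_group) &&
      metricgroup__parse_groups(evsel_list, topdown_group,
                                /*metric_no_group=*/false,
                                /*metric_no_merge=*/false,
                                /*metric_no_threshold=*/true,
                                stat_config.user_requested_cpu_list,
                                stat_config.system_wide,
                                &stat_config.metric_events) < 0)
          return -1;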
> >
> > There may be additional multiplexing, but also, in the old code events
> > from different groups could be used to calculate a bogus metric. There
> > are also additional events, as the previous metrics don't agree with
> > those in the TMA spreadsheet. If there is multiplexing from this change
> > on SPR, the TMA json metrics do try to avoid it, so I think the right
> > path through this is to fix the json metrics.
>
> For the perf stat default, there should be no multiplexing.
>
> Also, it looks like this patch and the following several patches remove
> the existence check of an event (pmu_have_event()). That may not be a
> good idea. Those events/features are usually enumerated, which means
> that they may not be available in some cases. For example, we don't have
> perf metrics support on KVM. I don't think the current JSON metrics
> check the CPUID enumeration. If so, the patch may bring problems in a VM.
This seems like a general problem with json metrics. In the case that no
metric is calculated, the metric isn't printed, but you may see the
events as not counted. We could special case this as the current code
does, or perhaps we could have some kind of "default event" flag and, if
such events fail to open, just drop them from the output. There are
likely other strategies.
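
For the flag idea, a purely hypothetical sketch (neither the
default_metric_event field nor open_failed() exists today):

  /* Hypothetical: drop default-added events that failed to open rather
   * than reporting them as not counted.
   */
  struct evsel *counter, *tmp;

  evlist__for_each_entry_safe(evsel_list, tmp, counter) {
          if (counter->default_metric_event && open_failed(counter))
                  evlist__remove(evsel_list, counter);
  }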
Thanks,
Ian
> Thanks,
> Kan
>
> >
> > Thanks,
> > Ian
> >
> >>> + return -1;
> >>> /* Platform specific attrs */
> >>> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
> >>> return -1;
> >>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> >>> index afb6f2fdc24e..64a35f2787dc 100644
> >>> --- a/tools/perf/util/metricgroup.c
> >>> +++ b/tools/perf/util/metricgroup.c
> >>> @@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
> >>> {
> >>> const char *metric = vdata;
> >>>
> >>> - if (!pm->metric_expr)
> >>> - return 0;
> >>> -
> >>> - if (match_metric(pm->metric_name, metric))
> >>> + if (match_metric(pm->metric_name, metric) ||
> >>> + match_metric(pm->metric_group, metric))
> >>> return 1;
> >>>
> >>> return 0;
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 01/51] perf tools: Ensure evsel name is initialized
2023-02-19 9:27 ` [PATCH v1 01/51] perf tools: Ensure evsel name is initialized Ian Rogers
@ 2023-02-28 12:06 ` kajoljain
0 siblings, 0 replies; 50+ messages in thread
From: kajoljain @ 2023-02-28 12:06 UTC (permalink / raw)
To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, John Garry, Kan Liang, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian
On 2/19/23 14:57, Ian Rogers wrote:
> Use the evsel__name accessor as otherwise name may be NULL resulting
> in a segv. This was observed with the perf stat shell test.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/util/synthetic-events.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 9ab9308ee80c..6def01036eb5 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -2004,7 +2004,7 @@ int perf_event__synthesize_event_update_name(struct perf_tool *tool, struct evse
> perf_event__handler_t process)
> {
> struct perf_record_event_update *ev;
> - size_t len = strlen(evsel->name);
> + size_t len = strlen(evsel__name(evsel));
> int err;
>
Patch looks fine to me.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
> ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->core.id[0]);
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present
2023-02-28 6:27 ` Ian Rogers
@ 2023-02-28 14:15 ` Liang, Kan
0 siblings, 0 replies; 50+ messages in thread
From: Liang, Kan @ 2023-02-28 14:15 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
Maxime Coquelin, Alexandre Torgue, Zhengjun Xing, Sandipan Das,
James Clark, Kajol Jain, John Garry, Adrian Hunter,
Andrii Nakryiko, Eduard Zingerman, Suzuki Poulouse, Leo Yan,
Florian Fischer, Ravi Bangoria, Jing Zhang, Sean Christopherson,
Athira Rajeev, linux-kernel, linux-perf-users, linux-stm32,
linux-arm-kernel, Perry Taylor, Caleb Biggers, Stephane Eranian
On 2023-02-28 1:27 a.m., Ian Rogers wrote:
> On Mon, Feb 27, 2023 at 12:13 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2023-02-27 2:33 p.m., Ian Rogers wrote:
>>> On Mon, Feb 27, 2023 at 11:12 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023-02-19 4:28 a.m., Ian Rogers wrote:
>>>>> When there are no events and on Intel, the topdown events will be
>>>>> added by default if present. Displaying the metrics associated with
>>>>> these events requires special handling in stat-shadow.c. To more
>>>>> easily update these metrics, use the json metric version via the
>>>>> TopdownL1 group. This makes the handling less platform specific.
>>>>>
>>>>> Modify the metricgroup__has_metric code to also cover metric groups.
>>>>>
>>>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>>>> ---
>>>>> tools/perf/arch/x86/util/evlist.c | 6 +++---
>>>>> tools/perf/arch/x86/util/topdown.c | 30 ------------------------------
>>>>> tools/perf/arch/x86/util/topdown.h | 1 -
>>>>> tools/perf/builtin-stat.c | 14 ++++++++++++++
>>>>> tools/perf/util/metricgroup.c | 6 ++----
>>>>> 5 files changed, 19 insertions(+), 38 deletions(-)
>>>>>
>>>>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>>>>> index cb59ce9b9638..8a7ae4162563 100644
>>>>> --- a/tools/perf/arch/x86/util/evlist.c
>>>>> +++ b/tools/perf/arch/x86/util/evlist.c
>>>>> @@ -59,10 +59,10 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
>>>>> struct perf_event_attr *attrs,
>>>>> size_t nr_attrs)
>>>>> {
>>>>> - if (nr_attrs)
>>>>> - return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
>>>>> + if (!nr_attrs)
>>>>> + return 0;
>>>>>
>>>>> - return topdown_parse_events(evlist);
>>>>> + return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
>>>>> }
>>>>>
>>>>> struct evsel *arch_evlist__leader(struct list_head *list)
>>>>> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
>>>>> index 54810f9acd6f..eb3a7d9652ab 100644
>>>>> --- a/tools/perf/arch/x86/util/topdown.c
>>>>> +++ b/tools/perf/arch/x86/util/topdown.c
>>>>> @@ -9,11 +9,6 @@
>>>>> #include "topdown.h"
>>>>> #include "evsel.h"
>>>>>
>>>>> -#define TOPDOWN_L1_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
>>>>> -#define TOPDOWN_L1_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
>>>>> -#define TOPDOWN_L2_EVENTS "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
>>>>> -#define TOPDOWN_L2_EVENTS_CORE "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
>>>>> -
>>>>> /* Check whether there is a PMU which supports the perf metrics. */
>>>>> bool topdown_sys_has_perf_metrics(void)
>>>>> {
>>>>> @@ -99,28 +94,3 @@ const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
>>>>>
>>>>> return pmu_name;
>>>>> }
>>>>> -
>>>>> -int topdown_parse_events(struct evlist *evlist)
>>>>> -{
>>>>> - const char *topdown_events;
>>>>> - const char *pmu_name;
>>>>> -
>>>>> - if (!topdown_sys_has_perf_metrics())
>>>>> - return 0;
>>>>> -
>>>>> - pmu_name = arch_get_topdown_pmu_name(evlist, false);
>>>>> -
>>>>> - if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
>>>>> - if (!strcmp(pmu_name, "cpu_core"))
>>>>> - topdown_events = TOPDOWN_L2_EVENTS_CORE;
>>>>> - else
>>>>> - topdown_events = TOPDOWN_L2_EVENTS;
>>>>> - } else {
>>>>> - if (!strcmp(pmu_name, "cpu_core"))
>>>>> - topdown_events = TOPDOWN_L1_EVENTS_CORE;
>>>>> - else
>>>>> - topdown_events = TOPDOWN_L1_EVENTS;
>>>>> - }
>>>>> -
>>>>> - return parse_event(evlist, topdown_events);
>>>>> -}
>>>>> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
>>>>> index 7eb81f042838..46bf9273e572 100644
>>>>> --- a/tools/perf/arch/x86/util/topdown.h
>>>>> +++ b/tools/perf/arch/x86/util/topdown.h
>>>>> @@ -3,6 +3,5 @@
>>>>> #define _TOPDOWN_H 1
>>>>>
>>>>> bool topdown_sys_has_perf_metrics(void);
>>>>> -int topdown_parse_events(struct evlist *evlist);
>>>>>
>>>>> #endif
>>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>>> index 5e13171a7bba..796e98e453f6 100644
>>>>> --- a/tools/perf/builtin-stat.c
>>>>> +++ b/tools/perf/builtin-stat.c
>>>>> @@ -1996,6 +1996,7 @@ static int add_default_attributes(void)
>>>>> stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
>>>>>
>>>>> if (!evsel_list->core.nr_entries) {
>>>>> + /* No events so add defaults. */
>>>>> if (target__has_cpu(&target))
>>>>> default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
>>>>>
>>>>> @@ -2011,6 +2012,19 @@ static int add_default_attributes(void)
>>>>> }
>>>>> if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
>>>>> return -1;
>>>>> + /*
>>>>> + * Add TopdownL1 metrics if they exist. To minimize
>>>>> + * multiplexing, don't request threshold computation.
>>>>> + */
>>>>> + if (metricgroup__has_metric("TopdownL1") &&
>>>>> + metricgroup__parse_groups(evsel_list, "TopdownL1",
>>>>> + /*metric_no_group=*/false,
>>>>> + /*metric_no_merge=*/false,
>>>>> + /*metric_no_threshold=*/true,
>>>>> + stat_config.user_requested_cpu_list,
>>>>> + stat_config.system_wide,
>>>>> + &stat_config.metric_events) < 0)
>>>>
>>>> Does the metricgroup__* function check the existence of the events on
>>>> the machine? If not, it may not be reliable to only check the event list.
>>>>
>>>> The existing code supports both L1 and L2 Topdown for SPR. But this
>>>> patch seems to remove the L2 Topdown support for SPR.
>>>>
>>>> The TopdownL1/L2 metric is added only for the big core with the perf
>>>> stat default. That's because perf_metrics is a dedicated register,
>>>> which should not impact other events (which use GP counters). But this
>>>> patch doesn't seem to check the CPU type. It may bring extra
>>>> multiplexing for the perf stat default on an ATOM platform.
>>>>
>>>> Thanks,
>>>> Kan
>>>
>>> Hi Kan,
>>>
>>> The TopdownL2 metrics are present for SPR. The code changes the default
>>> to L1 because, with json topdown, the maximum topdown level (the
>>> previous default) is L6, and nobody really wants to see that. The
>>> --topdown option is no longer limited to Icelake+ processors; any
>>> processor with the TopdownL1 metricgroup will work, as --topdown has
>>> just become a shortcut to that.
>>
>> This patch also seems to change the perf stat default. The current perf
>> stat default shows both L1 and L2 for SPR. If that's the case, it should
>> be a user visible change. What's the output of "perf stat sleep 1" with
>> this patch on SPR?
>
> I'll need to find an SPR. As mentioned above, the change from L2 to L1
> is because the old behavior was to print the max topdown level, and with
> json metrics that is level 6. We could make the behavior level 2 if L2
> topdown events are present. I didn't do this as it would mean SPR didn't
> align with everything else.
For L2, it's free on SPR, so we extended it only for SPR. That could
give users more information with the perf stat default.
>
>>>
>>> There may be additional multiplexing, but also, in the old code events
>>> from different groups could be used to calculate a bogus metric. There
>>> are also additional events, as the previous metrics don't agree with
>>> those in the TMA spreadsheet. If there is multiplexing from this change
>>> on SPR, the TMA json metrics do try to avoid it, so I think the right
>>> path through this is to fix the json metrics.
>>
>> For the perf stat default, there should be no multiplexing.
>>
>> Also, it looks like this patch and the following several patches remove
>> the existence check of an event (pmu_have_event()). That may not be a
>> good idea. Those events/features are usually enumerated, which means
>> that they may not be available in some cases. For example, we don't have
>> perf metrics support on KVM. I don't think the current JSON metrics
>> check the CPUID enumeration. If so, the patch may bring problems in a VM.
>
> This seems like a general problem with json metrics. In the case that no
> metric is calculated, the metric isn't printed, but you may see the
> events as not counted. We could special case this as the current code
> does, or perhaps we could have some kind of "default event" flag and, if
> such events fail to open, just drop them from the output. There are
> likely other strategies.
Yes, it should be a general problem. The current behavior is to error
out with an error message rather than report a 0 count. A 0 count may
sometimes mislead the user.

Either a special case or a "default event" flag is OK for me. My point
is that we should check the existence of the feature/events, so we can
give the user a clear message about why the events were dropped or
zeroed. If they really want the feature/metrics, they can update the
kernel or ask the cloud vendor to update the service.
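
As a rough sketch (assuming topdown_sys_has_perf_metrics(), or an
equivalent arch hook, is reachable from builtin-stat.c):

  /* Sketch: warn and skip rather than silently adding events that the
   * hardware/hypervisor cannot support, e.g. perf metrics in a VM.
   */
  if (metricgroup__has_metric("TopdownL1") &&
      !topdown_sys_has_perf_metrics())
          pr_warning("Skipping TopdownL1 metrics: perf metrics "
                     "not supported by this CPU or kernel\n");
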
Maybe we can extend the same solution to perf list as well. For a
machine which doesn't have some feature, we may drop the metrics or use
an alternative metric. But that could be done later separately.
Thanks,
Kan
>
> Thanks,
> Ian
>
>> Thanks,
>> Kan
>>
>>>
>>> Thanks,
>>> Ian
>>>
>>>>> + return -1;
>>>>> /* Platform specific attrs */
>>>>> if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
>>>>> return -1;
>>>>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
>>>>> index afb6f2fdc24e..64a35f2787dc 100644
>>>>> --- a/tools/perf/util/metricgroup.c
>>>>> +++ b/tools/perf/util/metricgroup.c
>>>>> @@ -1647,10 +1647,8 @@ static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
>>>>> {
>>>>> const char *metric = vdata;
>>>>>
>>>>> - if (!pm->metric_expr)
>>>>> - return 0;
>>>>> -
>>>>> - if (match_metric(pm->metric_name, metric))
>>>>> + if (match_metric(pm->metric_name, metric) ||
>>>>> + match_metric(pm->metric_group, metric))
>>>>> return 1;
>>>>>
>>>>> return 0;
^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2023-02-28 14:15 UTC | newest]
Thread overview: 50+ messages
2023-02-19 9:27 [PATCH v1 00/51] shadow metric clean up and improvements Ian Rogers
2023-02-19 9:27 ` [PATCH v1 01/51] perf tools: Ensure evsel name is initialized Ian Rogers
2023-02-28 12:06 ` kajoljain
2023-02-19 9:27 ` [PATCH v1 02/51] perf metrics: Improve variable names Ian Rogers
2023-02-19 9:28 ` [PATCH v1 03/51] perf pmu-events: Remove aggr_mode from pmu_event Ian Rogers
2023-02-19 9:28 ` [PATCH v1 04/51] perf pmu-events: Change aggr_mode to be an enum Ian Rogers
2023-02-19 9:28 ` [PATCH v1 05/51] perf pmu-events: Change deprecated to be a bool Ian Rogers
2023-02-19 9:28 ` [PATCH v1 06/51] perf pmu-events: Change perpkg " Ian Rogers
2023-02-19 9:28 ` [PATCH v1 07/51] perf expr: Make the online topology accessible globally Ian Rogers
2023-02-19 9:28 ` [PATCH v1 08/51] perf pmu-events: Make the metric_constraint an enum Ian Rogers
2023-02-19 9:28 ` [PATCH v1 09/51] perf pmu-events: Don't '\0' terminate enum values Ian Rogers
2023-02-19 9:28 ` [PATCH v1 11/51] perf vendor events intel: Refresh alderlake-n metrics Ian Rogers
2023-02-19 9:28 ` [PATCH v1 16/51] perf vendor events intel: Add graniterapids events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 24/51] perf vendor events intel: Refresh knightslanding events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 25/51] perf vendor events intel: Refresh sandybridge events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 27/51] perf vendor events intel: Refresh silvermont events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 31/51] perf vendor events intel: Refresh westmereep-dp events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 32/51] perf jevents: Add rand support to metrics Ian Rogers
2023-02-19 9:28 ` [PATCH v1 33/51] perf jevent: Parse metric thresholds Ian Rogers
2023-02-19 9:28 ` [PATCH v1 34/51] perf pmu-events: Test parsing metric thresholds with the fake PMU Ian Rogers
2023-02-19 9:28 ` [PATCH v1 35/51] perf list: Support for printing metric thresholds Ian Rogers
2023-02-19 9:28 ` [PATCH v1 36/51] perf metric: Compute and print threshold values Ian Rogers
2023-02-19 9:28 ` [PATCH v1 37/51] perf expr: More explicit NAN handling Ian Rogers
2023-02-19 9:28 ` [PATCH v1 38/51] perf metric: Add --metric-no-threshold option Ian Rogers
2023-02-19 9:28 ` [PATCH v1 39/51] perf stat: Add TopdownL1 metric as a default if present Ian Rogers
2023-02-27 19:12 ` Liang, Kan
2023-02-27 19:33 ` Ian Rogers
2023-02-27 20:12 ` Liang, Kan
2023-02-28 6:27 ` Ian Rogers
2023-02-28 14:15 ` Liang, Kan
2023-02-19 9:28 ` [PATCH v1 40/51] perf stat: Implement --topdown using json metrics Ian Rogers
2023-02-19 9:28 ` [PATCH v1 41/51] perf stat: Remove topdown event special handling Ian Rogers
2023-02-19 9:28 ` [PATCH v1 42/51] perf doc: Refresh topdown documentation Ian Rogers
2023-02-19 9:28 ` [PATCH v1 43/51] perf stat: Remove hard coded transaction events Ian Rogers
2023-02-19 9:28 ` [PATCH v1 44/51] perf stat: Use metrics for --smi-cost Ian Rogers
2023-02-19 9:28 ` [PATCH v1 45/51] perf stat: Remove perf_stat_evsel_id Ian Rogers
2023-02-19 9:28 ` [PATCH v1 46/51] perf stat: Move enums from header Ian Rogers
2023-02-19 9:28 ` [PATCH v1 47/51] perf stat: Hide runtime_stat Ian Rogers
2023-02-19 9:28 ` [PATCH v1 48/51] perf stat: Add cpu_aggr_map for loop Ian Rogers
2023-02-19 9:28 ` [PATCH v1 49/51] perf metric: Directly use counts rather than saved_value Ian Rogers
2023-02-19 9:28 ` [PATCH v1 50/51] perf stat: Use " Ian Rogers
2023-02-24 22:48 ` Namhyung Kim
2023-02-25 5:47 ` Ian Rogers
2023-02-19 9:28 ` [PATCH v1 51/51] perf stat: Remove saved_value/runtime_stat Ian Rogers
2023-02-19 11:17 ` [PATCH v1 00/51] shadow metric clean up and improvements Arnaldo Carvalho de Melo
2023-02-19 15:43 ` Ian Rogers
2023-02-21 17:44 ` Ian Rogers
2023-02-22 13:47 ` Arnaldo Carvalho de Melo
2023-02-27 22:04 ` Liang, Kan
2023-02-28 6:21 ` Ian Rogers