[PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes

Linux Perf Users
 help / color / mirror / Atom feed

* [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes
@ 2026-05-20 17:58 Chun-Tse Shao
  2026-05-20 17:58 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao
  2026-05-20 17:58 ` [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics Chun-Tse Shao
  0 siblings, 2 replies; 6+ messages in thread
From: Chun-Tse Shao @ 2026-05-20 17:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
	sandipan.das, leo.yan, thomas.falcon, yang.lee, linux-perf-users

This series fixes a scaling issue for metrics (like lpm_miss_lat) across
different runtime aggregation modes.

Uncore metrics currently use `source_count` to scale events. However,
`source_count` returns the total uncore unit count regardless of the
selected aggregation mode. When evaluating metrics in different
aggregation mode other than `--per-socket`, this incorrectly divides
aggregated uncore events against the total uncore count rather than the
uncores belonging to the aggregation, leading to wrong metric results.

To fix this, we:
1. Introduce the aggr_nr() keyword to the metric parser, which
dynamically resolves to the active units in the current aggregation
group (`aggr->nr`).

2. Update the python metrics to use `aggr_nr` instead of `source_count`,
ensuring correct scaling across all runtime aggregation boundaries.

Before the fix (incorrect low latency in global mode):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "122.8", "ns  lpm_miss_lat_loc" : "114.5"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.1", "ns  lpm_miss_lat_loc" : "278.2"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "233.9", "ns  lpm_miss_lat_loc" : "257.5"}

After the fix (correct scaled latency in all aggregation modes):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "230.0", "ns  lpm_miss_lat_loc" : "220.8"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.5", "ns  lpm_miss_lat_loc" : "269.9"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "225.7", "ns  lpm_miss_lat_loc" : "262.3"}

Chun-Tse Shao (2):
  perf stat: Add aggr_nr metric parser support
  perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics

 tools/perf/pmu-events/intel_metrics.py |  8 ++++----
 tools/perf/pmu-events/metric.py        | 11 +++++++++--
 tools/perf/util/expr.c                 | 27 ++++++++++++++++++++++----
 tools/perf/util/expr.h                 |  4 ++++
 tools/perf/util/expr.l                 |  1 +
 tools/perf/util/expr.y                 | 24 ++++++++++++++++-------
 tools/perf/util/stat-shadow.c          |  4 +++-
 7 files changed, 61 insertions(+), 18 deletions(-)

--
2.54.0.669.g59709faab0-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] perf stat: Add aggr_nr metric parser support
  2026-05-20 17:58 [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes Chun-Tse Shao
@ 2026-05-20 17:58 ` Chun-Tse Shao
  2026-05-20 18:28   ` sashiko-bot
  2026-05-20 17:58 ` [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics Chun-Tse Shao
  1 sibling, 1 reply; 6+ messages in thread
From: Chun-Tse Shao @ 2026-05-20 17:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
	leo.yan, thomas.falcon, yang.lee, linux-perf-users

Introduce the `aggr_nr` function to the metric expression parser.
`aggr_nr` allows metric formulas to dynamically utilize the number of
aggregated targets (`aggr->nr`) instead of relying on the static
`source_count` (which represents the static socket or node count).

This adds the `AGGR_NR` token to the lexer and parser, updates the
expression parsing context helpers to store `aggr_nr`, and feeds
`aggr->nr` from the aggregation structure in `prepare_metric`.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
---
 tools/perf/util/expr.c        | 27 +++++++++++++++++++++++----
 tools/perf/util/expr.h        |  4 ++++
 tools/perf/util/expr.l        |  1 +
 tools/perf/util/expr.y        | 24 +++++++++++++++++-------
 tools/perf/util/stat-shadow.c |  4 +++-
 5 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 644769e92708..8d00805f93c4 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -27,6 +27,7 @@ struct expr_id_data {
 		struct {
 			double val;
 			int source_count;
+			int aggr_nr;
 		} val;
 		struct {
 			double val;
@@ -151,8 +152,8 @@ int expr__add_id_val(struct expr_parse_ctx *ctx, const char *id, double val)
 }

 /* Caller must make sure id is allocated */
-int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
-				  double val, int source_count)
+int expr__add_id_val_source_count_aggr_nr(struct expr_parse_ctx *ctx, const char *id,
+					  double val, int source_count, int aggr_nr)
 {
 	struct expr_id_data *data_ptr = NULL, *old_data = NULL;
 	char *old_key = NULL;
@@ -163,6 +164,7 @@ int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 		return -ENOMEM;
 	data_ptr->val.val = val;
 	data_ptr->val.source_count = source_count;
+	data_ptr->val.aggr_nr = aggr_nr;
 	data_ptr->kind = EXPR_ID_DATA__VALUE;

 	ret = hashmap__set(ctx->ids, id, data_ptr, &old_key, &old_data);
@@ -177,6 +179,14 @@ int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 	return ret;
 }

+/* Caller must make sure id is allocated */
+int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
+				  double val, int source_count)
+{
+	return expr__add_id_val_source_count_aggr_nr(ctx, id, val, source_count, 1);
+}
+
+
 int expr__add_ref(struct expr_parse_ctx *ctx, struct metric_ref *ref)
 {
 	struct expr_id_data *data_ptr = NULL, *old_data = NULL;
@@ -390,10 +400,19 @@ double expr_id_data__value(const struct expr_id_data *data)

 double expr_id_data__source_count(const struct expr_id_data *data)
 {
-	assert(data->kind == EXPR_ID_DATA__VALUE);
-	return data->val.source_count;
+	if (data->kind == EXPR_ID_DATA__VALUE)
+		return data->val.source_count;
+	return 1.0;
 }

+double expr_id_data__aggr_nr(const struct expr_id_data *data)
+{
+	if (data->kind == EXPR_ID_DATA__VALUE)
+		return data->val.aggr_nr;
+	return 1.0;
+}
+
+
 double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx)
 {
 	double result = NAN;
diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index c0cec29ddc29..865bb63cdb03 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -37,6 +37,8 @@ int expr__add_id(struct expr_parse_ctx *ctx, const char *id);
 int expr__add_id_val(struct expr_parse_ctx *ctx, const char *id, double val);
 int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 				double val, int source_count);
+int expr__add_id_val_source_count_aggr_nr(struct expr_parse_ctx *ctx, const char *id,
+					double val, int source_count, int aggr_nr);
 int expr__add_ref(struct expr_parse_ctx *ctx, struct metric_ref *ref);
 int expr__get_id(struct expr_parse_ctx *ctx, const char *id,
 		 struct expr_id_data **data);
@@ -53,6 +55,8 @@ int expr__find_ids(const char *expr, const char *one,

 double expr_id_data__value(const struct expr_id_data *data);
 double expr_id_data__source_count(const struct expr_id_data *data);
+double expr_id_data__aggr_nr(const struct expr_id_data *data);
+
 double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx);
 double expr__has_event(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id);
 double expr__strcmp_cpuid_str(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id);
diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index a2fc43159ee9..f16ccff278d0 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -121,6 +121,7 @@ min		{ return MIN; }
 if		{ return IF; }
 else		{ return ELSE; }
 source_count	{ return SOURCE_COUNT; }
+aggr_nr		{ return AGGR_NR; }
 has_event	{ return HAS_EVENT; }
 strcmp_cpuid_str	{ return STRCMP_CPUID_STR; }
 NaN		{ return nan_value(yyscanner); }
diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y
index e364790babb5..e20f649354cf 100644
--- a/tools/perf/util/expr.y
+++ b/tools/perf/util/expr.y
@@ -41,7 +41,7 @@ int expr_lex(YYSTYPE * yylval_param , void *yyscanner);
 	} ids;
 }

-%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT HAS_EVENT STRCMP_CPUID_STR EXPR_ERROR
+%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT AGGR_NR HAS_EVENT STRCMP_CPUID_STR EXPR_ERROR
 %left MIN MAX IF
 %left '|'
 %left '^'
@@ -87,8 +87,14 @@ static struct ids union_expr(struct ids ids1, struct ids ids2)
 	return result;
 }

+enum expr_id_kind {
+	EXPR_ID_KIND__VALUE,
+	EXPR_ID_KIND__SOURCE_COUNT,
+	EXPR_ID_KIND__AGGR_NR,
+};
+
 static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,
-			    bool compute_ids, bool source_count)
+			    bool compute_ids, enum expr_id_kind kind)
 {
 	struct ids result;

@@ -101,9 +107,12 @@ static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,

 		result.val = NAN;
 		if (expr__resolve_id(ctx, id, &data) == 0) {
-			result.val = source_count
-				? expr_id_data__source_count(data)
-				: expr_id_data__value(data);
+			if (kind == EXPR_ID_KIND__SOURCE_COUNT)
+				result.val = expr_id_data__source_count(data);
+			else if (kind == EXPR_ID_KIND__AGGR_NR)
+				result.val = expr_id_data__aggr_nr(data);
+			else
+				result.val = expr_id_data__value(data);
 		}
 		result.ids = NULL;
 		free(id);
@@ -201,8 +210,9 @@ expr: NUMBER
 	$$.val = $1;
 	$$.ids = NULL;
 }
-| ID				{ $$ = handle_id(ctx, $1, compute_ids, /*source_count=*/false); }
-| SOURCE_COUNT '(' ID ')'	{ $$ = handle_id(ctx, $3, compute_ids, /*source_count=*/true); }
+| ID				{ $$ = handle_id(ctx, $1, compute_ids, EXPR_ID_KIND__VALUE); }
+| SOURCE_COUNT '(' ID ')'	{ $$ = handle_id(ctx, $3, compute_ids, EXPR_ID_KIND__SOURCE_COUNT); }
+| AGGR_NR '(' ID ')'		{ $$ = handle_id(ctx, $3, compute_ids, EXPR_ID_KIND__AGGR_NR); }
 | HAS_EVENT '(' ID ')'
 {
 	$$.val = expr__has_event(ctx, compute_ids, $3);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index bc2d44df7baf..e375f6f942be 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -53,6 +53,7 @@ static int prepare_metric(struct perf_stat_config *config,

 	for (i = 0; metric_events[i]; i++) {
 		int source_count = 0, tool_aggr_idx;
+		int aggr_nr = 1;
 		bool is_tool_time =
 			tool_pmu__is_time_event(config, metric_events[i], &tool_aggr_idx);
 		struct perf_stat_evsel *ps = metric_events[i]->stats;
@@ -104,13 +105,14 @@ static int prepare_metric(struct perf_stat_config *config,
 				}
 				if (!source_count)
 					source_count = evsel__source_count(metric_events[i]);
+				aggr_nr = aggr->nr ?: 1;
 			}
 		}
 		n = strdup(evsel__metric_id(metric_events[i]));
 		if (!n)
 			return -ENOMEM;

-		expr__add_id_val_source_count(pctx, n, val, source_count);
+		expr__add_id_val_source_count_aggr_nr(pctx, n, val, source_count, aggr_nr);
 	}

 	for (int j = 0; metric_refs && metric_refs[j].metric_name; j++) {
--
2.54.0.669.g59709faab0-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] perf stat: Add aggr_nr metric parser support
  2026-05-20 17:58 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao
@ 2026-05-20 18:28   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-20 18:28 UTC (permalink / raw)
  To: Chun-Tse Shao; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incorrect `aggr_nr` evaluation when an event fails to run (`aggr->counts.run == 0`).
--

commit 7b8eacc795788823a2f1b5c16d96f2fc07c9fe23
Author: Chun-Tse Shao <ctshao@google.com>

perf stat: Add aggr_nr metric parser support

This commit introduces the aggr_nr function to the metric expression parser
to allow metric formulas to dynamically utilize the number of aggregated
targets. It updates the lexer, parser, and expression context helpers to
store and provide the aggregation count.

> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -53,6 +53,7 @@ static int prepare_metric(struct perf_stat_config *config,
>  
>  	for (i = 0; metric_events[i]; i++) {
>  		int source_count = 0, tool_aggr_idx;
> +		int aggr_nr = 1;
>  		bool is_tool_time =
>  			tool_pmu__is_time_event(config, metric_events[i], &tool_aggr_idx);
>  		struct perf_stat_evsel *ps = metric_events[i]->stats;

[ ... ]

>  		} else {
>  			struct perf_stat_aggr *aggr =
>  				&ps->aggr[is_tool_time ? tool_aggr_idx : aggr_idx];
>  
>  			if (aggr->counts.run == 0) {
>  				val = NAN;
>  				source_count = 0;

[Severity: Medium]
Does this leave aggr_nr at its default initialized value of 1 when an event
fails to run?

When aggr->counts.run == 0, source_count is explicitly set to 0. Since
aggr_nr skips evaluation in this branch, any metric expression utilizing
the aggr_nr function on an event that failed to run will evaluate to 1
instead of the correct aggregation topology size.

Should aggr_nr be set to 0 here to mirror source_count, or unconditionally
populated with aggr->nr?

>  			} else {
>  				val = aggr->counts.val;
>  				if (is_tool_time) {
>  					/* Convert time event nanoseconds to seconds. */
>  					val *= 1e-9;
>  				}
>  				if (!source_count)
>  					source_count = evsel__source_count(metric_events[i]);
> +				aggr_nr = aggr->nr ?: 1;
>  			}
>  		}
>  		n = strdup(evsel__metric_id(metric_events[i]));
>  		if (!n)
>  			return -ENOMEM;
>  
> -		expr__add_id_val_source_count(pctx, n, val, source_count);
> +		expr__add_id_val_source_count_aggr_nr(pctx, n, val, source_count, aggr_nr);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520180032.3045144-1-ctshao@google.com?part=1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics
  2026-05-20 17:58 [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes Chun-Tse Shao
  2026-05-20 17:58 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao
@ 2026-05-20 17:58 ` Chun-Tse Shao
  2026-05-20 19:24   ` sashiko-bot
  1 sibling, 1 reply; 6+ messages in thread
From: Chun-Tse Shao @ 2026-05-20 17:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
	sandipan.das, linux-perf-users

Update `metric.py` to support the new `aggr_nr` keyword in the python
metric generator. Replace the usage of `source_count` with `aggr_nr` in
`IntelMissLat` inside `intel_metrics.py` so that uncore latency metrics
(like `lpm_miss_lat`) scale correctly on multi-socket and SNC systems when
aggregated globally.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
---
 tools/perf/pmu-events/intel_metrics.py |  8 ++++----
 tools/perf/pmu-events/metric.py        | 11 +++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 52035433b505..50d5a20fa354 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -5,9 +5,9 @@ import json
 import math
 import os
 import re
-from typing import Optional
+from typing import List, Optional
 from common_metrics import Cycles
-from metric import (d_ratio, has_event, max, source_count, CheckPmu, Event,
+from metric import (d_ratio, has_event, max, aggr_nr, CheckPmu, Event,
                     JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
                     Literal, LoadEvents, Metric, MetricConstraint, MetricGroup,
                     MetricRef, Select)
@@ -674,10 +674,10 @@ def IntelMissLat() -> Optional[MetricGroup]:
     else:
         assert data_rd_loc_occ.name == "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_LOCAL", data_rd_loc_occ
 
-    ticks_per_cha = ticks / source_count(data_rd_loc_ins)
+    ticks_per_cha = ticks / aggr_nr(data_rd_loc_ins)
     loc_lat = interval_sec * 1e9 * data_rd_loc_occ / \
         (ticks_per_cha * data_rd_loc_ins)
-    ticks_per_cha = ticks / source_count(data_rd_rem_ins)
+    ticks_per_cha = ticks / aggr_nr(data_rd_rem_ins)
     rem_lat = interval_sec * 1e9 * data_rd_rem_occ / \
         (ticks_per_cha * data_rd_rem_ins)
     return MetricGroup("lpm_miss_lat", [
diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index ac582db785fc..424659b021ff 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -93,7 +93,7 @@ def CheckEveryEvent(*names: str) -> None:
       name = name[:name.find(':')]
     elif '/' in name:
       name = name[:name.find('/')]
-      if any([name.startswith(x) for x in ['amd', 'arm', 'cpu', 'msr', 'power']]):
+      if any([name.startswith(x) for x in ['amd', 'arm', 'cpu', 'msr', 'power', 'cha', 'uncore']]):
         continue
     if name not in all_events_all_models:
       raise Exception(f"Is {name} a named json event?")
@@ -576,6 +576,13 @@ def source_count(event: Event) -> Function:
   return Function('source_count', event)
 
 
+def aggr_nr(event: Event) -> Function:
+  # pylint: disable=redefined-builtin
+  # pylint: disable=invalid-name
+  return Function('aggr_nr', event)
+
+
+
 def has_event(event: Event) -> Function:
   # pylint: disable=redefined-builtin
   # pylint: disable=invalid-name
@@ -762,7 +769,7 @@ def ParsePerfJson(orig: str) -> Expression:
   # Convert accidentally converted scientific notation constants back
   py = re.sub(r'([0-9]+)Event\(r"(e[0-9]*)"\)', r'\1\2', py)
   # Convert all the known keywords back from events to just the keyword
-  keywords = ['if', 'else', 'min', 'max', 'd_ratio', 'source_count', 'has_event', 'strcmp_cpuid_str']
+  keywords = ['if', 'else', 'min', 'max', 'd_ratio', 'source_count', 'aggr_nr', 'has_event', 'strcmp_cpuid_str']
   for kw in keywords:
     py = re.sub(rf'Event\(r"{kw}"\)', kw, py)
   try:
-- 
2.54.0.669.g59709faab0-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics
  2026-05-20 17:58 ` [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics Chun-Tse Shao
@ 2026-05-20 19:24   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-20 19:24 UTC (permalink / raw)
  To: Chun-Tse Shao; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Unused import `List` from `typing`.
--

commit 78c2e54794653940a1f0e4de31504f86f0809cd9
Author: Chun-Tse Shao <ctshao@google.com>

perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics

This commit updates metric.py to support the new aggr_nr keyword in the
python metric generator. It also updates IntelMissLat to use aggr_nr instead
of source_count so uncore latency metrics scale correctly on multi-socket
and SNC systems.

> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index 52035433b5054..50d5a20fa354c 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -5,9 +5,9 @@ import json
>  import math
>  import os
>  import re
> -from typing import Optional
> +from typing import List, Optional

[Severity: Low]
This isn't a bug, but it looks like List was added to the typing imports
here without being used anywhere else in the file. Should this addition
be dropped to keep the imports clean?

>  from common_metrics import Cycles

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520180032.3045144-1-ctshao@google.com?part=2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes
@ 2026-05-20 17:47 Chun-Tse Shao
  2026-05-20 17:47 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao
  0 siblings, 1 reply; 6+ messages in thread
From: Chun-Tse Shao @ 2026-05-20 17:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
	sandipan.das, leo.yan, thomas.falcon, yang.lee, linux-perf-users

This series fixes a scaling issue for metrics (like lpm_miss_lat) across
different runtime aggregation modes.

Uncore metrics currently use `source_count` to scale events. However,
`source_count` returns the total uncore unit count regardless of the
selected aggregation mode. When evaluating metrics in different
aggregation mode other than `--per-socket`, this incorrectly divides
aggregated uncore events against the total uncore count rather than the
uncores belonging to the aggregation, leading to wrong metric results.

To fix this, we:
1. Introduce the aggr_nr() keyword to the metric parser, which
dynamically resolves to the active units in the current aggregation
group (`aggr->nr`).

2. Update the python metrics to use `aggr_nr` instead of `source_count`,
ensuring correct scaling across all runtime aggregation boundaries.

Before the fix (incorrect low latency in global mode):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "122.8", "ns  lpm_miss_lat_loc" : "114.5"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.1", "ns  lpm_miss_lat_loc" : "278.2"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "233.9", "ns  lpm_miss_lat_loc" : "257.5"}

After the fix (correct scaled latency in all aggregation modes):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "230.0", "ns  lpm_miss_lat_loc" : "220.8"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.5", "ns  lpm_miss_lat_loc" : "269.9"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "225.7", "ns  lpm_miss_lat_loc" : "262.3"}

Chun-Tse Shao (2):
  perf stat: Add aggr_nr metric parser support
  perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics

 tools/perf/pmu-events/intel_metrics.py |  8 ++++----
 tools/perf/pmu-events/metric.py        | 11 +++++++++--
 tools/perf/util/expr.c                 | 27 ++++++++++++++++++++++----
 tools/perf/util/expr.h                 |  4 ++++
 tools/perf/util/expr.l                 |  1 +
 tools/perf/util/expr.y                 | 24 ++++++++++++++++-------
 tools/perf/util/stat-shadow.c          |  4 +++-
 7 files changed, 61 insertions(+), 18 deletions(-)

--
2.54.0.669.g59709faab0-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] perf stat: Add aggr_nr metric parser support
  2026-05-20 17:47 [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes Chun-Tse Shao
@ 2026-05-20 17:47 ` Chun-Tse Shao
  0 siblings, 0 replies; 6+ messages in thread
From: Chun-Tse Shao @ 2026-05-20 17:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
	sandipan.das, leo.yan, thomas.falcon, yang.lee, linux-perf-users

Introduce the `aggr_nr` function to the metric expression parser.
`aggr_nr` allows metric formulas to dynamically utilize the number of
aggregated targets (`aggr->nr`) instead of relying on the static
`source_count` (which represents the static socket or node count).

This adds the `AGGR_NR` token to the lexer and parser, updates the
expression parsing context helpers to store `aggr_nr`, and feeds
`aggr->nr` from the aggregation structure in `prepare_metric`.
---
 tools/perf/util/expr.c        | 27 +++++++++++++++++++++++----
 tools/perf/util/expr.h        |  4 ++++
 tools/perf/util/expr.l        |  1 +
 tools/perf/util/expr.y        | 24 +++++++++++++++++-------
 tools/perf/util/stat-shadow.c |  4 +++-
 5 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 644769e92708..8d00805f93c4 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -27,6 +27,7 @@ struct expr_id_data {
 		struct {
 			double val;
 			int source_count;
+			int aggr_nr;
 		} val;
 		struct {
 			double val;
@@ -151,8 +152,8 @@ int expr__add_id_val(struct expr_parse_ctx *ctx, const char *id, double val)
 }

 /* Caller must make sure id is allocated */
-int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
-				  double val, int source_count)
+int expr__add_id_val_source_count_aggr_nr(struct expr_parse_ctx *ctx, const char *id,
+					  double val, int source_count, int aggr_nr)
 {
 	struct expr_id_data *data_ptr = NULL, *old_data = NULL;
 	char *old_key = NULL;
@@ -163,6 +164,7 @@ int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 		return -ENOMEM;
 	data_ptr->val.val = val;
 	data_ptr->val.source_count = source_count;
+	data_ptr->val.aggr_nr = aggr_nr;
 	data_ptr->kind = EXPR_ID_DATA__VALUE;

 	ret = hashmap__set(ctx->ids, id, data_ptr, &old_key, &old_data);
@@ -177,6 +179,14 @@ int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 	return ret;
 }

+/* Caller must make sure id is allocated */
+int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
+				  double val, int source_count)
+{
+	return expr__add_id_val_source_count_aggr_nr(ctx, id, val, source_count, 1);
+}
+
+
 int expr__add_ref(struct expr_parse_ctx *ctx, struct metric_ref *ref)
 {
 	struct expr_id_data *data_ptr = NULL, *old_data = NULL;
@@ -390,10 +400,19 @@ double expr_id_data__value(const struct expr_id_data *data)

 double expr_id_data__source_count(const struct expr_id_data *data)
 {
-	assert(data->kind == EXPR_ID_DATA__VALUE);
-	return data->val.source_count;
+	if (data->kind == EXPR_ID_DATA__VALUE)
+		return data->val.source_count;
+	return 1.0;
 }

+double expr_id_data__aggr_nr(const struct expr_id_data *data)
+{
+	if (data->kind == EXPR_ID_DATA__VALUE)
+		return data->val.aggr_nr;
+	return 1.0;
+}
+
+
 double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx)
 {
 	double result = NAN;
diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index c0cec29ddc29..865bb63cdb03 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -37,6 +37,8 @@ int expr__add_id(struct expr_parse_ctx *ctx, const char *id);
 int expr__add_id_val(struct expr_parse_ctx *ctx, const char *id, double val);
 int expr__add_id_val_source_count(struct expr_parse_ctx *ctx, const char *id,
 				double val, int source_count);
+int expr__add_id_val_source_count_aggr_nr(struct expr_parse_ctx *ctx, const char *id,
+					double val, int source_count, int aggr_nr);
 int expr__add_ref(struct expr_parse_ctx *ctx, struct metric_ref *ref);
 int expr__get_id(struct expr_parse_ctx *ctx, const char *id,
 		 struct expr_id_data **data);
@@ -53,6 +55,8 @@ int expr__find_ids(const char *expr, const char *one,

 double expr_id_data__value(const struct expr_id_data *data);
 double expr_id_data__source_count(const struct expr_id_data *data);
+double expr_id_data__aggr_nr(const struct expr_id_data *data);
+
 double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx);
 double expr__has_event(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id);
 double expr__strcmp_cpuid_str(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id);
diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index a2fc43159ee9..f16ccff278d0 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -121,6 +121,7 @@ min		{ return MIN; }
 if		{ return IF; }
 else		{ return ELSE; }
 source_count	{ return SOURCE_COUNT; }
+aggr_nr		{ return AGGR_NR; }
 has_event	{ return HAS_EVENT; }
 strcmp_cpuid_str	{ return STRCMP_CPUID_STR; }
 NaN		{ return nan_value(yyscanner); }
diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y
index e364790babb5..e20f649354cf 100644
--- a/tools/perf/util/expr.y
+++ b/tools/perf/util/expr.y
@@ -41,7 +41,7 @@ int expr_lex(YYSTYPE * yylval_param , void *yyscanner);
 	} ids;
 }

-%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT HAS_EVENT STRCMP_CPUID_STR EXPR_ERROR
+%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT AGGR_NR HAS_EVENT STRCMP_CPUID_STR EXPR_ERROR
 %left MIN MAX IF
 %left '|'
 %left '^'
@@ -87,8 +87,14 @@ static struct ids union_expr(struct ids ids1, struct ids ids2)
 	return result;
 }

+enum expr_id_kind {
+	EXPR_ID_KIND__VALUE,
+	EXPR_ID_KIND__SOURCE_COUNT,
+	EXPR_ID_KIND__AGGR_NR,
+};
+
 static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,
-			    bool compute_ids, bool source_count)
+			    bool compute_ids, enum expr_id_kind kind)
 {
 	struct ids result;

@@ -101,9 +107,12 @@ static struct ids handle_id(struct expr_parse_ctx *ctx, char *id,

 		result.val = NAN;
 		if (expr__resolve_id(ctx, id, &data) == 0) {
-			result.val = source_count
-				? expr_id_data__source_count(data)
-				: expr_id_data__value(data);
+			if (kind == EXPR_ID_KIND__SOURCE_COUNT)
+				result.val = expr_id_data__source_count(data);
+			else if (kind == EXPR_ID_KIND__AGGR_NR)
+				result.val = expr_id_data__aggr_nr(data);
+			else
+				result.val = expr_id_data__value(data);
 		}
 		result.ids = NULL;
 		free(id);
@@ -201,8 +210,9 @@ expr: NUMBER
 	$$.val = $1;
 	$$.ids = NULL;
 }
-| ID				{ $$ = handle_id(ctx, $1, compute_ids, /*source_count=*/false); }
-| SOURCE_COUNT '(' ID ')'	{ $$ = handle_id(ctx, $3, compute_ids, /*source_count=*/true); }
+| ID				{ $$ = handle_id(ctx, $1, compute_ids, EXPR_ID_KIND__VALUE); }
+| SOURCE_COUNT '(' ID ')'	{ $$ = handle_id(ctx, $3, compute_ids, EXPR_ID_KIND__SOURCE_COUNT); }
+| AGGR_NR '(' ID ')'		{ $$ = handle_id(ctx, $3, compute_ids, EXPR_ID_KIND__AGGR_NR); }
 | HAS_EVENT '(' ID ')'
 {
 	$$.val = expr__has_event(ctx, compute_ids, $3);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index bc2d44df7baf..e375f6f942be 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -53,6 +53,7 @@ static int prepare_metric(struct perf_stat_config *config,

 	for (i = 0; metric_events[i]; i++) {
 		int source_count = 0, tool_aggr_idx;
+		int aggr_nr = 1;
 		bool is_tool_time =
 			tool_pmu__is_time_event(config, metric_events[i], &tool_aggr_idx);
 		struct perf_stat_evsel *ps = metric_events[i]->stats;
@@ -104,13 +105,14 @@ static int prepare_metric(struct perf_stat_config *config,
 				}
 				if (!source_count)
 					source_count = evsel__source_count(metric_events[i]);
+				aggr_nr = aggr->nr ?: 1;
 			}
 		}
 		n = strdup(evsel__metric_id(metric_events[i]));
 		if (!n)
 			return -ENOMEM;

-		expr__add_id_val_source_count(pctx, n, val, source_count);
+		expr__add_id_val_source_count_aggr_nr(pctx, n, val, source_count, aggr_nr);
 	}

 	for (int j = 0; metric_refs && metric_refs[j].metric_name; j++) {
--
2.54.0.669.g59709faab0-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-20 19:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-20 17:58 [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes Chun-Tse Shao
2026-05-20 17:58 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao
2026-05-20 18:28   ` sashiko-bot
2026-05-20 17:58 ` [PATCH 2/2] perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics Chun-Tse Shao
2026-05-20 19:24   ` sashiko-bot
  -- strict thread matches above, loose matches on Subject: below --
2026-05-20 17:47 [PATCH 0/2] perf stat: Fix uncore metric scaling across aggregation modes Chun-Tse Shao
2026-05-20 17:47 ` [PATCH 1/2] perf stat: Add aggr_nr metric parser support Chun-Tse Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox