* [PATCH] perf stat: Fix uncore metric scaling bug across sockets and nodes
From: Chun-Tse Shao @ 2026-05-18 23:11 UTC (permalink / raw)
To: linux-kernel
Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, james.clark,
yang.lee, linux-perf-users
In multi-socket or Sub-NUMA Clustering (SNC) configurations, uncore
metrics (such as lpm_miss_lat) calculate incorrect values because they
divide by a static socket CHA count rather than the aggregation target.
Fix this by dynamically utilizing the aggregation count (`aggr->nr`) on
the metric's leader event. In standard aggregation modes, `aggr->nr` is
already automatically populated with the correct number of active
hardware units contributing to that stats bucket.
Before the fix:
perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
{"socket" : "S0", "counters" : 28, "ns lpm_miss_lat_rem" : "163.4", "ns lpm_miss_lat_loc" : "27.5"}
{"socket" : "S1", "counters" : 28, "ns lpm_miss_lat_rem" : "170.5", "ns lpm_miss_lat_loc" : "25.5"}
perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
{"ns lpm_miss_lat_rem" : "90.4", "ns lpm_miss_lat_loc" : "12.6"}
`lpm_miss_lat` on global aggr is 0.5x the actual value.
After the fix:
perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
{"socket" : "S0", "counters" : 28, "ns lpm_miss_lat_rem" : "174.8", "ns lpm_miss_lat_loc" : "34.8"}
{"socket" : "S1", "counters" : 28, "ns lpm_miss_lat_rem" : "170.4", "ns lpm_miss_lat_loc" : "23.9"}
perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
{"ns lpm_miss_lat_rem" : "174.5", "ns lpm_miss_lat_loc" : "26.2"}
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
---
tools/perf/util/stat-shadow.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index bc2d44df7baf..60fdf26c5bb0 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -103,7 +103,7 @@ static int prepare_metric(struct perf_stat_config *config,
 					val *= 1e-9;
 				}
 				if (!source_count)
-					source_count = evsel__source_count(metric_events[i]);
+					source_count = aggr->nr;
 			}
 		}
 		n = strdup(evsel__metric_id(metric_events[i]));
--
2.54.0.746.g67dd491aae-goog
* Re: [PATCH] perf stat: Fix uncore metric scaling bug across sockets and nodes
From: sashiko-bot @ 2026-05-18 23:44 UTC (permalink / raw)
To: Chun-Tse Shao; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch codifies a mathematical scaling error across all aggregation modes for uncore metrics by changing `source_count` to `aggr->nr` without removing `#num_packages` from the JSON formulas.
--
commit 1c2296ec7fca6ef588c3a63c6bd68377de82335f
Author: Chun-Tse Shao <ctshao@google.com>
perf stat: Fix uncore metric scaling bug across sockets and nodes
This commit changes how source_count is determined for uncore metrics.
Instead of using the static PMU count via evsel__source_count(), it
dynamically assigns aggr->nr from the metric's leader event to handle
various aggregation modes.
> Before the fix:
> perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
> {"socket" : "S0", "counters" : 28, "ns lpm_miss_lat_rem" : "163.4", "ns lpm_miss_lat_loc" : "27.5"}
> {"socket" : "S1", "counters" : 28, "ns lpm_miss_lat_rem" : "170.5", "ns lpm_miss_lat_loc" : "25.5"}
> perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
> {"ns lpm_miss_lat_rem" : "90.4", "ns lpm_miss_lat_loc" : "12.6"}
> `lpm_miss_lat` on global aggr is 0.5x the actual value.
Does this misdiagnose the correct baseline? Existing uncore metrics like
lpm_miss_lat are defined in JSON formulas as dividing by
(source_count(TICKS) * #num_packages).
Before this patch, evsel__source_count() returned the static PMU count per
package (e.g., 28 CHAs). In global aggregation, dividing the global sum of 56
CHAs by (28 * 2) = 56 yields the correct average of ~90.4ns.
The --per-socket mode yielded an incorrectly scaled value because it divided
the sum of 28 CHAs by (28 * 2) = 56 instead of 28.
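The pre-patch arithmetic above can be checked with a small sketch. This is illustrative Python, not perf code: the helper name `divisor_ratio` and the constants are assumptions taken from the 2-package, 28-CHAs-per-package example in this thread. A ratio of 1.0 means the formula's divisor matches the number of CHAs actually summed into the bucket.

```python
# Illustrative sketch only, not perf code. The numbers come from the example
# in this thread: 2 packages, 28 CHAs per package.
CHAS_PER_PACKAGE = 28
NUM_PACKAGES = 2


def divisor_ratio(source_count, units_summed, num_packages=NUM_PACKAGES):
    """Return (formula divisor) / (units actually summed in the bucket).

    The JSON formula is described as dividing by
    source_count(TICKS) * #num_packages. A result of 1.0 means the
    divisor is right; 2.0 means it is twice too large.
    """
    return (source_count * num_packages) / units_summed


# Before the patch, source_count is the static per-package PMU count (28).
before_global = divisor_ratio(CHAS_PER_PACKAGE, CHAS_PER_PACKAGE * NUM_PACKAGES)
before_per_socket = divisor_ratio(CHAS_PER_PACKAGE, CHAS_PER_PACKAGE)

print(f"global: {before_global}, per-socket: {before_per_socket}")
```

Under these assumptions only the per-socket bucket is off, by a factor equal to the package count.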
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
[ ... ]
> @@ -103,7 +103,7 @@ static int prepare_metric(struct perf_stat_config *config,
> val *= 1e-9;
> }
> if (!source_count)
> - source_count = evsel__source_count(metric_events[i]);
> + source_count = aggr->nr;
> }
> }
Does changing source_count to aggr->nr break global mode calculations without
updating the JSON metric formulas?
In global mode, aggr->nr dynamically evaluates to 56 instead of 28. Because the
JSON formula still multiplies by #num_packages, the metric divides the global
sum of 56 CHAs by (56 * 2) = 112. This scales the true average incorrectly,
causing the global mode to output the doubled latency value of ~174.5ns.
Similarly, in --per-socket mode, aggr->nr evaluates to 28, and the formula
still divides the sum of 28 CHAs by (28 * 2) = 56, retaining the original
scaling error in per-socket mode.
Should the #num_packages multiplier be removed from the JSON formulas when
making source_count dynamically scale with the aggregation mode?
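A minimal sketch of the post-patch arithmetic, again as illustrative Python rather than perf code. The names `AGGR_NR` and `patched_ratios` are hypothetical, and the topology (2 packages, 28 CHAs each) is the example used in this thread. It compares the divisor with and without the `#num_packages` multiplier once `source_count` follows `aggr->nr`.

```python
# Illustrative sketch only, not perf code. Assumes the example topology
# from this thread: 2 packages, 28 CHAs per package.
NUM_PACKAGES = 2
CHAS_PER_PACKAGE = 28

# With the patch, source_count follows aggr->nr, i.e. the number of CHAs
# summed into the bucket for the chosen aggregation mode.
AGGR_NR = {
    "global": CHAS_PER_PACKAGE * NUM_PACKAGES,  # 56
    "per-socket": CHAS_PER_PACKAGE,             # 28
}


def patched_ratios(keep_num_packages):
    """Map each mode to divisor / units summed (1.0 is correct scaling)."""
    factor = NUM_PACKAGES if keep_num_packages else 1
    return {mode: (nr * factor) / nr for mode, nr in AGGR_NR.items()}


with_multiplier = patched_ratios(keep_num_packages=True)
without_multiplier = patched_ratios(keep_num_packages=False)

print(with_multiplier)     # JSON formula left unchanged
print(without_multiplier)  # hypothetical formula without #num_packages
```

In this model the patch alone leaves both modes off by the package count, while dropping the multiplier brings both back to 1.0, which is the point the question above raises.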
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260518231150.1252420-1-ctshao@google.com?part=1
* Re: [PATCH] perf stat: Fix uncore metric scaling bug across sockets and nodes
From: Chun-Tse Shao @ 2026-05-19 18:06 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-perf-users
Based on the AI review, instead of changing `source_count` behavior, I
will expose `aggr->nr` in perf python metrics and fix per-aggregation
metrics accordingly.
Thanks,
CT
On Mon, May 18, 2026 at 4:44 PM <sashiko-bot@kernel.org> wrote:
> [ ... ]
* Re: [PATCH] perf stat: Fix uncore metric scaling bug across sockets and nodes
From: Namhyung Kim @ 2026-05-20 0:56 UTC (permalink / raw)
To: Chun-Tse Shao; +Cc: sashiko-reviews, linux-perf-users
Hi CT,
On Tue, May 19, 2026 at 11:06:40AM -0700, Chun-Tse Shao wrote:
> Based on the AI review, instead of changing `source_count` behavior, I
> will expose `aggr->nr` in perf python metrics and fix per-aggregation
> metrics accordingly.
Please don't top post.
https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
Thanks,
Namhyung