Linux Perf Users
 help / color / mirror / Atom feed
* [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes
@ 2026-06-11 21:56 Chun-Tse Shao
  2026-06-11 22:10 ` sashiko-bot
  2026-06-11 22:33 ` Ian Rogers
  0 siblings, 2 replies; 4+ messages in thread
From: Chun-Tse Shao @ 2026-06-11 21:56 UTC (permalink / raw)
  To: acme, namhyung; +Cc: irogers, linux-perf-users, linux-kernel, Chun-Tse Shao

In aggregation modes (e.g. --per-socket, --per-die, etc.), a
counter might not be scheduled or counted on specific aggregate
groups if it was not assigned to the CPUs belonging to those
groups. However, the printout() check triggers the
"print_free_counters_hint" logic unconditionally for any
supported counter with a missing count. This results in a false
"Some events weren't counted. Try disabling the NMI watchdog"
warning.

Furthermore, the NMI watchdog only reserves performance counters
on core PMUs. Uncore PMU events (e.g. CHA, IMC) are not affected
by the NMI watchdog, but their failures also falsely triggered
this warning.

This warning was originally introduced in commit 02d492e5dcb7
("perf stat: Issue a HW watchdog disable hint").

To fix this, restrict setting of print_free_counters_hint to
only trigger for core PMU events by checking counter->pmu and
counter->pmu->is_core.

Example before/after:

$ perf stat -M lpm_miss_lat --metric-only --per-socket -a -- sleep 1

Before:
 Performance counter stats for 'system wide':

       ns  lpm_miss_lat_rem ns  lpm_miss_lat_loc
S0      126                202.3               207.9
S1      126                231.9               259.3

       1.006029831 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog

After:
 Performance counter stats for 'system wide':

       ns  lpm_miss_lat_rem ns  lpm_miss_lat_loc
S0      126                202.3               207.9
S1      126                231.9               259.3

       1.006029831 seconds time elapsed

Assisted-by: Gemini:gemini-next
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
---
 tools/perf/util/stat-display.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 2b69d238858c..0a5750bb59fa 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -821,9 +821,9 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
 		ok = false;
 
 		if (counter->supported) {
-			if (!evlist__has_hybrid_pmus(counter->evlist)) {
+			if (!evlist__has_hybrid_pmus(counter->evlist) &&
+			    counter->pmu && counter->pmu->is_core)
 				config->print_free_counters_hint = 1;
-			}
 		}
 	}
 
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes
  2026-06-11 21:56 [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes Chun-Tse Shao
@ 2026-06-11 22:10 ` sashiko-bot
  2026-06-11 22:33 ` Ian Rogers
  1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-11 22:10 UTC (permalink / raw)
  To: Chun-Tse Shao; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Directly accessing `counter->pmu` without lazy initialization suppresses the NMI watchdog hint during `perf stat report`.
--

commit 8ecd592e1137cf0841158570884e6514049ebf6f
Author: Chun-Tse Shao <ctshao@google.com>

perf stat: Fix false NMI watchdog warning in aggregation modes

This commit fixes false NMI watchdog warnings in aggregation modes by
restricting the hint to core PMU events. It prevents the warning from
triggering on uncore PMU events by checking the event's PMU and its
core flag.

> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 993f4c4b8f442..b576776917353 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
[ ... ]
> @@ -823,9 +823,9 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
>  		ok = false;
>  
>  		if (counter->supported) {
> -			if (!evlist__has_hybrid_pmus(counter->evlist)) {
> +			if (!evlist__has_hybrid_pmus(counter->evlist) &&
> +			    counter->pmu && counter->pmu->is_core)

[Severity: Medium]
During a perf stat report on non-hybrid machines, will counter->pmu be null
here because it is lazily initialized when constructed from perf.data?

Since evlist__has_hybrid_pmus() immediately returns false on non-hybrid
machines without iterating over the events, counter->pmu would remain
uninitialized.

Does this code safely handle lazy initialization, or should it use
evsel__find_pmu(counter) to guarantee the PMU is resolved before checking
is_core?

>  				config->print_free_counters_hint = 1;
> -			}
>  		}
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611215632.562900-1-ctshao@google.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes
  2026-06-11 21:56 [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes Chun-Tse Shao
  2026-06-11 22:10 ` sashiko-bot
@ 2026-06-11 22:33 ` Ian Rogers
  2026-06-12  0:22   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 4+ messages in thread
From: Ian Rogers @ 2026-06-11 22:33 UTC (permalink / raw)
  To: Chun-Tse Shao; +Cc: acme, namhyung, linux-perf-users, linux-kernel

On Thu, Jun 11, 2026 at 2:56 PM Chun-Tse Shao <ctshao@google.com> wrote:
>
> In aggregation modes (e.g. --per-socket, --per-die, etc.), a
> counter might not be scheduled or counted on specific aggregate
> groups if it was not assigned to the CPUs belonging to those
> groups. However, the printout() check triggers the
> "print_free_counters_hint" logic unconditionally for any
> supported counter with a missing count. This results in a false
> "Some events weren't counted. Try disabling the NMI watchdog"
> warning.
>
> Furthermore, the NMI watchdog only reserves performance counters
> on core PMUs. Uncore PMU events (e.g. CHA, IMC) are not affected
> by the NMI watchdog, but their failures also falsely triggered
> this warning.
>
> This warning was originally introduced in commit 02d492e5dcb7
> ("perf stat: Issue a HW watchdog disable hint").
>
> To fix this, restrict setting of print_free_counters_hint to
> only trigger for core PMU events by checking counter->pmu and
> counter->pmu->is_core.
>
> Example before/after:
>
> $ perf stat -M lpm_miss_lat --metric-only --per-socket -a -- sleep 1
>
> Before:
>  Performance counter stats for 'system wide':
>
>        ns  lpm_miss_lat_rem ns  lpm_miss_lat_loc
> S0      126                202.3               207.9
> S1      126                231.9               259.3
>
>        1.006029831 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
>         echo 0 > /proc/sys/kernel/nmi_watchdog
>         perf stat ...
>         echo 1 > /proc/sys/kernel/nmi_watchdog
>
> After:
>  Performance counter stats for 'system wide':
>
>        ns  lpm_miss_lat_rem ns  lpm_miss_lat_loc
> S0      126                202.3               207.9
> S1      126                231.9               259.3
>
>        1.006029831 seconds time elapsed
>
> Assisted-by: Gemini:gemini-next
> Signed-off-by: Chun-Tse Shao <ctshao@google.com>

Reviewed-by: Ian Rogers <irogers@google.com>

The suggestion for something like:
```
pmu = evsel__find_pmu(counter);
if (pmu && pmu->is_core)
```
isn't really necessary because we nearly always set the PMU. Also, a
case where we lack a pmu has historically been for a core PMU, making
the whole thing contradictory. The patch as-is is clear.

Thanks,
Ian

> ---
>  tools/perf/util/stat-display.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 2b69d238858c..0a5750bb59fa 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -821,9 +821,9 @@ static void printout(struct perf_stat_config *config, struct outstate *os,
>                 ok = false;
>
>                 if (counter->supported) {
> -                       if (!evlist__has_hybrid_pmus(counter->evlist)) {
> +                       if (!evlist__has_hybrid_pmus(counter->evlist) &&
> +                           counter->pmu && counter->pmu->is_core)
>                                 config->print_free_counters_hint = 1;
> -                       }
>                 }
>         }
>
> --
> 2.54.0.1136.gdb2ca164c4-goog
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes
  2026-06-11 22:33 ` Ian Rogers
@ 2026-06-12  0:22   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-06-12  0:22 UTC (permalink / raw)
  To: Ian Rogers; +Cc: Chun-Tse Shao, namhyung, linux-perf-users, linux-kernel

On Thu, Jun 11, 2026 at 03:33:38PM -0700, Ian Rogers wrote:
> On Thu, Jun 11, 2026 at 2:56 PM Chun-Tse Shao <ctshao@google.com> wrote:
> > To fix this, restrict setting of print_free_counters_hint to
> > only trigger for core PMU events by checking counter->pmu and
> > counter->pmu->is_core.

> > Assisted-by: Gemini:gemini-next
> > Signed-off-by: Chun-Tse Shao <ctshao@google.com>
> 
> Reviewed-by: Ian Rogers <irogers@google.com>
> 
> The suggestion for something like:
> ```
> pmu = evsel__find_pmu(counter);
> if (pmu && pmu->is_core)
> ```
> isn't really necessary because we nearly always set the PMU. Also, a
> case where we lack a pmu has historically been for a core PMU, making
> the whole thing contradictory. The patch as-is is clear.

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-12  0:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11 21:56 [PATCH v2] perf stat: Fix false NMI watchdog warning in aggregation modes Chun-Tse Shao
2026-06-11 22:10 ` sashiko-bot
2026-06-11 22:33 ` Ian Rogers
2026-06-12  0:22   ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox