Linux Perf Users
From: sashiko-bot@kernel.org
To: "Calvin Owens" <calvin@wbinvd.org>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v2 2/2] perf: Don't throttle based on NMI watchdog events
Date: Wed, 29 Apr 2026 20:38:57 +0000	[thread overview]
Message-ID: <20260429203858.CA1ACC19425@smtp.kernel.org> (raw)
In-Reply-To: <33d87384aa5f96af76949d1399476779dd4f4fce.1777483573.git.calvin@wbinvd.org>

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Non-atomic 64-bit cmpxchg in NMI context on 32-bit architectures causes torn writes and breaks nesting protection.
- [High] Logic error in time gap calculation incorrectly includes the current NMI's duration, contradicting the comment and incorrectly bypassing throttling.
- [High] Hardcoded 1-second bypass fundamentally defeats the `perf_cpu_time_max_percent` sysctl limit for low-frequency, long-running PMU events.
--

commit 657f1aa01c3c8a25116ffd10c6d50891dd498e3f
Author: Calvin Owens <calvin@wbinvd.org>

perf: Don't throttle based on NMI watchdog events

This commit modifies perf_sample_event_took() to skip throttling for
infrequent NMIs by checking if the previous NMI occurred more than one
second ago. It uses a per-CPU timestamp updated via cmpxchg to track
the last throttle time.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5b5cb620499e..6f367da17ff5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -596,6 +596,7 @@ core_initcall(init_events_core_sysctls);
>   */
>  #define NR_ACCUMULATED_SAMPLES 128
>  static DEFINE_PER_CPU(u64, running_sample_length);
> +static DEFINE_PER_CPU(u64, last_throttle_clock);

[ ... ]

> @@ -636,6 +639,19 @@ void perf_sample_event_took(u64 sample_len_ns)
>  	if (avg_len <= max_len)
>  		return;
>  
> +	/*
> +	 * Very infrequent events like the perf counter hard watchdog
> +	 * can trigger spurious throttling: skip throttling if the prior
> +	 * NMI got here more than one second before this NMI began. But
> +	 * never skip throttling if NMIs are nesting, or if any NMI runs
> +	 * for longer than one second.
> +	 */
> +	now = local_clock();
> +	last = __this_cpu_read(last_throttle_clock);
> +	if (__this_cpu_cmpxchg(last_throttle_clock, last, now) == last &&

On 32-bit architectures, is it safe to use __this_cpu_cmpxchg() on a 64-bit
value in NMI context?

For architectures like 32-bit x86, the generic fallback implementation
achieves atomicity by disabling interrupts, which does not mask NMIs. If a
nested NMI interrupts the sequence halfway through reading or writing the
32-bit halves of the u64, could this result in a torn read or write?

A corrupted timestamp could cause a massive wrap-around in the time gap
calculation, perpetually satisfying the > NSEC_PER_SEC bypass condition and
silently disabling PMU throttling for all events on that CPU.

> +	    now - last > NSEC_PER_SEC && sample_len_ns < NSEC_PER_SEC)

The comment indicates the intent is to skip throttling if the prior NMI
occurred more than one second before this NMI began.

However, local_clock() is sampled at the end of the current NMI, so `now`
includes the current NMI's own duration.

If an NMI takes a very long time (e.g., 900ms) but started only 200ms after
the previous NMI ended, the true gap before it began is 200ms. Because now
includes the 900ms duration, now - last evaluates to 1100ms, which bypasses
the throttling.

Could this artificially inflate the time gap for long-running NMIs,
directly defeating the throttling mechanism?

Additionally, is it possible for user-defined PMU events with expensive
overhead (like heavy eBPF programs attached to slow tracepoints) to exploit
this hardcoded frequency bypass?

If an event triggers slower than 1Hz but takes a very long time to execute
(e.g., 999ms), both now - last > NSEC_PER_SEC and sample_len_ns <
NSEC_PER_SEC will be true.

Such an event would bypass the moving average calculation and never be
throttled, allowing it to consume nearly 100% of the CPU while it runs and
violating the default 25% CPU time limit set by
sysctl_perf_cpu_time_max_percent.

> +		return;
> +
>  	__report_avg = avg_len;
>  	__report_allowed = max_len;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1777483573.git.calvin@wbinvd.org?part=2


Thread overview: 10+ messages
2026-04-29 17:36 [PATCH v2 0/2] Two semi-related perf throttling fixes Calvin Owens
2026-04-29 17:36 ` [PATCH v2 1/2] perf/x86: Avoid double accounting of PMU NMI latencies Calvin Owens
2026-04-29 17:36 ` [PATCH v2 2/2] perf: Don't throttle based on NMI watchdog events Calvin Owens
2026-04-29 20:38   ` sashiko-bot [this message]
2026-04-29 22:08   ` Calvin Owens
2026-04-29 22:15     ` Ian Rogers
2026-04-29 22:41       ` Calvin Owens
2026-05-01 17:07     ` Calvin Owens
2026-05-01 20:54   ` Peter Zijlstra
2026-05-02  9:52     ` Calvin Owens
