Re: [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Saurav Sachidanand" <sauravsc@amazon.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop
Date: Thu, 14 May 2026 17:34:32 +0000	[thread overview]
Message-ID: <20260514173433.B51DAC2BCB3@smtp.kernel.org> (raw)
In-Reply-To: <20260514010629.76558-4-sauravsc@amazon.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Calling nv_c2c_pmu_event_update() in nv_c2c_pmu_stop() causes double-counting and massive counter wrap-around corruption.
- [High] Logically stopping the event fails to exclude counts because the hardware counter continues running and start logic is unmodified.
- [High] nv_c2c_pmu_stop() incorrectly uses an early return on PERF_HES_STOPPED, permanently dropping final counter updates.
- [Medium] nv_c2c_pmu_stop() fails to set PERF_HES_UPTODATE after updating the event.
--

commit 1d56e58b1d1592eb3b08b91d9c3d09bd821ef6ad
Author: Saurav Sachidanand <sauravsc@amazon.com>

perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop

This commit updates nv_c2c_pmu_stop() to read the final counter value when
called with PERF_EF_UPDATE, aiming to prevent lost counter deltas when the
perf core removes the event.

> diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
> index 662fa1bc833a5..39d0e661e4c5c 100644
> --- a/drivers/perf/nvidia_t410_c2c_pmu.c
> +++ b/drivers/perf/nvidia_t410_c2c_pmu.c
> @@ -394,6 +394,12 @@ static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
>  
>  static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
>  {
> +	if (event->hw.state & PERF_HES_STOPPED)
> +		return;

Does this early return permanently drop final counter updates?

If the core stops the event without requesting an update first (for example,
via pmu->stop(event, 0) during event throttling), the event is marked
PERF_HES_STOPPED.

When the core later attempts to drain the final unread counts by calling
pmu->stop(event, PERF_EF_UPDATE), the early return will trigger and the
final update will be entirely skipped.

> +
> +	if (pmu_flags & PERF_EF_UPDATE)
> +		nv_c2c_pmu_event_update(event);

Can this cause double-counting and massive counter wrap-around?

The perf core wraps event teardowns within perf_pmu_disable() and
perf_pmu_enable(). For this PMU, nv_c2c_pmu_disable() halts global
counters, calls nv_c2c_pmu_event_update() to accumulate final counts,
and then explicitly sets event->hw.prev_count to 0.

Subsequently, pmu->del() calls nv_c2c_pmu_stop() with PERF_EF_UPDATE, which
redundantly calls nv_c2c_pmu_event_update() again. Because prev_count was
just reset to 0 but the stopped hardware counter still retains its final
value V, the second update computes V - 0 = V and erroneously adds the full
lifetime count to event->count a second time.

Additionally, this second update overwrites prev_count with V. When the PMU
restarts, the physical counters clear to 0. The next update will read a
small new value W, calculate W - V, and cause a massive unsigned 64-bit
integer underflow.

Also, if the event is stopped logically without disabling the entire global
PMU, the hardware counter continues incrementing. Since nv_c2c_pmu_start()
is unmodified and does not synchronize prev_count to the new running
value, won't all counts accumulated while the event was logically stopped
still be incorrectly added to event->count upon the next update?

> +
>  	event->hw.state |= PERF_HES_STOPPED;

Should this also set PERF_HES_UPTODATE?

The standard PMU stop pattern usually sets PERF_HES_UPTODATE after an
update so the core knows the software counter is synchronized with the
hardware.

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514010629.76558-1-sauravsc@amazon.com?part=4

next prev parent reply	other threads:[~2026-05-14 17:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  1:06 [PATCH 1/4] perf: nvidia_t410_cmem_latency: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14  1:06 ` [PATCH 2/4] perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop Saurav Sachidanand
2026-05-14 12:49   ` sashiko-bot
2026-05-14 15:59   ` Besar Wicaksono
2026-05-14  1:06 ` [PATCH 3/4] perf: nvidia_t410_c2c: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14 15:57   ` Besar Wicaksono
2026-05-14  1:06 ` [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop Saurav Sachidanand
2026-05-14 16:00   ` Besar Wicaksono
2026-05-14 17:34   ` sashiko-bot [this message]
2026-05-14 15:55 ` [PATCH 1/4] perf: nvidia_t410_cmem_latency: fix cpuhp state leak on init failure Besar Wicaksono

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514173433.B51DAC2BCB3@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=sauravsc@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox