Re: [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Saurav Sachidanand" <sauravsc@amazon.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop
Date: Thu, 14 May 2026 17:34:32 +0000	[thread overview]
Message-ID: <20260514173433.B51DAC2BCB3@smtp.kernel.org> (raw)
In-Reply-To: <20260514010629.76558-4-sauravsc@amazon.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Calling nv_c2c_pmu_event_update() in nv_c2c_pmu_stop() causes double-counting and massive counter wrap-around corruption.
- [High] Logically stopping the event fails to exclude counts because the hardware counter continues running and start logic is unmodified.
- [High] nv_c2c_pmu_stop() incorrectly uses an early return on PERF_HES_STOPPED, permanently dropping final counter updates.
- [Medium] nv_c2c_pmu_stop() fails to set PERF_HES_UPTODATE after updating the event.
--

commit 1d56e58b1d1592eb3b08b91d9c3d09bd821ef6ad
Author: Saurav Sachidanand <sauravsc@amazon.com>

perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop

This commit updates nv_c2c_pmu_stop() to read the final counter value when
called with PERF_EF_UPDATE, aiming to prevent lost counter deltas when the
perf core removes the event.

> diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
> index 662fa1bc833a5..39d0e661e4c5c 100644
> --- a/drivers/perf/nvidia_t410_c2c_pmu.c
> +++ b/drivers/perf/nvidia_t410_c2c_pmu.c
> @@ -394,6 +394,12 @@ static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
>  
>  static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
>  {
> +	if (event->hw.state & PERF_HES_STOPPED)
> +		return;

Does this early return permanently drop final counter updates?

If the core stops the event without requesting an update first (for example,
via pmu->stop(event, 0) during event throttling), the event is marked
PERF_HES_STOPPED.

When the core later attempts to drain the final unread counts by calling
pmu->stop(event, PERF_EF_UPDATE), the early return will trigger and the
final update will be entirely skipped.

> +
> +	if (pmu_flags & PERF_EF_UPDATE)
> +		nv_c2c_pmu_event_update(event);

Can this cause double-counting and massive counter wrap-around?

The perf core wraps event teardowns within perf_pmu_disable() and
perf_pmu_enable(). For this PMU, nv_c2c_pmu_disable() halts global
counters, calls nv_c2c_pmu_event_update() to accumulate final counts,
and then explicitly sets event->hw.prev_count to 0.

Subsequently, pmu->del() calls nv_c2c_pmu_stop() with PERF_EF_UPDATE, which
redundantly calls nv_c2c_pmu_event_update() again. Because prev_count was
just reset to 0 but the stopped hardware counter still retains its final
value V, the second update computes V - 0 = V and erroneously adds the full
lifetime count to event->count a second time.

Additionally, this second update overwrites prev_count with V. When the PMU
restarts, the physical counters clear to 0. The next update will read a
small new value W, calculate W - V, and cause a massive unsigned 64-bit
integer underflow.

Also, if the event is stopped logically without disabling the entire global
PMU, the hardware counter continues incrementing. Since nv_c2c_pmu_start()
is unmodified and does not synchronize prev_count to the new running
value, won't all counts accumulated while the event was logically stopped
still be incorrectly added to event->count upon the next update?

> +
>  	event->hw.state |= PERF_HES_STOPPED;

Should this also set PERF_HES_UPTODATE?

The standard PMU stop pattern usually sets PERF_HES_UPTODATE after an
update so the core knows the software counter is synchronized with the
hardware.

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514010629.76558-1-sauravsc@amazon.com?part=4

next prev parent reply	other threads:[~2026-05-14 17:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  1:06 [PATCH 1/4] perf: nvidia_t410_cmem_latency: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14  1:06 ` [PATCH 2/4] perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop Saurav Sachidanand
2026-05-14 12:49   ` sashiko-bot
2026-05-14 15:59   ` Besar Wicaksono
2026-05-14  1:06 ` [PATCH 3/4] perf: nvidia_t410_c2c: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14 15:57   ` Besar Wicaksono
2026-05-14  1:06 ` [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop Saurav Sachidanand
2026-05-14 16:00   ` Besar Wicaksono
2026-05-14 17:34   ` sashiko-bot [this message]
2026-05-14 15:55 ` [PATCH 1/4] perf: nvidia_t410_cmem_latency: fix cpuhp state leak on init failure Besar Wicaksono

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514173433.B51DAC2BCB3@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=sauravsc@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.