Re: [PATCH 2/4] perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Saurav Sachidanand" <sauravsc@amazon.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 2/4] perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop
Date: Thu, 14 May 2026 12:49:06 +0000	[thread overview]
Message-ID: <20260514124907.0C52CC2BCB3@smtp.kernel.org> (raw)
In-Reply-To: <20260514010629.76558-2-sauravsc@amazon.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Performing MMIO reads on clock-gated hardware in pmu->stop() will trigger a system crash (SError).
- [High] API Contract Violation and conditional update logic error for free-running PMU counters.
- [High] Use-After-Free of PMU context during CPU hotplug teardown.
--

commit 1c17f7a02936b83432204f0c38f9d1d22f6f4acd
Author: Saurav Sachidanand <sauravsc@amazon.com>

perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop

This commit updates cmem_lat_pmu_stop to return early if the event is
already stopped, and conditionally updates the event if the update flag is
requested.

> cmem_lat_pmu_stop() does not read the final counter value when called
> with PERF_EF_UPDATE. When perf core calls pmu->del() -> pmu->stop()
> with PERF_EF_UPDATE, the last counter delta is lost because the event
> is marked stopped without reading hardware.

Is this description accurate?

Looking at cmem_lat_pmu_disable(), it explicitly stops the hardware, reads
and updates all active events, and clears the counters. Since pmu->stop()
is called within the perf_pmu_disable() block, the final count should already
be unconditionally read before shutting down the hardware, meaning the last
counter delta might not be lost.

> diff --git a/drivers/perf/nvidia_t410_cmem_latency_pmu.c b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
> --- a/drivers/perf/nvidia_t410_cmem_latency_pmu.c
> +++ b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
> @@ -303,6 +303,12 @@ static void cmem_lat_pmu_start(struct perf_event *event, int pmu_flags)
>  
>  static void cmem_lat_pmu_stop(struct perf_event *event, int pmu_flags)
>  {
> +	if (event->hw.state & PERF_HES_STOPPED)
> +		return;

Does this early return violate the perf core API contract by ignoring
subsequent PERF_EF_UPDATE requests for already-stopped events?

> +
> +	if (pmu_flags & PERF_EF_UPDATE)
> +		cmem_lat_pmu_event_update(event);

Will this trigger an asynchronous external abort (SError) and crash
the system?

The perf core calls pmu->stop() within a perf_pmu_disable() block.
In this driver, cmem_lat_pmu_disable() gates the module clock. Calling
cmem_lat_pmu_event_update() performs MMIO reads on the PMU registers.
Since this happens while the PMU clock is gated, these MMIO accesses will
target unclocked registers.

Additionally, does this conditional update cause regressions for free-running
PMU counters?

Since this PMU uses free-running global counters without per-event hardware
stops, conditionally updating the event only when PERF_EF_UPDATE is set means
calls like pmu->stop(event, 0) will fail to capture the counter delta. This
allows the hardware counter to continue incrementing while the event is
logically stopped, which could corrupt subsequent readings.

> +
>  	event->hw.state |= PERF_HES_STOPPED;
>  }

While reviewing this driver, I noticed an existing regression in the hotplug
teardown path.

Can a use-after-free occur during CPU hotplug teardown?

In cmem_lat_pmu_device_remove(), perf_pmu_unregister() is called before
cpuhp_state_remove_instance(). perf_pmu_unregister() detaches all events and
frees internal core PMU resources.

When cpuhp_state_remove_instance() runs sequentially after it, it executes
the hotplug teardown callback cmem_lat_pmu_cpu_teardown(), which invokes
perf_pmu_migrate_context() with the already unregistered PMU pointer. This
accesses partially freed perf core state, which might cause a kernel panic
upon unbind or failed initialization.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514010629.76558-1-sauravsc@amazon.com?part=2

next prev parent reply	other threads:[~2026-05-14 12:49 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  1:06 [PATCH 1/4] perf: nvidia_t410_cmem_latency: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14  1:06 ` [PATCH 2/4] perf: nvidia_t410_cmem_latency: handle PERF_EF_UPDATE in stop Saurav Sachidanand
2026-05-14 12:49   ` sashiko-bot [this message]
2026-05-14  1:06 ` [PATCH 3/4] perf: nvidia_t410_c2c: fix cpuhp state leak on init failure Saurav Sachidanand
2026-05-14  1:06 ` [PATCH 4/4] perf: nvidia_t410_c2c: handle PERF_EF_UPDATE in stop Saurav Sachidanand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514124907.0C52CC2BCB3@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=sauravsc@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox