Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session

From: sashiko-bot@kernel.org
To: "Adrián Larumbe" <adrian.larumbe@collabora.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
Date: Thu, 04 Jun 2026 17:49:17 +0000
Message-ID: <20260604174918.2BC551F00893@smtp.kernel.org>
In-Reply-To: <20260604-claude-fixes-v2-5-57c6bd4c1655@collabora.com>

Thank you for your contribution! Sashiko AI review found 4 potential issues to consider:
- [High] Address Space reference (mmu->as_count) permanently leaks when hardware reset aborts perfcnt_enable_locked.
- [High] A TOCTOU race between disable_locked and the asynchronous reset workqueue reintroduces the mmu->as_count underflow WARN.
- [Medium] A positive error code (EAGAIN) is returned to userspace instead of the standard negative one (-EAGAIN).
- [Medium] Spurious initialization failures occur for new perfcnt sessions due to a stale hw_reset_happened flag.
--

commit 555dc068f79668462a998f3ee4fa95ead697090d
Author: Adrián Larumbe <adrian.larumbe@collabora.com>

drm/panfrost: Make reset sequence deal with an active HWPerf session

This commit introduces a perfcnt HW reset path to handle active HWPerf
sessions during a HW reset. It renders perfcnt unusable after reset and
notifies the user to take recovery measures.

Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")

> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index ad1156678e91c..c2087ea705fe6 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c

[ ... ]

> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
>  		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
>  
> +	/* If a reset happened, we've no way of knowing whether it was between the time we called
> +	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
> +	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
> +	 * between this and the next ioctl invocation.
> +	 */
> +	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
> +		ret = EAGAIN;

[Severity: Medium]
Does this return a positive error code directly to userspace instead of the
standard negative one (-EAGAIN)?

The DRM ioctl path passes positive return values straight back to user space
as the ioctl() result. User space will therefore read the call as a success
and only fail later, on the dump ioctl, because the session was never
actually enabled.
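
To illustrate the sign problem, here is a minimal userspace sketch. It is not
driver code: fake_ioctl() and caller_sees_failure() are hypothetical names
that only model the usual convention, where a negative handler return becomes
-1 with errno set and any other value is handed back unchanged.

```c
#include <errno.h>

/*
 * Hypothetical model of the syscall return path (an assumption for
 * illustration): a negative handler result is reported as -1 with errno
 * set, anything else is passed through to the caller as-is.
 */
static long fake_ioctl(int handler_ret)
{
	if (handler_ret < 0) {
		errno = -handler_ret;
		return -1;
	}
	return handler_ret;
}

/* The check a typical caller performs: only -1 counts as failure. */
static int caller_sees_failure(int handler_ret)
{
	return fake_ioctl(handler_ret) == -1;
}
```

With this model, a handler returning EAGAIN is not seen as a failure by the
caller, while one returning -EAGAIN is, with errno set to EAGAIN.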

> +		goto err_disable;

[Severity: High]
Can this leak the Address Space reference if a hardware reset aborts the
enable sequence?

If a GPU reset occurs before panfrost_mmu_as_get() is called, the reset
handler clears mmu->as_count to 0, and then panfrost_mmu_as_get() increments
it to 1. The code subsequently detects the reset via the hw_reset_happened
flag, clears it, and jumps to err_disable here. Because err_disable skips
panfrost_mmu_as_put(), the AS slot reference is permanently leaked,
eventually exhausting the limited hardware AS slots.
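
The leak described above can be shown with a toy model of goto-based
unwinding. Everything here is hypothetical (struct toy_mmu, toy_as_get(),
toy_as_put(), toy_enable(), and the err_as_put label are not from the patch);
the sketch only assumes that the abort path runs after the AS reference was
taken.

```c
#include <errno.h>

/* Hypothetical stand-in for the mmu->as_count bookkeeping. */
struct toy_mmu {
	int as_count;
};

static void toy_as_get(struct toy_mmu *mmu) { mmu->as_count++; }
static void toy_as_put(struct toy_mmu *mmu) { mmu->as_count--; }

/*
 * reset_seen models a set hw_reset_happened flag. release_on_abort
 * selects whether the abort path drops the AS reference (1) or jumps
 * past the put, as the review describes for err_disable (0).
 */
static int toy_enable(struct toy_mmu *mmu, int reset_seen, int release_on_abort)
{
	int ret;

	toy_as_get(mmu);

	if (reset_seen) {
		ret = -EAGAIN;
		if (release_on_abort)
			goto err_as_put;
		goto err_disable;
	}

	return 0;

err_as_put:
	toy_as_put(mmu);
err_disable:
	/* The GPU counter disable would happen here in a real driver. */
	return ret;
}
```

In the model, aborting without the put leaves as_count at 1 even though
enable failed; routing the abort through a label that drops the reference
leaves it at 0.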

> +	}
> +
>  	/* The BO ref is retained by the mapping. */
>  	drm_gem_object_put(&bo->base);

[ ... ]

> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
>  	perfcnt->buf = NULL;
>  	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
> -	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
> +	if (!atomic_read(&perfcnt->hw_reset_happened))
> +		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);

[Severity: High]
Could a race between this check and an asynchronous reset reintroduce the
mmu->as_count underflow WARN that this patch was explicitly intended to
prevent?

The flag read and the call to panfrost_mmu_as_put() are not atomic with
respect to the reset worker. If an asynchronous hardware reset runs in that
window, panfrost_device_reset() will clear mmu->as_count to 0. The subsequent
panfrost_mmu_as_put() then decrements it to -1, directly triggering the WARN.
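
The window can be reproduced deterministically in a single-threaded sketch by
injecting the reset between the check and the decrement. All names below
(struct toy_state, toy_reset(), racy_put(), put_if_positive()) are
hypothetical. The compare-and-swap variant is only one way to fold the check
and the decrement into a single atomic step in this toy model; it is not
claimed to be the right fix for the driver, where the answer more likely
depends on how the put is serialized against the reset.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the two pieces of shared state. */
struct toy_state {
	atomic_int as_count;
	atomic_int hw_reset_happened;
};

/* Models the reset path: zero the count, then raise the flag. */
static void toy_reset(struct toy_state *s)
{
	atomic_store(&s->as_count, 0);
	atomic_store(&s->hw_reset_happened, 1);
}

/*
 * Check-then-act. If interleave is non-NULL it runs in the gap between
 * the flag read and the decrement, standing in for a concurrent reset.
 * Returns the resulting count.
 */
static int racy_put(struct toy_state *s, void (*interleave)(struct toy_state *))
{
	if (!atomic_load(&s->hw_reset_happened)) {
		if (interleave)
			interleave(s);
		return atomic_fetch_sub(&s->as_count, 1) - 1;
	}
	return atomic_load(&s->as_count);
}

/*
 * Decrement only while the count is positive, using a CAS loop so the
 * test and the update cannot be separated. Returns the resulting count.
 */
static int put_if_positive(struct toy_state *s, void (*interleave)(struct toy_state *))
{
	int old;

	if (interleave)
		interleave(s);

	old = atomic_load(&s->as_count);
	while (old > 0 &&
	       !atomic_compare_exchange_weak(&s->as_count, &old, old - 1))
		; /* old was refreshed by the failed CAS, retry */

	return old > 0 ? old - 1 : old;
}
```

Starting from a count of 1, racy_put() with toy_reset() injected ends at -1,
while put_if_positive() under the same interleaving ends at 0.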

>  	panfrost_gem_mapping_put(perfcnt->mapping);
>  	perfcnt->mapping = NULL;
>  	pm_runtime_put_autosuspend(pfdev->base.dev);

[ ... ]

> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  	/* Disable everything before leaving. */
>  	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> +
> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
> +{
> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
> +
> +	/* Since this function will be called either from a scheduled HW reset
> +	 * or a runtime resume, tearing down any perfcnt resources means we're
> +	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
> +	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
> +	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
> +	 * lock taken in perfcnt_enable. In light of this, the only thing we can do
> +	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
> +	 * the reset having happpened so that they can take recovery measures.
> +	 */
> +	panfrost_perfcnt_gpu_disable(pfdev);
> +	atomic_set(&perfcnt->hw_reset_happened, 1);

[Severity: Medium]
Will this unconditionally set the reset flag even if no perfcnt session is
active, causing the next new session to spuriously fail?

Since this function is called during any GPU reset, it sets the flag even
when perfcnt->user == NULL. If user space later attempts to start a
completely new session, panfrost_perfcnt_enable_locked() will consume this
stale flag and abort with EAGAIN, functionally breaking the feature on the
first attempt after a reset.
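
A toy model of the stale flag, for illustration only. struct toy_perfcnt and
the toy_* functions are hypothetical, and the guarded variant ignores the
question of whether perfcnt->user can be read safely from the reset path
without the perfcnt lock, which would need to be settled for the real driver.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the perfcnt state. */
struct toy_perfcnt {
	void *user;                   /* non-NULL while a session is active */
	atomic_int hw_reset_happened;
};

/* Models the behaviour in the patch: always raise the flag. */
static void toy_reset_unconditional(struct toy_perfcnt *p)
{
	atomic_store(&p->hw_reset_happened, 1);
}

/* Assumed alternative: only raise the flag if a session exists. */
static void toy_reset_guarded(struct toy_perfcnt *p)
{
	if (p->user)
		atomic_store(&p->hw_reset_happened, 1);
}

/* Models the enable path consuming the flag (1 -> 0) and bailing out. */
static int toy_enable(struct toy_perfcnt *p, void *user)
{
	int expected = 1;

	if (atomic_compare_exchange_strong(&p->hw_reset_happened, &expected, 0))
		return -EAGAIN;

	p->user = user;
	return 0;
}
```

In the model, a reset with no active session followed by a first enable
fails with -EAGAIN under the unconditional variant and succeeds on the retry;
with the guarded variant the first enable succeeds.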

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=5


Thread overview: 25+ messages
2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
2026-06-04 17:57   ` Boris Brezillon
2026-06-05 10:29   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
2026-06-04 17:44   ` sashiko-bot
2026-06-04 18:02   ` Boris Brezillon
2026-06-05 10:29     ` Steven Price
2026-06-04 17:35 ` [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down Adrián Larumbe
2026-06-04 18:04   ` Boris Brezillon
2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
2026-06-04 17:47   ` sashiko-bot
2026-06-04 18:05   ` Boris Brezillon
2026-06-05 10:34   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
2026-06-04 17:49   ` sashiko-bot [this message]
2026-06-04 18:26   ` Boris Brezillon
2026-06-05 10:41     ` Steven Price
2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
2026-06-04 17:50   ` sashiko-bot
2026-06-04 18:36   ` Boris Brezillon
2026-06-05 10:48   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
2026-06-04 17:55   ` sashiko-bot
2026-06-05  6:56   ` Boris Brezillon