From: sashiko-bot@kernel.org
Subject: Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
To: Adrián Larumbe
Cc: dri-devel@lists.freedesktop.org
In-Reply-To: <20260604-claude-fixes-v2-5-57c6bd4c1655@collabora.com>
References: <20260604-claude-fixes-v2-5-57c6bd4c1655@collabora.com>
Date: Thu, 04 Jun 2026 17:49:17 +0000
Message-Id: <20260604174918.2BC551F00893@smtp.kernel.org>
Reply-To: sashiko-reviews@lists.linux.dev

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Address Space reference (mmu->as_count) permanently leaks when hardware reset aborts perfcnt_enable_locked.
- [High] A TOCTOU race between disable_locked and the asynchronous reset workqueue reintroduces the mmu->as_count underflow WARN.
- [Medium] A positive error code (EAGAIN) is incorrectly returned directly to userspace instead of the negative standard (-EAGAIN).
- [Medium] Spurious initialization failures occur for new perfcnt sessions due to a stale hw_reset_happened flag.
--

commit 555dc068f79668462a998f3ee4fa95ead697090d
Author: Adrián Larumbe

drm/panfrost: Make reset sequence deal with an active HWPerf session

This commit introduces a perfcnt HW reset path to handle active HWPerf
sessions during a HW reset. It renders perfcnt unusable after reset and
notifies the user to take recovery measures.

Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")

> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index ad1156678e91c..c2087ea705fe6 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c

[ ... ]

> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
>  		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
>  
> +	/* If a reset happened, we've no way of knowing whether it was between the time we called
> +	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
> +	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
> +	 * between this and the next ioctl invocation.
> +	 */
> +	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
> +		ret = EAGAIN;

[Severity: Medium]
Does this return a positive error code directly to userspace instead of the
standard negative one (-EAGAIN)?

DRM ioctls pass positive return values directly back to user space as the
ioctl syscall result. Because of this, user space will incorrectly interpret
this as success and subsequently fail on dump ioctls since the session
wasn't actually enabled.

> +		goto err_disable;

[Severity: High]
Can this leak the Address Space reference if a hardware reset aborts the
enable sequence?

If a GPU reset occurs before panfrost_mmu_as_get() is called, the reset
handler clears mmu->as_count to 0, and then panfrost_mmu_as_get() increments
it to 1.

The code subsequently detects the reset via the hw_reset_happened flag,
clears it, and jumps to err_disable here. Because err_disable skips
panfrost_mmu_as_put(), the AS slot reference is permanently leaked,
eventually exhausting the limited hardware AS slots.

> +	}
> +
>  	/* The BO ref is retained by the mapping. */
>  	drm_gem_object_put(&bo->base);

[ ... ]

> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
>  	perfcnt->buf = NULL;
>  	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
> -	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
> +	if (!atomic_read(&perfcnt->hw_reset_happened))
> +		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);

[Severity: High]
Could a race between this check and an asynchronous reset reintroduce the
mmu->as_count underflow WARN that this patch was explicitly intended to
prevent?

The non-atomic gap between reading the flag and executing
panfrost_mmu_as_put() creates an exposure window. If an interleaved
asynchronous hardware reset occurs in this window, panfrost_device_reset()
will clear mmu->as_count to 0.

The subsequent execution of panfrost_mmu_as_put() decrements it to -1,
directly triggering the WARN.

>  	panfrost_gem_mapping_put(perfcnt->mapping);
>  	perfcnt->mapping = NULL;
>  	pm_runtime_put_autosuspend(pfdev->base.dev);

[ ... ]

> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  	/* Disable everything before leaving. */
>  	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> +
> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
> +{
> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
> +
> +	/* Since this function will be called either from a scheduled HW reset
> +	 * or a runtime resume, tearing down any perfcnt resources means we're
> +	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
> +	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
> +	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
> +	 * lock taken in perfcnt_enable. In light of this, the only thing we can do
> +	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
> +	 * the reset having happpened so that they can take recovery measures.
> +	 */
> +	panfrost_perfcnt_gpu_disable(pfdev);
> +	atomic_set(&perfcnt->hw_reset_happened, 1);

[Severity: Medium]
Will this unconditionally set the reset flag even if no perfcnt session is
active, causing the next new session to spuriously fail?

Since this function is called during any GPU reset, it sets the flag even
when perfcnt->user == NULL. If user space later attempts to start a
completely new session, panfrost_perfcnt_enable_locked() will consume this
stale flag and abort with EAGAIN, functionally breaking the feature on the
first attempt after a reset.

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=5
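
For the two comments on the enable path (positive EAGAIN and the AS
reference not being dropped on the error path), the shape being asked for
can be sketched in user space as follows. This is only a model under stated
assumptions: struct model_dev, model_as_get(), model_as_put() and
model_enable() are hypothetical stand-ins, not panfrost functions, and the
refcount is a plain int rather than the driver's real bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state touched by the enable path. */
struct model_dev {
	int as_count;        /* models mmu->as_count */
	bool reset_happened; /* models perfcnt->hw_reset_happened */
};

static void model_as_get(struct model_dev *d)
{
	d->as_count++;
}

static void model_as_put(struct model_dev *d)
{
	assert(d->as_count > 0); /* models the underflow WARN */
	d->as_count--;
}

/*
 * Returns 0 on success or a negative errno. Any failure after
 * model_as_get() unwinds through a label that drops the reference,
 * so the count is balanced on every exit path.
 */
static int model_enable(struct model_dev *d)
{
	int ret;

	model_as_get(d);

	if (d->reset_happened) {
		d->reset_happened = false; /* consume the flag */
		ret = -EAGAIN;             /* negative, so callers see a failure */
		goto err_put_as;
	}

	return 0;

err_put_as:
	model_as_put(d);
	return ret;
}
```

With the flag set before the call, model_enable() reports -EAGAIN and leaves
as_count where it started; a second call then succeeds and holds exactly one
reference.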
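
The check-then-put window described for panfrost_perfcnt_disable_locked()
can be replayed deterministically in a single-threaded model, together with
one possible way to close it: folding the check and the decrement into a
single atomic step, similar in spirit to the kernel's
atomic_dec_if_positive(). Whether that fits the driver's locking is an
assumption, not something the patch or this review establishes. All
model_* names are hypothetical, and model_reset() only mirrors what the
review says the reset path does to the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: as_count stands in for mmu->as_count. */
struct model_mmu {
	atomic_int as_count;
};

/* Models the reset path zeroing the counter. */
static void model_reset(struct model_mmu *m)
{
	atomic_store(&m->as_count, 0);
}

/* Racy shape: a flag was checked earlier, the decrement is unconditional. */
static int model_put_unchecked(struct model_mmu *m)
{
	return atomic_fetch_sub(&m->as_count, 1) - 1; /* new value */
}

/*
 * Drop a reference only while the count is positive, as one atomic step.
 * Returns true if a reference was actually dropped.
 */
static bool model_put_if_held(struct model_mmu *m)
{
	int old = atomic_load(&m->as_count);

	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&m->as_count, &old, old - 1))
			return true;
	}
	return false;
}

/* Replays the interleaving: flag check, then reset, then put. */
static void model_demo(void)
{
	struct model_mmu m;

	atomic_init(&m.as_count, 1);
	model_reset(&m);                       /* lands after the flag check */
	assert(model_put_unchecked(&m) == -1); /* underflow */

	atomic_store(&m.as_count, 1);
	model_reset(&m);
	assert(!model_put_if_held(&m));        /* nothing left to drop */
	assert(atomic_load(&m.as_count) == 0);
}
```

The point of the second variant is that no separate flag read sits between
the decision and the decrement, so a reset that zeroes the counter first
simply turns the put into a no-op.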