From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E3D6ECD98E1 for ; Tue, 16 Jun 2026 19:15:31 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 410CE10E822; Tue, 16 Jun 2026 19:15:31 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (1024-bit key; unprotected) header.d=collabora.com header.i=adrian.larumbe@collabora.com header.b="Z8xsV58j"; dkim-atps=neutral Received: from sender4-op-o11.zoho.com (sender4-op-o11.zoho.com [136.143.188.11]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8BFDA10E822 for ; Tue, 16 Jun 2026 19:15:29 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; t=1781637319; cv=none; d=zohomail.com; s=zohoarc; b=IpBJUWTdT7RT9QRbJ3zpzfssnUt0WtMyMFxGNkUyH49g52zBtgc9rlFVdKFONax1HedyGq43EW3QvI1dHfHTVX/wE9s7ZcOLOK4WxYVL2gkw9MC8wz6v5O0dTmk7cyznsP4Bq3zg0ZbNk3pKnCNJFupKYgYysoFEP0QBmZrKGD0= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1781637319; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=BAZfFJhK7BAlmOVgBfnDcsb9o7dXLckamZVFSn2NDfA=; b=Mn7jFQXvvHHtnMpU+JepUf2ROOFlStqk1W5x9eo5c18cDn4QK5hDMgBs2FsAYBo/+nYEWkb5KCklHFdGY/W3Md0H68GkA3K3/xAw6+GMxcGc1cw9CmEMRqFeEc14Y5U4yEMKXx4bb+0bcsiHFiD62njoIl0+eTNPZbFbg8fFc7U= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=collabora.com; spf=pass smtp.mailfrom=adrian.larumbe@collabora.com; dmarc=pass header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1781637319; s=zohomail; d=collabora.com; i=adrian.larumbe@collabora.com; h=Date:Date:From:From:To:To:Cc:Cc:Subject:Subject:Message-ID:References:MIME-Version:Content-Type:Content-Transfer-Encoding:In-Reply-To:Message-Id:Reply-To; bh=BAZfFJhK7BAlmOVgBfnDcsb9o7dXLckamZVFSn2NDfA=; b=Z8xsV58jBrtQ5hELlPowch8zBSdZTE/KLdMawvzDMCtFZW3YNgNJYs8L5UNRtqKQ V0bm4x9U02HgLfGyC4KEd2QSLtIvf0bc+nI1rxhcPGYElvesCT8J9n5ud4aKrQhjpPv K7+GL2cEO41SW3SJ6UV6aRGBqsU0GlzR/oqrW6Hc= Received: by mx.zohomail.com with SMTPS id 1781637317246786.2221356183986; Tue, 16 Jun 2026 12:15:17 -0700 (PDT) Date: Tue, 16 Jun 2026 20:15:11 +0100 From: =?utf-8?Q?Adri=C3=A1n?= Larumbe To: Steven Price Cc: Boris Brezillon , Rob Herring , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Simona Vetter , Faith Ekstrand , "Marty E. Plummer" , Tomeu Vizoso , Eric Anholt , Alyssa Rosenzweig , Robin Murphy , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Collabora Kernel Team , Neil Armstrong , Claude Subject: Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Message-ID: References: <20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com> <20260604-claude-fixes-v2-5-57c6bd4c1655@collabora.com> <20260604202613.4afe53f1@fedora-2.home> <9ec2de4c-9404-4c9b-9d86-c5aad9cde721@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9ec2de4c-9404-4c9b-9d86-c5aad9cde721@arm.com> X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On 05.06.2026 11:41, Steven Price wrote: >On 04/06/2026 19:26, Boris Brezillon wrote: >> On Thu, 04 Jun 2026 18:35:24 +0100 >> Adrián Larumbe wrote: >> >>> Right now, if there's a HW reset and an HWPerf session is active, >>> panfrost_mmu_reset() will reset the AS count for every single open file's >>> mmu struct back to 0, and also invalidate their AS numbers. Then, when >>> disabling hwperf, panfrost_mmu_as_put() will WARN that mmu->as_count is >>> less than zero. >>> >>> Fix this by introducing a perfcnt HW reset path. >>> >>> The choice was made to render perfcnt unusable after reset, so that a >>> user might have to reprogram it with a full disable/enable sequence >>> before requesting more perfcnt dumps. >> >> Can't we do better than that. We store the config of the perf session, >> so if a reset is in progress, we can simply block on it, restore the >> old config in the post_reset path, and get going with the DUMP request >> once the reset has been done. There's certainly something to do to >> report the discontinuity to userspace, but that's something we can >> reflect through and extra field added to drm_panfrost_perfcnt_dump. > >I agree with Boris' point, ideally reset should be as transparent as >possible. Reporting that it has happened is useful, but there's no need >to force user space into restoring the context unnecessarily. > >One other minor issue below. I think the fact that this is going to require a non-negligible amount of effort in thinking out a stat-save-and-restore sequence for perfcnt in the event of a reset, I guess I'd better leave it off for another series of its own. >>> >>> Reported-by: Claude >>> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88 >>> Signed-off-by: Adrián Larumbe >>> Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls") >>> --- >>> drivers/gpu/drm/panfrost/panfrost_device.c | 1 + >>> drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 46 ++++++++++++++++++++++++++++- >>> drivers/gpu/drm/panfrost/panfrost_perfcnt.h | 1 + >>> 3 files changed, 47 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c >>> index 87b372c9e675..2805d50c1b9b 100644 >>> --- a/drivers/gpu/drm/panfrost/panfrost_device.c >>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c >>> @@ -426,6 +426,7 @@ bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev, >>> >>> void panfrost_device_reset(struct panfrost_device *pfdev, bool enable_job_int) >>> { >>> + panfrost_perfcnt_reset(pfdev); >>> panfrost_gpu_soft_reset(pfdev); >>> >>> panfrost_gpu_power_on(pfdev); >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c >>> index ad1156678e91..c2087ea705fe 100644 >>> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c >>> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c >>> @@ -33,6 +33,7 @@ struct panfrost_perfcnt { >>> struct panfrost_file_priv *user; >>> struct mutex lock; >>> struct completion dump_comp; >>> + atomic_t hw_reset_happened; >>> }; >>> >>> static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev) >>> @@ -57,9 +58,13 @@ void panfrost_perfcnt_sample_done(struct panfrost_device *pfdev) >>> >>> static int panfrost_perfcnt_dump_locked(struct panfrost_device *pfdev) >>> { >>> + struct panfrost_perfcnt *perfcnt = pfdev->perfcnt; >>> u64 gpuva; >>> int ret; >>> >>> + if (atomic_read(&perfcnt->hw_reset_happened)) >>> + return -EIO; >>> + >>> reinit_completion(&pfdev->perfcnt->dump_comp); >>> gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT; >>> gpu_write(pfdev, GPU_PERFCNT_BASE_LO, lower_32_bits(gpuva)); >>> @@ -140,6 +145,15 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev, >>> goto err_vunmap; >>> } >>> >>> + /* If a reset is ongoing, the AS we get right below will be torn >>> + * down, so rather than waiting until this becomes obvious in a >>> + * perfcnt_dump() ioctl, we ask the user to try again slightly later. >>> + */ >>> + if (atomic_read(&pfdev->reset.pending)) { >>> + ret = -EAGAIN; >>> + goto err_vunmap; >>> + } >>> + >>> ret = panfrost_mmu_as_get(pfdev, perfcnt->mapping->mmu); >>> if (ret < 0) >>> goto err_vunmap; >>> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev, >>> if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186)) >>> gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff); >>> >>> + /* If a reset happened, we've no way of knowing whether it was between the time we called >>> + * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward >>> + * isn't possible. We must clear the flag and try again in the hopes no resets will happen >>> + * between this and the next ioctl invocation. >>> + */ >>> + if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) { >>> + ret = EAGAIN; > >Missing '-' on EAGAIN. Acked. >>> + goto err_disable; >>> + } >> >> This should really be transparent to the user, apart from reporting >> that samples might have been lost because of the reset. >> >>> + >>> /* The BO ref is retained by the mapping. */ >>> drm_gem_object_put(&bo->base); >>> >>> @@ -180,6 +204,8 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev, >>> >>> return 0; >>> >>> +err_disable: > >We're leaking the AS reference here - you need a panfrost_mmu_as_put() call. Acked, will fix. >>> + panfrost_perfcnt_gpu_disable(pfdev); >>> err_vunmap: >>> drm_gem_vunmap(&bo->base, &map); >>> err_put_mapping: >>> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev, >>> drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map); >>> perfcnt->buf = NULL; >>> panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv); >>> - panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu); >>> + if (!atomic_read(&perfcnt->hw_reset_happened)) >>> + panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu); > >The atomic_read() scares me - is this really safe and not subject to races? You're right, it's racy because the thread executing the ioctl could be scheduled out right before panfrost_mmu_as_put() is executed, then a reset come through in the meantime, and by the time it's resumed the as_count would again be < 0. I guess I need a proper spinlock to protect access to the mmu struct from perfcnt. >Thanks, >Steve > >> >> Why not call panfrost_mmu_as_put() in panfrost_perfcnt_pre_reset(), >> which is called before panfrost_mmu_reset()? If we do that, the AS >> should be returned, and we can add a panfrost_perfcnt_post_reset() >> that's called after panfrost_mmu_reset() and which re-acquires the AS >> and re-instantiate the perfcnt settings. >> >>> panfrost_gem_mapping_put(perfcnt->mapping); >>> perfcnt->mapping = NULL; >>> pm_runtime_put_autosuspend(pfdev->base.dev); >>> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev) >>> /* Disable everything before leaving. */ >>> panfrost_perfcnt_gpu_disable(pfdev); >>> } >>> + >>> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev) >>> +{ >>> + struct panfrost_perfcnt *perfcnt = pfdev->perfcnt; >>> + >>> + /* Since this function will be called either from a scheduled HW reset >>> + * or a runtime resume, tearing down any perfcnt resources means we're >>> + * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have >>> + * to take the perfecnt lock. On top of that, it'd also violate DMA fence >>> + * signalling rules because GFP_KERNEL allocations are made with the perfcnt >>> + * lock taken in perfcnt_enable. >> >> Question is, do we really need these allocation to happen with the lock >> held? And if yes, can't we protect perfcnt ops with a separate reset >> lock? >> >>> In light of this, the only thing we can do >>> + * is disabling perfcnt unconditionally, and notifying the perfcnt user of >>> + * the reset having happpened so that they can take recovery measures. >> >> Informing userspace about the discontinuity is fine, but I think we >> should try to restore the config in a post_reset() hook. >> >>> + */ >>> + panfrost_perfcnt_gpu_disable(pfdev); >>> + atomic_set(&perfcnt->hw_reset_happened, 1); >>> +} >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h >>> index 8bbcf5f5fb33..8b9bc704b634 100644 >>> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h >>> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h >>> @@ -14,5 +14,6 @@ int panfrost_ioctl_perfcnt_enable(struct drm_device *dev, void *data, >>> struct drm_file *file_priv); >>> int panfrost_ioctl_perfcnt_dump(struct drm_device *dev, void *data, >>> struct drm_file *file_priv); >>> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev); >>> >>> #endif >>> >> Adrian Larumbe