[PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost
@ 2026-06-04 17:35 Adrián Larumbe
  2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong, Claude

This patch series comes with some minor fixes for Panfrost. Some of the
issues addressed were uncovered by LLM tools, while the one about RPM was
traditionally debugged.

The only proper bug fixes are the ones for perfcnt and RPM, and the others
are more about ensuring robustness. perfcnt bug was never observed in the
open, because no one had faced a HW reset when a perfcnt session was
ongoing. RPM fix addresses a well-known issue I had been observing for
quite some time, but that I had completely misunderstood.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
Changes in v2:
- Fixed race conditions introduced by the previous revision, in the RPM and
HWPerf commits specifically. These can be consulted at [1]
- Don't attempt to suspend the device unconditionally at the end of
probe(), and do it through autosuspend instead.
- Broke the RPM patch into one that fixes PM refcnt proper and another one
for MMU enablement at device init time.
- Moved perfcnt GPU disable helper into its own commmit. Same for
transplating shrinker initialisation and unplug.
- Added 'Fixes' and 'Reported-by' tags to all the relevant commits
- [1] https://sashiko.dev/#/patchset/20260526-claude-fixes-v1-0-16e92eaa4949%40collabora.com
- Link to v1: https://patch.msgid.link/20260526-claude-fixes-v1-0-16e92eaa4949@collabora.com

To: Boris Brezillon <boris.brezillon@collabora.com>
To: Rob Herring <robh@kernel.org>
To: Steven Price <steven.price@arm.com>
To: Adrián Larumbe <adrian.larumbe@collabora.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Faith Ekstrand <faith.ekstrand@collabora.com>
To: "Marty E. Plummer" <hanetzer@startmail.com>
To: Tomeu Vizoso <tomeu@tomeuvizoso.net>
To: Eric Anholt <eric@anholt.net>
To: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Neil Armstrong <neil.armstrong@linaro.org>

---
Adrián Larumbe (7):
      drm/panfrost: Check another bo field for cache option query
      drm/panfrost: Prevent division by 0
      drm/panfrost: Move shrinker initialization and unplug one level down
      drm/panfrost: Move perfcnt GPU disable sequence into a helper
      drm/panfrost: Make reset sequence deal with an active HWPerf session
      drm/panfrost: Fix PM usage_count mishandling
      drm/panfrost: Explicitly enable MMU interrupts at device init

 drivers/gpu/drm/panfrost/panfrost_devfreq.c      |  3 +-
 drivers/gpu/drm/panfrost/panfrost_device.c       |  8 +++
 drivers/gpu/drm/panfrost/panfrost_drv.c          | 14 ++---
 drivers/gpu/drm/panfrost/panfrost_gem.h          |  4 +-
 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c |  8 +--
 drivers/gpu/drm/panfrost/panfrost_mmu.c          | 11 +++-
 drivers/gpu/drm/panfrost/panfrost_mmu.h          |  2 +
 drivers/gpu/drm/panfrost/panfrost_perfcnt.c      | 77 ++++++++++++++++++------
 drivers/gpu/drm/panfrost/panfrost_perfcnt.h      |  1 +
 9 files changed, 90 insertions(+), 38 deletions(-)
---
base-commit: c1079aebb4de218caa86c44f9a53700d1a582683
change-id: 20260523-claude-fixes-82b03a134a8c

Best regards,
--  
Adrián Larumbe <adrian.larumbe@collabora.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:57   ` Boris Brezillon
  2026-06-05 10:29   ` Steven Price
  2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong, Claude

When querying the properties of a BO, whether it was created in
Writeback caching mode for userspace mappings, is recorded in a Panfrost
BO's wb_mmap field.

Reported-by: Claude <noreply@anthropic.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 62eedf1ccba5 ("drm/panfrost: Add flag to map GEM object Write-Back Cacheable")
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3d0bdba2a474..cb8e5015847f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -659,7 +659,7 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
 		if (bo->is_heap)
 			args->create_flags |= PANFROST_BO_HEAP;
 
-		if (!bo->base.map_wc)
+		if (bo->wb_mmap)
 			args->create_flags |= PANFROST_BO_WB_MMAP;
 	}
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/7] drm/panfrost: Prevent division by 0
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
  2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:44   ` sashiko-bot
  2026-06-04 18:02   ` Boris Brezillon
  2026-06-04 17:35 ` [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down Adrián Larumbe
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong, Claude

When updating and debug-printing devfreq stats, in the very unlikely
offchance that total device time is less than 100 ns, clamp it to 0 to
avoid division by 0.

Reported-by: Claude <noreply@anthropic.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
---
 drivers/gpu/drm/panfrost/panfrost_devfreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
index b51c30778811..e34de47bf43b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
@@ -78,7 +78,8 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
 
 	dev_dbg(pfdev->base.dev, "busy %lu total %lu %lu %% freq %lu MHz\n",
 		status->busy_time, status->total_time,
-		status->busy_time / (status->total_time / 100),
+		status->total_time >= 100 ?
+			status->busy_time / (status->total_time / 100) : 0,
 		status->current_frequency / 1000 / 1000);
 
 	return 0;

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
  2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
  2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 18:04   ` Boris Brezillon
  2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong

Since the moment we call drm_dev_register() the device should be in a
position to accept jobs, so it's best if the shrinker is already
initialized by then.

On top of that, make shrinker functions take an panfrost_device pointer
like other functions in the same sequence and rename them accordingly.

Essentially mimic the init/fini behaviour in Panthor.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.c       | 7 +++++++
 drivers/gpu/drm/panfrost/panfrost_drv.c          | 6 ------
 drivers/gpu/drm/panfrost/panfrost_gem.h          | 4 ++--
 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 8 ++------
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
index 7fed22d555a5..87b372c9e675 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -282,9 +282,15 @@ int panfrost_device_init(struct panfrost_device *pfdev)
 	if (err)
 		goto out_job;
 
+	err = panfrost_gem_shrinker_init(pfdev);
+	if (err)
+		goto out_perfcnt;
+
 	panfrost_gem_init(pfdev);
 
 	return 0;
+out_perfcnt:
+	panfrost_perfcnt_fini(pfdev);
 out_job:
 	panfrost_jm_fini(pfdev);
 out_mmu:
@@ -306,6 +312,7 @@ int panfrost_device_init(struct panfrost_device *pfdev)
 
 void panfrost_device_fini(struct panfrost_device *pfdev)
 {
+	panfrost_gem_shrinker_fini(pfdev);
 	panfrost_perfcnt_fini(pfdev);
 	panfrost_jm_fini(pfdev);
 	panfrost_mmu_fini(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index cb8e5015847f..2d4b6aa95c66 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -1000,14 +1000,9 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (err < 0)
 		goto err_out1;
 
-	err = panfrost_gem_shrinker_init(&pfdev->base);
-	if (err)
-		goto err_out2;
 
 	return 0;
 
-err_out2:
-	drm_dev_unregister(&pfdev->base);
 err_out1:
 	pm_runtime_disable(pfdev->base.dev);
 	panfrost_device_fini(pfdev);
@@ -1021,7 +1016,6 @@ static void panfrost_remove(struct platform_device *pdev)
 	struct panfrost_device *pfdev = platform_get_drvdata(pdev);
 
 	drm_dev_unregister(&pfdev->base);
-	panfrost_gem_shrinker_cleanup(&pfdev->base);
 
 	pm_runtime_get_sync(pfdev->base.dev);
 	pm_runtime_disable(pfdev->base.dev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 79d4377019e9..323a1aee255e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -154,8 +154,8 @@ panfrost_gem_mapping_get(struct panfrost_gem_object *bo,
 void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping);
 void panfrost_gem_teardown_mappings_locked(struct panfrost_gem_object *bo);
 
-int panfrost_gem_shrinker_init(struct drm_device *dev);
-void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
+int panfrost_gem_shrinker_init(struct panfrost_device *pfdev);
+void panfrost_gem_shrinker_fini(struct panfrost_device *pfdev);
 
 void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
 int panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index 2fe967a90bcb..fefae87535d6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -95,10 +95,8 @@ panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
  *
  * This function registers and sets up the panfrost shrinker.
  */
-int panfrost_gem_shrinker_init(struct drm_device *dev)
+int panfrost_gem_shrinker_init(struct panfrost_device *pfdev)
 {
-	struct panfrost_device *pfdev = to_panfrost_device(dev);
-
 	pfdev->shrinker = shrinker_alloc(0, "drm-panfrost");
 	if (!pfdev->shrinker)
 		return -ENOMEM;
@@ -118,10 +116,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev)
  *
  * This function unregisters the panfrost shrinker.
  */
-void panfrost_gem_shrinker_cleanup(struct drm_device *dev)
+void panfrost_gem_shrinker_fini(struct panfrost_device *pfdev)
 {
-	struct panfrost_device *pfdev = to_panfrost_device(dev);
-
 	if (pfdev->shrinker)
 		shrinker_free(pfdev->shrinker);
 }

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
                   ` (2 preceding siblings ...)
  2026-06-04 17:35 ` [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:47   ` sashiko-bot
                     ` (2 more replies)
  2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong

Just for the sake of avoiding repetition.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 31 ++++++++++++-----------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
index 7020c0192e18..ad1156678e91 100644
--- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
+++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
@@ -35,6 +35,16 @@ struct panfrost_perfcnt {
 	struct completion dump_comp;
 };
 
+static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
+{
+	gpu_write(pfdev, GPU_PERFCNT_CFG,
+		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
+	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
+	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
+	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
+	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
+}
+
 void panfrost_perfcnt_clean_cache_done(struct panfrost_device *pfdev)
 {
 	complete(&pfdev->perfcnt->dump_comp);
@@ -193,12 +203,7 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
 	if (user != perfcnt->user)
 		return -EINVAL;
 
-	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
-	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
-	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
-	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
-	gpu_write(pfdev, GPU_PERFCNT_CFG,
-		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
+	panfrost_perfcnt_gpu_disable(pfdev);
 
 	perfcnt->user = NULL;
 	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
@@ -327,12 +332,7 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
 	perfcnt->bosize = size;
 
 	/* Start with everything disabled. */
-	gpu_write(pfdev, GPU_PERFCNT_CFG,
-		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
-	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
+	panfrost_perfcnt_gpu_disable(pfdev);
 
 	init_completion(&perfcnt->dump_comp);
 	mutex_init(&perfcnt->lock);
@@ -344,10 +344,5 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
 void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
 {
 	/* Disable everything before leaving. */
-	gpu_write(pfdev, GPU_PERFCNT_CFG,
-		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
-	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
-	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
+	panfrost_perfcnt_gpu_disable(pfdev);
 }

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
                   ` (3 preceding siblings ...)
  2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:49   ` sashiko-bot
  2026-06-04 18:26   ` Boris Brezillon
  2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
  2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
  6 siblings, 2 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong, Claude

Right now, if there's a HW reset and an HWPerf session is active,
panfrost_mmu_reset() will reset the AS count for every single open file's
mmu struct back to 0, and also invalidate their AS numbers. Then, when
disabling hwperf, panfrost_mmu_as_put() will WARN that mmu->as_count is
less than zero.

Fix this by introducing a perfcnt HW reset path.

The choice was made to render perfcnt unusable after reset, so that a
user might have to reprogram it with a full disable/enable sequence
before requesting more perfcnt dumps.

Reported-by: Claude <noreply@anthropic.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")
---
 drivers/gpu/drm/panfrost/panfrost_device.c  |  1 +
 drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 46 ++++++++++++++++++++++++++++-
 drivers/gpu/drm/panfrost/panfrost_perfcnt.h |  1 +
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
index 87b372c9e675..2805d50c1b9b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -426,6 +426,7 @@ bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
 
 void panfrost_device_reset(struct panfrost_device *pfdev, bool enable_job_int)
 {
+	panfrost_perfcnt_reset(pfdev);
 	panfrost_gpu_soft_reset(pfdev);
 
 	panfrost_gpu_power_on(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
index ad1156678e91..c2087ea705fe 100644
--- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
+++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
@@ -33,6 +33,7 @@ struct panfrost_perfcnt {
 	struct panfrost_file_priv *user;
 	struct mutex lock;
 	struct completion dump_comp;
+	atomic_t hw_reset_happened;
 };
 
 static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
@@ -57,9 +58,13 @@ void panfrost_perfcnt_sample_done(struct panfrost_device *pfdev)
 
 static int panfrost_perfcnt_dump_locked(struct panfrost_device *pfdev)
 {
+	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
 	u64 gpuva;
 	int ret;
 
+	if (atomic_read(&perfcnt->hw_reset_happened))
+		return -EIO;
+
 	reinit_completion(&pfdev->perfcnt->dump_comp);
 	gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT;
 	gpu_write(pfdev, GPU_PERFCNT_BASE_LO, lower_32_bits(gpuva));
@@ -140,6 +145,15 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
 		goto err_vunmap;
 	}
 
+	/* If a reset is ongoing, the AS we get right below will be torn
+	 * down, so rather than waiting until this becomes obvious in a
+	 * perfcnt_dump() ioctl, we ask the user to try again slightly later.
+	 */
+	if (atomic_read(&pfdev->reset.pending)) {
+		ret = -EAGAIN;
+		goto err_vunmap;
+	}
+
 	ret = panfrost_mmu_as_get(pfdev, perfcnt->mapping->mmu);
 	if (ret < 0)
 		goto err_vunmap;
@@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
 	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
 		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
 
+	/* If a reset happened, we've no way of knowing whether it was between the time we called
+	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
+	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
+	 * between this and the next ioctl invocation.
+	 */
+	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
+		ret = EAGAIN;
+		goto err_disable;
+	}
+
 	/* The BO ref is retained by the mapping. */
 	drm_gem_object_put(&bo->base);
 
@@ -180,6 +204,8 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
 
 	return 0;
 
+err_disable:
+	panfrost_perfcnt_gpu_disable(pfdev);
 err_vunmap:
 	drm_gem_vunmap(&bo->base, &map);
 err_put_mapping:
@@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
 	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
 	perfcnt->buf = NULL;
 	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
-	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
+	if (!atomic_read(&perfcnt->hw_reset_happened))
+		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
 	panfrost_gem_mapping_put(perfcnt->mapping);
 	perfcnt->mapping = NULL;
 	pm_runtime_put_autosuspend(pfdev->base.dev);
@@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
 	/* Disable everything before leaving. */
 	panfrost_perfcnt_gpu_disable(pfdev);
 }
+
+void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
+{
+	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
+
+	/* Since this function will be called either from a scheduled HW reset
+	 * or a runtime resume, tearing down any perfcnt resources means we're
+	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
+	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
+	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
+	 * lock taken in perfcnt_enable. In light of this, the only thing we can do
+	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
+	 * the reset having happpened so that they can take recovery measures.
+	 */
+	panfrost_perfcnt_gpu_disable(pfdev);
+	atomic_set(&perfcnt->hw_reset_happened, 1);
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
index 8bbcf5f5fb33..8b9bc704b634 100644
--- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
+++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
@@ -14,5 +14,6 @@ int panfrost_ioctl_perfcnt_enable(struct drm_device *dev, void *data,
 				  struct drm_file *file_priv);
 int panfrost_ioctl_perfcnt_dump(struct drm_device *dev, void *data,
 				struct drm_file *file_priv);
+void panfrost_perfcnt_reset(struct panfrost_device *pfdev);
 
 #endif

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
                   ` (4 preceding siblings ...)
  2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:50   ` sashiko-bot
                     ` (2 more replies)
  2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
  6 siblings, 3 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong

During device probe(), failure to do a PM get() will leave the usage_count
set to 0, which is the value assigned at device creation time. That means
when the autosuspend delay expires, runtime suspend callback won't be
invoked, so the device will remain powered on forever.

On top of that, failure to call PM put() during device unplug means
Panfrost device's PM usage_count increases monotonically for every new
module reload.

The combined outcome of both of the above was that devfreq OPP transition
notifications would be printed all the time, even when no jobs are being
submitted. This quickly fills the kernel ring buffer with junk.

Even direr than that was the fact MMU interrupts are only enabled when
the device is reset, so after device probe() the very first job targeting
the tiler heap BO would always time out, because the driver's PM runtime
resume callback would not be invoked.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 635430797d3f ("drm/panfrost: Rework runtime PM initialization")
Fixes: 876b15d2c88d ("drm/panfrost: Fix module unload")
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 2d4b6aa95c66..545fbf2c8d0c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -989,6 +989,7 @@ static int panfrost_probe(struct platform_device *pdev)
 	pm_runtime_set_active(pfdev->base.dev);
 	pm_runtime_mark_last_busy(pfdev->base.dev);
 	pm_runtime_enable(pfdev->base.dev);
+	pm_runtime_get_noresume(pfdev->base.dev);
 	pm_runtime_set_autosuspend_delay(pfdev->base.dev, 50); /* ~3 frames */
 	pm_runtime_use_autosuspend(pfdev->base.dev);
 
@@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (err < 0)
 		goto err_out1;
 
+	pm_runtime_put_autosuspend(pfdev->base.dev);
 
 	return 0;
 
 err_out1:
+	pm_runtime_put_noidle(pfdev->base.dev);
 	pm_runtime_disable(pfdev->base.dev);
 	panfrost_device_fini(pfdev);
 	pm_runtime_set_suspended(pfdev->base.dev);
@@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device *pdev)
 	drm_dev_unregister(&pfdev->base);
 
 	pm_runtime_get_sync(pfdev->base.dev);
-	pm_runtime_disable(pfdev->base.dev);
 	panfrost_device_fini(pfdev);
+	pm_runtime_put_noidle(pfdev->base.dev);
+	pm_runtime_disable(pfdev->base.dev);
 	pm_runtime_set_suspended(pfdev->base.dev);
 }
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init
  2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
                   ` (5 preceding siblings ...)
  2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
@ 2026-06-04 17:35 ` Adrián Larumbe
  2026-06-04 17:55   ` sashiko-bot
  2026-06-05  6:56   ` Boris Brezillon
  6 siblings, 2 replies; 25+ messages in thread
From: Adrián Larumbe @ 2026-06-04 17:35 UTC (permalink / raw)
  To: Boris Brezillon, Rob Herring, Steven Price, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso, Eric Anholt,
	Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team,
	Adrián Larumbe, Neil Armstrong

Because the device must be in a position to accept jobs between the time
drm_dev_register() is called and autosuspend first kicks in, there's a very
narrow window inbetween during which jobs targeting the tiler buffer
object would time out, since the device's PM status is 'Active', but no MMU
interrupts were enabled at device initialisation time.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 73e467f60acd ("drm/panfrost: Consolidate reset handling")
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 11 +++++++++--
 drivers/gpu/drm/panfrost/panfrost_mmu.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 4a3162c3b659..0e1ad6256c2c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -336,6 +336,12 @@ void panfrost_mmu_as_put(struct panfrost_device *pfdev, struct panfrost_mmu *mmu
 	WARN_ON(atomic_read(&mmu->as_count) < 0);
 }
 
+void panfrost_mmu_enable_interrupts(struct panfrost_device *pfdev)
+{
+	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
+	mmu_write(pfdev, MMU_INT_MASK, ~0);
+}
+
 void panfrost_mmu_reset(struct panfrost_device *pfdev)
 {
 	struct panfrost_mmu *mmu, *mmu_tmp;
@@ -355,8 +361,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
 
 	spin_unlock(&pfdev->as_lock);
 
-	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
-	mmu_write(pfdev, MMU_INT_MASK, ~0);
+	panfrost_mmu_enable_interrupts(pfdev);
 }
 
 static size_t get_pgsize(u64 addr, size_t size, size_t *count)
@@ -970,6 +975,8 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
 		return err;
 	}
 
+	panfrost_mmu_enable_interrupts(pfdev);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.h b/drivers/gpu/drm/panfrost/panfrost_mmu.h
index 27c3c65ed074..3140eb4fea0f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.h
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.h
@@ -12,6 +12,8 @@ struct panfrost_mmu;
 int panfrost_mmu_map(struct panfrost_gem_mapping *mapping);
 void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping);
 
+void panfrost_mmu_enable_interrupts(struct panfrost_device *pfdev);
+
 int panfrost_mmu_init(struct panfrost_device *pfdev);
 void panfrost_mmu_fini(struct panfrost_device *pfdev);
 void panfrost_mmu_reset(struct panfrost_device *pfdev);

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/7] drm/panfrost: Prevent division by 0
  2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
@ 2026-06-04 17:44   ` sashiko-bot
  2026-06-04 18:02   ` Boris Brezillon
  1 sibling, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-04 17:44 UTC (permalink / raw)
  To: Adrián Larumbe; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The exact same division-by-zero vulnerability fixed in panfrost_devfreq.c is present in the Lima, Panthor, and Nouveau (gk20a) drivers.
--

commit dc73067c1030f65e3609c76a060b1b1176f85e4e
Author: Adrián Larumbe <adrian.larumbe@collabora.com>

drm/panfrost: Prevent division by 0

This commit clamps the devfreq total_time to 0 when it is less than 100ns
in panfrost_devfreq_get_dev_status() to prevent a division by zero error
during debug printing.

Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88

> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> index b51c307788110..e34de47bf43bb 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> @@ -78,7 +78,8 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>  
>  	dev_dbg(pfdev->base.dev, "busy %lu total %lu %lu %% freq %lu MHz\n",
>  		status->busy_time, status->total_time,
> -		status->busy_time / (status->total_time / 100),
> +		status->total_time >= 100 ?
> +			status->busy_time / (status->total_time / 100) : 0,

[Severity: High]
This is a pre-existing issue, but the exact same division-by-zero vulnerability
fixed here is also present in other DRM drivers.

In drivers/gpu/drm/lima/lima_devfreq.c:lima_devfreq_get_dev_status():
    status->busy_time / (status->total_time / 100),

In drivers/gpu/drm/panthor/panthor_devfreq.c:panthor_devfreq_get_dev_status():
    status->busy_time / (status->total_time / 100),

In drivers/gpu/drm/nouveau/nvkm/subdev/clk/gk20a_devfreq.c:gk20a_devfreq_get_dev_status():
    status->busy_time / (status->total_time / 100),

If userspace repeatedly reads the devfreq load in quick succession, total_time
can be less than 100ns, leading to a division-by-zero kernel panic when debug
printing is enabled.

Should these other drivers be patched similarly?

>  		status->current_frequency / 1000 / 1000);
>  
>  	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper
  2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
@ 2026-06-04 17:47   ` sashiko-bot
  2026-06-04 18:05   ` Boris Brezillon
  2026-06-05 10:34   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-04 17:47 UTC (permalink / raw)
  To: Adrián Larumbe; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The refactored `panfrost_perfcnt_gpu_disable()` helper reverses the register write sequence when disabling performance counters, potentially breaking the HW_ISSUE_8186 (PRLAM-8186) hardware workaround and risking hardware state corruption.
--

commit b1a0eb47a40de5889be52c5109cc3ee780f78598
Author: Adrián Larumbe <adrian.larumbe@collabora.com>

drm/panfrost: Move perfcnt GPU disable sequence into a helper

This commit refactors the performance counter disable sequences into a
single helper function to avoid repetition.

> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index 7020c0192e189..ad1156678e91c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> @@ -35,6 +35,16 @@ struct panfrost_perfcnt {
>  	struct completion dump_comp;
>  };
>  
> +static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
> +{
> +	gpu_write(pfdev, GPU_PERFCNT_CFG,
> +		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +}

[Severity: High]
Does this new helper reverse the hardware disable sequence when called on
actively running counters?

In the original panfrost_perfcnt_disable_locked(), the block enable
registers (like GPU_PRFCNT_TILER_EN) were explicitly cleared to 0 before
the master configuration GPU_PERFCNT_CFG was turned off. The new helper
reverses this order.

Could shutting down the master configuration while sub-blocks are still
actively generating events cause hardware state corruption or bus hangs?

Modifying the main configuration while the Tiler counters are actively
enabled might also trigger hardware bugs (e.g., HW_ISSUE_8186), which the
original sequence safely avoided.

[ ... ]

> @@ -193,12 +203,7 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	if (user != perfcnt->user)
>  		return -EINVAL;
>  
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	panfrost_perfcnt_gpu_disable(pfdev);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
  2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
@ 2026-06-04 17:49   ` sashiko-bot
  2026-06-04 18:26   ` Boris Brezillon
  1 sibling, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-04 17:49 UTC (permalink / raw)
  To: Adrián Larumbe; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Address Space reference (mmu->as_count) permanently leaks when hardware reset aborts perfcnt_enable_locked.
- [High] A TOCTOU race between disable_locked and the asynchronous reset workqueue reintroduces the mmu->as_count underflow WARN.
- [Medium] A positive error code (EAGAIN) is incorrectly returned directly to userspace instead of the negative standard (-EAGAIN).
- [Medium] Spurious initialization failures occur for new perfcnt sessions due to a stale hw_reset_happened flag.
--

commit 555dc068f79668462a998f3ee4fa95ead697090d
Author: Adrián Larumbe <adrian.larumbe@collabora.com>

drm/panfrost: Make reset sequence deal with an active HWPerf session

This commit introduces a perfcnt HW reset path to handle active HWPerf
sessions during a HW reset. It renders perfcnt unusable after reset and
notifies the user to take recovery measures.

Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")

> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index ad1156678e91c..c2087ea705fe6 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c

[ ... ]

> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
>  		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
>  
> +	/* If a reset happened, we've no way of knowing whether it was between the time we called
> +	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
> +	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
> +	 * between this and the next ioctl invocation.
> +	 */
> +	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
> +		ret = EAGAIN;

[Severity: Medium]
Does this return a positive error code directly to userspace instead of the
standard negative one (-EAGAIN)?

DRM ioctls pass positive return values directly back to user space as the
ioctl syscall result. Because of this, user space will incorrectly interpret
this as success and subsequently fail on dump ioctls since the session wasn't
actually enabled.

> +		goto err_disable;

[Severity: High]
Can this leak the Address Space reference if a hardware reset aborts the
enable sequence?

If a GPU reset occurs before panfrost_mmu_as_get() is called, the reset
handler clears mmu->as_count to 0, and then panfrost_mmu_as_get() increments
it to 1. The code subsequently detects the reset via the hw_reset_happened
flag, clears it, and jumps to err_disable here. Because err_disable skips
panfrost_mmu_as_put(), the AS slot reference is permanently leaked,
eventually exhausting the limited hardware AS slots.

> +	}
> +
>  	/* The BO ref is retained by the mapping. */
>  	drm_gem_object_put(&bo->base);

[ ... ]

> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
>  	perfcnt->buf = NULL;
>  	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
> -	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
> +	if (!atomic_read(&perfcnt->hw_reset_happened))
> +		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);

[Severity: High]
Could a race between this check and an asynchronous reset reintroduce the
mmu->as_count underflow WARN that this patch was explicitly intended to
prevent?

The non-atomic gap between reading the flag and executing
panfrost_mmu_as_put() creates an exposure window. If an interleaved
asynchronous hardware reset occurs in this window, panfrost_device_reset()
will clear mmu->as_count to 0. The subsequent execution of
panfrost_mmu_as_put() decrements it to -1, directly triggering the WARN.

>  	panfrost_gem_mapping_put(perfcnt->mapping);
>  	perfcnt->mapping = NULL;
>  	pm_runtime_put_autosuspend(pfdev->base.dev);

[ ... ]

> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  	/* Disable everything before leaving. */
>  	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> +
> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
> +{
> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
> +
> +	/* Since this function will be called either from a scheduled HW reset
> +	 * or a runtime resume, tearing down any perfcnt resources means we're
> +	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
> +	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
> +	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
> +	 * lock taken in perfcnt_enable. In light of this, the only thing we can do
> +	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
> +	 * the reset having happpened so that they can take recovery measures.
> +	 */
> +	panfrost_perfcnt_gpu_disable(pfdev);
> +	atomic_set(&perfcnt->hw_reset_happened, 1);

[Severity: Medium]
Will this unconditionally set the reset flag even if no perfcnt session is
active, causing the next new session to spuriously fail?

Since this function is called during any GPU reset, it sets the flag even
when perfcnt->user == NULL. If user space later attempts to start a
completely new session, panfrost_perfcnt_enable_locked() will consume this
stale flag and abort with EAGAIN, functionally breaking the feature on the
first attempt after a reset.

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=5

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling
  2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
@ 2026-06-04 17:50   ` sashiko-bot
  2026-06-04 18:36   ` Boris Brezillon
  2026-06-05 10:48   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-04 17:50 UTC (permalink / raw)
  To: Adrián Larumbe; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Reordering PM teardown in device removal and probe error paths creates race conditions (UAF/kernel panics) and violates PM domain rules.

Pre-existing issues:
- [High] Improper DRM device unplug allows subsequent ioctls to access freed driver resources, causing Use-After-Free.
--

commit 385821af932c300aa013c14b13508c8e9cf10f64
Author: Adrián Larumbe <adrian.larumbe@collabora.com>

drm/panfrost: Fix PM usage_count mishandling

This commit addresses a PM usage_count mishandling issue during device probe
and unplug. It adds balanced PM get/put calls to prevent devices from
staying powered on indefinitely or triggering constant devfreq OPP transition
notifications.

> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2d4b6aa95c66d..545fbf2c8d0cb 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c

[ ... ]

> @@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device *pdev)
>  	if (err < 0)
>  		goto err_out1;
>  
> +	pm_runtime_put_autosuspend(pfdev->base.dev);
>  
>  	return 0;
>  
>  err_out1:
> +	pm_runtime_put_noidle(pfdev->base.dev);
>  	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
Does dropping the usage count before pm_runtime_disable() create a race
condition where the suspend callback can run and disable clocks before
hardware shutdown?

Because the usage count is dropped early, a concurrent PM event could trigger
the suspend callback, disabling clocks. Then, panfrost_device_fini() calls
panfrost_gpu_fini() which writes to MMIO registers. Could writing to
unclocked registers on ARM SoCs cause fatal bus errors or panics?

>  	pm_runtime_set_suspended(pfdev->base.dev);

[ ... ]

> @@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device *pdev)
>  	drm_dev_unregister(&pfdev->base);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
This is a pre-existing issue, but does using drm_dev_unregister() instead of
drm_dev_unplug() allow subsequent ioctls to access freed driver resources?

Because drm_dev_unregister() removes the device from sysfs but does not set
the dev->unplugged flag, the DRM core (drm_ioctl_kernel) does not block
ioctls from userspace processes that already hold an open file descriptor.

Since panfrost_remove() proceeds to call panfrost_device_fini() which
destroys critical structures (job manager, MMU contexts, shrinker lists) and
powers down hardware, could a user issuing an ioctl during or after driver
unbind via sysfs access freed structures and unmapped MMIO regions, resulting
in a use-after-free?

>  
>  	pm_runtime_get_sync(pfdev->base.dev);
> -	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
> +	pm_runtime_put_noidle(pfdev->base.dev);
> +	pm_runtime_disable(pfdev->base.dev);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
Does placing pm_runtime_put_noidle() and pm_runtime_disable() after
panfrost_device_fini() create a window where runtime PM can access freed
structures?

pm_runtime_put_noidle() drops the PM usage count to 0. At this point, a
concurrent PM event (e.g., userspace writing to /sys/.../power/control) can
trigger panfrost_device_runtime_suspend() before pm_runtime_disable()
executes.

Since panfrost_device_fini() has already destroyed driver structures like the
job manager, would the suspend callback access freed memory (like checking
panfrost_jm_is_idle), causing a use-after-free?

>  	pm_runtime_set_suspended(pfdev->base.dev);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=6

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init
  2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
@ 2026-06-04 17:55   ` sashiko-bot
  2026-06-05  6:56   ` Boris Brezillon
  1 sibling, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-04 17:55 UTC (permalink / raw)
  To: Adrián Larumbe; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] The MMU IRQ handler remains active while device clocks and power domains are disabled during error teardown, causing unclocked MMIO accesses (S-Error).
- [High] panfrost_reset() does not mask or synchronize the MMU interrupt before resetting the GPU hardware, leading to a race condition with the MMU IRQ thread.
--

commit cac1cfb873df895888c7871605b659cdaf55b78e
Author: Adrián Larumbe <adrian.larumbe@collabora.com>
Date:   Thu Jun 4 17:35:26 2026 +0000

    drm/panfrost: Explicitly enable MMU interrupts at device init

    Because the device must be in a position to accept jobs between the time
    drm_dev_register() is called and autosuspend first kicks in, there's a very
    narrow window inbetween during which jobs targeting the tiler buffer
    object would time out, since the device's PM status is 'Active', but no MMU
    interrupts were enabled at device initialisation time.

> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 4a3162c3b659f..0e1ad6256c2cb 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
[ ... ]
> @@ -355,8 +361,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
>  
>  	spin_unlock(&pfdev->as_lock);
>  
> -	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
> -	mmu_write(pfdev, MMU_INT_MASK, ~0);
> +	panfrost_mmu_enable_interrupts(pfdev);
>  }

[Severity: High]
This isn't a bug introduced by this patch, but while reviewing the interrupt
handling during reset, does panfrost_reset() need to mask and synchronize
the MMU interrupt before resetting the hardware?

When a job timeout occurs, panfrost_reset() correctly masks and synchronizes
the job manager interrupts:

drivers/gpu/drm/panfrost/panfrost_job.c:panfrost_reset() {
	/* Mask job interrupts and synchronize to make sure we won't be
	 * interrupted during our reset.
	 */
	job_write(pfdev, JOB_INT_MASK, 0);
	synchronize_irq(pfdev->js->irq);
...
	panfrost_device_reset(pfdev, true);
...
}

However, it does not do the same for the MMU interrupt (pfdev->mmu_irq).

Because MMU faults often cause job timeouts, the
panfrost_mmu_irq_handler_thread could be actively running concurrently on
another CPU.

If the IRQ thread attempts to handle an MMU page fault, it will write to MMU
registers while panfrost_device_reset() concurrently calls
panfrost_gpu_soft_reset() to reset the hardware.

Can uncoordinated MMIO accesses to hardware undergoing a soft reset lead to
unpredictable behavior or bus errors?

[ ... ]
> @@ -970,6 +975,8 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
>  		return err;
>  	}
>  
> +	panfrost_mmu_enable_interrupts(pfdev);
> +

[Severity: High]
This is a pre-existing issue, but by explicitly enabling the MMU interrupts
here during init, does this expose a race condition in the device
initialization error paths?

If a subsequent initialization step like panfrost_jm_init() fails,
panfrost_device_init() will jump to the out_mmu label:

drivers/gpu/drm/panfrost/panfrost_device.c:panfrost_device_init() {
...
	err = panfrost_jm_init(pfdev);
	if (err)
		goto out_mmu;
...
out_mmu:
	panfrost_mmu_fini(pfdev);
...
out_clk:
	panfrost_clk_fini(pfdev);
...
}

panfrost_mmu_fini() masks the interrupt in hardware, but does not synchronize
or disable the IRQ handler in the kernel:

void panfrost_mmu_fini(struct panfrost_device *pfdev)
{
	mmu_write(pfdev, MMU_INT_MASK, 0);
}

Because panfrost_mmu_init() registers a shared threaded IRQ handler using
devm_request_threaded_irq(), the handler remains active during the rest of
the teardown.

If an interrupt fires or the thread runs after panfrost_clk_fini() has
disabled the hardware clocks, panfrost_mmu_irq_handler() will read unclocked
MMIO registers (e.g., mmu_read(pfdev, MMU_INT_STAT)).

Could this lead to a fatal S-Error (synchronous external abort)?

>  	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655@collabora.com?part=7

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query
  2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
@ 2026-06-04 17:57   ` Boris Brezillon
  2026-06-05 10:29   ` Steven Price
  1 sibling, 0 replies; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 17:57 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong, Claude

On Thu, 04 Jun 2026 18:35:20 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> When querying the properties of a BO, whether it was created in
> Writeback caching mode for userspace mappings, is recorded in a Panfrost
> BO's wb_mmap field.
> 
> Reported-by: Claude <noreply@anthropic.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 62eedf1ccba5 ("drm/panfrost: Add flag to map GEM object Write-Back Cacheable")

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 3d0bdba2a474..cb8e5015847f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -659,7 +659,7 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
>  		if (bo->is_heap)
>  			args->create_flags |= PANFROST_BO_HEAP;
>  
> -		if (!bo->base.map_wc)
> +		if (bo->wb_mmap)
>  			args->create_flags |= PANFROST_BO_WB_MMAP;
>  	}
>  
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/7] drm/panfrost: Prevent division by 0
  2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
  2026-06-04 17:44   ` sashiko-bot
@ 2026-06-04 18:02   ` Boris Brezillon
  2026-06-05 10:29     ` Steven Price
  1 sibling, 1 reply; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 18:02 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong, Claude

On Thu, 04 Jun 2026 18:35:21 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> When updating and debug-printing devfreq stats, in the very unlikely
> offchance that total device time is less than 100 ns, clamp it to 0 to
> avoid division by 0.
> 
> Reported-by: Claude <noreply@anthropic.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
> ---
>  drivers/gpu/drm/panfrost/panfrost_devfreq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> index b51c30778811..e34de47bf43b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> @@ -78,7 +78,8 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>  
>  	dev_dbg(pfdev->base.dev, "busy %lu total %lu %lu %% freq %lu MHz\n",
>  		status->busy_time, status->total_time,
> -		status->busy_time / (status->total_time / 100),
> +		status->total_time >= 100 ?
> +			status->busy_time / (status->total_time / 100) : 0,

or:

		status->busy_time * 100 / MAX(status->total_time, 1),

>  		status->current_frequency / 1000 / 1000);
>  
>  	return 0;
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down
  2026-06-04 17:35 ` [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down Adrián Larumbe
@ 2026-06-04 18:04   ` Boris Brezillon
  0 siblings, 0 replies; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 18:04 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong

On Thu, 04 Jun 2026 18:35:22 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> Since the moment we call drm_dev_register() the device should be in a
> position to accept jobs, so it's best if the shrinker is already
> initialized by then.
> 
> On top of that, make shrinker functions take an panfrost_device pointer
> like other functions in the same sequence and rename them accordingly.
> 
> Essentially mimic the init/fini behaviour in Panthor.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_device.c       | 7 +++++++
>  drivers/gpu/drm/panfrost/panfrost_drv.c          | 6 ------
>  drivers/gpu/drm/panfrost/panfrost_gem.h          | 4 ++--
>  drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 8 ++------
>  4 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
> index 7fed22d555a5..87b372c9e675 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -282,9 +282,15 @@ int panfrost_device_init(struct panfrost_device *pfdev)
>  	if (err)
>  		goto out_job;
>  
> +	err = panfrost_gem_shrinker_init(pfdev);
> +	if (err)
> +		goto out_perfcnt;
> +
>  	panfrost_gem_init(pfdev);

It feels weird to have the GEM shrinker initialized before the GEM
component. Also, I'd consider the shrinker part of the GEM logic, so
maybe call panfrost_gem_shrinker_init() from panfrost_gem_init() and
change the prototype of panfrost_gem_init() to return an int.

>  
>  	return 0;
> +out_perfcnt:
> +	panfrost_perfcnt_fini(pfdev);
>  out_job:
>  	panfrost_jm_fini(pfdev);
>  out_mmu:
> @@ -306,6 +312,7 @@ int panfrost_device_init(struct panfrost_device *pfdev)
>  
>  void panfrost_device_fini(struct panfrost_device *pfdev)
>  {
> +	panfrost_gem_shrinker_fini(pfdev);
>  	panfrost_perfcnt_fini(pfdev);
>  	panfrost_jm_fini(pfdev);
>  	panfrost_mmu_fini(pfdev);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index cb8e5015847f..2d4b6aa95c66 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -1000,14 +1000,9 @@ static int panfrost_probe(struct platform_device *pdev)
>  	if (err < 0)
>  		goto err_out1;
>  
> -	err = panfrost_gem_shrinker_init(&pfdev->base);
> -	if (err)
> -		goto err_out2;
>  
>  	return 0;
>  
> -err_out2:
> -	drm_dev_unregister(&pfdev->base);
>  err_out1:
>  	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
> @@ -1021,7 +1016,6 @@ static void panfrost_remove(struct platform_device *pdev)
>  	struct panfrost_device *pfdev = platform_get_drvdata(pdev);
>  
>  	drm_dev_unregister(&pfdev->base);
> -	panfrost_gem_shrinker_cleanup(&pfdev->base);
>  
>  	pm_runtime_get_sync(pfdev->base.dev);
>  	pm_runtime_disable(pfdev->base.dev);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index 79d4377019e9..323a1aee255e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -154,8 +154,8 @@ panfrost_gem_mapping_get(struct panfrost_gem_object *bo,
>  void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping);
>  void panfrost_gem_teardown_mappings_locked(struct panfrost_gem_object *bo);
>  
> -int panfrost_gem_shrinker_init(struct drm_device *dev);
> -void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
> +int panfrost_gem_shrinker_init(struct panfrost_device *pfdev);
> +void panfrost_gem_shrinker_fini(struct panfrost_device *pfdev);
>  
>  void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
>  int panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
> index 2fe967a90bcb..fefae87535d6 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
> @@ -95,10 +95,8 @@ panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>   *
>   * This function registers and sets up the panfrost shrinker.
>   */
> -int panfrost_gem_shrinker_init(struct drm_device *dev)
> +int panfrost_gem_shrinker_init(struct panfrost_device *pfdev)
>  {
> -	struct panfrost_device *pfdev = to_panfrost_device(dev);
> -
>  	pfdev->shrinker = shrinker_alloc(0, "drm-panfrost");
>  	if (!pfdev->shrinker)
>  		return -ENOMEM;
> @@ -118,10 +116,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev)
>   *
>   * This function unregisters the panfrost shrinker.
>   */
> -void panfrost_gem_shrinker_cleanup(struct drm_device *dev)
> +void panfrost_gem_shrinker_fini(struct panfrost_device *pfdev)
>  {
> -	struct panfrost_device *pfdev = to_panfrost_device(dev);
> -
>  	if (pfdev->shrinker)
>  		shrinker_free(pfdev->shrinker);
>  }
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper
  2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
  2026-06-04 17:47   ` sashiko-bot
@ 2026-06-04 18:05   ` Boris Brezillon
  2026-06-05 10:34   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 18:05 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong

On Thu, 04 Jun 2026 18:35:23 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> Just for the sake of avoiding repetition.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 31 ++++++++++++-----------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index 7020c0192e18..ad1156678e91 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> @@ -35,6 +35,16 @@ struct panfrost_perfcnt {
>  	struct completion dump_comp;
>  };
>  
> +static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
> +{
> +	gpu_write(pfdev, GPU_PERFCNT_CFG,
> +		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +}
> +
>  void panfrost_perfcnt_clean_cache_done(struct panfrost_device *pfdev)
>  {
>  	complete(&pfdev->perfcnt->dump_comp);
> @@ -193,12 +203,7 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	if (user != perfcnt->user)
>  		return -EINVAL;
>  
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  
>  	perfcnt->user = NULL;
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
> @@ -327,12 +332,7 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
>  	perfcnt->bosize = size;
>  
>  	/* Start with everything disabled. */
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  
>  	init_completion(&perfcnt->dump_comp);
>  	mutex_init(&perfcnt->lock);
> @@ -344,10 +344,5 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
>  void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  {
>  	/* Disable everything before leaving. */
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
  2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
  2026-06-04 17:49   ` sashiko-bot
@ 2026-06-04 18:26   ` Boris Brezillon
  2026-06-05 10:41     ` Steven Price
  1 sibling, 1 reply; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 18:26 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong, Claude

On Thu, 04 Jun 2026 18:35:24 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> Right now, if there's a HW reset and an HWPerf session is active,
> panfrost_mmu_reset() will reset the AS count for every single open file's
> mmu struct back to 0, and also invalidate their AS numbers. Then, when
> disabling hwperf, panfrost_mmu_as_put() will WARN that mmu->as_count is
> less than zero.
> 
> Fix this by introducing a perfcnt HW reset path.
> 
> The choice was made to render perfcnt unusable after reset, so that a
> user might have to reprogram it with a full disable/enable sequence
> before requesting more perfcnt dumps.

Can't we do better than that. We store the config of the perf session,
so if a reset is in progress, we can simply block on it, restore the
old config in the post_reset path, and get going with the DUMP request
once the reset has been done. There's certainly something to do to
report the discontinuity to userspace, but that's something we can
reflect through and extra field added to drm_panfrost_perfcnt_dump.

> 
> Reported-by: Claude <noreply@anthropic.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")
> ---
>  drivers/gpu/drm/panfrost/panfrost_device.c  |  1 +
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 46 ++++++++++++++++++++++++++++-
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.h |  1 +
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
> index 87b372c9e675..2805d50c1b9b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -426,6 +426,7 @@ bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
>  
>  void panfrost_device_reset(struct panfrost_device *pfdev, bool enable_job_int)
>  {
> +	panfrost_perfcnt_reset(pfdev);
>  	panfrost_gpu_soft_reset(pfdev);
>  
>  	panfrost_gpu_power_on(pfdev);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index ad1156678e91..c2087ea705fe 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> @@ -33,6 +33,7 @@ struct panfrost_perfcnt {
>  	struct panfrost_file_priv *user;
>  	struct mutex lock;
>  	struct completion dump_comp;
> +	atomic_t hw_reset_happened;
>  };
>  
>  static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
> @@ -57,9 +58,13 @@ void panfrost_perfcnt_sample_done(struct panfrost_device *pfdev)
>  
>  static int panfrost_perfcnt_dump_locked(struct panfrost_device *pfdev)
>  {
> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
>  	u64 gpuva;
>  	int ret;
>  
> +	if (atomic_read(&perfcnt->hw_reset_happened))
> +		return -EIO;
> +
>  	reinit_completion(&pfdev->perfcnt->dump_comp);
>  	gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT;
>  	gpu_write(pfdev, GPU_PERFCNT_BASE_LO, lower_32_bits(gpuva));
> @@ -140,6 +145,15 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  		goto err_vunmap;
>  	}
>  
> +	/* If a reset is ongoing, the AS we get right below will be torn
> +	 * down, so rather than waiting until this becomes obvious in a
> +	 * perfcnt_dump() ioctl, we ask the user to try again slightly later.
> +	 */
> +	if (atomic_read(&pfdev->reset.pending)) {
> +		ret = -EAGAIN;
> +		goto err_vunmap;
> +	}
> +
>  	ret = panfrost_mmu_as_get(pfdev, perfcnt->mapping->mmu);
>  	if (ret < 0)
>  		goto err_vunmap;
> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
>  		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
>  
> +	/* If a reset happened, we've no way of knowing whether it was between the time we called
> +	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
> +	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
> +	 * between this and the next ioctl invocation.
> +	 */
> +	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
> +		ret = EAGAIN;
> +		goto err_disable;
> +	}

This should really be transparent to the user, apart from reporting
that samples might have been lost because of the reset.

> +
>  	/* The BO ref is retained by the mapping. */
>  	drm_gem_object_put(&bo->base);
>  
> @@ -180,6 +204,8 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>  
>  	return 0;
>  
> +err_disable:
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  err_vunmap:
>  	drm_gem_vunmap(&bo->base, &map);
>  err_put_mapping:
> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
>  	perfcnt->buf = NULL;
>  	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
> -	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
> +	if (!atomic_read(&perfcnt->hw_reset_happened))
> +		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);

Why not call panfrost_mmu_as_put() in panfrost_perfcnt_pre_reset(),
which is called before panfrost_mmu_reset()? If we do that, the AS
should be returned, and we can add a panfrost_perfcnt_post_reset()
that's called after panfrost_mmu_reset() and which re-acquires the AS
and re-instantiate the perfcnt settings.

>  	panfrost_gem_mapping_put(perfcnt->mapping);
>  	perfcnt->mapping = NULL;
>  	pm_runtime_put_autosuspend(pfdev->base.dev);
> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  	/* Disable everything before leaving. */
>  	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> +
> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
> +{
> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
> +
> +	/* Since this function will be called either from a scheduled HW reset
> +	 * or a runtime resume, tearing down any perfcnt resources means we're
> +	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
> +	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
> +	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
> +	 * lock taken in perfcnt_enable.

Question is, do we really need these allocation to happen with the lock
held? And if yes, can't we protect perfcnt ops with a separate reset
lock?

> In light of this, the only thing we can do
> +	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
> +	 * the reset having happpened so that they can take recovery measures.

Informing userspace about the discontinuity is fine, but I think we
should try to restore the config in a post_reset() hook.

> +	 */
> +	panfrost_perfcnt_gpu_disable(pfdev);
> +	atomic_set(&perfcnt->hw_reset_happened, 1);
> +}
> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
> index 8bbcf5f5fb33..8b9bc704b634 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
> @@ -14,5 +14,6 @@ int panfrost_ioctl_perfcnt_enable(struct drm_device *dev, void *data,
>  				  struct drm_file *file_priv);
>  int panfrost_ioctl_perfcnt_dump(struct drm_device *dev, void *data,
>  				struct drm_file *file_priv);
> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev);
>  
>  #endif
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling
  2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
  2026-06-04 17:50   ` sashiko-bot
@ 2026-06-04 18:36   ` Boris Brezillon
  2026-06-05 10:48   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: Boris Brezillon @ 2026-06-04 18:36 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong

On Thu, 04 Jun 2026 18:35:25 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> During device probe(), failure to do a PM get() will leave the usage_count
> set to 0, which is the value assigned at device creation time. That means
> when the autosuspend delay expires, runtime suspend callback won't be
> invoked, so the device will remain powered on forever.
> 
> On top of that, failure to call PM put() during device unplug means
> Panfrost device's PM usage_count increases monotonically for every new
> module reload.
> 
> The combined outcome of both of the above was that devfreq OPP transition
> notifications would be printed all the time, even when no jobs are being
> submitted. This quickly fills the kernel ring buffer with junk.
> 
> Even direr than that was the fact MMU interrupts are only enabled when
> the device is reset, so after device probe() the very first job targeting
> the tiler heap BO would always time out, because the driver's PM runtime
> resume callback would not be invoked.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 635430797d3f ("drm/panfrost: Rework runtime PM initialization")
> Fixes: 876b15d2c88d ("drm/panfrost: Fix module unload")
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2d4b6aa95c66..545fbf2c8d0c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -989,6 +989,7 @@ static int panfrost_probe(struct platform_device *pdev)
>  	pm_runtime_set_active(pfdev->base.dev);
>  	pm_runtime_mark_last_busy(pfdev->base.dev);
>  	pm_runtime_enable(pfdev->base.dev);
> +	pm_runtime_get_noresume(pfdev->base.dev);
>  	pm_runtime_set_autosuspend_delay(pfdev->base.dev, 50); /* ~3 frames */
>  	pm_runtime_use_autosuspend(pfdev->base.dev);
>  
> @@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device *pdev)
>  	if (err < 0)
>  		goto err_out1;
>  
> +	pm_runtime_put_autosuspend(pfdev->base.dev);
>  
>  	return 0;
>  
>  err_out1:
> +	pm_runtime_put_noidle(pfdev->base.dev);

Do we really need this get_noresume/put_noidle dance, can't use call
pm_runtime_dont_use_autosuspend() instead like is done in panthor, or
is panthor broken too?

>  	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
>  	pm_runtime_set_suspended(pfdev->base.dev);
> @@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device *pdev)
>  	drm_dev_unregister(&pfdev->base);
>  
>  	pm_runtime_get_sync(pfdev->base.dev);
> -	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
> +	pm_runtime_put_noidle(pfdev->base.dev);
> +	pm_runtime_disable(pfdev->base.dev);
>  	pm_runtime_set_suspended(pfdev->base.dev);
>  }
>  
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init
  2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
  2026-06-04 17:55   ` sashiko-bot
@ 2026-06-05  6:56   ` Boris Brezillon
  1 sibling, 0 replies; 25+ messages in thread
From: Boris Brezillon @ 2026-06-05  6:56 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Herring, Steven Price, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Marty E. Plummer, Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig,
	Robin Murphy, dri-devel, linux-kernel, Collabora Kernel Team,
	Neil Armstrong

On Thu, 04 Jun 2026 18:35:26 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> Because the device must be in a position to accept jobs between the time
> drm_dev_register() is called and autosuspend first kicks in, there's a very
> narrow window inbetween during which jobs targeting the tiler buffer
> object would time out, since the device's PM status is 'Active', but no MMU
> interrupts were enabled at device initialisation time.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 73e467f60acd ("drm/panfrost: Consolidate reset handling")
> ---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 11 +++++++++--
>  drivers/gpu/drm/panfrost/panfrost_mmu.h |  2 ++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 4a3162c3b659..0e1ad6256c2c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -336,6 +336,12 @@ void panfrost_mmu_as_put(struct panfrost_device *pfdev, struct panfrost_mmu *mmu
>  	WARN_ON(atomic_read(&mmu->as_count) < 0);
>  }
>  
> +void panfrost_mmu_enable_interrupts(struct panfrost_device *pfdev)
> +{
> +	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
> +	mmu_write(pfdev, MMU_INT_MASK, ~0);
> +}
> +
>  void panfrost_mmu_reset(struct panfrost_device *pfdev)
>  {
>  	struct panfrost_mmu *mmu, *mmu_tmp;
> @@ -355,8 +361,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
>  
>  	spin_unlock(&pfdev->as_lock);
>  
> -	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
> -	mmu_write(pfdev, MMU_INT_MASK, ~0);
> +	panfrost_mmu_enable_interrupts(pfdev);
>  }
>  
>  static size_t get_pgsize(u64 addr, size_t size, size_t *count)
> @@ -970,6 +975,8 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
>  		return err;
>  	}
>  
> +	panfrost_mmu_enable_interrupts(pfdev);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.h b/drivers/gpu/drm/panfrost/panfrost_mmu.h
> index 27c3c65ed074..3140eb4fea0f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.h
> @@ -12,6 +12,8 @@ struct panfrost_mmu;
>  int panfrost_mmu_map(struct panfrost_gem_mapping *mapping);
>  void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping);
>  
> +void panfrost_mmu_enable_interrupts(struct panfrost_device *pfdev);

You don't need to expose this function, it's only used inside
panfrost_mmu.c.

> +
>  int panfrost_mmu_init(struct panfrost_device *pfdev);
>  void panfrost_mmu_fini(struct panfrost_device *pfdev);
>  void panfrost_mmu_reset(struct panfrost_device *pfdev);
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query
  2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
  2026-06-04 17:57   ` Boris Brezillon
@ 2026-06-05 10:29   ` Steven Price
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Price @ 2026-06-05 10:29 UTC (permalink / raw)
  To: Adrián Larumbe, Boris Brezillon, Rob Herring,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso,
	Eric Anholt, Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team, Neil Armstrong,
	Claude

On 04/06/2026 18:35, Adrián Larumbe wrote:
> When querying the properties of a BO, whether it was created in
> Writeback caching mode for userspace mappings, is recorded in a Panfrost
> BO's wb_mmap field.
> 
> Reported-by: Claude <noreply@anthropic.com>
> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 62eedf1ccba5 ("drm/panfrost: Add flag to map GEM object Write-Back Cacheable")

This is a subtle change in uABI because bo->base.map_wc reflects the
actual properties of the buffer. So on a coherent device this will
change. But it does match the intention/documentation of the ioctl
(returning the creation flags), and I don't think user space could have
been usefully relying on the current behaviour.

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 3d0bdba2a474..cb8e5015847f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -659,7 +659,7 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
>  		if (bo->is_heap)
>  			args->create_flags |= PANFROST_BO_HEAP;
>  
> -		if (!bo->base.map_wc)
> +		if (bo->wb_mmap)
>  			args->create_flags |= PANFROST_BO_WB_MMAP;
>  	}
>  
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/7] drm/panfrost: Prevent division by 0
  2026-06-04 18:02   ` Boris Brezillon
@ 2026-06-05 10:29     ` Steven Price
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Price @ 2026-06-05 10:29 UTC (permalink / raw)
  To: Boris Brezillon, Adrián Larumbe
  Cc: Rob Herring, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Marty E. Plummer,
	Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig, Robin Murphy,
	dri-devel, linux-kernel, Collabora Kernel Team, Neil Armstrong,
	Claude

On 04/06/2026 19:02, Boris Brezillon wrote:
> On Thu, 04 Jun 2026 18:35:21 +0100
> Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
> 
>> When updating and debug-printing devfreq stats, in the very unlikely
>> offchance that total device time is less than 100 ns, clamp it to 0 to
>> avoid division by 0.
>>
>> Reported-by: Claude <noreply@anthropic.com>
>> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> index b51c30778811..e34de47bf43b 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> @@ -78,7 +78,8 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>>  
>>  	dev_dbg(pfdev->base.dev, "busy %lu total %lu %lu %% freq %lu MHz\n",
>>  		status->busy_time, status->total_time,
>> -		status->busy_time / (status->total_time / 100),
>> +		status->total_time >= 100 ?
>> +			status->busy_time / (status->total_time / 100) : 0,
> 
> or:
> 
> 		status->busy_time * 100 / MAX(status->total_time, 1),
> 

I have to admit I prefer Boris' version ;)

But either way:

Reviewed-by: Steven Price <steven.price@arm.com>

>>  		status->current_frequency / 1000 / 1000);
>>  
>>  	return 0;
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper
  2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
  2026-06-04 17:47   ` sashiko-bot
  2026-06-04 18:05   ` Boris Brezillon
@ 2026-06-05 10:34   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Price @ 2026-06-05 10:34 UTC (permalink / raw)
  To: Adrián Larumbe, Boris Brezillon, Rob Herring,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso,
	Eric Anholt, Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team, Neil Armstrong

On 04/06/2026 18:35, Adrián Larumbe wrote:
> Just for the sake of avoiding repetition.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 31 ++++++++++++-----------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> index 7020c0192e18..ad1156678e91 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
> @@ -35,6 +35,16 @@ struct panfrost_perfcnt {
>  	struct completion dump_comp;
>  };
>  
> +static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
> +{
> +	gpu_write(pfdev, GPU_PERFCNT_CFG,
> +		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> +	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +}
> +
>  void panfrost_perfcnt_clean_cache_done(struct panfrost_device *pfdev)
>  {
>  	complete(&pfdev->perfcnt->dump_comp);
> @@ -193,12 +203,7 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>  	if (user != perfcnt->user)
>  		return -EINVAL;
>  
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0x0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  
>  	perfcnt->user = NULL;
>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
> @@ -327,12 +332,7 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
>  	perfcnt->bosize = size;
>  
>  	/* Start with everything disabled. */
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  
>  	init_completion(&perfcnt->dump_comp);
>  	mutex_init(&perfcnt->lock);
> @@ -344,10 +344,5 @@ int panfrost_perfcnt_init(struct panfrost_device *pfdev)
>  void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>  {
>  	/* Disable everything before leaving. */
> -	gpu_write(pfdev, GPU_PERFCNT_CFG,
> -		  GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> -	gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0);
> -	gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> +	panfrost_perfcnt_gpu_disable(pfdev);
>  }
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session
  2026-06-04 18:26   ` Boris Brezillon
@ 2026-06-05 10:41     ` Steven Price
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Price @ 2026-06-05 10:41 UTC (permalink / raw)
  To: Boris Brezillon, Adrián Larumbe
  Cc: Rob Herring, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Marty E. Plummer,
	Tomeu Vizoso, Eric Anholt, Alyssa Rosenzweig, Robin Murphy,
	dri-devel, linux-kernel, Collabora Kernel Team, Neil Armstrong,
	Claude

On 04/06/2026 19:26, Boris Brezillon wrote:
> On Thu, 04 Jun 2026 18:35:24 +0100
> Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
> 
>> Right now, if there's a HW reset and an HWPerf session is active,
>> panfrost_mmu_reset() will reset the AS count for every single open file's
>> mmu struct back to 0, and also invalidate their AS numbers. Then, when
>> disabling hwperf, panfrost_mmu_as_put() will WARN that mmu->as_count is
>> less than zero.
>>
>> Fix this by introducing a perfcnt HW reset path.
>>
>> The choice was made to render perfcnt unusable after reset, so that a
>> user might have to reprogram it with a full disable/enable sequence
>> before requesting more perfcnt dumps.
> 
> Can't we do better than that. We store the config of the perf session,
> so if a reset is in progress, we can simply block on it, restore the
> old config in the post_reset path, and get going with the DUMP request
> once the reset has been done. There's certainly something to do to
> report the discontinuity to userspace, but that's something we can
> reflect through and extra field added to drm_panfrost_perfcnt_dump.

I agree with Boris' point, ideally reset should be as transparent as
possible. Reporting that it has happened is useful, but there's no need
to force user space into restoring the context unnecessarily.

One other minor issue below.

>>
>> Reported-by: Claude <noreply@anthropic.com>
>> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/88
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> Fixes: 7786fd108777 ("drm/panfrost: Expose performance counters through unstable ioctls")
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_device.c  |  1 +
>>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 46 ++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/panfrost/panfrost_perfcnt.h |  1 +
>>  3 files changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
>> index 87b372c9e675..2805d50c1b9b 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
>> @@ -426,6 +426,7 @@ bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
>>  
>>  void panfrost_device_reset(struct panfrost_device *pfdev, bool enable_job_int)
>>  {
>> +	panfrost_perfcnt_reset(pfdev);
>>  	panfrost_gpu_soft_reset(pfdev);
>>  
>>  	panfrost_gpu_power_on(pfdev);
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
>> index ad1156678e91..c2087ea705fe 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
>> @@ -33,6 +33,7 @@ struct panfrost_perfcnt {
>>  	struct panfrost_file_priv *user;
>>  	struct mutex lock;
>>  	struct completion dump_comp;
>> +	atomic_t hw_reset_happened;
>>  };
>>  
>>  static void panfrost_perfcnt_gpu_disable(struct panfrost_device *pfdev)
>> @@ -57,9 +58,13 @@ void panfrost_perfcnt_sample_done(struct panfrost_device *pfdev)
>>  
>>  static int panfrost_perfcnt_dump_locked(struct panfrost_device *pfdev)
>>  {
>> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
>>  	u64 gpuva;
>>  	int ret;
>>  
>> +	if (atomic_read(&perfcnt->hw_reset_happened))
>> +		return -EIO;
>> +
>>  	reinit_completion(&pfdev->perfcnt->dump_comp);
>>  	gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT;
>>  	gpu_write(pfdev, GPU_PERFCNT_BASE_LO, lower_32_bits(gpuva));
>> @@ -140,6 +145,15 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>>  		goto err_vunmap;
>>  	}
>>  
>> +	/* If a reset is ongoing, the AS we get right below will be torn
>> +	 * down, so rather than waiting until this becomes obvious in a
>> +	 * perfcnt_dump() ioctl, we ask the user to try again slightly later.
>> +	 */
>> +	if (atomic_read(&pfdev->reset.pending)) {
>> +		ret = -EAGAIN;
>> +		goto err_vunmap;
>> +	}
>> +
>>  	ret = panfrost_mmu_as_get(pfdev, perfcnt->mapping->mmu);
>>  	if (ret < 0)
>>  		goto err_vunmap;
>> @@ -173,6 +187,16 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>>  	if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
>>  		gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
>>  
>> +	/* If a reset happened, we've no way of knowing whether it was between the time we called
>> +	 * panfrost_mmu_as_get() or before perfcnt_enable(), so clearing this flag and going forward
>> +	 * isn't possible. We must clear the flag and try again in the hopes no resets will happen
>> +	 * between this and the next ioctl invocation.
>> +	 */
>> +	if (atomic_cmpxchg(&perfcnt->hw_reset_happened, 1, 0)) {
>> +		ret = EAGAIN;

Missing '-' on EAGAIN.

>> +		goto err_disable;
>> +	}
> 
> This should really be transparent to the user, apart from reporting
> that samples might have been lost because of the reset.
> 
>> +
>>  	/* The BO ref is retained by the mapping. */
>>  	drm_gem_object_put(&bo->base);
>>  
>> @@ -180,6 +204,8 @@ static int panfrost_perfcnt_enable_locked(struct panfrost_device *pfdev,
>>  
>>  	return 0;
>>  
>> +err_disable:

We're leaking the AS reference here - you need a panfrost_mmu_as_put() call.

>> +	panfrost_perfcnt_gpu_disable(pfdev);
>>  err_vunmap:
>>  	drm_gem_vunmap(&bo->base, &map);
>>  err_put_mapping:
>> @@ -209,7 +235,8 @@ static int panfrost_perfcnt_disable_locked(struct panfrost_device *pfdev,
>>  	drm_gem_vunmap(&perfcnt->mapping->obj->base.base, &map);
>>  	perfcnt->buf = NULL;
>>  	panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv);
>> -	panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);
>> +	if (!atomic_read(&perfcnt->hw_reset_happened))
>> +		panfrost_mmu_as_put(pfdev, perfcnt->mapping->mmu);

The atomic_read() scares me - is this really safe and not subject to races?

Thanks,
Steve

> 
> Why not call panfrost_mmu_as_put() in panfrost_perfcnt_pre_reset(),
> which is called before panfrost_mmu_reset()? If we do that, the AS
> should be returned, and we can add a panfrost_perfcnt_post_reset()
> that's called after panfrost_mmu_reset() and which re-acquires the AS
> and re-instantiate the perfcnt settings.
> 
>>  	panfrost_gem_mapping_put(perfcnt->mapping);
>>  	perfcnt->mapping = NULL;
>>  	pm_runtime_put_autosuspend(pfdev->base.dev);
>> @@ -346,3 +373,20 @@ void panfrost_perfcnt_fini(struct panfrost_device *pfdev)
>>  	/* Disable everything before leaving. */
>>  	panfrost_perfcnt_gpu_disable(pfdev);
>>  }
>> +
>> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev)
>> +{
>> +	struct panfrost_perfcnt *perfcnt = pfdev->perfcnt;
>> +
>> +	/* Since this function will be called either from a scheduled HW reset
>> +	 * or a runtime resume, tearing down any perfcnt resources means we're
>> +	 * doomed to deadlocking with perfcnt_{enable/disable}, since we'd have
>> +	 * to take the perfecnt lock. On top of that, it'd also violate DMA fence
>> +	 * signalling rules because GFP_KERNEL allocations are made with the perfcnt
>> +	 * lock taken in perfcnt_enable.
> 
> Question is, do we really need these allocation to happen with the lock
> held? And if yes, can't we protect perfcnt ops with a separate reset
> lock?
> 
>> In light of this, the only thing we can do
>> +	 * is disabling perfcnt unconditionally, and notifying the perfcnt user of
>> +	 * the reset having happpened so that they can take recovery measures.
> 
> Informing userspace about the discontinuity is fine, but I think we
> should try to restore the config in a post_reset() hook.
> 
>> +	 */
>> +	panfrost_perfcnt_gpu_disable(pfdev);
>> +	atomic_set(&perfcnt->hw_reset_happened, 1);
>> +}
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
>> index 8bbcf5f5fb33..8b9bc704b634 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.h
>> @@ -14,5 +14,6 @@ int panfrost_ioctl_perfcnt_enable(struct drm_device *dev, void *data,
>>  				  struct drm_file *file_priv);
>>  int panfrost_ioctl_perfcnt_dump(struct drm_device *dev, void *data,
>>  				struct drm_file *file_priv);
>> +void panfrost_perfcnt_reset(struct panfrost_device *pfdev);
>>  
>>  #endif
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling
  2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
  2026-06-04 17:50   ` sashiko-bot
  2026-06-04 18:36   ` Boris Brezillon
@ 2026-06-05 10:48   ` Steven Price
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Price @ 2026-06-05 10:48 UTC (permalink / raw)
  To: Adrián Larumbe, Boris Brezillon, Rob Herring,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Faith Ekstrand, Marty E. Plummer, Tomeu Vizoso,
	Eric Anholt, Alyssa Rosenzweig, Robin Murphy
  Cc: dri-devel, linux-kernel, Collabora Kernel Team, Neil Armstrong

On 04/06/2026 18:35, Adrián Larumbe wrote:
> During device probe(), failure to do a PM get() will leave the usage_count
> set to 0, which is the value assigned at device creation time. That means
> when the autosuspend delay expires, runtime suspend callback won't be
> invoked, so the device will remain powered on forever.
> 
> On top of that, failure to call PM put() during device unplug means
> Panfrost device's PM usage_count increases monotonically for every new
> module reload.
> 
> The combined outcome of both of the above was that devfreq OPP transition
> notifications would be printed all the time, even when no jobs are being
> submitted. This quickly fills the kernel ring buffer with junk.
> 
> Even direr than that was the fact MMU interrupts are only enabled when
> the device is reset, so after device probe() the very first job targeting
> the tiler heap BO would always time out, because the driver's PM runtime
> resume callback would not be invoked.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Fixes: 635430797d3f ("drm/panfrost: Rework runtime PM initialization")
> Fixes: 876b15d2c88d ("drm/panfrost: Fix module unload")
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2d4b6aa95c66..545fbf2c8d0c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -989,6 +989,7 @@ static int panfrost_probe(struct platform_device *pdev)
>  	pm_runtime_set_active(pfdev->base.dev);
>  	pm_runtime_mark_last_busy(pfdev->base.dev);
>  	pm_runtime_enable(pfdev->base.dev);
> +	pm_runtime_get_noresume(pfdev->base.dev);
>  	pm_runtime_set_autosuspend_delay(pfdev->base.dev, 50); /* ~3 frames */
>  	pm_runtime_use_autosuspend(pfdev->base.dev);
>  
> @@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device *pdev)
>  	if (err < 0)
>  		goto err_out1;
>  
> +	pm_runtime_put_autosuspend(pfdev->base.dev);
>  
>  	return 0;
>  
>  err_out1:
> +	pm_runtime_put_noidle(pfdev->base.dev);
>  	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);

Sashiko is concerned that dropping the usage count before
pm_runtime_disable() could cause things to turn off too early, and I
have to agree it sounds like it could be a problem:

Sashiko wrote:
> Does dropping the usage count before pm_runtime_disable() create a race
> condition where the suspend callback can run and disable clocks before
> hardware shutdown?
> Because the usage count is dropped early, a concurrent PM event could trigger
> the suspend callback, disabling clocks. Then, panfrost_device_fini() calls
> panfrost_gpu_fini() which writes to MMIO registers. Could writing to
> unclocked registers on ARM SoCs cause fatal bus errors or panics?

Sashiko also suggests we might have some other (partially pre-existing)
issues here.

https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655%40collabora.com

Thanks,
Steve

>  	pm_runtime_set_suspended(pfdev->base.dev);
> @@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device *pdev)
>  	drm_dev_unregister(&pfdev->base);
>  
>  	pm_runtime_get_sync(pfdev->base.dev);
> -	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
> +	pm_runtime_put_noidle(pfdev->base.dev);
> +	pm_runtime_disable(pfdev->base.dev);
>  	pm_runtime_set_suspended(pfdev->base.dev);
>  }
>  
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2026-06-05 10:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 17:35 [PATCH v2 0/7] RPM, perfcnt and other minor fixes for Panfrost Adrián Larumbe
2026-06-04 17:35 ` [PATCH v2 1/7] drm/panfrost: Check another bo field for cache option query Adrián Larumbe
2026-06-04 17:57   ` Boris Brezillon
2026-06-05 10:29   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 2/7] drm/panfrost: Prevent division by 0 Adrián Larumbe
2026-06-04 17:44   ` sashiko-bot
2026-06-04 18:02   ` Boris Brezillon
2026-06-05 10:29     ` Steven Price
2026-06-04 17:35 ` [PATCH v2 3/7] drm/panfrost: Move shrinker initialization and unplug one level down Adrián Larumbe
2026-06-04 18:04   ` Boris Brezillon
2026-06-04 17:35 ` [PATCH v2 4/7] drm/panfrost: Move perfcnt GPU disable sequence into a helper Adrián Larumbe
2026-06-04 17:47   ` sashiko-bot
2026-06-04 18:05   ` Boris Brezillon
2026-06-05 10:34   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 5/7] drm/panfrost: Make reset sequence deal with an active HWPerf session Adrián Larumbe
2026-06-04 17:49   ` sashiko-bot
2026-06-04 18:26   ` Boris Brezillon
2026-06-05 10:41     ` Steven Price
2026-06-04 17:35 ` [PATCH v2 6/7] drm/panfrost: Fix PM usage_count mishandling Adrián Larumbe
2026-06-04 17:50   ` sashiko-bot
2026-06-04 18:36   ` Boris Brezillon
2026-06-05 10:48   ` Steven Price
2026-06-04 17:35 ` [PATCH v2 7/7] drm/panfrost: Explicitly enable MMU interrupts at device init Adrián Larumbe
2026-06-04 17:55   ` sashiko-bot
2026-06-05  6:56   ` Boris Brezillon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.