public inbox for stable@vger.kernel.org
* [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5
@ 2025-03-11 18:13 Maíra Canal
  2025-03-11 18:13 ` [PATCH v3 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence Maíra Canal
  2025-03-12  9:34 ` [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Raag Jadav
  0 siblings, 2 replies; 3+ messages in thread
From: Maíra Canal @ 2025-03-11 18:13 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Jose Maria Casanova Crespo,
	Krzysztof Kozlowski, Conor Dooley, Nicolas Saenz Julienne
  Cc: Phil Elwell, dri-devel, devicetree, kernel-dev, stable,
	Maíra Canal, Emma Anholt, Rob Herring (Arm)

This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.

Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.

However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #6 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.

Patch #7 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.

[1] https://github.com/raspberrypi/linux/issues/6660

Best Regards,
- Maíra

---
v1 -> v2:
- [1/6, 2/6, 5/6] Add Iago's R-b (Iago Toral)
- [3/6] Use V3D_GEN_* macros consistently throughout the driver (Phil Elwell)
- [3/6] Don't add Iago's R-b in 3/6 due to changes in the patch
- [4/6] Add per-compatible restrictions to enforce per-SoC register rules (Conor Dooley)
- [6/6] Add Emma's A-b, collected through IRC (Emma Anholt)
- [6/6] Add Rob's A-b (Rob Herring)
- Link to v1: https://lore.kernel.org/r/20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@igalia.com

v2 -> v3:
- [3/7] Add Iago's R-b (Iago Toral)
- [4/7, 5/7] Split the patch in two to ease review: PATCH 4/7 now only
  adds the per-compatible rules and PATCH 5/7 adds the SMS registers
- [4/7] Move `allOf` above `additionalProperties` (Krzysztof Kozlowski)
- [4/7, 5/7] Sync `reg` and `reg-names` items (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20250308-v3d-gpu-reset-fixes-v2-0-2939c30f0cc4@igalia.com

---
Maíra Canal (7):
      drm/v3d: Don't run jobs that have errors flagged in its fence
      drm/v3d: Set job pointer to NULL when the job's fence has an error
      drm/v3d: Associate a V3D tech revision to all supported devices
      dt-bindings: gpu: v3d: Add per-compatible register restrictions
      dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible
      drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
      dt-bindings: gpu: Add V3D driver maintainer as DT maintainer

 .../devicetree/bindings/gpu/brcm,bcm-v3d.yaml      |  77 +++++++++++--
 drivers/gpu/drm/v3d/v3d_debugfs.c                  | 126 ++++++++++-----------
 drivers/gpu/drm/v3d/v3d_drv.c                      |  62 +++++++++-
 drivers/gpu/drm/v3d/v3d_drv.h                      |  22 +++-
 drivers/gpu/drm/v3d/v3d_gem.c                      |  27 ++++-
 drivers/gpu/drm/v3d/v3d_irq.c                      |   6 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c                  |   4 +-
 drivers/gpu/drm/v3d/v3d_regs.h                     |  26 +++++
 drivers/gpu/drm/v3d/v3d_sched.c                    |  29 ++++-
 9 files changed, 281 insertions(+), 98 deletions(-)
---
base-commit: 9e75b6ef407fee5d4ed8021cd7ddd9d6a8f7b0e8
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d



* [PATCH v3 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence
  2025-03-11 18:13 [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Maíra Canal
@ 2025-03-11 18:13 ` Maíra Canal
  2025-03-12  9:34 ` [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Raag Jadav
  1 sibling, 0 replies; 3+ messages in thread
From: Maíra Canal @ 2025-03-11 18:13 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Jose Maria Casanova Crespo,
	Krzysztof Kozlowski, Conor Dooley, Nicolas Saenz Julienne
  Cc: Phil Elwell, dri-devel, devicetree, kernel-dev, stable,
	Maíra Canal

The V3D driver still relies on `drm_sched_increase_karma()` and
`drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
The function `drm_sched_increase_karma()` marks the job as guilty, while
`drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
that guilty job.

Because of this, we must check whether the job's DMA fence has been
flagged with an error before executing the job. Otherwise, the same guilty
job may be resubmitted indefinitely, causing repeated GPU resets.

This patch adds a check for an error on the job's fence to prevent running
a guilty job that was previously flagged when the GPU timed out.

Note that the CPU and CACHE_CLEAN queues do not require this check, as
their jobs are executed synchronously once the DRM scheduler starts them.
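
As a rough user-space illustration of the guard described above (not the
driver code), the sketch below models a job whose scheduler fence can carry
an error. All `sketch_*` names are hypothetical stand-ins: `sketch_fence`
loosely plays the role of the DMA fence, `sketch_flag_guilty()` mimics the
effect this message attributes to `drm_sched_resubmit_jobs()`, and
`sketch_job_run()` stands in for `v3d_tfu_job_run()`/`v3d_csd_job_run()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA fence; not the kernel's struct dma_fence. */
struct sketch_fence {
	int error;	/* 0, or a negative errno such as -ECANCELED */
};

/* Hypothetical stand-in for a V3D job. */
struct sketch_job {
	struct sketch_fence finished;	/* plays the role of s_fence->finished */
	struct sketch_fence hw_fence;	/* fence handed back when the job runs */
	int times_run;			/* how often the job reached the "hardware" */
};

/* Mimics flagging a guilty job's fence with -ECANCELED after a timeout. */
static void sketch_flag_guilty(struct sketch_job *job)
{
	job->finished.error = -ECANCELED;
}

/*
 * Models the run callback: bail out before touching the hardware if the
 * job's fence already carries an error, otherwise run and return a fence.
 */
static struct sketch_fence *sketch_job_run(struct sketch_job *job)
{
	if (job->finished.error)
		return NULL;

	job->times_run++;
	return &job->hw_fence;
}

/* Small self-check: a flagged job must not run a second time. */
static void sketch_selftest(void)
{
	struct sketch_job job = { 0 };

	assert(sketch_job_run(&job) == &job.hw_fence);
	sketch_flag_guilty(&job);
	assert(sketch_job_run(&job) == NULL);
	assert(job.times_run == 1);
}
```

Without the early return, every resubmission after a timeout would run the
same job again, which is the loop of repeated GPU resets this patch is meant
to break.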

Cc: stable@vger.kernel.org
Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 80466ce8c7df669280e556c0793490b79e75d2c7..c2010ecdb08f4ba3b54f7783ed33901552d0eba1 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -327,11 +327,15 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
+	if (unlikely(job->base.base.s_fence->finished.error))
+		return NULL;
+
+	v3d->tfu_job = job;
+
 	fence = v3d_fence_create(v3d, V3D_TFU);
 	if (IS_ERR(fence))
 		return NULL;
 
-	v3d->tfu_job = job;
 	if (job->base.irq_fence)
 		dma_fence_put(job->base.irq_fence);
 	job->base.irq_fence = dma_fence_get(fence);
@@ -369,6 +373,9 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	int i, csd_cfg0_reg;
 
+	if (unlikely(job->base.base.s_fence->finished.error))
+		return NULL;
+
 	v3d->csd_job = job;
 
 	v3d_invalidate_caches(v3d);




* Re: [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5
  2025-03-11 18:13 [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Maíra Canal
  2025-03-11 18:13 ` [PATCH v3 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence Maíra Canal
@ 2025-03-12  9:34 ` Raag Jadav
  1 sibling, 0 replies; 3+ messages in thread
From: Raag Jadav @ 2025-03-12  9:34 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Melissa Wen, Iago Toral, Jose Maria Casanova Crespo,
	Krzysztof Kozlowski, Conor Dooley, Nicolas Saenz Julienne,
	Phil Elwell, dri-devel, devicetree, kernel-dev, stable,
	Emma Anholt, Rob Herring (Arm)

On Tue, Mar 11, 2025 at 03:13:42PM -0300, Maíra Canal wrote:
> This series addresses GPU reset issues reported in [1], where running a
> long compute job would trigger repeated GPU resets, leading to a UI
> freeze.
> 
> Patches #1 and #2 prevent the same faulty job from being resubmitted in a
> loop, mitigating the first cause of the issue.
> 
> However, the issue isn't entirely solved. Even with only a single GPU
> reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
> Patches #3 to #6 address this by properly configuring the V3D_SMS
> registers, which are required for power management and resets in V3D 7.1.

Not sure how much this helps your case, but I'm leaving it here in case it
turns out to be useful. It's already in -next and on track for the 6.15 merge.

https://patchwork.freedesktop.org/series/138070/

Raag
