devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Maíra Canal" <mcanal@igalia.com>
To: Melissa Wen <mwen@igalia.com>, Iago Toral <itoral@igalia.com>,
	 Jose Maria Casanova Crespo <jmcasanova@igalia.com>,
	 Krzysztof Kozlowski <krzk+dt@kernel.org>,
	 Conor Dooley <conor+dt@kernel.org>,
	 Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: "Phil Elwell" <phil@raspberrypi.com>,
	dri-devel@lists.freedesktop.org, devicetree@vger.kernel.org,
	kernel-dev@igalia.com, "Maíra Canal" <mcanal@igalia.com>
Subject: [PATCH v3 2/7] drm/v3d: Set job pointer to NULL when the job's fence has an error
Date: Tue, 11 Mar 2025 15:13:44 -0300	[thread overview]
Message-ID: <20250311-v3d-gpu-reset-fixes-v3-2-64f7a4247ec0@igalia.com> (raw)
In-Reply-To: <20250311-v3d-gpu-reset-fixes-v3-0-64f7a4247ec0@igalia.com>

Similar to commit e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to
NULL after job completion"), ensure the job pointer is set to `NULL` when
a job's fence has an error. Failing to do so can trigger kernel warnings
in specific scenarios, such as:

1. v3d_csd_job_run() assigns `v3d->csd_job = job`
2. CSD job exceeds hang limit, causing a timeout → v3d_gpu_reset_for_timeout()
3. GPU reset
4. drm_sched_resubmit_jobs() sets the job's fence to `-ECANCELED`.
5. v3d_csd_job_run() detects the fence error and returns NULL, not
   submitting the job to the GPU
6. User-space runs `modprobe -r v3d`
7. v3d_gem_destroy()

v3d_gem_destroy() triggers a warning indicating that the CSD job never
ended, as we didn't set `v3d->csd_job` to NULL after the timeout. The same
can also happen to BIN, RENDER, and TFU jobs.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_sched.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index c2010ecdb08f4ba3b54f7783ed33901552d0eba1..34c42d6e12cde656d3b51a18be324976199eceae 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -226,8 +226,12 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	unsigned long irqflags;
 
-	if (unlikely(job->base.base.s_fence->finished.error))
+	if (unlikely(job->base.base.s_fence->finished.error)) {
+		spin_lock_irqsave(&v3d->job_lock, irqflags);
+		v3d->bin_job = NULL;
+		spin_unlock_irqrestore(&v3d->job_lock, irqflags);
 		return NULL;
+	}
 
 	/* Lock required around bin_job update vs
 	 * v3d_overflow_mem_work().
@@ -281,8 +285,10 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
-	if (unlikely(job->base.base.s_fence->finished.error))
+	if (unlikely(job->base.base.s_fence->finished.error)) {
+		v3d->render_job = NULL;
 		return NULL;
+	}
 
 	v3d->render_job = job;
 
@@ -327,8 +333,10 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
-	if (unlikely(job->base.base.s_fence->finished.error))
+	if (unlikely(job->base.base.s_fence->finished.error)) {
+		v3d->tfu_job = NULL;
 		return NULL;
+	}
 
 	v3d->tfu_job = job;
 
@@ -373,8 +381,10 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	int i, csd_cfg0_reg;
 
-	if (unlikely(job->base.base.s_fence->finished.error))
+	if (unlikely(job->base.base.s_fence->finished.error)) {
+		v3d->csd_job = NULL;
 		return NULL;
+	}
 
 	v3d->csd_job = job;
 

-- 
Git-154)


  parent reply	other threads:[~2025-03-11 18:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-11 18:13 [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Maíra Canal
2025-03-11 18:13 ` [PATCH v3 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence Maíra Canal
2025-03-11 18:13 ` Maíra Canal [this message]
2025-03-11 18:13 ` [PATCH v3 3/7] drm/v3d: Associate a V3D tech revision to all supported devices Maíra Canal
2025-03-11 18:13 ` [PATCH v3 4/7] dt-bindings: gpu: v3d: Add per-compatible register restrictions Maíra Canal
2025-03-11 18:13 ` [PATCH v3 5/7] dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible Maíra Canal
2025-03-11 20:23   ` Rob Herring
2025-03-11 22:05     ` Maíra Canal
2025-03-12  9:06       ` Krzysztof Kozlowski
2025-03-12 17:47         ` Maíra Canal
2025-03-11 18:13 ` [PATCH v3 6/7] drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x Maíra Canal
2025-03-11 18:13 ` [PATCH v3 7/7] dt-bindings: gpu: Add V3D driver maintainer as DT maintainer Maíra Canal
2025-03-12  9:34 ` [PATCH v3 0/7] drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 Raag Jadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250311-v3d-gpu-reset-fixes-v3-2-64f7a4247ec0@igalia.com \
    --to=mcanal@igalia.com \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=itoral@igalia.com \
    --cc=jmcasanova@igalia.com \
    --cc=kernel-dev@igalia.com \
    --cc=krzk+dt@kernel.org \
    --cc=mwen@igalia.com \
    --cc=nsaenz@kernel.org \
    --cc=phil@raspberrypi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).