dri-devel.lists.freedesktop.org archive mirror
* [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG
@ 2025-07-07 14:46 Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Maíra Canal
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal,
	Min Ma, Lizhi Hou, Oded Gabbay, Frank Binns, Matt Coster,
	Qiang Yu, Lyude Paul, Alex Deucher, Christian König

TL;DR: The only two patches that are lacking R-b's are:

[PATCH 2/8] drm/sched: Allow drivers to skip the reset and keep on running
[PATCH 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

For those two patches, it would be great to gather feedback and/or R-b's,
particularly from the Intel folks.

Thanks for all the reviews so far!

---
When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job may still be running, and there may be no valid reason to
reset the hardware. This can occur in two situations:

  1. The GPU exposes a mechanism that indicates whether it is still making
     progress. By checking it, the driver can safely skip the reset, re-arm
     the timeout, and allow the job to keep running until completion. This
     is the case for v3d, Etnaviv, and Xe.

  2. The timeout fired before the free-job worker could run. Consequently,
     the scheduler calls `timedout_job()` for a job that isn't actually
     timed out.

These two scenarios are problematic because the job was removed from the
`sched->pending_list` before calling `sched->ops->timedout_job()`, which
means that when the job finishes, it won't be freed by the scheduler
through `sched->ops->free_job()`. As a result, the job and its resources
are never freed, leading to a memory leak.

For v3d specifically, we have observed that these memory leaks can be
significant in certain scenarios, as reported by users in [1][2]. To
address this situation, I submitted a patch similar to commit 704d3d60fec4
("drm/etnaviv: don't block scheduler when GPU is still active") for v3d [3].
This patch has already landed in drm-misc-fixes and successfully resolved
the users' issues.

However, as I mentioned in [3], exposing the scheduler's internals within
the drivers isn't ideal and I believe this specific situation can be
addressed within the DRM scheduler framework.

This series aims to resolve this issue by adding a new DRM sched status
that allows a driver to skip the reset. This new status will indicate that
the job should be reinserted into the pending list, and the driver will
still signal its completion.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227
[2] https://github.com/raspberrypi/linux/issues/6817
[3] https://lore.kernel.org/dri-devel/20250430210643.57924-1-mcanal@igalia.com/T/

Best Regards,
- Maíra

---
v1 -> v2:

- Fix several grammar nits across the documentation and commit messages.
- Drop "drm/sched: Always free the job after the timeout" (Tvrtko Ursulin)
- [1/8] NEW PATCH: Rename DRM_GPU_SCHED_STAT_NOMINAL to a more semantic
	name (Tvrtko Ursulin, Philipp Stanner)
- [2/8] Rename DRM_GPU_SCHED_STAT_RUNNING to DRM_GPU_SCHED_STAT_NO_HANG (Tvrtko Ursulin, Philipp Stanner)
- [2/8] Requeue free-job work after reinserting the job to the pending list (Matthew Brost)
- [2/8] Create a helper function to reinsert the job (Philipp Stanner)
- [2/8] Rewrite the commit message (Philipp Stanner)
- [2/8] Add a comment to `drm_sched_start()` documentation, similar to what
	was commented in `drm_sched_stop()` (Philipp Stanner)
- [3/8] Keep HZ as timeout for `drm_mock_sched_job_wait_scheduled()` (Tvrtko Ursulin)
- [4/8] Use a job flag to indicate that `timedout_job()` should skip the
	reset (Tvrtko Ursulin)
- [7/8] Use DRM_GPU_SCHED_STAT_NO_HANG to re-arm the timer in other cases
	as well (Matthew Brost)
- Link to v1: https://lore.kernel.org/r/20250503-sched-skip-reset-v1-0-ed0d6701a3fe@igalia.com

v2 -> v3:

- [2/8] Address comments about the commit message (Philipp Stanner)
- [2/8] Improve comments and documentation style (Philipp Stanner)
- [3/8] Rename the commit title to "drm/sched: Make timeout KUnit tests faster" (Philipp Stanner)
- [3/8] Add Tvrtko's R-b (Tvrtko Ursulin)
- [4/8] Instead of setting up a job duration, advance it manually (Tvrtko Ursulin)
- [4/8] Wait for 2 * MOCK_TIMEOUT instead of MOCK_TIMEOUT (Tvrtko Ursulin)
- [5/8, 6/8, 7/8, 8/8] Use Philipp's suggestion to improve the commit messages (Philipp Stanner)
- Link to v2: https://lore.kernel.org/r/20250530-sched-skip-reset-v2-0-c40a8d2d8daa@igalia.com

v3 -> v4:

- [1/8] s/betters/better and Philipp's R-b (Philipp Stanner)
- [2/8] Apply some documentation nits (Philipp Stanner)
- [3/8] Add Philipp's A-b (Philipp Stanner)
- [4/8, 5/8] Add Tvrtko's R-b (Tvrtko Ursulin)
- [6/8] Add Lucas' R-b (Lucas Stach)
- Link to v3: https://lore.kernel.org/r/20250618-sched-skip-reset-v3-0-8be5cca2725d@igalia.com

---
Maíra Canal (8):
      drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
      drm/sched: Allow drivers to skip the reset and keep on running
      drm/sched: Make timeout KUnit tests faster
      drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
      drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
      drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
      drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
      drm/panfrost: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

 drivers/accel/amdxdna/aie2_ctx.c                 |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c          |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c          | 13 +++---
 drivers/gpu/drm/imagination/pvr_queue.c          |  4 +-
 drivers/gpu/drm/lima/lima_sched.c                |  6 +--
 drivers/gpu/drm/nouveau/nouveau_exec.c           |  2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c          |  2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c          | 10 ++---
 drivers/gpu/drm/panthor/panthor_mmu.c            |  2 +-
 drivers/gpu/drm/panthor/panthor_sched.c          |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c           | 45 +++++++++++++++++--
 drivers/gpu/drm/scheduler/tests/mock_scheduler.c |  7 ++-
 drivers/gpu/drm/scheduler/tests/sched_tests.h    |  1 +
 drivers/gpu/drm/scheduler/tests/tests_basic.c    | 55 ++++++++++++++++++++++--
 drivers/gpu/drm/v3d/v3d_sched.c                  | 18 ++------
 drivers/gpu/drm/xe/xe_guc_submit.c               | 14 ++----
 include/drm/gpu_scheduler.h                      |  7 ++-
 17 files changed, 134 insertions(+), 58 deletions(-)
---
base-commit: cedb945101df021e9c7546b7bf490bec4dbc9cc8
change-id: 20250502-sched-skip-reset-bf7c163233da


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running Maíra Canal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal,
	Min Ma, Lizhi Hou, Oded Gabbay, Frank Binns, Matt Coster,
	Qiang Yu, Lyude Paul, Alex Deucher, Christian König

Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.

However, to give callers more insight into the GPU's state, the status
needs to convey more information than just "OK".

Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
---
To: Min Ma <min.ma@amd.com>
To: Lizhi Hou <lizhi.hou@amd.com>
To: Oded Gabbay <ogabbay@kernel.org>
To: Frank Binns <frank.binns@imgtec.com>
To: Matt Coster <matt.coster@imgtec.com>
To: Qiang Yu <yuq825@gmail.com>
To: Lyude Paul <lyude@redhat.com>
To: Alex Deucher <alexander.deucher@amd.com>
To: Christian König <christian.koenig@amd.com>
---
 drivers/accel/amdxdna/aie2_ctx.c                 | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c          | 2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c          | 4 ++--
 drivers/gpu/drm/imagination/pvr_queue.c          | 4 ++--
 drivers/gpu/drm/lima/lima_sched.c                | 6 +++---
 drivers/gpu/drm/nouveau/nouveau_exec.c           | 2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c          | 2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c          | 6 +++---
 drivers/gpu/drm/panthor/panthor_mmu.c            | 2 +-
 drivers/gpu/drm/panthor/panthor_sched.c          | 2 +-
 drivers/gpu/drm/scheduler/sched_main.c           | 2 +-
 drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 2 +-
 drivers/gpu/drm/v3d/v3d_sched.c                  | 6 +++---
 drivers/gpu/drm/xe/xe_guc_submit.c               | 6 +++---
 include/drm/gpu_scheduler.h                      | 4 ++--
 15 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
index f20999f2d66864fd4a6b7069e866727c37befb39..2cff5419bd2facb59ff5df6388aba0512fd45d5c 100644
--- a/drivers/accel/amdxdna/aie2_ctx.c
+++ b/drivers/accel/amdxdna/aie2_ctx.c
@@ -361,7 +361,7 @@ aie2_sched_job_timedout(struct drm_sched_job *sched_job)
 	aie2_hwctx_restart(xdna, hwctx);
 	mutex_unlock(&xdna->dev_lock);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static const struct drm_sched_backend_ops sched_ops = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1e24590ae1449f49e4632fbf2b931e04c03af8d5..58fd1d1f8571a869ad5d7594ea4bb2599a459113 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -200,7 +200,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 exit:
 	drm_dev_exit(idx);
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 76a3a3e517d8d9f654fb6b9e98e72910795cfc7a..7146069a98492f5fab2a49d96e2054f649e1fe3d 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -86,11 +86,11 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 	drm_sched_resubmit_jobs(&gpu->sched);
 
 	drm_sched_start(&gpu->sched, 0);
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 
 out_no_timeout:
 	list_add(&sched_job->list, &sched_job->sched->pending_list);
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
diff --git a/drivers/gpu/drm/imagination/pvr_queue.c b/drivers/gpu/drm/imagination/pvr_queue.c
index 5a41ee79fed646a86344cd16e78efdb45ff02e43..fc415dd0d7a73631bd4144c9f35b9b294c625a12 100644
--- a/drivers/gpu/drm/imagination/pvr_queue.c
+++ b/drivers/gpu/drm/imagination/pvr_queue.c
@@ -803,7 +803,7 @@ static void pvr_queue_start(struct pvr_queue *queue)
  * the scheduler, and re-assign parent fences in the middle.
  *
  * Return:
- *  * DRM_GPU_SCHED_STAT_NOMINAL.
+ *  * DRM_GPU_SCHED_STAT_RESET.
  */
 static enum drm_gpu_sched_stat
 pvr_queue_timedout_job(struct drm_sched_job *s_job)
@@ -854,7 +854,7 @@ pvr_queue_timedout_job(struct drm_sched_job *s_job)
 
 	drm_sched_start(sched, 0);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 /**
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 954f4325b859b2977a2cc608a99a6ebb642f1000..739e8c6c6d909aa4263bad8a12ec07f0c6607bb2 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -412,7 +412,7 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 	 */
 	if (dma_fence_is_signaled(task->fence)) {
 		DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	/*
@@ -429,7 +429,7 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 
 	if (dma_fence_is_signaled(task->fence)) {
 		DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip));
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	/*
@@ -467,7 +467,7 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 	drm_sched_resubmit_jobs(&pipe->base);
 	drm_sched_start(&pipe->base, 0);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void lima_sched_free_job(struct drm_sched_job *job)
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c b/drivers/gpu/drm/nouveau/nouveau_exec.c
index 41b7c608c9054869ddadfe17c96100266e44c254..edbbda78bac90432c4877aa39a9587cf976705c7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -189,7 +189,7 @@ nouveau_exec_job_timeout(struct nouveau_job *job)
 	NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
 		  chan->chid);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static const struct nouveau_job_ops nouveau_exec_job_ops = {
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 460a5fb024129a0557f2b55008278e1378019d89..e60f7892f5ce9aff0c5fa1908c1a0445891927ed 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -371,7 +371,7 @@ nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
 {
 	struct drm_gpu_scheduler *sched = sched_job->sched;
 	struct nouveau_job *job = to_nouveau_job(sched_job);
-	enum drm_gpu_sched_stat stat = DRM_GPU_SCHED_STAT_NOMINAL;
+	enum drm_gpu_sched_stat stat = DRM_GPU_SCHED_STAT_RESET;
 
 	drm_sched_stop(sched, sched_job);
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 5657106c2f7d0a0ca6162850767f58f3200cce13..afcffe7f8fe9e11f84e4ab7e8f5a72f7bf583690 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -755,7 +755,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 
 	/*
 	 * Panfrost IRQ handler may take a long time to process an interrupt
@@ -770,7 +770,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	if (dma_fence_is_signaled(job->done_fence)) {
 		dev_warn(pfdev->dev, "unexpectedly high interrupt latency\n");
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
@@ -786,7 +786,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	atomic_set(&pfdev->reset.pending, 1);
 	panfrost_reset(pfdev, sched_job);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void panfrost_reset_work(struct work_struct *work)
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index b39ea6acc6a9681f2ffa7d52248b6d2c119d1f1b..d0ae462015510bd0c25aaffc059d28084e600372 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2270,7 +2270,7 @@ static enum drm_gpu_sched_stat
 panthor_vm_bind_timedout_job(struct drm_sched_job *sched_job)
 {
 	WARN(1, "VM_BIND ops are synchronous for now, there should be no timeout!");
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static const struct drm_sched_backend_ops panthor_vm_bind_ops = {
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index a2248f692a030c1c84869b9a1948ad1cb0c0b490..8f17394cc82aad9eaf01e473cd9d3dea46fa3d61 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -3241,7 +3241,7 @@ queue_timedout_job(struct drm_sched_job *sched_job)
 
 	queue_start(queue);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void queue_free_job(struct drm_sched_job *sched_job)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 648b5d2feff886bc12e81af3c35335fa4e5dd050..0f32e2cb43d6af294408968a970990f9f5c47bee 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -535,7 +535,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
-	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
+	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_RESET;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index 49d067fecd67f705f7a7b957ce04699e3cc7af76..998162202972eb5919dfff4c8784ecc22c00ec9d 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -231,7 +231,7 @@ mock_sched_timedout_job(struct drm_sched_job *sched_job)
 	drm_sched_job_cleanup(sched_job);
 	/* Mock job itself is freed by the kunit framework. */
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void mock_sched_free_job(struct drm_sched_job *sched_job)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 35f131a46d0701cc8040d3b9654595a2bc260eab..e2b7f24d528e773968daea0f5b31c869584bb692 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -741,7 +741,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 
 	mutex_unlock(&v3d->reset_lock);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void
@@ -773,7 +773,7 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 		*timedout_ctra = ctra;
 
 		v3d_sched_skip_reset(sched_job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	return v3d_gpu_reset_for_timeout(v3d, sched_job);
@@ -819,7 +819,7 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 		job->timedout_batches = batches;
 
 		v3d_sched_skip_reset(sched_job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	return v3d_gpu_reset_for_timeout(v3d, sched_job);
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 4a9ada5edbca4facfbd3ba82082dc9c3af4fc191..9c7e445b9ea7ce7e3610eadca023e6d810e683e9 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1082,7 +1082,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 		xe_sched_add_pending_job(sched, job);
 		xe_sched_submission_start(sched);
 
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_RESET;
 	}
 
 	/* Kill the run_job entry point */
@@ -1251,7 +1251,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	/* Start fence signaling */
 	xe_hw_fence_irq_start(q->fence_irq);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 
 sched_enable:
 	enable_scheduling(q);
@@ -1264,7 +1264,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	xe_sched_add_pending_job(sched, job);
 	xe_sched_submission_start(sched);
 
-	return DRM_GPU_SCHED_STAT_NOMINAL;
+	return DRM_GPU_SCHED_STAT_RESET;
 }
 
 static void __guc_exec_queue_fini_async(struct work_struct *w)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e62a7214e05217d72de5c6e5168544d47099090a..83e5c00d8dd9a83ab20547a93d6fc572de97616e 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -391,12 +391,12 @@ struct drm_sched_job {
  * enum drm_gpu_sched_stat - the scheduler's status
  *
  * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
- * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
+ * @DRM_GPU_SCHED_STAT_RESET: The GPU hung and successfully reset.
  * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
  */
 enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE,
-	DRM_GPU_SCHED_STAT_NOMINAL,
+	DRM_GPU_SCHED_STAT_RESET,
 	DRM_GPU_SCHED_STAT_ENODEV,
 };
 

-- 
2.50.0



* [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-08  7:02   ` Philipp Stanner
  2025-07-07 14:46 ` [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster Maíra Canal
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job may still be running, and there may be no valid reason to
reset the hardware. This can occur in two situations:

  1. The GPU exposes a mechanism that indicates whether it is still making
     progress. By checking it, the driver can safely skip the reset, re-arm
     the timeout, and allow the job to keep running until completion. This
     is the case for v3d, Etnaviv, and Xe.
  2. The timeout fired before the free-job worker could run. Consequently,
     the scheduler calls `sched->ops->timedout_job()` for a job that isn't
     actually timed out.

These two scenarios are problematic because the job was removed from the
`sched->pending_list` before calling `sched->ops->timedout_job()`, which
means that when the job finishes, it won't be freed by the scheduler
through `sched->ops->free_job()`, leading to a memory leak.

To solve these problems, create a new `drm_gpu_sched_stat` status, called
DRM_GPU_SCHED_STAT_NO_HANG, that allows a driver to skip the reset. The
new status will indicate that the job must be reinserted into the
pending list, and the hardware / driver will still complete that job.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 43 ++++++++++++++++++++++++++++++++--
 include/drm/gpu_scheduler.h            |  3 +++
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 0f32e2cb43d6af294408968a970990f9f5c47bee..d3f48526883cc7ceea0cd1d0c62fb119f7092704 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -374,11 +374,16 @@ static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_job *job;
 
-	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 	if (job && dma_fence_is_signaled(&job->s_fence->finished))
 		__drm_sched_run_free_queue(sched);
+}
+
+static void drm_sched_run_free_queue_unlocked(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	drm_sched_run_free_queue(sched);
 	spin_unlock(&sched->job_list_lock);
 }
 
@@ -531,6 +536,31 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
 	spin_unlock(&sched->job_list_lock);
 }
 
+/**
+ * drm_sched_job_reinsert_on_false_timeout - reinsert the job on a false timeout
+ * @sched: scheduler instance
+ * @job: job to be reinserted on the pending list
+ *
+ * In the case of a "false timeout" - when a timeout occurs but the GPU isn't
+ * hung and the job is making progress, the scheduler must reinsert the job back
+ * into the pending list. Otherwise, the job and its resources won't be freed
+ * through the &drm_sched_backend_ops.free_job callback.
+ *
+ * Note that after reinserting the job, the scheduler enqueues the free-job
+ * work again if ready. Otherwise, a signaled job could be added to the pending
+ * list, but never freed.
+ *
+ * This function must be used in "false timeout" cases only.
+ */
+static void drm_sched_job_reinsert_on_false_timeout(struct drm_gpu_scheduler *sched,
+						    struct drm_sched_job *job)
+{
+	spin_lock(&sched->job_list_lock);
+	list_add(&job->list, &sched->pending_list);
+	drm_sched_run_free_queue(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+
 static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
@@ -564,6 +594,9 @@ static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
+
+		if (status == DRM_GPU_SCHED_STAT_NO_HANG)
+			drm_sched_job_reinsert_on_false_timeout(sched, job);
 	} else {
 		spin_unlock(&sched->job_list_lock);
 	}
@@ -586,6 +619,9 @@ static void drm_sched_job_timedout(struct work_struct *work)
  * This function is typically used for reset recovery (see the docu of
  * drm_sched_backend_ops.timedout_job() for details). Do not call it for
  * scheduler teardown, i.e., before calling drm_sched_fini().
+ *
+ * As it's used for reset recovery, drm_sched_stop() shouldn't be called
+ * if the driver skipped the reset (DRM_GPU_SCHED_STAT_NO_HANG).
  */
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 {
@@ -671,6 +707,9 @@ EXPORT_SYMBOL(drm_sched_stop);
  * drm_sched_backend_ops.timedout_job() for details). Do not call it for
  * scheduler startup. The scheduler itself is fully operational after
  * drm_sched_init() succeeded.
+ *
+ * As it's used for reset recovery, drm_sched_start() shouldn't be called
+ * if the driver skipped the reset (DRM_GPU_SCHED_STAT_NO_HANG).
  */
 void drm_sched_start(struct drm_gpu_scheduler *sched, int errno)
 {
@@ -1192,7 +1231,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
 	if (job)
 		sched->ops->free_job(job);
 
-	drm_sched_run_free_queue(sched);
+	drm_sched_run_free_queue_unlocked(sched);
 	drm_sched_run_job_queue(sched);
 }
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 83e5c00d8dd9a83ab20547a93d6fc572de97616e..257d21d8d1d2c4f035d6d4882e159de59b263c76 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -393,11 +393,14 @@ struct drm_sched_job {
  * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
  * @DRM_GPU_SCHED_STAT_RESET: The GPU hung and successfully reset.
  * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
+ * @DRM_GPU_SCHED_STAT_NO_HANG: Contrary to scheduler's assumption, the GPU
+ * did not hang and is still running.
  */
 enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE,
 	DRM_GPU_SCHED_STAT_RESET,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_NO_HANG,
 };
 
 /**

-- 
2.50.0



* [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-08  7:41   ` Simona Vetter
  2025-07-07 14:46 ` [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

As more KUnit tests are introduced to evaluate the basic capabilities of
the `timedout_job()` hook, the test suite will continue to increase in
duration. To reduce the overall running time of the test suite, decrease
the scheduler's timeout for the timeout tests.

Before this commit:

[15:42:26] Elapsed time: 15.637s total, 0.002s configuring, 10.387s building, 5.229s running

After this commit:

[15:45:26] Elapsed time: 9.263s total, 0.002s configuring, 5.168s building, 4.037s running

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 7230057e0594c6246f02608f07fcb1f8d738ac75..41c648782f4548e202bd8711b45d28eead9bd0b2 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -5,6 +5,8 @@
 
 #include "sched_tests.h"
 
+#define MOCK_TIMEOUT (HZ / 5)
+
 /*
  * DRM scheduler basic tests should check the basic functional correctness of
  * the scheduler, including some very light smoke testing. More targeted tests,
@@ -28,7 +30,7 @@ static void drm_sched_basic_exit(struct kunit *test)
 
 static int drm_sched_timeout_init(struct kunit *test)
 {
-	test->priv = drm_mock_sched_new(test, HZ);
+	test->priv = drm_mock_sched_new(test, MOCK_TIMEOUT);
 
 	return 0;
 }
@@ -227,14 +229,14 @@ static void drm_sched_basic_timeout(struct kunit *test)
 	done = drm_mock_sched_job_wait_scheduled(job, HZ);
 	KUNIT_ASSERT_TRUE(test, done);
 
-	done = drm_mock_sched_job_wait_finished(job, HZ / 2);
+	done = drm_mock_sched_job_wait_finished(job, MOCK_TIMEOUT / 2);
 	KUNIT_ASSERT_FALSE(test, done);
 
 	KUNIT_ASSERT_EQ(test,
 			job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
 			0);
 
-	done = drm_mock_sched_job_wait_finished(job, HZ);
+	done = drm_mock_sched_job_wait_finished(job, MOCK_TIMEOUT);
 	KUNIT_ASSERT_FALSE(test, done);
 
 	KUNIT_ASSERT_EQ(test,

-- 
2.50.0



* [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (2 preceding siblings ...)
  2025-07-07 14:46 ` [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-08  7:08   ` Philipp Stanner
  2025-07-07 14:46 ` [PATCH v4 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset Maíra Canal
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

Add a test to submit a single job against a scheduler with the timeout
configured and verify that if the job is still running, the timeout
handler will skip the reset and allow the job to complete.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/scheduler/tests/mock_scheduler.c |  5 +++
 drivers/gpu/drm/scheduler/tests/sched_tests.h    |  1 +
 drivers/gpu/drm/scheduler/tests/tests_basic.c    | 47 ++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index 998162202972eb5919dfff4c8784ecc22c00ec9d..b3b33f85b7ae30c8e6bba97866a74978b0a96fa7 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -231,6 +231,11 @@ mock_sched_timedout_job(struct drm_sched_job *sched_job)
 	drm_sched_job_cleanup(sched_job);
 	/* Mock job itself is freed by the kunit framework. */
 
+	if (job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET) {
+		job->flags &= ~DRM_MOCK_SCHED_JOB_DONT_RESET;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
+	}
+
 	return DRM_GPU_SCHED_STAT_RESET;
 }
 
diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h b/drivers/gpu/drm/scheduler/tests/sched_tests.h
index fbba38137f0c324cf2472fe5b3a8a78ec016e829..4adf961e1930203fe94241a8a0ae5f7817874a39 100644
--- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
+++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
@@ -98,6 +98,7 @@ struct drm_mock_sched_job {
 
 #define DRM_MOCK_SCHED_JOB_DONE		0x1
 #define DRM_MOCK_SCHED_JOB_TIMEDOUT	0x2
+#define DRM_MOCK_SCHED_JOB_DONT_RESET	0x4
 	unsigned long		flags;
 
 	struct list_head	link;
diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 41c648782f4548e202bd8711b45d28eead9bd0b2..91c0449590ed24c3da18ab7d930cca47d7c317c7 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -246,8 +246,55 @@ static void drm_sched_basic_timeout(struct kunit *test)
 	drm_mock_sched_entity_free(entity);
 }
 
+static void drm_sched_skip_reset(struct kunit *test)
+{
+	struct drm_mock_scheduler *sched = test->priv;
+	struct drm_mock_sched_entity *entity;
+	struct drm_mock_sched_job *job;
+	unsigned int i;
+	bool done;
+
+	/*
+	 * Submit a single job against a scheduler with the timeout configured
+	 * and verify that if the job is still running, the timeout handler
+	 * will skip the reset and allow the job to complete.
+	 */
+
+	entity = drm_mock_sched_entity_new(test,
+					   DRM_SCHED_PRIORITY_NORMAL,
+					   sched);
+	job = drm_mock_sched_job_new(test, entity);
+
+	job->flags = DRM_MOCK_SCHED_JOB_DONT_RESET;
+
+	drm_mock_sched_job_submit(job);
+
+	done = drm_mock_sched_job_wait_scheduled(job, HZ);
+	KUNIT_ASSERT_TRUE(test, done);
+
+	done = drm_mock_sched_job_wait_finished(job, 2 * MOCK_TIMEOUT);
+	KUNIT_ASSERT_FALSE(test, done);
+
+	KUNIT_ASSERT_EQ(test,
+			job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
+			DRM_MOCK_SCHED_JOB_TIMEDOUT);
+
+	KUNIT_ASSERT_EQ(test,
+			job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET,
+			0);
+
+	i = drm_mock_sched_advance(sched, 1);
+	KUNIT_ASSERT_EQ(test, i, 1);
+
+	done = drm_mock_sched_job_wait_finished(job, HZ);
+	KUNIT_ASSERT_TRUE(test, done);
+
+	drm_mock_sched_entity_free(entity);
+}
+
 static struct kunit_case drm_sched_timeout_tests[] = {
 	KUNIT_CASE(drm_sched_basic_timeout),
+	KUNIT_CASE(drm_sched_skip_reset),
 	{}
 };
 

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (3 preceding siblings ...)
  2025-07-07 14:46 ` [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 6/8] drm/etnaviv: " Maíra Canal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

When a CL/CSD job times out, we check if the GPU has made any progress
since the last timeout. If so, instead of resetting the hardware, we skip
the reset and allow the timer to be rearmed. This gives long-running jobs
a chance to complete.

Instead of manipulating the scheduler's internals, inform the scheduler
that the job did not actually time out and that no reset was performed by
returning the new status code DRM_GPU_SCHED_STAT_NO_HANG.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_sched.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e2b7f24d528e773968daea0f5b31c869584bb692..cc85f1b19ac405146a2a516f335a46376684bc91 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -744,16 +744,6 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 	return DRM_GPU_SCHED_STAT_RESET;
 }
 
-static void
-v3d_sched_skip_reset(struct drm_sched_job *sched_job)
-{
-	struct drm_gpu_scheduler *sched = sched_job->sched;
-
-	spin_lock(&sched->job_list_lock);
-	list_add(&sched_job->list, &sched->pending_list);
-	spin_unlock(&sched->job_list_lock);
-}
-
 static enum drm_gpu_sched_stat
 v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 		    u32 *timedout_ctca, u32 *timedout_ctra)
@@ -772,8 +762,7 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 		*timedout_ctca = ctca;
 		*timedout_ctra = ctra;
 
-		v3d_sched_skip_reset(sched_job);
-		return DRM_GPU_SCHED_STAT_RESET;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 	}
 
 	return v3d_gpu_reset_for_timeout(v3d, sched_job);
@@ -818,8 +807,7 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 	if (job->timedout_batches != batches) {
 		job->timedout_batches = batches;
 
-		v3d_sched_skip_reset(sched_job);
-		return DRM_GPU_SCHED_STAT_RESET;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 	}
 
 	return v3d_gpu_reset_for_timeout(v3d, sched_job);

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (4 preceding siblings ...)
  2025-07-07 14:46 ` [PATCH v4 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 7/8] drm/xe: " Maíra Canal
  2025-07-07 14:46 ` [PATCH v4 8/8] drm/panfrost: " Maíra Canal
  7 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

Etnaviv can skip a hardware reset in two situations:

  1. TDR has fired before the free-job worker and the timeout is spurious.
  2. The GPU is still making progress on the front-end and we can give
     the job a chance to complete.

Instead of manipulating the scheduler's internals, inform the scheduler
that the job did not actually time out and that no reset was performed by
returning the new status code DRM_GPU_SCHED_STAT_NO_HANG.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 7146069a98492f5fab2a49d96e2054f649e1fe3d..46f5391e84a12232b247886cf1311f8e09f42f04 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -40,11 +40,11 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 	int change;
 
 	/*
-	 * If the GPU managed to complete this jobs fence, the timout is
-	 * spurious. Bail out.
+	 * If the GPU managed to complete this job's fence, the timeout has
+	 * fired before the free-job worker. The timeout is spurious, so bail out.
 	 */
 	if (dma_fence_is_signaled(submit->out_fence))
-		goto out_no_timeout;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 
 	/*
 	 * If the GPU is still making forward progress on the front-end (which
@@ -70,7 +70,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 		gpu->hangcheck_dma_addr = dma_addr;
 		gpu->hangcheck_primid = primid;
 		gpu->hangcheck_fence = gpu->completed_fence;
-		goto out_no_timeout;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 	}
 
 	/* block scheduler */
@@ -86,10 +86,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 	drm_sched_resubmit_jobs(&gpu->sched);
 
 	drm_sched_start(&gpu->sched, 0);
-	return DRM_GPU_SCHED_STAT_RESET;
 
-out_no_timeout:
-	list_add(&sched_job->list, &sched_job->sched->pending_list);
 	return DRM_GPU_SCHED_STAT_RESET;
 }
 

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (5 preceding siblings ...)
  2025-07-07 14:46 ` [PATCH v4 6/8] drm/etnaviv: " Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  2025-07-08  7:22   ` Matthew Brost
  2025-07-07 14:46 ` [PATCH v4 8/8] drm/panfrost: " Maíra Canal
  7 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

Xe can skip the reset if TDR has fired before the free-job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
the scheduler's internals, inform the scheduler that the job did not
actually time out and that no reset was performed by returning the new
status code DRM_GPU_SCHED_STAT_NO_HANG.

Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 9c7e445b9ea7ce7e3610eadca023e6d810e683e9..f6289eeffd852e40b33d0e455d9bcc21a4fb1467 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1078,12 +1078,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * list so job can be freed and kick scheduler ensuring free job is not
 	 * lost.
 	 */
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
-		xe_sched_add_pending_job(sched, job);
-		xe_sched_submission_start(sched);
-
-		return DRM_GPU_SCHED_STAT_RESET;
-	}
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 
 	/* Kill the run_job entry point */
 	xe_sched_submission_stop(sched);
@@ -1261,10 +1257,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * but there is not currently an easy way to do in DRM scheduler. With
 	 * some thought, do this in a follow up.
 	 */
-	xe_sched_add_pending_job(sched, job);
 	xe_sched_submission_start(sched);
-
-	return DRM_GPU_SCHED_STAT_RESET;
+	return DRM_GPU_SCHED_STAT_NO_HANG;
 }
 
 static void __guc_exec_queue_fini_async(struct work_struct *w)

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 8/8] drm/panfrost: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
                   ` (6 preceding siblings ...)
  2025-07-07 14:46 ` [PATCH v4 7/8] drm/xe: " Maíra Canal
@ 2025-07-07 14:46 ` Maíra Canal
  7 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-07 14:46 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe, Maíra Canal

Panfrost can skip the reset if TDR has fired before the free-job worker.
Currently, since Panfrost doesn't take any action in these scenarios, the
job is leaked, as `free_job()` won't be called.

To avoid such leaks, inform the scheduler that the job did not actually
time out and that no reset was performed by returning the new status code
DRM_GPU_SCHED_STAT_NO_HANG.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index afcffe7f8fe9e11f84e4ab7e8f5a72f7bf583690..842e012cdc68e130a13e08ffae3b7fdf5f8e1acc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -751,11 +751,11 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	int js = panfrost_job_get_slot(job);
 
 	/*
-	 * If the GPU managed to complete this jobs fence, the timeout is
-	 * spurious. Bail out.
+	 * If the GPU managed to complete this job's fence, the timeout has
+	 * fired before the free-job worker. The timeout is spurious, so bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_RESET;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 
 	/*
 	 * Panfrost IRQ handler may take a long time to process an interrupt
@@ -770,7 +770,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	if (dma_fence_is_signaled(job->done_fence)) {
 		dev_warn(pfdev->dev, "unexpectedly high interrupt latency\n");
-		return DRM_GPU_SCHED_STAT_RESET;
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 	}
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running
  2025-07-07 14:46 ` [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running Maíra Canal
@ 2025-07-08  7:02   ` Philipp Stanner
  2025-07-08 12:38     ` Maíra Canal
  0 siblings, 1 reply; 15+ messages in thread
From: Philipp Stanner @ 2025-07-08  7:02 UTC (permalink / raw)
  To: Maíra Canal, Matthew Brost, Danilo Krummrich,
	Philipp Stanner, Christian König, Tvrtko Ursulin,
	Simona Vetter, David Airlie, Melissa Wen, Lucas Stach,
	Russell King, Christian Gmeiner, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Boris Brezillon, Rob Herring,
	Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe

On Mon, 2025-07-07 at 11:46 -0300, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't
> hung;
> instead, a job may still be running, and there may be no valid reason
> to
> reset the hardware. This can occur in two situations:
> 
>   1. The GPU exposes some mechanism that ensures the GPU is still
> making
>      progress. By checking this mechanism, the driver can safely skip

I think this should be rephrased, because it reads as if there is a
mechanism with which the GPU can be forced to still make progress even
with a while (1) job or something.

I think what we want probably is:

"When the DRM scheduler times out, it's possible that the GPU isn't
hung; instead, a job just took unusually long (longer than the timeout)
but is still running, and there is, thus, no reason to reset the
hardware. A false-positive timeout can occur in two scenarios:

1. The job took too long, but the driver determined through a GPU-
specific mechanism that the hardware is still making progress. Hence,
the driver would like the scheduler to skip the timeout and treat the
job as still pending from then onward.

> the
>      reset, re-arm the timeout, and allow the job to continue running
> until
>      completion. This is the case for v3d, Etnaviv, and Xe.
>   2. Timeout has fired before the free-job worker. Consequently, the
>      scheduler calls `sched->ops->timedout_job()` for a job that
> isn't
>      timed out.


"2. The job actually did complete from the driver's point of view, but
there was a race with the scheduler's timeout, which determined this
job timed out slightly before the free-job worker could remove it from
the pending_list."


Feel free to adjust the wording to your liking.

> 
> These two scenarios are problematic because the job was removed from
> the
> `sched->pending_list` before calling `sched->ops->timedout_job()`,
> which
> means that when the job finishes, it won't be freed by the scheduler
> though `sched->ops->free_job()` - leading to a memory leak.
> 
> To solve those problems, create a new `drm_gpu_sched_stat`, called
> DRM_GPU_SCHED_STAT_NO_HANG, that allows a driver to skip the reset.

nit:
s/that/which

? Reads a bit clearer IMO

> The
> new status will indicate that the job must be reinserted into the
> pending list, and the hardware / driver will still complete that job.

I think it would be cool to write it as pending_list consistently
throughout the patch.

> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 43 ++++++++++++++++++++++++--
>  include/drm/gpu_scheduler.h            |  3 +++
>  2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 0f32e2cb43d6af294408968a970990f9f5c47bee..d3f48526883cc7ceea0cd1d0c62fb119f7092704 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -374,11 +374,16 @@ static void drm_sched_run_free_queue(struct
> drm_gpu_scheduler *sched)
>  {
>  	struct drm_sched_job *job;
>  
> -	spin_lock(&sched->job_list_lock);
>  	job = list_first_entry_or_null(&sched->pending_list,
>  				       struct drm_sched_job, list);
>  	if (job && dma_fence_is_signaled(&job->s_fence->finished))
>  		__drm_sched_run_free_queue(sched);
> +}
> +
> +static void drm_sched_run_free_queue_unlocked(struct drm_gpu_scheduler *sched)
> +{
> +	spin_lock(&sched->job_list_lock);
> +	drm_sched_run_free_queue(sched);
>  	spin_unlock(&sched->job_list_lock);
>  }
>  
> @@ -531,6 +536,31 @@ static void drm_sched_job_begin(struct
> drm_sched_job *s_job)
>  	spin_unlock(&sched->job_list_lock);
>  }
>  
> +/**
> + * drm_sched_job_reinsert_on_false_timeout - reinsert the job on a
> false timeout
> + * @sched: scheduler instance
> + * @job: job to be reinserted on the pending list
> + *
> + * In the case of a "false timeout" - when a timeout occurs but the
> GPU isn't
> + * hung and the job is making progress, the scheduler must reinsert

nit:
s/job/GPU

> the job back
> + * into the pending list. Otherwise, the job and its resources won't
> 

Ditto

>  be freed
> + * through the &drm_sched_backend_ops.free_job callback.
> + *
> + * Note that after reinserting the job, the scheduler enqueues the
> free-job
> + * work again if ready. Otherwise, a signaled job could be added to
> the pending
> + * list, but never freed.
> + *
> + * This function must be used in "false timeout" cases only.

I think the "Note" should be removed, because it reads as if the API
user has to mind about that in any way.

If you want to highlight that, maybe move it down in the function body
as a code comment. I think it's good to know, but not relevant for the
API user, or is it?

> + */
> +static void drm_sched_job_reinsert_on_false_timeout(struct drm_gpu_scheduler *sched,
> +						    struct drm_sched_job *job)
> +{
> +	spin_lock(&sched->job_list_lock);
> +	list_add(&job->list, &sched->pending_list);
> +	drm_sched_run_free_queue(sched);
> +	spin_unlock(&sched->job_list_lock);
> +}
> +
>  static void drm_sched_job_timedout(struct work_struct *work)
>  {
>  	struct drm_gpu_scheduler *sched;
> @@ -564,6 +594,9 @@ static void drm_sched_job_timedout(struct
> work_struct *work)
>  			job->sched->ops->free_job(job);
>  			sched->free_guilty = false;
>  		}
> +
> +		if (status == DRM_GPU_SCHED_STAT_NO_HANG)
> +			drm_sched_job_reinsert_on_false_timeout(sched, job);
>  	} else {
>  		spin_unlock(&sched->job_list_lock);
>  	}
> @@ -586,6 +619,9 @@ static void drm_sched_job_timedout(struct
> work_struct *work)
>   * This function is typically used for reset recovery (see the docu
> of
>   * drm_sched_backend_ops.timedout_job() for details). Do not call it
> for
>   * scheduler teardown, i.e., before calling drm_sched_fini().
> + *
> + * As it's used for reset recovery, drm_sched_stop() shouldn't be
> called
> + * if the driver skipped the reset (DRM_GPU_SCHED_STAT_NO_HANG).

I know I suggested these comments; but reading them again I think
they're confusing.

It reads as if drm_sched_stop() should not be called if the driver,
sometime in the past, skipped a reset.

It would be more waterproof like so:

"As it is only used for reset recovery, drivers must not call this
function in their &struct drm_sched_backend_ops.timedout_job callback
if they are skipping the reset through status &enum
drm_gpu_sched_stat.DRM_GPU_SCHED_STAT_NO_HANG."

Note that we should also prohibit using the function with
s/should/must not

Better safe than sorry..

>   */
>  void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
> drm_sched_job *bad)
>  {
> @@ -671,6 +707,9 @@ EXPORT_SYMBOL(drm_sched_stop);
>   * drm_sched_backend_ops.timedout_job() for details). Do not call it
> for
>   * scheduler startup. The scheduler itself is fully operational
> after
>   * drm_sched_init() succeeded.
> + *
> + * As it's used for reset recovery, drm_sched_start() shouldn't be
> called
> + * if the driver skipped the reset (DRM_GPU_SCHED_STAT_NO_HANG).

Same.


Otherwise, great series. Looking forward to having this merged.

P.

>   */
>  void drm_sched_start(struct drm_gpu_scheduler *sched, int errno)
>  {
> @@ -1192,7 +1231,7 @@ static void drm_sched_free_job_work(struct
> work_struct *w)
>  	if (job)
>  		sched->ops->free_job(job);
>  
> -	drm_sched_run_free_queue(sched);
> +	drm_sched_run_free_queue_unlocked(sched);
>  	drm_sched_run_job_queue(sched);
>  }
>  
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 83e5c00d8dd9a83ab20547a93d6fc572de97616e..257d21d8d1d2c4f035d6d4882e159de59b263c76 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -393,11 +393,14 @@ struct drm_sched_job {
>   * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
>   * @DRM_GPU_SCHED_STAT_RESET: The GPU hung and successfully reset.
>   * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available
> anymore.
> + * @DRM_GPU_SCHED_STAT_NO_HANG: Contrary to scheduler's assumption,
> the GPU
> + * did not hang and is still running.
>   */
>  enum drm_gpu_sched_stat {
>  	DRM_GPU_SCHED_STAT_NONE,
>  	DRM_GPU_SCHED_STAT_RESET,
>  	DRM_GPU_SCHED_STAT_ENODEV,
> +	DRM_GPU_SCHED_STAT_NO_HANG,
>  };
>  
>  /**
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG
  2025-07-07 14:46 ` [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
@ 2025-07-08  7:08   ` Philipp Stanner
  0 siblings, 0 replies; 15+ messages in thread
From: Philipp Stanner @ 2025-07-08  7:08 UTC (permalink / raw)
  To: Maíra Canal, Matthew Brost, Danilo Krummrich,
	Philipp Stanner, Christian König, Tvrtko Ursulin,
	Simona Vetter, David Airlie, Melissa Wen, Lucas Stach,
	Russell King, Christian Gmeiner, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Boris Brezillon, Rob Herring,
	Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe

On Mon, 2025-07-07 at 11:46 -0300, Maíra Canal wrote:
> Add a test to submit a single job against a scheduler with the
> timeout
> configured and verify that if the job is still running, the timeout
> handler will skip the reset and allow the job to complete.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Reviewed-by: Philipp Stanner <phasta@kernel.org>


> ---
>  drivers/gpu/drm/scheduler/tests/mock_scheduler.c |  5 +++
>  drivers/gpu/drm/scheduler/tests/sched_tests.h    |  1 +
>  drivers/gpu/drm/scheduler/tests/tests_basic.c    | 47 ++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> index 998162202972eb5919dfff4c8784ecc22c00ec9d..b3b33f85b7ae30c8e6bba97866a74978b0a96fa7 100644
> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> @@ -231,6 +231,11 @@ mock_sched_timedout_job(struct drm_sched_job
> *sched_job)
>  	drm_sched_job_cleanup(sched_job);
>  	/* Mock job itself is freed by the kunit framework. */
>  
> +	if (job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET) {
> +		job->flags &= ~DRM_MOCK_SCHED_JOB_DONT_RESET;
> +		return DRM_GPU_SCHED_STAT_NO_HANG;
> +	}
> +
>  	return DRM_GPU_SCHED_STAT_RESET;
>  }
>  
> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> index fbba38137f0c324cf2472fe5b3a8a78ec016e829..4adf961e1930203fe94241a8a0ae5f7817874a39 100644
> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> @@ -98,6 +98,7 @@ struct drm_mock_sched_job {
>  
>  #define DRM_MOCK_SCHED_JOB_DONE		0x1
>  #define DRM_MOCK_SCHED_JOB_TIMEDOUT	0x2
> +#define DRM_MOCK_SCHED_JOB_DONT_RESET	0x4
>  	unsigned long		flags;
>  
>  	struct list_head	link;
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> index 41c648782f4548e202bd8711b45d28eead9bd0b2..91c0449590ed24c3da18ab7d930cca47d7c317c7 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> @@ -246,8 +246,55 @@ static void drm_sched_basic_timeout(struct kunit
> *test)
>  	drm_mock_sched_entity_free(entity);
>  }
>  
> +static void drm_sched_skip_reset(struct kunit *test)
> +{
> +	struct drm_mock_scheduler *sched = test->priv;
> +	struct drm_mock_sched_entity *entity;
> +	struct drm_mock_sched_job *job;
> +	unsigned int i;
> +	bool done;
> +
> +	/*
> +	 * Submit a single job against a scheduler with the timeout
> configured
> +	 * and verify that if the job is still running, the timeout
> handler
> +	 * will skip the reset and allow the job to complete.
> +	 */
> +
> +	entity = drm_mock_sched_entity_new(test,
> +					   DRM_SCHED_PRIORITY_NORMAL,
> +					   sched);
> +	job = drm_mock_sched_job_new(test, entity);
> +
> +	job->flags = DRM_MOCK_SCHED_JOB_DONT_RESET;
> +
> +	drm_mock_sched_job_submit(job);
> +
> +	done = drm_mock_sched_job_wait_scheduled(job, HZ);
> +	KUNIT_ASSERT_TRUE(test, done);
> +
> +	done = drm_mock_sched_job_wait_finished(job, 2 * MOCK_TIMEOUT);
> +	KUNIT_ASSERT_FALSE(test, done);
> +
> +	KUNIT_ASSERT_EQ(test,
> +			job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
> +			DRM_MOCK_SCHED_JOB_TIMEDOUT);
> +
> +	KUNIT_ASSERT_EQ(test,
> +			job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET,
> +			0);
> +
> +	i = drm_mock_sched_advance(sched, 1);
> +	KUNIT_ASSERT_EQ(test, i, 1);
> +
> +	done = drm_mock_sched_job_wait_finished(job, HZ);
> +	KUNIT_ASSERT_TRUE(test, done);
> +
> +	drm_mock_sched_entity_free(entity);
> +}
> +
>  static struct kunit_case drm_sched_timeout_tests[] = {
>  	KUNIT_CASE(drm_sched_basic_timeout),
> +	KUNIT_CASE(drm_sched_skip_reset),
>  	{}
>  };
>  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-07 14:46 ` [PATCH v4 7/8] drm/xe: " Maíra Canal
@ 2025-07-08  7:22   ` Matthew Brost
  2025-07-08  9:47     ` Philipp Stanner
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Brost @ 2025-07-08  7:22 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Danilo Krummrich, Philipp Stanner, Christian König,
	Tvrtko Ursulin, Simona Vetter, David Airlie, Melissa Wen,
	Lucas Stach, Russell King, Christian Gmeiner, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Boris Brezillon, Rob Herring,
	Steven Price, Liviu Dudau, kernel-dev, dri-devel, etnaviv,
	intel-xe

On Mon, Jul 07, 2025 at 11:46:36AM -0300, Maíra Canal wrote:
> Xe can skip the reset if TDR has fired before the free job worker and can
> also re-arm the timeout timer in some scenarios. Instead of manipulating
> scheduler's internals, inform the scheduler that the job did not actually
> timeout and no reset was performed through the new status code
> DRM_GPU_SCHED_STAT_NO_HANG.
> 
> Note that, in the first case, there is no need to restart submission if it
> hasn't been stopped.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>

I'm fairly certain this is correct. However, Intel's CI didn't run with
your latest series. Can you resubmit and ensure a clean CI run before
merging? CI can be a bit flaky—if you get some failures, ping me and
I’ll let you know if they're related to this patch.

With clean CI:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 9c7e445b9ea7ce7e3610eadca023e6d810e683e9..f6289eeffd852e40b33d0e455d9bcc21a4fb1467 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1078,12 +1078,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	 * list so job can be freed and kick scheduler ensuring free job is not
>  	 * lost.
>  	 */
> -	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
> -		xe_sched_add_pending_job(sched, job);
> -		xe_sched_submission_start(sched);
> -
> -		return DRM_GPU_SCHED_STAT_RESET;
> -	}
> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
> +		return DRM_GPU_SCHED_STAT_NO_HANG;
>  
>  	/* Kill the run_job entry point */
>  	xe_sched_submission_stop(sched);
> @@ -1261,10 +1257,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	 * but there is not currently an easy way to do in DRM scheduler. With
>  	 * some thought, do this in a follow up.
>  	 */
> -	xe_sched_add_pending_job(sched, job);
>  	xe_sched_submission_start(sched);
> -
> -	return DRM_GPU_SCHED_STAT_RESET;
> +	return DRM_GPU_SCHED_STAT_NO_HANG;
>  }
>  
>  static void __guc_exec_queue_fini_async(struct work_struct *w)
> 
> -- 
> 2.50.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster
  2025-07-07 14:46 ` [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster Maíra Canal
@ 2025-07-08  7:41   ` Simona Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Simona Vetter @ 2025-07-08  7:41 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Tvrtko Ursulin, Simona Vetter, David Airlie,
	Melissa Wen, Lucas Stach, Russell King, Christian Gmeiner,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Boris Brezillon, Rob Herring, Steven Price, Liviu Dudau,
	kernel-dev, dri-devel, etnaviv, intel-xe

On Mon, Jul 07, 2025 at 11:46:32AM -0300, Maíra Canal wrote:
> As more KUnit tests are introduced to evaluate the basic capabilities of
> the `timedout_job()` hook, the test suite will continue to increase in
> duration. To reduce the overall running time of the test suite, decrease
> the scheduler's timeout for the timeout tests.
> 
> Before this commit:
> 
> [15:42:26] Elapsed time: 15.637s total, 0.002s configuring, 10.387s building, 5.229s running
> 
> After this commit:
> 
> [15:45:26] Elapsed time: 9.263s total, 0.002s configuring, 5.168s building, 4.037s running
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Acked-by: Philipp Stanner <phasta@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/tests/tests_basic.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> index 7230057e0594c6246f02608f07fcb1f8d738ac75..41c648782f4548e202bd8711b45d28eead9bd0b2 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> @@ -5,6 +5,8 @@
>  
>  #include "sched_tests.h"
>  
> +#define MOCK_TIMEOUT (HZ / 5)

Eventually I think we want a test interface to immediately time out jobs
by rescheduling their timer to fire immediately (and handling all the trickery
of making sure it's scheduled first). That could also help with testcases
that want to exercise specific timing.

But for now this seems good enough.
-Sima

> +
>  /*
>   * DRM scheduler basic tests should check the basic functional correctness of
>   * the scheduler, including some very light smoke testing. More targeted tests,
> @@ -28,7 +30,7 @@ static void drm_sched_basic_exit(struct kunit *test)
>  
>  static int drm_sched_timeout_init(struct kunit *test)
>  {
> -	test->priv = drm_mock_sched_new(test, HZ);
> +	test->priv = drm_mock_sched_new(test, MOCK_TIMEOUT);
>  
>  	return 0;
>  }
> @@ -227,14 +229,14 @@ static void drm_sched_basic_timeout(struct kunit *test)
>  	done = drm_mock_sched_job_wait_scheduled(job, HZ);
>  	KUNIT_ASSERT_TRUE(test, done);
>  
> -	done = drm_mock_sched_job_wait_finished(job, HZ / 2);
> +	done = drm_mock_sched_job_wait_finished(job, MOCK_TIMEOUT / 2);
>  	KUNIT_ASSERT_FALSE(test, done);
>  
>  	KUNIT_ASSERT_EQ(test,
>  			job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
>  			0);
>  
> -	done = drm_mock_sched_job_wait_finished(job, HZ);
> +	done = drm_mock_sched_job_wait_finished(job, MOCK_TIMEOUT);
>  	KUNIT_ASSERT_FALSE(test, done);
>  
>  	KUNIT_ASSERT_EQ(test,
> 
> -- 
> 2.50.0
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
  2025-07-08  7:22   ` Matthew Brost
@ 2025-07-08  9:47     ` Philipp Stanner
  0 siblings, 0 replies; 15+ messages in thread
From: Philipp Stanner @ 2025-07-08  9:47 UTC (permalink / raw)
  To: Matthew Brost, Maíra Canal
  Cc: Danilo Krummrich, Philipp Stanner, Christian König,
	Tvrtko Ursulin, Simona Vetter, David Airlie, Melissa Wen,
	Lucas Stach, Russell King, Christian Gmeiner, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Boris Brezillon, Rob Herring,
	Steven Price, Liviu Dudau, kernel-dev, dri-devel, etnaviv,
	intel-xe

On Tue, 2025-07-08 at 00:22 -0700, Matthew Brost wrote:
> On Mon, Jul 07, 2025 at 11:46:36AM -0300, Maíra Canal wrote:
> > Xe can skip the reset if TDR has fired before the free job worker
> > and can
> > also re-arm the timeout timer in some scenarios. Instead of
> > manipulating
> > scheduler's internals, inform the scheduler that the job did not
> > actually
> > time out and no reset was performed through the new status code
> > DRM_GPU_SCHED_STAT_NO_HANG.
> > 
> > Note that, in the first case, there is no need to restart
> > submission if it
> > hasn't been stopped.
> > 
> > Signed-off-by: Maíra Canal <mcanal@igalia.com>
> 
> I'm fairly certain this is correct. However, Intel's CI didn't run
> with
> your latest series. Can you resubmit and ensure a clean CI run before
> merging?

How can someone who's not at Intel ensure that?

P.

>  CI can be a bit flaky; if you get some failures, ping me and
> I’ll let you know if they're related to this patch.
> 
> With clean CI:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
> > ---
> >  drivers/gpu/drm/xe/xe_guc_submit.c | 12 +++---------
> >  1 file changed, 3 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index
> > 9c7e445b9ea7ce7e3610eadca023e6d810e683e9..f6289eeffd852e40b33d0e455
> > d9bcc21a4fb1467 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1078,12 +1078,8 @@ guc_exec_queue_timedout_job(struct
> > drm_sched_job *drm_job)
> >  	 * list so job can be freed and kick scheduler ensuring
> > free job is not
> >  	 * lost.
> >  	 */
> > -	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence-
> > >flags)) {
> > -		xe_sched_add_pending_job(sched, job);
> > -		xe_sched_submission_start(sched);
> > -
> > -		return DRM_GPU_SCHED_STAT_RESET;
> > -	}
> > +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence-
> > >flags))
> > +		return DRM_GPU_SCHED_STAT_NO_HANG;
> >  
> >  	/* Kill the run_job entry point */
> >  	xe_sched_submission_stop(sched);
> > @@ -1261,10 +1257,8 @@ guc_exec_queue_timedout_job(struct
> > drm_sched_job *drm_job)
> >  	 * but there is not currently an easy way to do in DRM
> > scheduler. With
> >  	 * some thought, do this in a follow up.
> >  	 */
> > -	xe_sched_add_pending_job(sched, job);
> >  	xe_sched_submission_start(sched);
> > -
> > -	return DRM_GPU_SCHED_STAT_RESET;
> > +	return DRM_GPU_SCHED_STAT_NO_HANG;
> >  }
> >  
> >  static void __guc_exec_queue_fini_async(struct work_struct *w)
> > 
> > -- 
> > 2.50.0
> > 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running
  2025-07-08  7:02   ` Philipp Stanner
@ 2025-07-08 12:38     ` Maíra Canal
  0 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2025-07-08 12:38 UTC (permalink / raw)
  To: phasta, Matthew Brost, Danilo Krummrich, Christian König,
	Tvrtko Ursulin, Simona Vetter, David Airlie, Melissa Wen,
	Lucas Stach, Russell King, Christian Gmeiner, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Boris Brezillon, Rob Herring,
	Steven Price, Liviu Dudau
  Cc: kernel-dev, dri-devel, etnaviv, intel-xe

Hi Philipp,

On 08/07/25 04:02, Philipp Stanner wrote:
> On Mon, 2025-07-07 at 11:46 -0300, Maíra Canal wrote:
>> When the DRM scheduler times out, it's possible that the GPU isn't
>> hung;
>> instead, a job may still be running, and there may be no valid reason
>> to
>> reset the hardware. This can occur in two situations:
>>
>>    1. The GPU exposes some mechanism that ensures the GPU is still
>> making
>>       progress. By checking this mechanism, the driver can safely skip
> 
> I think this should be rephrased, because it reads as if there is a
> mechanism with which the GPU can be forced to still make progress even
> with a while (1) job or something.
> 
> I think what we want probably is:
> 
> "When the DRM scheduler times out, it's possible that the GPU isn't
> hung; instead, a job just took unusually long (longer than the timeout)
> but is still running, and there is, thus, no reason to reset the
> hardware. A false-positive timeout can occur in two scenarios:
> 
> 1. The job took too long, but the driver determined through a GPU-
> specific mechanism that the hardware is still making progress. Hence,
> the driver would like the scheduler to skip the timeout and treat the
> job as still pending from then onward.
> 

Applied it locally.

>> the
>>       reset, re-arm the timeout, and allow the job to continue running
>> until
>>       completion. This is the case for v3d, Etnaviv, and Xe.
>>    2. Timeout has fired before the free-job worker. Consequently, the
>>       scheduler calls `sched->ops->timedout_job()` for a job that
>> isn't
>>       timed out.
> 
> 
> "2. The job actually did complete from the driver's point of view, but
> there was a race with the scheduler's timeout, which determined this
> job timed out slightly before the free-job worker could remove it from
> the pending_list."
> 

Actually, for this second point, I prefer my wording. It's more to the
point and easier to understand when you read the code. I'd prefer
to keep the second point as it is.

All other comments have been applied. Thanks for your feedback!

Best Regards,
- Maíra



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-07-08 12:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-07 14:46 [PATCH v4 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
2025-07-07 14:46 ` [PATCH v4 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Maíra Canal
2025-07-07 14:46 ` [PATCH v4 2/8] drm/sched: Allow drivers to skip the reset and keep on running Maíra Canal
2025-07-08  7:02   ` Philipp Stanner
2025-07-08 12:38     ` Maíra Canal
2025-07-07 14:46 ` [PATCH v4 3/8] drm/sched: Make timeout KUnit tests faster Maíra Canal
2025-07-08  7:41   ` Simona Vetter
2025-07-07 14:46 ` [PATCH v4 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
2025-07-08  7:08   ` Philipp Stanner
2025-07-07 14:46 ` [PATCH v4 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset Maíra Canal
2025-07-07 14:46 ` [PATCH v4 6/8] drm/etnaviv: " Maíra Canal
2025-07-07 14:46 ` [PATCH v4 7/8] drm/xe: " Maíra Canal
2025-07-08  7:22   ` Matthew Brost
2025-07-08  9:47     ` Philipp Stanner
2025-07-07 14:46 ` [PATCH v4 8/8] drm/panfrost: " Maíra Canal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).