All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/sched: Document race condition in drm_sched_fini()
@ 2025-07-31  9:30 Philipp Stanner
  0 siblings, 0 replies; only message in thread
From: Philipp Stanner @ 2025-07-31  9:30 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel, James Flowers

In drm_sched_fini() all entities are marked as stopped - without taking
the appropriate lock, because that would deadlock. That means that
drm_sched_fini() and drm_sched_entity_push_job() can race against each
other.

This should most likely be fixed by establishing the rule that all
entities associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() alltogether
with an appropriate comment.

Reported-by: James Flowers <bold.zone2373@fastmail.com>
Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2373@fastmail.com/
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 5a550fd76bf0..738aefed727c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1424,6 +1424,22 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 			 * Prevents reinsertion and marks job_queue as idle,
 			 * it will be removed from the rq in drm_sched_entity_fini()
 			 * eventually
+			 *
+			 * FIXME:
+			 * This lacks the proper spin_lock(&s_entity->lock) and
+			 * is, therefore, a race condition. Most notably, it
+			 * can race with drm_sched_entity_push_job(). The lock
+			 * cannot be taken here, however, because this would
+			 * lead to lock inversion -> deadlock.
+			 *
+			 * The best solution probably is to enforce the life
+			 * time rule of all entities having to be torn down
+			 * before their scheduler. Then, however, locking could
+			 * be dropped alltogether from this function.
+			 *
+			 * For now, this remains a potential race in all
+			 * drivers that keep entities aliver for longer than
+			 * the scheduler.
 			 */
 			s_entity->stopped = true;
 		spin_unlock(&rq->lock);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] only message in thread

only message in thread, other threads:[~2025-07-31  9:30 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-31  9:30 [PATCH] drm/sched: Document race condition in drm_sched_fini() Philipp Stanner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.