The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE()
@ 2026-06-29 10:40 Philipp Stanner
  2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
  2026-06-29 12:09 ` [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Danilo Krummrich
  0 siblings, 2 replies; 4+ messages in thread
From: Philipp Stanner @ 2026-06-29 10:40 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel

commit faf6e1a87e07 ("drm/sched: Add boolean to mark if sched is ready to work v5")

moved tracking of the hardware ring's state from the driver (amdgpu in
that case) into drm_sched. To do so, it added a 'ready' flag to the
scheduler.

This flag is currently being accessed through drm_sched_wqueue_ready()
and, even worse, directly through the scheduler struct. Since drm_sched
does not have a consistent locking design, all these accesses are
potentially undefined behavior as they are subject to compiler
optimizations.

Make the code base more robust by guarding access to the 'ready' flag
with ACCESS_ONCE().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d2ca01b31ee4..c4e4ac436a86 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -929,7 +929,7 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 	for (i = 0; i < num_sched_list; ++i) {
 		sched = sched_list[i];
 
-		if (!sched->ready) {
+		if (!READ_ONCE(sched->ready)) {
 			DRM_WARN("scheduler %s is not ready, skipping",
 				 sched->name);
 			continue;
@@ -1143,7 +1143,18 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 
 	if (sched->own_submit_wq)
 		destroy_workqueue(sched->submit_wq);
-	sched->ready = false;
+
+	/* The 'ready' flag only exists in drm_sched because amdgpu uses it to
+	 * represent the state of its hardware rings. This problem is related to
+	 * the fundamental issue of drm_sched not having a solid, consistent
+	 * locking design.
+	 *
+	 * Obviously, it does not make sense at all to set this flag to false
+	 * here, but since it's unclear whether it can ever be removed from
+	 * amdgpu's point of view, we guard it here with WRITE_ONCE() to make it
+	 * slightly less broken.
+	 */
+	WRITE_ONCE(sched->ready, false);
 
 	if (!list_empty(&sched->pending_list))
 		dev_warn(sched->dev, "Tearing down scheduler while jobs are pending!\n");
@@ -1195,7 +1206,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
  */
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
 {
-	return sched->ready;
+	return READ_ONCE(sched->ready);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_ready);
 

base-commit: 6648301c5bb2ef23f0fb15bcb01d21ff66f36799
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-29 12:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-29 10:40 [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Philipp Stanner
2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
2026-06-29 11:57   ` Danilo Krummrich
2026-06-29 12:09 ` [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox