The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE()
@ 2026-06-29 10:40 Philipp Stanner
  2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
  2026-06-29 12:09 ` [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Danilo Krummrich
  0 siblings, 2 replies; 4+ messages in thread
From: Philipp Stanner @ 2026-06-29 10:40 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel

commit faf6e1a87e07 ("drm/sched: Add boolean to mark if sched is ready to work v5")

moved tracking of the hardware ring's state from the driver (amdgpu in
that case) into drm_sched. To do so, it added a 'ready' flag to the
scheduler.

This flag is currently being accessed through drm_sched_wqueue_ready()
and, even worse, directly through the scheduler struct. Since drm_sched
does not have a consistent locking design, all these accesses are
potentially undefined behavior as they are subject to compiler
optimizations.

Make the code base more robust by guarding access to the 'ready' flag
with ACCESS_ONCE().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d2ca01b31ee4..c4e4ac436a86 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -929,7 +929,7 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 	for (i = 0; i < num_sched_list; ++i) {
 		sched = sched_list[i];
 
-		if (!sched->ready) {
+		if (!READ_ONCE(sched->ready)) {
 			DRM_WARN("scheduler %s is not ready, skipping",
 				 sched->name);
 			continue;
@@ -1143,7 +1143,18 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 
 	if (sched->own_submit_wq)
 		destroy_workqueue(sched->submit_wq);
-	sched->ready = false;
+
+	/* The 'ready' flag only exists in drm_sched because amdgpu uses it to
+	 * represent the state of its hardware rings. This problem is related to
+	 * the fundamental issue of drm_sched not having a solid, consistent
+	 * locking design.
+	 *
+	 * Obviously, it does not make sense at all to set this flag to false
+	 * here, but since it's unclear whether it can ever be removed from
+	 * amdgpu's point of view, we guard it here with WRITE_ONCE() to make it
+	 * slightly less broken.
+	 */
+	WRITE_ONCE(sched->ready, false);
 
 	if (!list_empty(&sched->pending_list))
 		dev_warn(sched->dev, "Tearing down scheduler while jobs are pending!\n");
@@ -1195,7 +1206,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
  */
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
 {
-	return sched->ready;
+	return READ_ONCE(sched->ready);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_ready);
 

base-commit: 6648301c5bb2ef23f0fb15bcb01d21ff66f36799
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready()
  2026-06-29 10:40 [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Philipp Stanner
@ 2026-06-29 10:40 ` Philipp Stanner
  2026-06-29 11:57   ` Danilo Krummrich
  2026-06-29 12:09 ` [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Danilo Krummrich
  1 sibling, 1 reply; 4+ messages in thread
From: Philipp Stanner @ 2026-06-29 10:40 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
	Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: dri-devel, linux-kernel

drm_sched_wqueue_ready() stems from an old rework, see

commit faf6e1a87e07 ("drm/sched: Add boolean to mark if sched is ready to work v5")

That commit moved tracking of the hardware ring's state was moved into
drm_sched.

This is highly racy and problematic and something that should most
certainly be covered by the driver.

Deprecate drm_sched_wqueue_ready().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c4e4ac436a86..35a7892c8dc8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1198,10 +1198,14 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 EXPORT_SYMBOL(drm_sched_increase_karma);
 
 /**
- * drm_sched_wqueue_ready - Is the scheduler ready for submission
+ * drm_sched_wqueue_ready - Is the scheduler ready for submission (DEPRECATED)
  *
  * @sched: scheduler instance
  *
+ * Deprecated, don't use it in new code. This function was added to have the
+ * scheduler represent the hardware ring's state, which must be represented by
+ * the driver's respective data structures.
+ *
  * Returns true if submission is ready
  */
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready()
  2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
@ 2026-06-29 11:57   ` Danilo Krummrich
  0 siblings, 0 replies; 4+ messages in thread
From: Danilo Krummrich @ 2026-06-29 11:57 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: Matthew Brost, Christian König, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

On 6/29/26 12:40 PM, Philipp Stanner wrote:
> Deprecate drm_sched_wqueue_ready().

AFAICS, this is only used in amdgpu and the implementation is a single line. If
we don't want people to use, I'd rather just remove it and let amdgpu open-code it.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE()
  2026-06-29 10:40 [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Philipp Stanner
  2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
@ 2026-06-29 12:09 ` Danilo Krummrich
  1 sibling, 0 replies; 4+ messages in thread
From: Danilo Krummrich @ 2026-06-29 12:09 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: Matthew Brost, Christian König, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	dri-devel, linux-kernel

> +	WRITE_ONCE(sched->ready, false);

Don't we need smp_store_release() here and

> +	return READ_ONCE(sched->ready);
smp_load_acquire() here?

Also, what about drm_sched_init()? It also seems that this is accessed from
amdgpu without the drm_sched_wqueue_ready() helper about a million times. :)

	$ grep -Rin "sched\.ready" drivers/gpu/drm/amd | wc
	$     119     544   10320

There may be false positives, but from a quick glance at least most of them seem
to actually come from the scheduler.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-29 12:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-29 10:40 [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Philipp Stanner
2026-06-29 10:40 ` [PATCH 2/2] drm/sched: Deprecate drm_sched_wqueue_ready() Philipp Stanner
2026-06-29 11:57   ` Danilo Krummrich
2026-06-29 12:09 ` [PATCH 1/2] drm/sched: Guard sched->ready with ACCESS_ONCE() Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox