* [PATCH v2 0/4] Add TODO list (+ small docu change)
@ 2025-10-23 14:30 Philipp Stanner
2025-10-23 14:30 ` [PATCH v2 1/4] drm/sched: Remove out of place resubmit docu Philipp Stanner
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
drm_sched has so many problems that we should have our own TODO list for
contributors who might wanna help.
Changes in v2:
- Add generic TODO list example that can stay in the file forever.
Philipp Stanner (4):
drm/sched: Remove out of place resubmit docu
drm/sched: Add a TODO list
drm/sched: Add TODO entry for resubmitting jobs
drm/sched: Add TODO entry for missing runqueue locks
drivers/gpu/drm/scheduler/TODO | 51 ++++++++++++++++++++++++++++++++++
include/drm/gpu_scheduler.h | 10 -------
2 files changed, 51 insertions(+), 10 deletions(-)
create mode 100644 drivers/gpu/drm/scheduler/TODO
--
2.49.0
* [PATCH v2 1/4] drm/sched: Remove out of place resubmit docu
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
The documentation for drm_sched_backend_ops.run_job() states that the
callback can be invoked multiple times by the deprecated function
drm_sched_resubmit_jobs(). It also contains an unresolved TODO.
It is not useful to document side effects of a different, deprecated
function in the docu of run_job(): Existing users won't re-evaluate
their usage of the deprecated function by reading the non-deprecated
one, and new users must not use the deprecated function in the first
place.
Remove the out of place documentation.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
include/drm/gpu_scheduler.h | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fb88301b3c45..9c629bbc0684 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -429,16 +429,6 @@ struct drm_sched_backend_ops {
*
* @sched_job: the job to run
*
- * The deprecated drm_sched_resubmit_jobs() (called by &struct
- * drm_sched_backend_ops.timedout_job) can invoke this again with the
- * same parameters. Using this is discouraged because it violates
- * dma_fence rules, notably dma_fence_init() has to be called on
- * already initialized fences for a second time. Moreover, this is
- * dangerous because attempts to allocate memory might deadlock with
- * memory management code waiting for the reset to complete.
- *
- * TODO: Document what drivers should do / use instead.
- *
* This method is called in a workqueue context - either from the
* submit_wq the driver passed through drm_sched_init(), or, if the
* driver passed NULL, a separate, ordered workqueue the scheduler
--
2.49.0
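
For readers unfamiliar with the dma_fence problem mentioned in the removed
comment, here is a minimal userspace sketch of why initializing an already
initialized, reference-counted object a second time is harmful. Everything
named model_* is hypothetical and only illustrates the idea; the real
dma_fence_init() takes more parameters and is not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted fence; NOT struct dma_fence. */
struct model_fence {
	int refcount;
	unsigned long seqno;
};

/* Initialization hands out exactly one reference, as refcounted objects usually do. */
static void model_fence_init(struct model_fence *f, unsigned long seqno)
{
	assert(f);
	f->refcount = 1;
	f->seqno = seqno;
}

/* Take an additional reference. */
static void model_fence_get(struct model_fence *f)
{
	assert(f);
	f->refcount++;
}

/* Drop a reference; returns 1 if this was the last one (object would be freed). */
static int model_fence_put(struct model_fence *f)
{
	assert(f);
	return --f->refcount == 0;
}
```

If a second holder has called model_fence_get() and the fence is then
initialized again (as a repeated run_job() invocation would do in this model),
the count drops back to 1. The first model_fence_put() then reports the object
as released although the second holder still uses it.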
* [PATCH v2 2/4] drm/sched: Add a TODO list
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
The DRM GPU scheduler contains a huge number of problems. These should
be documented for (new) contributors to pick up work.
Add a TODO list with an example entry.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 12 ++++++++++++
1 file changed, 12 insertions(+)
create mode 100644 drivers/gpu/drm/scheduler/TODO
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
new file mode 100644
index 000000000000..79044adb7d01
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -0,0 +1,12 @@
+=== drm_sched TODO list ===
+
+* Example Entry
+ - Difficulty: hard
+ - Contact:
+ - Danilo Krummrich <dakr@kernel.org>
+ - Philipp Stanner <phasta@kernel.org>
+ - Description:
+ This is an example.
+ - Tasks:
+ 1. Read the example entry.
+ 2. Remove the entry once solved (never in this case)
--
2.49.0
* [PATCH v2 3/4] drm/sched: Add TODO entry for resubmitting jobs
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
Add the missing successor of drm_sched_resubmit_jobs() as an issue to the
TODO file.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 79044adb7d01..713dd62c58da 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -10,3 +10,29 @@
- Tasks:
1. Read the example entry.
2. Remove the entry once solved (never in this case)
+
+* GPU job resubmits
+ - Difficulty: hard
+ - Contact:
+ - Christian König <ckoenig.leichtzumerken@gmail.com>
+ - Philipp Stanner <phasta@kernel.org>
+ - Description:
+ drm_sched_resubmit_jobs() is deprecated. Main reason being that it leads to
+ reinitializing dma_fences. See that function's docu for details. The better
+ approach for valid resubmissions by amdgpu and Xe is (apparently) to figure
+ out which job (and, through association: which entity) caused the hang. Then,
+ the job's buffer data, together with all other jobs' buffer data currently
+ in the same hardware ring, must be invalidated. This can for example be done
+ by overwriting it.
+ amdgpu currently determines which jobs are in the ring and need to be
+ overwritten by keeping copies of the job. Xe obtains that information by
+ directly accessing drm_sched's pending_list.
+ - Tasks:
+ 1. implement scheduler functionality through which
+ the driver can obtain the information which *broken* jobs are currently in
+ the hardware ring.
+ 2. Such infrastructure would then typically be used in
+ drm_sched_backend_ops.timedout_job(). Document that.
+ 3. Port a driver as first user.
+ 4. Document the new alternative in the docu of deprecated
+ drm_sched_resubmit_jobs().
--
2.49.0
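
Task 1 of the "GPU job resubmits" entry could look roughly like the following
userspace sketch. It is not a proposal for the actual interface: the model_*
types and the helper are hypothetical, the pending list is modelled as a plain
singly linked list, and "broken" is assumed here to mean "submitted through the
same entity as the job that caused the hang".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real drm_sched entity or job structures. */
struct model_entity {
	int id;
};

struct model_job {
	int id;
	const struct model_entity *entity;	/* entity that submitted the job */
	struct model_job *next;			/* next job in the pending list */
};

/*
 * Walk the pending list and store up to @max jobs that were submitted
 * through @guilty into @out. Returns the number of jobs stored.
 */
static size_t model_collect_broken_jobs(const struct model_job *pending,
					const struct model_entity *guilty,
					const struct model_job **out,
					size_t max)
{
	size_t n = 0;

	assert(guilty && out);

	for (const struct model_job *job = pending; job && n < max; job = job->next) {
		if (job->entity == guilty)
			out[n++] = job;
	}

	return n;
}
```

A driver's timeout handling would then invalidate only the jobs returned by
such a helper, instead of keeping its own copies of the jobs or walking the
scheduler's pending list directly.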
* [PATCH v2 4/4] drm/sched: Add TODO entry for missing runqueue locks
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
struct drm_sched_rq is accessed without locking in many places throughout
the scheduler, at least by readers. This was documented in a FIXME added
in:
commit 981b04d96856 ("drm/sched: improve docs around drm_sched_entity")
Add a TODO entry for that problem.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 713dd62c58da..263ce2deb69a 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -36,3 +36,16 @@
3. Port a driver as first user.
4. Document the new alternative in the docu of deprecated
drm_sched_resubmit_jobs().
+
+* Unlocked readers for runqueues
+ - Difficulty: medium
+ - Contact: Philipp Stanner <phasta@kernel.org>
+ - Description:
+ There is an old FIXME by Sima in include/drm/gpu_scheduler.h. It details
+ that struct drm_sched_rq is read at many places without any locks, not even
+ with a READ_ONCE. At XDC 2025 no one could really tell why that is the case,
+ whether locks are needed and whether they could be added. (But for real,
+ that should probably be locked!).
+ - Tasks:
+ 1. Check whether locks for runqueue readers can be added.
+ 2. If yes, add the locks.
--
2.49.0
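
To illustrate what "adding locks for runqueue readers" means, here is a small
userspace sketch in which readers take the same lock as writers. It only
models the idea: struct model_rq and its helpers are hypothetical, a pthread
mutex stands in for whatever lock the kernel code uses, and the single counter
stands in for the runqueue state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a runqueue; NOT struct drm_sched_rq. */
struct model_rq {
	pthread_mutex_t lock;
	int nr_entities;	/* protected by @lock */
};

static void model_rq_init(struct model_rq *rq)
{
	assert(rq);
	pthread_mutex_init(&rq->lock, NULL);
	rq->nr_entities = 0;
}

/* Writer: modifies the runqueue state under the lock. */
static void model_rq_add_entity(struct model_rq *rq)
{
	assert(rq);
	pthread_mutex_lock(&rq->lock);
	rq->nr_entities++;
	pthread_mutex_unlock(&rq->lock);
}

/* Reader: takes the lock too, so it never observes a half-updated state. */
static int model_rq_count_entities(struct model_rq *rq)
{
	int n;

	assert(rq);
	pthread_mutex_lock(&rq->lock);
	n = rq->nr_entities;
	pthread_mutex_unlock(&rq->lock);

	return n;
}
```

The unlocked variant described in the TODO entry would correspond to returning
rq->nr_entities directly, without the lock/unlock pair in the reader.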
* Re: [PATCH v2 0/4] Add TODO list (+ small docu change)
From: Danilo Krummrich @ 2025-10-30 11:00 UTC
To: Philipp Stanner
Cc: Matthew Brost, Christian König, David Airlie, Simona Vetter,
Tvrtko Ursulin, dri-devel, linux-kernel, linux-media
On 10/23/25 4:30 PM, Philipp Stanner wrote:
> drm_sched has so many problems that we should have our own TODO list for
> contributors who might wanna help.
Looks good,
Acked-by: Danilo Krummrich <dakr@kernel.org>
> drivers/gpu/drm/scheduler/TODO | 51 ++++++++++++++++++++++++++++++++++
I'd move this into Documentation/, just like we did it for Nova [1].
[1] https://docs.kernel.org/gpu/nova/core/todo.html