* [PATCH v2 0/4] Add TODO list (+ small docu change)
@ 2025-10-23 14:30 Philipp Stanner
2025-10-23 14:30 ` [PATCH v2 1/4] drm/sched: Remove out of place resubmit docu Philipp Stanner
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
drm_sched has so many problems that we should have our own TODO list for
contributors who might wanna help.
Changes in v2:
- Add generic TODO list example that can stay in the file forever.
Philipp Stanner (4):
drm/sched: Remove out of place resubmit docu
drm/sched: Add a TODO list
drm/sched: Add TODO entry for resubmitting jobs
drm/sched: Add TODO entry for missing runqueue locks
drivers/gpu/drm/scheduler/TODO | 51 ++++++++++++++++++++++++++++++++++
include/drm/gpu_scheduler.h | 10 -------
2 files changed, 51 insertions(+), 10 deletions(-)
create mode 100644 drivers/gpu/drm/scheduler/TODO
--
2.49.0
* [PATCH v2 1/4] drm/sched: Remove out of place resubmit docu
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
The documentation for drm_sched_backend_ops.run_job() states that the
callback can be invoked multiple times by the deprecated function
drm_sched_resubmit_jobs(). It also contains an unresolved TODO.
It is not useful to document side effects of a different, deprecated
function in the docu of run_job(): Existing users won't re-evaluate
their usage of the deprecated function by reading the non-deprecated
one, and new users must not use the deprecated function in the first
place.
Remove the out of place documentation.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
include/drm/gpu_scheduler.h | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fb88301b3c45..9c629bbc0684 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -429,16 +429,6 @@ struct drm_sched_backend_ops {
*
* @sched_job: the job to run
*
- * The deprecated drm_sched_resubmit_jobs() (called by &struct
- * drm_sched_backend_ops.timedout_job) can invoke this again with the
- * same parameters. Using this is discouraged because it violates
- * dma_fence rules, notably dma_fence_init() has to be called on
- * already initialized fences for a second time. Moreover, this is
- * dangerous because attempts to allocate memory might deadlock with
- * memory management code waiting for the reset to complete.
- *
- * TODO: Document what drivers should do / use instead.
- *
* This method is called in a workqueue context - either from the
* submit_wq the driver passed through drm_sched_init(), or, if the
* driver passed NULL, a separate, ordered workqueue the scheduler
--
2.49.0
* [PATCH v2 2/4] drm/sched: Add a TODO list
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
The DRM GPU scheduler contains a huge number of problems. These should
be documented for (new) contributors to pick up work.
Add a TODO list with an example entry.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 12 ++++++++++++
1 file changed, 12 insertions(+)
create mode 100644 drivers/gpu/drm/scheduler/TODO
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
new file mode 100644
index 000000000000..79044adb7d01
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -0,0 +1,12 @@
+=== drm_sched TODO list ===
+
+* Example Entry
+ - Difficulty: hard
+ - Contact:
+ - Danilo Krummrich <dakr@kernel.org>
+ - Philipp Stanner <phasta@kernel.org>
+ - Description:
+ This is an example.
+ - Tasks:
+ 1. Read the example entry.
+ 2. Remove the entry once solved (never in this case)
--
2.49.0
* [PATCH v2 3/4] drm/sched: Add TODO entry for resubmitting jobs
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
drm_sched_resubmit_jobs() is deprecated, but it has no successor yet.
Add a TODO entry for that problem.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 79044adb7d01..713dd62c58da 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -10,3 +10,29 @@
- Tasks:
1. Read the example entry.
2. Remove the entry once solved (never in this case)
+
+* GPU job resubmits
+ - Difficulty: hard
+ - Contact:
+ - Christian König <ckoenig.leichtzumerken@gmail.com>
+ - Philipp Stanner <phasta@kernel.org>
+ - Description:
+ drm_sched_resubmit_jobs() is deprecated. Main reason being that it leads to
+ reinitializing dma_fences. See that function's docu for details. The better
+ approach for valid resubmissions by amdgpu and Xe is (apparently) to figure
+ out which job (and, through association: which entity) caused the hang. Then,
+ the job's buffer data, together with all other jobs' buffer data currently
+ in the same hardware ring, must be invalidated. This can for example be done
+ by overwriting it.
+ amdgpu currently determines which jobs are in the ring and need to be
+ overwritten by keeping copies of the job. Xe obtains that information by
+ directly accessing drm_sched's pending_list.
+ - Tasks:
+ 1. Implement scheduler functionality through which
+ the driver can obtain the information which *broken* jobs are currently in
+ the hardware ring.
+ 2. Such infrastructure would then typically be used in
+ drm_sched_backend_ops.timedout_job(). Document that.
+ 3. Port a driver as first user.
+ 4. Document the new alternative in the docu of deprecated
+ drm_sched_resubmit_jobs().
--
2.49.0
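
Task 1 of the "GPU job resubmits" entry can be illustrated with a minimal userspace sketch. It is not scheduler code: struct toy_job and toy_collect_broken_jobs() are hypothetical names, and the sketch assumes that the "broken" jobs are exactly those that are currently in the hardware ring and belong to the entity that caused the hang. What the real drm_sched interface should look like is left open by the TODO entry.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pending job; NOT struct drm_sched_job. */
struct toy_job {
	int id;      /* job identifier */
	int entity;  /* entity that submitted the job */
	int in_ring; /* nonzero if the job currently sits in the hardware ring */
};

/*
 * Assumption: a job counts as "broken" if it is in the ring and belongs to
 * the guilty entity. Pointers to such jobs are stored in 'out' (at most
 * out_cap of them). Returns the number of matching jobs, which may exceed
 * out_cap.
 */
static size_t toy_collect_broken_jobs(const struct toy_job *pending, size_t n,
				      int guilty_entity,
				      const struct toy_job **out, size_t out_cap)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pending[i].in_ring || pending[i].entity != guilty_entity)
			continue;
		if (found < out_cap)
			out[found] = &pending[i];
		found++;
	}
	return found;
}
```

In a driver, a helper of this kind would presumably be called from drm_sched_backend_ops.timedout_job(), as task 2 suggests, so that the driver no longer has to keep job copies or walk pending_list itself.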
* [PATCH v2 4/4] drm/sched: Add TODO entry for missing runqueue locks
From: Philipp Stanner @ 2025-10-23 14:30 UTC
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Tvrtko Ursulin
Cc: dri-devel, linux-kernel, linux-media
struct drm_sched_rq is accessed without locking in many places throughout
the scheduler, at least by readers. This was documented in a FIXME added
in:
commit 981b04d96856 ("drm/sched: improve docs around drm_sched_entity")
Add a TODO entry for that problem.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/TODO | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 713dd62c58da..263ce2deb69a 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -36,3 +36,16 @@
3. Port a driver as first user.
 4. Document the new alternative in the docu of deprecated
drm_sched_resubmit_jobs().
+
+* Unlocked readers for runqueues
+ - Difficulty: medium
+ - Contact: Philipp Stanner <phasta@kernel.org>
+ - Description:
+ There is an old FIXME by Sima in include/drm/gpu_scheduler.h. It details
+ that struct drm_sched_rq is read at many places without any locks, not even
+ with a READ_ONCE. At XDC 2025 no one could really tell why that is the case,
+ whether locks are needed and whether they could be added. (But for real,
+ that should probably be locked!).
+ - Tasks:
+ 1. Check whether locks for runqueue readers can be added.
+ 2. If yes, add the locks.
--
2.49.0
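
The difference between an unlocked and a locked runqueue reader can be shown with a minimal userspace sketch. It is an illustration only: struct toy_rq and its helpers are hypothetical, the spinlock is modelled with a C11 atomic_flag rather than the kernel's spinlock_t, and the sketch does not claim that this is where or how locks can be added in drm_sched. That question is exactly what task 1 asks to check.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a runqueue; NOT struct drm_sched_rq. */
struct toy_rq {
	atomic_flag lock;   /* models a spinlock protecting the fields below */
	int current_entity; /* models state that readers look at */
};

static void toy_rq_init(struct toy_rq *rq)
{
	atomic_flag_clear(&rq->lock);
	rq->current_entity = -1; /* -1: no entity selected */
}

static void toy_rq_lock(struct toy_rq *rq)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
		;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Writer: updates the field under the lock. */
static void toy_rq_set_current(struct toy_rq *rq, int entity)
{
	toy_rq_lock(rq);
	rq->current_entity = entity;
	toy_rq_unlock(rq);
}

/* Reader: takes the same lock instead of reading the field bare. */
static int toy_rq_get_current(struct toy_rq *rq)
{
	int entity;

	toy_rq_lock(rq);
	entity = rq->current_entity;
	toy_rq_unlock(rq);
	return entity;
}
```

The reader pays for a lock acquisition on every access, which is one reason why the TODO entry asks to check feasibility first instead of adding the locks right away.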
* Re: [PATCH v2 0/4] Add TODO list (+ small docu change)
From: Danilo Krummrich @ 2025-10-30 11:00 UTC
To: Philipp Stanner
Cc: Matthew Brost, Christian König, David Airlie, Simona Vetter,
Tvrtko Ursulin, dri-devel, linux-kernel, linux-media
On 10/23/25 4:30 PM, Philipp Stanner wrote:
> drm_sched has so many problems that we should have our own TODO list for
> contributors who might wanna help.
Looks good,
Acked-by: Danilo Krummrich <dakr@kernel.org>
> drivers/gpu/drm/scheduler/TODO | 51 ++++++++++++++++++++++++++++++++++
I'd move this into Documentation/, just like we did for Nova [1].
[1] https://docs.kernel.org/gpu/nova/core/todo.html