* [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
@ 2025-06-12 14:49 Philipp Stanner
2025-06-17 13:51 ` Simona Vetter
0 siblings, 1 reply; 7+ messages in thread
From: Philipp Stanner @ 2025-06-12 14:49 UTC (permalink / raw)
To: Matthew Brost, Danilo Krummrich, Philipp Stanner,
Christian König, David Airlie, Simona Vetter, Sumit Semwal
Cc: dri-devel, linux-kernel, linux-media
struct drm_sched_init_args provides the possibility of letting the
scheduler use user-controlled workqueues, instead of the scheduler
creating its own workqueues. It's currently not documented who would
want to use that.
Not sharing the submit_wq between driver and scheduler has the advantage
of no negative interference between them being able to occur (e.g.,
MMU notifier callbacks waiting for fences to get signaled).
Add a new docu section for concurrency in drm/sched.
Discourage the use of driver-provided workqueues in the documentation. Document
when using them can make sense. Warn about pitfalls.
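For illustration, a driver that wants its per-ring timeout handlers
serialized, while letting the scheduler allocate its own submit_wq, could do
roughly the following (a sketch only; my_timeout_wq, my_sched_ops and my_ring
are made-up names):

	struct workqueue_struct *my_timeout_wq;

	/* One ordered wq shared by all rings, so that timeout handlers run
	 * one after the other and the first hang can be identified. */
	my_timeout_wq = alloc_ordered_workqueue("my-timeout", 0);

	struct drm_sched_init_args args = {
		.ops          = &my_sched_ops,
		.submit_wq    = NULL,            /* scheduler allocates its own */
		.timeout_wq   = my_timeout_wq,   /* shared and ordered */
		.num_rqs      = DRM_SCHED_PRIORITY_COUNT,
		.credit_limit = 64,
		.hang_limit   = 0,
		.timeout      = msecs_to_jiffies(500),
		.name         = "my-ring",
		.dev          = dev,
	};

	return drm_sched_init(&my_ring->sched, &args);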
Co-authored-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
Changes in v2:
- Add new docu section for concurrency in the scheduler. (Sima)
- Document what an ordered workqueue passed to the scheduler can be
useful for. (Christian, Sima)
- Warn in more detail about potential deadlocks. (Danilo)
---
include/drm/gpu_scheduler.h | 54 ++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 81dcbfc8c223..00c528e62485 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -21,6 +21,49 @@
*
*/
+/**
+ * DOC: Concurrency in drm/sched
+ *
+ * The DRM GPU Scheduler interacts with drivers through the callbacks defined in
+ * &struct drm_sched_backend_ops. These callbacks can be invoked by the scheduler
+ * at any point in time unless explicitly stated otherwise in the callback
+ * documentation.
+ *
+ * For most use cases, passing the recommended default parameters in &struct
+ * drm_sched_init_args is sufficient. There are some special circumstances,
+ * though:
+ *
+ * For timeout handling, it makes a lot of sense for drivers with HARDWARE
+ * scheduling to have the timeouts (e.g., for different hardware rings) occur
+ * sequentially, for example to allow for detecting which job timed out first
+ * and, therefore, caused the hang. This determines which &struct
+ * drm_sched_entity has to be killed and which entities' jobs must be
+ * resubmitted after a GPU reset.
+ *
+ * This can be achieved by passing an ordered workqueue in &struct
+ * drm_sched_init_args.timeout_wq. Also take a look at the documentation of
+ * &struct drm_sched_backend_ops.timedout_job.
+ *
+ * Furthermore, for drivers with hardware that supports FIRMWARE scheduling,
+ * the driver design can be simplified a bit by providing one ordered workqueue
+ * for both &struct drm_sched_init_args.timeout_wq and
+ * &struct drm_sched_init_args.submit_wq. The reason is that firmware
+ * schedulers should always have one scheduler instance per firmware runqueue
+ * and one entity per scheduler instance. If that scheduler instance then
+ * shares one ordered workqueue with the driver, locking can be very
+ * lightweight or dropped altogether.
+ *
+ * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
+ * theoretically can deadlock. It must be guaranteed that submit_wq never has
+ * more than max_active - 1 active tasks, or that, if max_active tasks are
+ * reached, at least one of them does not execute operations that may block
+ * on dma_fences that potentially make progress through this scheduler
+ * instance. Otherwise, it is possible that all max_active tasks end up
+ * waiting on a dma_fence (which can only make progress through this
+ * scheduler instance), while the scheduler's queued work waits for at least
+ * one of the max_active tasks to finish, resulting in a deadlock.
+ */
+
#ifndef _DRM_GPU_SCHEDULER_H_
#define _DRM_GPU_SCHEDULER_H_
@@ -499,7 +542,7 @@ struct drm_sched_backend_ops {
* timeout handlers of different schedulers. One way to achieve this
* synchronization is to create an ordered workqueue (using
* alloc_ordered_workqueue()) at the driver level, and pass this queue
- * as drm_sched_init()'s @timeout_wq parameter. This will guarantee
+ * in &struct drm_sched_init_args.timeout_wq. This will guarantee
* that timeout handlers are executed sequentially.
*
* Return: The scheduler's status, defined by &enum drm_gpu_sched_stat
@@ -590,14 +633,19 @@ struct drm_gpu_scheduler {
*
* @ops: backend operations provided by the driver
* @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
- * allocated and used.
+ * allocated and used. It is recommended to pass NULL unless there
+ * is a good reason not to. For details, see &DOC: Concurrency in
+ * drm/sched.
* @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
* as there's usually one run-queue per priority, but may be less.
* @credit_limit: the number of credits this scheduler can hold from all jobs
* @hang_limit: number of times to allow a job to hang before dropping it.
* This mechanism is DEPRECATED. Set it to 0.
* @timeout: timeout value in jiffies for submitted jobs.
- * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is used.
+ * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
+ * used. An ordered workqueue could be passed to achieve timeout
+ * synchronization. See &DOC: Concurrency in drm/sched and &struct
+ * drm_sched_backend_ops.timedout_job.
* @score: score atomic shared with other schedulers. May be NULL.
* @name: name (typically the driver's name). Used for debugging
* @dev: associated device. Used for debugging
--
2.49.0
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-12 14:49 [PATCH v2] drm/sched: Clarify scenarios for separate workqueues Philipp Stanner
@ 2025-06-17 13:51 ` Simona Vetter
2025-06-17 14:10 ` Danilo Krummrich
0 siblings, 1 reply; 7+ messages in thread
From: Simona Vetter @ 2025-06-17 13:51 UTC (permalink / raw)
To: Philipp Stanner
Cc: Matthew Brost, Danilo Krummrich, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> struct drm_sched_init_args provides the possibility of letting the
> scheduler use user-controlled workqueues, instead of the scheduler
> creating its own workqueues. It's currently not documented who would
> want to use that.
>
> Not sharing the submit_wq between driver and scheduler has the advantage
> of no negative interference between them being able to occur (e.g.,
> MMU notifier callbacks waiting for fences to get signaled).
>
> Add a new docu section for concurrency in drm/sched.
>
> Discourage the use of driver-provided workqueues in the documentation. Document
> when using them can make sense. Warn about pitfalls.
>
> Co-authored-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> Changes in v2:
> - Add new docu section for concurrency in the scheduler. (Sima)
> - Document what an ordered workqueue passed to the scheduler can be
> useful for. (Christian, Sima)
> - Warn in more detail about potential deadlocks. (Danilo)
> ---
> include/drm/gpu_scheduler.h | 54 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 81dcbfc8c223..00c528e62485 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -21,6 +21,49 @@
> *
> */
>
> +/**
> + * DOC: Concurrency in drm/sched
You need to explicitly pull this into the .rst files somewhere suitable,
otherwise it won't show up in the docs. With that please also check that
the hyperlinks all work.
> + *
> + * The DRM GPU Scheduler interacts with drivers through the callbacks defined in
> + * &struct drm_sched_backend_ops. These callbacks can be invoked by the scheduler
> + * at any point in time if not stated otherwise explicitly in the callback
> + * documentation.
> + *
> + * For most use cases, passing the recommended default parameters in &struct
> + * drm_sched_init_args is sufficient. There are some special circumstances,
> + * though:
> + *
> + * For timeout handling, it makes a lot of sense for drivers with HARDWARE
> + * scheduling to have the timeouts (e.g., for different hardware rings) occur
> + * sequentially, for example to allow for detecting which job timed out first
> + * and, therefore, caused the hang. Thereby, it is determined which &struct
> + * drm_sched_entity has to be killed and which entities' jobs must be
> + * resubmitted after a GPU reset.
> + *
> + * This can be achieved by passing an ordered workqueue in &struct
> + * drm_sched_init_args.timeout_wq. Also take a look at the documentation of
> + * &struct drm_sched_backend_ops.timedout_job.
So the main use-case for this is when the reset domain is bigger than a
single engine, and you don't want to deal with the mess of inventing your
own synchronization primitives.
I'm not exactly clear what hardware scheduling scenario you're thinking of
here that needs an ordered timeout queue that's shared across instances.
Note that an ordered timeout queue for a single scheduler is fairly
pointless, because we only ever have one pending timeout handler (at least
I thought that's how it works).
> + * Furthermore, for drivers with hardware that supports FIRMWARE scheduling,
I'm not sure you need a firmware scheduler to make this useful, just that
firmware scheduler submit rings tend to be one case where this is a good
locking design. I'd put this at the end as a concrete example instead and
explain why (you already have a single submit ringbuffer, parallelism in
the kernel doesn't serve any purpose aside from making locking more complex).
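Something like this is what I have in mind (hand-waving sketch, names made
up), one instance per firmware ring:

	/* one firmware ring == one scheduler == one entity; everything,
	 * including the driver's own work items for this ring, is
	 * serialized on a single ordered wq */
	ring->wq = alloc_ordered_workqueue("fw-ring-%u", 0, ring->id);

	struct drm_sched_init_args args = {
		.ops        = &fw_sched_ops,
		.submit_wq  = ring->wq,
		.timeout_wq = ring->wq,
		/* remaining fields as usual */
	};

	drm_sched_init(&ring->sched, &args);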
> + * the driver design can be simplified a bit by providing one ordered workqueue
> + * for both &struct drm_sched_init_args.timeout_wq and
> + * &struct drm_sched_init_args.submit_wq. Reason being that firmware schedulers
> + * should always have one scheduler instance per firmware runqueue and one
> + * entity per scheduler instance. If that scheduler instance then shares one
> + * ordered workqueue with the driver, locking can be very lightweight or
> + * dropped altogether.
> + *
> + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> + * least one of them does not execute operations that may block on dma_fences
> + * that potentially make progress through this scheduler instance. Otherwise,
> + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> + * can only make progress through this scheduler instance), while the
> + * scheduler's queued work waits for at least one of the max_active tasks to
> + * finish. Thus, this can result in a deadlock.
Uh if you have an ordered wq you deadlock with just one misuse. I'd just
explain that the wq must provide sufficient forward-progress guarantees
for the scheduler, specifically that it's on the dma_fence signalling
critical path and leave the concrete examples for people to figure out
when they design a specific locking scheme.
> + */
> +
> #ifndef _DRM_GPU_SCHEDULER_H_
> #define _DRM_GPU_SCHEDULER_H_
>
> @@ -499,7 +542,7 @@ struct drm_sched_backend_ops {
> * timeout handlers of different schedulers. One way to achieve this
> * synchronization is to create an ordered workqueue (using
> * alloc_ordered_workqueue()) at the driver level, and pass this queue
> - * as drm_sched_init()'s @timeout_wq parameter. This will guarantee
> + * in &struct drm_sched_init_args.timeout_wq. This will guarantee
> * that timeout handlers are executed sequentially.
> *
> * Return: The scheduler's status, defined by &enum drm_gpu_sched_stat
> @@ -590,14 +633,19 @@ struct drm_gpu_scheduler {
> *
> * @ops: backend operations provided by the driver
> * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
> - * allocated and used.
> + * allocated and used. It is recommended to pass NULL unless there
> + * is a good reason not to. For details, see &DOC: Concurrency in
> + * drm/sched.
Does this really link?
> * @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
> * as there's usually one run-queue per priority, but may be less.
> * @credit_limit: the number of credits this scheduler can hold from all jobs
> * @hang_limit: number of times to allow a job to hang before dropping it.
> * This mechanism is DEPRECATED. Set it to 0.
> * @timeout: timeout value in jiffies for submitted jobs.
> - * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is used.
> + * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
> + * used. An ordered workqueue could be passed to achieve timeout
> + * synchronization. See &DOC: Concurrency in drm/sched and &struct
Same question about linking ...
Might just put a note in the text section below about concurrency design
questions instead of trying to squeeze these in here.
Cheers, Sima
> + * drm_sched_backend_ops.timedout_job.
> * @score: score atomic shared with other schedulers. May be NULL.
> * @name: name (typically the driver's name). Used for debugging
> * @dev: associated device. Used for debugging
> --
> 2.49.0
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-17 13:51 ` Simona Vetter
@ 2025-06-17 14:10 ` Danilo Krummrich
2025-06-17 14:25 ` Simona Vetter
0 siblings, 1 reply; 7+ messages in thread
From: Danilo Krummrich @ 2025-06-17 14:10 UTC (permalink / raw)
To: Philipp Stanner, Matthew Brost, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > + * least one of them does not execute operations that may block on dma_fences
> > + * that potentially make progress through this scheduler instance. Otherwise,
> > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > + * can only make progress through this scheduler instance), while the
> > + * scheduler's queued work waits for at least one of the max_active tasks to
> > + * finish. Thus, this can result in a deadlock.
>
> Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> explain that the wq must provide sufficient forward-progress guarantees
> for the scheduler, specifically that it's on the dma_fence signalling
> critical path and leave the concrete examples for people to figure out
> when they design a specific locking scheme.
This isn't a concrete example, is it? It's exactly what you say in slightly
different words, with the addition of highlighting the impact of the workqueue's
max_active configuration.
I think that's relevant, because N - 1 active tasks can be on the dma_fence
signalling critical path without issues.
We could change
"if max_active tasks are reached at least one of them must not execute
operations that may block on dma_fences that potentially make progress
through this scheduler instance"
to
"if max_active tasks are reached at least one of them must not be on the
dma_fence signalling critical path"
which is a bit more to the point I think.
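To make the ordered case concrete (made-up names; struct my_job and its
members are invented): with an ordered wq shared as submit_wq, a single driver
work item like the one below already deadlocks, because the scheduler work
that would signal the fence sits behind it in the same single-slot queue.

	static void my_cleanup_work(struct work_struct *work)
	{
		struct my_job *job = container_of(work, struct my_job, cleanup_work);

		/* Blocks forever: the fence is only signalled by scheduler
		 * work queued behind this item on the same ordered wq. */
		dma_fence_wait(job->done_fence, false);
	}

	/* elsewhere, queued on the wq that is also the scheduler's submit_wq */
	queue_work(shared_ordered_wq, &job->cleanup_work);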
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-17 14:10 ` Danilo Krummrich
@ 2025-06-17 14:25 ` Simona Vetter
2025-06-17 15:08 ` Danilo Krummrich
0 siblings, 1 reply; 7+ messages in thread
From: Simona Vetter @ 2025-06-17 14:25 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Philipp Stanner, Matthew Brost, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > + * least one of them does not execute operations that may block on dma_fences
> > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > + * can only make progress through this scheduler instance), while the
> > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > + * finish. Thus, this can result in a deadlock.
> >
> > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > explain that the wq must provide sufficient forward-progress guarantees
> > for the scheduler, specifically that it's on the dma_fence signalling
> > critical path and leave the concrete examples for people to figure out
> > when they design a specific locking scheme.
>
> This isn't a concrete example, is it? It's exactly what you say in slightly
> different words, with the addition of highlighting the impact of the workqueue's
> max_active configuration.
>
> I think that's relevant, because N - 1 active tasks can be on the dma_fence
> signalling critical path without issues.
>
> We could change
>
> "if max_active tasks are reached at least one of them must not execute
> operations that may block on dma_fences that potentially make progress
> through this scheduler instance"
>
> to
>
> "if max_active tasks are reached at least one of them must not be on the
> dma_fence signalling critical path"
>
> which is a bit more to the point I think.
My point was more to state that the wq must be suitable for the scheduler
jobs as the general issue, and specifically then also highlight the
dma_fence concurrency issue. But it's not the only one, you can have
driver locks and other fun involved here too.
Also since all the paragraphs above talk about ordered wq as the example
where specifying your own wq makes sense, it's a bit confusing to now
suddenly only talk about the concurrent wq case without again mentioning
that the ordered wq case is really limited.
-Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-17 14:25 ` Simona Vetter
@ 2025-06-17 15:08 ` Danilo Krummrich
2025-06-18 14:06 ` Simona Vetter
0 siblings, 1 reply; 7+ messages in thread
From: Danilo Krummrich @ 2025-06-17 15:08 UTC (permalink / raw)
To: Philipp Stanner, Matthew Brost, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Tue, Jun 17, 2025 at 04:25:09PM +0200, Simona Vetter wrote:
> On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> > On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > > + * least one of them does not execute operations that may block on dma_fences
> > > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > > + * can only make progress through this scheduler instance), while the
> > > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > > + * finish. Thus, this can result in a deadlock.
> > >
> > > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > > explain that the wq must provide sufficient forward-progress guarantees
> > > for the scheduler, specifically that it's on the dma_fence signalling
> > > critical path and leave the concrete examples for people to figure out
> > > when they design a specific locking scheme.
> >
> > This isn't a concrete example, is it? It's exactly what you say in slightly
> > different words, with the addition of highlighting the impact of the workqueue's
> > max_active configuration.
> >
> > I think that's relevant, because N - 1 active tasks can be on the dma_fence
> > signalling critical path without issues.
> >
> > We could change
> >
> > "if max_active tasks are reached at least one of them must not execute
> > operations that may block on dma_fences that potentially make progress
> > through this scheduler instance"
> >
> > to
> >
> > "if max_active tasks are reached at least one of them must not be on the
> > dma_fence signalling critical path"
> >
> > which is a bit more to the point I think.
>
> My point was more to state that the wq must be suitable for the scheduler
> jobs as the general issue, and specifically then also highlight the
> dma_fence concurrency issue.
Sure, there are more guarantees the driver has to uphold, but this is one of
them, so I think we should explain it.
> But it's not the only one, you can have driver locks and other fun involved
> here too.
Yeah, but it boils down to the same issue, e.g. if a driver takes a lock in
active work, and this lock is taken elsewhere for activities that violate the
dma_fence signalling critical path.
All the cases I have in mind boil down to that we potentially, either directly
or indirectly (through some synchronization primitive), wait for things we are
not allowed to wait for in the dma_fence signalling critical path.
Or do you mean something different?
> Also since all the paragraphs above talk about ordered wq as the example
> where specifying your own wq makes sense, it's a bit confusing to now
> suddenly only talk about the concurrent wq case without again mentioning
> that the ordered wq case is really limited.
I mean, it talks about both cases in a generic way, i.e. if you set
max_active == 1 in the text it covers the ordered case.
Or do you mean to say that we should *only* allow ordered workqueues to be
shared with the driver?
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-17 15:08 ` Danilo Krummrich
@ 2025-06-18 14:06 ` Simona Vetter
2025-06-18 14:42 ` Danilo Krummrich
0 siblings, 1 reply; 7+ messages in thread
From: Simona Vetter @ 2025-06-18 14:06 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Philipp Stanner, Matthew Brost, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Tue, Jun 17, 2025 at 05:08:57PM +0200, Danilo Krummrich wrote:
> On Tue, Jun 17, 2025 at 04:25:09PM +0200, Simona Vetter wrote:
> > On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> > > On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > > > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > > > + * least one of them does not execute operations that may block on dma_fences
> > > > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > > > + * can only make progress through this scheduler instance), while the
> > > > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > > > + * finish. Thus, this can result in a deadlock.
> > > >
> > > > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > > > explain that the wq must provide sufficient forward-progress guarantees
> > > > for the scheduler, specifically that it's on the dma_fence signalling
> > > > critical path and leave the concrete examples for people to figure out
> > > > when they design a specific locking scheme.
> > >
> > > This isn't a concrete example, is it? It's exactly what you say in slightly
> > > different words, with the addition of highlighting the impact of the workqueue's
> > > max_active configuration.
> > >
> > > I think that's relevant, because N - 1 active tasks can be on the dma_fence
> > > signalling critical path without issues.
> > >
> > > We could change
> > >
> > > "if max_active tasks are reached at least one of them must not execute
> > > operations that may block on dma_fences that potentially make progress
> > > through this scheduler instance"
> > >
> > > to
> > >
> > > "if max_active tasks are reached at least one of them must not be on the
> > > dma_fence signalling critical path"
> > >
> > > which is a bit more to the point I think.
> >
> > My point was more to state that the wq must be suitable for the scheduler
> > jobs as the general issue, and specifically then also highlight the
> > dma_fence concurrency issue.
>
> Sure, there are more guarantees the driver has to uphold, but this is one of
> them, so I think we should explain it.
>
> > But it's not the only one, you can have driver locks and other fun involved
> > here too.
>
> Yeah, but it boils down to the same issue, e.g. if a driver takes a lock in
> active work, and this lock is taken elsewhere for activities that violate the
> dma_fence signalling critical path.
>
> All the cases I have in mind boil down to that we potentially, either directly
> or indirectly (through some synchronization primitive), wait for things we are
> not allowed to wait for in the dma_fence signalling critical path.
>
> Or do you mean something different?
You could also grab a mutex in those driver work items that is held while
waiting for a dma_fence somewhere. The dma_fence annotations should catch
that, but at least in my reading of the text here it's not covered.
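I.e. roughly this pattern (names invented), which is the same problem one step
removed:

	/* work item running on the wq that is also the scheduler's submit_wq */
	mutex_lock(&drv->lock);		/* blocks behind ... */
	mutex_unlock(&drv->lock);

	/* some unrelated driver path */
	mutex_lock(&drv->lock);
	dma_fence_wait(fence, false);	/* ... a fence wait done under the lock,
					 * where the fence needs the submit_wq
					 * to make progress */
	mutex_unlock(&drv->lock);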
But my main point is what I explain below: the text fails to clearly
address the issues that all current drivers (maybe all reasonable drivers,
but sometimes I lack imagination) can encounter, so to me it's too generic
and not that directly applicable in practice. And then on the other hand
dma_fence is definitely the big thing, but fundamentally you tie anything
you schedule on those wq to the drm/scheduler design in its entirety. So
for the general rule, it's not generic enough for my taste.
> > Also since all the paragraphs above talk about ordered wq as the example
> > where specifying your own wq makes sense, it's a bit confusing to now
> > suddenly only talk about the concurrent wq case without again mentioning
> > that the ordered wq case is really limited.
>
> I mean, it talks about both cases in a generic way, i.e. if you set
> max_active == 1 in the text it covers the ordered case.
>
> Or do you mean to say that we should *only* allow ordered workqueues to be
> shared with the driver?
Both examples talk about ordered wq, they don't make any sense with
max_active > 1, and I can't come up with any example that would. So yeah
I'm leaning in that direction, at least in the docs. Only discussing
max_active > 1 for this specific issue is imo very confusing and not
helping much. I guess we could even WARN_ON if a provided wq is not
ordered, because that does smell funky for sure.
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
2025-06-18 14:06 ` Simona Vetter
@ 2025-06-18 14:42 ` Danilo Krummrich
0 siblings, 0 replies; 7+ messages in thread
From: Danilo Krummrich @ 2025-06-18 14:42 UTC (permalink / raw)
To: Philipp Stanner, Matthew Brost, Christian König,
David Airlie, Simona Vetter, Sumit Semwal, dri-devel,
linux-kernel, linux-media
On Wed, Jun 18, 2025 at 04:06:41PM +0200, Simona Vetter wrote:
> On Tue, Jun 17, 2025 at 05:08:57PM +0200, Danilo Krummrich wrote:
> > On Tue, Jun 17, 2025 at 04:25:09PM +0200, Simona Vetter wrote:
> > > On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> > > > On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > > > > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > > > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > > > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > > > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > > > > + * least one of them does not execute operations that may block on dma_fences
> > > > > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > > > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > > > > + * can only make progress through this scheduler instance), while the
> > > > > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > > > > + * finish. Thus, this can result in a deadlock.
> > > > >
> > > > > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > > > > explain that the wq must provide sufficient forward-progress guarantees
> > > > > for the scheduler, specifically that it's on the dma_fence signalling
> > > > > critical path and leave the concrete examples for people to figure out
> > > > > when they design a specific locking scheme.
> > > >
> > > > This isn't a concrete example, is it? It's exactly what you say in slightly
> > > > different words, with the addition of highlighting the impact of the workqueue's
> > > > max_active configuration.
> > > >
> > > > I think that's relevant, because N - 1 active tasks can be on the dma_fence
> > > > signalling critical path without issues.
> > > >
> > > > We could change
> > > >
> > > > "if max_active tasks are reached at least one of them must not execute
> > > > operations that may block on dma_fences that potentially make progress
> > > > through this scheduler instance"
> > > >
> > > > to
> > > >
> > > > "if max_active tasks are reached at least one of them must not be on the
> > > > dma_fence signalling critical path"
> > > >
> > > > which is a bit more to the point I think.
> > >
> > > My point was more to state that the wq must be suitable for the scheduler
> > > jobs as the general issue, and specifically then also highlight the
> > > dma_fence concurrency issue.
> >
> > Sure, there are more guarantees the driver has to uphold, but this is one of
> > them, so I think we should explain it.
> >
> > > But it's not the only one, you can have driver locks and other fun involved
> > > here too.
> >
> > Yeah, but it boils down to the same issue, e.g. if a driver takes a lock in
> > active work, and this lock is taken elsewhere for activities that violate the
> > dma_fence signalling critical path.
> >
> > All the cases I have in mind boil down to that we potentially, either directly
> > or indirectly (through some synchronization primitive), wait for things we are
> > not allowed to wait for in the dma_fence signalling critical path.
> >
> > Or do you mean something different?
>
> You could also grab a mutex in those driver work items that is held while
> waiting for a dma_fence somewhere. The dma_fence annotations should catch
> that, but at least in my reading of the text here it's not covered.
That's exactly one example of what I mean above with: "All the cases I have in
mind boil down to that we potentially, either directly or indirectly (through
some synchronization primitive), wait for things we are not allowed to wait for
in the dma_fence signalling critical path."
> But my main point is what I explain below, the text fails to clearly
> address the issues that all current drivers (maybe all reasonable drivers,
> but sometimes I lack imagination) can encounter, so to me it's too generic
> and not that directly applicable in practice.
I mean, the ordered workqueue isn't more limited than any other workqueue, it's
just that the likelihood of hitting problems increases with max_active being
lower or even one.
But the error conditions are the exact same, aren't they? We always have to
ensure that at least one slot of the workqueue is not running a task that does
things that are not allowed in the dma_fence signalling critical path.
So, this part is technically covered. Do I understand you correctly that you
think it's covered too generically?
Do you propose to list the things we're not allowed to do explicitly?
> And then on the other hand
> dma_fence is definitely the big thing, but fundamentally you tie anything
> you schedule on those wq to the drm/scheduler design in its entirety. So
> for the general rule, it's not generic enough for my taste.
Sorry, I can't follow that. Can you please expand a bit on what you think is not
generic enough?
> > > Also since all the paragraphs above talk about ordered wq as the example
> > > where specifying your own wq makes sense, it's a bit confusing to now
> > > suddenly only talk about the concurrent wq case without again mentioning
> > > that the ordered wq case is really limited.
> >
> > I mean, it talks about both cases in a generic way, i.e. if you set
> > max_active == 1 in the text it covers the ordered case.
> >
> > Or do you mean to say that we should *only* allow ordered workqueues to be
> > shared with the driver?
>
> Both examples talk about ordered wq, they don't make any sense with
> max_active > 1, and I can't come up with any example that would. So yeah
> I'm leaning in that direction, at least in the docs. Only discussing
> max_active > 1 for this specific issue is imo very confusing and not
> helping much. I guess we could even WARN_ON if a provided wq is not
> ordered, because that does smell funky for sure.
I think there is nothing wrong with sharing a workqueue with WQ_MAX_ACTIVE
between scheduler instances. In the firmware scheduler case we may have quite a
lot of them. So, why create a new workqueue for each of those if we can't
take advantage of the synchronization trick with ordered workqueues?
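I.e. something like (sketch, names invented):

	/* one shared submission wq with plenty of concurrency instead of a
	 * dedicated wq per firmware-scheduler instance */
	drv->sched_wq = alloc_workqueue("drv-sched", 0, WQ_MAX_ACTIVE);

	for (i = 0; i < drv->num_fw_rings; i++) {
		struct drm_sched_init_args args = {
			.ops       = &fw_sched_ops,
			.submit_wq = drv->sched_wq,
			/* remaining fields as usual */
		};

		drm_sched_init(&drv->rings[i].sched, &args);
	}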