Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues

public inbox for linux-media@vger.kernel.org

From: Simona Vetter <simona.vetter@ffwll.ch>
To: Danilo Krummrich <dakr@kernel.org>
Cc: "Philipp Stanner" <phasta@kernel.org>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org
Subject: Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
Date: Wed, 18 Jun 2025 16:06:41 +0200
Message-ID: <aFLH8Us_e_U6gNHV@phenom.ffwll.local>
In-Reply-To: <aFGFCc7eiZJM8RKM@pollux>

On Tue, Jun 17, 2025 at 05:08:57PM +0200, Danilo Krummrich wrote:
> On Tue, Jun 17, 2025 at 04:25:09PM +0200, Simona Vetter wrote:
> > On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> > > On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > > > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > > > + * least one of them does not execute operations that may block on dma_fences
> > > > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > > > + * can only make progress through this scheduler instance), while the
> > > > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > > > + * finish. Thus, this can result in a deadlock.
> > > > 
> > > > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > > > explain that the wq must provide sufficient forward-progress guarantees
> > > > for the scheduler, specifically that it's on the dma_fence signalling
> > > > critical path and leave the concrete examples for people to figure out
> > > > when they design a specific locking scheme.
> > > 
> > > This isn't a concrete example, is it? It's exactly what you say in slightly
> > > different words, with the addition of highlighting the impact of the workqueue's
> > > max_active configuration.
> > > 
> > > I think that's relevant, because N - 1 active tasks can be on the dma_fence
> > > signalling critical path without issues.
> > > 
> > > We could change
> > > 
> > > 	"if max_active tasks are reached at least one of them must not execute
> > > 	 operations that may block on dma_fences that potentially make progress
> > > 	 through this scheduler instance"
> > > 
> > > to 
> > > 
> > > 	"if max_active tasks are reached at least one of them must not be on the
> > > 	 dma_fence signalling critical path"
> > > 
> > > which is a bit more to the point I think.
> > 
> > My point was to more state that the wq must be suitable for the scheduler
> > jobs as the general issue, and specifically then also highlight the
> > dma_fence concurrency issue.
> 
> Sure, there are more guarantees the driver has to uphold, but this is one of
> them, so I think we should explain it.
> 
> > But it's not the only one, you can have driver locks and other fun involved
> > here too.
> 
> Yeah, but it boils down to the same issue, e.g. if a driver takes a lock in
> active work, and this lock is taken elsewhere for activities that violate the
> dma_fence signalling critical path.
> 
> All the cases I have in mind boil down to the fact that we potentially, either directly
> or indirectly (through some synchronization primitive), wait for things we are
> not allowed to wait for in the dma_fence signalling critical path.
> 
> Or do you mean something different?

You could also grab a mutex in those driver work items that is held while
waiting for a dma_fence somewhere. The dma_fence annotations should catch
that, but at least in my reading of the text here it's not covered.

But my main point is what I explain below, the text fails to clearly
address the issues that all current drivers (maybe all reasonable drivers,
but sometimes I lack imagination) can encounter, so to me it's too generic
and not that directly applicable in practice. And then on the other hand
dma_fence is definitely the big thing, but fundamentally you tie anything
you schedule on those wq to the drm/scheduler design in its entirety. So
for the general rule, it's not generic enough for my taste.

> > Also since all the paragraphs above talk about ordered wq as the example
> > where specifying your own wq makes sense, it's a bit confusing to now
> > suddenly only talk about the concurrent wq case without again mentioning
> > that the ordered wq case is really limited.
> 
> I mean, it talks about both cases in a generic way, i.e. if you set
> max_active == 1 in the text it covers the ordered case.
> 
> Or do you mean to say that we should *only* allow ordered workqueues to be
> shared with the driver?

Both examples talk about ordered wq, they don't make any sense with
max_active > 1, and I can't come up with any example that would. So yeah
I'm leaning in that direction, at least in the docs. Only discussing
max_active > 1 for this specific issue is imo very confusing and not
helping much. I guess we could even WARN_ON if a provided wq is not
ordered, because that does smell funky for sure.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Thread overview: 7+ messages
2025-06-12 14:49 [PATCH v2] drm/sched: Clarify scenarios for separate workqueues Philipp Stanner
2025-06-17 13:51 ` Simona Vetter
2025-06-17 14:10   ` Danilo Krummrich
2025-06-17 14:25     ` Simona Vetter
2025-06-17 15:08       ` Danilo Krummrich
2025-06-18 14:06         ` Simona Vetter [this message]
2025-06-18 14:42           ` Danilo Krummrich