From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "Daniel Almeida" <daniel.almeida@collabora.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Philipp Stanner" <phasta@kernel.org>,
"Simona Vetter" <simona@ffwll.ch>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
linux-kernel@vger.kernel.org,
"Sami Tolvanen" <samitolvanen@google.com>,
"Jeffrey Vander Stoep" <jeffv@google.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Daniel Stone" <daniels@collabora.com>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
shashanks@nvidia.com, jajones@nvidia.com,
"Eliot Courtney" <ecourtney@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
rust-for-linux <rust-for-linux@vger.kernel.org>
Subject: Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
Date: Thu, 19 Mar 2026 10:57:29 +0100 [thread overview]
Message-ID: <20260319105729.2c20116d@fedora> (raw)
In-Reply-To: <absp4z1Op0BrXReP@lstrano-desk.jf.intel.com>
On Wed, 18 Mar 2026 15:40:35 -0700
Matthew Brost <matthew.brost@intel.com> wrote:
> > >
> > > So I don’t think Rust natively solves these types of problems, although
> > > I’ll concede that it does make refcounting a bit more sane.
> >
> > Rust won't magically defer the cleanup, nor will it dictate how you want
> > to do the queue teardown, those are things you need to implement. But it
> > should give visibility about object lifetimes, and guarantee that an
> > object that's still visible to some owners is usable (the notion of
> > usable is highly dependent on the object implementation).
> >
> > Just a purely theoretical example of a multi-step queue teardown that
> > might be possible to encode in rust:
> >
> > - MyJobQueue<Usable>: The job queue is currently exposed and usable.
> > There's a ::destroy() method consuming 'self' and returning a
> > MyJobQueue<Destroyed> object
> > - MyJobQueue<Destroyed>: The user asked for the workqueue to be
> > destroyed. No new job can be pushed. Existing jobs that didn't make
> > it to the FW queue are cancelled, jobs that are in-flight are
> > cancelled if they can, or are just waited upon if they can't. When
> > the whole destruction step is done, ::destroyed() is called, it
> > consumes 'self' and returns a MyJobQueue<Inactive> object.
> > - MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
> > any resources on this queue). It's ready to be cleaned up.
> > ::cleanup() (or just ::drop()) defers the cleanup of some inner
> > object that has been passed around between the various
> > MyJobQueue<State> wrappers.
> >
> > Each of the state transitions can happen asynchronously. A state
> > transition consumes the object in one state and returns a new object
> > in its new state. None of the transitions involves dropping a refcnt;
> > ownership is just transferred. The final MyJobQueue<Inactive> object is
> > the object we'll defer cleanup on.
> >
> > It's a very high-level view of one way this can be implemented (I'm
> > sure there are others, probably better than my suggestion) in order to
> > make sure the object doesn't go away without the compiler enforcing
> > proper state transitions.
> >
>
> I'm sure Rust can implement this. My point about Rust is it doesn't
> magically solve hard software arch problems, but I will admit the
> ownership model, and the way it can enforce locking at compile time,
> is pretty cool.
It's not quite about rust directly solving those problems for you, it's
about rust forcing you to think about those problems in the first
place. So no, rust won't magically solve your multi-step teardown with
crazy CPU <-> Device synchronization etc, but it allows you to clearly
identify those steps, and to think about how you want to represent them
without abusing other concepts, like object refcounting/ownership.
Everything I described can be coded in C, BTW; it's just that C is so
lax that you can also abuse other constructs to get there, which might
or might not be safe, but, more importantly, will very likely obfuscate
the code (even with good docs).
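To make the typestate idea above a bit more concrete, here's a tiny
userspace Rust sketch (all names hypothetical, no kernel APIs involved).
Because every transition consumes the previous wrapper, using a queue in
a state that doesn't support the operation simply doesn't compile:

```rust
use std::marker::PhantomData;

// Hypothetical state markers mirroring MyJobQueue<Usable/Destroyed/Inactive>.
struct Usable;
struct Destroyed;
struct Inactive;

// Inner resources are handed from state to state; nothing is freed
// until the final Inactive wrapper is explicitly cleaned up.
struct MyJobQueue<State> {
    pending_jobs: Vec<u32>,
    _state: PhantomData<State>,
}

impl MyJobQueue<Usable> {
    fn new() -> Self {
        MyJobQueue { pending_jobs: vec![1, 2, 3], _state: PhantomData }
    }

    // Consumes self: once called, no MyJobQueue<Usable> exists anymore,
    // so "push a job after destroy" is rejected at compile time.
    fn destroy(self) -> MyJobQueue<Destroyed> {
        MyJobQueue { pending_jobs: self.pending_jobs, _state: PhantomData }
    }
}

impl MyJobQueue<Destroyed> {
    // Called once pending jobs are cancelled and in-flight ones waited upon.
    fn destroyed(mut self) -> MyJobQueue<Inactive> {
        self.pending_jobs.clear();
        MyJobQueue { pending_jobs: self.pending_jobs, _state: PhantomData }
    }
}

impl MyJobQueue<Inactive> {
    // The (possibly deferred) cleanup of the inner object would go here.
    fn cleanup(self) {}
}
```

Calling destroy() twice, or touching the queue after it has been
consumed, is a compile error, which is exactly the property being
discussed here.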
>
> > > > > > +/**
> > > > > > + * DOC: DRM dependency fence
> > > > > > + *
> > > > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > > > + * completes the job.
> > > > > > + *
> > > > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > > > + * NULL or an error).
> > > > >
> > > > > I thought this fence proxy mechanism was going away due to recent work being
> > > > > carried out by Christian?
> > > > >
> > >
> > > Consider the case where a driver’s hardware fence is implemented as a
> > > dma-fence-array or dma-fence-chain. You cannot install these types of
> > > fences into a dma-resv or into syncobjs, so a proxy fence is useful
> > > here.
> >
> > Hm, so that's a driver returning a dma_fence_array/chain through
> > ::run_job()? Why would we not want to have them directly exposed and
> > split up into singular fence objects at resv insertion time (I don't
> > think syncobjs care, but I might be wrong)? I mean, one of the points
>
> You can stick dma-fence-arrays in syncobjs, but not chains.
Yeah, kinda makes sense, since timeline syncobjs use chains, and if
chains reject inner chains, it won't work.
>
> Neither dma-fence-arrays nor chains can go into dma-resv.
They can't go directly in it, but they can be split into individual
fences and inserted, which would achieve the same goal.
>
> Hence why disconnecting a job's finished fence from the hardware fence
> is IMO a good idea to keep, as it gives drivers flexibility on the
> hardware fences.
The thing is, I'm not sure drivers were ever meant to expose containers
through ::run_job().
> e.g., If this design didn't have a job's finished fence, I'd have to
> open-code one on the Xe side.
There might be other reasons we'd like to keep the
drm_sched_fence-like proxy that I'm missing. But if it's the only one,
and the fence-combining pattern you're describing is common to multiple
drivers, we can provide a container implementation that's not a
fence_array, so you can use it to insert driver fences into other
containers. This way we wouldn't force the proxy model on all drivers,
but we would keep the code generic/re-usable.
>
> > behind the container extraction is so fences coming from the same
> > context/timeline can be detected and merged. If you insert the
> > container through a proxy, you're defeating the whole fence merging
> > optimization.
>
> Right. Finished fences have a single timeline too...
Aren't you faking a single timeline though if you combine fences from
different engines running at their own pace into a container?
>
> >
> > The second thing is that I'm not sure drivers were ever supposed to
> > return fence containers in the first place, because the whole idea
> > behind a fence context is that fences are emitted/signalled in
> > seqno-order, and if the fence is encoding the state of multiple
> > timelines that progress at their own pace, it becomes tricky to control
> > that. I guess if it's always the same set of timelines that are
> > combined, that would work.
>
> Xe does this and it definitely works. We submit to multiple rings;
> when all rings signal a seqno, a chain or array signals -> the
> finished fence signals. The queues used in this manner can only submit
> multiple-ring jobs, so the finished fence timeline stays intact. If
> you could mix a multiple-ring submission with a single-ring submission
> on the same queue, yes, this could break.
Okay, I had the same understanding, thanks for confirming.
>
> >
> > > One example is when a single job submits work to multiple rings
> > > that are flipped in hardware at the same time.
> >
> > We do have that in Panthor, but that's all explicit: in a single
> > SUBMIT, you can have multiple jobs targeting different queues, each of
> > them having their own set of deps/signal ops. The combination of all the
> > signal ops into a container is left to the UMD. It could be automated
> > kernel side, but that would be a flag on the SIGNAL op leading to the
> > creation of a fence_array containing fences from multiple submitted
> > jobs, rather than the driver combining stuff in the fence it returns in
> > ::run_job().
>
> See above. We have a dedicated queue type for these types of
> submissions and a single job that submits to all the rings. We had a
> multiple queue / multiple job implementation in i915, but it turns out
> it is much cleaner with a single queue / single job / multiple rings
> model.
Hm, okay. It didn't turn into a mess in Panthor, but Xe is likely an
order of magnitude more complicated than Mali, so I'll refrain from
judging this design decision.
>
> >
> > >
> > > Another case is late arming of hardware fences in run_job (which many
> > > drivers do). The proxy fence is immediately available at arm time and
> > > can be installed into dma-resv or syncobjs even though the actual
> > > hardware fence is not yet available. I think most drivers could be
> > > refactored to make the hardware fence immediately available at run_job,
> > > though.
> >
> > Yep, I also think we can arm the driver fence early in the case of
> > JobQueue. The reason it couldn't be done before is because the
> > scheduler was in the middle, deciding which entity to pull the next job
> > from, which was changing the seqno a job driver-fence would be assigned
> > (you can't guess that at queue time in that case).
> >
>
> Xe doesn't need late arming, but it looks like multiple drivers
> implement late arming, which may be required (?).
As I said, it's mostly a problem when you have a
single-HW-queue:multiple-contexts model, which is exactly what
drm_sched was designed for. I suspect early arming is not an issue for
any of the HW supporting FW-based scheduling (PVR, Mali, NVidia,
...). If you want to use drm_dep for all drivers currently using
drm_sched (I'm still not convinced it's a good idea to do that just
yet, because then you're going to pull in a lot of the complexity
we're trying to get rid of), then you need late arming of driver fences.
>
> > [...]
> >
> > > > > > + * **Reference counting**
> > > > > > + *
> > > > > > + * Jobs and queues are both reference counted.
> > > > > > + *
> > > > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > > > + * job that was submitted to it.
> > > > > > + *
> > > > > > + * The queue holds its own reference to a job for as long as the job is
> > > > > > + * internally tracked: from the moment the job is added to the pending list
> > > > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > > > + * worker, which calls drm_dep_job_put() to release that reference.
> > > > >
> > > > > Why not simply keep track that the job was completed, instead of relinquishing
> > > > > the reference? We can then release the reference once the job is cleaned up
> > > > > (by the queue, using a worker) in process context.
> > >
> > > I think that’s what I’m doing, while also allowing an opt-in path to
> > > drop the job reference when it signals (in IRQ context)
> >
> > Did you mean in !IRQ (or !atomic) context here? Feels weird to not
> > defer the cleanup when you're in an IRQ/atomic context, but defer it
> > when you're in a thread context.
> >
>
> The put of a job in this design can be done from an IRQ context (an
> opt-in feature). xa_destroy blows up if it is called from an IRQ
> context, although maybe that could be worked around.
Making it so _put() in IRQ context is safe is fine, what I'm saying is
that instead of doing a partial immediate cleanup, and the rest in a
worker, we can just defer everything: that is, have some
_deref_release() function called by kref_put() that would queue a work
item from which the actual release is done.
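For what it's worth, the "defer everything" pattern is easy to model in
userspace Rust too (a hypothetical sketch, std only): the last owner's
drop never frees in place, it always hands the inner state to a cleanup
worker, so the release path is identical regardless of the context the
last ref is dropped from:

```rust
use std::sync::mpsc::{Receiver, Sender};

// Inner state whose teardown must happen in "process context"
// (modelled here by whoever drains the cleanup channel).
struct JobInner {
    id: u32,
}

struct Job {
    inner: Option<JobInner>,
    cleanup_tx: Sender<JobInner>,
}

impl Drop for Job {
    fn drop(&mut self) {
        // Always defer: send the inner state to the cleanup worker
        // instead of releasing it here. No partial in-place cleanup.
        if let Some(inner) = self.inner.take() {
            let _ = self.cleanup_tx.send(inner);
        }
    }
}

// The "worker": drains the channel and performs the actual release.
// Returns the ids it cleaned up, just so the behaviour is observable.
fn drain_cleanup(rx: Receiver<JobInner>) -> Vec<u32> {
    rx.into_iter().map(|inner| inner.id).collect()
}
```

The point is the same as in the C sketch further down: exactly one
unconditional release path, executed from a well-defined context.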
>
> > > so we avoid
> > > switching to a work item just to drop a ref. That seems like a
> > > significant win in terms of CPU cycles.
> >
> > Well, the cleanup path is probably not where latency matters the most.
>
> Agree. But I do think avoiding a CPU context switch (work item) for a
> very lightweight job cleanup (usually just dropping refs) will save
> CPU cycles, and thus also things like power, etc...
That's the sort of statement I'd like to see backed by actual
numbers/scenarios proving that it actually makes a difference. The
mixed model where things are partially freed immediately and partially
deferred, sometimes even with conditionals on whether the deferral
happens, just makes building a mental model of this thing a nightmare,
which in turn usually leads to subtle bugs.
>
> > It's adding scheduling overhead, sure, but given all the stuff we defer
> > already, I'm not sure saving a few cycles by doing the cleanup
> > immediately matters much. What's important is to have a way to signal
> > fences in an atomic context, because that has an impact on latency.
> >
> >
>
> Yes. The signaling happens first, then drm_dep_job_put() if the IRQ
> opt-in is in use.
>
> > [...]
> >
> > > > > > + /*
> > > > > > + * Drop all input dependency fences now, in process context, before the
> > > > > > + * final job put. Once the job is on the pending list its last reference
> > > > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > > > + * xa_destroy() would be unsafe.
> > > > > > + */
> > > > >
> > > > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > > > via ops->run_job()?
> > > > >
> > > > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > > > other than scheduling the queue worker?
> > > > >
> > >
> > > Yes, this code is required to support dropping job refs directly in the
> > > dma-fence callback (an opt-in feature). Again, this seems like a
> > > significant win in terms of CPU cycles, although I haven’t collected
> > > data yet.
> >
> > If it significantly hurts the perf, I'd like to understand why, because
> > to me it looks like pure-cleanup (no signaling involved), and thus no
> > other process waiting for us to do the cleanup. The only thing that
> > might have an impact is how fast you release the resources, and given
> > it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
> > like to understand which part of the immediate cleanup is causing a
> > contention (basically which kind of resources the system is starving of)
> >
>
> It was more that once we moved to a refcounted model, it is pretty
> trivial to allow drm_dep_job_put() when the fence is signalling. It
> doesn't really add any complexity either, which is why I added it.
It's not the refcounting model I'm complaining about, it's the "part of
it is always freed immediately, part of it is deferred, but not always
..." logic in drm_dep_job_release() that I'm questioning. I'd really
prefer something like:
static void drm_dep_job_release(struct work_struct *work)
{
	struct drm_dep_job *job =
		container_of(work, struct drm_dep_job, cleanup_work);

	/* do it all unconditionally */
}

static void drm_dep_job_defer_release(struct kref *kref)
{
	struct drm_dep_job *job =
		container_of(kref, struct drm_dep_job, refcount);

	queue_work(system_wq, &job->cleanup_work);
}

static void drm_dep_job_put(struct drm_dep_job *job)
{
	kref_put(&job->refcount, drm_dep_job_defer_release);
}

(field names like refcount/cleanup_work are made up, of course)