From: Matthew Brost <matthew.brost@intel.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Christian König" <christian.koenig@amd.com>,
"Philipp Stanner" <phasta@mailbox.org>,
phasta@kernel.org, "David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>, "Gary Guo" <gary@garyguo.net>,
"Benno Lossin" <lossin@kernel.org>,
"Daniel Almeida" <daniel.almeida@collabora.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
rust-for-linux@vger.kernel.org, lucas.demarchi@intel.com,
thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com
Subject: Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
Date: Fri, 13 Mar 2026 10:27:44 -0700 [thread overview]
Message-ID: <abRJEG8dodzboWHp@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <20260211160059.6e0d3b60@fedora>
On Wed, Feb 11, 2026 at 04:00:59PM +0100, Boris Brezillon wrote:
Jumping in here as I was tagged in this thread; it's a lot to get
through, so I'm randomly picking a point to reply.
> On Wed, 11 Feb 2026 15:38:32 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>
> > On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> > > On Wed, 11 Feb 2026 12:00:30 +0100
> > > "Danilo Krummrich" <dakr@kernel.org> wrote:
> > >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> > >> be used for anything else.
> > >
> > > Totally agree with that, and that's where I was going with this special
> > > DmaFenceWorkqueue wrapper/abstract, that would only accept
> > > scheduling MaySignalDmaFencesWorkItem objects.
> >
> > Not sure if it has to be that complicated (for a first shot). At least for the
> > JobQ it would probably be enough to have a helper to create a new, let's say,
> > struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
> > does not give access to it outside of jobq.rs.
>
> Except we need to schedule some work items that are in the
> DMA-signaling path but not directly controlled by the jobq.rs
> implementation (see [1] for the post-execution work we schedule in
> panthor).
>
> The two options I can think of are:
>
> 1. Add an unsafe interface to schedule work items on the wq attached
> to JobQ. The safety requirement in that case is compliance with the
> DMA-fence signalling rules.
For (1), use lockdep to enforce these rules. I have a patch for this
[1]. Something like this is probably what everyone needs—jobqueue can
either create a workqueue with this annotation or enforce that the one
being passed in already has it. I turned this on for all Xe workqueues
in the signaling path and immediately found a few bugs, and I know the
dma-fence rules pretty well, so this is clearly useful.
I think users scheduling their own work on the submit workqueue is a
valid use case. The primary case in Xe is control-plane messages (e.g.,
queue suspend/resume, teardown, toggling queue priority in firmware, etc.).
You don’t want to race with submission while manipulating queue state,
so you order this work on the workqueue. Could you do this with a lock?
Probably. But then you’d have to audit every point that issues a
control-plane message to make sure you can take that lock.
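The queue-based ordering described above can be illustrated with a toy
userspace sketch (the names and types here are made up for illustration
and are not Xe or jobqueue code): both job submission and control-plane
messages flow through one FIFO drained by a single worker, so they
serialize against each other without any lock:

```rust
use std::collections::VecDeque;

// Hypothetical message types; in the driver these would be work items
// queued on the same ordered workqueue as submission.
enum Msg {
    Submit(u32),     // a job id
    SetPriority(u8), // a control-plane message
}

struct QueueState {
    submitted: Vec<u32>,
    priority: u8,
}

// Draining the FIFO from a single worker means a control-plane message
// can never run concurrently with a submission: whatever order the
// messages were queued in is the order the state is mutated in.
fn drain(fifo: &mut VecDeque<Msg>, state: &mut QueueState) {
    while let Some(msg) = fifo.pop_front() {
        match msg {
            Msg::Submit(job) => state.submitted.push(job),
            Msg::SetPriority(p) => state.priority = p,
        }
    }
}
```

The trade-off versus a lock is exactly the one above: no audit of every
control-plane call site is needed, since ordering falls out of the queue.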
There’s also the hazard where a control message is issued in IRQ context
but you need a mutex to manipulate the queue (in Xe, this is the mutex
used to send firmware commands). For example, I’ve implemented fence deadlines
in Xe [2], which fire control-plane messages in IRQ context. Another
example is a job dropping a ref to the queue in IRQ context, and that
being the final reference that triggers teardown. I don’t do the latter
yet in Xe, but it should be possible to drop your last queue ref when a
dma-fence signals (i.e., no free_job work — just a put in the dma-fence
signaling IRQ handler) if jobqueue is designed correctly.
I’m also not sure how timeouts are supposed to work in jobqueue, but if
you need to stop/start the jobqueue to ensure you have full control over
your queue (e.g., new job submissions aren’t racing), then you likely
need a second workqueue so you can stop the submit one, or you might be
able to get away with a mutex. The same applies to users scheduling
stop/start operations of their own, such as global resets or migrating a
VF, which stop all jobqueue instances to perform fixups. These global
events can’t race with jobs timing out either, since multiple entities
can’t be stopping and starting jobqueue instances at the same time
without breaking things.
This is why, in Xe, all job timeouts and all global events are scheduled
on a single workqueue instance shared among all DRM sched instances.
This has worked quite well, so I’d strongly recommend carrying this part
of DRM sched forward into whatever succeeds it.
I have a fairly detailed write-up of the Xe scheduler design [3]; it’s a
little stale, but it describes how a subset of DRM sched works well for
implementing complex driver-side scheduling requirements. A whole other
subset of DRM sched is horrid, so I’d recommend taking the good ideas
from DRM sched (queue stop/start, workqueue-based ordering, finished
fences, job tracking to completion) and using those in jobqueue, while
dropping the bad ones (no real object-lifetime rules, no ownership
rules, no refcounting, wild teardown flows, wild dma-fence callback
manipulation, etc.). Some of DRM sched’s very bad ideas appear to be in
jobqueue as well. I’d reconsider those, but I won’t harp on the design
at this point.
Matt
[1] https://patchwork.freedesktop.org/patch/682491/?series=156283&rev=1
[2] https://patchwork.freedesktop.org/patch/696820/?series=159479&rev=2
[3] https://patchwork.freedesktop.org/patch/669007/?series=153000&rev=3
> 2. The thing I was describing before, where we add the concept of
> DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
> can then have a DmaFenceWorkqueue that's global, and pass it to the
> JobQueue so it can use it for its own work item.
>
> We could start with option 1, sure, but since we're going to need to
> schedule post-execution work items that have to be considered part of
> the DMA-signalling path, I'd rather have these concepts clearly defined
> from the start.
>
> Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
> to see what it looks like and get the discussion going from there
> (hopefully it's just a thin wrapper around a regular
> Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
> the WorkItem::run() path), or are you completely against the idea?
>
> [1]https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/gpu/drm/panthor/panthor_sched.c#L1913
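For what it's worth, the type-level restriction proposed above can be
sketched in plain userspace Rust (a hypothetical illustration; none of
these names are the actual kernel Workqueue/WorkItem abstractions): a
wrapper whose enqueue method only accepts work items carrying a
MaySignalDmaFences marker trait, so the compiler rejects anything else:

```rust
use std::collections::VecDeque;

/// Marker trait: implementors promise their `run` obeys the dma-fence
/// signalling rules (no blocking on other fences, no reclaim, etc.).
trait MaySignalDmaFences {}

trait WorkItem {
    fn run(&mut self);
}

// Combined bound so trait objects can carry both constraints.
trait FenceWork: WorkItem + MaySignalDmaFences {}
impl<T: WorkItem + MaySignalDmaFences> FenceWork for T {}

struct DmaFenceWorkqueue {
    items: VecDeque<Box<dyn FenceWork>>,
}

impl DmaFenceWorkqueue {
    fn new() -> Self {
        Self { items: VecDeque::new() }
    }

    /// Only items that opted into the marker trait can be queued;
    /// anything else is a compile error, not a runtime check.
    fn enqueue<W: WorkItem + MaySignalDmaFences + 'static>(&mut self, w: W) {
        self.items.push_back(Box::new(w));
    }

    /// Runs all queued items, returning how many ran.
    fn drain(&mut self) -> usize {
        let mut ran = 0;
        while let Some(mut w) = self.items.pop_front() {
            w.run();
            ran += 1;
        }
        ran
    }
}
```

In the kernel version, drain() is where a dma_fence_begin_signalling() /
dma_fence_end_signalling() pair (and the lockdep annotation from [1])
would bracket each run() call, catching rule violations at runtime on
top of the compile-time restriction.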