public inbox for rust-for-linux@vger.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Philipp Stanner" <phasta@mailbox.org>,
	phasta@kernel.org, "David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>, "Gary Guo" <gary@garyguo.net>,
	"Benno Lossin" <lossin@kernel.org>,
	"Daniel Almeida" <daniel.almeida@collabora.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	rust-for-linux@vger.kernel.org, lucas.demarchi@intel.com,
	thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com
Subject: Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
Date: Fri, 13 Mar 2026 10:27:44 -0700
Message-ID: <abRJEG8dodzboWHp@lstrano-desk.jf.intel.com>
In-Reply-To: <20260211160059.6e0d3b60@fedora>

On Wed, Feb 11, 2026 at 04:00:59PM +0100, Boris Brezillon wrote:

Jumping in here as I was tagged in this thread… a lot gets through.

Randomly picking a point to reply.

> On Wed, 11 Feb 2026 15:38:32 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
> 
> > On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> > > On Wed, 11 Feb 2026 12:00:30 +0100
> > > "Danilo Krummrich" <dakr@kernel.org> wrote:  
> > >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> > >> be used for anything else.  
> > >
> > > Totally agree with that, and that's where I was going with this special
> > > DmaFenceWorkqueue wrapper/abstract, that would only accept
> > > scheduling MaySignalDmaFencesWorkItem objects.  
> > 
> > Not sure if it has to be that complicated (for a first shot). At least for the
> > JobQ it would probably be enough to have a helper to create a new, let's say,
> > struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
> > does not give access to it outside of jobq.rs.
> 
> Except we need to schedule some work items that are in the
> DMA-signaling path but not directly controlled by the jobq.rs
> implementation (see [1] for the post-execution work we schedule in
> panthor).
> 
> The two options I can think of are:
> 
> 1. Add an unsafe interface to schedule work items on the wq attached
>    to JobQ. Safety requirements in that case being compliance with the
>    DMA-fence signalling rules.

For (1), use lockdep to enforce these rules. I have a patch for this
[1]. Something like this is probably what everyone needs—jobqueue can
either create a workqueue with this annotation or enforce that the one
being passed in already has it. I turned this on for all Xe workqueues
in the signaling path and immediately found a few bugs, even though I
know the dma-fence rules pretty well, so this is clearly useful.
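
As a rough illustration of the "create it annotated, or check the one
passed in" idea, here is a minimal userspace Rust sketch. It is not the
patch in [1] and does not use lockdep; the annotation is modelled as a
plain boolean, and every name in it (Workqueue, JobQueue, JobQueueError,
fence_signalling) is hypothetical.

```rust
// Userspace model only; none of these types are kernel or jobqueue APIs.

/// Hypothetical stand-in for a kernel workqueue handle.
#[derive(Debug, Clone, PartialEq)]
pub struct Workqueue {
    name: String,
    /// Models a "work items here are in the dma-fence signalling path" annotation.
    fence_signalling: bool,
}

impl Workqueue {
    pub fn new(name: &str, fence_signalling: bool) -> Self {
        Workqueue { name: name.to_string(), fence_signalling }
    }
}

#[derive(Debug, PartialEq)]
pub enum JobQueueError {
    /// The caller passed a workqueue without the signalling annotation.
    MissingAnnotation,
}

#[derive(Debug)]
pub struct JobQueue {
    wq: Workqueue,
}

impl JobQueue {
    /// Path A: the job queue creates its own annotated workqueue.
    pub fn with_own_workqueue(name: &str) -> Self {
        JobQueue { wq: Workqueue::new(name, true) }
    }

    /// Path B: accept a shared workqueue, but only if it is already annotated.
    pub fn with_shared_workqueue(wq: Workqueue) -> Result<Self, JobQueueError> {
        if wq.fence_signalling {
            Ok(JobQueue { wq })
        } else {
            Err(JobQueueError::MissingAnnotation)
        }
    }

    pub fn workqueue_name(&self) -> &str {
        &self.wq.name
    }
}
```

In this model an unannotated workqueue is rejected when the job queue is
constructed, so the mistake surfaces at setup time rather than later in
the signalling path.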

I think letting users schedule work on the submit workqueue is valid.
The primary case in Xe is control-plane messages (e.g., queue
suspend/resume, teardown, toggling queue priority in firmware, etc.).
You don’t want to race with submission while manipulating queue state,
so you order this work on the workqueue. Could you do this with a lock?
Probably. But then you’d have to audit every point that issues a
control-plane message to make sure you can take that lock.
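
The ordering argument can be sketched in userspace Rust. This is only a
model under stated assumptions: a single worker thread fed by a channel
stands in for an ordered workqueue, and the Work variants, QueueState
fields, and run_ordered() are hypothetical names, not Xe or jobqueue
code.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical work items; the variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Work {
    SubmitJob(u32),
    Suspend,
    Resume,
    SetPriority(u8),
}

/// Queue state touched only by the worker thread, so it needs no lock.
#[derive(Debug, Default, PartialEq)]
pub struct QueueState {
    pub suspended: bool,
    pub priority: u8,
    pub submitted: Vec<u32>,
    pub deferred: Vec<u32>,
}

/// Runs every item in FIFO order on one worker thread (the ordered-queue model).
pub fn run_ordered(items: Vec<Work>) -> QueueState {
    let (tx, rx) = mpsc::channel::<Work>();
    let worker = thread::spawn(move || {
        let mut state = QueueState::default();
        for work in rx {
            match work {
                // Submission while suspended is held back, not raced.
                Work::SubmitJob(id) if state.suspended => state.deferred.push(id),
                Work::SubmitJob(id) => state.submitted.push(id),
                Work::Suspend => state.suspended = true,
                Work::Resume => {
                    state.suspended = false;
                    // Jobs queued while suspended go out once the queue resumes.
                    let pending = std::mem::take(&mut state.deferred);
                    state.submitted.extend(pending);
                }
                Work::SetPriority(p) => state.priority = p,
            }
        }
        state
    });
    for work in items {
        tx.send(work).expect("worker is alive");
    }
    drop(tx); // closing the channel lets the worker loop end
    worker.join().expect("worker panicked")
}
```

Because control messages and submissions go through the same queue,
senders never touch QueueState directly and there is no lock for them
to take.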

There’s also the hazard where a control message is issued in IRQ context
but you need a mutex to manipulate the queue (in Xe this is the mutex
used to send firmware commands). For example, I’ve implemented fence
deadlines in Xe [2], which fire control-plane messages in IRQ context.
Another example is a job dropping a ref to the queue in IRQ context, and
that being the final reference that triggers teardown. I don’t do the
latter yet in Xe, but it should be possible to drop your last queue ref
when a dma-fence signals (i.e., no free_job work, just a put in the
dma-fence signaling IRQ handler) if jobqueue is designed correctly.

I’m also not sure timeouts are supposed to work in jobqueue, but if you
need to stop/start the jobqueue to ensure you have full control over
your queue (e.g., new jobqueues aren’t racing), then you likely need a
second workqueue so you can stop the submit one, or you might be able to
get away with a mutex. This also applies to users scheduling workqueue
operations on this (such as global resets or migrating a VF) that stop
all jobqueue instances to perform fixups. These global events can’t race
with jobs timing out either, since multiple entities can’t be stopping
and starting jobqueue instances at the same time without breaking
things.

This is why, in Xe, all job timeouts and all global events are scheduled
on a single workqueue instance shared among all DRM sched instances.
This has worked quite well, so I’d strongly recommend carrying this part
of DRM sched forward into whatever succeeds it.

I have a fairly detailed write-up of the Xe scheduler design [3]. It’s
a little stale, but it should describe how a subset of DRM sched works
very well to implement complex driver-side scheduling requirements. A
whole other subset of DRM sched is horrid, so I’d recommend taking the
good ideas from DRM sched (queue stop/start, workqueue-based ordering,
finished fences, job tracking to completion) and using those in
jobqueue, while dropping the bad ones (no real object-lifetime rules, no
ownership rules, no refcounting, wild teardown flows, wild dma-fence
callback manipulation, etc.). Some of DRM sched’s very bad ideas appear
to be in jobqueue as well. I’d reconsider those, but I won’t harp on the
design at this point.

Matt

[1] https://patchwork.freedesktop.org/patch/682491/?series=156283&rev=1
[2] https://patchwork.freedesktop.org/patch/696820/?series=159479&rev=2
[3] https://patchwork.freedesktop.org/patch/669007/?series=153000&rev=3

> 2. The thing I was describing before, where we add the concept of
>    DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
>    can then have a DmaFenceWorkqueue that's global, and pass it to the
>    JobQueue so it can use it for its own work item.
> 
> We could start with option 1, sure, but since we're going to need to
> schedule post-execution work items that have to be considered part of
> the DMA-signalling path, I'd rather have these concepts clearly defined
> from the start.
> 
> Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
> to see what it looks like and get the discussion going from there
> (hopefully it's just a thin wrapper around a regular
> Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
> the WorkItem::run() path), or are you completely against the idea?
> 
> [1] https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/gpu/drm/panthor/panthor_sched.c#L1913
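
For reference, the shape of option 2 quoted above could look roughly
like the userspace Rust model below. It is a sketch, not a proposed
kernel API: the type names are borrowed from the quoted text, while the
trait layout, enqueue(), run_all(), and the two example work items are
assumptions. The real wrapper would sit on the kernel
Workqueue/WorkItem types and enter the dma-fence signalling annotation
around run(); here that step is only a comment.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Ordinary work item (hypothetical trait, loosely modelled on a WorkItem with run()).
pub trait WorkItem {
    fn run(&mut self);
}

/// Marker for items whose author asserts they follow the dma-fence signalling rules.
/// Making it `unsafe` (an assumption of this sketch) puts the safety
/// requirement at the impl site instead of at every scheduling call.
pub unsafe trait MaySignalDmaFencesWorkItem: WorkItem {}

/// Queue that only accepts marked items; a model, not a kernel workqueue.
pub struct DmaFenceWorkqueue {
    pending: Vec<Box<dyn MaySignalDmaFencesWorkItem>>,
}

impl DmaFenceWorkqueue {
    pub fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// The trait bound is what keeps unmarked work off this queue.
    pub fn enqueue<W: MaySignalDmaFencesWorkItem + 'static>(&mut self, item: W) {
        self.pending.push(Box::new(item));
    }

    /// Runs all pending items in order and returns how many ran.
    pub fn run_all(&mut self) -> usize {
        let mut ran = 0;
        for mut item in self.pending.drain(..) {
            // A real wrapper would enter the signalling annotation here
            // and leave it after run(); this model does not.
            item.run();
            ran += 1;
        }
        ran
    }
}

/// Hypothetical post-execution work that signals fences (here: bumps a counter).
pub struct PostExecution {
    pub signalled: Rc<Cell<u32>>,
}

impl WorkItem for PostExecution {
    fn run(&mut self) {
        self.signalled.set(self.signalled.get() + 1);
    }
}

// SAFETY (modelled): run() neither allocates nor takes locks in this sketch.
unsafe impl MaySignalDmaFencesWorkItem for PostExecution {}

/// Hypothetical item that does not promise to follow the rules.
pub struct UnmarkedWork;

impl WorkItem for UnmarkedWork {
    fn run(&mut self) {}
}
// No marker impl for UnmarkedWork, so `wq.enqueue(UnmarkedWork)` does not compile.
```

In this model the compiler rejects unmarked work, and the only place a
driver author has to reason about the signalling rules is the
`unsafe impl` line.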
