public inbox for rust-for-linux@vger.kernel.org
* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
       [not found] ` <20260316043255.226352-3-matthew.brost@intel.com>
@ 2026-03-17  2:47   ` Daniel Almeida
  2026-03-17  5:45     ` Matthew Brost
  2026-03-17 12:31     ` Danilo Krummrich
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel Almeida @ 2026-03-17  2:47 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, dri-devel, Boris Brezillon, Tvrtko Ursulin,
	Rodrigo Vivi, Thomas Hellström, Christian König,
	Danilo Krummrich, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Philipp Stanner, Simona Vetter, Sumit Semwal, Thomas Zimmermann,
	linux-kernel, Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl,
	Daniel Stone, Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

(+cc a few other people + Rust-for-Linux ML)

Hi Matthew,

I agree with what Danilo said below, i.e.: IMHO, given the direction that DRM
is going, it is much more ergonomic to add a Rust component with a clean C
interface than to do it the other way around.

> On 16 Mar 2026, at 01:32, Matthew Brost <matthew.brost@intel.com> wrote:
> 
> Diverging requirements between GPU drivers using firmware scheduling
> and those using hardware scheduling have shown that drm_gpu_scheduler is
> no longer sufficient for firmware-scheduled GPU drivers. The technical
> debt, lack of memory-safety guarantees, absence of clear object-lifetime
> rules, and numerous driver-specific hacks have rendered
> drm_gpu_scheduler unmaintainable. It is time for a fresh design for
> firmware-scheduled GPU drivers—one that addresses all of the
> aforementioned shortcomings.
> 
> Add drm_dep, a lightweight GPU submission queue intended as a
> replacement for drm_gpu_scheduler for firmware-managed GPU schedulers
> (e.g. Xe, Panthor, AMDXDNA, PVR, Nouveau, Nova). Unlike
> drm_gpu_scheduler, which separates the scheduler (drm_gpu_scheduler)
> from the queue (drm_sched_entity) into two objects requiring external
> coordination, drm_dep merges both roles into a single struct
> drm_dep_queue. This eliminates the N:1 entity-to-scheduler mapping
> that is unnecessary for firmware schedulers which manage their own
> run-lists internally.
> 
> Unlike drm_gpu_scheduler, which relies on external locking and lifetime
> management by the driver, drm_dep uses reference counting (kref) on both
> queues and jobs to guarantee object lifetime safety. A job holds a queue

In a domain that has been plagued by lifetime issues, we really should be
enforcing RAII for resource management instead of relying on manual get/put
calls.
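As a sketch of what that buys us (userspace `std` types standing in for the
kernel crate's `Arc`; names are illustrative, not a proposed API):

```rust
use std::sync::Arc;

struct Queue {
    name: String,
}

struct Job {
    // The queue reference is a field of the job, so the job cannot
    // outlive the queue it was created for.
    queue: Arc<Queue>,
}

impl Job {
    fn new(queue: Arc<Queue>) -> Self {
        Job { queue }
    }
}

// No explicit Drop impl is needed: when a `Job` goes out of scope, its
// `Arc<Queue>` field is dropped and the queue refcount decrements
// automatically, on every path out of scope, including error paths.
```

There is no drm_dep_job_put() equivalent to forget, and freeing the job
"directly" while it still holds the queue reference is not expressible.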

> reference from init until its last put, and the queue holds a job reference
> from dispatch until the put_job worker runs. This makes use-after-free
> impossible even when completion arrives from IRQ context or concurrent
> teardown is in flight.

It makes use-after-free impossible _if_ you’re careful. It is not a
property of the type system, and incorrect code will compile just fine.

> 
> The core objects are:
> 
>  struct drm_dep_queue - a per-context submission queue owning an
>    ordered submit workqueue, a TDR timeout workqueue, an SPSC job
>    queue, and a pending-job list. Reference counted; drivers can embed
>    it and provide a .release vfunc for RCU-safe teardown.
> 
>  struct drm_dep_job - a single unit of GPU work. Drivers embed this
>    and provide a .release vfunc. Jobs carry an xarray of input
>    dma_fence dependencies and produce a drm_dep_fence as their
>    finished fence.
> 
>  struct drm_dep_fence - a dma_fence subclass wrapping an optional
>    parent hardware fence. The finished fence is armed (sequence
>    number assigned) before submission and signals when the hardware
>    fence signals (or immediately on synchronous completion).
> 
> Job lifecycle:
>  1. drm_dep_job_init() - allocate and initialise; job acquires a
>     queue reference.
>  2. drm_dep_job_add_dependency() and friends - register input fences;
>     duplicates from the same context are deduplicated.
>  3. drm_dep_job_arm() - assign sequence number, obtain finished fence.
>  4. drm_dep_job_push() - submit to queue.

You cannot easily enforce this sequence in C. Once again, we are trusting
drivers to follow it, whereas in Rust you can simply reject code that does not
follow this order at compile time.
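For example, the typestate pattern makes the ordering a property of the types
themselves. This is a hypothetical userspace sketch, not the proposed binding:

```rust
use std::marker::PhantomData;

// Lifecycle stages as zero-sized marker types.
struct New;
struct Armed;

struct Job<Stage> {
    deps: Vec<u64>, // stand-in for dma_fence dependencies
    seqno: Option<u64>,
    _stage: PhantomData<Stage>,
}

impl Job<New> {
    fn init() -> Self {
        Job { deps: Vec::new(), seqno: None, _stage: PhantomData }
    }

    // Dependencies can only be added before arming.
    fn add_dependency(mut self, fence: u64) -> Self {
        self.deps.push(fence);
        self
    }

    // Arming consumes the `New` job and returns an `Armed` one; the old
    // value can no longer be used, so double-arming is also impossible.
    fn arm(self, seqno: u64) -> Job<Armed> {
        Job { deps: self.deps, seqno: Some(seqno), _stage: PhantomData }
    }
}

impl Job<Armed> {
    // Only an armed job can be pushed.
    fn push(self) -> u64 {
        self.seqno.expect("armed jobs always carry a seqno")
    }
}
```

Here `Job::<New>` has no push() method at all, so pushing before arming fails
to compile instead of tripping a runtime WARN_ON.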


> 
> Submission paths under queue lock:
>  - Bypass path: if DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the
>    SPSC queue is empty, no dependencies are pending, and credits are
>    available, the job is dispatched inline on the calling thread.
>  - Queued path: job is pushed onto the SPSC queue and the run_job
>    worker is kicked. The worker resolves remaining dependencies
>    (installing wakeup callbacks for unresolved fences) before calling
>    ops->run_job().
> 
> Credit-based throttling prevents hardware overflow: each job declares
> a credit cost at init time; dispatch is deferred until sufficient
> credits are available.

Why can’t we design an API where the driver can refuse a job in
ops->run_job() when it lacks the resources to run it? This would do away with
the credit system that has been in place for quite a while. Has this approach
been tried in the past?
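To make the question concrete, a sketch of such an API (illustrative names,
toy resource model, not a design proposal):

```rust
// run_job() may refuse a job, and the dispatcher keeps it queued instead
// of tracking an up-front credit cost.
enum RunResult {
    Submitted,
    NoResources,
}

struct Hw {
    slots: u32, // toy stand-in for whatever the hardware actually tracks
}

fn run_job(hw: &mut Hw) -> RunResult {
    if hw.slots == 0 {
        // Back-pressure reported at the one point where the driver
        // actually knows its resource state.
        return RunResult::NoResources;
    }
    hw.slots -= 1;
    RunResult::Submitted
}

fn dispatch(hw: &mut Hw, mut queued: u32) -> u32 {
    let mut ran = 0;
    while queued > 0 {
        match run_job(hw) {
            RunResult::Submitted => {
                queued -= 1;
                ran += 1;
            }
            // Stop and retry when the driver signals completion later.
            RunResult::NoResources => break,
        }
    }
    ran
}
```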


> 
> Timeout Detection and Recovery (TDR): a per-queue delayed work item
> fires when the head pending job exceeds q->job.timeout jiffies, calling
> ops->timedout_job(). drm_dep_queue_trigger_timeout() forces immediate
> expiry for device teardown.
> 
> IRQ-safe completion: queues flagged DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE
> allow drm_dep_job_done() to be called from hardirq context (e.g. a
> dma_fence callback). Dependency cleanup is deferred to process context
> after ops->run_job() returns to avoid calling xa_destroy() from IRQ.
> 
> Zombie-state guard: workers use kref_get_unless_zero() on entry and
> bail immediately if the queue refcount has already reached zero and
> async teardown is in flight, preventing use-after-free.

In Rust, when you queue work, you have to pass a reference-counted pointer
(Arc<T>), so we simply never have this problem in a Rust design: if there is
work queued, the queue is alive.

By the way, why can’t we simply require synchronous teardowns?
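A sketch of the first point, using `std` threads as a stand-in for the kernel
workqueue bindings:

```rust
use std::sync::Arc;
use std::thread;

struct DepQueue {
    name: &'static str,
}

fn kick_worker(q: Arc<DepQueue>) -> thread::JoinHandle<&'static str> {
    // The Arc clone moved into the closure keeps the queue alive until
    // the worker finishes; a "zombie-state guard" with
    // kref_get_unless_zero() is unnecessary by construction.
    thread::spawn(move || q.name)
}
```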

> 
> Teardown is always deferred to a module-private workqueue (dep_free_wq)
> so that destroy_workqueue() is never called from within one of the
> queue's own workers. Each queue holds a drm_dev_get() reference on its
> owning struct drm_device, released as the final step of teardown via
> drm_dev_put(). This prevents the driver module from being unloaded
> while any queue is still alive without requiring a separate drain API.
> 
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> ---
> drivers/gpu/drm/Kconfig             |    4 +
> drivers/gpu/drm/Makefile            |    1 +
> drivers/gpu/drm/dep/Makefile        |    5 +
> drivers/gpu/drm/dep/drm_dep_fence.c |  406 +++++++
> drivers/gpu/drm/dep/drm_dep_fence.h |   25 +
> drivers/gpu/drm/dep/drm_dep_job.c   |  675 +++++++++++
> drivers/gpu/drm/dep/drm_dep_job.h   |   13 +
> drivers/gpu/drm/dep/drm_dep_queue.c | 1647 +++++++++++++++++++++++++++
> drivers/gpu/drm/dep/drm_dep_queue.h |   31 +
> include/drm/drm_dep.h               |  597 ++++++++++
> 10 files changed, 3404 insertions(+)
> create mode 100644 drivers/gpu/drm/dep/Makefile
> create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.c
> create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.h
> create mode 100644 drivers/gpu/drm/dep/drm_dep_job.c
> create mode 100644 drivers/gpu/drm/dep/drm_dep_job.h
> create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.c
> create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.h
> create mode 100644 include/drm/drm_dep.h
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 5386248e75b6..834f6e210551 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -276,6 +276,10 @@ config DRM_SCHED
> tristate
> depends on DRM
> 
> +config DRM_DEP
> + tristate
> + depends on DRM
> +
> # Separate option as not all DRM drivers use it
> config DRM_PANEL_BACKLIGHT_QUIRKS
> tristate
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index e97faabcd783..1ad87cc0e545 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -173,6 +173,7 @@ obj-y += clients/
> obj-y += display/
> obj-$(CONFIG_DRM_TTM) += ttm/
> obj-$(CONFIG_DRM_SCHED) += scheduler/
> +obj-$(CONFIG_DRM_DEP) += dep/
> obj-$(CONFIG_DRM_RADEON)+= radeon/
> obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
> obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
> diff --git a/drivers/gpu/drm/dep/Makefile b/drivers/gpu/drm/dep/Makefile
> new file mode 100644
> index 000000000000..335f1af46a7b
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +drm_dep-y := drm_dep_queue.o drm_dep_job.o drm_dep_fence.o
> +
> +obj-$(CONFIG_DRM_DEP) += drm_dep.o
> diff --git a/drivers/gpu/drm/dep/drm_dep_fence.c b/drivers/gpu/drm/dep/drm_dep_fence.c
> new file mode 100644
> index 000000000000..ae05b9077772
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_fence.c
> @@ -0,0 +1,406 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +/**
> + * DOC: DRM dependency fence
> + *
> + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> + * provides a single dma_fence (@finished) signalled when the hardware
> + * completes the job.
> + *
> + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> + * is signalled once @parent signals (or immediately if run_job() returns
> + * NULL or an error).

I thought this fence proxy mechanism was going away due to recent work being
carried out by Christian?

> + *
> + * Drivers should expose @finished as the out-fence for GPU work since it is
> + * valid from the moment drm_dep_job_arm() returns, whereas the hardware fence
> + * could be a compound fence, which is disallowed when installed into
> + * drm_syncobjs or dma-resv.
> + *
> + * The fence uses the kernel's inline spinlock (NULL passed to dma_fence_init())
> + * so no separate lock allocation is required.
> + *
> + * Deadline propagation is supported: if a consumer sets a deadline via
> + * dma_fence_set_deadline(), it is forwarded to @parent when @parent is set.
> + * If @parent has not been set yet the deadline is stored in @deadline and
> + * forwarded at that point.
> + *
> + * Memory management: drm_dep_fence objects are allocated with kzalloc() and
> + * freed via kfree_rcu() once the fence is released, ensuring safety with
> + * RCU-protected fence accesses.
> + */
> +
> +#include <linux/slab.h>
> +#include <drm/drm_dep.h>
> +#include "drm_dep_fence.h"
> +
> +/**
> + * DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT - a fence deadline hint has been set
> + *
> + * Set by the deadline callback on the finished fence to indicate a deadline
> + * has been set which may need to be propagated to the parent hardware fence.
> + */
> +#define DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT (DMA_FENCE_FLAG_USER_BITS + 1)
> +
> +/**
> + * struct drm_dep_fence - fence tracking the completion of a dep job
> + *
> + * Contains a single dma_fence (@finished) that is signalled when the
> + * hardware completes the job. The fence uses the kernel's inline_lock
> + * (no external spinlock required).
> + *
> + * This struct is private to the drm_dep module; external code interacts
> + * through the accessor functions declared in drm_dep_fence.h.
> + */
> +struct drm_dep_fence {
> + /**
> + * @finished: signalled when the job completes on hardware.
> + *
> + * Drivers should use this fence as the out-fence for a job since it
> + * is available immediately upon drm_dep_job_arm().
> + */
> + struct dma_fence finished;
> +
> + /**
> + * @deadline: deadline set on @finished which potentially needs to be
> + * propagated to @parent.
> + */
> + ktime_t deadline;
> +
> + /**
> + * @parent: The hardware fence returned by &drm_dep_queue_ops.run_job.
> + *
> + * @finished is signaled once @parent is signaled. The initial store is
> + * performed via smp_store_release to synchronize with deadline handling.
> + *
> + * All readers must access this under the fence lock and take a reference to
> + * it, as @parent is set to NULL under the fence lock when the drm_dep_fence
> + * signals, and this drop also releases its internal reference.
> + */
> + struct dma_fence *parent;
> +
> + /**
> + * @q: the queue this fence belongs to.
> + */
> + struct drm_dep_queue *q;
> +};
> +
> +static const struct dma_fence_ops drm_dep_fence_ops;
> +
> +/**
> + * to_drm_dep_fence() - cast a dma_fence to its enclosing drm_dep_fence
> + * @f: dma_fence to cast
> + *
> + * Context: No context requirements (inline helper).
> + * Return: pointer to the enclosing &drm_dep_fence.
> + */
> +static struct drm_dep_fence *to_drm_dep_fence(struct dma_fence *f)
> +{
> + return container_of(f, struct drm_dep_fence, finished);
> +}
> +
> +/**
> + * drm_dep_fence_set_parent() - store the hardware fence and propagate
> + *   any deadline
> + * @dfence: dep fence
> + * @parent: hardware fence returned by &drm_dep_queue_ops.run_job, or NULL/error
> + *
> + * Stores @parent on @dfence under smp_store_release() so that a concurrent
> + * drm_dep_fence_set_deadline() call sees the parent before checking the
> + * deadline bit. If a deadline has already been set on @dfence->finished it is
> + * forwarded to @parent immediately. Does nothing if @parent is NULL or an
> + * error pointer.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> +      struct dma_fence *parent)
> +{
> + if (IS_ERR_OR_NULL(parent))
> + return;
> +
> + /*
> + * smp_store_release() to ensure a thread racing us in
> + * drm_dep_fence_set_deadline() sees the parent set before
> + * it calls test_bit(HAS_DEADLINE_BIT).
> + */
> + smp_store_release(&dfence->parent, dma_fence_get(parent));
> + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT,
> +     &dfence->finished.flags))
> + dma_fence_set_deadline(parent, dfence->deadline);
> +}
> +
> +/**
> + * drm_dep_fence_finished() - signal the finished fence with a result
> + * @dfence: dep fence to signal
> + * @result: error code to set, or 0 for success
> + *
> + * Sets the fence error to @result if non-zero, then signals
> + * @dfence->finished. Also removes parent visibility under the fence lock
> + * and drops the parent reference. Dropping the parent here allows the
> + * DRM dep fence to be completely decoupled from the DRM dep module.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_fence_finished(struct drm_dep_fence *dfence, int result)
> +{
> + struct dma_fence *parent;
> + unsigned long flags;
> +
> + dma_fence_lock_irqsave(&dfence->finished, flags);
> + if (result)
> + dma_fence_set_error(&dfence->finished, result);
> + dma_fence_signal_locked(&dfence->finished);
> + parent = dfence->parent;
> + dfence->parent = NULL;
> + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> +
> + dma_fence_put(parent);
> +}

We should really try to move away from manual locks and unlocks.
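In Rust the guard pattern does that for us. A userspace sketch of the
drm_dep_fence_finished() logic along those lines (std `Mutex` standing in for
the fence spinlock, `String` for the parent fence; names are illustrative):

```rust
use std::sync::Mutex;

struct Fence {
    signaled: bool,
    error: i32,
    parent: Option<String>, // stand-in for the parent hardware fence
}

fn fence_finished(fence: &Mutex<Fence>, result: i32) -> Option<String> {
    let mut f = fence.lock().expect("lock poisoned");
    if result != 0 {
        f.error = result;
    }
    f.signaled = true;
    // take() clears the parent under the lock, as in the C version,
    // returning it so the caller can drop it outside the critical section.
    f.parent.take()
    // The guard `f` is dropped here, unlocking the mutex automatically on
    // every path out of the function.
}
```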

> +
> +static const char *drm_dep_fence_get_driver_name(struct dma_fence *fence)
> +{
> + return "drm_dep";
> +}
> +
> +static const char *drm_dep_fence_get_timeline_name(struct dma_fence *f)
> +{
> + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> +
> + return dfence->q->name;
> +}
> +
> +/**
> + * drm_dep_fence_get_parent() - get a reference to the parent hardware fence
> + * @dfence: dep fence to query
> + *
> + * Returns a new reference to @dfence->parent, or NULL if the parent has
> + * already been cleared (i.e. @dfence->finished has signalled and the parent
> + * reference was dropped under the fence lock).
> + *
> + * Uses smp_load_acquire() to pair with the smp_store_release() in
> + * drm_dep_fence_set_parent(), ensuring that if we race a concurrent
> + * drm_dep_fence_set_parent() call we observe the parent pointer only after
> + * the store is fully visible — before set_parent() tests
> + * %DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT.
> + *
> + * Caller must hold the fence lock on @dfence->finished.
> + *
> + * Context: Any context, fence lock on @dfence->finished must be held.
> + * Return: a new reference to the parent fence, or NULL.
> + */
> +static struct dma_fence *drm_dep_fence_get_parent(struct drm_dep_fence *dfence)
> +{
> + dma_fence_assert_held(&dfence->finished);

> +
> + return dma_fence_get(smp_load_acquire(&dfence->parent));
> +}
> +
> +/**
> + * drm_dep_fence_set_deadline() - dma_fence_ops deadline callback
> + * @f: fence on which the deadline is being set
> + * @deadline: the deadline hint to apply
> + *
> + * Stores the earliest deadline under the fence lock, then propagates
> + * it to the parent hardware fence via smp_load_acquire() to race
> + * safely with drm_dep_fence_set_parent().
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
> +{
> + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> + struct dma_fence *parent;
> + unsigned long flags;
> +
> + dma_fence_lock_irqsave(f, flags);
> +
> + /* If we already have an earlier deadline, keep it: */
> + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> +    ktime_before(dfence->deadline, deadline)) {
> + dma_fence_unlock_irqrestore(f, flags);
> + return;
> + }
> +
> + dfence->deadline = deadline;
> + set_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
> +
> + parent = drm_dep_fence_get_parent(dfence);
> + dma_fence_unlock_irqrestore(f, flags);
> +
> + if (parent)
> + dma_fence_set_deadline(parent, deadline);
> +
> + dma_fence_put(parent);
> +}
> +
> +static const struct dma_fence_ops drm_dep_fence_ops = {
> + .get_driver_name = drm_dep_fence_get_driver_name,
> + .get_timeline_name = drm_dep_fence_get_timeline_name,
> + .set_deadline = drm_dep_fence_set_deadline,
> +};
> +
> +/**
> + * drm_dep_fence_alloc() - allocate a dep fence
> + *
> + * Allocates a &drm_dep_fence with kzalloc() without initialising the
> + * dma_fence. Call drm_dep_fence_init() to fully initialise it.
> + *
> + * Context: Process context.
> + * Return: new &drm_dep_fence on success, NULL on allocation failure.
> + */
> +struct drm_dep_fence *drm_dep_fence_alloc(void)
> +{
> + return kzalloc_obj(struct drm_dep_fence);
> +}
> +
> +/**
> + * drm_dep_fence_init() - initialise the dma_fence inside a dep fence
> + * @dfence: dep fence to initialise
> + * @q: queue the owning job belongs to
> + *
> + * Initialises @dfence->finished using the context and sequence number from @q.
> + * Passes NULL as the lock so the fence uses its inline spinlock.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q)
> +{
> + u32 seq = ++q->fence.seqno;
> +
> + /*
> + * XXX: Inline fence hazard: currently all expected users of DRM dep
> + * hardware fences have a unique lockdep class. If that ever changes,
> + * we will need to assign a unique lockdep class here so lockdep knows
> + * this fence is allowed to nest with driver hardware fences.
> + */
> +
> + dfence->q = q;
> + dma_fence_init(&dfence->finished, &drm_dep_fence_ops,
> +       NULL, q->fence.context, seq);
> +}
> +
> +/**
> + * drm_dep_fence_cleanup() - release a dep fence at job teardown
> + * @dfence: dep fence to clean up
> + *
> + * Called from drm_dep_job_fini(). If the dep fence was armed (refcount > 0)
> + * it is released via dma_fence_put() and will be freed by the RCU release
> + * callback once all waiters have dropped their references. If it was never
> + * armed it is freed directly with kfree().
> + *
> + * Context: Any context.
> + */
> +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence)
> +{
> + if (drm_dep_fence_is_armed(dfence))
> + dma_fence_put(&dfence->finished);
> + else
> + kfree(dfence);
> +}
> +
> +/**
> + * drm_dep_fence_is_armed() - check whether the fence has been armed
> + * @dfence: dep fence to check
> + *
> + * Returns true if drm_dep_job_arm() has been called, i.e. @dfence->finished
> + * has been initialised and its reference count is non-zero.  Used by
> + * assertions to enforce correct job lifecycle ordering (arm before push,
> + * add_dependency before arm).
> + *
> + * Context: Any context.
> + * Return: true if the fence is armed, false otherwise.
> + */
> +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence)
> +{
> + return !!kref_read(&dfence->finished.refcount);
> +}

> +
> +/**
> + * drm_dep_fence_is_finished() - test whether the finished fence has signalled
> + * @dfence: dep fence to check
> + *
> + * Uses dma_fence_test_signaled_flag() to read %DMA_FENCE_FLAG_SIGNALED_BIT
> + * directly without invoking the fence's ->signaled() callback or triggering
> + * any signalling side-effects.
> + *
> + * Context: Any context.
> + * Return: true if @dfence->finished has been signalled, false otherwise.
> + */
> +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence)
> +{
> + return dma_fence_test_signaled_flag(&dfence->finished);
> +}
> +
> +/**
> + * drm_dep_fence_is_complete() - test whether the job has completed
> + * @dfence: dep fence to check
> + *
> + * Takes the fence lock on @dfence->finished and calls
> + * drm_dep_fence_get_parent() to safely obtain a reference to the parent
> + * hardware fence — or NULL if the parent has already been cleared after
> + * signalling.  Calls dma_fence_is_signaled() on @parent outside the lock,
> + * which may invoke the fence's ->signaled() callback and trigger signalling
> + * side-effects if the fence has completed but the signalled flag has not yet
> + * been set.  The finished fence is tested via dma_fence_test_signaled_flag(),
> + * without side-effects.
> + *
> + * May only be called on a stopped queue (see drm_dep_queue_is_stopped()).
> + *
> + * Context: Process context. The queue must be stopped before calling this.
> + * Return: true if the job is complete, false otherwise.
> + */
> +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence)
> +{
> + struct dma_fence *parent;
> + unsigned long flags;
> + bool complete;
> +
> + dma_fence_lock_irqsave(&dfence->finished, flags);
> + parent = drm_dep_fence_get_parent(dfence);
> + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> +
> + complete = (parent && dma_fence_is_signaled(parent)) ||
> + dma_fence_test_signaled_flag(&dfence->finished);
> +
> + dma_fence_put(parent);
> +
> + return complete;
> +}
> +
> +/**
> + * drm_dep_fence_to_dma() - return the finished dma_fence for a dep fence
> + * @dfence: dep fence to query
> + *
> + * No reference is taken; the caller must hold its own reference to the owning
> + * &drm_dep_job for the duration of the access.
> + *
> + * Context: Any context.
> + * Return: the finished &dma_fence.
> + */
> +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence)
> +{
> + return &dfence->finished;
> +}
> +
> +/**
> + * drm_dep_fence_done() - signal the finished fence on job completion
> + * @dfence: dep fence to signal
> + * @result: job error code, or 0 on success
> + *
> + * Gets a temporary reference to @dfence->finished to guard against a racing
> + * last-put, signals the fence with @result, then drops the temporary
> + * reference. Called from drm_dep_job_done() in the queue core when a
> + * hardware completion callback fires or when run_job() returns immediately.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result)
> +{
> + dma_fence_get(&dfence->finished);
> + drm_dep_fence_finished(dfence, result);
> + dma_fence_put(&dfence->finished);
> +}

Proper refcounting is automated (and enforced) in Rust.
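For instance, the temporary get/put pair above becomes a scoped clone that the
compiler balances for us (userspace sketch):

```rust
use std::sync::Arc;

struct FinishedFence {
    seqno: u64,
}

fn fence_done(fence: &Arc<FinishedFence>) -> u64 {
    // The extra reference guarding against a racing last-put is a scoped
    // clone: it is released when `guard` drops. Forgetting the put is
    // impossible, and a double put is a use-after-move the compiler rejects.
    let guard = Arc::clone(fence);
    guard.seqno
}
```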

> diff --git a/drivers/gpu/drm/dep/drm_dep_fence.h b/drivers/gpu/drm/dep/drm_dep_fence.h
> new file mode 100644
> index 000000000000..65a1582f858b
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_fence.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _DRM_DEP_FENCE_H_
> +#define _DRM_DEP_FENCE_H_
> +
> +#include <linux/dma-fence.h>
> +
> +struct drm_dep_fence;
> +struct drm_dep_queue;
> +
> +struct drm_dep_fence *drm_dep_fence_alloc(void);
> +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q);
> +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence);
> +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> +      struct dma_fence *parent);
> +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result);
> +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence);
> +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence);
> +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence);
> +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence);
> +
> +#endif /* _DRM_DEP_FENCE_H_ */
> diff --git a/drivers/gpu/drm/dep/drm_dep_job.c b/drivers/gpu/drm/dep/drm_dep_job.c
> new file mode 100644
> index 000000000000..2d012b29a5fc
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_job.c
> @@ -0,0 +1,675 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +/**
> + * DOC: DRM dependency job
> + *
> + * A struct drm_dep_job represents a single unit of GPU work associated with
> + * a struct drm_dep_queue. The lifecycle of a job is:
> + *
> + * 1. **Allocation**: the driver allocates memory for the job (typically by
> + *    embedding struct drm_dep_job in a larger structure) and calls
> + *    drm_dep_job_init() to initialise it. On success the job holds one
> + *    kref reference and a reference to its queue.
> + *
> + * 2. **Dependency collection**: the driver calls drm_dep_job_add_dependency(),
> + *    drm_dep_job_add_syncobj_dependency(), drm_dep_job_add_resv_dependencies(),
> + *    or drm_dep_job_add_implicit_dependencies() to register dma_fence objects
> + *    that must be signalled before the job can run. Duplicate fences from the
> + *    same fence context are deduplicated automatically.
> + *
> + * 3. **Arming**: drm_dep_job_arm() initialises the job's finished fence,
> + *    consuming a sequence number from the queue. After arming,
> + *    drm_dep_job_finished_fence() returns a valid fence that may be passed to
> + *    userspace or used as a dependency by other jobs.
> + *
> + * 4. **Submission**: drm_dep_job_push() submits the job to the queue. The
> + *    queue takes a reference that it holds until the job's finished fence
> + *    signals and the job is freed by the put_job worker.
> + *
> + * 5. **Completion**: when the job's hardware work finishes its finished fence
> + *    is signalled and drm_dep_job_put() is called by the queue. The driver
> + *    must release any driver-private resources in &drm_dep_job_ops.release.
> + *
> + * Reference counting uses drm_dep_job_get() / drm_dep_job_put(). The
> + * internal drm_dep_job_fini() tears down the dependency xarray and fence
> + * objects before the driver's release callback is invoked.
> + */
> +
> +#include <linux/dma-resv.h>
> +#include <linux/kref.h>
> +#include <linux/slab.h>
> +#include <drm/drm_dep.h>
> +#include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_syncobj.h>
> +#include "drm_dep_fence.h"
> +#include "drm_dep_job.h"
> +#include "drm_dep_queue.h"
> +
> +/**
> + * drm_dep_job_init() - initialise a dep job
> + * @job: dep job to initialise
> + * @args: initialisation arguments
> + *
> + * Initialises @job with the queue, ops and credit count from @args.  Acquires
> + * a reference to @args->q via drm_dep_queue_get(); this reference is held for
> + * the lifetime of the job and released by drm_dep_job_release() when the last
> + * job reference is dropped.
> + *
> + * Resources are released automatically when the last reference is dropped
> + * via drm_dep_job_put(), which must be called to release the job; drivers
> + * must not free the job directly.

Again, can’t enforce that in C.

> + *
> + * Context: Process context. Allocates memory with GFP_KERNEL.
> + * Return: 0 on success, -%EINVAL if credits is 0,
> + *   -%ENOMEM on fence allocation failure.
> + */
> +int drm_dep_job_init(struct drm_dep_job *job,
> +     const struct drm_dep_job_init_args *args)
> +{
> + if (unlikely(!args->credits)) {
> + pr_err("drm_dep: %s: credits cannot be 0\n", __func__);
> + return -EINVAL;
> + }
> +
> + memset(job, 0, sizeof(*job));
> +
> + job->dfence = drm_dep_fence_alloc();
> + if (!job->dfence)
> + return -ENOMEM;
> +
> + job->ops = args->ops;
> + job->q = drm_dep_queue_get(args->q);
> + job->credits = args->credits;
> +
> + kref_init(&job->refcount);
> + xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> + INIT_LIST_HEAD(&job->pending_link);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_dep_job_init);
> +
> +/**
> + * drm_dep_job_drop_dependencies() - release all input dependency fences
> + * @job: dep job whose dependency xarray to drain
> + *
> + * Walks @job->dependencies, puts each fence, and destroys the xarray.
> + * Any slots still holding a %DRM_DEP_JOB_FENCE_PREALLOC sentinel —
> + * i.e. slots that were pre-allocated but never replaced — are silently
> + * skipped; the sentinel carries no reference.  Called from
> + * drm_dep_queue_run_job() in process context immediately after
> + * @ops->run_job() returns, before the final drm_dep_job_put().  Releasing
> + * dependencies here — while still in process context — avoids calling
> + * xa_destroy() from IRQ context if the job's last reference is later
> + * dropped from a dma_fence callback.
> + *
> + * Context: Process context.
> + */
> +void drm_dep_job_drop_dependencies(struct drm_dep_job *job)
> +{
> + struct dma_fence *fence;
> + unsigned long index;
> +
> + xa_for_each(&job->dependencies, index, fence) {
> + if (unlikely(fence == DRM_DEP_JOB_FENCE_PREALLOC))
> + continue;
> + dma_fence_put(fence);
> + }
> + xa_destroy(&job->dependencies);
> +}

This is automated in Rust. You also can’t “forget” to call this.
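A sketch of what the dependency container could look like, with context-based
deduplication and cleanup both handled by the type (illustrative userspace
`std`, not the proposed binding):

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct DepFence {
    ctx: u64,
    seqno: u64,
}

// Dependencies keyed by fence context, mirroring the xarray.
#[derive(Default)]
struct Deps(HashMap<u64, Arc<DepFence>>);

impl Deps {
    fn add(&mut self, fence: Arc<DepFence>) {
        // Keep only the most recent fence per context.
        let replace = match self.0.get(&fence.ctx) {
            Some(old) => old.seqno < fence.seqno,
            None => true,
        };
        if replace {
            self.0.insert(fence.ctx, fence);
        }
    }
}
// Dropping `Deps` drops every Arc it holds; there is no xa_destroy()
// equivalent to call in the wrong context or to forget entirely.
```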

> +
> +/**
> + * drm_dep_job_fini() - clean up a dep job
> + * @job: dep job to clean up
> + *
> + * Cleans up the dep fence and drops the queue reference held by @job.
> + *
> + * If the job was never armed (e.g. init failed before drm_dep_job_arm()),
> + * the dependency xarray is also released here.  For armed jobs the xarray
> + * has already been drained by drm_dep_job_drop_dependencies() in process
> + * context immediately after run_job(), so it is left untouched to avoid
> + * calling xa_destroy() from IRQ context.
> + *
> + * Warns if @job is still linked on the queue's pending list, which would
> + * indicate a bug in the teardown ordering.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_job_fini(struct drm_dep_job *job)
> +{
> + bool armed = drm_dep_fence_is_armed(job->dfence);
> +
> + WARN_ON(!list_empty(&job->pending_link));
> +
> + drm_dep_fence_cleanup(job->dfence);
> + job->dfence = NULL;
> +
> + /*
> + * Armed jobs have their dependencies drained by
> + * drm_dep_job_drop_dependencies() in process context after run_job().
> + * Skip here to avoid calling xa_destroy() from IRQ context.
> + */
> + if (!armed)
> + drm_dep_job_drop_dependencies(job);
> +}

Same here.

> +
> +/**
> + * drm_dep_job_get() - acquire a reference to a dep job
> + * @job: dep job to acquire a reference on, or NULL
> + *
> + * Context: Any context.
> + * Return: @job with an additional reference held, or NULL if @job is NULL.
> + */
> +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job)
> +{
> + if (job)
> + kref_get(&job->refcount);
> + return job;
> +}
> +EXPORT_SYMBOL(drm_dep_job_get);
> +

Same here.

> +/**
> + * drm_dep_job_release() - kref release callback for a dep job
> + * @kref: kref embedded in the dep job
> + *
> + * Calls drm_dep_job_fini(), then invokes &drm_dep_job_ops.release if set,
> + * otherwise frees @job with kfree().  Finally, releases the queue reference
> + * that was acquired by drm_dep_job_init() via drm_dep_queue_put().  The
> + * queue put is performed last to ensure no queue state is accessed after
> + * the job memory is freed.
> + *
> + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> + *   job's queue; otherwise process context only, as the release callback may
> + *   sleep.
> + */
> +static void drm_dep_job_release(struct kref *kref)
> +{
> + struct drm_dep_job *job =
> + container_of(kref, struct drm_dep_job, refcount);
> + struct drm_dep_queue *q = job->q;
> +
> + drm_dep_job_fini(job);
> +
> + if (job->ops && job->ops->release)
> + job->ops->release(job);
> + else
> + kfree(job);
> +
> + drm_dep_queue_put(q);
> +}

Same here.

> +
> +/**
> + * drm_dep_job_put() - release a reference to a dep job
> + * @job: dep job to release a reference on, or NULL
> + *
> + * When the last reference is dropped, calls &drm_dep_job_ops.release if set,
> + * otherwise frees @job with kfree(). Does nothing if @job is NULL.
> + *
> + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> + *   job's queue; otherwise process context only, as the release callback may
> + *   sleep.
> + */
> +void drm_dep_job_put(struct drm_dep_job *job)
> +{
> + if (job)
> + kref_put(&job->refcount, drm_dep_job_release);
> +}
> +EXPORT_SYMBOL(drm_dep_job_put);
> +

Same here.
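To make the "same here" concrete: the whole get/put/release triple collapses into Arc in Rust. A toy sketch (hypothetical Job type, not the kernel crate) — cloning the Arc is kref_get(), dropping a clone is kref_put(), and Drop is the release callback, guaranteed to run exactly once on the final put:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical job payload. Drop plays the role of
/// drm_dep_job_release() and runs exactly once, when the last
/// Arc clone is dropped — no manual kref bookkeeping, no way to
/// leak a reference or double-free.
pub struct Job {
    pub released: Arc<AtomicBool>,
}

impl Drop for Job {
    fn drop(&mut self) {
        // Mirrors the release callback: only reached on the final put.
        self.released.store(true, Ordering::SeqCst);
    }
}
```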

> +/**
> + * drm_dep_job_arm() - arm a dep job for submission
> + * @job: dep job to arm
> + *
> + * Initialises the finished fence on @job->dfence, assigning
> + * it a sequence number from the job's queue. Must be called after
> + * drm_dep_job_init() and before drm_dep_job_push(). Once armed,
> + * drm_dep_job_finished_fence() returns a valid fence that may be passed to
> + * userspace or used as a dependency by other jobs.
> + *
> + * Begins the DMA fence signalling path via dma_fence_begin_signalling().
> + * After this point, memory allocations that could trigger reclaim are
> + * forbidden; lockdep enforces this. arm() must always be paired with
> + * drm_dep_job_push(); lockdep also enforces this pairing.
> + *
> + * Warns if the job has already been armed.
> + *
> + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> + *   path.
> + */
> +void drm_dep_job_arm(struct drm_dep_job *job)
> +{
> + drm_dep_queue_push_job_begin(job->q);
> + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> + drm_dep_fence_init(job->dfence, job->q);
> + job->signalling_cookie = dma_fence_begin_signalling();
> +}
> +EXPORT_SYMBOL(drm_dep_job_arm);
> +
> +/**
> + * drm_dep_job_push() - submit a job to its queue for execution
> + * @job: dep job to push
> + *
> + * Submits @job to the queue it was initialised with. Must be called after
> + * drm_dep_job_arm(). Acquires a reference on @job on behalf of the queue,
> + * held until the queue is fully done with it. The reference is released
> + * directly in the finished-fence dma_fence callback for queues with
> + * %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE (where drm_dep_job_done() may run
> + * from hardirq context), or via the put_job work item on the submit
> + * workqueue otherwise.
> + *
> + * Ends the DMA fence signalling path begun by drm_dep_job_arm() via
> + * dma_fence_end_signalling(). This must be paired with arm(); lockdep
> + * enforces the pairing.
> + *
> + * Once pushed, &drm_dep_queue_ops.run_job is guaranteed to be called for
> + * @job exactly once, even if the queue is killed or torn down before the
> + * job reaches the head of the queue. Drivers can use this guarantee to
> + * perform bookkeeping cleanup; the actual backend operation should be
> + * skipped when drm_dep_queue_is_killed() returns true.
> + *
> + * If the queue does not support the bypass path, the job is pushed directly
> + * onto the SPSC submission queue via drm_dep_queue_push_job() without holding
> + * @q->sched.lock. Otherwise, @q->sched.lock is taken and the job is either
> + * run immediately via drm_dep_queue_run_job() if it qualifies for bypass, or
> + * enqueued via drm_dep_queue_push_job() for dispatch by the run_job work item.
> + *
> + * Warns if the job has not been armed.
> + *
> + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> + *   path.
> + */
> +void drm_dep_job_push(struct drm_dep_job *job)
> +{
> + struct drm_dep_queue *q = job->q;
> +
> + WARN_ON(!drm_dep_fence_is_armed(job->dfence));
> +
> + drm_dep_job_get(job);
> +
> + if (!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED)) {
> + drm_dep_queue_push_job(q, job);
> + dma_fence_end_signalling(job->signalling_cookie);

Signaling is enforced in a more thorough way in Rust. I’ll expand on this later in this patch.

> + drm_dep_queue_push_job_end(job->q);
> + return;
> + }
> +
> + scoped_guard(mutex, &q->sched.lock) {
> + if (drm_dep_queue_can_job_bypass(q, job))
> + drm_dep_queue_run_job(q, job);
> + else
> + drm_dep_queue_push_job(q, job);
> + }
> +
> + dma_fence_end_signalling(job->signalling_cookie);
> + drm_dep_queue_push_job_end(job->q);
> +}
> +EXPORT_SYMBOL(drm_dep_job_push);
> +
> +/**
> + * drm_dep_job_add_dependency() - adds the fence as a job dependency
> + * @job: dep job to add the dependencies to
> + * @fence: the dma_fence to add to the list of dependencies, or
> + *         %DRM_DEP_JOB_FENCE_PREALLOC to reserve a slot for later.
> + *
> + * Note that @fence is consumed in both the success and error cases (except
> + * when @fence is %DRM_DEP_JOB_FENCE_PREALLOC, which carries no reference).
> + *
> + * Signalled fences and fences belonging to the same queue as @job (i.e. where
> + * fence->context matches the queue's finished fence context) are silently
> + * dropped; the job need not wait on its own queue's output.
> + *
> + * Warns if the job has already been armed (dependencies must be added before
> + * drm_dep_job_arm()).
> + *
> + * **Pre-allocation pattern**
> + *
> + * When multiple jobs across different queues must be prepared and submitted
> + * together in a single atomic commit — for example, where job A's finished
> + * fence is an input dependency of job B — all jobs must be armed and pushed
> + * within a single dma_fence_begin_signalling() / dma_fence_end_signalling()
> + * region.  Once that region has started no memory allocation is permitted.
> + *
> + * To handle this, pass %DRM_DEP_JOB_FENCE_PREALLOC during the preparation
> + * phase (before arming any job, while GFP_KERNEL allocation is still allowed)
> + * to pre-allocate a slot in @job->dependencies.  The slot index assigned by
> + * the underlying xarray must be tracked by the caller separately (e.g. it is
> + * always index 0 when the dependency array is empty, a property Xe relies on).
> + * After all jobs have been armed and the finished fences are available, call
> + * drm_dep_job_replace_dependency() with that index and the real fence.
> + * drm_dep_job_replace_dependency() uses GFP_NOWAIT internally and may be
> + * called from atomic or signalling context.
> + *
> + * The sentinel slot is never skipped by the signalled-fence fast-path,
> + * ensuring a slot is always allocated even when the real fence is not yet
> + * known.
> + *
> + * **Example: bind job feeding TLB invalidation jobs**
> + *
> + * Consider a GPU with separate queues for page-table bind operations and for
> + * TLB invalidation.  A single atomic commit must:
> + *
> + *  1. Run a bind job that modifies page tables.
> + *  2. Run one TLB-invalidation job per MMU that depends on the bind
> + *     completing, so stale translations are flushed before the engines
> + *     continue.
> + *
> + * Because all jobs must be armed and pushed inside a signalling region (where
> + * GFP_KERNEL is forbidden), pre-allocate slots before entering the region::
> + *
> + *   // Phase 1 — process context, GFP_KERNEL allowed
> + *   drm_dep_job_init(bind_job, bind_queue, ops);
> + *   for_each_mmu(mmu) {
> + *       drm_dep_job_init(tlb_job[mmu], tlb_queue[mmu], ops);
> + *       // Pre-allocate slot at index 0; real fence not available yet
> + *       drm_dep_job_add_dependency(tlb_job[mmu], DRM_DEP_JOB_FENCE_PREALLOC);
> + *   }
> + *
> + *   // Phase 2 — inside signalling region, no GFP_KERNEL
> + *   dma_fence_begin_signalling();
> + *   drm_dep_job_arm(bind_job);
> + *   for_each_mmu(mmu) {
> + *       // Swap sentinel for bind job's finished fence
> + *       drm_dep_job_replace_dependency(tlb_job[mmu], 0,
> + *                                      dma_fence_get(bind_job->finished));
> + *       drm_dep_job_arm(tlb_job[mmu]);
> + *   }
> + *   drm_dep_job_push(bind_job);
> + *   for_each_mmu(mmu)
> + *       drm_dep_job_push(tlb_job[mmu]);
> + *   dma_fence_end_signalling();
> + *
> + * Context: Process context. May allocate memory with GFP_KERNEL.
> + * Return: If @fence is %DRM_DEP_JOB_FENCE_PREALLOC, the index of the
> + *   allocated slot on success; otherwise 0 on success. Negative error
> + *   code on failure.
> + */

> +int drm_dep_job_add_dependency(struct drm_dep_job *job, struct dma_fence *fence)
> +{
> + struct drm_dep_queue *q = job->q;
> + struct dma_fence *entry;
> + unsigned long index;
> + u32 id = 0;
> + int ret;
> +
> + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> + might_alloc(GFP_KERNEL);
> +
> + if (!fence)
> + return 0;
> +
> + if (fence == DRM_DEP_JOB_FENCE_PREALLOC)
> + goto add_fence;
> +
> + /*
> + * Ignore signalled fences or fences from our own queue — finished
> + * fences use q->fence.context.
> + */
> + if (dma_fence_test_signaled_flag(fence) ||
> +    fence->context == q->fence.context) {
> + dma_fence_put(fence);
> + return 0;
> + }
> +
> + /* Deduplicate if we already depend on a fence from the same context.
> + * This lets the size of the array of deps scale with the number of
> + * engines involved, rather than the number of BOs.
> + */
> + xa_for_each(&job->dependencies, index, entry) {
> + if (entry == DRM_DEP_JOB_FENCE_PREALLOC ||
> +    entry->context != fence->context)
> + continue;
> +
> + if (dma_fence_is_later(fence, entry)) {
> + dma_fence_put(entry);
> + xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> + } else {
> + dma_fence_put(fence);
> + }
> + return 0;
> + }
> +
> +add_fence:
> + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> +       GFP_KERNEL);
> + if (ret != 0) {
> + if (fence != DRM_DEP_JOB_FENCE_PREALLOC)
> + dma_fence_put(fence);
> + return ret;
> + }
> +
> + return (fence == DRM_DEP_JOB_FENCE_PREALLOC) ? id : 0;
> +}
> +EXPORT_SYMBOL(drm_dep_job_add_dependency);
> +
> +/**
> + * drm_dep_job_replace_dependency() - replace a pre-allocated dependency slot
> + * @job: dep job to update
> + * @index: xarray index of the slot to replace, as returned when the sentinel
> + *         was originally inserted via drm_dep_job_add_dependency()
> + * @fence: the real dma_fence to store; its reference is always consumed
> + *
> + * Replaces the %DRM_DEP_JOB_FENCE_PREALLOC sentinel at @index in
> + * @job->dependencies with @fence.  The slot must have been pre-allocated by
> + * passing %DRM_DEP_JOB_FENCE_PREALLOC to drm_dep_job_add_dependency(); the
> + * existing entry is asserted to be the sentinel.
> + *
> + * This is the second half of the pre-allocation pattern described in
> + * drm_dep_job_add_dependency().  It is intended to be called inside a
> + * dma_fence_begin_signalling() / dma_fence_end_signalling() region where
> + * memory allocation with GFP_KERNEL is forbidden.  It uses GFP_NOWAIT
> + * internally so it is safe to call from atomic or signalling context, but
> + * since the slot has been pre-allocated no actual memory allocation occurs.
> + *
> + * If @fence is already signalled the slot is erased rather than storing a
> + * redundant dependency.  The successful store is asserted — if the store
> + * fails it indicates a programming error (slot index out of range or
> + * concurrent modification).
> + *
> + * Must be called before drm_dep_job_arm(). @fence is consumed in all cases.

Can’t enforce this in C. Also, how is the fence “consumed”? You can’t enforce
that the user no longer accesses the fence after this function returns, like we
can at compile time in Rust.
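What I mean by compile-time enforcement, as a userspace sketch (toy Fence/Job types, not the kernel crate): taking the fence by value moves ownership into the job, and any later use of it at the call site is rejected by the compiler — "consumed" stops being a doc-comment convention and becomes part of the signature.

```rust
/// Hypothetical fence type.
pub struct Fence {
    pub seqno: u64,
}

pub struct Job {
    pub dependencies: Vec<Fence>,
}

impl Job {
    /// Takes `fence` by value: ownership moves into the job. The
    /// caller's binding is dead after this call, and touching it is a
    /// compile error — the C comment "@fence is consumed in all cases"
    /// becomes something the compiler checks.
    pub fn add_dependency(&mut self, fence: Fence) {
        self.dependencies.push(fence);
    }
}
```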

> + *
> + * Context: Any context. DMA fence signaling path.
> + */
> +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> +    struct dma_fence *fence)
> +{
> + WARN_ON(xa_load(&job->dependencies, index) !=
> + DRM_DEP_JOB_FENCE_PREALLOC);
> +
> + if (dma_fence_test_signaled_flag(fence)) {
> + xa_erase(&job->dependencies, index);
> + dma_fence_put(fence);
> + return;
> + }
> +
> + if (WARN_ON(xa_is_err(xa_store(&job->dependencies, index, fence,
> +       GFP_NOWAIT)))) {
> + dma_fence_put(fence);
> + return;
> + }
> +}
> +EXPORT_SYMBOL(drm_dep_job_replace_dependency);
> +
> +/**
> + * drm_dep_job_add_syncobj_dependency() - adds a syncobj's fence as a
> + *   job dependency
> + * @job: dep job to add the dependencies to
> + * @file: drm file private pointer
> + * @handle: syncobj handle to lookup
> + * @point: timeline point
> + *
> + * This adds the fence matching the given syncobj to @job.
> + *
> + * Context: Process context.
> + * Return: 0 on success, or a negative error code.
> + */
> +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> +       struct drm_file *file, u32 handle,
> +       u32 point)
> +{
> + struct dma_fence *fence;
> + int ret;
> +
> + ret = drm_syncobj_find_fence(file, handle, point, 0, &fence);
> + if (ret)
> + return ret;
> +
> + return drm_dep_job_add_dependency(job, fence);
> +}
> +EXPORT_SYMBOL(drm_dep_job_add_syncobj_dependency);
> +
> +/**
> + * drm_dep_job_add_resv_dependencies() - add all fences from the resv to the job
> + * @job: dep job to add the dependencies to
> + * @resv: the dma_resv object to get the fences from
> + * @usage: the dma_resv_usage to use to filter the fences
> + *
> + * This adds all fences matching the given usage from @resv to @job.
> + * Must be called with the @resv lock held.
> + *
> + * Context: Process context.
> + * Return: 0 on success, or a negative error code.
> + */
> +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> +      struct dma_resv *resv,
> +      enum dma_resv_usage usage)
> +{
> + struct dma_resv_iter cursor;
> + struct dma_fence *fence;
> + int ret;
> +
> + dma_resv_assert_held(resv);
> +
> + dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> + /*
> + * As drm_dep_job_add_dependency always consumes the fence
> + * reference (even when it fails), and dma_resv_for_each_fence
> + * is not obtaining one, we need to grab one before calling.
> + */
> + ret = drm_dep_job_add_dependency(job, dma_fence_get(fence));
> + if (ret)
> + return ret;
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_dep_job_add_resv_dependencies);
> +
> +/**
> + * drm_dep_job_add_implicit_dependencies() - adds implicit dependencies
> + *   as job dependencies
> + * @job: dep job to add the dependencies to
> + * @obj: the gem object to add new dependencies from.
> + * @write: whether the job might write the object (so we need to depend on
> + * shared fences in the reservation object).
> + *
> + * This should be called after drm_gem_lock_reservations() on your array of
> + * GEM objects used in the job but before updating the reservations with your
> + * own fences.
> + *
> + * Context: Process context.
> + * Return: 0 on success, or a negative error code.
> + */
> +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> +  struct drm_gem_object *obj,
> +  bool write)
> +{
> + return drm_dep_job_add_resv_dependencies(job, obj->resv,
> + dma_resv_usage_rw(write));
> +}
> +EXPORT_SYMBOL(drm_dep_job_add_implicit_dependencies);
> +
> +/**
> + * drm_dep_job_is_signaled() - check whether a dep job has completed
> + * @job: dep job to check
> + *
> + * Determines whether @job has signalled. The queue should be stopped before
> + * calling this to obtain a stable snapshot of state. Both the parent hardware
> + * fence and the finished software fence are checked.
> + *
> + * Context: Process context. The queue must be stopped before calling this.
> + * Return: true if the job is signalled, false otherwise.
> + */
> +bool drm_dep_job_is_signaled(struct drm_dep_job *job)
> +{
> + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> + return drm_dep_fence_is_complete(job->dfence);
> +}
> +EXPORT_SYMBOL(drm_dep_job_is_signaled);
> +
> +/**
> + * drm_dep_job_is_finished() - test whether a dep job's finished fence has signalled
> + * @job: dep job to check
> + *
> + * Tests whether the job's software finished fence has been signalled, using
> + * dma_fence_test_signaled_flag() to avoid any signalling side-effects. Unlike
> + * drm_dep_job_is_signaled(), this does not require the queue to be stopped and
> + * does not check the parent hardware fence — it is a lightweight test of the
> + * finished fence only.
> + *
> + * Context: Any context.
> + * Return: true if the job's finished fence has been signalled, false otherwise.
> + */
> +bool drm_dep_job_is_finished(struct drm_dep_job *job)
> +{
> + return drm_dep_fence_is_finished(job->dfence);
> +}
> +EXPORT_SYMBOL(drm_dep_job_is_finished);
> +
> +/**
> + * drm_dep_job_invalidate_job() - increment the invalidation count for a job
> + * @job: dep job to invalidate
> + * @threshold: threshold above which the job is considered invalidated
> + *
> + * Increments @job->invalidate_count and returns true if it exceeds @threshold,
> + * indicating the job should be considered hung and discarded. The queue must
> + * be stopped before calling this function.
> + *
> + * Context: Process context. The queue must be stopped before calling this.
> + * Return: true if @job->invalidate_count exceeds @threshold, false otherwise.
> + */
> +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold)
> +{
> + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> + return ++job->invalidate_count > threshold;
> +}
> +EXPORT_SYMBOL(drm_dep_job_invalidate_job);
> +
> +/**
> + * drm_dep_job_finished_fence() - return the finished fence for a job
> + * @job: dep job to query
> + *
> + * No reference is taken on the returned fence; the caller must hold its own
> + * reference to @job for the duration of any access.

Can’t enforce this in C.
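In Rust, "no reference is taken; the caller must keep the job alive" is exactly what a borrow expresses. A toy sketch (hypothetical types, not the kernel crate): the returned fence borrows from the job, so the borrow checker rejects any use of the fence after the job is gone.

```rust
pub struct Fence {
    pub seqno: u64,
}

pub struct Job {
    finished: Fence,
}

impl Job {
    pub fn new(seqno: u64) -> Self {
        Job { finished: Fence { seqno } }
    }

    /// Returns a borrow tied to &self. The compiler guarantees the
    /// job outlives every use of the returned fence — the C doc
    /// comment "caller must hold its own reference to @job for the
    /// duration of any access" is simply the lifetime in the signature.
    pub fn finished_fence(&self) -> &Fence {
        &self.finished
    }
}
```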

> + *
> + * Context: Any context.
> + * Return: the finished &dma_fence for @job.
> + */
> +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job)
> +{
> + return drm_dep_fence_to_dma(job->dfence);
> +}
> +EXPORT_SYMBOL(drm_dep_job_finished_fence);
> diff --git a/drivers/gpu/drm/dep/drm_dep_job.h b/drivers/gpu/drm/dep/drm_dep_job.h
> new file mode 100644
> index 000000000000..35c61d258fa1
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_job.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _DRM_DEP_JOB_H_
> +#define _DRM_DEP_JOB_H_
> +
> +struct drm_dep_queue;
> +
> +void drm_dep_job_drop_dependencies(struct drm_dep_job *job);
> +
> +#endif /* _DRM_DEP_JOB_H_ */
> diff --git a/drivers/gpu/drm/dep/drm_dep_queue.c b/drivers/gpu/drm/dep/drm_dep_queue.c
> new file mode 100644
> index 000000000000..dac02d0d22c4
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_queue.c
> @@ -0,0 +1,1647 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +/**
> + * DOC: DRM dependency queue
> + *
> + * The drm_dep subsystem provides a lightweight GPU submission queue that
> + * combines the roles of drm_gpu_scheduler and drm_sched_entity into a
> + * single object (struct drm_dep_queue). Each queue owns its own ordered
> + * submit workqueue, timeout workqueue, and TDR delayed-work.
> + *
> + * **Job lifecycle**
> + *
> + * 1. Allocate and initialise a job with drm_dep_job_init().
> + * 2. Add dependency fences with drm_dep_job_add_dependency() and friends.
> + * 3. Arm the job with drm_dep_job_arm() to obtain its out-fences.
> + * 4. Submit with drm_dep_job_push().
> + *
> + * **Submission paths**
> + *
> + * drm_dep_job_push() decides between two paths under @q->sched.lock:
> + *
> + * - **Bypass path** (drm_dep_queue_can_job_bypass()): if
> + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the queue is not stopped,
> + *   the SPSC queue is empty, the job has no dependency fences, and credits
> + *   are available, the job is submitted inline on the calling thread without
> + *   touching the submit workqueue.
> + *
> + * - **Queued path** (drm_dep_queue_push_job()): the job is pushed onto an
> + *   SPSC queue and the run_job worker is kicked. The run_job worker pops the
> + *   job, resolves any remaining dependency fences (installing wakeup
> + *   callbacks for unresolved ones), and calls drm_dep_queue_run_job().
> + *
> + * **Running a job**
> + *
> + * drm_dep_queue_run_job() accounts credits, appends the job to the pending
> + * list (starting the TDR timer only when the list was previously empty),
> + * calls @ops->run_job(), stores the returned hardware fence as the parent
> + * of the job's dep fence, then installs a callback on it. When the hardware
> + * fence fires (or the job completes synchronously), drm_dep_job_done()
> + * signals the finished fence, returns credits, and kicks the put_job worker
> + * to free the job.
> + *
> + * **Timeout detection and recovery (TDR)**
> + *
> + * A delayed work item fires when a job on the pending list takes longer than
> + * @q->job.timeout jiffies. It calls @ops->timedout_job() and acts on the
> + * returned status (%DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED or
> + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB).
> + * drm_dep_queue_trigger_timeout() forces the timer to fire immediately (without
> + * changing the stored timeout), for example during device teardown.
> + *
> + * **Reference counting**
> + *
> + * Jobs and queues are both reference counted.
> + *
> + * A job holds a reference to its queue from drm_dep_job_init() until
> + * drm_dep_job_put() drops the job's last reference and its release callback
> + * runs. This ensures the queue remains valid for the entire lifetime of any
> + * job that was submitted to it.
> + *
> + * The queue holds its own reference to a job for as long as the job is
> + * internally tracked: from the moment the job is added to the pending list
> + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> + * worker, which calls drm_dep_job_put() to release that reference.

Why not simply track that the job has completed, instead of relinquishing
the reference? We could then release the reference once the job is cleaned up
(by the queue, using a worker) in process context.


> + *
> + * **Hazard: use-after-free from within a worker**
> + *
> + * Because a job holds a queue reference, drm_dep_job_put() dropping the last
> + * job reference will also drop a queue reference via the job's release path.
> + * If that happens to be the last queue reference, drm_dep_queue_fini() can be
> + * called, which queues @q->free_work on dep_free_wq and returns immediately.
> + * free_work calls disable_work_sync() / disable_delayed_work_sync() on the
> + * queue's own workers before destroying its workqueues, so in practice a
> + * running worker always completes before the queue memory is freed.
> + *
> + * However, there is a secondary hazard: a worker can be queued while the
> + * queue is in a "zombie" state — refcount has already reached zero and async
> + * teardown is in flight, but the work item has not yet been disabled by
> + * free_work.  To guard against this every worker uses
> + * drm_dep_queue_get_unless_zero() at entry; if the refcount is already zero
> + * the worker bails immediately without touching the queue state.

Again, this problem is gone in Rust.
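The get_unless_zero dance maps directly onto Weak::upgrade(), and the "zombie" window cannot dereference freed memory at all. A userspace sketch (toy Queue type, hypothetical worker_entry name): the worker holds a Weak reference and must upgrade it at entry; if the last strong reference is already gone, upgrade() returns None and the worker bails without ever touching the queue.

```rust
use std::sync::{Arc, Weak};

pub struct Queue {
    pub name: &'static str,
}

/// Toy analogue of a worker guarding with
/// drm_dep_queue_get_unless_zero(): upgrading the Weak either yields
/// a strong reference that keeps the queue alive for the worker's
/// whole body, or fails cleanly on a "zombie" queue. There is no
/// window in which the worker can observe freed memory.
pub fn worker_entry(weak: &Weak<Queue>) -> Option<&'static str> {
    let queue = weak.upgrade()?; // bails if refcount already hit zero
    Some(queue.name)
}
```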

> + *
> + * Because all actual teardown (disable_*_sync, destroy_workqueue) runs on
> + * dep_free_wq — which is independent of the queue's own submit/timeout
> + * workqueues — there is no deadlock risk.  Each queue holds a drm_dev_get()
> + * reference on its owning &drm_device, which is released as the last step of
> + * teardown.  This ensures the driver module cannot be unloaded while any queue
> + * is still alive.
> + */
> +
> +#include <linux/dma-resv.h>
> +#include <linux/kref.h>
> +#include <linux/module.h>
> +#include <linux/overflow.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +#include <linux/workqueue.h>
> +#include <drm/drm_dep.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_print.h>
> +#include "drm_dep_fence.h"
> +#include "drm_dep_job.h"
> +#include "drm_dep_queue.h"
> +
> +/*
> + * Dedicated workqueue for deferred drm_dep_queue teardown.  Using a
> + * module-private WQ instead of system_percpu_wq keeps teardown isolated
> + * from unrelated kernel subsystems.
> + */
> +static struct workqueue_struct *dep_free_wq;
> +
> +/**
> + * drm_dep_queue_flags_set() - set a flag on the queue under sched.lock
> + * @q: dep queue
> + * @flag: flag to set (one of &enum drm_dep_queue_flags)
> + *
> + * Sets @flag in @q->sched.flags. Must be called with @q->sched.lock
> + * held; the lockdep assertion enforces this.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + */
> +static void drm_dep_queue_flags_set(struct drm_dep_queue *q,
> +    enum drm_dep_queue_flags flag)
> +{
> + lockdep_assert_held(&q->sched.lock);

We can enforce this in Rust at compile-time. The code does not compile if the
lock is not taken. Same here and everywhere else where the sched lock has
to be taken.
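Concretely: put the flags *inside* the mutex, and there is no code path that reaches them without holding the lock — lockdep_assert_held() becomes a property of the types. A userspace sketch (toy Sched type, std::sync::Mutex standing in for the kernel mutex):

```rust
use std::sync::Mutex;

pub const FLAG_STOPPED: u32 = 1 << 0;

pub struct Sched {
    // The flags live inside the Mutex. Every access must go through
    // lock(), whose guard releases the mutex on scope exit — there is
    // no API through which the flags can be touched unlocked, so the
    // runtime lockdep assertion becomes a compile-time guarantee.
    flags: Mutex<u32>,
}

impl Sched {
    pub fn new() -> Self {
        Sched { flags: Mutex::new(0) }
    }

    pub fn set_flag(&self, flag: u32) {
        *self.flags.lock().unwrap() |= flag;
    }

    pub fn clear_flag(&self, flag: u32) {
        *self.flags.lock().unwrap() &= !flag;
    }

    pub fn has_flag(&self, flag: u32) -> bool {
        *self.flags.lock().unwrap() & flag != 0
    }
}
```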


> + q->sched.flags |= flag;
> +}
> +
> +/**
> + * drm_dep_queue_flags_clear() - clear a flag on the queue under sched.lock
> + * @q: dep queue
> + * @flag: flag to clear (one of &enum drm_dep_queue_flags)
> + *
> + * Clears @flag in @q->sched.flags. Must be called with @q->sched.lock
> + * held; the lockdep assertion enforces this.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + */
> +static void drm_dep_queue_flags_clear(struct drm_dep_queue *q,
> +      enum drm_dep_queue_flags flag)
> +{
> + lockdep_assert_held(&q->sched.lock);
> + q->sched.flags &= ~flag;
> +}
> +
> +/**
> + * drm_dep_queue_has_credits() - check whether the queue has enough credits
> + * @q: dep queue
> + * @job: job requesting credits
> + *
> + * Checks whether the queue has enough available credits to dispatch
> + * @job. If @job->credits exceeds the queue's credit limit, it is
> + * clamped with a WARN.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + * Return: true if available credits >= @job->credits, false otherwise.
> + */
> +static bool drm_dep_queue_has_credits(struct drm_dep_queue *q,
> +      struct drm_dep_job *job)
> +{
> + u32 available;
> +
> + lockdep_assert_held(&q->sched.lock);
> +
> + if (job->credits > q->credit.limit) {
> + drm_warn(q->drm,
> + "Jobs may not exceed the credit limit, truncate.\n");
> + job->credits = q->credit.limit;
> + }
> +
> + WARN_ON(check_sub_overflow(q->credit.limit,
> +   atomic_read(&q->credit.count),
> +   &available));
> +
> + return available >= job->credits;
> +}
> +
> +/**
> + * drm_dep_queue_run_job_queue() - kick the run-job worker
> + * @q: dep queue
> + *
> + * Queues @q->sched.run_job on @q->sched.submit_wq unless the queue is stopped
> + * or the job queue is empty.  The empty-queue check avoids queueing a work item
> + * that would immediately return with nothing to do.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_queue_run_job_queue(struct drm_dep_queue *q)
> +{
> + if (!drm_dep_queue_is_stopped(q) && spsc_queue_count(&q->job.queue))
> + queue_work(q->sched.submit_wq, &q->sched.run_job);
> +}
> +
> +/**
> + * drm_dep_queue_put_job_queue() - kick the put-job worker
> + * @q: dep queue
> + *
> + * Queues @q->sched.put_job on @q->sched.submit_wq unless the queue
> + * is stopped.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_queue_put_job_queue(struct drm_dep_queue *q)
> +{
> + if (!drm_dep_queue_is_stopped(q))
> + queue_work(q->sched.submit_wq, &q->sched.put_job);
> +}
> +
> +/**
> + * drm_queue_start_timeout() - arm or re-arm the TDR delayed work
> + * @q: dep queue
> + *
> + * Arms the TDR delayed work with @q->job.timeout. No-op if
> + * @q->ops->timedout_job is NULL, the timeout is MAX_SCHEDULE_TIMEOUT,
> + * or the pending list is empty.
> + *
> + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> + */
> +static void drm_queue_start_timeout(struct drm_dep_queue *q)
> +{
> + lockdep_assert_held(&q->job.lock);
> +
> + if (!q->ops->timedout_job ||
> +    q->job.timeout == MAX_SCHEDULE_TIMEOUT ||
> +    list_empty(&q->job.pending))
> + return;
> +
> + mod_delayed_work(q->sched.timeout_wq, &q->sched.tdr, q->job.timeout);
> +}
> +
> +/**
> + * drm_queue_start_timeout_unlocked() - arm TDR, acquiring job.lock
> + * @q: dep queue
> + *
> + * Acquires @q->job.lock with IRQs disabled and calls
> + * drm_queue_start_timeout().
> + *
> + * Context: Process context (workqueue).
> + */
> +static void drm_queue_start_timeout_unlocked(struct drm_dep_queue *q)
> +{
> + guard(spinlock_irq)(&q->job.lock);
> + drm_queue_start_timeout(q);
> +}
> +
> +/**
> + * drm_dep_queue_remove_dependency() - clear the active dependency and wake
> + *   the run-job worker
> + * @q: dep queue
> + * @f: the dependency fence being removed
> + *
> + * Stores @f into @q->dep.removed_fence via smp_store_release() so that the
> + * run-job worker can drop the reference to it in drm_dep_queue_is_ready(),
> + * paired with smp_load_acquire().  Clears @q->dep.fence and kicks the
> + * run-job worker.
> + *
> + * The fence reference is not dropped here; it is deferred to the run-job
> + * worker via @q->dep.removed_fence to keep this path suitable for dma_fence
> + * callback removal in drm_dep_queue_kill().

This is only a comment in C, but in Rust this ownership contract is encoded directly in the type system.
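Roughly what I mean, as a std-only sketch (FenceRef and DepSlot are made-up stand-ins for dma_fence and @q->dep.removed_fence): parking an *owned* value in the slot is the deferral, and the put can only happen where the value is taken back out.

```rust
use std::sync::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times dma_fence_put() would have run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for an owned dma_fence reference; Drop models dma_fence_put().
struct FenceRef(&'static str);
impl Drop for FenceRef {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Stand-in for q->dep.removed_fence.
struct DepSlot {
    removed: Mutex<Option<FenceRef>>,
}

impl DepSlot {
    // Mirrors drm_dep_queue_remove_dependency(): the fence is *moved*
    // into the slot, so no put can happen here by construction.
    fn remove_dependency(&self, f: FenceRef) {
        *self.removed.lock().unwrap() = Some(f);
    }

    // Mirrors the run-job worker side: taking the value back out is the
    // only place ownership ends, so the put happens exactly here.
    fn worker_drain(&self) {
        drop(self.removed.lock().unwrap().take());
    }
}

fn main() {
    let slot = DepSlot { removed: Mutex::new(None) };
    slot.remove_dependency(FenceRef("dep"));
    assert_eq!(DROPS.load(Ordering::SeqCst), 0); // deferred, not dropped
    slot.worker_drain();
    assert_eq!(DROPS.load(Ordering::SeqCst), 1); // put ran in the worker
}
```

The "reference is not dropped here" sentence in the kernel-doc becomes a property the compiler checks, not a convention the reader has to remember.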

> + *
> + * Context: Any context.
> + */
> +static void drm_dep_queue_remove_dependency(struct drm_dep_queue *q,
> +    struct dma_fence *f)
> +{
> + /* removed_fence must be visible to the reader before &q->dep.fence */
> + smp_store_release(&q->dep.removed_fence, f);
> +
> + WRITE_ONCE(q->dep.fence, NULL);
> + drm_dep_queue_run_job_queue(q);
> +}
> +
> +/**
> + * drm_dep_queue_wakeup() - dma_fence callback to wake the run-job worker
> + * @f: the signalled dependency fence
> + * @cb: callback embedded in the dep queue
> + *
> + * Called from dma_fence_signal() when the active dependency fence signals.
> + * Delegates to drm_dep_queue_remove_dependency() to clear @q->dep.fence and
> + * kick the run-job worker.  The fence reference is not dropped here; it is
> + * deferred to the run-job worker via @q->dep.removed_fence.

Same here.

> + *
> + * Context: Any context.
> + */
> +static void drm_dep_queue_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
> +{
> + struct drm_dep_queue *q =
> + container_of(cb, struct drm_dep_queue, dep.cb);
> +
> + drm_dep_queue_remove_dependency(q, f);
> +}
> +
> +/**
> + * drm_dep_queue_is_ready() - check whether the queue has a dispatchable job
> + * @q: dep queue
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.

Can’t even call this in Rust if the lock is not taken: the function takes the lock guard as an argument.
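i.e. the guard returned by locking is itself the proof. A std-only sketch (SchedState and is_ready are hypothetical names):

```rust
use std::sync::{Mutex, MutexGuard};

struct SchedState {
    queued_jobs: usize,
    dep_fence_pending: bool,
}

// The C version documents "Must hold @q->sched.lock" and backs it up at
// runtime with lockdep_assert_held(). Here the MutexGuard parameter is a
// compile-time proof: there is no way to call this without the lock held.
fn is_ready(state: &MutexGuard<'_, SchedState>) -> bool {
    state.queued_jobs > 0 && !state.dep_fence_pending
}

fn main() {
    let sched = Mutex::new(SchedState {
        queued_jobs: 1,
        dep_fence_pending: false,
    });
    let guard = sched.lock().unwrap(); // acquire the lock...
    assert!(is_ready(&guard));         // ...and only then can we ask.
    // Calling is_ready() without a live guard simply does not compile.
}
```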

> + * Return: true if SPSC queue non-empty and no dep fence pending,
> + *   false otherwise.
> + */
> +static bool drm_dep_queue_is_ready(struct drm_dep_queue *q)
> +{
> + lockdep_assert_held(&q->sched.lock);
> +
> + if (!spsc_queue_count(&q->job.queue))
> + return false;
> +
> + if (READ_ONCE(q->dep.fence))
> + return false;
> +
> + /* Paired with smp_store_release in drm_dep_queue_remove_dependency() */
> + dma_fence_put(smp_load_acquire(&q->dep.removed_fence));
> +
> + q->dep.removed_fence = NULL;
> +
> + return true;
> +}
> +
> +/**
> + * drm_dep_queue_is_killed() - check whether a dep queue has been killed
> + * @q: dep queue to check
> + *
> + * Return: true if %DRM_DEP_QUEUE_FLAGS_KILLED is set on @q, false otherwise.
> + *
> + * Context: Any context.
> + */
> +bool drm_dep_queue_is_killed(struct drm_dep_queue *q)
> +{
> + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_KILLED);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_is_killed);
> +
> +/**
> + * drm_dep_queue_is_initialized() - check whether a dep queue has been initialized
> + * @q: dep queue to check
> + *
> + * A queue is considered initialized once its ops pointer has been set by a
> + * successful call to drm_dep_queue_init().  Drivers that embed a
> + * &drm_dep_queue inside a larger structure may call this before attempting any
> + * other queue operation to confirm that initialization has taken place.
> + * drm_dep_queue_put() must be called if this function returns true to drop the
> + * initialization reference from drm_dep_queue_init().
> + *
> + * Return: true if @q has been initialized, false otherwise.
> + *
> + * Context: Any context.
> + */
> +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q)
> +{
> + return !!q->ops;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_is_initialized);
> +
> +/**
> + * drm_dep_queue_set_stopped() - pre-mark a queue as stopped before first use
> + * @q: dep queue to mark
> + *
> + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED directly on @q without going through the
> + * normal drm_dep_queue_stop() path.  This is only valid during the driver-side
> + * queue initialisation sequence — i.e. after drm_dep_queue_init() returns but
> + * before the queue is made visible to other threads (e.g. before it is added
> + * to any lookup structures).  Using this after the queue is live is a driver
> + * bug; use drm_dep_queue_stop() instead.
> + *
> + * Context: Process context, queue not yet visible to other threads.
> + */
> +void drm_dep_queue_set_stopped(struct drm_dep_queue *q)
> +{
> + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_STOPPED;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_set_stopped);
> +
> +/**
> + * drm_dep_queue_refcount() - read the current reference count of a queue
> + * @q: dep queue to query
> + *
> + * Returns the instantaneous kref value.  The count may change immediately
> + * after this call; callers must not make safety decisions based solely on
> + * the returned value.  Intended for diagnostic snapshots and debugfs output.
> + *
> + * Context: Any context.
> + * Return: current reference count.
> + */
> +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q)
> +{
> + return kref_read(&q->refcount);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_refcount);
> +
> +/**
> + * drm_dep_queue_timeout() - read the per-job TDR timeout for a queue
> + * @q: dep queue to query
> + *
> + * Returns the per-job timeout in jiffies as set at init time.
> + * %MAX_SCHEDULE_TIMEOUT means no timeout is configured.
> + *
> + * Context: Any context.
> + * Return: timeout in jiffies.
> + */
> +long drm_dep_queue_timeout(const struct drm_dep_queue *q)
> +{
> + return q->job.timeout;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_timeout);
> +
> +/**
> + * drm_dep_queue_is_job_put_irq_safe() - test whether job-put from IRQ is allowed
> + * @q: dep queue
> + *
> + * Context: Any context.
> + * Return: true if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set,
> + *   false otherwise.
> + */
> +static bool drm_dep_queue_is_job_put_irq_safe(const struct drm_dep_queue *q)
> +{
> + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE);
> +}
> +
> +/**
> + * drm_dep_queue_job_dependency() - get next unresolved dep fence
> + * @q: dep queue
> + * @job: job whose dependencies to advance
> + *
> + * Returns NULL immediately if the queue has been killed via
> + * drm_dep_queue_kill(), bypassing all dependency waits so that jobs
> + * drain through run_job as quickly as possible.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + * Return: next unresolved &dma_fence with a new reference, or NULL
> + *   when all dependencies have been consumed (or the queue is killed).
> + */
> +static struct dma_fence *
> +drm_dep_queue_job_dependency(struct drm_dep_queue *q,
> +     struct drm_dep_job *job)
> +{
> + struct dma_fence *f;
> +
> + lockdep_assert_held(&q->sched.lock);
> +
> + if (drm_dep_queue_is_killed(q))
> + return NULL;
> +
> + f = xa_load(&job->dependencies, job->last_dependency);
> + if (f) {
> + job->last_dependency++;
> + if (WARN_ON(DRM_DEP_JOB_FENCE_PREALLOC == f))
> + return dma_fence_get_stub();
> + return dma_fence_get(f);
> + }
> +
> + return NULL;
> +}
> +
> +/**
> + * drm_dep_queue_add_dep_cb() - install wakeup callback on dep fence
> + * @q: dep queue
> + * @job: job whose dependency fence is stored in @q->dep.fence
> + *
> + * Installs a wakeup callback on @q->dep.fence. Returns true if the
> + * callback was installed (the queue must wait), false if the fence is
> + * already signalled or is a self-fence from the same queue context.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + * Return: true if callback installed, false if fence already done.
> + */

In Rust, we can encode the signaling path with a "token type": any function that is part of the signaling path simply takes this token as an argument. The token also ensures that end_signaling() is called automatically when it goes out of scope.

By the way, we can easily offer an irq handler type where we enforce this:

fn handle_threaded_irq(&self, device: &Device<Bound>) -> IrqReturn { 
 let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
 self.driver.handle_threaded_irq(device) 

 // end_signaling() is called here automatically.
}

Same for workqueues:

fn work_fn(&self, device: &Device<Bound>) {
 let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
 self.driver.work_fn(device) 

 // end_signaling() is called here automatically.
}

This is not Rust-specific, of course, but it is more ergonomic to write in Rust.
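A std-only sketch of the token type itself (the kernel version would wrap dma_fence_begin_signalling() / dma_fence_end_signalling(); the atomic below just stands in for the lockdep cookie):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stands in for the lockdep bookkeeping behind
// dma_fence_begin_signalling() / dma_fence_end_signalling().
static SIGNALLING_DEPTH: AtomicUsize = AtomicUsize::new(0);

struct DmaFenceSignallingAnnotation;

impl DmaFenceSignallingAnnotation {
    fn new() -> Self {
        SIGNALLING_DEPTH.fetch_add(1, Ordering::SeqCst); // begin_signalling()
        DmaFenceSignallingAnnotation
    }
}

impl Drop for DmaFenceSignallingAnnotation {
    fn drop(&mut self) {
        SIGNALLING_DEPTH.fetch_sub(1, Ordering::SeqCst); // end_signalling()
    }
}

fn work_fn() {
    let _annotation = DmaFenceSignallingAnnotation::new();
    // ... run-job / put-job body runs inside the signalling section ...
} // end_signalling() runs here on every exit path, including panics.

fn main() {
    work_fn();
    // Balanced begin/end is guaranteed by scoping, not by discipline.
    assert_eq!(SIGNALLING_DEPTH.load(Ordering::SeqCst), 0);
}
```

Note how the C workers above need an explicit dma_fence_end_signalling(cookie) on every early-return path; the RAII version cannot miss one.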

> +static bool drm_dep_queue_add_dep_cb(struct drm_dep_queue *q,
> +     struct drm_dep_job *job)
> +{
> + struct dma_fence *fence = q->dep.fence;
> +
> + lockdep_assert_held(&q->sched.lock);
> +
> + if (WARN_ON(fence->context == q->fence.context)) {
> + dma_fence_put(q->dep.fence);
> + q->dep.fence = NULL;
> + return false;
> + }
> +
> + if (!dma_fence_add_callback(q->dep.fence, &q->dep.cb,
> +    drm_dep_queue_wakeup))
> + return true;
> +
> + dma_fence_put(q->dep.fence);
> + q->dep.fence = NULL;
> +
> + return false;
> +}

In Rust we can enforce that all callbacks take a reference to the fence
automatically. If the callback is "forgotten" in a buggy path, it is
automatically removed, and the fence is automatically signaled with -ECANCELED.

> +
> +/**
> + * drm_dep_queue_pop_job() - pop a dispatchable job from the SPSC queue
> + * @q: dep queue
> + *
> + * Peeks at the head of the SPSC queue and drains all resolved
> + * dependencies. If a dependency is still pending, installs a wakeup
> + * callback and returns NULL. On success pops the job and returns it.
> + *
> + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> + * Return: next dispatchable job, or NULL if a dep is still pending.
> + */
> +static struct drm_dep_job *drm_dep_queue_pop_job(struct drm_dep_queue *q)
> +{
> + struct spsc_node *node;
> + struct drm_dep_job *job;
> +
> + lockdep_assert_held(&q->sched.lock);
> +
> + node = spsc_queue_peek(&q->job.queue);
> + if (!node)
> + return NULL;
> +
> + job = container_of(node, struct drm_dep_job, queue_node);
> +
> + while ((q->dep.fence = drm_dep_queue_job_dependency(q, job))) {
> + if (drm_dep_queue_add_dep_cb(q, job))
> + return NULL;
> + }
> +
> + spsc_queue_pop(&q->job.queue);
> +
> + return job;
> +}
> +
> +/*
> + * drm_dep_queue_get_unless_zero() - try to acquire a queue reference
> + *
> + * Workers use this instead of drm_dep_queue_get() to guard against the zombie
> + * state: the queue's refcount has already reached zero (async teardown is in
> + * flight) but a work item was queued before free_work had a chance to cancel
> + * it.  If kref_get_unless_zero() fails the caller must bail immediately.
> + *
> + * Context: Any context.
> + * Returns true if the reference was acquired, false if the queue is zombie.
> + */

Again, this function is totally gone in Rust.
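The whole zombie dance maps onto Weak/Arc: the workers hold a Weak and upgrade() at entry, which is exactly kref_get_unless_zero(). Std sketch:

```rust
use std::sync::{Arc, Weak};

struct DepQueue {
    name: String,
}

// What drm_dep_queue_get_unless_zero() becomes: Weak::upgrade() returns
// None once the strong count has hit zero, i.e. the "zombie" state with
// async teardown already in flight.
fn try_get(weak: &Weak<DepQueue>) -> Option<Arc<DepQueue>> {
    weak.upgrade()
}

fn main() {
    let q = Arc::new(DepQueue { name: "q0".into() });
    let worker_handle = Arc::downgrade(&q);

    // Queue alive: the worker gets its reference and proceeds.
    assert_eq!(try_get(&worker_handle).unwrap().name, "q0");

    // Last strong reference dropped: teardown is in flight.
    drop(q);

    // Zombie: upgrade fails and the worker bails immediately.
    assert!(try_get(&worker_handle).is_none());
}
```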

> +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q)
> +{
> + return kref_get_unless_zero(&q->refcount);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_get_unless_zero);
> +
> +/**
> + * drm_dep_queue_run_job_work() - run-job worker
> + * @work: work item embedded in the dep queue
> + *
> + * Acquires @q->sched.lock, checks stopped state, queue readiness and
> + * available credits, pops the next job via drm_dep_queue_pop_job(),
> + * dispatches it via drm_dep_queue_run_job(), then re-kicks itself.
> + *
> + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> + * queue is in zombie state (refcount already zero, async teardown in flight).
> + *
> + * Context: Process context (workqueue). DMA fence signaling path.
> + */
> +static void drm_dep_queue_run_job_work(struct work_struct *work)
> +{
> + struct drm_dep_queue *q =
> + container_of(work, struct drm_dep_queue, sched.run_job);
> + struct spsc_node *node;
> + struct drm_dep_job *job;
> + bool cookie = dma_fence_begin_signalling();
> +
> + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> + if (!drm_dep_queue_get_unless_zero(q)) {
> + dma_fence_end_signalling(cookie);
> + return;
> + }
> +
> + mutex_lock(&q->sched.lock);
> +
> + if (drm_dep_queue_is_stopped(q))
> + goto put_queue;
> +
> + if (!drm_dep_queue_is_ready(q))
> + goto put_queue;
> +
> + /* Peek to check credits before committing to pop and dep resolution */
> + node = spsc_queue_peek(&q->job.queue);
> + if (!node)
> + goto put_queue;
> +
> + job = container_of(node, struct drm_dep_job, queue_node);
> + if (!drm_dep_queue_has_credits(q, job))
> + goto put_queue;
> +
> + job = drm_dep_queue_pop_job(q);
> + if (!job)
> + goto put_queue;
> +
> + drm_dep_queue_run_job(q, job);
> + drm_dep_queue_run_job_queue(q);
> +
> +put_queue:
> + mutex_unlock(&q->sched.lock);
> + drm_dep_queue_put(q);
> + dma_fence_end_signalling(cookie);
> +}
> +
> +/*
> + * drm_dep_queue_remove_job() - unlink a job from the pending list and reset TDR
> + * @q:   dep queue owning @job
> + * @job: job to remove
> + *
> + * Splices @job out of @q->job.pending, cancels any pending TDR delayed work,
> + * and arms the timeout for the new list head (if any).
> + *
> + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> + */
> +static void drm_dep_queue_remove_job(struct drm_dep_queue *q,
> +     struct drm_dep_job *job)
> +{
> + lockdep_assert_held(&q->job.lock);
> +
> + list_del_init(&job->pending_link);
> + cancel_delayed_work(&q->sched.tdr);
> + drm_queue_start_timeout(q);
> +}
> +
> +/**
> + * drm_dep_queue_get_finished_job() - dequeue a finished job
> + * @q: dep queue
> + *
> + * Under @q->job.lock checks the head of the pending list for a
> + * finished dep fence. If found, removes the job from the list,
> + * cancels the TDR, and re-arms it for the new head.
> + *
> + * Context: Process context (workqueue). DMA fence signaling path.
> + * Return: the finished &drm_dep_job, or NULL if none is ready.
> + */
> +static struct drm_dep_job *
> +drm_dep_queue_get_finished_job(struct drm_dep_queue *q)
> +{
> + struct drm_dep_job *job;
> +
> + guard(spinlock_irq)(&q->job.lock);
> +
> + job = list_first_entry_or_null(&q->job.pending, struct drm_dep_job,
> +       pending_link);
> + if (job && drm_dep_fence_is_finished(job->dfence))
> + drm_dep_queue_remove_job(q, job);
> + else
> + job = NULL;
> +
> + return job;
> +}
> +
> +/**
> + * drm_dep_queue_put_job_work() - put-job worker
> + * @work: work item embedded in the dep queue
> + *
> + * Drains all finished jobs by calling drm_dep_job_put() in a loop,
> + * then kicks the run-job worker.
> + *
> + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> + * queue is in zombie state (refcount already zero, async teardown in flight).
> + *
> + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> + * because workqueue is shared with other items in the fence signaling path.
> + *
> + * Context: Process context (workqueue). DMA fence signaling path.
> + */
> +static void drm_dep_queue_put_job_work(struct work_struct *work)
> +{
> + struct drm_dep_queue *q =
> + container_of(work, struct drm_dep_queue, sched.put_job);
> + struct drm_dep_job *job;
> + bool cookie = dma_fence_begin_signalling();
> +
> + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> + if (!drm_dep_queue_get_unless_zero(q)) {
> + dma_fence_end_signalling(cookie);
> + return;
> + }
> +
> + while ((job = drm_dep_queue_get_finished_job(q)))
> + drm_dep_job_put(job);
> +
> + drm_dep_queue_run_job_queue(q);
> +
> + drm_dep_queue_put(q);
> + dma_fence_end_signalling(cookie);
> +}
> +
> +/**
> + * drm_dep_queue_tdr_work() - TDR worker
> + * @work: work item embedded in the delayed TDR work
> + *
> + * Removes the head job from the pending list under @q->job.lock,
> + * asserts @q->ops->timedout_job is non-NULL, calls it outside the lock,
> + * requeues the job if %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB, drops the
> + * queue's job reference on %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED, and always
> + * restarts the TDR timer after handling the job (unless @q is stopping).
> + * Any other return value triggers a WARN.
> + *
> + * The TDR is never armed when @q->ops->timedout_job is NULL, so firing
> + * this worker without a timedout_job callback is a driver bug.
> + *
> + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> + * queue is in zombie state (refcount already zero, async teardown in flight).
> + *
> + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> + * because timedout_job() is expected to signal the guilty job's fence as part
> + * of reset.
> + *
> + * Context: Process context (workqueue). DMA fence signaling path.
> + */
> +static void drm_dep_queue_tdr_work(struct work_struct *work)
> +{
> + struct drm_dep_queue *q =
> + container_of(work, struct drm_dep_queue, sched.tdr.work);
> + struct drm_dep_job *job;
> + bool cookie = dma_fence_begin_signalling();
> +
> + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> + if (!drm_dep_queue_get_unless_zero(q)) {
> + dma_fence_end_signalling(cookie);
> + return;
> + }
> +
> + scoped_guard(spinlock_irq, &q->job.lock) {
> + job = list_first_entry_or_null(&q->job.pending,
> +       struct drm_dep_job,
> +       pending_link);
> + if (job)
> + /*
> + * Remove from pending so it cannot be freed
> + * concurrently by drm_dep_queue_get_finished_job() or
> + * .drm_dep_job_done().
> + */
> + list_del_init(&job->pending_link);
> + }
> +
> + if (job) {
> + enum drm_dep_timedout_stat status;
> +
> + if (WARN_ON(!q->ops->timedout_job)) {
> + drm_dep_job_put(job);
> + goto out;
> + }
> +
> + status = q->ops->timedout_job(job);
> +
> + switch (status) {
> + case DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB:
> + scoped_guard(spinlock_irq, &q->job.lock)
> + list_add(&job->pending_link, &q->job.pending);
> + drm_dep_queue_put_job_queue(q);
> + break;
> + case DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED:
> + drm_dep_job_put(job);
> + break;
> + default:
> + WARN(1, "invalid drm_dep_timedout_stat");
> + break;
> + }
> + }
> +
> +out:
> + drm_queue_start_timeout_unlocked(q);
> + drm_dep_queue_put(q);
> + dma_fence_end_signalling(cookie);
> +}
> +
> +/**
> + * drm_dep_alloc_submit_wq() - allocate an ordered submit workqueue
> + * @name: name for the workqueue
> + * @flags: DRM_DEP_QUEUE_FLAGS_* flags
> + *
> + * Allocates an ordered workqueue for job submission with %WQ_MEM_RECLAIM and
> + * %WQ_MEM_WARN_ON_RECLAIM set, ensuring the workqueue is safe to use from
> + * memory reclaim context and properly annotated for lockdep taint tracking.
> + * Adds %WQ_HIGHPRI if %DRM_DEP_QUEUE_FLAGS_HIGHPRI is set. When
> + * CONFIG_LOCKDEP is enabled, uses a dedicated lockdep map for annotation.
> + *
> + * Context: Process context.
> + * Return: the new &workqueue_struct, or NULL on failure.
> + */
> +static struct workqueue_struct *
> +drm_dep_alloc_submit_wq(const char *name, enum drm_dep_queue_flags flags)
> +{
> + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> +
> + if (flags & DRM_DEP_QUEUE_FLAGS_HIGHPRI)
> + wq_flags |= WQ_HIGHPRI;
> +
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> + static struct lockdep_map map = {
> + .name = "drm_dep_submit_lockdep_map"
> + };
> + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> +#else
> + return alloc_ordered_workqueue(name, wq_flags);
> +#endif
> +}
> +
> +/**
> + * drm_dep_alloc_timeout_wq() - allocate an ordered TDR workqueue
> + * @name: name for the workqueue
> + *
> + * Allocates an ordered workqueue for timeout detection and recovery with
> + * %WQ_MEM_RECLAIM and %WQ_MEM_WARN_ON_RECLAIM set, ensuring consistent taint
> + * annotation with the submit workqueue. When CONFIG_LOCKDEP is enabled, uses
> + * a dedicated lockdep map for annotation.
> + *
> + * Context: Process context.
> + * Return: the new &workqueue_struct, or NULL on failure.
> + */
> +static struct workqueue_struct *drm_dep_alloc_timeout_wq(const char *name)
> +{
> + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> +
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> + static struct lockdep_map map = {
> + .name = "drm_dep_timeout_lockdep_map"
> + };
> + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> +#else
> + return alloc_ordered_workqueue(name, wq_flags);
> +#endif
> +}
> +
> +/**
> + * drm_dep_queue_init() - initialize a dep queue
> + * @q: dep queue to initialize
> + * @args: initialization arguments
> + *
> + * Initializes all fields of @q from @args. If @args->submit_wq is NULL an
> + * ordered workqueue is allocated and owned by the queue
> + * (%DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ). If @args->timeout_wq is NULL an
> + * ordered workqueue is allocated and owned by the queue
> + * (%DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ). On success the queue holds one kref
> + * reference and drm_dep_queue_put() must be called to drop this reference
> + * (i.e., drivers cannot directly free the queue).
> + *
> + * When CONFIG_LOCKDEP is enabled, @q->sched.lock is primed against the
> + * fs_reclaim pseudo-lock so that lockdep can detect any lock ordering
> + * inversion between @sched.lock and memory reclaim.
> + *
> + * Return: 0 on success, %-EINVAL when @args->credit_limit is zero, @args->ops
> + * is NULL, @args->drm is NULL, @args->ops->run_job is NULL, or when
> + * @args->submit_wq or @args->timeout_wq is non-NULL but was not allocated with
> + * %WQ_MEM_WARN_ON_RECLAIM; %-ENOMEM when workqueue allocation fails.
> + *
> + * Context: Process context. May allocate memory and create workqueues.
> + */
> +int drm_dep_queue_init(struct drm_dep_queue *q,
> +       const struct drm_dep_queue_init_args *args)
> +{
> + if (!args->credit_limit || !args->drm || !args->ops ||
> +    !args->ops->run_job)
> + return -EINVAL;
> +
> + if (args->submit_wq && !workqueue_is_reclaim_annotated(args->submit_wq))
> + return -EINVAL;
> +
> + if (args->timeout_wq &&
> +    !workqueue_is_reclaim_annotated(args->timeout_wq))
> + return -EINVAL;
> +
> + memset(q, 0, sizeof(*q));
> +
> + q->name = args->name;
> + q->drm = args->drm;
> + q->credit.limit = args->credit_limit;
> + q->job.timeout = args->timeout ? args->timeout : MAX_SCHEDULE_TIMEOUT;
> +
> + init_rcu_head(&q->rcu);
> + INIT_LIST_HEAD(&q->job.pending);
> + spin_lock_init(&q->job.lock);
> + spsc_queue_init(&q->job.queue);
> +
> + mutex_init(&q->sched.lock);
> + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> + fs_reclaim_acquire(GFP_KERNEL);
> + might_lock(&q->sched.lock);
> + fs_reclaim_release(GFP_KERNEL);
> + }
> +
> + if (args->submit_wq) {
> + q->sched.submit_wq = args->submit_wq;
> + } else {
> + q->sched.submit_wq = drm_dep_alloc_submit_wq(args->name ?: "drm_dep",
> +     args->flags);
> + if (!q->sched.submit_wq)
> + return -ENOMEM;
> +
> + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ;
> + }
> +
> + if (args->timeout_wq) {
> + q->sched.timeout_wq = args->timeout_wq;
> + } else {
> + q->sched.timeout_wq = drm_dep_alloc_timeout_wq(args->name ?: "drm_dep");
> + if (!q->sched.timeout_wq)
> + goto err_submit_wq;
> +
> + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ;
> + }
> +
> + q->sched.flags |= args->flags &
> + ~(DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ |
> +  DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ);
> +
> + INIT_DELAYED_WORK(&q->sched.tdr, drm_dep_queue_tdr_work);
> + INIT_WORK(&q->sched.run_job, drm_dep_queue_run_job_work);
> + INIT_WORK(&q->sched.put_job, drm_dep_queue_put_job_work);
> +
> + q->fence.context = dma_fence_context_alloc(1);
> +
> + kref_init(&q->refcount);
> + q->ops = args->ops;
> + drm_dev_get(q->drm);
> +
> + return 0;
> +
> +err_submit_wq:
> + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> + destroy_workqueue(q->sched.submit_wq);
> + mutex_destroy(&q->sched.lock);
> +
> + return -ENOMEM;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_init);
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> +/**
> + * drm_dep_queue_push_job_begin() - mark the start of an arm/push critical section
> + * @q: dep queue the job belongs to
> + *
> + * Called at the start of drm_dep_job_arm() and warns if the push context is
> + * already owned by another task, which would indicate concurrent arm/push on
> + * the same queue.
> + *
> + * No-op when CONFIG_PROVE_LOCKING is disabled.
> + *
> + * Context: Process context. DMA fence signaling path.
> + */
> +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> +{
> + WARN_ON(q->job.push.owner);
> + q->job.push.owner = current;
> +}
> +
> +/**
> + * drm_dep_queue_push_job_end() - mark the end of an arm/push critical section
> + * @q: dep queue the job belongs to
> + *
> + * Called at the end of drm_dep_job_push() and warns if the push context is not
> + * owned by the current task, which would indicate a mismatched begin/end pair
> + * or a push from the wrong thread.
> + *
> + * No-op when CONFIG_PROVE_LOCKING is disabled.
> + *
> + * Context: Process context. DMA fence signaling path.
> + */
> +void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> +{
> + WARN_ON(q->job.push.owner != current);
> + q->job.push.owner = NULL;
> +}
> +#endif
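
This push-owner tracking is another invariant the borrow checker gives us for free: arm() can hand out a token that push() consumes by value, so a mismatched begin/end pair or a double push is a compile error rather than a WARN_ON. Sketch (Queue and ArmedJob are hypothetical names):

```rust
struct Queue {
    pushed: usize,
}

// Token produced by arm() and consumed by push(): the arm/push critical
// section exists exactly as long as this value does.
struct ArmedJob {
    credits: u32,
}

impl Queue {
    // &mut self makes a concurrent arm/push on the same queue a compile
    // error: two exclusive borrows cannot coexist.
    fn arm(&mut self, credits: u32) -> ArmedJob {
        ArmedJob { credits }
    }

    // Takes the token by value: it cannot be pushed twice, and pushing
    // without arming first is impossible since only arm() creates one.
    fn push(&mut self, job: ArmedJob) -> u32 {
        self.pushed += 1;
        job.credits
    }
}

fn main() {
    let mut q = Queue { pushed: 0 };
    let job = q.arm(4);
    assert_eq!(q.push(job), 4);
    assert_eq!(q.pushed, 1);
    // q.push(job) again would not compile: `job` was moved.
}
```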
> +
> +/**
> + * drm_dep_queue_assert_teardown_invariants() - assert teardown invariants
> + * @q: dep queue being torn down
> + *
> + * Warns if the pending-job list, the SPSC submission queue, or the credit
> + * counter is non-zero when called, or if the queue still has a non-zero
> + * reference count.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_queue_assert_teardown_invariants(struct drm_dep_queue *q)
> +{
> + WARN_ON(!list_empty(&q->job.pending));
> + WARN_ON(spsc_queue_count(&q->job.queue));
> + WARN_ON(atomic_read(&q->credit.count));
> + WARN_ON(drm_dep_queue_refcount(q));
> +}
> +
> +/**
> + * drm_dep_queue_release() - final internal cleanup of a dep queue
> + * @q: dep queue to clean up
> + *
> + * Asserts teardown invariants and destroys internal resources allocated by
> + * drm_dep_queue_init() that cannot be torn down earlier in the teardown
> + * sequence.  Currently this destroys @q->sched.lock.
> + *
> + * Drivers that implement &drm_dep_queue_ops.release **must** call this
> + * function after removing @q from any internal bookkeeping (e.g. lookup
> + * tables or lists) but before freeing the memory that contains @q.  When
> + * &drm_dep_queue_ops.release is NULL, drm_dep follows the default teardown
> + * path and calls this function automatically.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_queue_release(struct drm_dep_queue *q)
> +{
> + drm_dep_queue_assert_teardown_invariants(q);
> + mutex_destroy(&q->sched.lock);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_release);
> +
> +/**
> + * drm_dep_queue_free() - final cleanup of a dep queue
> + * @q: dep queue to free
> + *
> + * Invokes &drm_dep_queue_ops.release if set, in which case the driver is
> + * responsible for calling drm_dep_queue_release() and freeing @q itself.
> + * If &drm_dep_queue_ops.release is NULL, calls drm_dep_queue_release()
> + * and then frees @q with kfree_rcu().
> + *
> + * In either case, releases the drm_dev_get() reference taken at init time
> + * via drm_dev_put(), allowing the owning &drm_device to be unloaded once
> + * all queues have been freed.
> + *
> + * Context: Process context (workqueue), reclaim safe.
> + */
> +static void drm_dep_queue_free(struct drm_dep_queue *q)
> +{
> + struct drm_device *drm = q->drm;
> +
> + if (q->ops->release) {
> + q->ops->release(q);
> + } else {
> + drm_dep_queue_release(q);
> + kfree_rcu(q, rcu);
> + }
> + drm_dev_put(drm);
> +}
> +
> +/**
> + * drm_dep_queue_free_work() - deferred queue teardown worker
> + * @work: free_work item embedded in the dep queue
> + *
> + * Runs on dep_free_wq. Disables all work items synchronously
> + * (preventing re-queue and waiting for in-flight instances),
> + * destroys any owned workqueues, then calls drm_dep_queue_free().
> + * Running on dep_free_wq ensures destroy_workqueue() is never
> + * called from within one of the queue's own workers (deadlock)
> + * and disable_*_sync() cannot deadlock either.
> + *
> + * Context: Process context (workqueue), reclaim safe.
> + */
> +static void drm_dep_queue_free_work(struct work_struct *work)
> +{
> + struct drm_dep_queue *q =
> + container_of(work, struct drm_dep_queue, free_work);
> +
> + drm_dep_queue_assert_teardown_invariants(q);
> +
> + disable_delayed_work_sync(&q->sched.tdr);
> + disable_work_sync(&q->sched.run_job);
> + disable_work_sync(&q->sched.put_job);
> +
> + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ)
> + destroy_workqueue(q->sched.timeout_wq);
> +
> + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> + destroy_workqueue(q->sched.submit_wq);
> +
> + drm_dep_queue_free(q);
> +}
> +
> +/**
> + * drm_dep_queue_fini() - tear down a dep queue
> + * @q: dep queue to tear down
> + *
> + * Asserts teardown invariants and initiates teardown of @q by queueing the
> + * deferred free work onto the module-private dep_free_wq workqueue.  The work
> + * item disables any pending TDR and run/put-job work synchronously, destroys
> + * any workqueues that were allocated by drm_dep_queue_init(), and then releases
> + * the queue memory.
> + *
> + * Running teardown from dep_free_wq ensures that destroy_workqueue() is never
> + * called from within one of the queue's own workers (e.g. via
> + * drm_dep_queue_put()), which would deadlock.
> + *
> + * Drivers can wait for all outstanding deferred work to complete by waiting
> + * for the last drm_dev_put() reference on their &drm_device, which is
> + * released as the final step of each queue's teardown.
> + *
> + * Drivers that implement &drm_dep_queue_ops.fini **must** call this
> + * function after removing @q from any device bookkeeping but before freeing the
> + * memory that contains @q.  When &drm_dep_queue_ops.fini is NULL, drm_dep
> + * follows the default teardown path and calls this function automatically.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_queue_fini(struct drm_dep_queue *q)
> +{
> + drm_dep_queue_assert_teardown_invariants(q);
> +
> + INIT_WORK(&q->free_work, drm_dep_queue_free_work);
> + queue_work(dep_free_wq, &q->free_work);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_fini);
> +
> +/**
> + * drm_dep_queue_get() - acquire a reference to a dep queue
> + * @q: dep queue to acquire a reference on, or NULL
> + *
> + * Return: @q with an additional reference held, or NULL if @q is NULL.
> + *
> + * Context: Any context.
> + */
> +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q)
> +{
> + if (q)
> + kref_get(&q->refcount);
> + return q;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_get);
> +
> +/**
> + * __drm_dep_queue_release() - kref release callback for a dep queue
> + * @kref: kref embedded in the dep queue
> + *
> + * Calls &drm_dep_queue_ops.fini if set, otherwise calls
> + * drm_dep_queue_fini() to initiate deferred teardown.
> + *
> + * Context: Any context.
> + */
> +static void __drm_dep_queue_release(struct kref *kref)
> +{
> + struct drm_dep_queue *q =
> + container_of(kref, struct drm_dep_queue, refcount);
> +
> + if (q->ops->fini)
> + q->ops->fini(q);
> + else
> + drm_dep_queue_fini(q);
> +}
> +
> +/**
> + * drm_dep_queue_put() - release a reference to a dep queue
> + * @q: dep queue to release a reference on, or NULL
> + *
> + * When the last reference is dropped, calls &drm_dep_queue_ops.fini if set,
> + * otherwise calls drm_dep_queue_fini(). Final memory release is handled by
> + * &drm_dep_queue_ops.release (which must call drm_dep_queue_release()) if set,
> + * or drm_dep_queue_release() followed by kfree_rcu() otherwise.
> + * Does nothing if @q is NULL.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_queue_put(struct drm_dep_queue *q)
> +{
> + if (q)
> + kref_put(&q->refcount, __drm_dep_queue_release);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_put);
> +
> +/**
> + * drm_dep_queue_stop() - stop a dep queue from processing new jobs
> + * @q: dep queue to stop
> + *
> + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> + * and @q->job.lock (spinlock_irq), making the flag safe to test from the
> + * finished-fence signaling context. Then cancels any in-flight run_job and put_job work
> + * items. Once stopped, the bypass path and the submit workqueue will not
> + * dispatch further jobs, nor will any jobs be removed from the pending list.
> + * Call drm_dep_queue_start() to resume processing.
> + *
> + * Context: Process context. Waits for in-flight workers to complete.
> + */
> +void drm_dep_queue_stop(struct drm_dep_queue *q)
> +{
> + scoped_guard(mutex, &q->sched.lock) {
> + scoped_guard(spinlock_irq, &q->job.lock)
> + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> + }
> + cancel_work_sync(&q->sched.run_job);
> + cancel_work_sync(&q->sched.put_job);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_stop);
> +
> +/**
> + * drm_dep_queue_start() - resume a stopped dep queue
> + * @q: dep queue to start
> + *
> + * Clears %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> + * and @q->job.lock (spinlock_irq), making the flag safe to test from IRQ
> + * context. Then re-queues the run_job and put_job work items so that any jobs
> + * pending since the queue was stopped are processed. Must only be called after
> + * drm_dep_queue_stop().
> + *
> + * Context: Process context.
> + */
> +void drm_dep_queue_start(struct drm_dep_queue *q)
> +{
> + scoped_guard(mutex, &q->sched.lock) {
> + scoped_guard(spinlock_irq, &q->job.lock)
> + drm_dep_queue_flags_clear(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> + }
> + drm_dep_queue_run_job_queue(q);
> + drm_dep_queue_put_job_queue(q);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_start);
> +
> +/**
> + * drm_dep_queue_trigger_timeout() - trigger the TDR immediately for
> + *   all pending jobs
> + * @q: dep queue to trigger timeout on
> + *
> + * Sets @q->job.timeout to 1 and arms the TDR delayed work with a one-jiffy
> + * delay, causing it to fire almost immediately without hot-spinning at zero
> + * delay. This is used to force-expire any pendind jobs on the queue, for
> + * example when the device is being torn down or has encountered an
> + * unrecoverable error.
> + *
> + * It is suggested that when this function is used, the first timedout_job call
> + * causes the driver to kick the queue off the hardware and signal all pending
> + * job fences. Subsequent calls continue to signal all pending job fences.
> + *
> + * Has no effect if the pending list is empty.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q)
> +{
> + guard(spinlock_irqsave)(&q->job.lock);
> + q->job.timeout = 1;
> + drm_queue_start_timeout(q);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_trigger_timeout);
> +
> +/**
> + * drm_dep_queue_cancel_tdr_sync() - cancel any pending TDR and wait
> + *   for it to finish
> + * @q: dep queue whose TDR to cancel
> + *
> + * Cancels the TDR delayed work item if it has not yet started, and waits for
> + * it to complete if it is already running.  After this call returns, the TDR
> + * worker is guaranteed not to be executing and will not fire again until
> + * explicitly rearmed (e.g. via drm_dep_queue_resume_timeout() or by a new
> + * job being submitted).
> + *
> + * Useful during error recovery or queue teardown when the caller needs to
> + * know that no timeout handling races with its own reset logic.
> + *
> + * Context: Process context. May sleep waiting for the TDR worker to finish.
> + */
> +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q)
> +{
> + cancel_delayed_work_sync(&q->sched.tdr);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_cancel_tdr_sync);
> +
> +/**
> + * drm_dep_queue_resume_timeout() - restart the TDR timer with the
> + *   configured timeout
> + * @q: dep queue to resume the timeout for
> + *
> + * Restarts the TDR delayed work using @q->job.timeout. Called after device
> + * recovery to give pending jobs a fresh full timeout window. Has no effect
> + * if the pending list is empty.
> + *
> + * Context: Any context.
> + */
> +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q)
> +{
> + drm_queue_start_timeout_unlocked(q);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_resume_timeout);
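
Taken together with drm_dep_queue_stop()/start(), these TDR helpers seem to
imply a device-reset flow roughly like the following. This is my reading of the
API, not code from the patch, and the foo_* names are made up:

```c
/* Hypothetical driver reset path, per my reading of the TDR helpers. */
static void foo_device_reset(struct foo_device *fdev)
{
	struct drm_dep_queue *q = fdev->queue;

	drm_dep_queue_stop(q);            /* no new dispatch, workers idle */
	drm_dep_queue_cancel_tdr_sync(q); /* no TDR racing with our reset */

	foo_hw_reset(fdev);               /* driver-specific recovery */

	drm_dep_queue_start(q);           /* re-kick run/put workers */
	drm_dep_queue_resume_timeout(q);  /* fresh timeout window for pending jobs */
}
```

Is that the intended ordering, or is resume_timeout expected before start?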
> +
> +/**
> + * drm_dep_queue_is_stopped() - check whether a dep queue is stopped
> + * @q: dep queue to check
> + *
> + * Return: true if %DRM_DEP_QUEUE_FLAGS_STOPPED is set on @q, false otherwise.
> + *
> + * Context: Any context.
> + */
> +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q)
> +{
> + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_STOPPED);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_is_stopped);
> +
> +/**
> + * drm_dep_queue_kill() - kill a dep queue and flush all pending jobs
> + * @q: dep queue to kill
> + *
> + * Sets %DRM_DEP_QUEUE_FLAGS_KILLED on @q under @q->sched.lock.  If a
> + * dependency fence is currently being waited on, its callback is removed and
> + * the run-job worker is kicked immediately so that the blocked job drains
> + * without waiting.
> + *
> + * Once killed, drm_dep_queue_job_dependency() returns NULL for all jobs,
> + * bypassing dependency waits so that every queued job drains through
> + * &drm_dep_queue_ops.run_job without blocking.
> + *
> + * The &drm_dep_queue_ops.run_job callback is guaranteed to be called for every
> + * job that was pushed before or after drm_dep_queue_kill(), even during queue
> + * teardown.  Drivers should use this guarantee to perform any necessary
> + * bookkeeping cleanup without executing the actual backend operation when the
> + * queue is killed.
> + *
> + * Unlike drm_dep_queue_stop(), killing is one-way: there is no corresponding
> + * start function.
> + *
> + * **Driver safety requirement**
> + *
> + * drm_dep_queue_kill() must only be called once the driver can guarantee that
> + * no job in the queue will touch memory associated with any of its fences
> + * (i.e., the queue has been removed from the device and will never be put back
> + * on).
> + *
> + * Context: Process context.
> + */
> +void drm_dep_queue_kill(struct drm_dep_queue *q)
> +{
> + scoped_guard(mutex, &q->sched.lock) {
> + struct dma_fence *fence;
> +
> + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_KILLED);
> +
> + /*
> + * Holding &q->sched.lock guarantees that the run-job work item
> + * cannot drop its reference to q->dep.fence concurrently, so
> + * reading q->dep.fence here is safe.
> + */
> + fence = READ_ONCE(q->dep.fence);
> + if (fence && dma_fence_remove_callback(fence, &q->dep.cb))
> + drm_dep_queue_remove_dependency(q, fence);
> + }
> +}
> +EXPORT_SYMBOL(drm_dep_queue_kill);
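
So, if I understand the safety requirement correctly, a driver-side teardown
would look something like this (foo_* names are hypothetical; assumes
foo_queue_unregister() guarantees the queue can never be put back on the
device):

```c
static void foo_queue_destroy(struct foo_queue *fq)
{
	struct drm_dep_queue *q = &fq->base;

	foo_queue_unregister(fq);  /* queue can never touch HW again */
	drm_dep_queue_kill(q);     /* drain remaining jobs through run_job */
	drm_dep_queue_put(q);      /* drop the driver's reference */
}
```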
> +
> +/**
> + * drm_dep_queue_submit_wq() - retrieve the submit workqueue of a dep queue
> + * @q: dep queue whose workqueue to retrieve
> + *
> + * Drivers may use this to queue their own work items alongside the queue's
> + * internal run-job and put-job workers — for example to process incoming
> + * messages in the same serialisation domain.
> + *
> + * Prefer drm_dep_queue_work_enqueue() when the only need is to enqueue a
> + * work item, as it additionally checks the stopped state.  Use this accessor
> + * when the workqueue itself is required (e.g. for alloc_ordered_workqueue
> + * replacement or drain_workqueue calls).
> + *
> + * Context: Any context.
> + * Return: the &workqueue_struct used by @q for job submission.
> + */
> +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q)
> +{
> + return q->sched.submit_wq;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_submit_wq);
> +
> +/**
> + * drm_dep_queue_timeout_wq() - retrieve the timeout workqueue of a dep queue
> + * @q: dep queue whose workqueue to retrieve
> + *
> + * Returns the workqueue used by @q to run TDR (timeout detection and recovery)
> + * work.  Drivers may use this to queue their own timeout-domain work items, or
> + * to call drain_workqueue() when tearing down and needing to ensure all pending
> + * timeout callbacks have completed before proceeding.
> + *
> + * Context: Any context.
> + * Return: the &workqueue_struct used by @q for TDR work.
> + */
> +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q)
> +{
> + return q->sched.timeout_wq;
> +}
> +EXPORT_SYMBOL(drm_dep_queue_timeout_wq);
> +
> +/**
> + * drm_dep_queue_work_enqueue() - queue work on the dep queue's submit workqueue
> + * @q: dep queue to enqueue work on
> + * @work: work item to enqueue
> + *
> + * Queues @work on @q->sched.submit_wq if the queue is not stopped.  This
> + * allows drivers to schedule custom work items that run serialised with the
> + * queue's own run-job and put-job workers.
> + *
> + * Return: true if the work was queued, false if the queue is stopped or the
> + * work item was already pending.
> + *
> + * Context: Any context.
> + */
> +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> + struct work_struct *work)
> +{
> + if (drm_dep_queue_is_stopped(q))
> + return false;
> +
> + return queue_work(q->sched.submit_wq, work);
> +}
> +EXPORT_SYMBOL(drm_dep_queue_work_enqueue);
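
For reference, I read the intended usage as something like the sketch below: a
driver message handler sharing the queue's serialisation domain (the foo_*
names are mine, and the fallback path is just illustrative):

```c
/* Hypothetical message handler serialised with run_job/put_job. */
static void foo_process_msg_work(struct work_struct *w)
{
	struct foo_queue *fq = container_of(w, struct foo_queue, msg_work);

	foo_handle_firmware_msg(fq);
}

static void foo_post_msg(struct foo_queue *fq)
{
	/* Returns false while stopped, so defer and retry on start. */
	if (!drm_dep_queue_work_enqueue(&fq->base, &fq->msg_work))
		foo_defer_msg(fq);
}
```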
> +
> +/**
> + * drm_dep_queue_can_job_bypass() - test whether a job can skip the SPSC queue
> + * @q: dep queue
> + * @job: job to test
> + *
> + * A job may bypass the submit workqueue and run inline on the calling thread
> + * if all of the following hold:
> + *
> + *  - %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set on the queue
> + *  - the queue is not stopped
> + *  - the SPSC submission queue is empty (no other jobs waiting)
> + *  - the queue has enough credits for @job
> + *  - @job has no unresolved dependency fences
> + *
> + * Must be called under @q->sched.lock.
> + *
> + * Context: Process context. Must hold @q->sched.lock (a mutex).
> + * Return: true if the job may be run inline, false otherwise.
> + */
> +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> +  struct drm_dep_job *job)
> +{
> + lockdep_assert_held(&q->sched.lock);
> +
> + return q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED &&
> + !drm_dep_queue_is_stopped(q) &&
> + !spsc_queue_count(&q->job.queue) &&
> + drm_dep_queue_has_credits(q, job) &&
> + xa_empty(&job->dependencies);
> +}
> +
> +/**
> + * drm_dep_job_done() - mark a job as complete
> + * @job: the job that finished
> + * @result: error code to propagate, or 0 for success
> + *
> + * Subtracts @job->credits from the queue credit counter, then signals the
> + * job's dep fence with @result.
> + *
> + * When %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set (IRQ-safe path), a
> + * temporary extra reference is taken on @job before signalling the fence.
> + * This prevents a concurrent put-job worker — which may be woken by timeouts or
> + * queue starting — from freeing the job while this function still holds a
> + * pointer to it.  The extra reference is released at the end of the function.
> + *
> + * After signalling, the IRQ-safe path removes the job from the pending list
> + * under @q->job.lock, provided the queue is not stopped.  Removal is skipped
> + * when the queue is stopped so that drm_dep_queue_for_each_pending_job() can
> + * iterate the list without racing with the completion path.  On successful
> + * removal, kicks the run-job worker so the next queued job can be dispatched
> + * immediately, then drops the job reference.  If the job was already removed
> + * by TDR, or removal was skipped because the queue is stopped, kicks the
> + * put-job worker instead to allow the deferred put to complete.
> + *
> + * Context: Any context.
> + */
> +static void drm_dep_job_done(struct drm_dep_job *job, int result)
> +{
> + struct drm_dep_queue *q = job->q;
> + bool irq_safe = drm_dep_queue_is_job_put_irq_safe(q), removed = false;
> +
> + /*
> + * Local ref to ensure the put worker—which may be woken by external
> + * forces (TDR, driver-side queue starting)—doesn't free the job behind
> + * this function's back after drm_dep_fence_done() while it is still on
> + * the pending list.
> + */
> + if (irq_safe)
> + drm_dep_job_get(job);
> +
> + atomic_sub(job->credits, &q->credit.count);
> + drm_dep_fence_done(job->dfence, result);
> +
> + /* Only safe to touch job after fence signal if we have a local ref. */
> +
> + if (irq_safe) {
> + scoped_guard(spinlock_irqsave, &q->job.lock) {
> + removed = !list_empty(&job->pending_link) &&
> + !drm_dep_queue_is_stopped(q);
> +
> + /* Guard against TDR operating on job */
> + if (removed)
> + drm_dep_queue_remove_job(q, job);
> + }
> + }
> +
> + if (removed) {
> + drm_dep_queue_run_job_queue(q);
> + drm_dep_job_put(job);
> + } else {
> + drm_dep_queue_put_job_queue(q);
> + }
> +
> + if (irq_safe)
> + drm_dep_job_put(job);
> +}
> +
> +/**
> + * drm_dep_job_done_cb() - dma_fence callback to complete a job
> + * @f: the hardware fence that signalled
> + * @cb: fence callback embedded in the dep job
> + *
> + * Extracts the job from @cb and calls drm_dep_job_done() with
> + * @f->error as the result.
> + *
> + * Context: Any context, with IRQs disabled. May not sleep.
> + */
> +static void drm_dep_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> +{
> + struct drm_dep_job *job = container_of(cb, struct drm_dep_job, cb);
> +
> + drm_dep_job_done(job, f->error);
> +}
> +
> +/**
> + * drm_dep_queue_run_job() - submit a job to hardware and set up
> + *   completion tracking
> + * @q: dep queue
> + * @job: job to run
> + *
> + * Accounts @job->credits against the queue, appends the job to the pending
> + * list, then calls @q->ops->run_job(). The TDR timer is started only when
> + * @job is the first entry on the pending list; subsequent jobs added while
> + * a TDR is already in flight do not reset the timer (which would otherwise
> + * extend the deadline for the already-running head job). Stores the returned
> + * hardware fence as the parent of the job's dep fence, then installs
> + * drm_dep_job_done_cb() on it. If the hardware fence is already signalled
> + * (%-ENOENT from dma_fence_add_callback()) or run_job() returns NULL/error,
> + * the job is completed immediately. Must be called under @q->sched.lock.
> + *
> + * Context: Process context. Must hold @q->sched.lock (a mutex). DMA fence
> + * signaling path.
> + */
> +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> +{
> + struct dma_fence *fence;
> + int r;
> +
> + lockdep_assert_held(&q->sched.lock);
> +
> + drm_dep_job_get(job);
> + atomic_add(job->credits, &q->credit.count);
> +
> + scoped_guard(spinlock_irq, &q->job.lock) {
> + bool first = list_empty(&q->job.pending);
> +
> + list_add_tail(&job->pending_link, &q->job.pending);
> + if (first)
> + drm_queue_start_timeout(q);
> + }
> +
> + fence = q->ops->run_job(job);
> + drm_dep_fence_set_parent(job->dfence, fence);
> +
> + if (!IS_ERR_OR_NULL(fence)) {
> + r = dma_fence_add_callback(fence, &job->cb,
> +   drm_dep_job_done_cb);
> + if (r == -ENOENT)
> + drm_dep_job_done(job, fence->error);
> + else if (r)
> + drm_err(q->drm, "fence add callback failed (%d)\n", r);
> + dma_fence_put(fence);
> + } else {
> + drm_dep_job_done(job, IS_ERR(fence) ? PTR_ERR(fence) : 0);
> + }
> +
> + /*
> + * Drop all input dependency fences now, in process context, before the
> + * final job put. Once the job is on the pending list its last reference
> + * may be dropped from a dma_fence callback (IRQ context), where calling
> + * xa_destroy() would be unsafe.
> + */

I assume that “pending” is the list of jobs that have been handed to the driver
via ops->run_job()?

Can’t this problem be solved by not doing anything inside a dma_fence callback
other than scheduling the queue worker?

> + drm_dep_job_drop_dependencies(job);
> + drm_dep_job_put(job);
> +}
> +
> +/**
> + * drm_dep_queue_push_job() - enqueue a job on the SPSC submission queue
> + * @q: dep queue
> + * @job: job to push
> + *
> + * Pushes @job onto the SPSC queue. If the queue was previously empty
> + * (i.e. this is the first pending job), kicks the run_job worker so it
> + * processes the job promptly without waiting for the next wakeup.
> + * May be called with or without @q->sched.lock held.
> + *
> + * Context: Any context. DMA fence signaling path.
> + */
> +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> +{
> + /*
> + * spsc_queue_push() returns true if the queue was previously empty,
> + * i.e. this is the first pending job. Kick the run_job worker so it
> + * picks it up without waiting for the next wakeup.
> + */
> + if (spsc_queue_push(&q->job.queue, &job->queue_node))
> + drm_dep_queue_run_job_queue(q);
> +}
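
If I follow the bypass logic, the internal submit entry point presumably ties
drm_dep_queue_can_job_bypass(), drm_dep_queue_run_job() and
drm_dep_queue_push_job() together roughly like so. A sketch of my
understanding only, since the actual entry point is not in this hunk:

```c
/* Sketch of how I understand the internal submit path fits together. */
static void dep_submit(struct drm_dep_queue *q, struct drm_dep_job *job)
{
	scoped_guard(mutex, &q->sched.lock) {
		if (drm_dep_queue_can_job_bypass(q, job)) {
			/*
			 * No queued work, credits available, no deps:
			 * run inline on the caller's thread.
			 */
			drm_dep_queue_run_job(q, job);
			return;
		}
	}

	/* Otherwise hand off to the ordered submit workqueue. */
	drm_dep_queue_push_job(q, job);
}
```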
> +
> +/**
> + * drm_dep_init() - module initialiser
> + *
> + * Allocates the module-private dep_free_wq unbound workqueue used for
> + * deferred queue teardown.
> + *
> + * Return: 0 on success, %-ENOMEM if workqueue allocation fails.
> + */
> +static int __init drm_dep_init(void)
> +{
> + dep_free_wq = alloc_workqueue("drm_dep_free", WQ_UNBOUND, 0);
> + if (!dep_free_wq)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +/**
> + * drm_dep_exit() - module exit
> + *
> + * Destroys the module-private dep_free_wq workqueue.
> + */
> +static void __exit drm_dep_exit(void)
> +{
> + destroy_workqueue(dep_free_wq);
> + dep_free_wq = NULL;
> +}
> +
> +module_init(drm_dep_init);
> +module_exit(drm_dep_exit);
> +
> +MODULE_DESCRIPTION("DRM dependency queue");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/drivers/gpu/drm/dep/drm_dep_queue.h b/drivers/gpu/drm/dep/drm_dep_queue.h
> new file mode 100644
> index 000000000000..e5c217a3fab5
> --- /dev/null
> +++ b/drivers/gpu/drm/dep/drm_dep_queue.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _DRM_DEP_QUEUE_H_
> +#define _DRM_DEP_QUEUE_H_
> +
> +#include <linux/types.h>
> +
> +struct drm_dep_job;
> +struct drm_dep_queue;
> +
> +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> +  struct drm_dep_job *job);
> +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q);
> +void drm_dep_queue_push_job_end(struct drm_dep_queue *q);
> +#else
> +static inline void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> +{
> +}
> +static inline void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> +{
> +}
> +#endif
> +
> +#endif /* _DRM_DEP_QUEUE_H_ */
> diff --git a/include/drm/drm_dep.h b/include/drm/drm_dep.h
> new file mode 100644
> index 000000000000..615926584506
> --- /dev/null
> +++ b/include/drm/drm_dep.h
> @@ -0,0 +1,597 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright 2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#ifndef _DRM_DEP_H_
> +#define _DRM_DEP_H_
> +
> +#include <drm/spsc_queue.h>
> +#include <linux/dma-fence.h>
> +#include <linux/xarray.h>
> +#include <linux/workqueue.h>
> +
> +enum dma_resv_usage;
> +struct dma_resv;
> +struct drm_dep_fence;
> +struct drm_dep_job;
> +struct drm_dep_queue;
> +struct drm_file;
> +struct drm_gem_object;
> +
> +/**
> + * enum drm_dep_timedout_stat - return value of &drm_dep_queue_ops.timedout_job
> + * @DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED: driver signaled the job's finished
> + *   fence during reset; drm_dep may safely drop its reference to the job.
> + * @DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB: timeout was a false alarm; reinsert the
> + *   job at the head of the pending list so it can complete normally.
> + */
> +enum drm_dep_timedout_stat {
> + DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED,
> + DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB,
> +};
> +
> +/**
> + * struct drm_dep_queue_ops - driver callbacks for a dep queue
> + */
> +struct drm_dep_queue_ops {
> + /**
> + * @run_job: submit the job to hardware. Returns the hardware completion
> + * fence (with a reference held for the scheduler), or NULL/ERR_PTR on
> + * synchronous completion or error.
> + */
> + struct dma_fence *(*run_job)(struct drm_dep_job *job);
> +
> + /**
> + * @timedout_job: called when the TDR fires for the head job. Must stop
> + * the hardware, then return %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED if the
> + * job's fence was signalled during reset, or
> + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB if the timeout was spurious or
> + * signalling was otherwise delayed, and the job should be re-inserted
> + * at the head of the pending list. Any other value triggers a WARN.
> + */
> + enum drm_dep_timedout_stat (*timedout_job)(struct drm_dep_job *job);
> +
> + /**
> + * @release: called when the last kref on the queue is dropped and
> + * drm_dep_queue_fini() has completed.  The driver is responsible for
> + * removing @q from any internal bookkeeping, calling
> + * drm_dep_queue_release(), and then freeing the memory containing @q
> + * (e.g. via kfree_rcu() using @q->rcu).  If NULL, drm_dep calls
> + * drm_dep_queue_release() and frees @q automatically via kfree_rcu().
> + * Use this when the queue is embedded in a larger structure.
> + */
> + void (*release)(struct drm_dep_queue *q);
> +
> + /**
> + * @fini: if set, called instead of drm_dep_queue_fini() when the last
> + * kref is dropped. The driver is responsible for calling
> + * drm_dep_queue_fini() itself after it is done with the queue. Use this
> + * when additional teardown logic must run before fini (e.g., cleanup
> + * firmware resources associated with the queue).
> + */
> + void (*fini)(struct drm_dep_queue *q);
> +};
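
To check my understanding of the fini/release split for an embedded queue, a
driver would implement something like this (foo_* names hypothetical;
foo_run_job/foo_timedout_job assumed to exist elsewhere):

```c
struct foo_queue {
	struct drm_dep_queue base;
	/* driver state ... */
};

static void foo_queue_release(struct drm_dep_queue *q)
{
	struct foo_queue *fq = container_of(q, struct foo_queue, base);

	drm_dep_queue_release(q); /* destroy internal resources */
	kfree_rcu(fq, base.rcu);  /* defer free past in-flight RCU readers */
}

static void foo_queue_fini(struct drm_dep_queue *q)
{
	foo_release_firmware_ctx(q); /* driver teardown before fini */
	drm_dep_queue_fini(q);       /* hand back to drm_dep */
}

static const struct drm_dep_queue_ops foo_queue_ops = {
	.run_job = foo_run_job,
	.timedout_job = foo_timedout_job,
	.release = foo_queue_release,
	.fini = foo_queue_fini,
};
```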
> +
> +/**
> + * enum drm_dep_queue_flags - flags for &drm_dep_queue and
> + *   &drm_dep_queue_init_args
> + *
> + * Flags are divided into three categories:
> + *
> + * - **Private static**: set internally at init time and never changed.
> + *   Drivers must not read or write these.
> + *   %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ,
> + *   %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ.
> + *
> + * - **Public dynamic**: toggled at runtime by drivers via accessors.
> + *   Any modification must be performed under &drm_dep_queue.sched.lock.

Can’t enforce that in C.

> + *   Accessor functions provide unstable reads.
> + *   %DRM_DEP_QUEUE_FLAGS_STOPPED,
> + *   %DRM_DEP_QUEUE_FLAGS_KILLED.

> + *
> + * - **Public static**: supplied by the driver in
> + *   &drm_dep_queue_init_args.flags at queue creation time and not modified
> + *   thereafter.

Same here.

> + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED,
> + *   %DRM_DEP_QUEUE_FLAGS_HIGHPRI,
> + *   %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE.

> + *
> + * @DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ: (private, static) submit workqueue was
> + *   allocated by drm_dep_queue_init() and will be destroyed by
> + *   drm_dep_queue_fini().
> + * @DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ: (private, static) timeout workqueue
> + *   was allocated by drm_dep_queue_init() and will be destroyed by
> + *   drm_dep_queue_fini().
> + * @DRM_DEP_QUEUE_FLAGS_STOPPED: (public, dynamic) the queue is stopped and
> + *   will not dispatch new jobs or remove jobs from the pending list (removal
> + *   is what drops the drm_dep-owned reference). Set by drm_dep_queue_stop(), cleared by
> + *   drm_dep_queue_start().
> + * @DRM_DEP_QUEUE_FLAGS_KILLED: (public, dynamic) the queue has been killed
> + *   via drm_dep_queue_kill(). Any active dependency wait is cancelled
> + *   immediately.  Jobs continue to flow through run_job for bookkeeping
> + *   cleanup, but dependency waiting is skipped so that queued work drains
> + *   as quickly as possible.
> + * @DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED: (public, static) the queue supports
> + *   the bypass path where eligible jobs skip the SPSC queue and run inline.
> + * @DRM_DEP_QUEUE_FLAGS_HIGHPRI: (public, static) the submit workqueue owned
> + *   by the queue is created with %WQ_HIGHPRI, causing run-job and put-job
> + *   workers to execute at elevated priority. Only privileged clients (e.g.
> + *   drivers managing time-critical or real-time GPU contexts) should request
> + *   this flag; granting it to unprivileged userspace would allow priority
> + *   inversion attacks. Has no effect when an external
> + *   &drm_dep_queue_init_args.submit_wq is provided.
> + * @DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE: (public, static) when set,
> + *   drm_dep_job_done() may be called from hardirq context (e.g. from a
> + *   hardware-signalled dma_fence callback). drm_dep_job_done() will directly
> + *   dequeue the job and call drm_dep_job_put() without deferring to a
> + *   workqueue. The driver's &drm_dep_job_ops.release callback must therefore
> + *   be safe to invoke from IRQ context.
> + */
> +enum drm_dep_queue_flags {
> + DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ = BIT(0),
> + DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ = BIT(1),
> + DRM_DEP_QUEUE_FLAGS_STOPPED = BIT(2),
> + DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED = BIT(3),
> + DRM_DEP_QUEUE_FLAGS_HIGHPRI = BIT(4),
> + DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE = BIT(5),
> + DRM_DEP_QUEUE_FLAGS_KILLED = BIT(6),
> +};
> +
> +/**
> + * struct drm_dep_queue - a dependency-tracked GPU submission queue
> + *
> + * Combines the role of &drm_gpu_scheduler and &drm_sched_entity into a single
> + * object.  Each queue owns a submit workqueue (or borrows one), a timeout
> + * workqueue, an SPSC submission queue, and a pending-job list used for TDR.
> + *
> + * Initialise with drm_dep_queue_init(), tear down with drm_dep_queue_fini().
> + * Reference counted via drm_dep_queue_get() / drm_dep_queue_put().
> + *
> + * All fields are **opaque to drivers**.  Do not read or write any field

Can’t enforce this in C.

> + * directly; use the provided helper functions instead.  The sole exception
> + * is @rcu, which drivers may pass to kfree_rcu() when the queue is embedded
> + * inside a larger driver-managed structure and the &drm_dep_queue_ops.release
> + * vfunc performs an RCU-deferred free.

> + */
> +struct drm_dep_queue {
> + /** @ops: driver callbacks, set at init time. */
> + const struct drm_dep_queue_ops *ops;
> + /** @name: human-readable name used for workqueue and fence naming. */
> + const char *name;
> + /** @drm: owning DRM device; a drm_dev_get() reference is held for the
> + *  lifetime of the queue to prevent module unload while queues are live.
> + */
> + struct drm_device *drm;
> + /** @refcount: reference count; use drm_dep_queue_get/put(). */
> + struct kref refcount;
> + /**
> + * @free_work: deferred teardown work queued unconditionally by
> + * drm_dep_queue_fini() onto the module-private dep_free_wq.  The work
> + * item disables pending workers synchronously and destroys any owned
> + * workqueues before releasing the queue memory and dropping the
> + * drm_dev_get() reference.  Running on dep_free_wq ensures
> + * destroy_workqueue() is never called from within one of the queue's
> + * own workers.
> + */
> + struct work_struct free_work;
> + /**
> + * @rcu: RCU head for deferred freeing.
> + *
> + * This is the **only** field drivers may access directly.  When the

We can enforce this in Rust at compile time.

> + * queue is embedded in a larger structure, implement
> + * &drm_dep_queue_ops.release, call drm_dep_queue_release() to destroy
> + * internal resources, then pass this field to kfree_rcu() so that any
> + * in-flight RCU readers referencing the queue's dma_fence timeline name
> + * complete before the memory is returned.  All other fields must be
> + * accessed through the provided helpers.
> + */
> + struct rcu_head rcu;
> +
> + /** @sched: scheduling and workqueue state. */
> + struct {
> + /** @sched.submit_wq: ordered workqueue for run/put-job work. */
> + struct workqueue_struct *submit_wq;
> + /** @sched.timeout_wq: workqueue for the TDR delayed work. */
> + struct workqueue_struct *timeout_wq;
> + /**
> + * @sched.run_job: work item that dispatches the next queued
> + * job.
> + */
> + struct work_struct run_job;
> + /** @sched.put_job: work item that frees finished jobs. */
> + struct work_struct put_job;
> + /** @sched.tdr: delayed work item for timeout/reset (TDR). */
> + struct delayed_work tdr;
> + /**
> + * @sched.lock: mutex serialising job dispatch, bypass
> + * decisions, stop/start, and flag updates.
> + */
> + struct mutex lock;
> + /**
> + * @sched.flags: bitmask of &enum drm_dep_queue_flags.
> + * Any modification after drm_dep_queue_init() must be
> + * performed under @sched.lock.
> + */
> + enum drm_dep_queue_flags flags;
> + } sched;
> +
> + /** @job: pending-job tracking state. */
> + struct {
> + /**
> + * @job.pending: list of jobs that have been dispatched to
> + * hardware and not yet freed. Protected by @job.lock.
> + */
> + struct list_head pending;
> + /**
> + * @job.queue: SPSC queue of jobs waiting to be dispatched.
> + * Producers push via drm_dep_queue_push_job(); the run_job
> + * work item pops from the consumer side.
> + */
> + struct spsc_queue queue;
> + /**
> + * @job.lock: spinlock protecting @job.pending, TDR start, and
> + * the %DRM_DEP_QUEUE_FLAGS_STOPPED flag. Always acquired with
> + * irqsave (spin_lock_irqsave / spin_unlock_irqrestore) to
> + * support %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE queues where
> + * drm_dep_job_done() may run from hardirq context.
> + */
> + spinlock_t lock;
> + /**
> + * @job.timeout: per-job TDR timeout in jiffies.
> + * %MAX_SCHEDULE_TIMEOUT means no timeout.
> + */
> + long timeout;
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> + /**
> + * @job.push: lockdep annotation tracking the arm-to-push
> + * critical section.
> + */
> + struct {
> + /*
> + * @job.push.owner: task that currently holds the push
> + * context, used to assert single-owner invariants.
> + * NULL when idle.
> + */
> + struct task_struct *owner;
> + } push;
> +#endif
> + } job;
> +
> + /** @credit: hardware credit accounting. */
> + struct {
> + /** @credit.limit: maximum credits the queue can hold. */
> + u32 limit;
> + /** @credit.count: credits currently in flight (atomic). */
> + atomic_t count;
> + } credit;
> +
> + /** @dep: current blocking dependency for the head SPSC job. */
> + struct {
> + /**
> + * @dep.fence: fence being waited on before the head job can
> + * run. NULL when no dependency is pending.
> + */
> + struct dma_fence *fence;
> + /**
> + * @dep.removed_fence: dependency fence whose callback has been
> + * removed.  The run-job worker must drop its reference to this
> + * fence before proceeding to call run_job.

We can enforce this in Rust automatically.

> + */
> + struct dma_fence *removed_fence;
> + /** @dep.cb: callback installed on @dep.fence. */
> + struct dma_fence_cb cb;
> + } dep;
> +
> + /** @fence: fence context and sequence number state. */
> + struct {
> + /**
> + * @fence.seqno: next sequence number to assign, incremented
> + * each time a job is armed.
> + */
> + u32 seqno;
> + /**
> + * @fence.context: base DMA fence context allocated at init
> + * time. Finished fences use this context.
> + */
> + u64 context;
> + } fence;
> +};
> +
> +/**
> + * struct drm_dep_queue_init_args - arguments for drm_dep_queue_init()
> + */
> +struct drm_dep_queue_init_args {
> + /** @ops: driver callbacks; must not be NULL. */
> + const struct drm_dep_queue_ops *ops;
> + /** @name: human-readable name for workqueues and fence timelines. */
> + const char *name;
> + /** @drm: owning DRM device. A drm_dev_get() reference is taken at
> + *  queue init and released when the queue is freed, preventing module
> + *  unload while any queue is still alive.
> + */
> + struct drm_device *drm;
> + /**
> + * @submit_wq: workqueue for job dispatch. If NULL, an ordered
> + * workqueue is allocated and owned by the queue.  If non-NULL, the
> + * workqueue must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> + * drm_dep_queue_init() returns %-EINVAL otherwise.
> + */
> + struct workqueue_struct *submit_wq;
> + /**
> + * @timeout_wq: workqueue for TDR. If NULL, an ordered workqueue
> + * is allocated and owned by the queue.  If non-NULL, the workqueue
> + * must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> + * drm_dep_queue_init() returns %-EINVAL otherwise.
> + */
> + struct workqueue_struct *timeout_wq;
> + /** @credit_limit: maximum hardware credits; must be non-zero. */
> + u32 credit_limit;
> + /**
> + * @timeout: per-job TDR timeout in jiffies. Zero means no timeout
> + * (%MAX_SCHEDULE_TIMEOUT is used internally).
> + */
> + long timeout;
> + /**
> + * @flags: initial queue flags. %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ
> + * and %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ are managed internally
> + * and will be ignored if set here. Setting
> + * %DRM_DEP_QUEUE_FLAGS_HIGHPRI requests a high-priority submit
> + * workqueue; drivers must only set this for privileged clients.
> + */
> + enum drm_dep_queue_flags flags;
> +};
> +
> +/**
> + * struct drm_dep_job_ops - driver callbacks for a dep job
> + */
> +struct drm_dep_job_ops {
> + /**
> + * @release: called when the last reference to the job is dropped.
> + *
> + * If set, the driver is responsible for freeing the job. If NULL,

And if they don’t?

By the way, we can also enforce this in Rust.
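With Drop, the release hook cannot be forgotten: the compiler inserts the cleanup call on every exit path, exactly once. A toy sketch (all names made up, not the drm_dep API):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Counts how many times the job was released.
static RELEASES: AtomicU32 = AtomicU32::new(0);

// Driver-side job data; no `release` vfunc is needed because Drop
// *is* the release hook and the compiler guarantees it runs.
struct DriverJob {
    credits: u32,
}

impl Drop for DriverJob {
    fn drop(&mut self) {
        // Kernel analogue: the .release callback / kfree().
        RELEASES.fetch_add(1, Ordering::SeqCst);
    }
}

// Takes ownership of the job; when the last owner is done, the job
// is released automatically, exactly once.
fn submit(job: DriverJob) -> u32 {
    job.credits
} // `job` dropped here
```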

> + * drm_dep_job_put() will call kfree() on the job directly.
> + */
> + void (*release)(struct drm_dep_job *job);
> +};
> +
> +/**
> + * struct drm_dep_job - a unit of work submitted to a dep queue
> + *
> + * All fields are **opaque to drivers**.  Do not read or write any field
> + * directly; use the provided helper functions instead.
> + */
> +struct drm_dep_job {
> + /** @ops: driver callbacks for this job. */
> + const struct drm_dep_job_ops *ops;
> + /** @refcount: reference count, managed by drm_dep_job_get/put(). */
> + struct kref refcount;
> + /**
> + * @dependencies: xarray of &dma_fence dependencies before the job can
> + * run.
> + */
> + struct xarray dependencies;
> + /** @q: the queue this job is submitted to. */
> + struct drm_dep_queue *q;
> + /** @queue_node: SPSC queue linkage for pending submission. */
> + struct spsc_node queue_node;
> + /**
> + * @pending_link: list entry in the queue's pending job list. Protected
> + * by @job.q->job.lock.
> + */
> + struct list_head pending_link;
> + /** @dfence: finished fence for this job. */
> + struct drm_dep_fence *dfence;
> + /** @cb: fence callback used to watch for dependency completion. */
> + struct dma_fence_cb cb;
> + /** @credits: number of credits this job consumes from the queue. */
> + u32 credits;
> + /**
> + * @last_dependency: index into @dependencies of the next fence to
> + * check. Advanced by drm_dep_queue_job_dependency() as each
> + * dependency is consumed.
> + */
> + u32 last_dependency;
> + /**
> + * @invalidate_count: number of times this job has been invalidated.
> + * Incremented by drm_dep_job_invalidate_job().
> + */
> + u32 invalidate_count;
> + /**
> + * @signalling_cookie: return value of dma_fence_begin_signalling()
> + * captured in drm_dep_job_arm() and consumed by drm_dep_job_push().
> + * Not valid outside the arm→push window.
> + */
> + bool signalling_cookie;
> +};
> +
> +/**
> + * struct drm_dep_job_init_args - arguments for drm_dep_job_init()
> + */
> +struct drm_dep_job_init_args {
> + /**
> + * @ops: driver callbacks for the job, or NULL for default behaviour.
> + */
> + const struct drm_dep_job_ops *ops;
> + /** @q: the queue to associate the job with. A reference is taken. */
> + struct drm_dep_queue *q;
> + /** @credits: number of credits this job consumes; must be non-zero. */
> + u32 credits;
> +};
> +
> +/* Queue API */
> +
> +/**
> + * drm_dep_queue_sched_guard() - acquire the queue scheduler lock as a guard
> + * @__q: dep queue whose scheduler lock to acquire
> + *
> + * Acquires @__q->sched.lock as a scoped mutex guard (released automatically
> + * when the enclosing scope exits).  This lock serialises all scheduler state
> + * transitions — stop/start/kill flag changes, bypass-path decisions, and the
> + * run-job worker — so it must be held when the driver needs to atomically
> + * inspect or modify queue state in relation to job submission.
> + *
> + * **When to use**
> + *
> + * Drivers that set %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED and wish to
> + * serialise their own submit work against the bypass path must acquire this
> + * guard.  Without it, a concurrent caller of drm_dep_job_push() could take
> + * the bypass path and call ops->run_job() inline between the driver's
> + * eligibility check and its corresponding action, producing a race.

So if you’re not careful, you have just introduced a race :/
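The documented pattern is check-and-act under a single critical section. A toy model of the invariant the guard is supposed to provide (plain Rust, not the kernel API):

```rust
use std::sync::Mutex;

struct QueueState {
    stopped: bool,
    dispatched: u32,
}

// Stands in for the queue's scheduler lock.
struct Queue {
    sched: Mutex<QueueState>,
}

impl Queue {
    // The guard is held for the whole scope, so no concurrent
    // bypass-path push can slip in between the eligibility check
    // and the corresponding action.
    fn try_dispatch(&self) -> bool {
        let mut st = self.sched.lock().unwrap();
        if st.stopped {
            return false;
        }
        st.dispatched += 1;
        true
    } // lock released here, like a scoped guard()
}
```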

> + *
> + * **Constraint: only from submit_wq worker context**
> + *
> + * This guard must only be acquired from a work item running on the queue's
> + * submit workqueue (@q->sched.submit_wq) by drivers.
> + *
> + * Context: Process context only; must be called from submit_wq work by
> + * drivers.
> + */
> +#define drm_dep_queue_sched_guard(__q) \
> + guard(mutex)(&(__q)->sched.lock)
> +
> +int drm_dep_queue_init(struct drm_dep_queue *q,
> +       const struct drm_dep_queue_init_args *args);
> +void drm_dep_queue_fini(struct drm_dep_queue *q);
> +void drm_dep_queue_release(struct drm_dep_queue *q);
> +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q);
> +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q);
> +void drm_dep_queue_put(struct drm_dep_queue *q);
> +void drm_dep_queue_stop(struct drm_dep_queue *q);
> +void drm_dep_queue_start(struct drm_dep_queue *q);
> +void drm_dep_queue_kill(struct drm_dep_queue *q);
> +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q);
> +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q);
> +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q);
> +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> + struct work_struct *work);
> +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q);
> +bool drm_dep_queue_is_killed(struct drm_dep_queue *q);
> +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q);
> +void drm_dep_queue_set_stopped(struct drm_dep_queue *q);
> +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q);
> +long drm_dep_queue_timeout(const struct drm_dep_queue *q);
> +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q);
> +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q);
> +
> +/* Job API */
> +
> +/**
> + * DRM_DEP_JOB_FENCE_PREALLOC - sentinel value for pre-allocating a dependency slot
> + *
> + * Pass this to drm_dep_job_add_dependency() instead of a real fence to
> + * pre-allocate a slot in the job's dependency xarray during the preparation
> + * phase (where GFP_KERNEL is available).  The returned xarray index identifies
> + * the slot.  Call drm_dep_job_replace_dependency() later — inside a
> + * dma_fence_begin_signalling() region if needed — to swap in the real fence
> + * without further allocation.
> + *
> + * This sentinel is never treated as a dma_fence; it carries no reference count
> + * and must not be passed to dma_fence_put().  It is only valid as an argument
> + * to drm_dep_job_add_dependency() and as the expected stored value checked by
> + * drm_dep_job_replace_dependency().
> + */
> +#define DRM_DEP_JOB_FENCE_PREALLOC ((struct dma_fence *)-1)
> +
> +int drm_dep_job_init(struct drm_dep_job *job,
> +     const struct drm_dep_job_init_args *args);
> +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job);
> +void drm_dep_job_put(struct drm_dep_job *job);
> +void drm_dep_job_arm(struct drm_dep_job *job);
> +void drm_dep_job_push(struct drm_dep_job *job);
> +int drm_dep_job_add_dependency(struct drm_dep_job *job,
> +       struct dma_fence *fence);
> +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> +    struct dma_fence *fence);
> +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> +       struct drm_file *file, u32 handle,
> +       u32 point);
> +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> +      struct dma_resv *resv,
> +      enum dma_resv_usage usage);
> +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> +  struct drm_gem_object *obj,
> +  bool write);
> +bool drm_dep_job_is_signaled(struct drm_dep_job *job);
> +bool drm_dep_job_is_finished(struct drm_dep_job *job);
> +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold);
> +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job);
> +
> +/**
> + * struct drm_dep_queue_pending_job_iter - iterator state for
> + *   drm_dep_queue_for_each_pending_job()
> + * @q: queue being iterated
> + */
> +struct drm_dep_queue_pending_job_iter {
> + struct drm_dep_queue *q;
> +};
> +
> +/* Drivers should never call this directly */

Not enforceable in C.
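In Rust, module privacy turns "drivers should never call this directly" from a comment into a compile error. A toy sketch:

```rust
mod dep {
    // Private to the module: code outside `dep` cannot even name
    // this function, so there is no way to call it directly.
    fn pending_iter_begin() -> u32 {
        0
    }

    // The public entry point is the only way in.
    pub fn for_each_pending(count: u32) -> u32 {
        let start = pending_iter_begin();
        start + count
    }
}
```

Attempting `dep::pending_iter_begin()` from outside the module is rejected by the compiler.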

> +static inline struct drm_dep_queue_pending_job_iter
> +__drm_dep_queue_pending_job_iter_begin(struct drm_dep_queue *q)
> +{
> + struct drm_dep_queue_pending_job_iter iter = {
> + .q = q,
> + };
> +
> + WARN_ON(!drm_dep_queue_is_stopped(q));
> + return iter;
> +}
> +
> +/* Drivers should never call this directly */
> +static inline void
> +__drm_dep_queue_pending_job_iter_end(struct drm_dep_queue_pending_job_iter iter)
> +{
> + WARN_ON(!drm_dep_queue_is_stopped(iter.q));
> +}
> +
> +/* clang-format off */
> +DEFINE_CLASS(drm_dep_queue_pending_job_iter,
> +     struct drm_dep_queue_pending_job_iter,
> +     __drm_dep_queue_pending_job_iter_end(_T),
> +     __drm_dep_queue_pending_job_iter_begin(__q),
> +     struct drm_dep_queue *__q);
> +/* clang-format on */
> +static inline void *
> +class_drm_dep_queue_pending_job_iter_lock_ptr(
> + class_drm_dep_queue_pending_job_iter_t *_T)
> +{ return _T; }
> +#define class_drm_dep_queue_pending_job_iter_is_conditional false
> +
> +/**
> + * drm_dep_queue_for_each_pending_job() - iterate over all pending jobs
> + *   in a queue
> + * @__job: loop cursor, a &struct drm_dep_job pointer
> + * @__q: &struct drm_dep_queue to iterate
> + *
> + * Iterates over every job currently on @__q->job.pending. The queue must be
> + * stopped (drm_dep_queue_stop() called) before using this iterator; a WARN_ON
> + * fires at the start and end of the scope if it is not.
> + *
> + * Context: Any context.
> + */
> +#define drm_dep_queue_for_each_pending_job(__job, __q) \
> + scoped_guard(drm_dep_queue_pending_job_iter, (__q)) \
> + list_for_each_entry((__job), &(__q)->job.pending, pending_link)
> +
> +#endif
> -- 
> 2.34.1
> 


By the way:

I invite you to have a look at this implementation [0]. It currently works on
real hardware, i.e. our downstream "Tyr" driver for Arm Mali is using it at the
moment. It is a prototype that we’ve put together to test different
approaches, so it’s not meant to be a “solution” at all, just a data point
for further discussion.

Philipp Stanner is working on this “Job Queue” concept too, but from an upstream
perspective.

[0]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/61


* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  2:47   ` [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer Daniel Almeida
@ 2026-03-17  5:45     ` Matthew Brost
  2026-03-17  7:17       ` Miguel Ojeda
  2026-03-17 18:14       ` Matthew Brost
  2026-03-17 12:31     ` Danilo Krummrich
  1 sibling, 2 replies; 21+ messages in thread
From: Matthew Brost @ 2026-03-17  5:45 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: intel-xe, dri-devel, Boris Brezillon, Tvrtko Ursulin,
	Rodrigo Vivi, Thomas Hellström, Christian König,
	Danilo Krummrich, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Philipp Stanner, Simona Vetter, Sumit Semwal, Thomas Zimmermann,
	linux-kernel, Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl,
	Daniel Stone, Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Mon, Mar 16, 2026 at 11:47:01PM -0300, Daniel Almeida wrote:
> (+cc a few other people + Rust-for-Linux ML)
> 
> Hi Matthew,
> 
> I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
> is going, it is much more ergonomic to add a Rust component with a nice C
> interface than doing it the other way around.
>

Holy war? See my reply to Danilo — I’ll write this in Rust if needed,
but it’s not my first choice since I’m not yet a native speaker.
 
> > On 16 Mar 2026, at 01:32, Matthew Brost <matthew.brost@intel.com> wrote:
> > 
> > Diverging requirements between GPU drivers using firmware scheduling
> > and those using hardware scheduling have shown that drm_gpu_scheduler is
> > no longer sufficient for firmware-scheduled GPU drivers. The technical
> > debt, lack of memory-safety guarantees, absence of clear object-lifetime
> > rules, and numerous driver-specific hacks have rendered
> > drm_gpu_scheduler unmaintainable. It is time for a fresh design for
> > firmware-scheduled GPU drivers—one that addresses all of the
> > aforementioned shortcomings.
> > 
> > Add drm_dep, a lightweight GPU submission queue intended as a
> > replacement for drm_gpu_scheduler for firmware-managed GPU schedulers
> > (e.g. Xe, Panthor, AMDXDNA, PVR, Nouveau, Nova). Unlike
> > drm_gpu_scheduler, which separates the scheduler (drm_gpu_scheduler)
> > from the queue (drm_sched_entity) into two objects requiring external
> > coordination, drm_dep merges both roles into a single struct
> > drm_dep_queue. This eliminates the N:1 entity-to-scheduler mapping
> > that is unnecessary for firmware schedulers which manage their own
> > run-lists internally.
> > 
> > Unlike drm_gpu_scheduler, which relies on external locking and lifetime
> > management by the driver, drm_dep uses reference counting (kref) on both
> > queues and jobs to guarantee object lifetime safety. A job holds a queue
> 
> In a domain that has been plagued by lifetime issues, we really should be

Yes, drm sched is a mess. I’ve been suggesting we fix it for years and
have met pushback. This, however (drm dep), isn’t plagued by lifetime
issues — that’s the primary focus here.

> enforcing RAII for resource management instead of manual calls.
> 

You can do RAII in C - see cleanup.h. Clear object lifetimes and
ownership are what is important. Disciplined coding is the only way to
do this regardless of language. RAII doesn't help with bad object
models / ownership / lifetime models either.

I don't buy the "Rust solves everything" argument, but again, I'm a
non-native speaker.

> > reference from init until its last put, and the queue holds a job reference
> > from dispatch until the put_job worker runs. This makes use-after-free
> > impossible even when completion arrives from IRQ context or concurrent
> > teardown is in flight.
> 
> It makes use-after-free impossible _if_ you’re careful. It is not a
> property of the type system, and incorrect code will compile just fine.
> 

Sure. If a driver puts a drm_dep object reference on a resource that
drm_dep owns, it will explode. That’s effectively putting a reference on
a resource the driver doesn’t own. A driver can write to any physical
memory and crash the system anyway, so I’m not really sure what we’re
talking about here. Rust doesn’t solve anything in this scenario — you
can always use an unsafe block and put a reference on a resource you
don’t own.

Object model, ownership, and lifetimes are what matter, and that is
what drm dep is built around.

> > 
> > The core objects are:
> > 
> >  struct drm_dep_queue - a per-context submission queue owning an
> >    ordered submit workqueue, a TDR timeout workqueue, an SPSC job
> >    queue, and a pending-job list. Reference counted; drivers can embed
> >    it and provide a .release vfunc for RCU-safe teardown.
> > 
> >  struct drm_dep_job - a single unit of GPU work. Drivers embed this
> >    and provide a .release vfunc. Jobs carry an xarray of input
> >    dma_fence dependencies and produce a drm_dep_fence as their
> >    finished fence.
> > 
> >  struct drm_dep_fence - a dma_fence subclass wrapping an optional
> >    parent hardware fence. The finished fence is armed (sequence
> >    number assigned) before submission and signals when the hardware
> >    fence signals (or immediately on synchronous completion).
> > 
> > Job lifecycle:
> >  1. drm_dep_job_init() - allocate and initialise; job acquires a
> >     queue reference.
> >  2. drm_dep_job_add_dependency() and friends - register input fences;
> >     duplicates from the same context are deduplicated.
> >  3. drm_dep_job_arm() - assign sequence number, obtain finished fence.
> >  4. drm_dep_job_push() - submit to queue.
> 
> You cannot enforce this sequence easily in C code. Once again, we are trusting
> drivers that it is followed, but in Rust, you can simply reject code that does
> not follow this order at compile time.
> 

I don’t know Rust, but yes — you can enforce this in C. It’s called
lockdep and annotations. It’s not compile-time, but all of this is
strictly enforced. E.g., write some code that doesn’t follow this and
report back if the kernel doesn’t explode. It will, and if it doesn’t,
I’ll fix it to complain.
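For concreteness, the compile-time ordering Daniel describes is usually expressed with the typestate pattern: arm() consumes an un-armed job and only the armed type has push(), so calling out of order fails to compile. A toy sketch with hypothetical names (not the drm_dep API):

```rust
struct Init;
struct Armed {
    seqno: u32,
}

// The job's lifecycle stage is part of its type.
struct Job<State> {
    credits: u32,
    state: State,
}

fn job_init(credits: u32) -> Job<Init> {
    Job { credits, state: Init }
}

impl Job<Init> {
    // Consumes the Init-state job; only an armed job comes back.
    fn arm(self, seqno: u32) -> Job<Armed> {
        Job { credits: self.credits, state: Armed { seqno } }
    }
    // No push() on Job<Init>: "push before arm" cannot compile.
}

impl Job<Armed> {
    fn push(self) -> u32 {
        self.state.seqno
    }
}
```

The runtime (lockdep) and compile-time (typestate) checks are not mutually exclusive; they catch the same misuse at different stages.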

> 
> > 
> > Submission paths under queue lock:
> >  - Bypass path: if DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the
> >    SPSC queue is empty, no dependencies are pending, and credits are
> >    available, the job is dispatched inline on the calling thread.
> >  - Queued path: job is pushed onto the SPSC queue and the run_job
> >    worker is kicked. The worker resolves remaining dependencies
> >    (installing wakeup callbacks for unresolved fences) before calling
> >    ops->run_job().
> > 
> > Credit-based throttling prevents hardware overflow: each job declares
> > a credit cost at init time; dispatch is deferred until sufficient
> > credits are available.
> 
> Why can’t we design an API where the driver can refuse jobs in
> ops->run_job() if there are no resources to run it? This would do away with the
> credit system that has been in place for quite a while. Has this approach been
> tried in the past?
> 

That seems possible if this is the preferred option. -EAGAIN is the way
to do this. I’m open to the idea, but we also need to weigh the cost of
converting drivers against the number of changes required.

Partial reply - I'll catch up with the rest later.

Appreciate the feedback.

Matt

> 
> > 
> > Timeout Detection and Recovery (TDR): a per-queue delayed work item
> > fires when the head pending job exceeds q->job.timeout jiffies, calling
> > ops->timedout_job(). drm_dep_queue_trigger_timeout() forces immediate
> > expiry for device teardown.
> > 
> > IRQ-safe completion: queues flagged DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE
> > allow drm_dep_job_done() to be called from hardirq context (e.g. a
> > dma_fence callback). Dependency cleanup is deferred to process context
> > after ops->run_job() returns to avoid calling xa_destroy() from IRQ.
> > 
> > Zombie-state guard: workers use kref_get_unless_zero() on entry and
> > bail immediately if the queue refcount has already reached zero and
> > async teardown is in flight, preventing use-after-free.
> 
> In rust, when you queue work, you have to pass a reference-counted pointer
> (Arc<T>). We simply never have this problem in a Rust design. If there is work
> queued, the queue is alive.
> 
> By the way, why can’t we simply require synchronous teardowns?
> 
> > 
> > Teardown is always deferred to a module-private workqueue (dep_free_wq)
> > so that destroy_workqueue() is never called from within one of the
> > queue's own workers. Each queue holds a drm_dev_get() reference on its
> > owning struct drm_device, released as the final step of teardown via
> > drm_dev_put(). This prevents the driver module from being unloaded
> > while any queue is still alive without requiring a separate drain API.
> > 
> > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <mripard@kernel.org>
> > Cc: Philipp Stanner <phasta@kernel.org>
> > Cc: Simona Vetter <simona@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > ---
> > drivers/gpu/drm/Kconfig             |    4 +
> > drivers/gpu/drm/Makefile            |    1 +
> > drivers/gpu/drm/dep/Makefile        |    5 +
> > drivers/gpu/drm/dep/drm_dep_fence.c |  406 +++++++
> > drivers/gpu/drm/dep/drm_dep_fence.h |   25 +
> > drivers/gpu/drm/dep/drm_dep_job.c   |  675 +++++++++++
> > drivers/gpu/drm/dep/drm_dep_job.h   |   13 +
> > drivers/gpu/drm/dep/drm_dep_queue.c | 1647 +++++++++++++++++++++++++++
> > drivers/gpu/drm/dep/drm_dep_queue.h |   31 +
> > include/drm/drm_dep.h               |  597 ++++++++++
> > 10 files changed, 3404 insertions(+)
> > create mode 100644 drivers/gpu/drm/dep/Makefile
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.c
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.h
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_job.c
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_job.h
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.c
> > create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.h
> > create mode 100644 include/drm/drm_dep.h
> > 
> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> > index 5386248e75b6..834f6e210551 100644
> > --- a/drivers/gpu/drm/Kconfig
> > +++ b/drivers/gpu/drm/Kconfig
> > @@ -276,6 +276,10 @@ config DRM_SCHED
> > tristate
> > depends on DRM
> > 
> > +config DRM_DEP
> > + tristate
> > + depends on DRM
> > +
> > # Separate option as not all DRM drivers use it
> > config DRM_PANEL_BACKLIGHT_QUIRKS
> > tristate
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index e97faabcd783..1ad87cc0e545 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -173,6 +173,7 @@ obj-y += clients/
> > obj-y += display/
> > obj-$(CONFIG_DRM_TTM) += ttm/
> > obj-$(CONFIG_DRM_SCHED) += scheduler/
> > +obj-$(CONFIG_DRM_DEP) += dep/
> > obj-$(CONFIG_DRM_RADEON)+= radeon/
> > obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
> > obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
> > diff --git a/drivers/gpu/drm/dep/Makefile b/drivers/gpu/drm/dep/Makefile
> > new file mode 100644
> > index 000000000000..335f1af46a7b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +drm_dep-y := drm_dep_queue.o drm_dep_job.o drm_dep_fence.o
> > +
> > +obj-$(CONFIG_DRM_DEP) += drm_dep.o
> > diff --git a/drivers/gpu/drm/dep/drm_dep_fence.c b/drivers/gpu/drm/dep/drm_dep_fence.c
> > new file mode 100644
> > index 000000000000..ae05b9077772
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_fence.c
> > @@ -0,0 +1,406 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +/**
> > + * DOC: DRM dependency fence
> > + *
> > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > + * provides a single dma_fence (@finished) signalled when the hardware
> > + * completes the job.
> > + *
> > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > + * is signalled once @parent signals (or immediately if run_job() returns
> > + * NULL or an error).
> 
> I thought this fence proxy mechanism was going away due to recent work being
> carried out by Christian?
> 
> > + *
> > + * Drivers should expose @finished as the out-fence for GPU work since it is
> > + * valid from the moment drm_dep_job_arm() returns, whereas the hardware fence
> > + * could be a compound fence, which is disallowed when installed into
> > + * drm_syncobjs or dma-resv.
> > + *
> > + * The fence uses the kernel's inline spinlock (NULL passed to dma_fence_init())
> > + * so no separate lock allocation is required.
> > + *
> > + * Deadline propagation is supported: if a consumer sets a deadline via
> > + * dma_fence_set_deadline(), it is forwarded to @parent when @parent is set.
> > + * If @parent has not been set yet the deadline is stored in @deadline and
> > + * forwarded at that point.
> > + *
> > + * Memory management: drm_dep_fence objects are allocated with kzalloc() and
> > + * freed via kfree_rcu() once the fence is released, ensuring safety with
> > + * RCU-protected fence accesses.
> > + */
> > +
> > +#include <linux/slab.h>
> > +#include <drm/drm_dep.h>
> > +#include "drm_dep_fence.h"
> > +
> > +/**
> > + * DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT - a fence deadline hint has been set
> > + *
> > + * Set by the deadline callback on the finished fence to indicate a deadline
> > + * has been set which may need to be propagated to the parent hardware fence.
> > + */
> > +#define DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT (DMA_FENCE_FLAG_USER_BITS + 1)
> > +
> > +/**
> > + * struct drm_dep_fence - fence tracking the completion of a dep job
> > + *
> > + * Contains a single dma_fence (@finished) that is signalled when the
> > + * hardware completes the job. The fence uses the kernel's inline_lock
> > + * (no external spinlock required).
> > + *
> > + * This struct is private to the drm_dep module; external code interacts
> > + * through the accessor functions declared in drm_dep_fence.h.
> > + */
> > +struct drm_dep_fence {
> > + /**
> > + * @finished: signalled when the job completes on hardware.
> > + *
> > + * Drivers should use this fence as the out-fence for a job since it
> > + * is available immediately upon drm_dep_job_arm().
> > + */
> > + struct dma_fence finished;
> > +
> > + /**
> > + * @deadline: deadline set on @finished which potentially needs to be
> > + * propagated to @parent.
> > + */
> > + ktime_t deadline;
> > +
> > + /**
> > + * @parent: The hardware fence returned by &drm_dep_queue_ops.run_job.
> > + *
> > + * @finished is signaled once @parent is signaled. The initial store is
> > + * performed via smp_store_release to synchronize with deadline handling.
> > + *
> > + * All readers must access this under the fence lock and take a reference to
> > + * it, as @parent is set to NULL under the fence lock when the drm_dep_fence
> > + * signals, and this drop also releases its internal reference.
> > + */
> > + struct dma_fence *parent;
> > +
> > + /**
> > + * @q: the queue this fence belongs to.
> > + */
> > + struct drm_dep_queue *q;
> > +};
> > +
> > +static const struct dma_fence_ops drm_dep_fence_ops;
> > +
> > +/**
> > + * to_drm_dep_fence() - cast a dma_fence to its enclosing drm_dep_fence
> > + * @f: dma_fence to cast
> > + *
> > + * Context: No context requirements (inline helper).
> > + * Return: pointer to the enclosing &drm_dep_fence.
> > + */
> > +static struct drm_dep_fence *to_drm_dep_fence(struct dma_fence *f)
> > +{
> > + return container_of(f, struct drm_dep_fence, finished);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_set_parent() - store the hardware fence and propagate
> > + *   any deadline
> > + * @dfence: dep fence
> > + * @parent: hardware fence returned by &drm_dep_queue_ops.run_job, or NULL/error
> > + *
> > + * Stores @parent on @dfence under smp_store_release() so that a concurrent
> > + * drm_dep_fence_set_deadline() call sees the parent before checking the
> > + * deadline bit. If a deadline has already been set on @dfence->finished it is
> > + * forwarded to @parent immediately. Does nothing if @parent is NULL or an
> > + * error pointer.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> > +      struct dma_fence *parent)
> > +{
> > + if (IS_ERR_OR_NULL(parent))
> > + return;
> > +
> > + /*
> > + * smp_store_release() to ensure a thread racing us in
> > + * drm_dep_fence_set_deadline() sees the parent set before
> > + * it calls test_bit(HAS_DEADLINE_BIT).
> > + */
> > + smp_store_release(&dfence->parent, dma_fence_get(parent));
> > + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT,
> > +     &dfence->finished.flags))
> > + dma_fence_set_deadline(parent, dfence->deadline);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_finished() - signal the finished fence with a result
> > + * @dfence: dep fence to signal
> > + * @result: error code to set, or 0 for success
> > + *
> > + * Sets the fence error to @result if non-zero, then signals
> > + * @dfence->finished. Also removes parent visibility under the fence lock
> > + * and drops the parent reference. Dropping the parent here allows the
> > + * DRM dep fence to be completely decoupled from the DRM dep module.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_fence_finished(struct drm_dep_fence *dfence, int result)
> > +{
> > + struct dma_fence *parent;
> > + unsigned long flags;
> > +
> > + dma_fence_lock_irqsave(&dfence->finished, flags);
> > + if (result)
> > + dma_fence_set_error(&dfence->finished, result);
> > + dma_fence_signal_locked(&dfence->finished);
> > + parent = dfence->parent;
> > + dfence->parent = NULL;
> > + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> > +
> > + dma_fence_put(parent);
> > +}
> 
> We should really try to move away from manual locks and unlocks.
> 
> > +
> > +static const char *drm_dep_fence_get_driver_name(struct dma_fence *fence)
> > +{
> > + return "drm_dep";
> > +}
> > +
> > +static const char *drm_dep_fence_get_timeline_name(struct dma_fence *f)
> > +{
> > + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> > +
> > + return dfence->q->name;
> > +}
> > +
> > +/**
> > + * drm_dep_fence_get_parent() - get a reference to the parent hardware fence
> > + * @dfence: dep fence to query
> > + *
> > + * Returns a new reference to @dfence->parent, or NULL if the parent has
> > + * already been cleared (i.e. @dfence->finished has signalled and the parent
> > + * reference was dropped under the fence lock).
> > + *
> > + * Uses smp_load_acquire() to pair with the smp_store_release() in
> > + * drm_dep_fence_set_parent(), ensuring that if we race a concurrent
> > + * drm_dep_fence_set_parent() call we observe the parent pointer only after
> > + * the store is fully visible — before set_parent() tests
> > + * %DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT.
> > + *
> > + * Caller must hold the fence lock on @dfence->finished.
> > + *
> > + * Context: Any context, fence lock on @dfence->finished must be held.
> > + * Return: a new reference to the parent fence, or NULL.
> > + */
> > +static struct dma_fence *drm_dep_fence_get_parent(struct drm_dep_fence *dfence)
> > +{
> > + dma_fence_assert_held(&dfence->finished);
> 
> > +
> > + return dma_fence_get(smp_load_acquire(&dfence->parent));
> > +}
> > +
> > +/**
> > + * drm_dep_fence_set_deadline() - dma_fence_ops deadline callback
> > + * @f: fence on which the deadline is being set
> > + * @deadline: the deadline hint to apply
> > + *
> > + * Stores the earliest deadline under the fence lock, then propagates
> > + * it to the parent hardware fence via smp_load_acquire() to race
> > + * safely with drm_dep_fence_set_parent().
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
> > +{
> > + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> > + struct dma_fence *parent;
> > + unsigned long flags;
> > +
> > + dma_fence_lock_irqsave(f, flags);
> > +
> > + /* If we already have an earlier deadline, keep it: */
> > + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> > +    ktime_before(dfence->deadline, deadline)) {
> > + dma_fence_unlock_irqrestore(f, flags);
> > + return;
> > + }
> > +
> > + dfence->deadline = deadline;
> > + set_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
> > +
> > + parent = drm_dep_fence_get_parent(dfence);
> > + dma_fence_unlock_irqrestore(f, flags);
> > +
> > + if (parent)
> > + dma_fence_set_deadline(parent, deadline);
> > +
> > + dma_fence_put(parent);
> > +}
> > +
> > +static const struct dma_fence_ops drm_dep_fence_ops = {
> > + .get_driver_name = drm_dep_fence_get_driver_name,
> > + .get_timeline_name = drm_dep_fence_get_timeline_name,
> > + .set_deadline = drm_dep_fence_set_deadline,
> > +};
> > +
> > +/**
> > + * drm_dep_fence_alloc() - allocate a dep fence
> > + *
> > + * Allocates a &drm_dep_fence with kzalloc() without initialising the
> > + * dma_fence. Call drm_dep_fence_init() to fully initialise it.
> > + *
> > + * Context: Process context.
> > + * Return: new &drm_dep_fence on success, NULL on allocation failure.
> > + */
> > +struct drm_dep_fence *drm_dep_fence_alloc(void)
> > +{
> > + return kzalloc_obj(struct drm_dep_fence);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_init() - initialise the dma_fence inside a dep fence
> > + * @dfence: dep fence to initialise
> > + * @q: queue the owning job belongs to
> > + *
> > + * Initialises @dfence->finished using the context and sequence number from @q.
> > + * Passes NULL as the lock so the fence uses its inline spinlock.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q)
> > +{
> > + u32 seq = ++q->fence.seqno;
> > +
> > + /*
> > + * XXX: Inline fence hazard: currently all expected users of DRM dep
> > + * hardware fences have a unique lockdep class. If that ever changes,
> > + * we will need to assign a unique lockdep class here so lockdep knows
> > + * this fence is allowed to nest with driver hardware fences.
> > + */
> > +
> > + dfence->q = q;
> > + dma_fence_init(&dfence->finished, &drm_dep_fence_ops,
> > +       NULL, q->fence.context, seq);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_cleanup() - release a dep fence at job teardown
> > + * @dfence: dep fence to clean up
> > + *
> > + * Called from drm_dep_job_fini(). If the dep fence was armed (refcount > 0)
> > + * it is released via dma_fence_put() and will be freed by the RCU release
> > + * callback once all waiters have dropped their references. If it was never
> > + * armed it is freed directly with kfree().
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence)
> > +{
> > + if (drm_dep_fence_is_armed(dfence))
> > + dma_fence_put(&dfence->finished);
> > + else
> > + kfree(dfence);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_is_armed() - check whether the fence has been armed
> > + * @dfence: dep fence to check
> > + *
> > + * Returns true if drm_dep_job_arm() has been called, i.e. @dfence->finished
> > + * has been initialised and its reference count is non-zero.  Used by
> > + * assertions to enforce correct job lifecycle ordering (arm before push,
> > + * add_dependency before arm).
> > + *
> > + * Context: Any context.
> > + * Return: true if the fence is armed, false otherwise.
> > + */
> > +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence)
> > +{
> > + return !!kref_read(&dfence->finished.refcount);
> > +}
> 
> > +
> > +/**
> > + * drm_dep_fence_is_finished() - test whether the finished fence has signalled
> > + * @dfence: dep fence to check
> > + *
> > + * Uses dma_fence_test_signaled_flag() to read %DMA_FENCE_FLAG_SIGNALED_BIT
> > + * directly without invoking the fence's ->signaled() callback or triggering
> > + * any signalling side-effects.
> > + *
> > + * Context: Any context.
> > + * Return: true if @dfence->finished has been signalled, false otherwise.
> > + */
> > +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence)
> > +{
> > + return dma_fence_test_signaled_flag(&dfence->finished);
> > +}
> > +
> > +/**
> > + * drm_dep_fence_is_complete() - test whether the job has completed
> > + * @dfence: dep fence to check
> > + *
> > + * Takes the fence lock on @dfence->finished and calls
> > + * drm_dep_fence_get_parent() to safely obtain a reference to the parent
> > + * hardware fence — or NULL if the parent has already been cleared after
> > + * signalling.  Calls dma_fence_is_signaled() on @parent outside the lock,
> > + * which may invoke the fence's ->signaled() callback and trigger signalling
> > + * side-effects if the fence has completed but the signalled flag has not yet
> > + * been set.  The finished fence is tested via dma_fence_test_signaled_flag(),
> > + * without side-effects.
> > + *
> > + * May only be called on a stopped queue (see drm_dep_queue_is_stopped()).
> > + *
> > + * Context: Process context. The queue must be stopped before calling this.
> > + * Return: true if the job is complete, false otherwise.
> > + */
> > +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence)
> > +{
> > + struct dma_fence *parent;
> > + unsigned long flags;
> > + bool complete;
> > +
> > + dma_fence_lock_irqsave(&dfence->finished, flags);
> > + parent = drm_dep_fence_get_parent(dfence);
> > + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> > +
> > + complete = (parent && dma_fence_is_signaled(parent)) ||
> > + dma_fence_test_signaled_flag(&dfence->finished);
> > +
> > + dma_fence_put(parent);
> > +
> > + return complete;
> > +}
> > +
> > +/**
> > + * drm_dep_fence_to_dma() - return the finished dma_fence for a dep fence
> > + * @dfence: dep fence to query
> > + *
> > + * No reference is taken; the caller must hold its own reference to the owning
> > + * &drm_dep_job for the duration of the access.
> > + *
> > + * Context: Any context.
> > + * Return: the finished &dma_fence.
> > + */
> > +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence)
> > +{
> > + return &dfence->finished;
> > +}
> > +
> > +/**
> > + * drm_dep_fence_done() - signal the finished fence on job completion
> > + * @dfence: dep fence to signal
> > + * @result: job error code, or 0 on success
> > + *
> > + * Gets a temporary reference to @dfence->finished to guard against a racing
> > + * last-put, signals the fence with @result, then drops the temporary
> > + * reference. Called from drm_dep_job_done() in the queue core when a
> > + * hardware completion callback fires or when run_job() returns immediately.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result)
> > +{
> > + dma_fence_get(&dfence->finished);
> > + drm_dep_fence_finished(dfence, result);
> > + dma_fence_put(&dfence->finished);
> > +}
> 
> Proper refcounting is automated (and enforced) in Rust.
> 
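For readers less familiar with the Rust side: get/put pairs simply cannot be unbalanced, because the clone is the get and the compiler-inserted drop is the put. A minimal sketch with std's Arc (DepJob and submit are hypothetical names, not proposed API):

```rust
use std::sync::Arc;

// Illustrative stand-in for a refcounted dep job.
struct DepJob {
    credits: u32,
}

// Cloning the Arc is the "get" taken on behalf of the queue; the "put"
// happens automatically when the returned handle is eventually dropped.
// The payload is freed exactly once, when the last clone is dropped.
fn submit(job: Arc<DepJob>) -> Arc<DepJob> {
    Arc::clone(&job)
}
```

There is no analogue of drm_dep_job_release() that a driver could call at the wrong time or skip.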
> > diff --git a/drivers/gpu/drm/dep/drm_dep_fence.h b/drivers/gpu/drm/dep/drm_dep_fence.h
> > new file mode 100644
> > index 000000000000..65a1582f858b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_fence.h
> > @@ -0,0 +1,25 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_DEP_FENCE_H_
> > +#define _DRM_DEP_FENCE_H_
> > +
> > +#include <linux/dma-fence.h>
> > +
> > +struct drm_dep_fence;
> > +struct drm_dep_queue;
> > +
> > +struct drm_dep_fence *drm_dep_fence_alloc(void);
> > +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q);
> > +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence);
> > +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> > +      struct dma_fence *parent);
> > +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result);
> > +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence);
> > +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence);
> > +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence);
> > +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence);
> > +
> > +#endif /* _DRM_DEP_FENCE_H_ */
> > diff --git a/drivers/gpu/drm/dep/drm_dep_job.c b/drivers/gpu/drm/dep/drm_dep_job.c
> > new file mode 100644
> > index 000000000000..2d012b29a5fc
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_job.c
> > @@ -0,0 +1,675 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright 2015 Advanced Micro Devices, Inc.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > + * OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +/**
> > + * DOC: DRM dependency job
> > + *
> > + * A struct drm_dep_job represents a single unit of GPU work associated with
> > + * a struct drm_dep_queue. The lifecycle of a job is:
> > + *
> > + * 1. **Allocation**: the driver allocates memory for the job (typically by
> > + *    embedding struct drm_dep_job in a larger structure) and calls
> > + *    drm_dep_job_init() to initialise it. On success the job holds one
> > + *    kref reference and a reference to its queue.
> > + *
> > + * 2. **Dependency collection**: the driver calls drm_dep_job_add_dependency(),
> > + *    drm_dep_job_add_syncobj_dependency(), drm_dep_job_add_resv_dependencies(),
> > + *    or drm_dep_job_add_implicit_dependencies() to register dma_fence objects
> > + *    that must be signalled before the job can run. Duplicate fences from the
> > + *    same fence context are deduplicated automatically.
> > + *
> > + * 3. **Arming**: drm_dep_job_arm() initialises the job's finished fence,
> > + *    consuming a sequence number from the queue. After arming,
> > + *    drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > + *    userspace or used as a dependency by other jobs.
> > + *
> > + * 4. **Submission**: drm_dep_job_push() submits the job to the queue. The
> > + *    queue takes a reference that it holds until the job's finished fence
> > + *    signals and the job is freed by the put_job worker.
> > + *
> > + * 5. **Completion**: when the job's hardware work finishes its finished fence
> > + *    is signalled and drm_dep_job_put() is called by the queue. The driver
> > + *    must release any driver-private resources in &drm_dep_job_ops.release.
> > + *
> > + * Reference counting uses drm_dep_job_get() / drm_dep_job_put(). The
> > + * internal drm_dep_job_fini() tears down the dependency xarray and fence
> > + * objects before the driver's release callback is invoked.
> > + */
> > +
> > +#include <linux/dma-resv.h>
> > +#include <linux/kref.h>
> > +#include <linux/slab.h>
> > +#include <drm/drm_dep.h>
> > +#include <drm/drm_file.h>
> > +#include <drm/drm_gem.h>
> > +#include <drm/drm_syncobj.h>
> > +#include "drm_dep_fence.h"
> > +#include "drm_dep_job.h"
> > +#include "drm_dep_queue.h"
> > +
> > +/**
> > + * drm_dep_job_init() - initialise a dep job
> > + * @job: dep job to initialise
> > + * @args: initialisation arguments
> > + *
> > + * Initialises @job with the queue, ops and credit count from @args.  Acquires
> > + * a reference to @args->q via drm_dep_queue_get(); this reference is held for
> > + * the lifetime of the job and released by drm_dep_job_release() when the last
> > + * job reference is dropped.
> > + *
> > + * Resources are released automatically when the last reference is dropped
> > + * via drm_dep_job_put(), which must be called to release the job; drivers
> > + * must not free the job directly.
> 
> Again, can’t enforce that in C.
> 
> > + *
> > + * Context: Process context. Allocates memory with GFP_KERNEL.
> > + * Return: 0 on success, -%EINVAL if credits is 0,
> > + *   -%ENOMEM on fence allocation failure.
> > + */
> > +int drm_dep_job_init(struct drm_dep_job *job,
> > +     const struct drm_dep_job_init_args *args)
> > +{
> > + if (unlikely(!args->credits)) {
> > + pr_err("drm_dep: %s: credits cannot be 0\n", __func__);
> > + return -EINVAL;
> > + }
> > +
> > + memset(job, 0, sizeof(*job));
> > +
> > + job->dfence = drm_dep_fence_alloc();
> > + if (!job->dfence)
> > + return -ENOMEM;
> > +
> > + job->ops = args->ops;
> > + job->q = drm_dep_queue_get(args->q);
> > + job->credits = args->credits;
> > +
> > + kref_init(&job->refcount);
> > + xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> > + INIT_LIST_HEAD(&job->pending_link);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_init);
> > +
> > +/**
> > + * drm_dep_job_drop_dependencies() - release all input dependency fences
> > + * @job: dep job whose dependency xarray to drain
> > + *
> > + * Walks @job->dependencies, puts each fence, and destroys the xarray.
> > + * Any slots still holding a %DRM_DEP_JOB_FENCE_PREALLOC sentinel —
> > + * i.e. slots that were pre-allocated but never replaced — are silently
> > + * skipped; the sentinel carries no reference.  Called from
> > + * drm_dep_queue_run_job() in process context immediately after
> > + * @ops->run_job() returns, before the final drm_dep_job_put().  Releasing
> > + * dependencies here — while still in process context — avoids calling
> > + * xa_destroy() from IRQ context if the job's last reference is later
> > + * dropped from a dma_fence callback.
> > + *
> > + * Context: Process context.
> > + */
> > +void drm_dep_job_drop_dependencies(struct drm_dep_job *job)
> > +{
> > + struct dma_fence *fence;
> > + unsigned long index;
> > +
> > + xa_for_each(&job->dependencies, index, fence) {
> > + if (unlikely(fence == DRM_DEP_JOB_FENCE_PREALLOC))
> > + continue;
> > + dma_fence_put(fence);
> > + }
> > + xa_destroy(&job->dependencies);
> > +}
> 
> This is automated in Rust. You also can’t “forget” to call this.
> 
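Right. The xarray drain above is exactly the kind of cleanup Drop gives you for free. A small sketch (the types are made up; the counter only exists to make the implicit puts observable):

```rust
use std::cell::Cell;
use std::rc::Rc;

// Each Fence "puts" itself (bumps the shared counter) when dropped,
// standing in for the dma_fence_put() calls in the xarray walk.
struct Fence {
    puts: Rc<Cell<u32>>,
}

impl Drop for Fence {
    fn drop(&mut self) {
        self.puts.set(self.puts.get() + 1);
    }
}

struct DepJob {
    dependencies: Vec<Fence>,
}
// No explicit drop_dependencies() exists or is needed: when a DepJob goes
// out of scope the compiler drops `dependencies`, which drops every Fence,
// which puts every reference. No code path can forget it or run it twice.
```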
> > +
> > +/**
> > + * drm_dep_job_fini() - clean up a dep job
> > + * @job: dep job to clean up
> > + *
> > + * Cleans up the dep fence and drops the queue reference held by @job.
> > + *
> > + * If the job was never armed (e.g. init failed before drm_dep_job_arm()),
> > + * the dependency xarray is also released here.  For armed jobs the xarray
> > + * has already been drained by drm_dep_job_drop_dependencies() in process
> > + * context immediately after run_job(), so it is left untouched to avoid
> > + * calling xa_destroy() from IRQ context.
> > + *
> > + * Warns if @job is still linked on the queue's pending list, which would
> > + * indicate a bug in the teardown ordering.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_job_fini(struct drm_dep_job *job)
> > +{
> > + bool armed = drm_dep_fence_is_armed(job->dfence);
> > +
> > + WARN_ON(!list_empty(&job->pending_link));
> > +
> > + drm_dep_fence_cleanup(job->dfence);
> > + job->dfence = NULL;
> > +
> > + /*
> > + * Armed jobs have their dependencies drained by
> > + * drm_dep_job_drop_dependencies() in process context after run_job().
> > + * Skip here to avoid calling xa_destroy() from IRQ context.
> > + */
> > + if (!armed)
> > + drm_dep_job_drop_dependencies(job);
> > +}
> 
> Same here.
> 
> > +
> > +/**
> > + * drm_dep_job_get() - acquire a reference to a dep job
> > + * @job: dep job to acquire a reference on, or NULL
> > + *
> > + * Context: Any context.
> > + * Return: @job with an additional reference held, or NULL if @job is NULL.
> > + */
> > +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job)
> > +{
> > + if (job)
> > + kref_get(&job->refcount);
> > + return job;
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_get);
> > +
> 
> Same here.
> 
> > +/**
> > + * drm_dep_job_release() - kref release callback for a dep job
> > + * @kref: kref embedded in the dep job
> > + *
> > + * Calls drm_dep_job_fini(), then invokes &drm_dep_job_ops.release if set,
> > + * otherwise frees @job with kfree().  Finally, releases the queue reference
> > + * that was acquired by drm_dep_job_init() via drm_dep_queue_put().  The
> > + * queue put is performed last to ensure no queue state is accessed after
> > + * the job memory is freed.
> > + *
> > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > + *   job's queue; otherwise process context only, as the release callback may
> > + *   sleep.
> > + */
> > +static void drm_dep_job_release(struct kref *kref)
> > +{
> > + struct drm_dep_job *job =
> > + container_of(kref, struct drm_dep_job, refcount);
> > + struct drm_dep_queue *q = job->q;
> > +
> > + drm_dep_job_fini(job);
> > +
> > + if (job->ops && job->ops->release)
> > + job->ops->release(job);
> > + else
> > + kfree(job);
> > +
> > + drm_dep_queue_put(q);
> > +}
> 
> Same here.
> 
> > +
> > +/**
> > + * drm_dep_job_put() - release a reference to a dep job
> > + * @job: dep job to release a reference on, or NULL
> > + *
> > + * When the last reference is dropped, calls &drm_dep_job_ops.release if set,
> > + * otherwise frees @job with kfree(). Does nothing if @job is NULL.
> > + *
> > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > + *   job's queue; otherwise process context only, as the release callback may
> > + *   sleep.
> > + */
> > +void drm_dep_job_put(struct drm_dep_job *job)
> > +{
> > + if (job)
> > + kref_put(&job->refcount, drm_dep_job_release);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_put);
> > +
> 
> Same here.
> 
> > +/**
> > + * drm_dep_job_arm() - arm a dep job for submission
> > + * @job: dep job to arm
> > + *
> > + * Initialises the finished fence on @job->dfence, assigning
> > + * it a sequence number from the job's queue. Must be called after
> > + * drm_dep_job_init() and before drm_dep_job_push(). Once armed,
> > + * drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > + * userspace or used as a dependency by other jobs.
> > + *
> > + * Begins the DMA fence signalling path via dma_fence_begin_signalling().
> > + * After this point, memory allocations that could trigger reclaim are
> > + * forbidden; lockdep enforces this. arm() must always be paired with
> > + * drm_dep_job_push(); lockdep also enforces this pairing.
> > + *
> > + * Warns if the job has already been armed.
> > + *
> > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > + *   path.
> > + */
> > +void drm_dep_job_arm(struct drm_dep_job *job)
> > +{
> > + drm_dep_queue_push_job_begin(job->q);
> > + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > + drm_dep_fence_init(job->dfence, job->q);
> > + job->signalling_cookie = dma_fence_begin_signalling();
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_arm);
> > +
> > +/**
> > + * drm_dep_job_push() - submit a job to its queue for execution
> > + * @job: dep job to push
> > + *
> > + * Submits @job to the queue it was initialised with. Must be called after
> > + * drm_dep_job_arm(). Acquires a reference on @job on behalf of the queue,
> > + * held until the queue is fully done with it. The reference is released
> > + * directly in the finished-fence dma_fence callback for queues with
> > + * %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE (where drm_dep_job_done() may run
> > + * from hardirq context), or via the put_job work item on the submit
> > + * workqueue otherwise.
> > + *
> > + * Ends the DMA fence signalling path begun by drm_dep_job_arm() via
> > + * dma_fence_end_signalling(). This must be paired with arm(); lockdep
> > + * enforces the pairing.
> > + *
> > + * Once pushed, &drm_dep_queue_ops.run_job is guaranteed to be called for
> > + * @job exactly once, even if the queue is killed or torn down before the
> > + * job reaches the head of the queue. Drivers can use this guarantee to
> > + * perform bookkeeping cleanup; the actual backend operation should be
> > + * skipped when drm_dep_queue_is_killed() returns true.
> > + *
> > + * If the queue does not support the bypass path, the job is pushed directly
> > + * onto the SPSC submission queue via drm_dep_queue_push_job() without holding
> > + * @q->sched.lock. Otherwise, @q->sched.lock is taken and the job is either
> > + * run immediately via drm_dep_queue_run_job() if it qualifies for bypass, or
> > + * enqueued via drm_dep_queue_push_job() for dispatch by the run_job work item.
> > + *
> > + * Warns if the job has not been armed.
> > + *
> > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > + *   path.
> > + */
> > +void drm_dep_job_push(struct drm_dep_job *job)
> > +{
> > + struct drm_dep_queue *q = job->q;
> > +
> > + WARN_ON(!drm_dep_fence_is_armed(job->dfence));
> > +
> > + drm_dep_job_get(job);
> > +
> > + if (!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED)) {
> > + drm_dep_queue_push_job(q, job);
> > + dma_fence_end_signalling(job->signalling_cookie);
> 
> Signalling is enforced more thoroughly in Rust. I’ll expand on this later in this patch.
> 
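As a teaser for what "more thorough" means: the arm/push pairing that lockdep can only check at runtime can be encoded in the type system, so an unpaired or doubled push does not compile. A hypothetical typestate sketch (none of these names are proposed API):

```rust
// An initialised but not yet armed job.
struct DepJob {
    name: String,
}

// arm() consumes the DepJob and returns the only handle push() accepts,
// so push-without-arm is a type error rather than a WARN_ON.
struct ArmedJob {
    job: DepJob,
    seqno: u64,
}

fn arm(job: DepJob, seqno: u64) -> ArmedJob {
    ArmedJob { job, seqno }
}

// push() consumes the ArmedJob: a second push(armed) on the same value is
// rejected by the compiler, as is any use of the job after submission.
fn push(armed: ArmedJob) -> (String, u64) {
    (armed.job.name, armed.seqno)
}
```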
> > + drm_dep_queue_push_job_end(job->q);
> > + return;
> > + }
> > +
> > + scoped_guard(mutex, &q->sched.lock) {
> > + if (drm_dep_queue_can_job_bypass(q, job))
> > + drm_dep_queue_run_job(q, job);
> > + else
> > + drm_dep_queue_push_job(q, job);
> > + }
> > +
> > + dma_fence_end_signalling(job->signalling_cookie);
> > + drm_dep_queue_push_job_end(job->q);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_push);
> > +
> > +/**
> > + * drm_dep_job_add_dependency() - adds the fence as a job dependency
> > + * @job: dep job to add the dependencies to
> > + * @fence: the dma_fence to add to the list of dependencies, or
> > + *         %DRM_DEP_JOB_FENCE_PREALLOC to reserve a slot for later.
> > + *
> > + * Note that @fence is consumed in both the success and error cases (except
> > + * when @fence is %DRM_DEP_JOB_FENCE_PREALLOC, which carries no reference).
> > + *
> > + * Signalled fences and fences belonging to the same queue as @job (i.e. where
> > + * fence->context matches the queue's finished fence context) are silently
> > + * dropped; the job need not wait on its own queue's output.
> > + *
> > + * Warns if the job has already been armed (dependencies must be added before
> > + * drm_dep_job_arm()).
> > + *
> > + * **Pre-allocation pattern**
> > + *
> > + * When multiple jobs across different queues must be prepared and submitted
> > + * together in a single atomic commit — for example, where job A's finished
> > + * fence is an input dependency of job B — all jobs must be armed and pushed
> > + * within a single dma_fence_begin_signalling() / dma_fence_end_signalling()
> > + * region.  Once that region has started no memory allocation is permitted.
> > + *
> > + * To handle this, pass %DRM_DEP_JOB_FENCE_PREALLOC during the preparation
> > + * phase (before arming any job, while GFP_KERNEL allocation is still allowed)
> > + * to pre-allocate a slot in @job->dependencies.  The slot index assigned by
> > + * the underlying xarray must be tracked by the caller separately (e.g. it is
> > + * always index 0 when the dependency array is empty, a property Xe relies on).
> > + * After all jobs have been armed and the finished fences are available, call
> > + * drm_dep_job_replace_dependency() with that index and the real fence.
> > + * drm_dep_job_replace_dependency() uses GFP_NOWAIT internally and may be
> > + * called from atomic or signalling context.
> > + *
> > + * The sentinel slot is never skipped by the signalled-fence fast-path,
> > + * ensuring a slot is always allocated even when the real fence is not yet
> > + * known.
> > + *
> > + * **Example: bind job feeding TLB invalidation jobs**
> > + *
> > + * Consider a GPU with separate queues for page-table bind operations and for
> > + * TLB invalidation.  A single atomic commit must:
> > + *
> > + *  1. Run a bind job that modifies page tables.
> > + *  2. Run one TLB-invalidation job per MMU that depends on the bind
> > + *     completing, so stale translations are flushed before the engines
> > + *     continue.
> > + *
> > + * Because all jobs must be armed and pushed inside a signalling region (where
> > + * GFP_KERNEL is forbidden), pre-allocate slots before entering the region::
> > + *
> > + *   // Phase 1 — process context, GFP_KERNEL allowed
> > + *   drm_dep_job_init(bind_job, bind_queue, ops);
> > + *   for_each_mmu(mmu) {
> > + *       drm_dep_job_init(tlb_job[mmu], tlb_queue[mmu], ops);
> > + *       // Pre-allocate slot at index 0; real fence not available yet
> > + *       drm_dep_job_add_dependency(tlb_job[mmu], DRM_DEP_JOB_FENCE_PREALLOC);
> > + *   }
> > + *
> > + *   // Phase 2 — inside signalling region, no GFP_KERNEL
> > + *   dma_fence_begin_signalling();
> > + *   drm_dep_job_arm(bind_job);
> > + *   for_each_mmu(mmu) {
> > + *       // Swap sentinel for bind job's finished fence
> > + *       drm_dep_job_replace_dependency(tlb_job[mmu], 0,
> > + *                                      dma_fence_get(bind_job->finished));
> > + *       drm_dep_job_arm(tlb_job[mmu]);
> > + *   }
> > + *   drm_dep_job_push(bind_job);
> > + *   for_each_mmu(mmu)
> > + *       drm_dep_job_push(tlb_job[mmu]);
> > + *   dma_fence_end_signalling();
> > + *
> > + * Context: Process context. May allocate memory with GFP_KERNEL.
> > + * Return: the allocated slot index if @fence is %DRM_DEP_JOB_FENCE_PREALLOC,
> > + * otherwise 0 on success; negative error code on failure.
> > + */
> 
> > +int drm_dep_job_add_dependency(struct drm_dep_job *job, struct dma_fence *fence)
> > +{
> > + struct drm_dep_queue *q = job->q;
> > + struct dma_fence *entry;
> > + unsigned long index;
> > + u32 id = 0;
> > + int ret;
> > +
> > + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > + might_alloc(GFP_KERNEL);
> > +
> > + if (!fence)
> > + return 0;
> > +
> > + if (fence == DRM_DEP_JOB_FENCE_PREALLOC)
> > + goto add_fence;
> > +
> > + /*
> > + * Ignore signalled fences or fences from our own queue — finished
> > + * fences use q->fence.context.
> > + */
> > + if (dma_fence_test_signaled_flag(fence) ||
> > +    fence->context == q->fence.context) {
> > + dma_fence_put(fence);
> > + return 0;
> > + }
> > +
> > + /* Deduplicate if we already depend on a fence from the same context.
> > + * This lets the size of the array of deps scale with the number of
> > + * engines involved, rather than the number of BOs.
> > + */
> > + xa_for_each(&job->dependencies, index, entry) {
> > + if (entry == DRM_DEP_JOB_FENCE_PREALLOC ||
> > +    entry->context != fence->context)
> > + continue;
> > +
> > + if (dma_fence_is_later(fence, entry)) {
> > + dma_fence_put(entry);
> > + xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > + } else {
> > + dma_fence_put(fence);
> > + }
> > + return 0;
> > + }
> > +
> > +add_fence:
> > + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> > +       GFP_KERNEL);
> > + if (ret != 0) {
> > + if (fence != DRM_DEP_JOB_FENCE_PREALLOC)
> > + dma_fence_put(fence);
> > + return ret;
> > + }
> > +
> > + return (fence == DRM_DEP_JOB_FENCE_PREALLOC) ? id : 0;
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_add_dependency);
> > +
> > +/**
> > + * drm_dep_job_replace_dependency() - replace a pre-allocated dependency slot
> > + * @job: dep job to update
> > + * @index: xarray index of the slot to replace, as returned when the sentinel
> > + *         was originally inserted via drm_dep_job_add_dependency()
> > + * @fence: the real dma_fence to store; its reference is always consumed
> > + *
> > + * Replaces the %DRM_DEP_JOB_FENCE_PREALLOC sentinel at @index in
> > + * @job->dependencies with @fence.  The slot must have been pre-allocated by
> > + * passing %DRM_DEP_JOB_FENCE_PREALLOC to drm_dep_job_add_dependency(); the
> > + * existing entry is asserted to be the sentinel.
> > + *
> > + * This is the second half of the pre-allocation pattern described in
> > + * drm_dep_job_add_dependency().  It is intended to be called inside a
> > + * dma_fence_begin_signalling() / dma_fence_end_signalling() region where
> > + * memory allocation with GFP_KERNEL is forbidden.  It uses GFP_NOWAIT
> > + * internally so it is safe to call from atomic or signalling context, but
> > + * since the slot has been pre-allocated no actual memory allocation occurs.
> > + *
> > + * If @fence is already signalled the slot is erased rather than storing a
> > + * redundant dependency.  The successful store is asserted — if the store
> > + * fails it indicates a programming error (slot index out of range or
> > + * concurrent modification).
> > + *
> > + * Must be called before drm_dep_job_arm(). @fence is consumed in all cases.
> 
> Can’t enforce this in C. Also, how is the fence “consumed”? You can’t
> enforce that the caller never touches the fence again after this function
> returns, the way Rust can at compile time.
> 
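What I mean by compile-time consumption, sketched in plain Rust (illustrative names only, and dedup logic reduced to the bare minimum):

```rust
// Stand-in for a dma_fence with a context id.
struct Fence {
    context: u64,
}

struct DepJob {
    dependencies: Vec<Fence>,
}

impl DepJob {
    // Takes `fence` by value: the reference is consumed on the stored path
    // and on the duplicate-context path alike. Any use of the fence by the
    // caller after this call is a compile error, not a runtime bug the
    // kernel-doc has to warn about.
    fn add_dependency(&mut self, fence: Fence) {
        if self.dependencies.iter().any(|f| f.context == fence.context) {
            return; // fence dropped here, i.e. the reference is put
        }
        self.dependencies.push(fence);
    }
}
```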
> > + *
> > + * Context: Any context. DMA fence signaling path.
> > + */
> > +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> > +    struct dma_fence *fence)
> > +{
> > + WARN_ON(xa_load(&job->dependencies, index) !=
> > + DRM_DEP_JOB_FENCE_PREALLOC);
> > +
> > + if (dma_fence_test_signaled_flag(fence)) {
> > + xa_erase(&job->dependencies, index);
> > + dma_fence_put(fence);
> > + return;
> > + }
> > +
> > + if (WARN_ON(xa_is_err(xa_store(&job->dependencies, index, fence,
> > +       GFP_NOWAIT)))) {
> > + dma_fence_put(fence);
> > + return;
> > + }
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_replace_dependency);
> > +
> > +/**
> > + * drm_dep_job_add_syncobj_dependency() - adds a syncobj's fence as a
> > + *   job dependency
> > + * @job: dep job to add the dependencies to
> > + * @file: drm file private pointer
> > + * @handle: syncobj handle to lookup
> > + * @point: timeline point
> > + *
> > + * This adds the fence matching the given syncobj to @job.
> > + *
> > + * Context: Process context.
> > + * Return: 0 on success, or a negative error code.
> > + */
> > +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> > +       struct drm_file *file, u32 handle,
> > +       u32 point)
> > +{
> > + struct dma_fence *fence;
> > + int ret;
> > +
> > + ret = drm_syncobj_find_fence(file, handle, point, 0, &fence);
> > + if (ret)
> > + return ret;
> > +
> > + return drm_dep_job_add_dependency(job, fence);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_add_syncobj_dependency);
> > +
> > +/**
> > + * drm_dep_job_add_resv_dependencies() - add all fences from the resv to the job
> > + * @job: dep job to add the dependencies to
> > + * @resv: the dma_resv object to get the fences from
> > + * @usage: the dma_resv_usage to use to filter the fences
> > + *
> > + * This adds all fences matching the given usage from @resv to @job.
> > + * Must be called with the @resv lock held.
> > + *
> > + * Context: Process context.
> > + * Return: 0 on success, or a negative error code.
> > + */
> > +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> > +      struct dma_resv *resv,
> > +      enum dma_resv_usage usage)
> > +{
> > + struct dma_resv_iter cursor;
> > + struct dma_fence *fence;
> > + int ret;
> > +
> > + dma_resv_assert_held(resv);
> > +
> > + dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> > + /*
> > + * As drm_dep_job_add_dependency always consumes the fence
> > + * reference (even when it fails), and dma_resv_for_each_fence
> > + * is not obtaining one, we need to grab one before calling.
> > + */
> > + ret = drm_dep_job_add_dependency(job, dma_fence_get(fence));
> > + if (ret)
> > + return ret;
> > + }
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_add_resv_dependencies);
> > +
> > +/**
> > + * drm_dep_job_add_implicit_dependencies() - adds implicit dependencies
> > + *   as job dependencies
> > + * @job: dep job to add the dependencies to
> > + * @obj: the gem object to add new dependencies from.
> > + * @write: whether the job might write the object (so we need to depend on
> > + * shared fences in the reservation object).
> > + *
> > + * This should be called after drm_gem_lock_reservations() on your array of
> > + * GEM objects used in the job but before updating the reservations with your
> > + * own fences.
> > + *
> > + * Context: Process context.
> > + * Return: 0 on success, or a negative error code.
> > + */
> > +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> > +  struct drm_gem_object *obj,
> > +  bool write)
> > +{
> > + return drm_dep_job_add_resv_dependencies(job, obj->resv,
> > + dma_resv_usage_rw(write));
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_add_implicit_dependencies);
> > +
> > +/**
> > + * drm_dep_job_is_signaled() - check whether a dep job has completed
> > + * @job: dep job to check
> > + *
> > + * Determines whether @job has signalled. The queue should be stopped before
> > + * calling this to obtain a stable snapshot of state. Both the parent hardware
> > + * fence and the finished software fence are checked.
> > + *
> > + * Context: Process context. The queue must be stopped before calling this.
> > + * Return: true if the job is signalled, false otherwise.
> > + */
> > +bool drm_dep_job_is_signaled(struct drm_dep_job *job)
> > +{
> > + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> > + return drm_dep_fence_is_complete(job->dfence);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_is_signaled);
> > +
> > +/**
> > + * drm_dep_job_is_finished() - test whether a dep job's finished fence has signalled
> > + * @job: dep job to check
> > + *
> > + * Tests whether the job's software finished fence has been signalled, using
> > + * dma_fence_test_signaled_flag() to avoid any signalling side-effects. Unlike
> > + * drm_dep_job_is_signaled(), this does not require the queue to be stopped and
> > + * does not check the parent hardware fence — it is a lightweight test of the
> > + * finished fence only.
> > + *
> > + * Context: Any context.
> > + * Return: true if the job's finished fence has been signalled, false otherwise.
> > + */
> > +bool drm_dep_job_is_finished(struct drm_dep_job *job)
> > +{
> > + return drm_dep_fence_is_finished(job->dfence);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_is_finished);
> > +
> > +/**
> > + * drm_dep_job_invalidate_job() - increment the invalidation count for a job
> > + * @job: dep job to invalidate
> > + * @threshold: threshold above which the job is considered invalidated
> > + *
> > + * Increments @job->invalidate_count and returns true if it exceeds @threshold,
> > + * indicating the job should be considered hung and discarded. The queue must
> > + * be stopped before calling this function.
> > + *
> > + * Context: Process context. The queue must be stopped before calling this.
> > + * Return: true if @job->invalidate_count exceeds @threshold, false otherwise.
> > + */
> > +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold)
> > +{
> > + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> > + return ++job->invalidate_count > threshold;
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_invalidate_job);
> > +
> > +/**
> > + * drm_dep_job_finished_fence() - return the finished fence for a job
> > + * @job: dep job to query
> > + *
> > + * No reference is taken on the returned fence; the caller must hold its own
> > + * reference to @job for the duration of any access.
> 
> Can’t enforce this in C.
> 
> > + *
> > + * Context: Any context.
> > + * Return: the finished &dma_fence for @job.
> > + */
> > +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job)
> > +{
> > + return drm_dep_fence_to_dma(job->dfence);
> > +}
> > +EXPORT_SYMBOL(drm_dep_job_finished_fence);
> > diff --git a/drivers/gpu/drm/dep/drm_dep_job.h b/drivers/gpu/drm/dep/drm_dep_job.h
> > new file mode 100644
> > index 000000000000..35c61d258fa1
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_job.h
> > @@ -0,0 +1,13 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_DEP_JOB_H_
> > +#define _DRM_DEP_JOB_H_
> > +
> > +struct drm_dep_queue;
> > +
> > +void drm_dep_job_drop_dependencies(struct drm_dep_job *job);
> > +
> > +#endif /* _DRM_DEP_JOB_H_ */
> > diff --git a/drivers/gpu/drm/dep/drm_dep_queue.c b/drivers/gpu/drm/dep/drm_dep_queue.c
> > new file mode 100644
> > index 000000000000..dac02d0d22c4
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_queue.c
> > @@ -0,0 +1,1647 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright 2015 Advanced Micro Devices, Inc.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > + * OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +/**
> > + * DOC: DRM dependency queue
> > + *
> > + * The drm_dep subsystem provides a lightweight GPU submission queue that
> > + * combines the roles of drm_gpu_scheduler and drm_sched_entity into a
> > + * single object (struct drm_dep_queue). Each queue owns its own ordered
> > + * submit workqueue, timeout workqueue, and TDR delayed-work.
> > + *
> > + * **Job lifecycle**
> > + *
> > + * 1. Allocate and initialise a job with drm_dep_job_init().
> > + * 2. Add dependency fences with drm_dep_job_add_dependency() and friends.
> > + * 3. Arm the job with drm_dep_job_arm() to obtain its out-fences.
> > + * 4. Submit with drm_dep_job_push().
> > + *
> > + * **Submission paths**
> > + *
> > + * drm_dep_job_push() decides between two paths under @q->sched.lock:
> > + *
> > + * - **Bypass path** (drm_dep_queue_can_job_bypass()): if
> > + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the queue is not stopped,
> > + *   the SPSC queue is empty, the job has no dependency fences, and credits
> > + *   are available, the job is submitted inline on the calling thread without
> > + *   touching the submit workqueue.
> > + *
> > + * - **Queued path** (drm_dep_queue_push_job()): the job is pushed onto an
> > + *   SPSC queue and the run_job worker is kicked. The run_job worker pops the
> > + *   job, resolves any remaining dependency fences (installing wakeup
> > + *   callbacks for unresolved ones), and calls drm_dep_queue_run_job().
> > + *
> > + * **Running a job**
> > + *
> > + * drm_dep_queue_run_job() accounts credits, appends the job to the pending
> > + * list (starting the TDR timer only when the list was previously empty),
> > + * calls @ops->run_job(), stores the returned hardware fence as the parent
> > + * of the job's dep fence, then installs a callback on it. When the hardware
> > + * fence fires (or the job completes synchronously), drm_dep_job_done()
> > + * signals the finished fence, returns credits, and kicks the put_job worker
> > + * to free the job.
> > + *
> > + * **Timeout detection and recovery (TDR)**
> > + *
> > + * A delayed work item fires when a job on the pending list takes longer than
> > + * @q->job.timeout jiffies. It calls @ops->timedout_job() and acts on the
> > + * returned status (%DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED or
> > + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB).
> > + * drm_dep_queue_trigger_timeout() forces the timer to fire immediately (without
> > + * changing the stored timeout), for example during device teardown.
> > + *
> > + * **Reference counting**
> > + *
> > + * Jobs and queues are both reference counted.
> > + *
> > + * A job holds a reference to its queue from drm_dep_job_init() until
> > + * drm_dep_job_put() drops the job's last reference and its release callback
> > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > + * job that was submitted to it.
> > + *
> > + * The queue holds its own reference to a job for as long as the job is
> > + * internally tracked: from the moment the job is added to the pending list
> > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > + * worker, which calls drm_dep_job_put() to release that reference.
> 
> Why not simply keep track that the job was completed, instead of relinquishing
> the reference? We can then release the reference once the job is cleaned up
> (by the queue, using a worker) in process context.
> 
> 
> > + *
> > + * **Hazard: use-after-free from within a worker**
> > + *
> > + * Because a job holds a queue reference, drm_dep_job_put() dropping the last
> > + * job reference will also drop a queue reference via the job's release path.
> > + * If that happens to be the last queue reference, drm_dep_queue_fini() can be
> > + * called, which queues @q->free_work on dep_free_wq and returns immediately.
> > + * free_work calls disable_work_sync() / disable_delayed_work_sync() on the
> > + * queue's own workers before destroying its workqueues, so in practice a
> > + * running worker always completes before the queue memory is freed.
> > + *
> > + * However, there is a secondary hazard: a worker can be queued while the
> > + * queue is in a "zombie" state — refcount has already reached zero and async
> > + * teardown is in flight, but the work item has not yet been disabled by
> > + * free_work.  To guard against this every worker uses
> > + * drm_dep_queue_get_unless_zero() at entry; if the refcount is already zero
> > + * the worker bails immediately without touching the queue state.
> 
> Again, this problem is gone in Rust.
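The zombie-state guard maps directly onto Arc/Weak. A rough userspace sketch, with a hypothetical `DepQueue` stand-in:

```rust
use std::sync::{Arc, Weak};

// Hypothetical queue type; not the kernel struct.
pub struct DepQueue { pub name: &'static str }

// A worker holds only a Weak reference. Once the last strong reference is
// gone, upgrade() fails and the worker bails; there is no window where a
// get_unless_zero-style check races against a refcount already at zero.
pub fn worker(q: &Weak<DepQueue>) -> bool {
    match q.upgrade() {
        Some(q) => { let _name = q.name; true } // queue alive: do the work
        None => false,                          // torn down ("zombie"): bail
    }
}
```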
> 
> > + *
> > + * Because all actual teardown (disable_*_sync, destroy_workqueue) runs on
> > + * dep_free_wq — which is independent of the queue's own submit/timeout
> > + * workqueues — there is no deadlock risk.  Each queue holds a drm_dev_get()
> > + * reference on its owning &drm_device, which is released as the last step of
> > + * teardown.  This ensures the driver module cannot be unloaded while any queue
> > + * is still alive.
> > + */
> > +
> > +#include <linux/dma-resv.h>
> > +#include <linux/kref.h>
> > +#include <linux/module.h>
> > +#include <linux/overflow.h>
> > +#include <linux/slab.h>
> > +#include <linux/wait.h>
> > +#include <linux/workqueue.h>
> > +#include <drm/drm_dep.h>
> > +#include <drm/drm_drv.h>
> > +#include <drm/drm_print.h>
> > +#include "drm_dep_fence.h"
> > +#include "drm_dep_job.h"
> > +#include "drm_dep_queue.h"
> > +
> > +/*
> > + * Dedicated workqueue for deferred drm_dep_queue teardown.  Using a
> > + * module-private WQ instead of system_percpu_wq keeps teardown isolated
> > + * from unrelated kernel subsystems.
> > + */
> > +static struct workqueue_struct *dep_free_wq;
> > +
> > +/**
> > + * drm_dep_queue_flags_set() - set a flag on the queue under sched.lock
> > + * @q: dep queue
> > + * @flag: flag to set (one of &enum drm_dep_queue_flags)
> > + *
> > + * Sets @flag in @q->sched.flags. Must be called with @q->sched.lock
> > + * held; the lockdep assertion enforces this.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_flags_set(struct drm_dep_queue *q,
> > +    enum drm_dep_queue_flags flag)
> > +{
> > + lockdep_assert_held(&q->sched.lock);
> 
> We can enforce this in Rust at compile-time. The code does not compile if the
> lock is not taken. Same here and everywhere else where the sched lock has
> to be taken.
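For reference, the usual shape of that compile-time enforcement is to pass the guard itself. A standalone sketch with hypothetical `Queue`/`SchedState` types:

```rust
use std::sync::{Mutex, MutexGuard};

pub struct SchedState { pub flags: u32 }
pub struct Queue { pub sched: Mutex<SchedState> }

pub const STOPPED: u32 = 1 << 0;

// The function takes the MutexGuard itself, so it cannot be called
// unless the lock is held; lockdep_assert_held() becomes a
// compile-time guarantee instead of a runtime check.
pub fn flags_set(sched: &mut MutexGuard<'_, SchedState>, flag: u32) {
    sched.flags |= flag;
}
```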
> 
> 
> > + q->sched.flags |= flag;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_flags_clear() - clear a flag on the queue under sched.lock
> > + * @q: dep queue
> > + * @flag: flag to clear (one of &enum drm_dep_queue_flags)
> > + *
> > + * Clears @flag in @q->sched.flags. Must be called with @q->sched.lock
> > + * held; the lockdep assertion enforces this.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_flags_clear(struct drm_dep_queue *q,
> > +      enum drm_dep_queue_flags flag)
> > +{
> > + lockdep_assert_held(&q->sched.lock);
> > + q->sched.flags &= ~flag;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_has_credits() - check whether the queue has enough credits
> > + * @q: dep queue
> > + * @job: job requesting credits
> > + *
> > + * Checks whether the queue has enough available credits to dispatch
> > + * @job. If @job->credits exceeds the queue's credit limit, it is
> > + * clamped with a WARN.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + * Return: true if available credits >= @job->credits, false otherwise.
> > + */
> > +static bool drm_dep_queue_has_credits(struct drm_dep_queue *q,
> > +      struct drm_dep_job *job)
> > +{
> > + u32 available;
> > +
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + if (job->credits > q->credit.limit) {
> > + drm_warn(q->drm,
> > + "Jobs may not exceed the credit limit, truncate.\n");
> > + job->credits = q->credit.limit;
> > + }
> > +
> > + WARN_ON(check_sub_overflow(q->credit.limit,
> > +   atomic_read(&q->credit.count),
> > +   &available));
> > +
> > + return available >= job->credits;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_run_job_queue() - kick the run-job worker
> > + * @q: dep queue
> > + *
> > + * Queues @q->sched.run_job on @q->sched.submit_wq unless the queue is stopped
> > + * or the job queue is empty.  The empty-queue check avoids queueing a work item
> > + * that would immediately return with nothing to do.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_queue_run_job_queue(struct drm_dep_queue *q)
> > +{
> > + if (!drm_dep_queue_is_stopped(q) && spsc_queue_count(&q->job.queue))
> > + queue_work(q->sched.submit_wq, &q->sched.run_job);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_put_job_queue() - kick the put-job worker
> > + * @q: dep queue
> > + *
> > + * Queues @q->sched.put_job on @q->sched.submit_wq unless the queue
> > + * is stopped.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_queue_put_job_queue(struct drm_dep_queue *q)
> > +{
> > + if (!drm_dep_queue_is_stopped(q))
> > + queue_work(q->sched.submit_wq, &q->sched.put_job);
> > +}
> > +
> > +/**
> > + * drm_queue_start_timeout() - arm or re-arm the TDR delayed work
> > + * @q: dep queue
> > + *
> > + * Arms the TDR delayed work with @q->job.timeout. No-op if
> > + * @q->ops->timedout_job is NULL, the timeout is MAX_SCHEDULE_TIMEOUT,
> > + * or the pending list is empty.
> > + *
> > + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> > + */
> > +static void drm_queue_start_timeout(struct drm_dep_queue *q)
> > +{
> > + lockdep_assert_held(&q->job.lock);
> > +
> > + if (!q->ops->timedout_job ||
> > +    q->job.timeout == MAX_SCHEDULE_TIMEOUT ||
> > +    list_empty(&q->job.pending))
> > + return;
> > +
> > + mod_delayed_work(q->sched.timeout_wq, &q->sched.tdr, q->job.timeout);
> > +}
> > +
> > +/**
> > + * drm_queue_start_timeout_unlocked() - arm TDR, acquiring job.lock
> > + * @q: dep queue
> > + *
> > + * Acquires @q->job.lock with interrupts disabled and calls
> > + * drm_queue_start_timeout().
> > + *
> > + * Context: Process context (workqueue).
> > + */
> > +static void drm_queue_start_timeout_unlocked(struct drm_dep_queue *q)
> > +{
> > + guard(spinlock_irq)(&q->job.lock);
> > + drm_queue_start_timeout(q);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_remove_dependency() - clear the active dependency and wake
> > + *   the run-job worker
> > + * @q: dep queue
> > + * @f: the dependency fence being removed
> > + *
> > + * Stores @f into @q->dep.removed_fence via smp_store_release() so that the
> > + * run-job worker can drop the reference to it in drm_dep_queue_is_ready(),
> > + * paired with smp_load_acquire().  Clears @q->dep.fence and kicks the
> > + * run-job worker.
> > + *
> > + * The fence reference is not dropped here; it is deferred to the run-job
> > + * worker via @q->dep.removed_fence to keep this path suitable for
> > + * dma_fence callback removal in drm_dep_queue_kill().
> 
> This is a comment in C, but in Rust this is encoded directly in the type system.
> 
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_queue_remove_dependency(struct drm_dep_queue *q,
> > +    struct dma_fence *f)
> > +{
> > + /* removed_fence must be visible to the reader before &q->dep.fence */
> > + smp_store_release(&q->dep.removed_fence, f);
> > +
> > + WRITE_ONCE(q->dep.fence, NULL);
> > + drm_dep_queue_run_job_queue(q);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_wakeup() - dma_fence callback to wake the run-job worker
> > + * @f: the signalled dependency fence
> > + * @cb: callback embedded in the dep queue
> > + *
> > + * Called from dma_fence_signal() when the active dependency fence signals.
> > + * Delegates to drm_dep_queue_remove_dependency() to clear @q->dep.fence and
> > + * kick the run-job worker.  The fence reference is not dropped here; it is
> > + * deferred to the run-job worker via @q->dep.removed_fence.
> 
> Same here.
> 
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_queue_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(cb, struct drm_dep_queue, dep.cb);
> > +
> > + drm_dep_queue_remove_dependency(q, f);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_is_ready() - check whether the queue has a dispatchable job
> > + * @q: dep queue
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> 
> Can’t call this in Rust if the lock is not taken.
> 
> > + * Return: true if SPSC queue non-empty and no dep fence pending,
> > + *   false otherwise.
> > + */
> > +static bool drm_dep_queue_is_ready(struct drm_dep_queue *q)
> > +{
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + if (!spsc_queue_count(&q->job.queue))
> > + return false;
> > +
> > + if (READ_ONCE(q->dep.fence))
> > + return false;
> > +
> > + /* Paired with smp_store_release in drm_dep_queue_remove_dependency() */
> > + dma_fence_put(smp_load_acquire(&q->dep.removed_fence));
> > +
> > + q->dep.removed_fence = NULL;
> > +
> > + return true;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_is_killed() - check whether a dep queue has been killed
> > + * @q: dep queue to check
> > + *
> > + * Return: true if %DRM_DEP_QUEUE_FLAGS_KILLED is set on @q, false otherwise.
> > + *
> > + * Context: Any context.
> > + */
> > +bool drm_dep_queue_is_killed(struct drm_dep_queue *q)
> > +{
> > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_KILLED);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_is_killed);
> > +
> > +/**
> > + * drm_dep_queue_is_initialized() - check whether a dep queue has been initialized
> > + * @q: dep queue to check
> > + *
> > + * A queue is considered initialized once its ops pointer has been set by a
> > + * successful call to drm_dep_queue_init().  Drivers that embed a
> > + * &drm_dep_queue inside a larger structure may call this before attempting any
> > + * other queue operation to confirm that initialization has taken place.
> > + * drm_dep_queue_put() must be called if this function returns true to drop the
> > + * initialization reference from drm_dep_queue_init().
> > + *
> > + * Return: true if @q has been initialized, false otherwise.
> > + *
> > + * Context: Any context.
> > + */
> > +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q)
> > +{
> > + return !!q->ops;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_is_initialized);
> > +
> > +/**
> > + * drm_dep_queue_set_stopped() - pre-mark a queue as stopped before first use
> > + * @q: dep queue to mark
> > + *
> > + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED directly on @q without going through the
> > + * normal drm_dep_queue_stop() path.  This is only valid during the driver-side
> > + * queue initialisation sequence — i.e. after drm_dep_queue_init() returns but
> > + * before the queue is made visible to other threads (e.g. before it is added
> > + * to any lookup structures).  Using this after the queue is live is a driver
> > + * bug; use drm_dep_queue_stop() instead.
> > + *
> > + * Context: Process context, queue not yet visible to other threads.
> > + */
> > +void drm_dep_queue_set_stopped(struct drm_dep_queue *q)
> > +{
> > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_STOPPED;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_set_stopped);
> > +
> > +/**
> > + * drm_dep_queue_refcount() - read the current reference count of a queue
> > + * @q: dep queue to query
> > + *
> > + * Returns the instantaneous kref value.  The count may change immediately
> > + * after this call; callers must not make safety decisions based solely on
> > + * the returned value.  Intended for diagnostic snapshots and debugfs output.
> > + *
> > + * Context: Any context.
> > + * Return: current reference count.
> > + */
> > +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q)
> > +{
> > + return kref_read(&q->refcount);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_refcount);
> > +
> > +/**
> > + * drm_dep_queue_timeout() - read the per-job TDR timeout for a queue
> > + * @q: dep queue to query
> > + *
> > + * Returns the per-job timeout in jiffies as set at init time.
> > + * %MAX_SCHEDULE_TIMEOUT means no timeout is configured.
> > + *
> > + * Context: Any context.
> > + * Return: timeout in jiffies.
> > + */
> > +long drm_dep_queue_timeout(const struct drm_dep_queue *q)
> > +{
> > + return q->job.timeout;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_timeout);
> > +
> > +/**
> > + * drm_dep_queue_is_job_put_irq_safe() - test whether job-put from IRQ is allowed
> > + * @q: dep queue
> > + *
> > + * Context: Any context.
> > + * Return: true if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set,
> > + *   false otherwise.
> > + */
> > +static bool drm_dep_queue_is_job_put_irq_safe(const struct drm_dep_queue *q)
> > +{
> > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_job_dependency() - get next unresolved dep fence
> > + * @q: dep queue
> > + * @job: job whose dependencies to advance
> > + *
> > + * Returns NULL immediately if the queue has been killed via
> > + * drm_dep_queue_kill(), bypassing all dependency waits so that jobs
> > + * drain through run_job as quickly as possible.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + * Return: next unresolved &dma_fence with a new reference, or NULL
> > + *   when all dependencies have been consumed (or the queue is killed).
> > + */
> > +static struct dma_fence *
> > +drm_dep_queue_job_dependency(struct drm_dep_queue *q,
> > +     struct drm_dep_job *job)
> > +{
> > + struct dma_fence *f;
> > +
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + if (drm_dep_queue_is_killed(q))
> > + return NULL;
> > +
> > + f = xa_load(&job->dependencies, job->last_dependency);
> > + if (f) {
> > + job->last_dependency++;
> > + if (WARN_ON(DRM_DEP_JOB_FENCE_PREALLOC == f))
> > + return dma_fence_get_stub();
> > + return dma_fence_get(f);
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_add_dep_cb() - install wakeup callback on dep fence
> > + * @q: dep queue
> > + * @job: job whose dependency fence is stored in @q->dep.fence
> > + *
> > + * Installs a wakeup callback on @q->dep.fence. Returns true if the
> > + * callback was installed (the queue must wait), false if the fence is
> > + * already signalled or is a self-fence from the same queue context.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + * Return: true if callback installed, false if fence already done.
> > + */
> 
> In Rust, we can encode the signaling paths with a “token type”. So any
> sections that are part of the signaling path can simply take this token as an
> argument. This type also enforces that end_signaling() is called automatically when it
> goes out of scope.
> 
> By the way, we can easily offer an irq handler type where we enforce this:
> 
> fn handle_threaded_irq(&self, device: &Device<Bound>) -> IrqReturn { 
>  let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
>  self.driver.handle_threaded_irq(device) 
> 
>  // end_signaling() is called here automatically.
> }
> 
> Same for workqueues:
> 
> fn work_fn(&self, device: &Device<Bound>) {
>  let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
>  self.driver.work_fn(device) 
> 
>  // end_signaling() is called here automatically.
> }
> 
> This is not Rust-specific, of course, but it is more ergonomic to write in Rust.
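The token type described above is just an RAII guard. A minimal standalone sketch, with stub functions standing in for dma_fence_begin_signalling() / dma_fence_end_signalling():

```rust
// Hypothetical shims; the real functions are kernel C APIs.
fn begin_signalling() -> bool { true }
fn end_signalling(_cookie: bool) {}

pub struct SignallingAnnotation { cookie: bool }

impl SignallingAnnotation {
    pub fn new() -> Self {
        Self { cookie: begin_signalling() }
    }
}

impl Drop for SignallingAnnotation {
    // Runs on every exit path, including early returns and unwinds,
    // so end_signalling() can never be forgotten.
    fn drop(&mut self) {
        end_signalling(self.cookie);
    }
}

pub fn work_fn() -> u32 {
    let _annotation = SignallingAnnotation::new();
    42 // annotation dropped here, closing the signalling section
}
```

Any function executed while holding the token is, by construction, inside the signalling critical section.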
> 
> > +static bool drm_dep_queue_add_dep_cb(struct drm_dep_queue *q,
> > +     struct drm_dep_job *job)
> > +{
> > + struct dma_fence *fence = q->dep.fence;
> > +
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + if (WARN_ON(fence->context == q->fence.context)) {
> > + dma_fence_put(q->dep.fence);
> > + q->dep.fence = NULL;
> > + return false;
> > + }
> > +
> > + if (!dma_fence_add_callback(q->dep.fence, &q->dep.cb,
> > +    drm_dep_queue_wakeup))
> > + return true;
> > +
> > + dma_fence_put(q->dep.fence);
> > + q->dep.fence = NULL;
> > +
> > + return false;
> > +}
> 
> In rust we can enforce that all callbacks take a reference to the fence
> automatically. If the callback is “forgotten” in a buggy path, it is
> automatically removed, and the fence is automatically signaled with -ECANCELED.
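That forgotten-path behaviour is again a Drop impl. A userspace sketch, where a `cancelled` flag stands in for signalling with -ECANCELED (all names hypothetical):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical fence; `cancelled` stands in for an error signal.
pub struct Fence { pub cancelled: AtomicBool }

pub struct CallbackGuard { fence: Arc<Fence>, completed: bool }

impl CallbackGuard {
    pub fn new(fence: Arc<Fence>) -> Self {
        Self { fence, completed: false }
    }
    // Normal path: the callback ran, consume the guard.
    pub fn complete(mut self) {
        self.completed = true;
    }
}

impl Drop for CallbackGuard {
    // If the guard is "forgotten" on a buggy path and dropped without
    // complete(), the fence is signalled with an error rather than
    // left dangling unsignalled.
    fn drop(&mut self) {
        if !self.completed {
            self.fence.cancelled.store(true, Ordering::Relaxed);
        }
    }
}
```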
> 
> > +
> > +/**
> > + * drm_dep_queue_pop_job() - pop a dispatchable job from the SPSC queue
> > + * @q: dep queue
> > + *
> > + * Peeks at the head of the SPSC queue and drains all resolved
> > + * dependencies. If a dependency is still pending, installs a wakeup
> > + * callback and returns NULL. On success pops the job and returns it.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > + * Return: next dispatchable job, or NULL if a dep is still pending.
> > + */
> > +static struct drm_dep_job *drm_dep_queue_pop_job(struct drm_dep_queue *q)
> > +{
> > + struct spsc_node *node;
> > + struct drm_dep_job *job;
> > +
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + node = spsc_queue_peek(&q->job.queue);
> > + if (!node)
> > + return NULL;
> > +
> > + job = container_of(node, struct drm_dep_job, queue_node);
> > +
> > + while ((q->dep.fence = drm_dep_queue_job_dependency(q, job))) {
> > + if (drm_dep_queue_add_dep_cb(q, job))
> > + return NULL;
> > + }
> > +
> > + spsc_queue_pop(&q->job.queue);
> > +
> > + return job;
> > +}
> > +
> > +/*
> > + * drm_dep_queue_get_unless_zero() - try to acquire a queue reference
> > + *
> > + * Workers use this instead of drm_dep_queue_get() to guard against the zombie
> > + * state: the queue's refcount has already reached zero (async teardown is in
> > + * flight) but a work item was queued before free_work had a chance to cancel
> > + * it.  If kref_get_unless_zero() fails the caller must bail immediately.
> > + *
> > + * Context: Any context.
> > + * Returns true if the reference was acquired, false if the queue is zombie.
> > + */
> 
> Again, this function is totally gone in Rust.
> 
> > +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q)
> > +{
> > + return kref_get_unless_zero(&q->refcount);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_get_unless_zero);
> > +
> > +/**
> > + * drm_dep_queue_run_job_work() - run-job worker
> > + * @work: work item embedded in the dep queue
> > + *
> > + * Acquires @q->sched.lock, checks stopped state, queue readiness and
> > + * available credits, pops the next job via drm_dep_queue_pop_job(),
> > + * dispatches it via drm_dep_queue_run_job(), then re-kicks itself.
> > + *
> > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > + *
> > + * Context: Process context (workqueue). DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_run_job_work(struct work_struct *work)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(work, struct drm_dep_queue, sched.run_job);
> > + struct spsc_node *node;
> > + struct drm_dep_job *job;
> > + bool cookie = dma_fence_begin_signalling();
> > +
> > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > + if (!drm_dep_queue_get_unless_zero(q)) {
> > + dma_fence_end_signalling(cookie);
> > + return;
> > + }
> > +
> > + mutex_lock(&q->sched.lock);
> > +
> > + if (drm_dep_queue_is_stopped(q))
> > + goto put_queue;
> > +
> > + if (!drm_dep_queue_is_ready(q))
> > + goto put_queue;
> > +
> > + /* Peek to check credits before committing to pop and dep resolution */
> > + node = spsc_queue_peek(&q->job.queue);
> > + if (!node)
> > + goto put_queue;
> > +
> > + job = container_of(node, struct drm_dep_job, queue_node);
> > + if (!drm_dep_queue_has_credits(q, job))
> > + goto put_queue;
> > +
> > + job = drm_dep_queue_pop_job(q);
> > + if (!job)
> > + goto put_queue;
> > +
> > + drm_dep_queue_run_job(q, job);
> > + drm_dep_queue_run_job_queue(q);
> > +
> > +put_queue:
> > + mutex_unlock(&q->sched.lock);
> > + drm_dep_queue_put(q);
> > + dma_fence_end_signalling(cookie);
> > +}
> > +
> > +/*
> > + * drm_dep_queue_remove_job() - unlink a job from the pending list and reset TDR
> > + * @q:   dep queue owning @job
> > + * @job: job to remove
> > + *
> > + * Splices @job out of @q->job.pending, cancels any pending TDR delayed work,
> > + * and arms the timeout for the new list head (if any).
> > + *
> > + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_remove_job(struct drm_dep_queue *q,
> > +     struct drm_dep_job *job)
> > +{
> > + lockdep_assert_held(&q->job.lock);
> > +
> > + list_del_init(&job->pending_link);
> > + cancel_delayed_work(&q->sched.tdr);
> > + drm_queue_start_timeout(q);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_get_finished_job() - dequeue a finished job
> > + * @q: dep queue
> > + *
> > + * Under @q->job.lock checks the head of the pending list for a
> > + * finished dep fence. If found, removes the job from the list,
> > + * cancels the TDR, and re-arms it for the new head.
> > + *
> > + * Context: Process context (workqueue). DMA fence signaling path.
> > + * Return: the finished &drm_dep_job, or NULL if none is ready.
> > + */
> > +static struct drm_dep_job *
> > +drm_dep_queue_get_finished_job(struct drm_dep_queue *q)
> > +{
> > + struct drm_dep_job *job;
> > +
> > + guard(spinlock_irq)(&q->job.lock);
> > +
> > + job = list_first_entry_or_null(&q->job.pending, struct drm_dep_job,
> > +       pending_link);
> > + if (job && drm_dep_fence_is_finished(job->dfence))
> > + drm_dep_queue_remove_job(q, job);
> > + else
> > + job = NULL;
> > +
> > + return job;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_put_job_work() - put-job worker
> > + * @work: work item embedded in the dep queue
> > + *
> > + * Drains all finished jobs by calling drm_dep_job_put() in a loop,
> > + * then kicks the run-job worker.
> > + *
> > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > + *
> > + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> > + * because workqueue is shared with other items in the fence signaling path.
> > + *
> > + * Context: Process context (workqueue). DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_put_job_work(struct work_struct *work)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(work, struct drm_dep_queue, sched.put_job);
> > + struct drm_dep_job *job;
> > + bool cookie = dma_fence_begin_signalling();
> > +
> > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > + if (!drm_dep_queue_get_unless_zero(q)) {
> > + dma_fence_end_signalling(cookie);
> > + return;
> > + }
> > +
> > + while ((job = drm_dep_queue_get_finished_job(q)))
> > + drm_dep_job_put(job);
> > +
> > + drm_dep_queue_run_job_queue(q);
> > +
> > + drm_dep_queue_put(q);
> > + dma_fence_end_signalling(cookie);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_tdr_work() - TDR worker
> > + * @work: work item embedded in the delayed TDR work
> > + *
> > + * Removes the head job from the pending list under @q->job.lock,
> > + * asserts @q->ops->timedout_job is non-NULL, calls it outside the lock,
> > + * requeues the job if %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB, drops the
> > + * queue's job reference on %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED, and always
> > + * restarts the TDR timer after handling the job (unless @q is stopping).
> > + * Any other return value triggers a WARN.
> > + *
> > + * The TDR is never armed when @q->ops->timedout_job is NULL, so firing
> > + * this worker without a timedout_job callback is a driver bug.
> > + *
> > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > + *
> > + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> > + * because timedout_job() is expected to signal the guilty job's fence as part
> > + * of reset.
> > + *
> > + * Context: Process context (workqueue). DMA fence signaling path.
> > + */
> > +static void drm_dep_queue_tdr_work(struct work_struct *work)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(work, struct drm_dep_queue, sched.tdr.work);
> > + struct drm_dep_job *job;
> > + bool cookie = dma_fence_begin_signalling();
> > +
> > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > + if (!drm_dep_queue_get_unless_zero(q)) {
> > + dma_fence_end_signalling(cookie);
> > + return;
> > + }
> > +
> > + scoped_guard(spinlock_irq, &q->job.lock) {
> > + job = list_first_entry_or_null(&q->job.pending,
> > +       struct drm_dep_job,
> > +       pending_link);
> > + if (job)
> > + /*
> > + * Remove from pending so it cannot be freed
> > + * concurrently by drm_dep_queue_get_finished_job() or
> > + * drm_dep_job_done().
> > + */
> > + list_del_init(&job->pending_link);
> > + }
> > +
> > + if (job) {
> > + enum drm_dep_timedout_stat status;
> > +
> > + if (WARN_ON(!q->ops->timedout_job)) {
> > + drm_dep_job_put(job);
> > + goto out;
> > + }
> > +
> > + status = q->ops->timedout_job(job);
> > +
> > + switch (status) {
> > + case DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB:
> > + scoped_guard(spinlock_irq, &q->job.lock)
> > + list_add(&job->pending_link, &q->job.pending);
> > + drm_dep_queue_put_job_queue(q);
> > + break;
> > + case DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED:
> > + drm_dep_job_put(job);
> > + break;
> > + default:
> > + WARN(1, "invalid drm_dep_timedout_stat");
> > + break;
> > + }
> > + }
> > +
> > +out:
> > + drm_queue_start_timeout_unlocked(q);
> > + drm_dep_queue_put(q);
> > + dma_fence_end_signalling(cookie);
> > +}
> > +
> > +/**
> > + * drm_dep_alloc_submit_wq() - allocate an ordered submit workqueue
> > + * @name: name for the workqueue
> > + * @flags: DRM_DEP_QUEUE_FLAGS_* flags
> > + *
> > + * Allocates an ordered workqueue for job submission with %WQ_MEM_RECLAIM and
> > + * %WQ_MEM_WARN_ON_RECLAIM set, ensuring the workqueue is safe to use from
> > + * memory reclaim context and properly annotated for lockdep taint tracking.
> > + * Adds %WQ_HIGHPRI if %DRM_DEP_QUEUE_FLAGS_HIGHPRI is set. When
> > + * CONFIG_LOCKDEP is enabled, uses a dedicated lockdep map for annotation.
> > + *
> > + * Context: Process context.
> > + * Return: the new &workqueue_struct, or NULL on failure.
> > + */
> > +static struct workqueue_struct *
> > +drm_dep_alloc_submit_wq(const char *name, enum drm_dep_queue_flags flags)
> > +{
> > + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> > +
> > + if (flags & DRM_DEP_QUEUE_FLAGS_HIGHPRI)
> > + wq_flags |= WQ_HIGHPRI;
> > +
> > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > + static struct lockdep_map map = {
> > + .name = "drm_dep_submit_lockdep_map"
> > + };
> > + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> > +#else
> > + return alloc_ordered_workqueue(name, wq_flags);
> > +#endif
> > +}
> > +
> > +/**
> > + * drm_dep_alloc_timeout_wq() - allocate an ordered TDR workqueue
> > + * @name: name for the workqueue
> > + *
> > + * Allocates an ordered workqueue for timeout detection and recovery with
> > + * %WQ_MEM_RECLAIM and %WQ_MEM_WARN_ON_RECLAIM set, ensuring consistent taint
> > + * annotation with the submit workqueue. When CONFIG_LOCKDEP is enabled, uses
> > + * a dedicated lockdep map for annotation.
> > + *
> > + * Context: Process context.
> > + * Return: the new &workqueue_struct, or NULL on failure.
> > + */
> > +static struct workqueue_struct *drm_dep_alloc_timeout_wq(const char *name)
> > +{
> > + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> > +
> > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > + static struct lockdep_map map = {
> > + .name = "drm_dep_timeout_lockdep_map"
> > + };
> > + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> > +#else
> > + return alloc_ordered_workqueue(name, wq_flags);
> > +#endif
> > +}
> > +
> > +/**
> > + * drm_dep_queue_init() - initialize a dep queue
> > + * @q: dep queue to initialize
> > + * @args: initialization arguments
> > + *
> > + * Initializes all fields of @q from @args. If @args->submit_wq is NULL an
> > + * ordered workqueue is allocated and owned by the queue
> > + * (%DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ). If @args->timeout_wq is NULL an
> > + * ordered workqueue is allocated and owned by the queue
> > + * (%DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ). On success the queue holds one kref
> > + * reference and drm_dep_queue_put() must be called to drop this reference
> > + * (i.e., drivers cannot directly free the queue).
> > + *
> > + * When CONFIG_LOCKDEP is enabled, @q->sched.lock is primed against the
> > + * fs_reclaim pseudo-lock so that lockdep can detect any lock ordering
> > + * inversion between @sched.lock and memory reclaim.
> > + *
> > + * Return: 0 on success, %-EINVAL when @args->credit_limit is zero, @args->ops
> > + * is NULL, @args->drm is NULL, @args->ops->run_job is NULL, or when
> > + * @args->submit_wq or @args->timeout_wq is non-NULL but was not allocated with
> > + * %WQ_MEM_WARN_ON_RECLAIM; %-ENOMEM when workqueue allocation fails.
> > + *
> > + * Context: Process context. May allocate memory and create workqueues.
> > + */
> > +int drm_dep_queue_init(struct drm_dep_queue *q,
> > +       const struct drm_dep_queue_init_args *args)
> > +{
> > + if (!args->credit_limit || !args->drm || !args->ops ||
> > +    !args->ops->run_job)
> > + return -EINVAL;
> > +
> > + if (args->submit_wq && !workqueue_is_reclaim_annotated(args->submit_wq))
> > + return -EINVAL;
> > +
> > + if (args->timeout_wq &&
> > +    !workqueue_is_reclaim_annotated(args->timeout_wq))
> > + return -EINVAL;
> > +
> > + memset(q, 0, sizeof(*q));
> > +
> > + q->name = args->name;
> > + q->drm = args->drm;
> > + q->credit.limit = args->credit_limit;
> > + q->job.timeout = args->timeout ? args->timeout : MAX_SCHEDULE_TIMEOUT;
> > +
> > + init_rcu_head(&q->rcu);
> > + INIT_LIST_HEAD(&q->job.pending);
> > + spin_lock_init(&q->job.lock);
> > + spsc_queue_init(&q->job.queue);
> > +
> > + mutex_init(&q->sched.lock);
> > + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> > + fs_reclaim_acquire(GFP_KERNEL);
> > + might_lock(&q->sched.lock);
> > + fs_reclaim_release(GFP_KERNEL);
> > + }
> > +
> > + if (args->submit_wq) {
> > + q->sched.submit_wq = args->submit_wq;
> > + } else {
> > + q->sched.submit_wq = drm_dep_alloc_submit_wq(args->name ?: "drm_dep",
> > +     args->flags);
> > + if (!q->sched.submit_wq)
> > + return -ENOMEM;
> > +
> > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ;
> > + }
> > +
> > + if (args->timeout_wq) {
> > + q->sched.timeout_wq = args->timeout_wq;
> > + } else {
> > + q->sched.timeout_wq = drm_dep_alloc_timeout_wq(args->name ?: "drm_dep");
> > + if (!q->sched.timeout_wq)
> > + goto err_submit_wq;
> > +
> > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ;
> > + }
> > +
> > + q->sched.flags |= args->flags &
> > + ~(DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ |
> > +  DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ);
> > +
> > + INIT_DELAYED_WORK(&q->sched.tdr, drm_dep_queue_tdr_work);
> > + INIT_WORK(&q->sched.run_job, drm_dep_queue_run_job_work);
> > + INIT_WORK(&q->sched.put_job, drm_dep_queue_put_job_work);
> > +
> > + q->fence.context = dma_fence_context_alloc(1);
> > +
> > + kref_init(&q->refcount);
> > + q->ops = args->ops;
> > + drm_dev_get(q->drm);
> > +
> > + return 0;
> > +
> > +err_submit_wq:
> > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> > + destroy_workqueue(q->sched.submit_wq);
> > + mutex_destroy(&q->sched.lock);
> > +
> > + return -ENOMEM;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_init);
> > +
> > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > +/**
> > + * drm_dep_queue_push_job_begin() - mark the start of an arm/push critical section
> > + * @q: dep queue the job belongs to
> > + *
> > + * Called at the start of drm_dep_job_arm() and warns if the push context is
> > + * already owned by another task, which would indicate concurrent arm/push on
> > + * the same queue.
> > + *
> > + * No-op when CONFIG_PROVE_LOCKING is disabled.
> > + *
> > + * Context: Process context. DMA fence signaling path.
> > + */
> > +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> > +{
> > + WARN_ON(q->job.push.owner);
> > + q->job.push.owner = current;
> > +}
> > +
> > +/**
> > + * drm_dep_queue_push_job_end() - mark the end of an arm/push critical section
> > + * @q: dep queue the job belongs to
> > + *
> > + * Called at the end of drm_dep_job_push() and warns if the push context is not
> > + * owned by the current task, which would indicate a mismatched begin/end pair
> > + * or a push from the wrong thread.
> > + *
> > + * No-op when CONFIG_PROVE_LOCKING is disabled.
> > + *
> > + * Context: Process context. DMA fence signaling path.
> > + */
> > +void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> > +{
> > + WARN_ON(q->job.push.owner != current);
> > + q->job.push.owner = NULL;
> > +}
> > +#endif
> > +
> > +/**
> > + * drm_dep_queue_assert_teardown_invariants() - assert teardown invariants
> > + * @q: dep queue being torn down
> > + *
> > + * Warns if the pending-job list, the SPSC submission queue, or the credit
> > + * counter is non-zero when called, or if the queue still has a non-zero
> > + * reference count.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_queue_assert_teardown_invariants(struct drm_dep_queue *q)
> > +{
> > + WARN_ON(!list_empty(&q->job.pending));
> > + WARN_ON(spsc_queue_count(&q->job.queue));
> > + WARN_ON(atomic_read(&q->credit.count));
> > + WARN_ON(drm_dep_queue_refcount(q));
> > +}
> > +
> > +/**
> > + * drm_dep_queue_release() - final internal cleanup of a dep queue
> > + * @q: dep queue to clean up
> > + *
> > + * Asserts teardown invariants and destroys internal resources allocated by
> > + * drm_dep_queue_init() that cannot be torn down earlier in the teardown
> > + * sequence.  Currently this destroys @q->sched.lock.
> > + *
> > + * Drivers that implement &drm_dep_queue_ops.release **must** call this
> > + * function after removing @q from any internal bookkeeping (e.g. lookup
> > + * tables or lists) but before freeing the memory that contains @q.  When
> > + * &drm_dep_queue_ops.release is NULL, drm_dep follows the default teardown
> > + * path and calls this function automatically.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_queue_release(struct drm_dep_queue *q)
> > +{
> > + drm_dep_queue_assert_teardown_invariants(q);
> > + mutex_destroy(&q->sched.lock);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_release);
> > +
> > +/**
> > + * drm_dep_queue_free() - final cleanup of a dep queue
> > + * @q: dep queue to free
> > + *
> > + * Invokes &drm_dep_queue_ops.release if set, in which case the driver is
> > + * responsible for calling drm_dep_queue_release() and freeing @q itself.
> > + * If &drm_dep_queue_ops.release is NULL, calls drm_dep_queue_release()
> > + * and then frees @q with kfree_rcu().
> > + *
> > + * In either case, releases the drm_dev_get() reference taken at init time
> > + * via drm_dev_put(), allowing the owning &drm_device to be unloaded once
> > + * all queues have been freed.
> > + *
> > + * Context: Process context (workqueue), reclaim safe.
> > + */
> > +static void drm_dep_queue_free(struct drm_dep_queue *q)
> > +{
> > + struct drm_device *drm = q->drm;
> > +
> > + if (q->ops->release) {
> > + q->ops->release(q);
> > + } else {
> > + drm_dep_queue_release(q);
> > + kfree_rcu(q, rcu);
> > + }
> > + drm_dev_put(drm);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_free_work() - deferred queue teardown worker
> > + * @work: free_work item embedded in the dep queue
> > + *
> > + * Runs on dep_free_wq. Disables all work items synchronously
> > + * (preventing re-queue and waiting for in-flight instances),
> > + * destroys any owned workqueues, then calls drm_dep_queue_free().
> > + * Running on dep_free_wq ensures destroy_workqueue() is never
> > + * called from within one of the queue's own workers (deadlock)
> > + * and disable_*_sync() cannot deadlock either.
> > + *
> > + * Context: Process context (workqueue), reclaim safe.
> > + */
> > +static void drm_dep_queue_free_work(struct work_struct *work)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(work, struct drm_dep_queue, free_work);
> > +
> > + drm_dep_queue_assert_teardown_invariants(q);
> > +
> > + disable_delayed_work_sync(&q->sched.tdr);
> > + disable_work_sync(&q->sched.run_job);
> > + disable_work_sync(&q->sched.put_job);
> > +
> > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ)
> > + destroy_workqueue(q->sched.timeout_wq);
> > +
> > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> > + destroy_workqueue(q->sched.submit_wq);
> > +
> > + drm_dep_queue_free(q);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_fini() - tear down a dep queue
> > + * @q: dep queue to tear down
> > + *
> > + * Asserts teardown invariants and initiates teardown of @q by queuing the
> > + * deferred free work onto the module-private dep_free_wq workqueue.  The work
> > + * item disables any pending TDR and run/put-job work synchronously, destroys
> > + * any workqueues that were allocated by drm_dep_queue_init(), and then releases
> > + * the queue memory.
> > + *
> > + * Running teardown from dep_free_wq ensures that destroy_workqueue() is never
> > + * called from within one of the queue's own workers (e.g. via
> > + * drm_dep_queue_put()), which would deadlock.
> > + *
> > + * Drivers can wait for all outstanding deferred work to complete by waiting
> > + * for the last drm_dev_put() reference on their &drm_device, which is
> > + * released as the final step of each queue's teardown.
> > + *
> > + * Drivers that implement &drm_dep_queue_ops.fini **must** call this
> > + * function after removing @q from any device bookkeeping but before freeing the
> > + * memory that contains @q.  When &drm_dep_queue_ops.fini is NULL, drm_dep
> > + * follows the default teardown path and calls this function automatically.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_queue_fini(struct drm_dep_queue *q)
> > +{
> > + drm_dep_queue_assert_teardown_invariants(q);
> > +
> > + INIT_WORK(&q->free_work, drm_dep_queue_free_work);
> > + queue_work(dep_free_wq, &q->free_work);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_fini);
> > +
> > +/**
> > + * drm_dep_queue_get() - acquire a reference to a dep queue
> > + * @q: dep queue to acquire a reference on, or NULL
> > + *
> > + * Return: @q with an additional reference held, or NULL if @q is NULL.
> > + *
> > + * Context: Any context.
> > + */
> > +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q)
> > +{
> > + if (q)
> > + kref_get(&q->refcount);
> > + return q;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_get);
> > +
> > +/**
> > + * __drm_dep_queue_release() - kref release callback for a dep queue
> > + * @kref: kref embedded in the dep queue
> > + *
> > + * Calls &drm_dep_queue_ops.fini if set, otherwise calls
> > + * drm_dep_queue_fini() to initiate deferred teardown.
> > + *
> > + * Context: Any context.
> > + */
> > +static void __drm_dep_queue_release(struct kref *kref)
> > +{
> > + struct drm_dep_queue *q =
> > + container_of(kref, struct drm_dep_queue, refcount);
> > +
> > + if (q->ops->fini)
> > + q->ops->fini(q);
> > + else
> > + drm_dep_queue_fini(q);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_put() - release a reference to a dep queue
> > + * @q: dep queue to release a reference on, or NULL
> > + *
> > + * When the last reference is dropped, calls &drm_dep_queue_ops.fini if set,
> > + * otherwise calls drm_dep_queue_fini(). Final memory release is handled by
> > + * &drm_dep_queue_ops.release (which must call drm_dep_queue_release()) if set,
> > + * or drm_dep_queue_release() followed by kfree_rcu() otherwise.
> > + * Does nothing if @q is NULL.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_queue_put(struct drm_dep_queue *q)
> > +{
> > + if (q)
> > + kref_put(&q->refcount, __drm_dep_queue_release);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_put);
> > +
> > +/**
> > + * drm_dep_queue_stop() - stop a dep queue from processing new jobs
> > + * @q: dep queue to stop
> > + *
> > + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> > + * and @q->job.lock (spinlock_irq), making the flag safe to test from fence
> > + * signaling context. Then cancels any in-flight run_job and put_job work
> > + * items. Once stopped, the bypass path and the submit workqueue will not
> > + * dispatch further jobs nor will any jobs be removed from the pending list.
> > + * Call drm_dep_queue_start() to resume processing.
> > + *
> > + * Context: Process context. Waits for in-flight workers to complete.
> > + */
> > +void drm_dep_queue_stop(struct drm_dep_queue *q)
> > +{
> > + scoped_guard(mutex, &q->sched.lock) {
> > + scoped_guard(spinlock_irq, &q->job.lock)
> > + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> > + }
> > + cancel_work_sync(&q->sched.run_job);
> > + cancel_work_sync(&q->sched.put_job);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_stop);
> > +
> > +/**
> > + * drm_dep_queue_start() - resume a stopped dep queue
> > + * @q: dep queue to start
> > + *
> > + * Clears %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> > + * and @q->job.lock (spinlock_irq), making the flag safe to test from IRQ
> > + * context. Then re-queues the run_job and put_job work items so that any jobs
> > + * pending since the queue was stopped are processed. Must only be called after
> > + * drm_dep_queue_stop().
> > + *
> > + * Context: Process context.
> > + */
> > +void drm_dep_queue_start(struct drm_dep_queue *q)
> > +{
> > + scoped_guard(mutex, &q->sched.lock) {
> > + scoped_guard(spinlock_irq, &q->job.lock)
> > + drm_dep_queue_flags_clear(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> > + }
> > + drm_dep_queue_run_job_queue(q);
> > + drm_dep_queue_put_job_queue(q);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_start);
> > +
> > +/**
> > + * drm_dep_queue_trigger_timeout() - trigger the TDR immediately for
> > + *   all pending jobs
> > + * @q: dep queue to trigger timeout on
> > + *
> > + * Sets @q->job.timeout to 1 and arms the TDR delayed work with a one-jiffy
> > + * delay, causing it to fire almost immediately without hot-spinning at zero
> > + * delay. This is used to force-expire any pending jobs on the queue, for
> > + * example when the device is being torn down or has encountered an
> > + * unrecoverable error.
> > + *
> > + * When this function is used, the first timedout_job() call should kick the
> > + * queue off the hardware and signal all pending job fences; subsequent calls
> > + * should continue to signal any remaining pending job fences.
> > + *
> > + * Has no effect if the pending list is empty.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q)
> > +{
> > + guard(spinlock_irqsave)(&q->job.lock);
> > + q->job.timeout = 1;
> > + drm_queue_start_timeout(q);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_trigger_timeout);
> > +
> > +/**
> > + * drm_dep_queue_cancel_tdr_sync() - cancel any pending TDR and wait
> > + *   for it to finish
> > + * @q: dep queue whose TDR to cancel
> > + *
> > + * Cancels the TDR delayed work item if it has not yet started, and waits for
> > + * it to complete if it is already running.  After this call returns, the TDR
> > + * worker is guaranteed not to be executing and will not fire again until
> > + * explicitly rearmed (e.g. via drm_dep_queue_resume_timeout() or by a new
> > + * job being submitted).
> > + *
> > + * Useful during error recovery or queue teardown when the caller needs to
> > + * know that no timeout handling races with its own reset logic.
> > + *
> > + * Context: Process context. May sleep waiting for the TDR worker to finish.
> > + */
> > +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q)
> > +{
> > + cancel_delayed_work_sync(&q->sched.tdr);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_cancel_tdr_sync);
> > +
> > +/**
> > + * drm_dep_queue_resume_timeout() - restart the TDR timer with the
> > + *   configured timeout
> > + * @q: dep queue to resume the timeout for
> > + *
> > + * Restarts the TDR delayed work using @q->job.timeout. Called after device
> > + * recovery to give pending jobs a fresh full timeout window. Has no effect
> > + * if the pending list is empty.
> > + *
> > + * Context: Any context.
> > + */
> > +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q)
> > +{
> > + drm_queue_start_timeout_unlocked(q);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_resume_timeout);
> > +
> > +/**
> > + * drm_dep_queue_is_stopped() - check whether a dep queue is stopped
> > + * @q: dep queue to check
> > + *
> > + * Return: true if %DRM_DEP_QUEUE_FLAGS_STOPPED is set on @q, false otherwise.
> > + *
> > + * Context: Any context.
> > + */
> > +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q)
> > +{
> > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_STOPPED);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_is_stopped);
> > +
> > +/**
> > + * drm_dep_queue_kill() - kill a dep queue and flush all pending jobs
> > + * @q: dep queue to kill
> > + *
> > + * Sets %DRM_DEP_QUEUE_FLAGS_KILLED on @q under @q->sched.lock.  If a
> > + * dependency fence is currently being waited on, its callback is removed and
> > + * the run-job worker is kicked immediately so that the blocked job drains
> > + * without waiting.
> > + *
> > + * Once killed, drm_dep_queue_job_dependency() returns NULL for all jobs,
> > + * bypassing dependency waits so that every queued job drains through
> > + * &drm_dep_queue_ops.run_job without blocking.
> > + *
> > + * The &drm_dep_queue_ops.run_job callback is guaranteed to be called for every
> > + * job that was pushed before or after drm_dep_queue_kill(), even during queue
> > + * teardown.  Drivers should use this guarantee to perform any necessary
> > + * bookkeeping cleanup without executing the actual backend operation when the
> > + * queue is killed.
> > + *
> > + * Unlike drm_dep_queue_stop(), killing is one-way: there is no corresponding
> > + * start function.
> > + *
> > + * **Driver safety requirement**
> > + *
> > + * drm_dep_queue_kill() must only be called once the driver can guarantee that
> > + * no job in the queue will touch memory associated with any of its fences
> > + * (i.e., the queue has been removed from the device and will never be put back
> > + * on).
> > + *
> > + * Context: Process context.
> > + */
> > +void drm_dep_queue_kill(struct drm_dep_queue *q)
> > +{
> > + scoped_guard(mutex, &q->sched.lock) {
> > + struct dma_fence *fence;
> > +
> > + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_KILLED);
> > +
> > + /*
> > + * Holding &q->sched.lock guarantees that the run-job work item
> > + * cannot drop its reference to q->dep.fence concurrently, so
> > + * reading q->dep.fence here is safe.
> > + */
> > + fence = READ_ONCE(q->dep.fence);
> > + if (fence && dma_fence_remove_callback(fence, &q->dep.cb))
> > + drm_dep_queue_remove_dependency(q, fence);
> > + }
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_kill);
> > +
> > +/**
> > + * drm_dep_queue_submit_wq() - retrieve the submit workqueue of a dep queue
> > + * @q: dep queue whose workqueue to retrieve
> > + *
> > + * Drivers may use this to queue their own work items alongside the queue's
> > + * internal run-job and put-job workers — for example to process incoming
> > + * messages in the same serialisation domain.
> > + *
> > + * Prefer drm_dep_queue_work_enqueue() when the only need is to enqueue a
> > + * work item, as it additionally checks the stopped state.  Use this accessor
> > + * when the workqueue itself is required (e.g. for alloc_ordered_workqueue
> > + * replacement or drain_workqueue calls).
> > + *
> > + * Context: Any context.
> > + * Return: the &workqueue_struct used by @q for job submission.
> > + */
> > +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q)
> > +{
> > + return q->sched.submit_wq;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_submit_wq);
> > +
> > +/**
> > + * drm_dep_queue_timeout_wq() - retrieve the timeout workqueue of a dep queue
> > + * @q: dep queue whose workqueue to retrieve
> > + *
> > + * Returns the workqueue used by @q to run TDR (timeout detection and recovery)
> > + * work.  Drivers may use this to queue their own timeout-domain work items, or
> > + * to call drain_workqueue() when tearing down and needing to ensure all pending
> > + * timeout callbacks have completed before proceeding.
> > + *
> > + * Context: Any context.
> > + * Return: the &workqueue_struct used by @q for TDR work.
> > + */
> > +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q)
> > +{
> > + return q->sched.timeout_wq;
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_timeout_wq);
> > +
> > +/**
> > + * drm_dep_queue_work_enqueue() - queue work on the dep queue's submit workqueue
> > + * @q: dep queue to enqueue work on
> > + * @work: work item to enqueue
> > + *
> > + * Queues @work on @q->sched.submit_wq if the queue is not stopped.  This
> > + * allows drivers to schedule custom work items that run serialised with the
> > + * queue's own run-job and put-job workers.
> > + *
> > + * Return: true if the work was queued, false if the queue is stopped or the
> > + * work item was already pending.
> > + *
> > + * Context: Any context.
> > + */
> > +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> > + struct work_struct *work)
> > +{
> > + if (drm_dep_queue_is_stopped(q))
> > + return false;
> > +
> > + return queue_work(q->sched.submit_wq, work);
> > +}
> > +EXPORT_SYMBOL(drm_dep_queue_work_enqueue);
> > +
> > +/**
> > + * drm_dep_queue_can_job_bypass() - test whether a job can skip the SPSC queue
> > + * @q: dep queue
> > + * @job: job to test
> > + *
> > + * A job may bypass the submit workqueue and run inline on the calling thread
> > + * if all of the following hold:
> > + *
> > + *  - %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set on the queue
> > + *  - the queue is not stopped
> > + *  - the SPSC submission queue is empty (no other jobs waiting)
> > + *  - the queue has enough credits for @job
> > + *  - @job has no unresolved dependency fences
> > + *
> > + * Must be called under @q->sched.lock.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock (a mutex).
> > + * Return: true if the job may be run inline, false otherwise.
> > + */
> > +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> > +  struct drm_dep_job *job)
> > +{
> > + lockdep_assert_held(&q->sched.lock);
> > +
> > + return (q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED) &&
> > + !drm_dep_queue_is_stopped(q) &&
> > + !spsc_queue_count(&q->job.queue) &&
> > + drm_dep_queue_has_credits(q, job) &&
> > + xa_empty(&job->dependencies);
> > +}
> > +
> > +/**
> > + * drm_dep_job_done() - mark a job as complete
> > + * @job: the job that finished
> > + * @result: error code to propagate, or 0 for success
> > + *
> > + * Subtracts @job->credits from the queue credit counter, then signals the
> > + * job's dep fence with @result.
> > + *
> > + * When %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set (IRQ-safe path), a
> > + * temporary extra reference is taken on @job before signalling the fence.
> > + * This prevents a concurrent put-job worker — which may be woken by timeouts or
> > + * queue starting — from freeing the job while this function still holds a
> > + * pointer to it.  The extra reference is released at the end of the function.
> > + *
> > + * After signalling, the IRQ-safe path removes the job from the pending list
> > + * under @q->job.lock, provided the queue is not stopped.  Removal is skipped
> > + * when the queue is stopped so that drm_dep_queue_for_each_pending_job() can
> > + * iterate the list without racing with the completion path.  On successful
> > + * removal, kicks the run-job worker so the next queued job can be dispatched
> > + * immediately, then drops the job reference.  If the job was already removed
> > + * by TDR, or removal was skipped because the queue is stopped, kicks the
> > + * put-job worker instead to allow the deferred put to complete.
> > + *
> > + * Context: Any context.
> > + */
> > +static void drm_dep_job_done(struct drm_dep_job *job, int result)
> > +{
> > +	struct drm_dep_queue *q = job->q;
> > +	bool irq_safe = drm_dep_queue_is_job_put_irq_safe(q), removed = false;
> > +
> > +	/*
> > +	 * Local ref to ensure the put worker—which may be woken by external
> > +	 * forces (TDR, driver-side queue starting)—doesn't free the job behind
> > +	 * this function's back after drm_dep_fence_done() while it is still on
> > +	 * the pending list.
> > +	 */
> > +	if (irq_safe)
> > +		drm_dep_job_get(job);
> > +
> > +	atomic_sub(job->credits, &q->credit.count);
> > +	drm_dep_fence_done(job->dfence, result);
> > +
> > +	/* Only safe to touch job after fence signal if we have a local ref. */
> > +
> > +	if (irq_safe) {
> > +		scoped_guard(spinlock_irqsave, &q->job.lock) {
> > +			removed = !list_empty(&job->pending_link) &&
> > +				  !drm_dep_queue_is_stopped(q);
> > +
> > +			/* Guard against TDR operating on job */
> > +			if (removed)
> > +				drm_dep_queue_remove_job(q, job);
> > +		}
> > +	}
> > +
> > +	if (removed) {
> > +		drm_dep_queue_run_job_queue(q);
> > +		drm_dep_job_put(job);
> > +	} else {
> > +		drm_dep_queue_put_job_queue(q);
> > +	}
> > +
> > +	if (irq_safe)
> > +		drm_dep_job_put(job);
> > +}
> > +
> > +/**
> > + * drm_dep_job_done_cb() - dma_fence callback to complete a job
> > + * @f: the hardware fence that signalled
> > + * @cb: fence callback embedded in the dep job
> > + *
> > + * Extracts the job from @cb and calls drm_dep_job_done() with
> > + * @f->error as the result.
> > + *
> > + * Context: Any context, but with IRQs disabled. May not sleep.
> > + */
> > +static void drm_dep_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> > +{
> > +	struct drm_dep_job *job = container_of(cb, struct drm_dep_job, cb);
> > +
> > +	drm_dep_job_done(job, f->error);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_run_job() - submit a job to hardware and set up
> > + *   completion tracking
> > + * @q: dep queue
> > + * @job: job to run
> > + *
> > + * Accounts @job->credits against the queue, appends the job to the pending
> > + * list, then calls @q->ops->run_job(). The TDR timer is started only when
> > + * @job is the first entry on the pending list; subsequent jobs added while
> > + * a TDR is already in flight do not reset the timer (which would otherwise
> > + * extend the deadline for the already-running head job). Stores the returned
> > + * hardware fence as the parent of the job's dep fence, then installs
> > + * drm_dep_job_done_cb() on it. If the hardware fence is already signalled
> > + * (%-ENOENT from dma_fence_add_callback()) or run_job() returns NULL/error,
> > + * the job is completed immediately. Must be called under @q->sched.lock.
> > + *
> > + * Context: Process context. Must hold @q->sched.lock (a mutex). DMA fence
> > + * signaling path.
> > + */
> > +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> > +{
> > +	struct dma_fence *fence;
> > +	int r;
> > +
> > +	lockdep_assert_held(&q->sched.lock);
> > +
> > +	drm_dep_job_get(job);
> > +	atomic_add(job->credits, &q->credit.count);
> > +
> > +	scoped_guard(spinlock_irq, &q->job.lock) {
> > +		bool first = list_empty(&q->job.pending);
> > +
> > +		list_add_tail(&job->pending_link, &q->job.pending);
> > +		if (first)
> > +			drm_queue_start_timeout(q);
> > +	}
> > +
> > +	fence = q->ops->run_job(job);
> > +	drm_dep_fence_set_parent(job->dfence, fence);
> > +
> > +	if (!IS_ERR_OR_NULL(fence)) {
> > +		r = dma_fence_add_callback(fence, &job->cb,
> > +					   drm_dep_job_done_cb);
> > +		if (r == -ENOENT)
> > +			drm_dep_job_done(job, fence->error);
> > +		else if (r)
> > +			drm_err(q->drm, "fence add callback failed (%d)\n", r);
> > +		dma_fence_put(fence);
> > +	} else {
> > +		drm_dep_job_done(job, IS_ERR(fence) ? PTR_ERR(fence) : 0);
> > +	}
> > +
> > +	/*
> > +	 * Drop all input dependency fences now, in process context, before the
> > +	 * final job put. Once the job is on the pending list its last reference
> > +	 * may be dropped from a dma_fence callback (IRQ context), where calling
> > +	 * xa_destroy() would be unsafe.
> > +	 */
> 
> I assume that “pending” is the list of jobs that have been handed to the driver
> via ops->run_job()?
> 
> Can’t this problem be solved by not doing anything inside a dma_fence callback
> other than scheduling the queue worker?
> 
> > +	drm_dep_job_drop_dependencies(job);
> > +	drm_dep_job_put(job);
> > +}
> > +
> > +/**
> > + * drm_dep_queue_push_job() - enqueue a job on the SPSC submission queue
> > + * @q: dep queue
> > + * @job: job to push
> > + *
> > + * Pushes @job onto the SPSC queue. If the queue was previously empty
> > + * (i.e. this is the first pending job), kicks the run_job worker so it
> > + * processes the job promptly without waiting for the next wakeup.
> > + * May be called with or without @q->sched.lock held.
> > + *
> > + * Context: Any context. DMA fence signaling path.
> > + */
> > +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> > +{
> > +	/*
> > +	 * spsc_queue_push() returns true if the queue was previously empty,
> > +	 * i.e. this is the first pending job. Kick the run_job worker so it
> > +	 * picks it up without waiting for the next wakeup.
> > +	 */
> > +	if (spsc_queue_push(&q->job.queue, &job->queue_node))
> > +		drm_dep_queue_run_job_queue(q);
> > +}
> > +
> > +/**
> > + * drm_dep_init() - module initialiser
> > + *
> > + * Allocates the module-private dep_free_wq unbound workqueue used for
> > + * deferred queue teardown.
> > + *
> > + * Return: 0 on success, %-ENOMEM if workqueue allocation fails.
> > + */
> > +static int __init drm_dep_init(void)
> > +{
> > +	dep_free_wq = alloc_workqueue("drm_dep_free", WQ_UNBOUND, 0);
> > +	if (!dep_free_wq)
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * drm_dep_exit() - module exit
> > + *
> > + * Destroys the module-private dep_free_wq workqueue.
> > + */
> > +static void __exit drm_dep_exit(void)
> > +{
> > +	destroy_workqueue(dep_free_wq);
> > +	dep_free_wq = NULL;
> > +}
> > +
> > +module_init(drm_dep_init);
> > +module_exit(drm_dep_exit);
> > +
> > +MODULE_DESCRIPTION("DRM dependency queue");
> > +MODULE_LICENSE("Dual MIT/GPL");
> > diff --git a/drivers/gpu/drm/dep/drm_dep_queue.h b/drivers/gpu/drm/dep/drm_dep_queue.h
> > new file mode 100644
> > index 000000000000..e5c217a3fab5
> > --- /dev/null
> > +++ b/drivers/gpu/drm/dep/drm_dep_queue.h
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_DEP_QUEUE_H_
> > +#define _DRM_DEP_QUEUE_H_
> > +
> > +#include <linux/types.h>
> > +
> > +struct drm_dep_job;
> > +struct drm_dep_queue;
> > +
> > +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> > +				  struct drm_dep_job *job);
> > +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> > +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> > +
> > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q);
> > +void drm_dep_queue_push_job_end(struct drm_dep_queue *q);
> > +#else
> > +static inline void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> > +{
> > +}
> > +static inline void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> > +{
> > +}
> > +#endif
> > +
> > +#endif /* _DRM_DEP_QUEUE_H_ */
> > diff --git a/include/drm/drm_dep.h b/include/drm/drm_dep.h
> > new file mode 100644
> > index 000000000000..615926584506
> > --- /dev/null
> > +++ b/include/drm/drm_dep.h
> > @@ -0,0 +1,597 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright 2015 Advanced Micro Devices, Inc.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > + * OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > + * Copyright © 2026 Intel Corporation
> > + */
> > +
> > +#ifndef _DRM_DEP_H_
> > +#define _DRM_DEP_H_
> > +
> > +#include <drm/spsc_queue.h>
> > +#include <linux/dma-fence.h>
> > +#include <linux/xarray.h>
> > +#include <linux/workqueue.h>
> > +
> > +enum dma_resv_usage;
> > +struct dma_resv;
> > +struct drm_dep_fence;
> > +struct drm_dep_job;
> > +struct drm_dep_queue;
> > +struct drm_file;
> > +struct drm_gem_object;
> > +
> > +/**
> > + * enum drm_dep_timedout_stat - return value of &drm_dep_queue_ops.timedout_job
> > + * @DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED: driver signaled the job's finished
> > + *   fence during reset; drm_dep may safely drop its reference to the job.
> > + * @DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB: timeout was a false alarm; reinsert the
> > + *   job at the head of the pending list so it can complete normally.
> > + */
> > +enum drm_dep_timedout_stat {
> > + DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED,
> > + DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB,
> > +};
> > +
> > +/**
> > + * struct drm_dep_queue_ops - driver callbacks for a dep queue
> > + */
> > +struct drm_dep_queue_ops {
> > + /**
> > + * @run_job: submit the job to hardware. Returns the hardware completion
> > + * fence (with a reference held for the scheduler), or NULL/ERR_PTR on
> > + * synchronous completion or error.
> > + */
> > + struct dma_fence *(*run_job)(struct drm_dep_job *job);
> > +
> > + /**
> > + * @timedout_job: called when the TDR fires for the head job. Must stop
> > + * the hardware, then return %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED if the
> > + * job's fence was signalled during reset, or
> > + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB if the timeout was spurious or
> > + * signalling was otherwise delayed, and the job should be re-inserted
> > + * at the head of the pending list. Any other value triggers a WARN.
> > + */
> > + enum drm_dep_timedout_stat (*timedout_job)(struct drm_dep_job *job);
> > +
> > + /**
> > + * @release: called when the last kref on the queue is dropped and
> > + * drm_dep_queue_fini() has completed.  The driver is responsible for
> > + * removing @q from any internal bookkeeping, calling
> > + * drm_dep_queue_release(), and then freeing the memory containing @q
> > + * (e.g. via kfree_rcu() using @q->rcu).  If NULL, drm_dep calls
> > + * drm_dep_queue_release() and frees @q automatically via kfree_rcu().
> > + * Use this when the queue is embedded in a larger structure.
> > + */
> > + void (*release)(struct drm_dep_queue *q);
> > +
> > + /**
> > + * @fini: if set, called instead of drm_dep_queue_fini() when the last
> > + * kref is dropped. The driver is responsible for calling
> > + * drm_dep_queue_fini() itself after it is done with the queue. Use this
> > + * when additional teardown logic must run before fini (e.g., cleaning up
> > + * firmware resources associated with the queue).
> > + */
> > + void (*fini)(struct drm_dep_queue *q);
> > +};
> > +
> > +/**
> > + * enum drm_dep_queue_flags - flags for &drm_dep_queue and
> > + *   &drm_dep_queue_init_args
> > + *
> > + * Flags are divided into three categories:
> > + *
> > + * - **Private static**: set internally at init time and never changed.
> > + *   Drivers must not read or write these.
> > + *   %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ,
> > + *   %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ.
> > + *
> > + * - **Public dynamic**: toggled at runtime by drivers via accessors.
> > + *   Any modification must be performed under &drm_dep_queue.sched.lock.
> 
> Can’t enforce that in C.
> 
> > + *   Accessor functions provide unstable reads.
> > + *   %DRM_DEP_QUEUE_FLAGS_STOPPED,
> > + *   %DRM_DEP_QUEUE_FLAGS_KILLED.
> 
> > + *
> > + * - **Public static**: supplied by the driver in
> > + *   &drm_dep_queue_init_args.flags at queue creation time and not modified
> > + *   thereafter.
> 
> Same here.
> 
> > + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED,
> > + *   %DRM_DEP_QUEUE_FLAGS_HIGHPRI,
> > + *   %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE.
> 
> > + *
> > + * @DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ: (private, static) submit workqueue was
> > + *   allocated by drm_dep_queue_init() and will be destroyed by
> > + *   drm_dep_queue_fini().
> > + * @DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ: (private, static) timeout workqueue
> > + *   was allocated by drm_dep_queue_init() and will be destroyed by
> > + *   drm_dep_queue_fini().
> > + * @DRM_DEP_QUEUE_FLAGS_STOPPED: (public, dynamic) the queue is stopped: it
> > + *   will neither dispatch new jobs nor remove jobs from the pending list
> > + *   (removal is what drops the drm_dep-owned reference). Set by
> > + *   drm_dep_queue_stop(), cleared by drm_dep_queue_start().
> > + * @DRM_DEP_QUEUE_FLAGS_KILLED: (public, dynamic) the queue has been killed
> > + *   via drm_dep_queue_kill(). Any active dependency wait is cancelled
> > + *   immediately.  Jobs continue to flow through run_job for bookkeeping
> > + *   cleanup, but dependency waiting is skipped so that queued work drains
> > + *   as quickly as possible.
> > + * @DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED: (public, static) the queue supports
> > + *   the bypass path where eligible jobs skip the SPSC queue and run inline.
> > + * @DRM_DEP_QUEUE_FLAGS_HIGHPRI: (public, static) the submit workqueue owned
> > + *   by the queue is created with %WQ_HIGHPRI, causing run-job and put-job
> > + *   workers to execute at elevated priority. Only privileged clients (e.g.
> > + *   drivers managing time-critical or real-time GPU contexts) should request
> > + *   this flag; granting it to unprivileged userspace would allow priority
> > + *   inversion attacks.
> > + *   This flag has no effect when @drm_dep_queue_init_args.submit_wq is
> > + *   provided.
> > + * @DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE: (public, static) when set,
> > + *   drm_dep_job_done() may be called from hardirq context (e.g. from a
> > + *   hardware-signalled dma_fence callback). drm_dep_job_done() will directly
> > + *   dequeue the job and call drm_dep_job_put() without deferring to a
> > + *   workqueue. The driver's &drm_dep_job_ops.release callback must therefore
> > + *   be safe to invoke from IRQ context.
> > + */
> > +enum drm_dep_queue_flags {
> > + DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ = BIT(0),
> > + DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ = BIT(1),
> > + DRM_DEP_QUEUE_FLAGS_STOPPED = BIT(2),
> > + DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED = BIT(3),
> > + DRM_DEP_QUEUE_FLAGS_HIGHPRI = BIT(4),
> > + DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE = BIT(5),
> > + DRM_DEP_QUEUE_FLAGS_KILLED = BIT(6),
> > +};
> > +
> > +/**
> > + * struct drm_dep_queue - a dependency-tracked GPU submission queue
> > + *
> > + * Combines the role of &drm_gpu_scheduler and &drm_sched_entity into a single
> > + * object.  Each queue owns a submit workqueue (or borrows one), a timeout
> > + * workqueue, an SPSC submission queue, and a pending-job list used for TDR.
> > + *
> > + * Initialise with drm_dep_queue_init(), tear down with drm_dep_queue_fini().
> > + * Reference counted via drm_dep_queue_get() / drm_dep_queue_put().
> > + *
> > + * All fields are **opaque to drivers**.  Do not read or write any field
> 
> Can’t enforce this in C.
> 
> > + * directly; use the provided helper functions instead.  The sole exception
> > + * is @rcu, which drivers may pass to kfree_rcu() when the queue is embedded
> > + * inside a larger driver-managed structure and the &drm_dep_queue_ops.release
> > + * vfunc performs an RCU-deferred free.
> 
> > + */
> > +struct drm_dep_queue {
> > + /** @ops: driver callbacks, set at init time. */
> > + const struct drm_dep_queue_ops *ops;
> > + /** @name: human-readable name used for workqueue and fence naming. */
> > + const char *name;
> > + /** @drm: owning DRM device; a drm_dev_get() reference is held for the
> > + *  lifetime of the queue to prevent module unload while queues are live.
> > + */
> > + struct drm_device *drm;
> > + /** @refcount: reference count; use drm_dep_queue_get/put(). */
> > + struct kref refcount;
> > + /**
> > + * @free_work: deferred teardown work queued unconditionally by
> > + * drm_dep_queue_fini() onto the module-private dep_free_wq.  The work
> > + * item disables pending workers synchronously and destroys any owned
> > + * workqueues before releasing the queue memory and dropping the
> > + * drm_dev_get() reference.  Running on dep_free_wq ensures
> > + * destroy_workqueue() is never called from within one of the queue's
> > + * own workers.
> > + */
> > + struct work_struct free_work;
> > + /**
> > + * @rcu: RCU head for deferred freeing.
> > + *
> > + * This is the **only** field drivers may access directly.  When the
> 
> We can enforce this in Rust at compile time.
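To make the point above concrete: below is a minimal userspace Rust sketch (all names hypothetical, unrelated to the actual kernel Rust bindings) of how private fields turn "drivers must not access these fields directly" from a doc-comment plea into a compile error.

```rust
// Hypothetical sketch: the queue's fields are private to this module,
// so code outside it simply cannot read or write them; the accessor
// functions become the only way in, checked by the compiler.
mod dep {
    pub struct DepQueue {
        // Private: external code touching these is a compile error.
        credit_count: u32,
        credit_limit: u32,
    }

    impl DepQueue {
        pub fn new(credit_limit: u32) -> Self {
            DepQueue { credit_count: 0, credit_limit }
        }

        // The only sanctioned way to modify the credit counter.
        pub fn reserve(&mut self, credits: u32) -> bool {
            if self.credit_count + credits <= self.credit_limit {
                self.credit_count += credits;
                true
            } else {
                false
            }
        }

        pub fn credits_in_flight(&self) -> u32 {
            self.credit_count
        }
    }
}

fn main() {
    let mut q = dep::DepQueue::new(4);
    assert!(q.reserve(3));
    assert!(!q.reserve(2)); // over the limit: rejected
    // q.credit_count = 0;  // would not compile: field is private
    assert_eq!(q.credits_in_flight(), 3);
}
```

In C, the equivalent invariant can only be documented and reviewed; here it is structural.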
> 
> > + * queue is embedded in a larger structure, implement
> > + * &drm_dep_queue_ops.release, call drm_dep_queue_release() to destroy
> > + * internal resources, then pass this field to kfree_rcu() so that any
> > + * in-flight RCU readers referencing the queue's dma_fence timeline name
> > + * complete before the memory is returned.  All other fields must be
> > + * accessed through the provided helpers.
> > + */
> > + struct rcu_head rcu;
> > +
> > + /** @sched: scheduling and workqueue state. */
> > + struct {
> > + /** @sched.submit_wq: ordered workqueue for run/put-job work. */
> > + struct workqueue_struct *submit_wq;
> > + /** @sched.timeout_wq: workqueue for the TDR delayed work. */
> > + struct workqueue_struct *timeout_wq;
> > + /**
> > + * @sched.run_job: work item that dispatches the next queued
> > + * job.
> > + */
> > + struct work_struct run_job;
> > + /** @sched.put_job: work item that frees finished jobs. */
> > + struct work_struct put_job;
> > + /** @sched.tdr: delayed work item for timeout/reset (TDR). */
> > + struct delayed_work tdr;
> > + /**
> > + * @sched.lock: mutex serialising job dispatch, bypass
> > + * decisions, stop/start, and flag updates.
> > + */
> > + struct mutex lock;
> > + /**
> > + * @sched.flags: bitmask of &enum drm_dep_queue_flags.
> > + * Any modification after drm_dep_queue_init() must be
> > + * performed under @sched.lock.
> > + */
> > + enum drm_dep_queue_flags flags;
> > + } sched;
> > +
> > + /** @job: pending-job tracking state. */
> > + struct {
> > + /**
> > + * @job.pending: list of jobs that have been dispatched to
> > + * hardware and not yet freed. Protected by @job.lock.
> > + */
> > + struct list_head pending;
> > + /**
> > + * @job.queue: SPSC queue of jobs waiting to be dispatched.
> > + * Producers push via drm_dep_queue_push_job(); the run_job
> > + * work item pops from the consumer side.
> > + */
> > + struct spsc_queue queue;
> > + /**
> > + * @job.lock: spinlock protecting @job.pending, TDR start, and
> > + * the %DRM_DEP_QUEUE_FLAGS_STOPPED flag. Always acquired with
> > + * irqsave (spin_lock_irqsave / spin_unlock_irqrestore) to
> > + * support %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE queues where
> > + * drm_dep_job_done() may run from hardirq context.
> > + */
> > + spinlock_t lock;
> > + /**
> > + * @job.timeout: per-job TDR timeout in jiffies.
> > + * %MAX_SCHEDULE_TIMEOUT means no timeout.
> > + */
> > + long timeout;
> > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > + /**
> > + * @job.push: lockdep annotation tracking the arm-to-push
> > + * critical section.
> > + */
> > + struct {
> > + /*
> > + * @job.push.owner: task that currently holds the push
> > + * context, used to assert single-owner invariants.
> > + * NULL when idle.
> > + */
> > + struct task_struct *owner;
> > + } push;
> > +#endif
> > + } job;
> > +
> > + /** @credit: hardware credit accounting. */
> > + struct {
> > + /** @credit.limit: maximum credits the queue can hold. */
> > + u32 limit;
> > + /** @credit.count: credits currently in flight (atomic). */
> > + atomic_t count;
> > + } credit;
> > +
> > + /** @dep: current blocking dependency for the head SPSC job. */
> > + struct {
> > + /**
> > + * @dep.fence: fence being waited on before the head job can
> > + * run. NULL when no dependency is pending.
> > + */
> > + struct dma_fence *fence;
> > + /**
> > + * @dep.removed_fence: dependency fence whose callback has been
> > + * removed.  The run-job worker must drop its reference to this
> > + * fence before proceeding to call run_job.
> 
> We can enforce this in Rust automatically.
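As a rough illustration of that point, here is a userspace sketch (names invented, a plain atomic standing in for the fence refcount) where the held reference is tied to a guard object; dropping it before run_job proceeds is not something a driver can forget, because it happens automatically when the guard leaves scope.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the dma_fence reference count.
static REFCOUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical guard: holds the removed-callback fence reference and
// releases it in Drop, so "must drop the reference before run_job"
// cannot be violated by a forgotten call.
struct FenceRef;

impl FenceRef {
    fn get() -> FenceRef {
        REFCOUNT.fetch_add(1, Ordering::SeqCst);
        FenceRef
    }
}

impl Drop for FenceRef {
    fn drop(&mut self) {
        REFCOUNT.fetch_sub(1, Ordering::SeqCst);
    }
}

fn run_job_worker() {
    {
        let _removed = FenceRef::get(); // callback removed, ref held
        // ... worker logic operating on the removed fence ...
    } // reference released here, automatically, on every exit path
    // run_job would proceed here with the reference guaranteed dropped
}

fn main() {
    run_job_worker();
    assert_eq!(REFCOUNT.load(Ordering::SeqCst), 0);
}
```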
> 
> > + */
> > + struct dma_fence *removed_fence;
> > + /** @dep.cb: callback installed on @dep.fence. */
> > + struct dma_fence_cb cb;
> > + } dep;
> > +
> > + /** @fence: fence context and sequence number state. */
> > + struct {
> > + /**
> > + * @fence.seqno: next sequence number to assign, incremented
> > + * each time a job is armed.
> > + */
> > + u32 seqno;
> > + /**
> > + * @fence.context: base DMA fence context allocated at init
> > + * time. Finished fences use this context.
> > + */
> > + u64 context;
> > + } fence;
> > +};
> > +
> > +/**
> > + * struct drm_dep_queue_init_args - arguments for drm_dep_queue_init()
> > + */
> > +struct drm_dep_queue_init_args {
> > + /** @ops: driver callbacks; must not be NULL. */
> > + const struct drm_dep_queue_ops *ops;
> > + /** @name: human-readable name for workqueues and fence timelines. */
> > + const char *name;
> > + /** @drm: owning DRM device. A drm_dev_get() reference is taken at
> > + *  queue init and released when the queue is freed, preventing module
> > + *  unload while any queue is still alive.
> > + */
> > + struct drm_device *drm;
> > + /**
> > + * @submit_wq: workqueue for job dispatch. If NULL, an ordered
> > + * workqueue is allocated and owned by the queue.  If non-NULL, the
> > + * workqueue must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> > + * drm_dep_queue_init() returns %-EINVAL otherwise.
> > + */
> > + struct workqueue_struct *submit_wq;
> > + /**
> > + * @timeout_wq: workqueue for TDR. If NULL, an ordered workqueue
> > + * is allocated and owned by the queue.  If non-NULL, the workqueue
> > + * must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> > + * drm_dep_queue_init() returns %-EINVAL otherwise.
> > + */
> > + struct workqueue_struct *timeout_wq;
> > + /** @credit_limit: maximum hardware credits; must be non-zero. */
> > + u32 credit_limit;
> > + /**
> > + * @timeout: per-job TDR timeout in jiffies. Zero means no timeout
> > + * (%MAX_SCHEDULE_TIMEOUT is used internally).
> > + */
> > + long timeout;
> > + /**
> > + * @flags: initial queue flags. %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ
> > + * and %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ are managed internally
> > + * and will be ignored if set here. Setting
> > + * %DRM_DEP_QUEUE_FLAGS_HIGHPRI requests a high-priority submit
> > + * workqueue; drivers must only set this for privileged clients.
> > + */
> > + enum drm_dep_queue_flags flags;
> > +};
> > +
> > +/**
> > + * struct drm_dep_job_ops - driver callbacks for a dep job
> > + */
> > +struct drm_dep_job_ops {
> > + /**
> > + * @release: called when the last reference to the job is dropped.
> > + *
> > + * If set, the driver is responsible for freeing the job. If NULL,
> 
> And if they don’t?
> 
> By the way, we can also enforce this in Rust.
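A hypothetical userspace sketch of the ownership-based alternative: if the final put takes the job by value, the free is unconditional; there is no code path on which a driver-supplied release callback can silently leak the allocation.

```rust
// Hypothetical job type; fields invented for illustration only.
struct DepJob {
    credits: u32,
}

// Consumes the job. After this call the caller can no longer use it,
// and the Box's Drop frees the heap allocation unconditionally; a
// "release that forgot to free" has no equivalent here.
fn job_put(job: Box<DepJob>) -> u32 {
    job.credits // the Box (and job) is dropped when this returns
}

fn main() {
    let job = Box::new(DepJob { credits: 2 });
    let credits = job_put(job);
    // job.credits; // would not compile: ownership was moved away
    assert_eq!(credits, 2);
}
```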
> 
> > + * drm_dep_job_put() will call kfree() on the job directly.
> > + */
> > + void (*release)(struct drm_dep_job *job);
> > +};
> > +
> > +/**
> > + * struct drm_dep_job - a unit of work submitted to a dep queue
> > + *
> > + * All fields are **opaque to drivers**.  Do not read or write any field
> > + * directly; use the provided helper functions instead.
> > + */
> > +struct drm_dep_job {
> > + /** @ops: driver callbacks for this job. */
> > + const struct drm_dep_job_ops *ops;
> > + /** @refcount: reference count, managed by drm_dep_job_get/put(). */
> > + struct kref refcount;
> > + /**
> > + * @dependencies: xarray of &dma_fence dependencies before the job can
> > + * run.
> > + */
> > + struct xarray dependencies;
> > + /** @q: the queue this job is submitted to. */
> > + struct drm_dep_queue *q;
> > + /** @queue_node: SPSC queue linkage for pending submission. */
> > + struct spsc_node queue_node;
> > + /**
> > + * @pending_link: list entry in the queue's pending job list. Protected
> > + * by @job.q->job.lock.
> > + */
> > + struct list_head pending_link;
> > + /** @dfence: finished fence for this job. */
> > + struct drm_dep_fence *dfence;
> > + /** @cb: fence callback used to watch for dependency completion. */
> > + struct dma_fence_cb cb;
> > + /** @credits: number of credits this job consumes from the queue. */
> > + u32 credits;
> > + /**
> > + * @last_dependency: index into @dependencies of the next fence to
> > + * check. Advanced by drm_dep_queue_job_dependency() as each
> > + * dependency is consumed.
> > + */
> > + u32 last_dependency;
> > + /**
> > + * @invalidate_count: number of times this job has been invalidated.
> > + * Incremented by drm_dep_job_invalidate_job().
> > + */
> > + u32 invalidate_count;
> > + /**
> > + * @signalling_cookie: return value of dma_fence_begin_signalling()
> > + * captured in drm_dep_job_arm() and consumed by drm_dep_job_push().
> > + * Not valid outside the arm→push window.
> > + */
> > + bool signalling_cookie;
> > +};
> > +
> > +/**
> > + * struct drm_dep_job_init_args - arguments for drm_dep_job_init()
> > + */
> > +struct drm_dep_job_init_args {
> > + /**
> > + * @ops: driver callbacks for the job, or NULL for default behaviour.
> > + */
> > + const struct drm_dep_job_ops *ops;
> > + /** @q: the queue to associate the job with. A reference is taken. */
> > + struct drm_dep_queue *q;
> > + /** @credits: number of credits this job consumes; must be non-zero. */
> > + u32 credits;
> > +};
> > +
> > +/* Queue API */
> > +
> > +/**
> > + * drm_dep_queue_sched_guard() - acquire the queue scheduler lock as a guard
> > + * @__q: dep queue whose scheduler lock to acquire
> > + *
> > + * Acquires @__q->sched.lock as a scoped mutex guard (released automatically
> > + * when the enclosing scope exits).  This lock serialises all scheduler state
> > + * transitions — stop/start/kill flag changes, bypass-path decisions, and the
> > + * run-job worker — so it must be held when the driver needs to atomically
> > + * inspect or modify queue state in relation to job submission.
> > + *
> > + * **When to use**
> > + *
> > + * Drivers that set %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED and wish to
> > + * serialise their own submit work against the bypass path must acquire this
> > + * guard.  Without it, a concurrent caller of drm_dep_job_push() could take
> > + * the bypass path and call ops->run_job() inline between the driver's
> > + * eligibility check and its corresponding action, producing a race.
> 
> So if you’re not careful, you have just introduced a race :/
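For comparison, a small userspace Rust sketch (hypothetical names, std Mutex rather than the kernel's) where the bypass-eligibility state is only reachable through the lock, so the eligibility check and the action based on it cannot be separated by a concurrent submission.

```rust
use std::sync::Mutex;

// Hypothetical scheduler state: lives *inside* the Mutex, so it can
// only be inspected or modified while the lock is held.
struct SchedState {
    stopped: bool,
    queued_jobs: u32,
}

struct DepQueue {
    sched: Mutex<SchedState>,
}

impl DepQueue {
    fn new() -> Self {
        DepQueue {
            sched: Mutex::new(SchedState { stopped: false, queued_jobs: 0 }),
        }
    }

    // Check and act under one guard: there is no window between the
    // eligibility check and the inline run in which another thread
    // could take the bypass path.
    fn try_bypass(&self) -> bool {
        let mut s = self.sched.lock().unwrap();
        if !s.stopped && s.queued_jobs == 0 {
            s.queued_jobs += 1; // "run inline"
            true
        } else {
            false
        }
    }
}

fn main() {
    let q = DepQueue::new();
    assert!(q.try_bypass());
    assert!(!q.try_bypass()); // a job is already in flight
}
```

The unlocked check-then-act version of try_bypass simply cannot be written in safe Rust, because the state is not visible outside the guard.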
> 
> > + *
> > + * **Constraint: only from submit_wq worker context**
> > + *
> > + * This guard must only be acquired from a work item running on the queue's
> > + * submit workqueue (@q->sched.submit_wq) by drivers.
> > + *
> > + * Context: Process context only; must be called from submit_wq work by
> > + * drivers.
> > + */
> > +#define drm_dep_queue_sched_guard(__q) \
> > +	guard(mutex)(&(__q)->sched.lock)
> > +
> > +int drm_dep_queue_init(struct drm_dep_queue *q,
> > +       const struct drm_dep_queue_init_args *args);
> > +void drm_dep_queue_fini(struct drm_dep_queue *q);
> > +void drm_dep_queue_release(struct drm_dep_queue *q);
> > +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q);
> > +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q);
> > +void drm_dep_queue_put(struct drm_dep_queue *q);
> > +void drm_dep_queue_stop(struct drm_dep_queue *q);
> > +void drm_dep_queue_start(struct drm_dep_queue *q);
> > +void drm_dep_queue_kill(struct drm_dep_queue *q);
> > +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q);
> > +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q);
> > +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q);
> > +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> > + struct work_struct *work);
> > +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q);
> > +bool drm_dep_queue_is_killed(struct drm_dep_queue *q);
> > +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q);
> > +void drm_dep_queue_set_stopped(struct drm_dep_queue *q);
> > +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q);
> > +long drm_dep_queue_timeout(const struct drm_dep_queue *q);
> > +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q);
> > +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q);
> > +
> > +/* Job API */
> > +
> > +/**
> > + * DRM_DEP_JOB_FENCE_PREALLOC - sentinel value for pre-allocating a dependency slot
> > + *
> > + * Pass this to drm_dep_job_add_dependency() instead of a real fence to
> > + * pre-allocate a slot in the job's dependency xarray during the preparation
> > + * phase (where GFP_KERNEL is available).  The returned xarray index identifies
> > + * the slot.  Call drm_dep_job_replace_dependency() later — inside a
> > + * dma_fence_begin_signalling() region if needed — to swap in the real fence
> > + * without further allocation.
> > + *
> > + * This sentinel is never treated as a dma_fence; it carries no reference count
> > + * and must not be passed to dma_fence_put().  It is only valid as an argument
> > + * to drm_dep_job_add_dependency() and as the expected stored value checked by
> > + * drm_dep_job_replace_dependency().
> > + */
> > +#define DRM_DEP_JOB_FENCE_PREALLOC ((struct dma_fence *)-1)
> > +
> > +int drm_dep_job_init(struct drm_dep_job *job,
> > +     const struct drm_dep_job_init_args *args);
> > +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job);
> > +void drm_dep_job_put(struct drm_dep_job *job);
> > +void drm_dep_job_arm(struct drm_dep_job *job);
> > +void drm_dep_job_push(struct drm_dep_job *job);
> > +int drm_dep_job_add_dependency(struct drm_dep_job *job,
> > +       struct dma_fence *fence);
> > +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> > +    struct dma_fence *fence);
> > +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> > +       struct drm_file *file, u32 handle,
> > +       u32 point);
> > +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> > +      struct dma_resv *resv,
> > +      enum dma_resv_usage usage);
> > +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> > +  struct drm_gem_object *obj,
> > +  bool write);
> > +bool drm_dep_job_is_signaled(struct drm_dep_job *job);
> > +bool drm_dep_job_is_finished(struct drm_dep_job *job);
> > +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold);
> > +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job);
> > +
> > +/**
> > + * struct drm_dep_queue_pending_job_iter - iterator state for
> > + *   drm_dep_queue_for_each_pending_job()
> > + * @q: queue being iterated
> > + */
> > +struct drm_dep_queue_pending_job_iter {
> > + struct drm_dep_queue *q;
> > +};
> > +
> > +/* Drivers should never call this directly */
> 
> Not enforceable in C.
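A hypothetical userspace sketch of how the runtime WARN_ON(!stopped) could instead become a compile-time requirement: iteration is only offered on a guard type that stopping the queue returns, so "iterate without stopping" is unwritable.

```rust
// Hypothetical queue; a Vec of job ids stands in for the pending list.
struct DepQueue {
    pending: Vec<u32>,
    stopped: bool,
}

// Guard proving the queue is stopped; restarts the queue on drop.
struct StoppedGuard<'a> {
    q: &'a mut DepQueue,
}

impl DepQueue {
    fn new() -> Self {
        DepQueue { pending: vec![1, 2, 3], stopped: false }
    }

    fn stop(&mut self) -> StoppedGuard<'_> {
        self.stopped = true;
        StoppedGuard { q: self }
    }
}

impl<'a> StoppedGuard<'a> {
    // Only reachable through the guard: no stopped queue, no iteration.
    fn for_each_pending(&self, mut f: impl FnMut(u32)) {
        for &job in &self.q.pending {
            f(job);
        }
    }
}

impl<'a> Drop for StoppedGuard<'a> {
    fn drop(&mut self) {
        self.q.stopped = false; // queue restarts when the scope ends
    }
}

fn main() {
    let mut q = DepQueue::new();
    let mut seen = 0;
    {
        let guard = q.stop();
        guard.for_each_pending(|_| seen += 1);
    }
    assert_eq!(seen, 3);
    assert!(!q.stopped); // automatically restarted
}
```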
> 
> > +static inline struct drm_dep_queue_pending_job_iter
> > +__drm_dep_queue_pending_job_iter_begin(struct drm_dep_queue *q)
> > +{
> > +	struct drm_dep_queue_pending_job_iter iter = {
> > +		.q = q,
> > +	};
> > +
> > +	WARN_ON(!drm_dep_queue_is_stopped(q));
> > +	return iter;
> > +}
> > +
> > +/* Drivers should never call this directly */
> > +static inline void
> > +__drm_dep_queue_pending_job_iter_end(struct drm_dep_queue_pending_job_iter iter)
> > +{
> > +	WARN_ON(!drm_dep_queue_is_stopped(iter.q));
> > +}
> > +
> > +/* clang-format off */
> > +DEFINE_CLASS(drm_dep_queue_pending_job_iter,
> > +     struct drm_dep_queue_pending_job_iter,
> > +     __drm_dep_queue_pending_job_iter_end(_T),
> > +     __drm_dep_queue_pending_job_iter_begin(__q),
> > +     struct drm_dep_queue *__q);
> > +/* clang-format on */
> > +static inline void *
> > +class_drm_dep_queue_pending_job_iter_lock_ptr(
> > + class_drm_dep_queue_pending_job_iter_t *_T)
> > +{ return _T; }
> > +#define class_drm_dep_queue_pending_job_iter_is_conditional false
> > +
> > +/**
> > + * drm_dep_queue_for_each_pending_job() - iterate over all pending jobs
> > + *   in a queue
> > + * @__job: loop cursor, a &struct drm_dep_job pointer
> > + * @__q: &struct drm_dep_queue to iterate
> > + *
> > + * Iterates over every job currently on @__q->job.pending. The queue must be
> > + * stopped (drm_dep_queue_stop() called) before using this iterator; a WARN_ON
> > + * fires at the start and end of the scope if it is not.
> > + *
> > + * Context: Any context.
> > + */
> > +#define drm_dep_queue_for_each_pending_job(__job, __q) \
> > + scoped_guard(drm_dep_queue_pending_job_iter, (__q)) \
> > + list_for_each_entry((__job), &(__q)->job.pending, pending_link)
> > +
> > +#endif
> > -- 
> > 2.34.1
> > 
> 
> 
> By the way:
> 
> I invite you to have a look at this implementation [0]. It currently works in real
> hardware i.e.: our downstream "Tyr" driver for Arm Mali is using that at the
> moment. It is a mere prototype that we’ve put together to test different
> approaches, so it’s not meant to be a “solution” at all. It’s a mere data point
> for further discussion.
> 
Philipp Stanner is working on this “Job Queue” concept too, but from an upstream
perspective.
> 
> [0]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/61

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  5:45     ` Matthew Brost
@ 2026-03-17  7:17       ` Miguel Ojeda
  2026-03-17  8:26         ` Matthew Brost
  2026-03-17 18:14       ` Matthew Brost
  1 sibling, 1 reply; 21+ messages in thread
From: Miguel Ojeda @ 2026-03-17  7:17 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

On Tue, Mar 17, 2026 at 6:46 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> You can do RAII in C - see cleanup.h. Clear object lifetimes and
> ownership are what is important. Disciplined coding is the only way to
> do this regardless of language. RAII doesn't help with bad object
> models / ownership / lifetime models either.

"Ownership", "lifetimes" and being "disciplined" *is* what Rust helps
with. That is the whole point (even if there are other advantages).

Yes, the cleanup attribute is nice, but even the whole `CLASS` thing
is meant to simplify code. Simplifying code does reduce bugs in
general, but it doesn't solve anything fundamental. Even if we had C++
and full-fledged smart pointers and so on, it doesn't improve
meaningfully the situation -- one can still mess things up very easily
with them.
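
For readers who have not used it, the cleanup attribute that cleanup.h builds
on can be sketched in plain userspace C; the macro name below is illustrative,
not the kernel's:

```c
#include <assert.h>
#include <stdlib.h>

/* Rough userspace analogue of the kernel's DEFINE_FREE()/__free() idea
 * from cleanup.h; __free_with() is a made-up name for illustration. */
#define __free_with(fn) __attribute__((cleanup(fn)))

static int cleanups_ran;

/* Cleanup handler: receives a pointer to the annotated variable */
static void free_buf(char **p)
{
	free(*p);
	cleanups_ran++;
}

/* The buffer is released automatically on every exit path when it goes
 * out of scope, which is the RAII property under discussion. */
static void scope_demo(void)
{
	char *buf __free_with(free_buf) = malloc(64);

	(void)buf;
}
```

This gives scope-based release, but nothing stops code from stashing a copy of
the pointer past the scope, which is roughly the limitation being debated here.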

And yes, sanitizers and lockdep and runtime solutions that require to
trigger paths are amazing, but not anywhere close to enforcing
something statically.

The fact that `unsafe` exists doesn't mean "Rust doesn't solve
anything". Quite the opposite: the goal is to provide safe
abstractions where possible, i.e. we minimize the need for `unsafe`.
And for the cases where there is no other way around it, the toolchain
will force you to write an explanation for your `unsafe` usage. Then
maintainers and reviewers will have to agree with your argument for
it.

In particular, it is not something that gets routinely (and
implicitly) used every second line like we do in C.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  7:17       ` Miguel Ojeda
@ 2026-03-17  8:26         ` Matthew Brost
  2026-03-17 12:04           ` Daniel Almeida
  2026-03-17 19:41           ` Miguel Ojeda
  0 siblings, 2 replies; 21+ messages in thread
From: Matthew Brost @ 2026-03-17  8:26 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

On Tue, Mar 17, 2026 at 08:17:27AM +0100, Miguel Ojeda wrote:
> On Tue, Mar 17, 2026 at 6:46 AM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > You can do RAII in C - see cleanup.h. Clear object lifetimes and
> > ownership are what is important. Disciplined coding is the only way to
> > do this regardless of language. RAII doesn't help with bad object
> > models / ownership / lifetime models either.
> 

I hate cut-offs in threads.

> "Ownership", "lifetimes" and being "disciplined" *is* what Rust helps
> with. That is the whole point (even if there are other advantages).
> 

I get it — you’re a Rust zealot. You can do this in C and enforce the
rules quite well.

RAII cannot describe ownership transfers of refs, nor can it express who
owns what in multi-threaded components, as far as I know. Ref-tracking
and ownership need to be explicit.

I’m not going to reply to Rust vs C comments in this thread. If you want
to talk about ownership, lifetimes, dma-fence enforcement, and teardown
guarantees, sure.

If you want to build on top of a component that’s been tested on a
production driver, great — please join in. If you want to figure out all
the pitfalls yourself, well… have fun.

Matt

> Yes, the cleanup attribute is nice, but even the whole `CLASS` thing
> is meant to simplify code. Simplifying code does reduce bugs in
> general, but it doesn't solve anything fundamental. Even if we had C++
> and full-fledged smart pointers and so on, it doesn't improve
> meaningfully the situation -- one can still mess things up very easily
> with them.
> 
> And yes, sanitizers and lockdep and runtime solutions that require to
> trigger paths are amazing, but not anywhere close to enforcing
> something statically.
> 
> The fact that `unsafe` exists doesn't mean "Rust doesn't solve
> anything". Quite the opposite: the goal is to provide safe
> abstractions where possible, i.e. we minimize the need for `unsafe`.
> And for the cases where there is no other way around it, the toolchain
> will force you to write an explanation for your `unsafe` usage. Then
> maintainers and reviewers will have to agree with your argument for
> it.
> 
> In particular, it is not something that gets routinely (and
> implicitly) used every second line like we do in C.
> 
> Cheers,
> Miguel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  8:26         ` Matthew Brost
@ 2026-03-17 12:04           ` Daniel Almeida
  2026-03-17 19:41           ` Miguel Ojeda
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Almeida @ 2026-03-17 12:04 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Miguel Ojeda, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

Matthew,

> I get it — you’re a Rust zealot. You can do this in C and enforce the
> rules quite well.
> 
> RAII cannot describe ownership transfers of refs, nor can it express who
> owns what in multi-threaded components, as far as I know. Ref-tracking
> and ownership need to be explicit.
> 
> I’m not going to reply to Rust vs C comments in this thread. If you want
> to talk about ownership, lifetimes, dma-fence enforcement, and teardown
> guarantees, sure.
> 
> If you want to build on top of a component that’s been tested on a
> production driver, great — please join in. If you want to figure out all
> the pitfalls yourself, well… have fun.
> 
> Matt
> 

It is not about being a Rust zealot. I pointed out that your code has issues.
Every time you access the queue you have to use a special function because the
queue might be gone; how is this not a problem?

+ * However, there is a secondary hazard: a worker can be queued while the
+ * queue is in a "zombie" state — refcount has already reached zero and async
+ * teardown is in flight, but the work item has not yet been disabled by
+ * free_work.  To guard against this every worker uses
+ * drm_dep_queue_get_unless_zero() at entry; if the refcount is already zero
+ * the worker bails immediately without touching the queue state.

At various points you document requirements that are simply comments. Resource
management is scattered all over the place, and it’s sometimes even shared
with drivers, whom you have no control over.
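
For reference, the guard quoted below amounts to the following pattern (a
userspace model with C11 atomics, not the patch's code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the zombie-state guard: a worker only proceeds if it can win
 * a reference, i.e. the refcount has not already hit zero. */
struct queue {
	atomic_int refcount;
	int work_done;
};

/* Userspace analogue of kref_get_unless_zero() */
static bool queue_get_unless_zero(struct queue *q)
{
	int v = atomic_load(&q->refcount);

	while (v > 0) {
		if (atomic_compare_exchange_weak(&q->refcount, &v, v + 1))
			return true;
	}
	return false;
}

static void queue_put(struct queue *q)
{
	atomic_fetch_sub(&q->refcount, 1);
	/* the final put would schedule async teardown here */
}

/* Worker entry point: bail immediately on a zombie queue */
static void worker(struct queue *q)
{
	if (!queue_get_unless_zero(q))
		return;		/* teardown already in flight */
	q->work_done++;
	queue_put(q);
}
```

Note that nothing at compile time forces every entry point to use the guard,
which is the objection being raised.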

+ * Drivers that set %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED and wish to
+ * serialise their own submit work against the bypass path must acquire this
+ * guard.  Without it, a concurrent caller of drm_dep_job_push() could take
+ * the bypass path and call ops->run_job() inline between the driver's
+ * eligibility check and its corresponding action, producing a race.

How is this not a problem? Again, you’re not in control of driver code.

+ * If set, the driver is responsible for freeing the job. If NULL,

Same here.

Even if we take Rust out of the equation, how do you plan to solve these things? Or
do you consider them solved as is?

I worry that we will find ourselves again at XDC in yet another scheduler
workshop to address the issues that will invariably come up with your new
design in a few years.

> If you want to build on top of a component that’s been tested on a
> production driver, great — please join in. If you want to figure out all
> the pitfalls yourself, well… have fun.

Note that I didn’t show up with a low-effort “hey, how about we rewrite
this in Rust?”. Instead, I linked to an actual Rust implementation that I
spent weeks painstakingly debugging, not to mention the time it took to write
it. Again, I suggest that you guys have a look, like I did with your code. You
might find things you end up liking there. 

— Daniel




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  2:47   ` [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer Daniel Almeida
  2026-03-17  5:45     ` Matthew Brost
@ 2026-03-17 12:31     ` Danilo Krummrich
  2026-03-17 14:25       ` Daniel Almeida
  1 sibling, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2026-03-17 12:31 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: Matthew Brost, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, David Airlie, Maarten Lankhorst,
	Maxime Ripard, Philipp Stanner, Simona Vetter, Sumit Semwal,
	Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux, Miguel Ojeda

On Tue Mar 17, 2026 at 3:47 AM CET, Daniel Almeida wrote:
> I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
> is going, it is much more ergonomic to add a Rust component with a nice C
> interface than doing it the other way around.

This is not exactly what I said. I was talking about the maintenance aspects
and that a Rust Jobqueue implementation (for the reasons explained in my initial
reply) is easily justifiable in that respect, whereas another C implementation
that does *not* replace the existing DRM scheduler entirely is much harder to
justify from a maintenance perspective.

I'm also not sure whether a C interface from the Rust side is easy to establish.
We don't want to limit ourselves in terms of language capabilities for this, and
passing through all the additional information Rust carries in the type system
might not be straightforward.

It would be an experiment, and it was one of the ideas behind the Rust Jobqueue
to see how it turns out if we try. Always with the fallback of having C
infrastructure as an alternative if it doesn't work out well.

Having this said, I don't see an issue with the drm_dep thing going forward if
there is a path to replacing DRM sched entirely.

The Rust component should remain independent from this for the reasons mentioned
in [1].

[1] https://lore.kernel.org/dri-devel/DH51W6XRQXYX.3M30IRYIWZLFG@kernel.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 12:31     ` Danilo Krummrich
@ 2026-03-17 14:25       ` Daniel Almeida
  2026-03-17 14:33         ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Almeida @ 2026-03-17 14:25 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Matthew Brost, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, David Airlie, Maarten Lankhorst,
	Maxime Ripard, Philipp Stanner, Simona Vetter, Sumit Semwal,
	Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux, Miguel Ojeda



> On 17 Mar 2026, at 09:31, Danilo Krummrich <dakr@kernel.org> wrote:
> 
> On Tue Mar 17, 2026 at 3:47 AM CET, Daniel Almeida wrote:
>> I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
>> is going, it is much more ergonomic to add a Rust component with a nice C
>> interface than doing it the other way around.
> 
> This is not exactly what I said. I was talking about the maintenance aspects
> and that a Rust Jobqueue implementation (for the reasons explained in my initial
> reply) is easily justifiable in that respect, whereas another C implementation
> that does *not* replace the existing DRM scheduler entirely is much harder to
> justify from a maintenance perspective.

Ok, I misunderstood your point a bit.

> 
> I'm also not sure whether a C interface from the Rust side is easy to establish.
> We don't want to limit ourselves in terms of language capabilities for this, and
> passing through all the additional information Rust carries in the type system
> might not be straightforward.
> 
> It would be an experiment, and it was one of the ideas behind the Rust Jobqueue
> to see how it turns out if we try. Always with the fallback of having C
> infrastructure as an alternative if it doesn't work out well.

From previous experience in doing Rust to C FFI in NVK, I don’t see, at
first, why this can’t work. But I agree with you, there may very well be
unanticipated things here and this part is indeed an experiment. No argument
from me here.

> 
> Having this said, I don't see an issue with the drm_dep thing going forward if
> there is a path to replacing DRM sched entirely.

The issues I pointed out remain. Even if the plan is to have drm_dep + JobQueue
(and no drm_sched). I feel that my point of considering doing it in Rust remains.

> 
> The Rust component should remain independent from this for the reasons mentioned
> in [1].
> 
> [1] https://lore.kernel.org/dri-devel/DH51W6XRQXYX.3M30IRYIWZLFG@kernel.org/

Ok

— Daniel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 14:25       ` Daniel Almeida
@ 2026-03-17 14:33         ` Danilo Krummrich
  2026-03-18 22:50           ` Matthew Brost
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2026-03-17 14:33 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: Matthew Brost, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, David Airlie, Maarten Lankhorst,
	Maxime Ripard, Philipp Stanner, Simona Vetter, Sumit Semwal,
	Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux, Miguel Ojeda

On Tue Mar 17, 2026 at 3:25 PM CET, Daniel Almeida wrote:
>
>
>> On 17 Mar 2026, at 09:31, Danilo Krummrich <dakr@kernel.org> wrote:
>> 
>> On Tue Mar 17, 2026 at 3:47 AM CET, Daniel Almeida wrote:
>>> I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
>>> is going, it is much more ergonomic to add a Rust component with a nice C
>>> interface than doing it the other way around.
>> 
>> This is not exactly what I said. I was talking about the maintenance aspects
>> and that a Rust Jobqueue implementation (for the reasons explained in my initial
>> reply) is easily justifiable in that respect, whereas another C implementation
>> that does *not* replace the existing DRM scheduler entirely is much harder to
>> justify from a maintenance perspective.
>
> Ok, I misunderstood your point a bit.
>
>> 
>> I'm also not sure whether a C interface from the Rust side is easy to establish.
>> We don't want to limit ourselves in terms of language capabilities for this, and
>> passing through all the additional information Rust carries in the type system
>> might not be straightforward.
>> 
>> It would be an experiment, and it was one of the ideas behind the Rust Jobqueue
>> to see how it turns out if we try. Always with the fallback of having C
>> infrastructure as an alternative if it doesn't work out well.
>
> From previous experience in doing Rust to C FFI in NVK, I don’t see, at
> first, why this can’t work. But I agree with you, there may very well be
> unanticipated things here and this part is indeed an experiment. No argument
> from me here.
>
>> 
>> Having this said, I don't see an issue with the drm_dep thing going forward if
>> there is a path to replacing DRM sched entirely.
>
> The issues I pointed out remain. Even if the plan is to have drm_dep + JobQueue
> (and no drm_sched). I feel that my point of considering doing it in Rust remains.

I mean, as mentioned below, we should have a Rust Jobqueue as an independent
component. Or are you saying you'd consider having only a Rust component with a
C API eventually? If so, that'd be way too early to consider for various
reasons.

>> The Rust component should remain independent from this for the reasons mentioned
>> in [1].
>> 
>> [1] https://lore.kernel.org/dri-devel/DH51W6XRQXYX.3M30IRYIWZLFG@kernel.org/
>
> Ok
>
> — Daniel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  5:45     ` Matthew Brost
  2026-03-17  7:17       ` Miguel Ojeda
@ 2026-03-17 18:14       ` Matthew Brost
  2026-03-17 19:48         ` Daniel Almeida
  2026-03-17 20:43         ` Boris Brezillon
  1 sibling, 2 replies; 21+ messages in thread
From: Matthew Brost @ 2026-03-17 18:14 UTC (permalink / raw)
  To: Daniel Almeida
  Cc: intel-xe, dri-devel, Boris Brezillon, Tvrtko Ursulin,
	Rodrigo Vivi, Thomas Hellström, Christian König,
	Danilo Krummrich, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Philipp Stanner, Simona Vetter, Sumit Semwal, Thomas Zimmermann,
	linux-kernel, Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl,
	Daniel Stone, Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Mon, Mar 16, 2026 at 10:45:33PM -0700, Matthew Brost wrote:
> On Mon, Mar 16, 2026 at 11:47:01PM -0300, Daniel Almeida wrote:
> > (+cc a few other people + Rust-for-Linux ML)
> > 
> > Hi Matthew,
> > 
> > I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
> > is going, it is much more ergonomic to add a Rust component with a nice C
> > interface than doing it the other way around.
> >
> 
> Holy war? See my reply to Danilo — I’ll write this in Rust if needed,
> but it’s not my first choice since I’m not yet a native speaker.
>  
> > > On 16 Mar 2026, at 01:32, Matthew Brost <matthew.brost@intel.com> wrote:
> > > 
> > > Diverging requirements between GPU drivers using firmware scheduling
> > > and those using hardware scheduling have shown that drm_gpu_scheduler is
> > > no longer sufficient for firmware-scheduled GPU drivers. The technical
> > > debt, lack of memory-safety guarantees, absence of clear object-lifetime
> > > rules, and numerous driver-specific hacks have rendered
> > > drm_gpu_scheduler unmaintainable. It is time for a fresh design for
> > > firmware-scheduled GPU drivers—one that addresses all of the
> > > aforementioned shortcomings.
> > > 
> > > Add drm_dep, a lightweight GPU submission queue intended as a
> > > replacement for drm_gpu_scheduler for firmware-managed GPU schedulers
> > > (e.g. Xe, Panthor, AMDXDNA, PVR, Nouveau, Nova). Unlike
> > > drm_gpu_scheduler, which separates the scheduler (drm_gpu_scheduler)
> > > from the queue (drm_sched_entity) into two objects requiring external
> > > coordination, drm_dep merges both roles into a single struct
> > > drm_dep_queue. This eliminates the N:1 entity-to-scheduler mapping
> > > that is unnecessary for firmware schedulers which manage their own
> > > run-lists internally.
> > > 
> > > Unlike drm_gpu_scheduler, which relies on external locking and lifetime
> > > management by the driver, drm_dep uses reference counting (kref) on both
> > > queues and jobs to guarantee object lifetime safety. A job holds a queue
> > 
> > In a domain that has been plagued by lifetime issues, we really should be
> 
> Yes, drm sched is a mess. I’ve been suggesting we fix it for years and
> have met pushback. This, however (drm dep), isn’t plagued by lifetime
> issues — that’s the primary focus here.
> 
> > enforcing RAII for resource management instead of manual calls.
> > 
> 
> You can do RAII in C - see cleanup.h. Clear object lifetimes and
> ownership are what is important. Disciplined coding is the only way to
> do this regardless of language. RAII doesn't help with bad object
> models / ownership / lifetime models either.
> 
> I don't buy the "Rust solves everything" argument, but again, I'm a
> non-native speaker.
> 
> > > reference from init until its last put, and the queue holds a job reference
> > > from dispatch until the put_job worker runs. This makes use-after-free
> > > impossible even when completion arrives from IRQ context or concurrent
> > > teardown is in flight.
> > 
> > It makes use-after-free impossible _if_ you’re careful. It is not a
> > property of the type system, and incorrect code will compile just fine.
> > 
> 
> Sure. If a driver puts a drm_dep object reference on a resource that
> drm_dep owns, it will explode. That’s effectively putting a reference on
> a resource the driver doesn’t own. A driver can write to any physical
> memory and crash the system anyway, so I’m not really sure what we’re
> talking about here. Rust doesn’t solve anything in this scenario — you
> can always use an unsafe block and put a reference on a resource you
> don’t own.
> 
> Object model, ownership, and lifetimes are what is important and that is
> what drm dep is built around.
> 
> > > 
> > > The core objects are:
> > > 
> > >  struct drm_dep_queue - a per-context submission queue owning an
> > >    ordered submit workqueue, a TDR timeout workqueue, an SPSC job
> > >    queue, and a pending-job list. Reference counted; drivers can embed
> > >    it and provide a .release vfunc for RCU-safe teardown.
> > > 
> > >  struct drm_dep_job - a single unit of GPU work. Drivers embed this
> > >    and provide a .release vfunc. Jobs carry an xarray of input
> > >    dma_fence dependencies and produce a drm_dep_fence as their
> > >    finished fence.
> > > 
> > >  struct drm_dep_fence - a dma_fence subclass wrapping an optional
> > >    parent hardware fence. The finished fence is armed (sequence
> > >    number assigned) before submission and signals when the hardware
> > >    fence signals (or immediately on synchronous completion).
> > > 
> > > Job lifecycle:
> > >  1. drm_dep_job_init() - allocate and initialise; job acquires a
> > >     queue reference.
> > >  2. drm_dep_job_add_dependency() and friends - register input fences;
> > >     duplicates from the same context are deduplicated.
> > >  3. drm_dep_job_arm() - assign sequence number, obtain finished fence.
> > >  4. drm_dep_job_push() - submit to queue.
> > 
> > You cannot enforce this sequence easily in C code. Once again, we are
> > trusting drivers to follow it, but in Rust, you can simply reject code
> > that does not follow this order at compile time.
> > 
> 
> I don’t know Rust, but yes — you can enforce this in C. It’s called
> lockdep and annotations. It’s not compile-time, but all of this is
> strictly enforced. E.g., write some code that doesn't follow this and
> report back if the kernel doesn't explode. It will; if it doesn't, I'll
> fix it to complain.
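
To make the runtime-enforcement point concrete, the ordering check amounts to
something like this illustrative state machine (not the patch's actual code; a
kernel version would WARN_ON() instead of returning false):

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime model of the init -> arm -> push ordering; the state names
 * are hypothetical, not taken from the drm_dep patch. */
enum job_state { JOB_NEW, JOB_INITED, JOB_ARMED, JOB_PUSHED };

struct job { enum job_state state; };

static bool job_init(struct job *j)
{
	if (j->state != JOB_NEW)
		return false;	/* kernel code would WARN_ON() here */
	j->state = JOB_INITED;
	return true;
}

static bool job_arm(struct job *j)
{
	if (j->state != JOB_INITED)
		return false;
	j->state = JOB_ARMED;
	return true;
}

static bool job_push(struct job *j)
{
	if (j->state != JOB_ARMED)
		return false;
	j->state = JOB_PUSHED;
	return true;
}
```

This catches misuse at runtime on first execution of the bad path, which is the
trade-off versus a compile-time rejection.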
> 
> > 
> > > 
> > > Submission paths under queue lock:
> > >  - Bypass path: if DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the
> > >    SPSC queue is empty, no dependencies are pending, and credits are
> > >    available, the job is dispatched inline on the calling thread.
> > >  - Queued path: job is pushed onto the SPSC queue and the run_job
> > >    worker is kicked. The worker resolves remaining dependencies
> > >    (installing wakeup callbacks for unresolved fences) before calling
> > >    ops->run_job().
> > > 
> > > Credit-based throttling prevents hardware overflow: each job declares
> > > a credit cost at init time; dispatch is deferred until sufficient
> > > credits are available.
> > 
> > Why can’t we design an API where the driver can refuse jobs in
> > ops->run_job() if there are no resources to run it? This would do away with the
> > credit system that has been in place for quite a while. Has this approach been
> > tried in the past?
> > 
> 
> That seems possible if this is the preferred option. -EAGAIN is the way
> to do this. I’m open to the idea, but we also need to weigh the cost of
> converting drivers against the number of changes required.
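
For context, the credit scheme being weighed against -EAGAIN boils down to this
kind of bookkeeping (a toy model, not the patch's code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of credit-based throttling: a job declaring more credits
 * than are currently free stays queued until completions return some. */
struct ring { int credits_free; };

static bool try_dispatch(struct ring *r, int job_credits)
{
	if (job_credits > r->credits_free)
		return false;	/* deferred until credits return */
	r->credits_free -= job_credits;
	return true;
}

/* Completion path hands the job's credits back to the ring */
static void job_done(struct ring *r, int job_credits)
{
	r->credits_free += job_credits;
}
```

The -EAGAIN alternative would move this accounting into the driver's
ops->run_job(), with the queue retrying rejected jobs later.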
> 
> Partial reply; I'll catch up on the rest later.
> 
> Appreciate the feedback.

Picking up replies.

> 
> Matt
> 
> > 
> > > 
> > > Timeout Detection and Recovery (TDR): a per-queue delayed work item
> > > fires when the head pending job exceeds q->job.timeout jiffies, calling
> > > ops->timedout_job(). drm_dep_queue_trigger_timeout() forces immediate
> > > expiry for device teardown.
> > > 
> > > IRQ-safe completion: queues flagged DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE
> > > allow drm_dep_job_done() to be called from hardirq context (e.g. a
> > > dma_fence callback). Dependency cleanup is deferred to process context
> > > after ops->run_job() returns to avoid calling xa_destroy() from IRQ.
> > > 
> > > Zombie-state guard: workers use kref_get_unless_zero() on entry and
> > > bail immediately if the queue refcount has already reached zero and
> > > async teardown is in flight, preventing use-after-free.
> > 
> > In rust, when you queue work, you have to pass a reference-counted pointer
> > (Arc<T>). We simply never have this problem in a Rust design. If there is work
> > queued, the queue is alive.
> > 
> > By the way, why can’t we simply require synchronous teardowns?

Consider the case where the DRM dep queue’s refcount drops to zero, but
the device firmware still holds references to the associated queue.
These are resources that must be torn down asynchronously. In Xe, I need
to send two asynchronous firmware commands before I can safely remove
the memory associated with the queue (faulting on this kind of global
memory will take down the device) and recycle the firmware ID tied to
the queue. These async commands are issued on the driver side, on the
DRM dep queue’s workqueue as well.

Now consider a scenario where something goes wrong and those firmware
commands never complete, and a device reset is required to recover. The
driver’s per-queue tracking logic stops all queues (including zombie
ones), determines which commands were lost, cleans up the side effects
of that lost state, and then restarts all queues. That is how we would
end up in this work item with a zombie queue. The restart logic could
probably be made smart enough to avoid queueing work for zombie queues,
but in my opinion it’s safe enough to use kref_get_unless_zero() in the
work items.

It should also be clear that a DRM dep queue is primarily intended to be
embedded inside the driver’s own queue object, even though it is valid
to use it as a standalone object. The async teardown flows are also
optional features.

Let’s also consider a case where you do not need the async firmware
flows described above, but the DRM dep queue is still embedded in a
driver-side object that owns memory via dma-resv. The final queue put
may occur in IRQ context (DRM dep avoids kicking a worker just to drop a
ref, as an opt-in), or in the reclaim path (any scheduler workqueue is
part of the reclaim path). In either case, you cannot free memory there
while taking a dma-resv lock, which is why all DRM dep queues ultimately
free their resources in a work item outside of reclaim. Many drivers
already follow this pattern, but in DRM dep this behavior is built-in.

So I don’t think Rust natively solves these types of problems, although
I’ll concede that it does make refcounting a bit more sane.
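
The deferred-free rule described above can be sketched as follows; the pending
list stands in for the module's free workqueue, and all names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of deferred teardown: the final put never frees inline (the
 * caller might be in IRQ context or reclaim); it hands the object to a
 * free "work item" that runs later in process context. */
struct obj {
	int refcount;
	struct obj *free_next;	/* pending-free list, stands in for a workqueue */
};

static struct obj *free_list;

static void obj_put(struct obj *o)
{
	if (--o->refcount)
		return;
	/* defer: queue for the free worker instead of calling free() here */
	o->free_next = free_list;
	free_list = o;
}

/* The "work item": runs outside IRQ/reclaim and does the actual free */
static int free_worker(void)
{
	int n = 0;

	while (free_list) {
		struct obj *o = free_list;

		free_list = o->free_next;
		free(o);
		n++;
	}
	return n;
}
```

A real implementation would also need locking between obj_put() and the
worker; this single-threaded model only shows the deferral itself.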

> > 
> > > 
> > > Teardown is always deferred to a module-private workqueue (dep_free_wq)
> > > so that destroy_workqueue() is never called from within one of the
> > > queue's own workers. Each queue holds a drm_dev_get() reference on its
> > > owning struct drm_device, released as the final step of teardown via
> > > drm_dev_put(). This prevents the driver module from being unloaded
> > > while any queue is still alive without requiring a separate drain API.
> > > 
> > > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: David Airlie <airlied@gmail.com>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Maxime Ripard <mripard@kernel.org>
> > > Cc: Philipp Stanner <phasta@kernel.org>
> > > Cc: Simona Vetter <simona@ffwll.ch>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > > ---
> > > drivers/gpu/drm/Kconfig             |    4 +
> > > drivers/gpu/drm/Makefile            |    1 +
> > > drivers/gpu/drm/dep/Makefile        |    5 +
> > > drivers/gpu/drm/dep/drm_dep_fence.c |  406 +++++++
> > > drivers/gpu/drm/dep/drm_dep_fence.h |   25 +
> > > drivers/gpu/drm/dep/drm_dep_job.c   |  675 +++++++++++
> > > drivers/gpu/drm/dep/drm_dep_job.h   |   13 +
> > > drivers/gpu/drm/dep/drm_dep_queue.c | 1647 +++++++++++++++++++++++++++
> > > drivers/gpu/drm/dep/drm_dep_queue.h |   31 +
> > > include/drm/drm_dep.h               |  597 ++++++++++
> > > 10 files changed, 3404 insertions(+)
> > > create mode 100644 drivers/gpu/drm/dep/Makefile
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.c
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_fence.h
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_job.c
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_job.h
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.c
> > > create mode 100644 drivers/gpu/drm/dep/drm_dep_queue.h
> > > create mode 100644 include/drm/drm_dep.h
> > > 
> > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> > > index 5386248e75b6..834f6e210551 100644
> > > --- a/drivers/gpu/drm/Kconfig
> > > +++ b/drivers/gpu/drm/Kconfig
> > > @@ -276,6 +276,10 @@ config DRM_SCHED
> > > tristate
> > > depends on DRM
> > > 
> > > +config DRM_DEP
> > > + tristate
> > > + depends on DRM
> > > +
> > > # Separate option as not all DRM drivers use it
> > > config DRM_PANEL_BACKLIGHT_QUIRKS
> > > tristate
> > > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > > index e97faabcd783..1ad87cc0e545 100644
> > > --- a/drivers/gpu/drm/Makefile
> > > +++ b/drivers/gpu/drm/Makefile
> > > @@ -173,6 +173,7 @@ obj-y += clients/
> > > obj-y += display/
> > > obj-$(CONFIG_DRM_TTM) += ttm/
> > > obj-$(CONFIG_DRM_SCHED) += scheduler/
> > > +obj-$(CONFIG_DRM_DEP) += dep/
> > > obj-$(CONFIG_DRM_RADEON)+= radeon/
> > > obj-$(CONFIG_DRM_AMDGPU)+= amd/amdgpu/
> > > obj-$(CONFIG_DRM_AMDGPU)+= amd/amdxcp/
> > > diff --git a/drivers/gpu/drm/dep/Makefile b/drivers/gpu/drm/dep/Makefile
> > > new file mode 100644
> > > index 000000000000..335f1af46a7b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/Makefile
> > > @@ -0,0 +1,5 @@
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +
> > > +drm_dep-y := drm_dep_queue.o drm_dep_job.o drm_dep_fence.o
> > > +
> > > +obj-$(CONFIG_DRM_DEP) += drm_dep.o
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_fence.c b/drivers/gpu/drm/dep/drm_dep_fence.c
> > > new file mode 100644
> > > index 000000000000..ae05b9077772
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_fence.c
> > > @@ -0,0 +1,406 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: DRM dependency fence
> > > + *
> > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > + * completes the job.
> > > + *
> > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > + * NULL or an error).
> > 
> > I thought this fence proxy mechanism was going away due to recent work being
> > carried out by Christian?
> > 

Consider the case where a driver’s hardware fence is implemented as a
dma-fence-array or dma-fence-chain. You cannot install these types of
fences into a dma-resv or into syncobjs, so a proxy fence is useful
here. One example is when a single job submits work to multiple rings
that are flipped in hardware at the same time.

Another case is late arming of hardware fences in run_job (which many
drivers do). The proxy fence is immediately available at arm time and
can be installed into dma-resv or syncobjs even though the actual
hardware fence is not yet available. I think most drivers could be
refactored to make the hardware fence immediately available at run_job,
though.
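
The late-arming flow can be modelled in plain C11 atomics. This is a
userspace sketch, not the drm_dep API (all names below are made up): the
proxy fence records a deadline before the hardware fence exists, and the
producer forwards it once the parent is attached, mirroring the
smp_store_release()/smp_load_acquire() pairing in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the late-arriving hardware fence. */
struct hw_fence {
	long long deadline;	/* last deadline hint forwarded to hardware */
};

/* Stand-in for the proxy ("finished") fence, valid from arm time. */
struct model_fence {
	_Atomic(struct hw_fence *) parent;	/* NULL until run_job() */
	long long deadline;
	atomic_bool has_deadline;
};

/*
 * Consumer side: record the deadline, then forward it if the parent is
 * already visible (the acquire load pairs with the release store in
 * model_set_parent(), as in the patch's smp_load_acquire()).
 */
static void model_set_deadline(struct model_fence *f, long long deadline)
{
	f->deadline = deadline;
	atomic_store_explicit(&f->has_deadline, true, memory_order_release);

	struct hw_fence *p =
		atomic_load_explicit(&f->parent, memory_order_acquire);
	if (p)
		p->deadline = deadline;
}

/*
 * Producer side: publish the parent, then honour any deadline that was
 * recorded before the parent existed.
 */
static void model_set_parent(struct model_fence *f, struct hw_fence *p)
{
	atomic_store_explicit(&f->parent, p, memory_order_release);
	if (atomic_load_explicit(&f->has_deadline, memory_order_acquire))
		p->deadline = f->deadline;
}
```

Either ordering of the two calls ends with the deadline on the hardware
fence, which is the property the proxy relies on.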

> > > + *
> > > + * Drivers should expose @finished as the out-fence for GPU work since it is
> > > + * valid from the moment drm_dep_job_arm() returns, whereas the hardware fence
> > > + * could be a compound fence, which is disallowed when installed into
> > > + * drm_syncobjs or dma-resv.
> > > + *
> > > + * The fence uses the kernel's inline spinlock (NULL passed to dma_fence_init())
> > > + * so no separate lock allocation is required.
> > > + *
> > > + * Deadline propagation is supported: if a consumer sets a deadline via
> > > + * dma_fence_set_deadline(), it is forwarded to @parent when @parent is set.
> > > + * If @parent has not been set yet the deadline is stored in @deadline and
> > > + * forwarded at that point.
> > > + *
> > > + * Memory management: drm_dep_fence objects are allocated with kzalloc() and
> > > + * freed via kfree_rcu() once the fence is released, ensuring safety with
> > > + * RCU-protected fence accesses.
> > > + */
> > > +
> > > +#include <linux/slab.h>
> > > +#include <drm/drm_dep.h>
> > > +#include "drm_dep_fence.h"
> > > +
> > > +/**
> > > + * DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT - a fence deadline hint has been set
> > > + *
> > > + * Set by the deadline callback on the finished fence to indicate a deadline
> > > + * has been set which may need to be propagated to the parent hardware fence.
> > > + */
> > > +#define DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT (DMA_FENCE_FLAG_USER_BITS + 1)
> > > +
> > > +/**
> > > + * struct drm_dep_fence - fence tracking the completion of a dep job
> > > + *
> > > + * Contains a single dma_fence (@finished) that is signalled when the
> > > + * hardware completes the job. The fence uses the kernel's inline_lock
> > > + * (no external spinlock required).
> > > + *
> > > + * This struct is private to the drm_dep module; external code interacts
> > > + * through the accessor functions declared in drm_dep_fence.h.
> > > + */
> > > +struct drm_dep_fence {
> > > + /**
> > > + * @finished: signalled when the job completes on hardware.
> > > + *
> > > + * Drivers should use this fence as the out-fence for a job since it
> > > + * is available immediately upon drm_dep_job_arm().
> > > + */
> > > + struct dma_fence finished;
> > > +
> > > + /**
> > > + * @deadline: deadline set on @finished which potentially needs to be
> > > + * propagated to @parent.
> > > + */
> > > + ktime_t deadline;
> > > +
> > > + /**
> > > + * @parent: The hardware fence returned by &drm_dep_queue_ops.run_job.
> > > + *
> > > + * @finished is signaled once @parent is signaled. The initial store is
> > > + * performed via smp_store_release to synchronize with deadline handling.
> > > + *
> > > + * All readers must access this under the fence lock and take a reference to
> > > + * it, as @parent is set to NULL under the fence lock when the drm_dep_fence
> > > + * signals, and this drop also releases its internal reference.
> > > + */
> > > + struct dma_fence *parent;
> > > +
> > > + /**
> > > + * @q: the queue this fence belongs to.
> > > + */
> > > + struct drm_dep_queue *q;
> > > +};
> > > +
> > > +static const struct dma_fence_ops drm_dep_fence_ops;
> > > +
> > > +/**
> > > + * to_drm_dep_fence() - cast a dma_fence to its enclosing drm_dep_fence
> > > + * @f: dma_fence to cast
> > > + *
> > > + * Context: No context requirements (inline helper).
> > > + * Return: pointer to the enclosing &drm_dep_fence.
> > > + */
> > > +static struct drm_dep_fence *to_drm_dep_fence(struct dma_fence *f)
> > > +{
> > > + return container_of(f, struct drm_dep_fence, finished);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_set_parent() - store the hardware fence and propagate
> > > + *   any deadline
> > > + * @dfence: dep fence
> > > + * @parent: hardware fence returned by &drm_dep_queue_ops.run_job, or NULL/error
> > > + *
> > > + * Stores @parent on @dfence under smp_store_release() so that a concurrent
> > > + * drm_dep_fence_set_deadline() call sees the parent before checking the
> > > + * deadline bit. If a deadline has already been set on @dfence->finished it is
> > > + * forwarded to @parent immediately. Does nothing if @parent is NULL or an
> > > + * error pointer.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> > > +      struct dma_fence *parent)
> > > +{
> > > + if (IS_ERR_OR_NULL(parent))
> > > + return;
> > > +
> > > + /*
> > > + * smp_store_release() to ensure a thread racing us in
> > > + * drm_dep_fence_set_deadline() sees the parent set before
> > > + * it calls test_bit(HAS_DEADLINE_BIT).
> > > + */
> > > + smp_store_release(&dfence->parent, dma_fence_get(parent));
> > > + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT,
> > > +     &dfence->finished.flags))
> > > + dma_fence_set_deadline(parent, dfence->deadline);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_finished() - signal the finished fence with a result
> > > + * @dfence: dep fence to signal
> > > + * @result: error code to set, or 0 for success
> > > + *
> > > + * Sets the fence error to @result if non-zero, then signals
> > > + * @dfence->finished. Also removes parent visibility under the fence lock
> > > + * and drops the parent reference. Dropping the parent here allows the
> > > + * DRM dep fence to be completely decoupled from the DRM dep module.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_fence_finished(struct drm_dep_fence *dfence, int result)
> > > +{
> > > + struct dma_fence *parent;
> > > + unsigned long flags;
> > > +
> > > + dma_fence_lock_irqsave(&dfence->finished, flags);
> > > + if (result)
> > > + dma_fence_set_error(&dfence->finished, result);
> > > + dma_fence_signal_locked(&dfence->finished);
> > > + parent = dfence->parent;
> > > + dfence->parent = NULL;
> > > + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> > > +
> > > + dma_fence_put(parent);
> > > +}
> > 
> > We should really try to move away from manual locks and unlocks.
> > 

I agree. Let's see if we can get a dma_fence scoped guard in.
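
For reference, a minimal userspace sketch of what such a guard could look
like, built on the same compiler cleanup attribute that the kernel's
<linux/cleanup.h> guard() infrastructure uses. Everything below is
hypothetical, not an existing dma_fence API; a real guard would wrap
dma_fence_lock_irqsave()/dma_fence_unlock_irqrestore().

```c
#include <assert.h>

/* Models the fence spinlock for demonstration purposes. */
static int fence_lock_held;

struct fence_guard { int dummy; };

static inline struct fence_guard fence_guard_enter(void)
{
	fence_lock_held++;		/* "lock" */
	return (struct fence_guard){ 0 };
}

static void fence_guard_exit(struct fence_guard *g)
{
	(void)g;
	fence_lock_held--;		/* "unlock", runs on scope exit */
}

/* The unlock fires automatically on every return path. */
#define scoped_fence_guard() \
	struct fence_guard __guard \
		__attribute__((cleanup(fence_guard_exit), unused)) = \
		fence_guard_enter()

/* Hypothetical rewrite of drm_dep_fence_finished() using the guard. */
static int signal_fence(int result)
{
	scoped_fence_guard();

	if (result < 0)
		return result;	/* early return still unlocks */

	/* ... set error, signal, clear and put parent ... */
	return 0;
}
```

The win is that every exit path, including error returns added later,
releases the lock without a manual unlock call.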

> > > +
> > > +static const char *drm_dep_fence_get_driver_name(struct dma_fence *fence)
> > > +{
> > > + return "drm_dep";
> > > +}
> > > +
> > > +static const char *drm_dep_fence_get_timeline_name(struct dma_fence *f)
> > > +{
> > > + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> > > +
> > > + return dfence->q->name;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_get_parent() - get a reference to the parent hardware fence
> > > + * @dfence: dep fence to query
> > > + *
> > > + * Returns a new reference to @dfence->parent, or NULL if the parent has
> > > + * already been cleared (i.e. @dfence->finished has signalled and the parent
> > > + * reference was dropped under the fence lock).
> > > + *
> > > + * Uses smp_load_acquire() to pair with the smp_store_release() in
> > > + * drm_dep_fence_set_parent(), ensuring that if we race a concurrent
> > > + * drm_dep_fence_set_parent() call we observe the parent pointer only after
> > > + * the store is fully visible — before set_parent() tests
> > > + * %DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > + *
> > > + * Caller must hold the fence lock on @dfence->finished.
> > > + *
> > > + * Context: Any context, fence lock on @dfence->finished must be held.
> > > + * Return: a new reference to the parent fence, or NULL.
> > > + */
> > > +static struct dma_fence *drm_dep_fence_get_parent(struct drm_dep_fence *dfence)
> > > +{
> > > + dma_fence_assert_held(&dfence->finished);
> > 
> > > +
> > > + return dma_fence_get(smp_load_acquire(&dfence->parent));
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_set_deadline() - dma_fence_ops deadline callback
> > > + * @f: fence on which the deadline is being set
> > > + * @deadline: the deadline hint to apply
> > > + *
> > > + * Stores the earliest deadline under the fence lock, then propagates
> > > + * it to the parent hardware fence via smp_load_acquire() to race
> > > + * safely with drm_dep_fence_set_parent().
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
> > > +{
> > > + struct drm_dep_fence *dfence = to_drm_dep_fence(f);
> > > + struct dma_fence *parent;
> > > + unsigned long flags;
> > > +
> > > + dma_fence_lock_irqsave(f, flags);
> > > +
> > > + /* If we already have an earlier deadline, keep it: */
> > > + if (test_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> > > +    ktime_before(dfence->deadline, deadline)) {
> > > + dma_fence_unlock_irqrestore(f, flags);
> > > + return;
> > > + }
> > > +
> > > + dfence->deadline = deadline;
> > > + set_bit(DRM_DEP_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
> > > +
> > > + parent = drm_dep_fence_get_parent(dfence);
> > > + dma_fence_unlock_irqrestore(f, flags);
> > > +
> > > + if (parent)
> > > + dma_fence_set_deadline(parent, deadline);
> > > +
> > > + dma_fence_put(parent);
> > > +}
> > > +
> > > +static const struct dma_fence_ops drm_dep_fence_ops = {
> > > + .get_driver_name = drm_dep_fence_get_driver_name,
> > > + .get_timeline_name = drm_dep_fence_get_timeline_name,
> > > + .set_deadline = drm_dep_fence_set_deadline,
> > > +};
> > > +
> > > +/**
> > > + * drm_dep_fence_alloc() - allocate a dep fence
> > > + *
> > > + * Allocates a &drm_dep_fence with kzalloc() without initialising the
> > > + * dma_fence. Call drm_dep_fence_init() to fully initialise it.
> > > + *
> > > + * Context: Process context.
> > > + * Return: new &drm_dep_fence on success, NULL on allocation failure.
> > > + */
> > > +struct drm_dep_fence *drm_dep_fence_alloc(void)
> > > +{
> > > + return kzalloc_obj(struct drm_dep_fence);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_init() - initialise the dma_fence inside a dep fence
> > > + * @dfence: dep fence to initialise
> > > + * @q: queue the owning job belongs to
> > > + *
> > > + * Initialises @dfence->finished using the context and sequence number from @q.
> > > + * Passes NULL as the lock so the fence uses its inline spinlock.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q)
> > > +{
> > > + u32 seq = ++q->fence.seqno;
> > > +
> > > + /*
> > > + * XXX: Inline fence hazard: currently all expected users of DRM dep
> > > + * hardware fences have a unique lockdep class. If that ever changes,
> > > + * we will need to assign a unique lockdep class here so lockdep knows
> > > + * this fence is allowed to nest with driver hardware fences.
> > > + */
> > > +
> > > + dfence->q = q;
> > > + dma_fence_init(&dfence->finished, &drm_dep_fence_ops,
> > > +       NULL, q->fence.context, seq);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_cleanup() - release a dep fence at job teardown
> > > + * @dfence: dep fence to clean up
> > > + *
> > > + * Called from drm_dep_job_fini(). If the dep fence was armed (refcount > 0)
> > > + * it is released via dma_fence_put() and will be freed by the RCU release
> > > + * callback once all waiters have dropped their references. If it was never
> > > + * armed it is freed directly with kfree().
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence)
> > > +{
> > > + if (drm_dep_fence_is_armed(dfence))
> > > + dma_fence_put(&dfence->finished);
> > > + else
> > > + kfree(dfence);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_is_armed() - check whether the fence has been armed
> > > + * @dfence: dep fence to check
> > > + *
> > > + * Returns true if drm_dep_job_arm() has been called, i.e. @dfence->finished
> > > + * has been initialised and its reference count is non-zero.  Used by
> > > + * assertions to enforce correct job lifecycle ordering (arm before push,
> > > + * add_dependency before arm).
> > > + *
> > > + * Context: Any context.
> > > + * Return: true if the fence is armed, false otherwise.
> > > + */
> > > +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence)
> > > +{
> > > + return !!kref_read(&dfence->finished.refcount);
> > > +}
> > 
> > > +
> > > +/**
> > > + * drm_dep_fence_is_finished() - test whether the finished fence has signalled
> > > + * @dfence: dep fence to check
> > > + *
> > > + * Uses dma_fence_test_signaled_flag() to read %DMA_FENCE_FLAG_SIGNALED_BIT
> > > + * directly without invoking the fence's ->signaled() callback or triggering
> > > + * any signalling side-effects.
> > > + *
> > > + * Context: Any context.
> > > + * Return: true if @dfence->finished has been signalled, false otherwise.
> > > + */
> > > +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence)
> > > +{
> > > + return dma_fence_test_signaled_flag(&dfence->finished);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_is_complete() - test whether the job has completed
> > > + * @dfence: dep fence to check
> > > + *
> > > + * Takes the fence lock on @dfence->finished and calls
> > > + * drm_dep_fence_get_parent() to safely obtain a reference to the parent
> > > + * hardware fence — or NULL if the parent has already been cleared after
> > > + * signalling.  Calls dma_fence_is_signaled() on @parent outside the lock,
> > > + * which may invoke the fence's ->signaled() callback and trigger signalling
> > > + * side-effects if the fence has completed but the signalled flag has not yet
> > > + * been set.  The finished fence is tested via dma_fence_test_signaled_flag(),
> > > + * without side-effects.
> > > + *
> > > + * May only be called on a stopped queue (see drm_dep_queue_is_stopped()).
> > > + *
> > > + * Context: Process context. The queue must be stopped before calling this.
> > > + * Return: true if the job is complete, false otherwise.
> > > + */
> > > +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence)
> > > +{
> > > + struct dma_fence *parent;
> > > + unsigned long flags;
> > > + bool complete;
> > > +
> > > + dma_fence_lock_irqsave(&dfence->finished, flags);
> > > + parent = drm_dep_fence_get_parent(dfence);
> > > + dma_fence_unlock_irqrestore(&dfence->finished, flags);
> > > +
> > > + complete = (parent && dma_fence_is_signaled(parent)) ||
> > > + dma_fence_test_signaled_flag(&dfence->finished);
> > > +
> > > + dma_fence_put(parent);
> > > +
> > > + return complete;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_to_dma() - return the finished dma_fence for a dep fence
> > > + * @dfence: dep fence to query
> > > + *
> > > + * No reference is taken; the caller must hold its own reference to the owning
> > > + * &drm_dep_job for the duration of the access.
> > > + *
> > > + * Context: Any context.
> > > + * Return: the finished &dma_fence.
> > > + */
> > > +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence)
> > > +{
> > > + return &dfence->finished;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_fence_done() - signal the finished fence on job completion
> > > + * @dfence: dep fence to signal
> > > + * @result: job error code, or 0 on success
> > > + *
> > > + * Gets a temporary reference to @dfence->finished to guard against a racing
> > > + * last-put, signals the fence with @result, then drops the temporary
> > > + * reference. Called from drm_dep_job_done() in the queue core when a
> > > + * hardware completion callback fires or when run_job() returns immediately.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result)
> > > +{
> > > + dma_fence_get(&dfence->finished);
> > > + drm_dep_fence_finished(dfence, result);
> > > + dma_fence_put(&dfence->finished);
> > > +}
> > 
> > Proper refcounting is automated (and enforced) in Rust.
> > 

That is a nice feature.

> > > diff --git a/drivers/gpu/drm/dep/drm_dep_fence.h b/drivers/gpu/drm/dep/drm_dep_fence.h
> > > new file mode 100644
> > > index 000000000000..65a1582f858b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_fence.h
> > > @@ -0,0 +1,25 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _DRM_DEP_FENCE_H_
> > > +#define _DRM_DEP_FENCE_H_
> > > +
> > > +#include <linux/dma-fence.h>
> > > +
> > > +struct drm_dep_fence;
> > > +struct drm_dep_queue;
> > > +
> > > +struct drm_dep_fence *drm_dep_fence_alloc(void);
> > > +void drm_dep_fence_init(struct drm_dep_fence *dfence, struct drm_dep_queue *q);
> > > +void drm_dep_fence_cleanup(struct drm_dep_fence *dfence);
> > > +void drm_dep_fence_set_parent(struct drm_dep_fence *dfence,
> > > +      struct dma_fence *parent);
> > > +void drm_dep_fence_done(struct drm_dep_fence *dfence, int result);
> > > +bool drm_dep_fence_is_armed(struct drm_dep_fence *dfence);
> > > +bool drm_dep_fence_is_finished(struct drm_dep_fence *dfence);
> > > +bool drm_dep_fence_is_complete(struct drm_dep_fence *dfence);
> > > +struct dma_fence *drm_dep_fence_to_dma(struct drm_dep_fence *dfence);
> > > +
> > > +#endif /* _DRM_DEP_FENCE_H_ */
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_job.c b/drivers/gpu/drm/dep/drm_dep_job.c
> > > new file mode 100644
> > > index 000000000000..2d012b29a5fc
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_job.c
> > > @@ -0,0 +1,675 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright 2015 Advanced Micro Devices, Inc.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > + * OTHER DEALINGS IN THE SOFTWARE.
> > > + *
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: DRM dependency job
> > > + *
> > > + * A struct drm_dep_job represents a single unit of GPU work associated with
> > > + * a struct drm_dep_queue. The lifecycle of a job is:
> > > + *
> > > + * 1. **Allocation**: the driver allocates memory for the job (typically by
> > > + *    embedding struct drm_dep_job in a larger structure) and calls
> > > + *    drm_dep_job_init() to initialise it. On success the job holds one
> > > + *    kref reference and a reference to its queue.
> > > + *
> > > + * 2. **Dependency collection**: the driver calls drm_dep_job_add_dependency(),
> > > + *    drm_dep_job_add_syncobj_dependency(), drm_dep_job_add_resv_dependencies(),
> > > + *    or drm_dep_job_add_implicit_dependencies() to register dma_fence objects
> > > + *    that must be signalled before the job can run. Duplicate fences from the
> > > + *    same fence context are deduplicated automatically.
> > > + *
> > > + * 3. **Arming**: drm_dep_job_arm() initialises the job's finished fence,
> > > + *    consuming a sequence number from the queue. After arming,
> > > + *    drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > > + *    userspace or used as a dependency by other jobs.
> > > + *
> > > + * 4. **Submission**: drm_dep_job_push() submits the job to the queue. The
> > > + *    queue takes a reference that it holds until the job's finished fence
> > > + *    signals and the job is freed by the put_job worker.
> > > + *
> > > + * 5. **Completion**: when the job's hardware work finishes its finished fence
> > > + *    is signalled and drm_dep_job_put() is called by the queue. The driver
> > > + *    must release any driver-private resources in &drm_dep_job_ops.release.
> > > + *
> > > + * Reference counting uses drm_dep_job_get() / drm_dep_job_put(). The
> > > + * internal drm_dep_job_fini() tears down the dependency xarray and fence
> > > + * objects before the driver's release callback is invoked.
> > > + */
> > > +
> > > +#include <linux/dma-resv.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/slab.h>
> > > +#include <drm/drm_dep.h>
> > > +#include <drm/drm_file.h>
> > > +#include <drm/drm_gem.h>
> > > +#include <drm/drm_syncobj.h>
> > > +#include "drm_dep_fence.h"
> > > +#include "drm_dep_job.h"
> > > +#include "drm_dep_queue.h"
> > > +
> > > +/**
> > > + * drm_dep_job_init() - initialise a dep job
> > > + * @job: dep job to initialise
> > > + * @args: initialisation arguments
> > > + *
> > > + * Initialises @job with the queue, ops and credit count from @args.  Acquires
> > > + * a reference to @args->q via drm_dep_queue_get(); this reference is held for
> > > + * the lifetime of the job and released by drm_dep_job_release() when the last
> > > + * job reference is dropped.
> > > + *
> > > + * Resources are released automatically when the last reference is dropped
> > > + * via drm_dep_job_put(), which must be called to release the job; drivers
> > > + * must not free the job directly.
> > 
> > Again, can’t enforce that in C.
> > 

I agree. A driver could just kfree() the job after init… but in this
design, driver unload would then hang.

> > > + *
> > > + * Context: Process context. Allocates memory with GFP_KERNEL.
> > > + * Return: 0 on success, -%EINVAL if credits is 0,
> > > + *   -%ENOMEM on fence allocation failure.
> > > + */
> > > +int drm_dep_job_init(struct drm_dep_job *job,
> > > +     const struct drm_dep_job_init_args *args)
> > > +{
> > > + if (unlikely(!args->credits)) {
> > > + pr_err("drm_dep: %s: credits cannot be 0\n", __func__);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + memset(job, 0, sizeof(*job));
> > > +
> > > + job->dfence = drm_dep_fence_alloc();
> > > + if (!job->dfence)
> > > + return -ENOMEM;
> > > +
> > > + job->ops = args->ops;
> > > + job->q = drm_dep_queue_get(args->q);
> > > + job->credits = args->credits;
> > > +
> > > + kref_init(&job->refcount);
> > > + xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> > > + INIT_LIST_HEAD(&job->pending_link);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_init);
> > > +
> > > +/**
> > > + * drm_dep_job_drop_dependencies() - release all input dependency fences
> > > + * @job: dep job whose dependency xarray to drain
> > > + *
> > > + * Walks @job->dependencies, puts each fence, and destroys the xarray.
> > > + * Any slots still holding a %DRM_DEP_JOB_FENCE_PREALLOC sentinel —
> > > + * i.e. slots that were pre-allocated but never replaced — are silently
> > > + * skipped; the sentinel carries no reference.  Called from
> > > + * drm_dep_queue_run_job() in process context immediately after
> > > + * @ops->run_job() returns, before the final drm_dep_job_put().  Releasing
> > > + * dependencies here — while still in process context — avoids calling
> > > + * xa_destroy() from IRQ context if the job's last reference is later
> > > + * dropped from a dma_fence callback.
> > > + *
> > > + * Context: Process context.
> > > + */
> > > +void drm_dep_job_drop_dependencies(struct drm_dep_job *job)
> > > +{
> > > + struct dma_fence *fence;
> > > + unsigned long index;
> > > +
> > > + xa_for_each(&job->dependencies, index, fence) {
> > > + if (unlikely(fence == DRM_DEP_JOB_FENCE_PREALLOC))
> > > + continue;
> > > + dma_fence_put(fence);
> > > + }
> > > + xa_destroy(&job->dependencies);
> > > +}
> > 
> > This is automated in Rust. You also can’t “forget” to call this.

Driver code can't call this function; note the lack of an export. The DRM
dep core owns this call and always invokes it. But as discussed, a driver
could still kfree() the job or forget to drop its creation reference.

> > 
> > > +
> > > +/**
> > > + * drm_dep_job_fini() - clean up a dep job
> > > + * @job: dep job to clean up
> > > + *
> > > + * Cleans up the dep fence and drops the queue reference held by @job.
> > > + *
> > > + * If the job was never armed (e.g. init failed before drm_dep_job_arm()),
> > > + * the dependency xarray is also released here.  For armed jobs the xarray
> > > + * has already been drained by drm_dep_job_drop_dependencies() in process
> > > + * context immediately after run_job(), so it is left untouched to avoid
> > > + * calling xa_destroy() from IRQ context.
> > > + *
> > > + * Warns if @job is still linked on the queue's pending list, which would
> > > + * indicate a bug in the teardown ordering.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_job_fini(struct drm_dep_job *job)
> > > +{
> > > + bool armed = drm_dep_fence_is_armed(job->dfence);
> > > +
> > > + WARN_ON(!list_empty(&job->pending_link));
> > > +
> > > + drm_dep_fence_cleanup(job->dfence);
> > > + job->dfence = NULL;
> > > +
> > > + /*
> > > + * Armed jobs have their dependencies drained by
> > > + * drm_dep_job_drop_dependencies() in process context after run_job().
> > > + * Skip here to avoid calling xa_destroy() from IRQ context.
> > > + */
> > > + if (!armed)
> > > + drm_dep_job_drop_dependencies(job);
> > > +}
> > 
> > Same here.
> > 
> > > +
> > > +/**
> > > + * drm_dep_job_get() - acquire a reference to a dep job
> > > + * @job: dep job to acquire a reference on, or NULL
> > > + *
> > > + * Context: Any context.
> > > + * Return: @job with an additional reference held, or NULL if @job is NULL.
> > > + */
> > > +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job)
> > > +{
> > > + if (job)
> > > + kref_get(&job->refcount);
> > > + return job;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_get);
> > > +
> > 
> > Same here.
> > 
> > > +/**
> > > + * drm_dep_job_release() - kref release callback for a dep job
> > > + * @kref: kref embedded in the dep job
> > > + *
> > > + * Calls drm_dep_job_fini(), then invokes &drm_dep_job_ops.release if set,
> > > + * otherwise frees @job with kfree().  Finally, releases the queue reference
> > > + * that was acquired by drm_dep_job_init() via drm_dep_queue_put().  The
> > > + * queue put is performed last to ensure no queue state is accessed after
> > > + * the job memory is freed.
> > > + *
> > > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > > + *   job's queue; otherwise process context only, as the release callback may
> > > + *   sleep.
> > > + */
> > > +static void drm_dep_job_release(struct kref *kref)
> > > +{
> > > + struct drm_dep_job *job =
> > > + container_of(kref, struct drm_dep_job, refcount);
> > > + struct drm_dep_queue *q = job->q;
> > > +
> > > + drm_dep_job_fini(job);
> > > +
> > > + if (job->ops && job->ops->release)
> > > + job->ops->release(job);
> > > + else
> > > + kfree(job);
> > > +
> > > + drm_dep_queue_put(q);
> > > +}
> > 
> > Same here.
> > 
> > > +
> > > +/**
> > > + * drm_dep_job_put() - release a reference to a dep job
> > > + * @job: dep job to release a reference on, or NULL
> > > + *
> > > + * When the last reference is dropped, calls &drm_dep_job_ops.release if set,
> > > + * otherwise frees @job with kfree(). Does nothing if @job is NULL.
> > > + *
> > > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > > + *   job's queue; otherwise process context only, as the release callback may
> > > + *   sleep.
> > > + */
> > > +void drm_dep_job_put(struct drm_dep_job *job)
> > > +{
> > > + if (job)
> > > + kref_put(&job->refcount, drm_dep_job_release);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_put);
> > > +
> > 
> > Same here.
> > 
> > > +/**
> > > + * drm_dep_job_arm() - arm a dep job for submission
> > > + * @job: dep job to arm
> > > + *
> > > + * Initialises the finished fence on @job->dfence, assigning
> > > + * it a sequence number from the job's queue. Must be called after
> > > + * drm_dep_job_init() and before drm_dep_job_push(). Once armed,
> > > + * drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > > + * userspace or used as a dependency by other jobs.
> > > + *
> > > + * Begins the DMA fence signalling path via dma_fence_begin_signalling().
> > > + * After this point, memory allocations that could trigger reclaim are
> > > + * forbidden; lockdep enforces this. arm() must always be paired with
> > > + * drm_dep_job_push(); lockdep also enforces this pairing.
> > > + *
> > > + * Warns if the job has already been armed.
> > > + *
> > > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > > + *   path.
> > > + */
> > > +void drm_dep_job_arm(struct drm_dep_job *job)
> > > +{
> > > + drm_dep_queue_push_job_begin(job->q);
> > > + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > > + drm_dep_fence_init(job->dfence, job->q);
> > > + job->signalling_cookie = dma_fence_begin_signalling();
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_arm);
> > > +
> > > +/**
> > > + * drm_dep_job_push() - submit a job to its queue for execution
> > > + * @job: dep job to push
> > > + *
> > > + * Submits @job to the queue it was initialised with. Must be called after
> > > + * drm_dep_job_arm(). Acquires a reference on @job on behalf of the queue,
> > > + * held until the queue is fully done with it. The reference is released
> > > + * directly in the finished-fence dma_fence callback for queues with
> > > + * %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE (where drm_dep_job_done() may run
> > > + * from hardirq context), or via the put_job work item on the submit
> > > + * workqueue otherwise.
> > > + *
> > > + * Ends the DMA fence signalling path begun by drm_dep_job_arm() via
> > > + * dma_fence_end_signalling(). This must be paired with arm(); lockdep
> > > + * enforces the pairing.
> > > + *
> > > + * Once pushed, &drm_dep_queue_ops.run_job is guaranteed to be called for
> > > + * @job exactly once, even if the queue is killed or torn down before the
> > > + * job reaches the head of the queue. Drivers can use this guarantee to
> > > + * perform bookkeeping cleanup; the actual backend operation should be
> > > + * skipped when drm_dep_queue_is_killed() returns true.
> > > + *
> > > + * If the queue does not support the bypass path, the job is pushed directly
> > > + * onto the SPSC submission queue via drm_dep_queue_push_job() without holding
> > > + * @q->sched.lock. Otherwise, @q->sched.lock is taken and the job is either
> > > + * run immediately via drm_dep_queue_run_job() if it qualifies for bypass, or
> > > + * enqueued via drm_dep_queue_push_job() for dispatch by the run_job work item.
> > > + *
> > > + * Warns if the job has not been armed.
> > > + *
> > > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > > + *   path.
> > > + */
> > > +void drm_dep_job_push(struct drm_dep_job *job)
> > > +{
> > > + struct drm_dep_queue *q = job->q;
> > > +
> > > + WARN_ON(!drm_dep_fence_is_armed(job->dfence));
> > > +
> > > + drm_dep_job_get(job);
> > > +
> > > + if (!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED)) {
> > > + drm_dep_queue_push_job(q, job);
> > > + dma_fence_end_signalling(job->signalling_cookie);
> > 
> > Signaling is enforced in a more thorough way in Rust. I’ll expand on this later in this patch.
> > 
> > > + drm_dep_queue_push_job_end(job->q);
> > > + return;
> > > + }
> > > +
> > > + scoped_guard(mutex, &q->sched.lock) {
> > > + if (drm_dep_queue_can_job_bypass(q, job))
> > > + drm_dep_queue_run_job(q, job);
> > > + else
> > > + drm_dep_queue_push_job(q, job);
> > > + }
> > > +
> > > + dma_fence_end_signalling(job->signalling_cookie);
> > > + drm_dep_queue_push_job_end(job->q);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_push);
> > > +
> > > +/**
> > > + * drm_dep_job_add_dependency() - adds the fence as a job dependency
> > > + * @job: dep job to add the dependencies to
> > > + * @fence: the dma_fence to add to the list of dependencies, or
> > > + *         %DRM_DEP_JOB_FENCE_PREALLOC to reserve a slot for later.
> > > + *
> > > + * Note that @fence is consumed in both the success and error cases (except
> > > + * when @fence is %DRM_DEP_JOB_FENCE_PREALLOC, which carries no reference).
> > > + *
> > > + * Signalled fences and fences belonging to the same queue as @job (i.e. where
> > > + * fence->context matches the queue's finished fence context) are silently
> > > + * dropped; the job need not wait on its own queue's output.
> > > + *
> > > + * Warns if the job has already been armed (dependencies must be added before
> > > + * drm_dep_job_arm()).
> > > + *
> > > + * **Pre-allocation pattern**
> > > + *
> > > + * When multiple jobs across different queues must be prepared and submitted
> > > + * together in a single atomic commit — for example, where job A's finished
> > > + * fence is an input dependency of job B — all jobs must be armed and pushed
> > > + * within a single dma_fence_begin_signalling() / dma_fence_end_signalling()
> > > + * region.  Once that region has started no memory allocation is permitted.
> > > + *
> > > + * To handle this, pass %DRM_DEP_JOB_FENCE_PREALLOC during the preparation
> > > + * phase (before arming any job, while GFP_KERNEL allocation is still allowed)
> > > + * to pre-allocate a slot in @job->dependencies.  The slot index assigned by
> > > + * the underlying xarray must be tracked by the caller separately (e.g. it is
> > > + * always index 0 when the dependency array is empty, as Xe relies on).
> > > + * After all jobs have been armed and the finished fences are available, call
> > > + * drm_dep_job_replace_dependency() with that index and the real fence.
> > > + * drm_dep_job_replace_dependency() uses GFP_NOWAIT internally and may be
> > > + * called from atomic or signalling context.
> > > + *
> > > + * The sentinel slot is never skipped by the signalled-fence fast-path,
> > > + * ensuring a slot is always allocated even when the real fence is not yet
> > > + * known.
> > > + *
> > > + * **Example: bind job feeding TLB invalidation jobs**
> > > + *
> > > + * Consider a GPU with separate queues for page-table bind operations and for
> > > + * TLB invalidation.  A single atomic commit must:
> > > + *
> > > + *  1. Run a bind job that modifies page tables.
> > > + *  2. Run one TLB-invalidation job per MMU that depends on the bind
> > > + *     completing, so stale translations are flushed before the engines
> > > + *     continue.
> > > + *
> > > + * Because all jobs must be armed and pushed inside a signalling region (where
> > > + * GFP_KERNEL is forbidden), pre-allocate slots before entering the region::
> > > + *
> > > + *   // Phase 1 — process context, GFP_KERNEL allowed
> > > + *   drm_dep_job_init(bind_job, bind_queue, ops);
> > > + *   for_each_mmu(mmu) {
> > > + *       drm_dep_job_init(tlb_job[mmu], tlb_queue[mmu], ops);
> > > + *       // Pre-allocate slot at index 0; real fence not available yet
> > > + *       drm_dep_job_add_dependency(tlb_job[mmu], DRM_DEP_JOB_FENCE_PREALLOC);
> > > + *   }
> > > + *
> > > + *   // Phase 2 — inside signalling region, no GFP_KERNEL
> > > + *   dma_fence_begin_signalling();
> > > + *   drm_dep_job_arm(bind_job);
> > > + *   for_each_mmu(mmu) {
> > > + *       // Swap sentinel for bind job's finished fence
> > > + *       drm_dep_job_replace_dependency(tlb_job[mmu], 0,
> > > + *                                      dma_fence_get(bind_job->finished));
> > > + *       drm_dep_job_arm(tlb_job[mmu]);
> > > + *   }
> > > + *   drm_dep_job_push(bind_job);
> > > + *   for_each_mmu(mmu)
> > > + *       drm_dep_job_push(tlb_job[mmu]);
> > > + *   dma_fence_end_signalling();
> > > + *
> > > + * Context: Process context. May allocate memory with GFP_KERNEL.
> > > + * Return: the allocated slot index on success if @fence is
> > > + * %DRM_DEP_JOB_FENCE_PREALLOC; otherwise 0 on success, or a negative
> > > + * error code.
> > > + */
> > 
> > > +int drm_dep_job_add_dependency(struct drm_dep_job *job, struct dma_fence *fence)
> > > +{
> > > + struct drm_dep_queue *q = job->q;
> > > + struct dma_fence *entry;
> > > + unsigned long index;
> > > + u32 id = 0;
> > > + int ret;
> > > +
> > > + WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > > + might_alloc(GFP_KERNEL);
> > > +
> > > + if (!fence)
> > > + return 0;
> > > +
> > > + if (fence == DRM_DEP_JOB_FENCE_PREALLOC)
> > > + goto add_fence;
> > > +
> > > + /*
> > > + * Ignore signalled fences or fences from our own queue — finished
> > > + * fences use q->fence.context.
> > > + */
> > > + if (dma_fence_test_signaled_flag(fence) ||
> > > +    fence->context == q->fence.context) {
> > > + dma_fence_put(fence);
> > > + return 0;
> > > + }
> > > +
> > > + /* Deduplicate if we already depend on a fence from the same context.
> > > + * This lets the size of the array of deps scale with the number of
> > > + * engines involved, rather than the number of BOs.
> > > + */
> > > + xa_for_each(&job->dependencies, index, entry) {
> > > + if (entry == DRM_DEP_JOB_FENCE_PREALLOC ||
> > > +    entry->context != fence->context)
> > > + continue;
> > > +
> > > + if (dma_fence_is_later(fence, entry)) {
> > > + dma_fence_put(entry);
> > > + xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > > + } else {
> > > + dma_fence_put(fence);
> > > + }
> > > + return 0;
> > > + }
> > > +
> > > +add_fence:
> > > + ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> > > +       GFP_KERNEL);
> > > + if (ret != 0) {
> > > + if (fence != DRM_DEP_JOB_FENCE_PREALLOC)
> > > + dma_fence_put(fence);
> > > + return ret;
> > > + }
> > > +
> > > + return (fence == DRM_DEP_JOB_FENCE_PREALLOC) ? id : 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_add_dependency);
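
Side note for reviewers: the dedup loop above is easy to get wrong, so here is a
compilable userspace model of just that logic (toy types; the real code uses an
xarray and dma_fence_is_later()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy fence keyed by (context, seqno); not the kernel's dma_fence. */
struct toy_fence {
	uint64_t context;
	uint64_t seqno;
};

#define MAX_DEPS 8

/*
 * Models the dedup loop: keep at most one fence per context, and only
 * the later one (dma_fence_is_later() is a plain seqno comparison here).
 */
static size_t toy_add_dep(struct toy_fence *deps, size_t n, struct toy_fence f)
{
	for (size_t i = 0; i < n; i++) {
		if (deps[i].context != f.context)
			continue;
		if (f.seqno > deps[i].seqno)
			deps[i] = f;
		return n;	/* deduplicated: count unchanged */
	}
	if (n < MAX_DEPS)
		deps[n++] = f;
	return n;
}
```

This is why the array scales with the number of fence contexts (engines), not
with the number of BOs whose resv fences are added.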
> > > +
> > > +/**
> > > + * drm_dep_job_replace_dependency() - replace a pre-allocated dependency slot
> > > + * @job: dep job to update
> > > + * @index: xarray index of the slot to replace, as returned when the sentinel
> > > + *         was originally inserted via drm_dep_job_add_dependency()
> > > + * @fence: the real dma_fence to store; its reference is always consumed
> > > + *
> > > + * Replaces the %DRM_DEP_JOB_FENCE_PREALLOC sentinel at @index in
> > > + * @job->dependencies with @fence.  The slot must have been pre-allocated by
> > > + * passing %DRM_DEP_JOB_FENCE_PREALLOC to drm_dep_job_add_dependency(); the
> > > + * existing entry is asserted to be the sentinel.
> > > + *
> > > + * This is the second half of the pre-allocation pattern described in
> > > + * drm_dep_job_add_dependency().  It is intended to be called inside a
> > > + * dma_fence_begin_signalling() / dma_fence_end_signalling() region where
> > > + * memory allocation with GFP_KERNEL is forbidden.  It uses GFP_NOWAIT
> > > + * internally so it is safe to call from atomic or signalling context, but
> > > + * since the slot has been pre-allocated no actual memory allocation occurs.
> > > + *
> > > + * If @fence is already signalled the slot is erased rather than storing a
> > > + * redundant dependency.  The successful store is asserted — if the store
> > > + * fails it indicates a programming error (slot index out of range or
> > > + * concurrent modification).
> > > + *
> > > + * Must be called before drm_dep_job_arm(). @fence is consumed in all cases.
> > 
> > Can’t enforce this in C. Also, how is the fence “consumed” ? You can’t enforce that
> > the user can’t access the fence anymore after this function returns, like we can do
> > at compile time in Rust.
> > 

I agree—you can’t enforce correct usage at compile time. The best you
can do is document the rules and annotate them. DRM dep will complain
when those rules are violated.
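
To make that concrete, the "fence is consumed on every path" rule from
drm_dep_job_add_dependency() can be modeled in userspace like this (toy
refcount, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted fence; not the kernel's dma_fence. */
struct toy_fence {
	int refcount;
	bool freed;
};

static void toy_fence_put(struct toy_fence *f)
{
	if (f && --f->refcount == 0)
		f->freed = true;	/* stands in for the final free */
}

/*
 * Models the documented contract: the passed reference is consumed on
 * every path, including errors, so callers never need a cleanup branch.
 */
static int toy_add_dependency(struct toy_fence *f, bool fail)
{
	if (!f)
		return 0;
	if (fail) {
		toy_fence_put(f);	/* consumed even on error */
		return -1;
	}
	/* The real code would store the fence; the reference moves with it. */
	toy_fence_put(f);
	return 0;
}
```

Rust can reject a post-call use of the fence at compile time; in C the best we
can do is make the ownership rule unconditional so there is nothing for the
caller to get wrong on the error path.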

> > > + *
> > > + * Context: Any context. DMA fence signaling path.
> > > + */
> > > +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> > > +    struct dma_fence *fence)
> > > +{
> > > + WARN_ON(xa_load(&job->dependencies, index) !=
> > > + DRM_DEP_JOB_FENCE_PREALLOC);
> > > +
> > > + if (dma_fence_test_signaled_flag(fence)) {
> > > + xa_erase(&job->dependencies, index);
> > > + dma_fence_put(fence);
> > > + return;
> > > + }
> > > +
> > > + if (WARN_ON(xa_is_err(xa_store(&job->dependencies, index, fence,
> > > +       GFP_NOWAIT)))) {
> > > + dma_fence_put(fence);
> > > + return;
> > > + }
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_replace_dependency);
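
The two-phase pattern these helpers implement can be sketched in isolation (toy
slot array standing in for the xarray):

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel slot marker, like DRM_DEP_JOB_FENCE_PREALLOC. */
#define TOY_PREALLOC ((void *)1)
#define TOY_NSLOTS 4

/* Phase 1: reserve a slot while allocation is still allowed. */
static int toy_prealloc(void *slots[TOY_NSLOTS])
{
	for (int i = 0; i < TOY_NSLOTS; i++) {
		if (!slots[i]) {
			slots[i] = TOY_PREALLOC;
			return i;	/* caller must track this index */
		}
	}
	return -1;
}

/* Phase 2: swap the sentinel for the real fence; no allocation here. */
static void toy_replace(void *slots[TOY_NSLOTS], int idx, void *fence)
{
	assert(slots[idx] == TOY_PREALLOC);	/* WARN_ON() analog */
	slots[idx] = fence;
}
```

All memory that phase 2 needs was committed in phase 1, which is what makes the
replace step legal inside the signalling region.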
> > > +
> > > +/**
> > > + * drm_dep_job_add_syncobj_dependency() - adds a syncobj's fence as a
> > > + *   job dependency
> > > + * @job: dep job to add the dependencies to
> > > + * @file: drm file private pointer
> > > + * @handle: syncobj handle to lookup
> > > + * @point: timeline point
> > > + *
> > > + * This adds the fence matching the given syncobj to @job.
> > > + *
> > > + * Context: Process context.
> > > + * Return: 0 on success, or a negative error code.
> > > + */
> > > +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> > > +       struct drm_file *file, u32 handle,
> > > +       u32 point)
> > > +{
> > > + struct dma_fence *fence;
> > > + int ret;
> > > +
> > > + ret = drm_syncobj_find_fence(file, handle, point, 0, &fence);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + return drm_dep_job_add_dependency(job, fence);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_add_syncobj_dependency);
> > > +
> > > +/**
> > > + * drm_dep_job_add_resv_dependencies() - add all fences from the resv to the job
> > > + * @job: dep job to add the dependencies to
> > > + * @resv: the dma_resv object to get the fences from
> > > + * @usage: the dma_resv_usage to use to filter the fences
> > > + *
> > > + * This adds all fences matching the given usage from @resv to @job.
> > > + * Must be called with the @resv lock held.
> > > + *
> > > + * Context: Process context.
> > > + * Return: 0 on success, or a negative error code.
> > > + */
> > > +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> > > +      struct dma_resv *resv,
> > > +      enum dma_resv_usage usage)
> > > +{
> > > + struct dma_resv_iter cursor;
> > > + struct dma_fence *fence;
> > > + int ret;
> > > +
> > > + dma_resv_assert_held(resv);
> > > +
> > > + dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> > > + /*
> > > + * As drm_dep_job_add_dependency always consumes the fence
> > > + * reference (even when it fails), and dma_resv_for_each_fence
> > > + * is not obtaining one, we need to grab one before calling.
> > > + */
> > > + ret = drm_dep_job_add_dependency(job, dma_fence_get(fence));
> > > + if (ret)
> > > + return ret;
> > > + }
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_add_resv_dependencies);
> > > +
> > > +/**
> > > + * drm_dep_job_add_implicit_dependencies() - adds implicit dependencies
> > > + *   as job dependencies
> > > + * @job: dep job to add the dependencies to
> > > + * @obj: the gem object to add new dependencies from.
> > > + * @write: whether the job might write the object (so we need to depend on
> > > + * shared fences in the reservation object).
> > > + *
> > > + * This should be called after drm_gem_lock_reservations() on your array of
> > > + * GEM objects used in the job but before updating the reservations with your
> > > + * own fences.
> > > + *
> > > + * Context: Process context.
> > > + * Return: 0 on success, or a negative error code.
> > > + */
> > > +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> > > +  struct drm_gem_object *obj,
> > > +  bool write)
> > > +{
> > > + return drm_dep_job_add_resv_dependencies(job, obj->resv,
> > > + dma_resv_usage_rw(write));
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_add_implicit_dependencies);
> > > +
> > > +/**
> > > + * drm_dep_job_is_signaled() - check whether a dep job has completed
> > > + * @job: dep job to check
> > > + *
> > > + * Determines whether @job has signalled. The queue should be stopped before
> > > + * calling this to obtain a stable snapshot of state. Both the parent hardware
> > > + * fence and the finished software fence are checked.
> > > + *
> > > + * Context: Process context. The queue must be stopped before calling this.
> > > + * Return: true if the job is signalled, false otherwise.
> > > + */
> > > +bool drm_dep_job_is_signaled(struct drm_dep_job *job)
> > > +{
> > > + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> > > + return drm_dep_fence_is_complete(job->dfence);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_is_signaled);
> > > +
> > > +/**
> > > + * drm_dep_job_is_finished() - test whether a dep job's finished fence has signalled
> > > + * @job: dep job to check
> > > + *
> > > + * Tests whether the job's software finished fence has been signalled, using
> > > + * dma_fence_test_signaled_flag() to avoid any signalling side-effects. Unlike
> > > + * drm_dep_job_is_signaled(), this does not require the queue to be stopped and
> > > + * does not check the parent hardware fence — it is a lightweight test of the
> > > + * finished fence only.
> > > + *
> > > + * Context: Any context.
> > > + * Return: true if the job's finished fence has been signalled, false otherwise.
> > > + */
> > > +bool drm_dep_job_is_finished(struct drm_dep_job *job)
> > > +{
> > > + return drm_dep_fence_is_finished(job->dfence);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_is_finished);
> > > +
> > > +/**
> > > + * drm_dep_job_invalidate_job() - increment the invalidation count for a job
> > > + * @job: dep job to invalidate
> > > + * @threshold: threshold above which the job is considered invalidated
> > > + *
> > > + * Increments @job->invalidate_count and returns true if it exceeds @threshold,
> > > + * indicating the job should be considered hung and discarded. The queue must
> > > + * be stopped before calling this function.
> > > + *
> > > + * Context: Process context. The queue must be stopped before calling this.
> > > + * Return: true if @job->invalidate_count exceeds @threshold, false otherwise.
> > > + */
> > > +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold)
> > > +{
> > > + WARN_ON(!drm_dep_queue_is_stopped(job->q));
> > > + return ++job->invalidate_count > threshold;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_invalidate_job);
> > > +
> > > +/**
> > > + * drm_dep_job_finished_fence() - return the finished fence for a job
> > > + * @job: dep job to query
> > > + *
> > > + * No reference is taken on the returned fence; the caller must hold its own
> > > + * reference to @job for the duration of any access.
> > 
> > Can’t enforce this in C.
> > 
> > > + *
> > > + * Context: Any context.
> > > + * Return: the finished &dma_fence for @job.
> > > + */
> > > +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job)
> > > +{
> > > + return drm_dep_fence_to_dma(job->dfence);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_finished_fence);
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_job.h b/drivers/gpu/drm/dep/drm_dep_job.h
> > > new file mode 100644
> > > index 000000000000..35c61d258fa1
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_job.h
> > > @@ -0,0 +1,13 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _DRM_DEP_JOB_H_
> > > +#define _DRM_DEP_JOB_H_
> > > +
> > > +struct drm_dep_queue;
> > > +
> > > +void drm_dep_job_drop_dependencies(struct drm_dep_job *job);
> > > +
> > > +#endif /* _DRM_DEP_JOB_H_ */
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_queue.c b/drivers/gpu/drm/dep/drm_dep_queue.c
> > > new file mode 100644
> > > index 000000000000..dac02d0d22c4
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_queue.c
> > > @@ -0,0 +1,1647 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright 2015 Advanced Micro Devices, Inc.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > + * OTHER DEALINGS IN THE SOFTWARE.
> > > + *
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: DRM dependency queue
> > > + *
> > > + * The drm_dep subsystem provides a lightweight GPU submission queue that
> > > + * combines the roles of drm_gpu_scheduler and drm_sched_entity into a
> > > + * single object (struct drm_dep_queue). Each queue owns its own ordered
> > > + * submit workqueue, timeout workqueue, and TDR delayed-work.
> > > + *
> > > + * **Job lifecycle**
> > > + *
> > > + * 1. Allocate and initialise a job with drm_dep_job_init().
> > > + * 2. Add dependency fences with drm_dep_job_add_dependency() and friends.
> > > + * 3. Arm the job with drm_dep_job_arm() to obtain its out-fences.
> > > + * 4. Submit with drm_dep_job_push().
> > > + *
> > > + * **Submission paths**
> > > + *
> > > + * drm_dep_job_push() decides between two paths under @q->sched.lock:
> > > + *
> > > + * - **Bypass path** (drm_dep_queue_can_job_bypass()): if
> > > + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set, the queue is not stopped,
> > > + *   the SPSC queue is empty, the job has no dependency fences, and credits
> > > + *   are available, the job is submitted inline on the calling thread without
> > > + *   touching the submit workqueue.
> > > + *
> > > + * - **Queued path** (drm_dep_queue_push_job()): the job is pushed onto an
> > > + *   SPSC queue and the run_job worker is kicked. The run_job worker pops the
> > > + *   job, resolves any remaining dependency fences (installing wakeup
> > > + *   callbacks for unresolved ones), and calls drm_dep_queue_run_job().
> > > + *
> > > + * **Running a job**
> > > + *
> > > + * drm_dep_queue_run_job() accounts credits, appends the job to the pending
> > > + * list (starting the TDR timer only when the list was previously empty),
> > > + * calls @ops->run_job(), stores the returned hardware fence as the parent
> > > + * of the job's dep fence, then installs a callback on it. When the hardware
> > > + * fence fires (or the job completes synchronously), drm_dep_job_done()
> > > + * signals the finished fence, returns credits, and kicks the put_job worker
> > > + * to free the job.
> > > + *
> > > + * **Timeout detection and recovery (TDR)**
> > > + *
> > > + * A delayed work item fires when a job on the pending list takes longer than
> > > + * @q->job.timeout jiffies. It calls @ops->timedout_job() and acts on the
> > > + * returned status (%DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED or
> > > + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB).
> > > + * drm_dep_queue_trigger_timeout() forces the timer to fire immediately (without
> > > + * changing the stored timeout), for example during device teardown.
> > > + *
> > > + * **Reference counting**
> > > + *
> > > + * Jobs and queues are both reference counted.
> > > + *
> > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > + * job that was submitted to it.
> > > + *
> > > + * The queue holds its own reference to a job for as long as the job is
> > > + * internally tracked: from the moment the job is added to the pending list
> > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > + * worker, which calls drm_dep_job_put() to release that reference.
> > 
> > Why not simply keep track that the job was completed, instead of relinquishing
> > the reference? We can then release the reference once the job is cleaned up
> > (by the queue, using a worker) in process context.

I think that’s what I’m doing, while also allowing an opt-in path to
drop the job reference when it signals (in IRQ context) so we avoid
switching to a work item just to drop a ref. That seems like a
significant win in terms of CPU cycles.
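
Roughly, the two put paths look like this userspace model (toy code, not the
actual implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the two job-put paths; not kernel code. */
struct toy_job {
	int refcount;
	bool released;
	bool via_worker;
};

static struct toy_job *toy_deferred;	/* stands in for the put_job work item */

static void toy_job_put(struct toy_job *job, bool irq_safe)
{
	if (--job->refcount > 0)
		return;
	if (irq_safe)
		job->released = true;	/* release runs inline, even in IRQ context */
	else
		toy_deferred = job;	/* punt to a process-context worker */
}

/* The put_job worker: drops the deferred reference in process context. */
static void toy_put_job_worker(void)
{
	if (toy_deferred) {
		toy_deferred->released = true;
		toy_deferred->via_worker = true;
		toy_deferred = NULL;
	}
}
```

The opt-in flag trades a sleepable release callback for skipping the worker
round-trip on every completed job.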

> > 
> > 
> > > + *
> > > + * **Hazard: use-after-free from within a worker**
> > > + *
> > > + * Because a job holds a queue reference, drm_dep_job_put() dropping the last
> > > + * job reference will also drop a queue reference via the job's release path.
> > > + * If that happens to be the last queue reference, drm_dep_queue_fini() can be
> > > + * called, which queues @q->free_work on dep_free_wq and returns immediately.
> > > + * free_work calls disable_work_sync() / disable_delayed_work_sync() on the
> > > + * queue's own workers before destroying its workqueues, so in practice a
> > > + * running worker always completes before the queue memory is freed.
> > > + *
> > > + * However, there is a secondary hazard: a worker can be queued while the
> > > + * queue is in a "zombie" state — refcount has already reached zero and async
> > > + * teardown is in flight, but the work item has not yet been disabled by
> > > + * free_work.  To guard against this every worker uses
> > > + * drm_dep_queue_get_unless_zero() at entry; if the refcount is already zero
> > > + * the worker bails immediately without touching the queue state.
> > 
> > Again, this problem is gone in Rust.
> > 

I answered this one above.
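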

> > > + *
> > > + * Because all actual teardown (disable_*_sync, destroy_workqueue) runs on
> > > + * dep_free_wq — which is independent of the queue's own submit/timeout
> > > + * workqueues — there is no deadlock risk.  Each queue holds a drm_dev_get()
> > > + * reference on its owning &drm_device, which is released as the last step of
> > > + * teardown.  This ensures the driver module cannot be unloaded while any queue
> > > + * is still alive.
> > > + */
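
For clarity, the bypass conditions listed above reduce to a single predicate;
here is a userspace sketch (toy types, not the real
drm_dep_queue_can_job_bypass() signature):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the bypass decision; toy types, not the C API. */
struct toy_queue {
	bool bypass_supported;	/* DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED */
	bool stopped;
	unsigned int queued;	/* SPSC queue depth */
	unsigned int credits_free;
};

struct toy_job {
	unsigned int ndeps;	/* unresolved dependency fences */
	unsigned int credits;
};

/* All conditions must hold for inline submission on the pushing thread. */
static bool toy_can_bypass(const struct toy_queue *q, const struct toy_job *j)
{
	return q->bypass_supported && !q->stopped && q->queued == 0 &&
	       j->ndeps == 0 && q->credits_free >= j->credits;
}
```

If any condition fails the job takes the queued path, preserving submission
order behind whatever is already on the SPSC queue.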
> > > +
> > > +#include <linux/dma-resv.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/module.h>
> > > +#include <linux/overflow.h>
> > > +#include <linux/slab.h>
> > > +#include <linux/wait.h>
> > > +#include <linux/workqueue.h>
> > > +#include <drm/drm_dep.h>
> > > +#include <drm/drm_drv.h>
> > > +#include <drm/drm_print.h>
> > > +#include "drm_dep_fence.h"
> > > +#include "drm_dep_job.h"
> > > +#include "drm_dep_queue.h"
> > > +
> > > +/*
> > > + * Dedicated workqueue for deferred drm_dep_queue teardown.  Using a
> > > + * module-private WQ instead of system_percpu_wq keeps teardown isolated
> > > + * from unrelated kernel subsystems.
> > > + */
> > > +static struct workqueue_struct *dep_free_wq;
> > > +
> > > +/**
> > > + * drm_dep_queue_flags_set() - set a flag on the queue under sched.lock
> > > + * @q: dep queue
> > > + * @flag: flag to set (one of &enum drm_dep_queue_flags)
> > > + *
> > > + * Sets @flag in @q->sched.flags. Must be called with @q->sched.lock
> > > + * held; the lockdep assertion enforces this.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_flags_set(struct drm_dep_queue *q,
> > > +    enum drm_dep_queue_flags flag)
> > > +{
> > > + lockdep_assert_held(&q->sched.lock);
> > 
> > We can enforce this in Rust at compile-time. The code does not compile if the
> > lock is not taken. Same here and everywhere else where the sched lock has
> > to be taken.
> > 

I do understand that part of Rust and agree it is a nice feature.

> > 
> > > + q->sched.flags |= flag;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_flags_clear() - clear a flag on the queue under sched.lock
> > > + * @q: dep queue
> > > + * @flag: flag to clear (one of &enum drm_dep_queue_flags)
> > > + *
> > > + * Clears @flag in @q->sched.flags. Must be called with @q->sched.lock
> > > + * held; the lockdep assertion enforces this.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_flags_clear(struct drm_dep_queue *q,
> > > +      enum drm_dep_queue_flags flag)
> > > +{
> > > + lockdep_assert_held(&q->sched.lock);
> > > + q->sched.flags &= ~flag;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_has_credits() - check whether the queue has enough credits
> > > + * @q: dep queue
> > > + * @job: job requesting credits
> > > + *
> > > + * Checks whether the queue has enough available credits to dispatch
> > > + * @job. If @job->credits exceeds the queue's credit limit, it is
> > > + * clamped with a WARN.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + * Return: true if available credits >= @job->credits, false otherwise.
> > > + */
> > > +static bool drm_dep_queue_has_credits(struct drm_dep_queue *q,
> > > +      struct drm_dep_job *job)
> > > +{
> > > + u32 available;
> > > +
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + if (job->credits > q->credit.limit) {
> > > + drm_warn(q->drm,
> > > + "Jobs may not exceed the credit limit, truncate.\n");
> > > + job->credits = q->credit.limit;
> > > + }
> > > +
> > > + WARN_ON(check_sub_overflow(q->credit.limit,
> > > +   atomic_read(&q->credit.count),
> > > +   &available));
> > > +
> > > + return available >= job->credits;
> > > +}
> > > +
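
A self-contained model of the availability computation (userspace stand-in for
check_sub_overflow()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Models the credit check: available = limit - in_flight, with the
 * subtraction checked so a bookkeeping bug (in_flight > limit) is
 * caught instead of silently wrapping, the check_sub_overflow() role.
 */
static bool toy_has_credits(uint32_t limit, uint32_t in_flight,
			    uint32_t want, bool *underflow)
{
	*underflow = in_flight > limit;
	if (*underflow)
		return false;
	return limit - in_flight >= want;
}
```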
> > > +/**
> > > + * drm_dep_queue_run_job_queue() - kick the run-job worker
> > > + * @q: dep queue
> > > + *
> > > + * Queues @q->sched.run_job on @q->sched.submit_wq unless the queue is stopped
> > > + * or the job queue is empty.  The empty-queue check avoids queueing a work item
> > > + * that would immediately return with nothing to do.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_queue_run_job_queue(struct drm_dep_queue *q)
> > > +{
> > > + if (!drm_dep_queue_is_stopped(q) && spsc_queue_count(&q->job.queue))
> > > + queue_work(q->sched.submit_wq, &q->sched.run_job);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_put_job_queue() - kick the put-job worker
> > > + * @q: dep queue
> > > + *
> > > + * Queues @q->sched.put_job on @q->sched.submit_wq unless the queue
> > > + * is stopped.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_queue_put_job_queue(struct drm_dep_queue *q)
> > > +{
> > > + if (!drm_dep_queue_is_stopped(q))
> > > + queue_work(q->sched.submit_wq, &q->sched.put_job);
> > > +}
> > > +
> > > +/**
> > > + * drm_queue_start_timeout() - arm or re-arm the TDR delayed work
> > > + * @q: dep queue
> > > + *
> > > + * Arms the TDR delayed work with @q->job.timeout. No-op if
> > > + * @q->ops->timedout_job is NULL, the timeout is MAX_SCHEDULE_TIMEOUT,
> > > + * or the pending list is empty.
> > > + *
> > > + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> > > + */
> > > +static void drm_queue_start_timeout(struct drm_dep_queue *q)
> > > +{
> > > + lockdep_assert_held(&q->job.lock);
> > > +
> > > + if (!q->ops->timedout_job ||
> > > +    q->job.timeout == MAX_SCHEDULE_TIMEOUT ||
> > > +    list_empty(&q->job.pending))
> > > + return;
> > > +
> > > + mod_delayed_work(q->sched.timeout_wq, &q->sched.tdr, q->job.timeout);
> > > +}
> > > +
> > > +/**
> > > + * drm_queue_start_timeout_unlocked() - arm TDR, acquiring job.lock
> > > + * @q: dep queue
> > > + *
> > > + * Acquires @q->job.lock with interrupts disabled and calls
> > > + * drm_queue_start_timeout().
> > > + *
> > > + * Context: Process context (workqueue).
> > > + */
> > > +static void drm_queue_start_timeout_unlocked(struct drm_dep_queue *q)
> > > +{
> > > + guard(spinlock_irq)(&q->job.lock);
> > > + drm_queue_start_timeout(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_remove_dependency() - clear the active dependency and wake
> > > + *   the run-job worker
> > > + * @q: dep queue
> > > + * @f: the dependency fence being removed
> > > + *
> > > + * Stores @f into @q->dep.removed_fence via smp_store_release() so that the
> > > + * run-job worker can drop the reference to it in drm_dep_queue_is_ready(),
> > > + * paired with smp_load_acquire().  Clears @q->dep.fence and kicks the
> > > + * run-job worker.
> > > + *
> > > + * The fence reference is not dropped here; it is deferred to the run-job
> > > + * worker via @q->dep.removed_fence to keep this path suitable for
> > > + * dma_fence callback removal in drm_dep_queue_kill().
> > 
> > This is a comment in C, but in Rust this is encoded directly in the type system.
> > 
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_queue_remove_dependency(struct drm_dep_queue *q,
> > > +    struct dma_fence *f)
> > > +{
> > > + /* removed_fence must be visible to the reader before &q->dep.fence */
> > > + smp_store_release(&q->dep.removed_fence, f);
> > > +
> > > + WRITE_ONCE(q->dep.fence, NULL);
> > > + drm_dep_queue_run_job_queue(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_wakeup() - dma_fence callback to wake the run-job worker
> > > + * @f: the signalled dependency fence
> > > + * @cb: callback embedded in the dep queue
> > > + *
> > > + * Called from dma_fence_signal() when the active dependency fence signals.
> > > + * Delegates to drm_dep_queue_remove_dependency() to clear @q->dep.fence and
> > > + * kick the run-job worker.  The fence reference is not dropped here; it is
> > > + * deferred to the run-job worker via @q->dep.removed_fence.
> > 
> > Same here.
> > 
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_queue_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(cb, struct drm_dep_queue, dep.cb);
> > > +
> > > + drm_dep_queue_remove_dependency(q, f);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_is_ready() - check whether the queue has a dispatchable job
> > > + * @q: dep queue
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > 
> > Can’t call this in Rust if the lock is not taken.
> > 
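
Right. For readers following along, that guarantee can be sketched in plain Rust with std types only (hypothetical names, not the kernel crates): a method that requires the lock takes the guard as a parameter, so the unlocked call simply does not compile, instead of relying on lockdep_assert_held() at runtime.

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical stand-in for the scheduler-side state protected by sched.lock.
struct SchedState {
    queued_jobs: usize,
    dep_fence_pending: bool,
}

struct DepQueue {
    sched: Mutex<SchedState>,
}

impl DepQueue {
    // The &MutexGuard parameter is the proof that sched.lock is held;
    // there is no way to obtain one without actually locking the mutex.
    fn is_ready(&self, state: &MutexGuard<'_, SchedState>) -> bool {
        state.queued_jobs > 0 && !state.dep_fence_pending
    }
}
```

Calling is_ready() without holding the guard is a compile error rather than a runtime splat.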
> > > + * Return: true if SPSC queue non-empty and no dep fence pending,
> > > + *   false otherwise.
> > > + */
> > > +static bool drm_dep_queue_is_ready(struct drm_dep_queue *q)
> > > +{
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + if (!spsc_queue_count(&q->job.queue))
> > > + return false;
> > > +
> > > + if (READ_ONCE(q->dep.fence))
> > > + return false;
> > > +
> > > + /* Paired with smp_store_release in drm_dep_queue_remove_dependency() */
> > > + dma_fence_put(smp_load_acquire(&q->dep.removed_fence));
> > > +
> > > + q->dep.removed_fence = NULL;
> > > +
> > > + return true;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_is_killed() - check whether a dep queue has been killed
> > > + * @q: dep queue to check
> > > + *
> > > + * Return: true if %DRM_DEP_QUEUE_FLAGS_KILLED is set on @q, false otherwise.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +bool drm_dep_queue_is_killed(struct drm_dep_queue *q)
> > > +{
> > > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_KILLED);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_is_killed);
> > > +
> > > +/**
> > > + * drm_dep_queue_is_initialized() - check whether a dep queue has been initialized
> > > + * @q: dep queue to check
> > > + *
> > > + * A queue is considered initialized once its ops pointer has been set by a
> > > + * successful call to drm_dep_queue_init().  Drivers that embed a
> > > + * &drm_dep_queue inside a larger structure may call this before attempting any
> > > + * other queue operation to confirm that initialization has taken place.
> > > + * drm_dep_queue_put() must be called if this function returns true to drop the
> > > + * initialization reference from drm_dep_queue_init().
> > > + *
> > > + * Return: true if @q has been initialized, false otherwise.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q)
> > > +{
> > > + return !!q->ops;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_is_initialized);
> > > +
> > > +/**
> > > + * drm_dep_queue_set_stopped() - pre-mark a queue as stopped before first use
> > > + * @q: dep queue to mark
> > > + *
> > > + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED directly on @q without going through the
> > > + * normal drm_dep_queue_stop() path.  This is only valid during the driver-side
> > > + * queue initialization sequence, i.e. after drm_dep_queue_init() returns but
> > > + * before the queue is made visible to other threads (e.g. before it is added
> > > + * to any lookup structures).  Using this after the queue is live is a driver
> > > + * bug; use drm_dep_queue_stop() instead.
> > > + *
> > > + * Context: Process context, queue not yet visible to other threads.
> > > + */
> > > +void drm_dep_queue_set_stopped(struct drm_dep_queue *q)
> > > +{
> > > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_STOPPED;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_set_stopped);
> > > +
> > > +/**
> > > + * drm_dep_queue_refcount() - read the current reference count of a queue
> > > + * @q: dep queue to query
> > > + *
> > > + * Returns the instantaneous kref value.  The count may change immediately
> > > + * after this call; callers must not make safety decisions based solely on
> > > + * the returned value.  Intended for diagnostic snapshots and debugfs output.
> > > + *
> > > + * Context: Any context.
> > > + * Return: current reference count.
> > > + */
> > > +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q)
> > > +{
> > > + return kref_read(&q->refcount);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_refcount);
> > > +
> > > +/**
> > > + * drm_dep_queue_timeout() - read the per-job TDR timeout for a queue
> > > + * @q: dep queue to query
> > > + *
> > > + * Returns the per-job timeout in jiffies as set at init time.
> > > + * %MAX_SCHEDULE_TIMEOUT means no timeout is configured.
> > > + *
> > > + * Context: Any context.
> > > + * Return: timeout in jiffies.
> > > + */
> > > +long drm_dep_queue_timeout(const struct drm_dep_queue *q)
> > > +{
> > > + return q->job.timeout;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_timeout);
> > > +
> > > +/**
> > > + * drm_dep_queue_is_job_put_irq_safe() - test whether job-put from IRQ is allowed
> > > + * @q: dep queue
> > > + *
> > > + * Context: Any context.
> > > + * Return: true if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set,
> > > + *   false otherwise.
> > > + */
> > > +static bool drm_dep_queue_is_job_put_irq_safe(const struct drm_dep_queue *q)
> > > +{
> > > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_job_dependency() - get next unresolved dep fence
> > > + * @q: dep queue
> > > + * @job: job whose dependencies to advance
> > > + *
> > > + * Returns NULL immediately if the queue has been killed via
> > > + * drm_dep_queue_kill(), bypassing all dependency waits so that jobs
> > > + * drain through run_job as quickly as possible.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + * Return: next unresolved &dma_fence with a new reference, or NULL
> > > + *   when all dependencies have been consumed (or the queue is killed).
> > > + */
> > > +static struct dma_fence *
> > > +drm_dep_queue_job_dependency(struct drm_dep_queue *q,
> > > +     struct drm_dep_job *job)
> > > +{
> > > + struct dma_fence *f;
> > > +
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + if (drm_dep_queue_is_killed(q))
> > > + return NULL;
> > > +
> > > + f = xa_load(&job->dependencies, job->last_dependency);
> > > + if (f) {
> > > + job->last_dependency++;
> > > + if (WARN_ON(DRM_DEP_JOB_FENCE_PREALLOC == f))
> > > + return dma_fence_get_stub();
> > > + return dma_fence_get(f);
> > > + }
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_add_dep_cb() - install wakeup callback on dep fence
> > > + * @q: dep queue
> > > + * @job: job whose dependency fence is stored in @q->dep.fence
> > > + *
> > > + * Installs a wakeup callback on @q->dep.fence. Returns true if the
> > > + * callback was installed (the queue must wait), false if the fence is
> > > + * already signalled or is a self-fence from the same queue context.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + * Return: true if callback installed, false if fence already done.
> > > + */
> > 
> > In Rust, we can encode the signaling paths with a “token type”. So any
> > sections that are part of the signaling path can simply take this token as an
> > argument. This type also enforces that end_signaling() is called automatically when it
> > goes out of scope.
> > 
> > By the way, we can easily offer an irq handler type where we enforce this:
> > 
> > fn handle_threaded_irq(&self, device: &Device<Bound>) -> IrqReturn { 
> >  let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
> >  self.driver.handle_threaded_irq(device) 
> > 
> >  // end_signaling() is called here automatically.
> > }
> > 
> > Same for workqueues:
> > 
> > fn work_fn(&self, device: &Device<Bound>) {
> >  let _annotation = DmaFenceSignallingAnnotation::new();  // Calls begin_signaling()
> >  self.driver.work_fn(device) 
> > 
> >  // end_signaling() is called here automatically.
> > }
> > 
> > This is not Rust-specific, of course, but it is more ergonomic to write in Rust.
> > 

Yes, I agree this is a nice feature, and properly annotating C code
requires discipline.
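
For the archives, the guard pattern is easy to mock up outside the kernel. The sketch below uses a thread-local flag and std types only (hypothetical names, not the actual kernel crate API) to show how Drop makes end_signalling() unforgettable on every exit path:

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the lockdep-tracked signalling-section state.
    static IN_SIGNALLING: Cell<bool> = Cell::new(false);
}

// Hypothetical analogue of DmaFenceSignallingAnnotation: construction marks
// the start of the critical section, Drop marks the end.
struct SignallingAnnotation;

impl SignallingAnnotation {
    fn new() -> Self {
        IN_SIGNALLING.with(|f| f.set(true)); // begin_signalling()
        SignallingAnnotation
    }
}

impl Drop for SignallingAnnotation {
    fn drop(&mut self) {
        IN_SIGNALLING.with(|f| f.set(false)); // end_signalling()
    }
}

fn in_signalling_section() -> bool {
    IN_SIGNALLING.with(|f| f.get())
}

fn work_fn() {
    let _annotation = SignallingAnnotation::new();
    assert!(in_signalling_section());
    // _annotation is dropped here, so end_signalling() runs even on
    // early returns or panics, not just on the happy path.
}
```

The C equivalent (dma_fence_begin_signalling()/dma_fence_end_signalling() cookies, as in drm_dep_queue_run_job_work() above) depends on every exit path remembering the end call.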

> > > +static bool drm_dep_queue_add_dep_cb(struct drm_dep_queue *q,
> > > +     struct drm_dep_job *job)
> > > +{
> > > + struct dma_fence *fence = q->dep.fence;
> > > +
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + if (WARN_ON(fence->context == q->fence.context)) {
> > > + dma_fence_put(q->dep.fence);
> > > + q->dep.fence = NULL;
> > > + return false;
> > > + }
> > > +
> > > + if (!dma_fence_add_callback(q->dep.fence, &q->dep.cb,
> > > +    drm_dep_queue_wakeup))
> > > + return true;
> > > +
> > > + dma_fence_put(q->dep.fence);
> > > + q->dep.fence = NULL;
> > > +
> > > + return false;
> > > +}
> > 
> > In rust we can enforce that all callbacks take a reference to the fence
> > automatically. If the callback is “forgotten” in a buggy path, it is
> > automatically removed, and the fence is automatically signaled with -ECANCELED.
> > 
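
To make that concrete, here is a std-only sketch of the idea (hypothetical names; the kernel's fence and callback types differ in detail): an RAII handle for an installed callback whose Drop path signals the fence with -ECANCELED if the callback was "forgotten" rather than completed.

```rust
use std::sync::{Arc, Mutex};

// Minimal stand-in for a fence: a signalled flag plus an error code.
#[derive(Default)]
struct FenceState {
    signalled: bool,
    error: Option<i32>,
}

struct Fence(Mutex<FenceState>);

// Hypothetical RAII handle for an installed callback. Dropping it without
// completing it (the buggy path) cancels the fence instead of leaking it.
struct CallbackHandle {
    fence: Arc<Fence>,
    armed: bool,
}

impl CallbackHandle {
    fn install(fence: Arc<Fence>) -> Self {
        CallbackHandle { fence, armed: true }
    }

    // The normal path: the callback ran and the fence signals cleanly.
    fn complete(mut self) {
        self.fence.0.lock().unwrap().signalled = true;
        self.armed = false;
    }
}

impl Drop for CallbackHandle {
    fn drop(&mut self) {
        if self.armed {
            let mut s = self.fence.0.lock().unwrap();
            s.signalled = true;
            s.error = Some(-125); // -ECANCELED on Linux
        }
    }
}
```

Either way the fence always signals, which is the invariant the C code has to maintain by hand in drm_dep_queue_add_dep_cb() and its error paths.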
> > > +
> > > +/**
> > > + * drm_dep_queue_pop_job() - pop a dispatchable job from the SPSC queue
> > > + * @q: dep queue
> > > + *
> > > + * Peeks at the head of the SPSC queue and drains all resolved
> > > + * dependencies. If a dependency is still pending, installs a wakeup
> > > + * callback and returns NULL. On success pops the job and returns it.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock. DMA fence signaling path.
> > > + * Return: next dispatchable job, or NULL if a dep is still pending.
> > > + */
> > > +static struct drm_dep_job *drm_dep_queue_pop_job(struct drm_dep_queue *q)
> > > +{
> > > + struct spsc_node *node;
> > > + struct drm_dep_job *job;
> > > +
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + node = spsc_queue_peek(&q->job.queue);
> > > + if (!node)
> > > + return NULL;
> > > +
> > > + job = container_of(node, struct drm_dep_job, queue_node);
> > > +
> > > + while ((q->dep.fence = drm_dep_queue_job_dependency(q, job))) {
> > > + if (drm_dep_queue_add_dep_cb(q, job))
> > > + return NULL;
> > > + }
> > > +
> > > + spsc_queue_pop(&q->job.queue);
> > > +
> > > + return job;
> > > +}
> > > +
> > > +/*
> > > + * drm_dep_queue_get_unless_zero() - try to acquire a queue reference
> > > + *
> > > + * Workers use this instead of drm_dep_queue_get() to guard against the zombie
> > > + * state: the queue's refcount has already reached zero (async teardown is in
> > > + * flight) but a work item was queued before free_work had a chance to cancel
> > > + * it.  If kref_get_unless_zero() fails the caller must bail immediately.
> > > + *
> > > + * Context: Any context.
> > > + * Returns true if the reference was acquired, false if the queue is zombie.
> > > + */
> > 
> > Again, this function is totally gone in Rust.
> > 

See above. I don't think it is given the async teardown flow design.
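
For reference, the pattern Daniel is alluding to is Weak::upgrade(): the worker holds a weak reference and either obtains a strong one or observes that teardown already won. Sketch with std types (not the kernel's Arc; whether it fits the async teardown flow here is exactly the open question):

```rust
use std::sync::{Arc, Weak};

struct DepQueue {
    name: &'static str,
}

// Analogue of a worker entry point: drm_dep_queue_get_unless_zero()
// followed by the worker body. Returns false on the zombie/bail path.
fn run_worker(q: &Weak<DepQueue>) -> bool {
    match q.upgrade() {
        Some(queue) => {
            // Strong reference acquired: the queue cannot be freed while
            // `queue` is alive. The actual work would go here.
            let _ = queue.name;
            true
        }
        // Refcount already hit zero: async teardown in flight, bail
        // immediately, mirroring the kref_get_unless_zero() failure case.
        None => false,
    }
}
```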

> > > +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q)
> > > +{
> > > + return kref_get_unless_zero(&q->refcount);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_get_unless_zero);
> > > +
> > > +/**
> > > + * drm_dep_queue_run_job_work() - run-job worker
> > > + * @work: work item embedded in the dep queue
> > > + *
> > > + * Acquires @q->sched.lock, checks stopped state, queue readiness and
> > > + * available credits, pops the next job via drm_dep_queue_pop_job(),
> > > + * dispatches it via drm_dep_queue_run_job(), then re-kicks itself.
> > > + *
> > > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > > + *
> > > + * Context: Process context (workqueue). DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_run_job_work(struct work_struct *work)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(work, struct drm_dep_queue, sched.run_job);
> > > + struct spsc_node *node;
> > > + struct drm_dep_job *job;
> > > + bool cookie = dma_fence_begin_signalling();
> > > +
> > > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > > + if (!drm_dep_queue_get_unless_zero(q)) {
> > > + dma_fence_end_signalling(cookie);
> > > + return;
> > > + }
> > > +
> > > + mutex_lock(&q->sched.lock);
> > > +
> > > + if (drm_dep_queue_is_stopped(q))
> > > + goto put_queue;
> > > +
> > > + if (!drm_dep_queue_is_ready(q))
> > > + goto put_queue;
> > > +
> > > + /* Peek to check credits before committing to pop and dep resolution */
> > > + node = spsc_queue_peek(&q->job.queue);
> > > + if (!node)
> > > + goto put_queue;
> > > +
> > > + job = container_of(node, struct drm_dep_job, queue_node);
> > > + if (!drm_dep_queue_has_credits(q, job))
> > > + goto put_queue;
> > > +
> > > + job = drm_dep_queue_pop_job(q);
> > > + if (!job)
> > > + goto put_queue;
> > > +
> > > + drm_dep_queue_run_job(q, job);
> > > + drm_dep_queue_run_job_queue(q);
> > > +
> > > +put_queue:
> > > + mutex_unlock(&q->sched.lock);
> > > + drm_dep_queue_put(q);
> > > + dma_fence_end_signalling(cookie);
> > > +}
> > > +
> > > +/*
> > > + * drm_dep_queue_remove_job() - unlink a job from the pending list and reset TDR
> > > + * @q:   dep queue owning @job
> > > + * @job: job to remove
> > > + *
> > > + * Splices @job out of @q->job.pending, cancels any pending TDR delayed work,
> > > + * and arms the timeout for the new list head (if any).
> > > + *
> > > + * Context: Process context. Must hold @q->job.lock. DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_remove_job(struct drm_dep_queue *q,
> > > +     struct drm_dep_job *job)
> > > +{
> > > + lockdep_assert_held(&q->job.lock);
> > > +
> > > + list_del_init(&job->pending_link);
> > > + cancel_delayed_work(&q->sched.tdr);
> > > + drm_queue_start_timeout(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_get_finished_job() - dequeue a finished job
> > > + * @q: dep queue
> > > + *
> > > + * Under @q->job.lock checks the head of the pending list for a
> > > + * finished dep fence. If found, removes the job from the list,
> > > + * cancels the TDR, and re-arms it for the new head.
> > > + *
> > > + * Context: Process context (workqueue). DMA fence signaling path.
> > > + * Return: the finished &drm_dep_job, or NULL if none is ready.
> > > + */
> > > +static struct drm_dep_job *
> > > +drm_dep_queue_get_finished_job(struct drm_dep_queue *q)
> > > +{
> > > + struct drm_dep_job *job;
> > > +
> > > + guard(spinlock_irq)(&q->job.lock);
> > > +
> > > + job = list_first_entry_or_null(&q->job.pending, struct drm_dep_job,
> > > +       pending_link);
> > > + if (job && drm_dep_fence_is_finished(job->dfence))
> > > + drm_dep_queue_remove_job(q, job);
> > > + else
> > > + job = NULL;
> > > +
> > > + return job;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_put_job_work() - put-job worker
> > > + * @work: work item embedded in the dep queue
> > > + *
> > > + * Drains all finished jobs by calling drm_dep_job_put() in a loop,
> > > + * then kicks the run-job worker.
> > > + *
> > > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > > + *
> > > + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> > > + * because workqueue is shared with other items in the fence signaling path.
> > > + *
> > > + * Context: Process context (workqueue). DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_put_job_work(struct work_struct *work)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(work, struct drm_dep_queue, sched.put_job);
> > > + struct drm_dep_job *job;
> > > + bool cookie = dma_fence_begin_signalling();
> > > +
> > > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > > + if (!drm_dep_queue_get_unless_zero(q)) {
> > > + dma_fence_end_signalling(cookie);
> > > + return;
> > > + }
> > > +
> > > + while ((job = drm_dep_queue_get_finished_job(q)))
> > > + drm_dep_job_put(job);
> > > +
> > > + drm_dep_queue_run_job_queue(q);
> > > +
> > > + drm_dep_queue_put(q);
> > > + dma_fence_end_signalling(cookie);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_tdr_work() - TDR worker
> > > + * @work: work item embedded in the delayed TDR work
> > > + *
> > > + * Removes the head job from the pending list under @q->job.lock,
> > > + * asserts @q->ops->timedout_job is non-NULL, calls it outside the lock,
> > > + * requeues the job if %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB, drops the
> > > + * queue's job reference on %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED, and always
> > > + * restarts the TDR timer after handling the job (unless @q is stopping).
> > > + * Any other return value triggers a WARN.
> > > + *
> > > + * The TDR is never armed when @q->ops->timedout_job is NULL, so firing
> > > + * this worker without a timedout_job callback is a driver bug.
> > > + *
> > > + * Uses drm_dep_queue_get_unless_zero() at entry and bails immediately if the
> > > + * queue is in zombie state (refcount already zero, async teardown in flight).
> > > + *
> > > + * Wraps execution in dma_fence_begin_signalling() / dma_fence_end_signalling()
> > > + * because timedout_job() is expected to signal the guilty job's fence as part
> > > + * of reset.
> > > + *
> > > + * Context: Process context (workqueue). DMA fence signaling path.
> > > + */
> > > +static void drm_dep_queue_tdr_work(struct work_struct *work)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(work, struct drm_dep_queue, sched.tdr.work);
> > > + struct drm_dep_job *job;
> > > + bool cookie = dma_fence_begin_signalling();
> > > +
> > > + /* Bail if queue is zombie (refcount already zero, teardown in flight). */
> > > + if (!drm_dep_queue_get_unless_zero(q)) {
> > > + dma_fence_end_signalling(cookie);
> > > + return;
> > > + }
> > > +
> > > + scoped_guard(spinlock_irq, &q->job.lock) {
> > > + job = list_first_entry_or_null(&q->job.pending,
> > > +       struct drm_dep_job,
> > > +       pending_link);
> > > + if (job)
> > > + /*
> > > + * Remove from pending so it cannot be freed
> > > + * concurrently by drm_dep_queue_get_finished_job() or
> > > + * .drm_dep_job_done().
> > > + */
> > > + list_del_init(&job->pending_link);
> > > + }
> > > +
> > > + if (job) {
> > > + enum drm_dep_timedout_stat status;
> > > +
> > > + if (WARN_ON(!q->ops->timedout_job)) {
> > > + drm_dep_job_put(job);
> > > + goto out;
> > > + }
> > > +
> > > + status = q->ops->timedout_job(job);
> > > +
> > > + switch (status) {
> > > + case DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB:
> > > + scoped_guard(spinlock_irq, &q->job.lock)
> > > + list_add(&job->pending_link, &q->job.pending);
> > > + drm_dep_queue_put_job_queue(q);
> > > + break;
> > > + case DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED:
> > > + drm_dep_job_put(job);
> > > + break;
> > > + default:
> > > + WARN(1, "invalid drm_dep_timedout_stat\n");
> > > + break;
> > > + }
> > > + }
> > > +
> > > +out:
> > > + drm_queue_start_timeout_unlocked(q);
> > > + drm_dep_queue_put(q);
> > > + dma_fence_end_signalling(cookie);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_alloc_submit_wq() - allocate an ordered submit workqueue
> > > + * @name: name for the workqueue
> > > + * @flags: DRM_DEP_QUEUE_FLAGS_* flags
> > > + *
> > > + * Allocates an ordered workqueue for job submission with %WQ_MEM_RECLAIM and
> > > + * %WQ_MEM_WARN_ON_RECLAIM set, ensuring the workqueue is safe to use from
> > > + * memory reclaim context and properly annotated for lockdep taint tracking.
> > > + * Adds %WQ_HIGHPRI if %DRM_DEP_QUEUE_FLAGS_HIGHPRI is set. When
> > > + * CONFIG_LOCKDEP is enabled, uses a dedicated lockdep map for annotation.
> > > + *
> > > + * Context: Process context.
> > > + * Return: the new &workqueue_struct, or NULL on failure.
> > > + */
> > > +static struct workqueue_struct *
> > > +drm_dep_alloc_submit_wq(const char *name, enum drm_dep_queue_flags flags)
> > > +{
> > > + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> > > +
> > > + if (flags & DRM_DEP_QUEUE_FLAGS_HIGHPRI)
> > > + wq_flags |= WQ_HIGHPRI;
> > > +
> > > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > > + static struct lockdep_map map = {
> > > + .name = "drm_dep_submit_lockdep_map"
> > > + };
> > > + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> > > +#else
> > > + return alloc_ordered_workqueue(name, wq_flags);
> > > +#endif
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_alloc_timeout_wq() - allocate an ordered TDR workqueue
> > > + * @name: name for the workqueue
> > > + *
> > > + * Allocates an ordered workqueue for timeout detection and recovery with
> > > + * %WQ_MEM_RECLAIM and %WQ_MEM_WARN_ON_RECLAIM set, ensuring consistent taint
> > > + * annotation with the submit workqueue. When CONFIG_LOCKDEP is enabled, uses
> > > + * a dedicated lockdep map for annotation.
> > > + *
> > > + * Context: Process context.
> > > + * Return: the new &workqueue_struct, or NULL on failure.
> > > + */
> > > +static struct workqueue_struct *drm_dep_alloc_timeout_wq(const char *name)
> > > +{
> > > + unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_MEM_WARN_ON_RECLAIM;
> > > +
> > > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > > + static struct lockdep_map map = {
> > > + .name = "drm_dep_timeout_lockdep_map"
> > > + };
> > > + return alloc_ordered_workqueue_lockdep_map(name, wq_flags, &map);
> > > +#else
> > > + return alloc_ordered_workqueue(name, wq_flags);
> > > +#endif
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_init() - initialize a dep queue
> > > + * @q: dep queue to initialize
> > > + * @args: initialization arguments
> > > + *
> > > + * Initializes all fields of @q from @args. If @args->submit_wq is NULL an
> > > + * ordered workqueue is allocated and owned by the queue
> > > + * (%DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ). If @args->timeout_wq is NULL an
> > > + * ordered workqueue is allocated and owned by the queue
> > > + * (%DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ). On success the queue holds one kref
> > > + * reference and drm_dep_queue_put() must be called to drop this reference
> > > + * (i.e., drivers cannot directly free the queue).
> > > + *
> > > + * When CONFIG_LOCKDEP is enabled, @q->sched.lock is primed against the
> > > + * fs_reclaim pseudo-lock so that lockdep can detect any lock ordering
> > > + * inversion between @sched.lock and memory reclaim.
> > > + *
> > > + * Return: 0 on success, %-EINVAL when @args->credit_limit is zero, @args->ops
> > > + * is NULL, @args->drm is NULL, @args->ops->run_job is NULL, or when
> > > + * @args->submit_wq or @args->timeout_wq is non-NULL but was not allocated with
> > > + * %WQ_MEM_WARN_ON_RECLAIM; %-ENOMEM when workqueue allocation fails.
> > > + *
> > > + * Context: Process context. May allocate memory and create workqueues.
> > > + */
> > > +int drm_dep_queue_init(struct drm_dep_queue *q,
> > > +       const struct drm_dep_queue_init_args *args)
> > > +{
> > > + if (!args->credit_limit || !args->drm || !args->ops ||
> > > +    !args->ops->run_job)
> > > + return -EINVAL;
> > > +
> > > + if (args->submit_wq && !workqueue_is_reclaim_annotated(args->submit_wq))
> > > + return -EINVAL;
> > > +
> > > + if (args->timeout_wq &&
> > > +    !workqueue_is_reclaim_annotated(args->timeout_wq))
> > > + return -EINVAL;
> > > +
> > > + memset(q, 0, sizeof(*q));
> > > +
> > > + q->name = args->name;
> > > + q->drm = args->drm;
> > > + q->credit.limit = args->credit_limit;
> > > + q->job.timeout = args->timeout ? args->timeout : MAX_SCHEDULE_TIMEOUT;
> > > +
> > > + init_rcu_head(&q->rcu);
> > > + INIT_LIST_HEAD(&q->job.pending);
> > > + spin_lock_init(&q->job.lock);
> > > + spsc_queue_init(&q->job.queue);
> > > +
> > > + mutex_init(&q->sched.lock);
> > > + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> > > + fs_reclaim_acquire(GFP_KERNEL);
> > > + might_lock(&q->sched.lock);
> > > + fs_reclaim_release(GFP_KERNEL);
> > > + }
> > > +
> > > + if (args->submit_wq) {
> > > + q->sched.submit_wq = args->submit_wq;
> > > + } else {
> > > + q->sched.submit_wq = drm_dep_alloc_submit_wq(args->name ?: "drm_dep",
> > > +     args->flags);
> > > + if (!q->sched.submit_wq)
> > > + return -ENOMEM;
> > > +
> > > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ;
> > > + }
> > > +
> > > + if (args->timeout_wq) {
> > > + q->sched.timeout_wq = args->timeout_wq;
> > > + } else {
> > > + q->sched.timeout_wq = drm_dep_alloc_timeout_wq(args->name ?: "drm_dep");
> > > + if (!q->sched.timeout_wq)
> > > + goto err_submit_wq;
> > > +
> > > + q->sched.flags |= DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ;
> > > + }
> > > +
> > > + q->sched.flags |= args->flags &
> > > + ~(DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ |
> > > +  DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ);
> > > +
> > > + INIT_DELAYED_WORK(&q->sched.tdr, drm_dep_queue_tdr_work);
> > > + INIT_WORK(&q->sched.run_job, drm_dep_queue_run_job_work);
> > > + INIT_WORK(&q->sched.put_job, drm_dep_queue_put_job_work);
> > > +
> > > + q->fence.context = dma_fence_context_alloc(1);
> > > +
> > > + kref_init(&q->refcount);
> > > + q->ops = args->ops;
> > > + drm_dev_get(q->drm);
> > > +
> > > + return 0;
> > > +
> > > +err_submit_wq:
> > > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> > > + destroy_workqueue(q->sched.submit_wq);
> > > + mutex_destroy(&q->sched.lock);
> > > +
> > > + return -ENOMEM;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_init);
> > > +
> > > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > > +/**
> > > + * drm_dep_queue_push_job_begin() - mark the start of an arm/push critical section
> > > + * @q: dep queue the job belongs to
> > > + *
> > > + * Called at the start of drm_dep_job_arm() and warns if the push context is
> > > + * already owned by another task, which would indicate concurrent arm/push on
> > > + * the same queue.
> > > + *
> > > + * No-op when CONFIG_PROVE_LOCKING is disabled.
> > > + *
> > > + * Context: Process context. DMA fence signaling path.
> > > + */
> > > +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> > > +{
> > > + WARN_ON(q->job.push.owner);
> > > + q->job.push.owner = current;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_push_job_end() - mark the end of an arm/push critical section
> > > + * @q: dep queue the job belongs to
> > > + *
> > > + * Called at the end of drm_dep_job_push() and warns if the push context is not
> > > + * owned by the current task, which would indicate a mismatched begin/end pair
> > > + * or a push from the wrong thread.
> > > + *
> > > + * No-op when CONFIG_PROVE_LOCKING is disabled.
> > > + *
> > > + * Context: Process context. DMA fence signaling path.
> > > + */
> > > +void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> > > +{
> > > + WARN_ON(q->job.push.owner != current);
> > > + q->job.push.owner = NULL;
> > > +}
> > > +#endif
> > > +
> > > +/**
> > > + * drm_dep_queue_assert_teardown_invariants() - assert teardown invariants
> > > + * @q: dep queue being torn down
> > > + *
> > > + * Warns if the pending-job list, the SPSC submission queue, or the credit
> > > + * counter is non-zero when called, or if the queue still has a non-zero
> > > + * reference count.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_queue_assert_teardown_invariants(struct drm_dep_queue *q)
> > > +{
> > > + WARN_ON(!list_empty(&q->job.pending));
> > > + WARN_ON(spsc_queue_count(&q->job.queue));
> > > + WARN_ON(atomic_read(&q->credit.count));
> > > + WARN_ON(drm_dep_queue_refcount(q));
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_release() - final internal cleanup of a dep queue
> > > + * @q: dep queue to clean up
> > > + *
> > > + * Asserts teardown invariants and destroys internal resources allocated by
> > > + * drm_dep_queue_init() that cannot be torn down earlier in the teardown
> > > + * sequence.  Currently this destroys @q->sched.lock.
> > > + *
> > > + * Drivers that implement &drm_dep_queue_ops.release **must** call this
> > > + * function after removing @q from any internal bookkeeping (e.g. lookup
> > > + * tables or lists) but before freeing the memory that contains @q.  When
> > > + * &drm_dep_queue_ops.release is NULL, drm_dep follows the default teardown
> > > + * path and calls this function automatically.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_queue_release(struct drm_dep_queue *q)
> > > +{
> > > + drm_dep_queue_assert_teardown_invariants(q);
> > > + mutex_destroy(&q->sched.lock);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_release);
> > > +
> > > +/**
> > > + * drm_dep_queue_free() - final cleanup of a dep queue
> > > + * @q: dep queue to free
> > > + *
> > > + * Invokes &drm_dep_queue_ops.release if set, in which case the driver is
> > > + * responsible for calling drm_dep_queue_release() and freeing @q itself.
> > > + * If &drm_dep_queue_ops.release is NULL, calls drm_dep_queue_release()
> > > + * and then frees @q with kfree_rcu().
> > > + *
> > > + * In either case, releases the drm_dev_get() reference taken at init time
> > > + * via drm_dev_put(), allowing the owning &drm_device to be unloaded once
> > > + * all queues have been freed.
> > > + *
> > > + * Context: Process context (workqueue), reclaim safe.
> > > + */
> > > +static void drm_dep_queue_free(struct drm_dep_queue *q)
> > > +{
> > > + struct drm_device *drm = q->drm;
> > > +
> > > + if (q->ops->release) {
> > > + q->ops->release(q);
> > > + } else {
> > > + drm_dep_queue_release(q);
> > > + kfree_rcu(q, rcu);
> > > + }
> > > + drm_dev_put(drm);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_free_work() - deferred queue teardown worker
> > > + * @work: free_work item embedded in the dep queue
> > > + *
> > > + * Runs on dep_free_wq. Disables all work items synchronously
> > > + * (preventing re-queue and waiting for in-flight instances),
> > > + * destroys any owned workqueues, then calls drm_dep_queue_free().
> > > + * Running on dep_free_wq ensures destroy_workqueue() is never
> > > + * called from within one of the queue's own workers (deadlock)
> > > + * and disable_*_sync() cannot deadlock either.
> > > + *
> > > + * Context: Process context (workqueue), reclaim safe.
> > > + */
> > > +static void drm_dep_queue_free_work(struct work_struct *work)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(work, struct drm_dep_queue, free_work);
> > > +
> > > + drm_dep_queue_assert_teardown_invariants(q);
> > > +
> > > + disable_delayed_work_sync(&q->sched.tdr);
> > > + disable_work_sync(&q->sched.run_job);
> > > + disable_work_sync(&q->sched.put_job);
> > > +
> > > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ)
> > > + destroy_workqueue(q->sched.timeout_wq);
> > > +
> > > + if (q->sched.flags & DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ)
> > > + destroy_workqueue(q->sched.submit_wq);
> > > +
> > > + drm_dep_queue_free(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_fini() - tear down a dep queue
> > > + * @q: dep queue to tear down
> > > + *
> > > + * Asserts teardown invariants and initiates teardown of @q by queuing the
> > > + * deferred free work onto the module-private dep_free_wq workqueue.  The work
> > > + * item disables any pending TDR and run/put-job work synchronously, destroys
> > > + * any workqueues that were allocated by drm_dep_queue_init(), and then releases
> > > + * the queue memory.
> > > + *
> > > + * Running teardown from dep_free_wq ensures that destroy_workqueue() is never
> > > + * called from within one of the queue's own workers (e.g. via
> > > + * drm_dep_queue_put()), which would deadlock.
> > > + *
> > > + * Drivers can wait for all outstanding deferred work to complete by waiting
> > > + * for the last drm_dev_put() reference on their &drm_device, which is
> > > + * released as the final step of each queue's teardown.
> > > + *
> > > + * Drivers that implement &drm_dep_queue_ops.fini **must** call this
> > > + * function after removing @q from any device bookkeeping but before freeing the
> > > + * memory that contains @q.  When &drm_dep_queue_ops.fini is NULL, drm_dep
> > > + * follows the default teardown path and calls this function automatically.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_queue_fini(struct drm_dep_queue *q)
> > > +{
> > > + drm_dep_queue_assert_teardown_invariants(q);
> > > +
> > > + INIT_WORK(&q->free_work, drm_dep_queue_free_work);
> > > + queue_work(dep_free_wq, &q->free_work);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_fini);
> > > +
> > > +/**
> > > + * drm_dep_queue_get() - acquire a reference to a dep queue
> > > + * @q: dep queue to acquire a reference on, or NULL
> > > + *
> > > + * Return: @q with an additional reference held, or NULL if @q is NULL.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q)
> > > +{
> > > + if (q)
> > > + kref_get(&q->refcount);
> > > + return q;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_get);
> > > +
> > > +/**
> > > + * __drm_dep_queue_release() - kref release callback for a dep queue
> > > + * @kref: kref embedded in the dep queue
> > > + *
> > > + * Calls &drm_dep_queue_ops.fini if set, otherwise calls
> > > + * drm_dep_queue_fini() to initiate deferred teardown.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void __drm_dep_queue_release(struct kref *kref)
> > > +{
> > > + struct drm_dep_queue *q =
> > > + container_of(kref, struct drm_dep_queue, refcount);
> > > +
> > > + if (q->ops->fini)
> > > + q->ops->fini(q);
> > > + else
> > > + drm_dep_queue_fini(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_put() - release a reference to a dep queue
> > > + * @q: dep queue to release a reference on, or NULL
> > > + *
> > > + * When the last reference is dropped, calls &drm_dep_queue_ops.fini if set,
> > > + * otherwise calls drm_dep_queue_fini(). Final memory release is handled by
> > > + * &drm_dep_queue_ops.release (which must call drm_dep_queue_release()) if set,
> > > + * or drm_dep_queue_release() followed by kfree_rcu() otherwise.
> > > + * Does nothing if @q is NULL.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_queue_put(struct drm_dep_queue *q)
> > > +{
> > > + if (q)
> > > + kref_put(&q->refcount, __drm_dep_queue_release);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_put);
> > > +
> > > +/**
> > > + * drm_dep_queue_stop() - stop a dep queue from processing new jobs
> > > + * @q: dep queue to stop
> > > + *
> > > + * Sets %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> > > + * and @q->job.lock (spinlock_irq), making the flag safe to test from the
> > > + * finished-fence signaling context. Then cancels any in-flight run_job and
> > > + * put_job work items. Once stopped, the bypass path and the submit workqueue
> > > + * will not dispatch further jobs, nor will any jobs be removed from the
> > > + * pending list. Call drm_dep_queue_start() to resume processing.
> > > + *
> > > + * Context: Process context. Waits for in-flight workers to complete.
> > > + */
> > > +void drm_dep_queue_stop(struct drm_dep_queue *q)
> > > +{
> > > + scoped_guard(mutex, &q->sched.lock) {
> > > + scoped_guard(spinlock_irq, &q->job.lock)
> > > + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> > > + }
> > > + cancel_work_sync(&q->sched.run_job);
> > > + cancel_work_sync(&q->sched.put_job);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_stop);
> > > +
> > > +/**
> > > + * drm_dep_queue_start() - resume a stopped dep queue
> > > + * @q: dep queue to start
> > > + *
> > > + * Clears %DRM_DEP_QUEUE_FLAGS_STOPPED on @q under both @q->sched.lock (mutex)
> > > + * and @q->job.lock (spinlock_irq), making the flag safe to test from IRQ
> > > + * context. Then re-queues the run_job and put_job work items so that any jobs
> > > + * pending since the queue was stopped are processed. Must only be called after
> > > + * drm_dep_queue_stop().
> > > + *
> > > + * Context: Process context.
> > > + */
> > > +void drm_dep_queue_start(struct drm_dep_queue *q)
> > > +{
> > > + scoped_guard(mutex, &q->sched.lock) {
> > > + scoped_guard(spinlock_irq, &q->job.lock)
> > > + drm_dep_queue_flags_clear(q, DRM_DEP_QUEUE_FLAGS_STOPPED);
> > > + }
> > > + drm_dep_queue_run_job_queue(q);
> > > + drm_dep_queue_put_job_queue(q);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_start);
> > > +
> > > +/**
> > > + * drm_dep_queue_trigger_timeout() - trigger the TDR immediately for
> > > + *   all pending jobs
> > > + * @q: dep queue to trigger timeout on
> > > + *
> > > + * Sets @q->job.timeout to 1 and arms the TDR delayed work with a one-jiffy
> > > + * delay, causing it to fire almost immediately without hot-spinning at zero
> > > + * delay. This is used to force-expire any pending jobs on the queue, for
> > > + * example when the device is being torn down or has encountered an
> > > + * unrecoverable error.
> > > + *
> > > + * When this function is used, it is suggested that the first timedout_job
> > > + * call kick the queue off the hardware and signal all pending job fences,
> > > + * and that subsequent calls continue to signal all pending job fences.
> > > + *
> > > + * Has no effect if the pending list is empty.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q)
> > > +{
> > > + guard(spinlock_irqsave)(&q->job.lock);
> > > + q->job.timeout = 1;
> > > + drm_queue_start_timeout(q);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_trigger_timeout);
> > > +
> > > +/**
> > > + * drm_dep_queue_cancel_tdr_sync() - cancel any pending TDR and wait
> > > + *   for it to finish
> > > + * @q: dep queue whose TDR to cancel
> > > + *
> > > + * Cancels the TDR delayed work item if it has not yet started, and waits for
> > > + * it to complete if it is already running.  After this call returns, the TDR
> > > + * worker is guaranteed not to be executing and will not fire again until
> > > + * explicitly rearmed (e.g. via drm_dep_queue_resume_timeout() or by a new
> > > + * job being submitted).
> > > + *
> > > + * Useful during error recovery or queue teardown when the caller needs to
> > > + * know that no timeout handling races with its own reset logic.
> > > + *
> > > + * Context: Process context. May sleep waiting for the TDR worker to finish.
> > > + */
> > > +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q)
> > > +{
> > > + cancel_delayed_work_sync(&q->sched.tdr);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_cancel_tdr_sync);
> > > +
> > > +/**
> > > + * drm_dep_queue_resume_timeout() - restart the TDR timer with the
> > > + *   configured timeout
> > > + * @q: dep queue to resume the timeout for
> > > + *
> > > + * Restarts the TDR delayed work using @q->job.timeout. Called after device
> > > + * recovery to give pending jobs a fresh full timeout window. Has no effect
> > > + * if the pending list is empty.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q)
> > > +{
> > > + drm_queue_start_timeout_unlocked(q);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_resume_timeout);
> > > +
> > > +/**
> > > + * drm_dep_queue_is_stopped() - check whether a dep queue is stopped
> > > + * @q: dep queue to check
> > > + *
> > > + * Return: true if %DRM_DEP_QUEUE_FLAGS_STOPPED is set on @q, false otherwise.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q)
> > > +{
> > > + return !!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_STOPPED);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_is_stopped);
> > > +
> > > +/**
> > > + * drm_dep_queue_kill() - kill a dep queue and flush all pending jobs
> > > + * @q: dep queue to kill
> > > + *
> > > + * Sets %DRM_DEP_QUEUE_FLAGS_KILLED on @q under @q->sched.lock.  If a
> > > + * dependency fence is currently being waited on, its callback is removed and
> > > + * the run-job worker is kicked immediately so that the blocked job drains
> > > + * without waiting.
> > > + *
> > > + * Once killed, drm_dep_queue_job_dependency() returns NULL for all jobs,
> > > + * bypassing dependency waits so that every queued job drains through
> > > + * &drm_dep_queue_ops.run_job without blocking.
> > > + *
> > > + * The &drm_dep_queue_ops.run_job callback is guaranteed to be called for every
> > > + * job that was pushed before or after drm_dep_queue_kill(), even during queue
> > > + * teardown.  Drivers should use this guarantee to perform any necessary
> > > + * bookkeeping cleanup without executing the actual backend operation when the
> > > + * queue is killed.
> > > + *
> > > + * Unlike drm_dep_queue_stop(), killing is one-way: there is no corresponding
> > > + * start function.
> > > + *
> > > + * **Driver safety requirement**
> > > + *
> > > + * drm_dep_queue_kill() must only be called once the driver can guarantee that
> > > + * no job in the queue will touch memory associated with any of its fences
> > > + * (i.e., the queue has been removed from the device and will never be put back
> > > + * on).
> > > + *
> > > + * Context: Process context.
> > > + */
> > > +void drm_dep_queue_kill(struct drm_dep_queue *q)
> > > +{
> > > + scoped_guard(mutex, &q->sched.lock) {
> > > + struct dma_fence *fence;
> > > +
> > > + drm_dep_queue_flags_set(q, DRM_DEP_QUEUE_FLAGS_KILLED);
> > > +
> > > + /*
> > > + * Holding &q->sched.lock guarantees that the run-job work item
> > > + * cannot drop its reference to q->dep.fence concurrently, so
> > > + * reading q->dep.fence here is safe.
> > > + */
> > > + fence = READ_ONCE(q->dep.fence);
> > > + if (fence && dma_fence_remove_callback(fence, &q->dep.cb))
> > > + drm_dep_queue_remove_dependency(q, fence);
> > > + }
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_kill);
> > > +
> > > +/**
> > > + * drm_dep_queue_submit_wq() - retrieve the submit workqueue of a dep queue
> > > + * @q: dep queue whose workqueue to retrieve
> > > + *
> > > + * Drivers may use this to queue their own work items alongside the queue's
> > > + * internal run-job and put-job workers — for example to process incoming
> > > + * messages in the same serialisation domain.
> > > + *
> > > + * Prefer drm_dep_queue_work_enqueue() when the only need is to enqueue a
> > > + * work item, as it additionally checks the stopped state.  Use this accessor
> > > + * when the workqueue itself is required (e.g. for alloc_ordered_workqueue
> > > + * replacement or drain_workqueue calls).
> > > + *
> > > + * Context: Any context.
> > > + * Return: the &workqueue_struct used by @q for job submission.
> > > + */
> > > +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q)
> > > +{
> > > + return q->sched.submit_wq;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_submit_wq);
> > > +
> > > +/**
> > > + * drm_dep_queue_timeout_wq() - retrieve the timeout workqueue of a dep queue
> > > + * @q: dep queue whose workqueue to retrieve
> > > + *
> > > + * Returns the workqueue used by @q to run TDR (timeout detection and recovery)
> > > + * work.  Drivers may use this to queue their own timeout-domain work items, or
> > > + * to call drain_workqueue() during teardown to ensure that all pending
> > > + * timeout callbacks have completed before proceeding.
> > > + *
> > > + * Context: Any context.
> > > + * Return: the &workqueue_struct used by @q for TDR work.
> > > + */
> > > +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q)
> > > +{
> > > + return q->sched.timeout_wq;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_timeout_wq);
> > > +
> > > +/**
> > > + * drm_dep_queue_work_enqueue() - queue work on the dep queue's submit workqueue
> > > + * @q: dep queue to enqueue work on
> > > + * @work: work item to enqueue
> > > + *
> > > + * Queues @work on @q->sched.submit_wq if the queue is not stopped.  This
> > > + * allows drivers to schedule custom work items that run serialised with the
> > > + * queue's own run-job and put-job workers.
> > > + *
> > > + * Return: true if the work was queued, false if the queue is stopped or the
> > > + * work item was already pending.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> > > + struct work_struct *work)
> > > +{
> > > + if (drm_dep_queue_is_stopped(q))
> > > + return false;
> > > +
> > > + return queue_work(q->sched.submit_wq, work);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_queue_work_enqueue);
> > > +
> > > +/**
> > > + * drm_dep_queue_can_job_bypass() - test whether a job can skip the SPSC queue
> > > + * @q: dep queue
> > > + * @job: job to test
> > > + *
> > > + * A job may bypass the submit workqueue and run inline on the calling thread
> > > + * if all of the following hold:
> > > + *
> > > + *  - %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set on the queue
> > > + *  - the queue is not stopped
> > > + *  - the SPSC submission queue is empty (no other jobs waiting)
> > > + *  - the queue has enough credits for @job
> > > + *  - @job has no unresolved dependency fences
> > > + *
> > > + * Must be called under @q->sched.lock.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock (a mutex).
> > > + * Return: true if the job may be run inline, false otherwise.
> > > + */
> > > +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> > > +  struct drm_dep_job *job)
> > > +{
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + return q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED &&
> > > + !drm_dep_queue_is_stopped(q) &&
> > > + !spsc_queue_count(&q->job.queue) &&
> > > + drm_dep_queue_has_credits(q, job) &&
> > > + xa_empty(&job->dependencies);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_job_done() - mark a job as complete
> > > + * @job: the job that finished
> > > + * @result: error code to propagate, or 0 for success
> > > + *
> > > + * Subtracts @job->credits from the queue credit counter, then signals the
> > > + * job's dep fence with @result.
> > > + *
> > > + * When %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set (IRQ-safe path), a
> > > + * temporary extra reference is taken on @job before signalling the fence.
> > > + * This prevents a concurrent put-job worker — which may be woken by timeouts or
> > > + * queue starting — from freeing the job while this function still holds a
> > > + * pointer to it.  The extra reference is released at the end of the function.
> > > + *
> > > + * After signalling, the IRQ-safe path removes the job from the pending list
> > > + * under @q->job.lock, provided the queue is not stopped.  Removal is skipped
> > > + * when the queue is stopped so that drm_dep_queue_for_each_pending_job() can
> > > + * iterate the list without racing with the completion path.  On successful
> > > + * removal, kicks the run-job worker so the next queued job can be dispatched
> > > + * immediately, then drops the job reference.  If the job was already removed
> > > + * by TDR, or removal was skipped because the queue is stopped, kicks the
> > > + * put-job worker instead to allow the deferred put to complete.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_job_done(struct drm_dep_job *job, int result)
> > > +{
> > > + struct drm_dep_queue *q = job->q;
> > > + bool irq_safe = drm_dep_queue_is_job_put_irq_safe(q), removed = false;
> > > +
> > > + /*
> > > + * Local ref to ensure the put worker—which may be woken by external
> > > + * forces (TDR, driver-side queue starting)—doesn't free the job behind
> > > + * this function's back after drm_dep_fence_done() while it is still on
> > > + * the pending list.
> > > + */
> > > + if (irq_safe)
> > > + drm_dep_job_get(job);
> > > +
> > > + atomic_sub(job->credits, &q->credit.count);
> > > + drm_dep_fence_done(job->dfence, result);
> > > +
> > > + /* Only safe to touch job after fence signal if we have a local ref. */
> > > +
> > > + if (irq_safe) {
> > > + scoped_guard(spinlock_irqsave, &q->job.lock) {
> > > + removed = !list_empty(&job->pending_link) &&
> > > + !drm_dep_queue_is_stopped(q);
> > > +
> > > + /* Guard against TDR operating on job */
> > > + if (removed)
> > > + drm_dep_queue_remove_job(q, job);
> > > + }
> > > + }
> > > +
> > > + if (removed) {
> > > + drm_dep_queue_run_job_queue(q);
> > > + drm_dep_job_put(job);
> > > + } else {
> > > + drm_dep_queue_put_job_queue(q);
> > > + }
> > > +
> > > + if (irq_safe)
> > > + drm_dep_job_put(job);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_job_done_cb() - dma_fence callback to complete a job
> > > + * @f: the hardware fence that signalled
> > > + * @cb: fence callback embedded in the dep job
> > > + *
> > > + * Extracts the job from @cb and calls drm_dep_job_done() with
> > > + * @f->error as the result.
> > > + *
> > > + * Context: Any context, but with IRQs disabled. May not sleep.
> > > + */
> > > +static void drm_dep_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> > > +{
> > > + struct drm_dep_job *job = container_of(cb, struct drm_dep_job, cb);
> > > +
> > > + drm_dep_job_done(job, f->error);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_run_job() - submit a job to hardware and set up
> > > + *   completion tracking
> > > + * @q: dep queue
> > > + * @job: job to run
> > > + *
> > > + * Accounts @job->credits against the queue, appends the job to the pending
> > > + * list, then calls @q->ops->run_job(). The TDR timer is started only when
> > > + * @job is the first entry on the pending list; subsequent jobs added while
> > > + * a TDR is already in flight do not reset the timer (which would otherwise
> > > + * extend the deadline for the already-running head job). Stores the returned
> > > + * hardware fence as the parent of the job's dep fence, then installs
> > > + * drm_dep_job_done_cb() on it. If the hardware fence is already signalled
> > > + * (%-ENOENT from dma_fence_add_callback()) or run_job() returns NULL/error,
> > > + * the job is completed immediately. Must be called under @q->sched.lock.
> > > + *
> > > + * Context: Process context. Must hold @q->sched.lock (a mutex). DMA fence
> > > + * signaling path.
> > > + */
> > > +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> > > +{
> > > + struct dma_fence *fence;
> > > + int r;
> > > +
> > > + lockdep_assert_held(&q->sched.lock);
> > > +
> > > + drm_dep_job_get(job);
> > > + atomic_add(job->credits, &q->credit.count);
> > > +
> > > + scoped_guard(spinlock_irq, &q->job.lock) {
> > > + bool first = list_empty(&q->job.pending);
> > > +
> > > + list_add_tail(&job->pending_link, &q->job.pending);
> > > + if (first)
> > > + drm_queue_start_timeout(q);
> > > + }
> > > +
> > > + fence = q->ops->run_job(job);
> > > + drm_dep_fence_set_parent(job->dfence, fence);
> > > +
> > > + if (!IS_ERR_OR_NULL(fence)) {
> > > + r = dma_fence_add_callback(fence, &job->cb,
> > > +   drm_dep_job_done_cb);
> > > + if (r == -ENOENT)
> > > + drm_dep_job_done(job, fence->error);
> > > + else if (r)
> > > + drm_err(q->drm, "fence add callback failed (%d)\n", r);
> > > + dma_fence_put(fence);
> > > + } else {
> > > + drm_dep_job_done(job, IS_ERR(fence) ? PTR_ERR(fence) : 0);
> > > + }
> > > +
> > > + /*
> > > + * Drop all input dependency fences now, in process context, before the
> > > + * final job put. Once the job is on the pending list its last reference
> > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > + * xa_destroy() would be unsafe.
> > > + */
> > 
> > I assume that “pending” is the list of jobs that have been handed to the driver
> > via ops->run_job()?
> > 
> > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > other than scheduling the queue worker?
> > 

Yes, this code is required to support dropping job refs directly in the
dma-fence callback (an opt-in feature). Again, this seems like a
significant win in terms of CPU cycles, although I haven’t collected
data yet.

I could drop this, but conceptually it still feels like the right
approach.

> > > + drm_dep_job_drop_dependencies(job);
> > > + drm_dep_job_put(job);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_queue_push_job() - enqueue a job on the SPSC submission queue
> > > + * @q: dep queue
> > > + * @job: job to push
> > > + *
> > > + * Pushes @job onto the SPSC queue. If the queue was previously empty
> > > + * (i.e. this is the first pending job), kicks the run_job worker so it
> > > + * processes the job promptly without waiting for the next wakeup.
> > > + * May be called with or without @q->sched.lock held.
> > > + *
> > > + * Context: Any context. DMA fence signaling path.
> > > + */
> > > +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job)
> > > +{
> > > + /*
> > > + * spsc_queue_push() returns true if the queue was previously empty,
> > > + * i.e. this is the first pending job. Kick the run_job worker so it
> > > + * picks it up without waiting for the next wakeup.
> > > + */
> > > + if (spsc_queue_push(&q->job.queue, &job->queue_node))
> > > + drm_dep_queue_run_job_queue(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_init() - module initialiser
> > > + *
> > > + * Allocates the module-private dep_free_wq unbound workqueue used for
> > > + * deferred queue teardown.
> > > + *
> > > + * Return: 0 on success, %-ENOMEM if workqueue allocation fails.
> > > + */
> > > +static int __init drm_dep_init(void)
> > > +{
> > > + dep_free_wq = alloc_workqueue("drm_dep_free", WQ_UNBOUND, 0);
> > > + if (!dep_free_wq)
> > > + return -ENOMEM;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_exit() - module exit
> > > + *
> > > + * Destroys the module-private dep_free_wq workqueue.
> > > + */
> > > +static void __exit drm_dep_exit(void)
> > > +{
> > > + destroy_workqueue(dep_free_wq);
> > > + dep_free_wq = NULL;
> > > +}
> > > +
> > > +module_init(drm_dep_init);
> > > +module_exit(drm_dep_exit);
> > > +
> > > +MODULE_DESCRIPTION("DRM dependency queue");
> > > +MODULE_LICENSE("Dual MIT/GPL");
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_queue.h b/drivers/gpu/drm/dep/drm_dep_queue.h
> > > new file mode 100644
> > > index 000000000000..e5c217a3fab5
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_queue.h
> > > @@ -0,0 +1,31 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _DRM_DEP_QUEUE_H_
> > > +#define _DRM_DEP_QUEUE_H_
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +struct drm_dep_job;
> > > +struct drm_dep_queue;
> > > +
> > > +bool drm_dep_queue_can_job_bypass(struct drm_dep_queue *q,
> > > +  struct drm_dep_job *job);
> > > +void drm_dep_queue_run_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> > > +void drm_dep_queue_push_job(struct drm_dep_queue *q, struct drm_dep_job *job);
> > > +
> > > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > > +void drm_dep_queue_push_job_begin(struct drm_dep_queue *q);
> > > +void drm_dep_queue_push_job_end(struct drm_dep_queue *q);
> > > +#else
> > > +static inline void drm_dep_queue_push_job_begin(struct drm_dep_queue *q)
> > > +{
> > > +}
> > > +static inline void drm_dep_queue_push_job_end(struct drm_dep_queue *q)
> > > +{
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _DRM_DEP_QUEUE_H_ */
> > > diff --git a/include/drm/drm_dep.h b/include/drm/drm_dep.h
> > > new file mode 100644
> > > index 000000000000..615926584506
> > > --- /dev/null
> > > +++ b/include/drm/drm_dep.h
> > > @@ -0,0 +1,597 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright 2015 Advanced Micro Devices, Inc.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > + * OTHER DEALINGS IN THE SOFTWARE.
> > > + *
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _DRM_DEP_H_
> > > +#define _DRM_DEP_H_
> > > +
> > > +#include <drm/spsc_queue.h>
> > > +#include <linux/dma-fence.h>
> > > +#include <linux/xarray.h>
> > > +#include <linux/workqueue.h>
> > > +
> > > +enum dma_resv_usage;
> > > +struct dma_resv;
> > > +struct drm_dep_fence;
> > > +struct drm_dep_job;
> > > +struct drm_dep_queue;
> > > +struct drm_file;
> > > +struct drm_gem_object;
> > > +
> > > +/**
> > > + * enum drm_dep_timedout_stat - return value of &drm_dep_queue_ops.timedout_job
> > > + * @DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED: driver signaled the job's finished
> > > + *   fence during reset; drm_dep may safely drop its reference to the job.
> > > + * @DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB: timeout was a false alarm; reinsert the
> > > + *   job at the head of the pending list so it can complete normally.
> > > + */
> > > +enum drm_dep_timedout_stat {
> > > + DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED,
> > > + DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB,
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_queue_ops - driver callbacks for a dep queue
> > > + */
> > > +struct drm_dep_queue_ops {
> > > + /**
> > > + * @run_job: submit the job to hardware. Returns the hardware completion
> > > + * fence (with a reference held for the scheduler), or NULL/ERR_PTR on
> > > + * synchronous completion or error.
> > > + */
> > > + struct dma_fence *(*run_job)(struct drm_dep_job *job);
> > > +
> > > + /**
> > > + * @timedout_job: called when the TDR fires for the head job. Must stop
> > > + * the hardware, then return %DRM_DEP_TIMEDOUT_STAT_JOB_SIGNALED if the
> > > + * job's fence was signalled during reset, or
> > > + * %DRM_DEP_TIMEDOUT_STAT_REQUEUE_JOB if the timeout was spurious or
> > > + * signalling was otherwise delayed, and the job should be re-inserted
> > > + * at the head of the pending list. Any other value triggers a WARN.
> > > + */
> > > + enum drm_dep_timedout_stat (*timedout_job)(struct drm_dep_job *job);
> > > +
> > > + /**
> > > + * @release: called when the last kref on the queue is dropped and
> > > + * drm_dep_queue_fini() has completed.  The driver is responsible for
> > > + * removing @q from any internal bookkeeping, calling
> > > + * drm_dep_queue_release(), and then freeing the memory containing @q
> > > + * (e.g. via kfree_rcu() using @q->rcu).  If NULL, drm_dep calls
> > > + * drm_dep_queue_release() and frees @q automatically via kfree_rcu().
> > > + * Use this when the queue is embedded in a larger structure.
> > > + */
> > > + void (*release)(struct drm_dep_queue *q);
> > > +
> > > + /**
> > > + * @fini: if set, called instead of drm_dep_queue_fini() when the last
> > > + * kref is dropped. The driver is responsible for calling
> > > + * drm_dep_queue_fini() itself after it is done with the queue. Use this
> > > + * when additional teardown logic must run before fini (e.g., cleanup
> > > + * firmware resources associated with the queue).
> > > + */
> > > + void (*fini)(struct drm_dep_queue *q);
> > > +};
> > > +
> > > +/**
> > > + * enum drm_dep_queue_flags - flags for &drm_dep_queue and
> > > + *   &drm_dep_queue_init_args
> > > + *
> > > + * Flags are divided into three categories:
> > > + *
> > > + * - **Private static**: set internally at init time and never changed.
> > > + *   Drivers must not read or write these.
> > > + *   %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ,
> > > + *   %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ.
> > > + *
> > > + * - **Public dynamic**: toggled at runtime by drivers via accessors.
> > > + *   Any modification must be performed under &drm_dep_queue.sched.lock.
> > 
> > Can’t enforce that in C.
> > 

I agree. There are no true “private” fields in C if the object lives in
a shared header file. I’d love to make drm_dep_queue and drm_dep_job
private (defined in a C file), but then you can’t embed these objects
inside driver objects—which is the primary use case. The best we can do
is simply refuse to accept drivers that touch fields they shouldn’t, so
the code can remain maintainable.

I did, however, make drm_dep_fence a private object—notice it’s defined
in drm_dep_fence.c, so no one can abuse it.

> > > + *   Accessor functions provide unstable reads.
> > > + *   %DRM_DEP_QUEUE_FLAGS_STOPPED,
> > > + *   %DRM_DEP_QUEUE_FLAGS_KILLED.
> > 
> > > + *
> > > + * - **Public static**: supplied by the driver in
> > > + *   &drm_dep_queue_init_args.flags at queue creation time and not modified
> > > + *   thereafter.
> > 
> > Same here.
> > 
> > > + *   %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED,
> > > + *   %DRM_DEP_QUEUE_FLAGS_HIGHPRI,
> > > + *   %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE.
> > 
> > > + *
> > > + * @DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ: (private, static) submit workqueue was
> > > + *   allocated by drm_dep_queue_init() and will be destroyed by
> > > + *   drm_dep_queue_fini().
> > > + * @DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ: (private, static) timeout workqueue
> > > + *   was allocated by drm_dep_queue_init() and will be destroyed by
> > > + *   drm_dep_queue_fini().
> > > + * @DRM_DEP_QUEUE_FLAGS_STOPPED: (public, dynamic) the queue is stopped and
> > > + *   will not dispatch new jobs or remove jobs from the pending list (which
> > > + *   would drop the drm_dep-owned reference). Set by drm_dep_queue_stop(),
> > > + *   cleared by drm_dep_queue_start().
> > > + * @DRM_DEP_QUEUE_FLAGS_KILLED: (public, dynamic) the queue has been killed
> > > + *   via drm_dep_queue_kill(). Any active dependency wait is cancelled
> > > + *   immediately.  Jobs continue to flow through run_job for bookkeeping
> > > + *   cleanup, but dependency waiting is skipped so that queued work drains
> > > + *   as quickly as possible.
> > > + * @DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED: (public, static) the queue supports
> > > + *   the bypass path where eligible jobs skip the SPSC queue and run inline.
> > > + * @DRM_DEP_QUEUE_FLAGS_HIGHPRI: (public, static) the submit workqueue owned
> > > + *   by the queue is created with %WQ_HIGHPRI, causing run-job and put-job
> > > + *   workers to execute at elevated priority. Only privileged clients (e.g.
> > > + *   drivers managing time-critical or real-time GPU contexts) should request
> > > + *   this flag; granting it to unprivileged userspace would allow priority
> > > + *   inversion attacks.
> > > + *   @drm_dep_queue_init_args.submit_wq is provided.
> > > + * @DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE: (public, static) when set,
> > > + *   drm_dep_job_done() may be called from hardirq context (e.g. from a
> > > + *   hardware-signalled dma_fence callback). drm_dep_job_done() will directly
> > > + *   dequeue the job and call drm_dep_job_put() without deferring to a
> > > + *   workqueue. The driver's &drm_dep_job_ops.release callback must therefore
> > > + *   be safe to invoke from IRQ context.
> > > + */
> > > +enum drm_dep_queue_flags {
> > > + DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ = BIT(0),
> > > + DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ = BIT(1),
> > > + DRM_DEP_QUEUE_FLAGS_STOPPED = BIT(2),
> > > + DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED = BIT(3),
> > > + DRM_DEP_QUEUE_FLAGS_HIGHPRI = BIT(4),
> > > + DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE = BIT(5),
> > > + DRM_DEP_QUEUE_FLAGS_KILLED = BIT(6),
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_queue - a dependency-tracked GPU submission queue
> > > + *
> > > + * Combines the role of &drm_gpu_scheduler and &drm_sched_entity into a single
> > > + * object.  Each queue owns a submit workqueue (or borrows one), a timeout
> > > + * workqueue, an SPSC submission queue, and a pending-job list used for TDR.
> > > + *
> > > + * Initialise with drm_dep_queue_init(), tear down with drm_dep_queue_fini().
> > > + * Reference counted via drm_dep_queue_get() / drm_dep_queue_put().
> > > + *
> > > + * All fields are **opaque to drivers**.  Do not read or write any field
> > 
> > Can’t enforce this in C.
> > 

Just answered above; agree.

> > > + * directly; use the provided helper functions instead.  The sole exception
> > > + * is @rcu, which drivers may pass to kfree_rcu() when the queue is embedded
> > > + * inside a larger driver-managed structure and the &drm_dep_queue_ops.release
> > > + * vfunc performs an RCU-deferred free.
> > 
> > > + */
> > > +struct drm_dep_queue {
> > > + /** @ops: driver callbacks, set at init time. */
> > > + const struct drm_dep_queue_ops *ops;
> > > + /** @name: human-readable name used for workqueue and fence naming. */
> > > + const char *name;
> > > + /** @drm: owning DRM device; a drm_dev_get() reference is held for the
> > > + *  lifetime of the queue to prevent module unload while queues are live.
> > > + */
> > > + struct drm_device *drm;
> > > + /** @refcount: reference count; use drm_dep_queue_get/put(). */
> > > + struct kref refcount;
> > > + /**
> > > + * @free_work: deferred teardown work queued unconditionally by
> > > + * drm_dep_queue_fini() onto the module-private dep_free_wq.  The work
> > > + * item disables pending workers synchronously and destroys any owned
> > > + * workqueues before releasing the queue memory and dropping the
> > > + * drm_dev_get() reference.  Running on dep_free_wq ensures
> > > + * destroy_workqueue() is never called from within one of the queue's
> > > + * own workers.
> > > + */
> > > + struct work_struct free_work;
> > > + /**
> > > + * @rcu: RCU head for deferred freeing.
> > > + *
> > > + * This is the **only** field drivers may access directly.  When the
> > 
> > We can enforce this in Rust at compile time.
> > 

That is nice.

> > > + * queue is embedded in a larger structure, implement
> > > + * &drm_dep_queue_ops.release, call drm_dep_queue_release() to destroy
> > > + * internal resources, then pass this field to kfree_rcu() so that any
> > > + * in-flight RCU readers referencing the queue's dma_fence timeline name
> > > + * complete before the memory is returned.  All other fields must be
> > > + * accessed through the provided helpers.
> > > + */
> > > + struct rcu_head rcu;
> > > +
> > > + /** @sched: scheduling and workqueue state. */
> > > + struct {
> > > + /** @sched.submit_wq: ordered workqueue for run/put-job work. */
> > > + struct workqueue_struct *submit_wq;
> > > + /** @sched.timeout_wq: workqueue for the TDR delayed work. */
> > > + struct workqueue_struct *timeout_wq;
> > > + /**
> > > + * @sched.run_job: work item that dispatches the next queued
> > > + * job.
> > > + */
> > > + struct work_struct run_job;
> > > + /** @sched.put_job: work item that frees finished jobs. */
> > > + struct work_struct put_job;
> > > + /** @sched.tdr: delayed work item for timeout/reset (TDR). */
> > > + struct delayed_work tdr;
> > > + /**
> > > + * @sched.lock: mutex serialising job dispatch, bypass
> > > + * decisions, stop/start, and flag updates.
> > > + */
> > > + struct mutex lock;
> > > + /**
> > > + * @sched.flags: bitmask of &enum drm_dep_queue_flags.
> > > + * Any modification after drm_dep_queue_init() must be
> > > + * performed under @sched.lock.
> > > + */
> > > + enum drm_dep_queue_flags flags;
> > > + } sched;
> > > +
> > > + /** @job: pending-job tracking state. */
> > > + struct {
> > > + /**
> > > + * @job.pending: list of jobs that have been dispatched to
> > > + * hardware and not yet freed. Protected by @job.lock.
> > > + */
> > > + struct list_head pending;
> > > + /**
> > > + * @job.queue: SPSC queue of jobs waiting to be dispatched.
> > > + * Producers push via drm_dep_queue_push_job(); the run_job
> > > + * work item pops from the consumer side.
> > > + */
> > > + struct spsc_queue queue;
> > > + /**
> > > + * @job.lock: spinlock protecting @job.pending, TDR start, and
> > > + * the %DRM_DEP_QUEUE_FLAGS_STOPPED flag. Always acquired with
> > > + * irqsave (spin_lock_irqsave / spin_unlock_irqrestore) to
> > > + * support %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE queues where
> > > + * drm_dep_job_done() may run from hardirq context.
> > > + */
> > > + spinlock_t lock;
> > > + /**
> > > + * @job.timeout: per-job TDR timeout in jiffies.
> > > + * %MAX_SCHEDULE_TIMEOUT means no timeout.
> > > + */
> > > + long timeout;
> > > +#if IS_ENABLED(CONFIG_PROVE_LOCKING)
> > > + /**
> > > + * @job.push: lockdep annotation tracking the arm-to-push
> > > + * critical section.
> > > + */
> > > + struct {
> > > + /*
> > > + * @job.push.owner: task that currently holds the push
> > > + * context, used to assert single-owner invariants.
> > > + * NULL when idle.
> > > + */
> > > + struct task_struct *owner;
> > > + } push;
> > > +#endif
> > > + } job;
> > > +
> > > + /** @credit: hardware credit accounting. */
> > > + struct {
> > > + /** @credit.limit: maximum credits the queue can hold. */
> > > + u32 limit;
> > > + /** @credit.count: credits currently in flight (atomic). */
> > > + atomic_t count;
> > > + } credit;
> > > +
> > > + /** @dep: current blocking dependency for the head SPSC job. */
> > > + struct {
> > > + /**
> > > + * @dep.fence: fence being waited on before the head job can
> > > + * run. NULL when no dependency is pending.
> > > + */
> > > + struct dma_fence *fence;
> > > + /**
> > > + * @dep.removed_fence: dependency fence whose callback has been
> > > + * removed.  The run-job worker must drop its reference to this
> > > + * fence before proceeding to call run_job.
> > 
> > We can enforce this in Rust automatically.
> > 
> > > + */
> > > + struct dma_fence *removed_fence;
> > > + /** @dep.cb: callback installed on @dep.fence. */
> > > + struct dma_fence_cb cb;
> > > + } dep;
> > > +
> > > + /** @fence: fence context and sequence number state. */
> > > + struct {
> > > + /**
> > > + * @fence.seqno: next sequence number to assign, incremented
> > > + * each time a job is armed.
> > > + */
> > > + u32 seqno;
> > > + /**
> > > + * @fence.context: base DMA fence context allocated at init
> > > + * time. Finished fences use this context.
> > > + */
> > > + u64 context;
> > > + } fence;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_queue_init_args - arguments for drm_dep_queue_init()
> > > + */
> > > +struct drm_dep_queue_init_args {
> > > + /** @ops: driver callbacks; must not be NULL. */
> > > + const struct drm_dep_queue_ops *ops;
> > > + /** @name: human-readable name for workqueues and fence timelines. */
> > > + const char *name;
> > > + /** @drm: owning DRM device. A drm_dev_get() reference is taken at
> > > + *  queue init and released when the queue is freed, preventing module
> > > + *  unload while any queue is still alive.
> > > + */
> > > + struct drm_device *drm;
> > > + /**
> > > + * @submit_wq: workqueue for job dispatch. If NULL, an ordered
> > > + * workqueue is allocated and owned by the queue.  If non-NULL, the
> > > + * workqueue must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> > > + * drm_dep_queue_init() returns %-EINVAL otherwise.
> > > + */
> > > + struct workqueue_struct *submit_wq;
> > > + /**
> > > + * @timeout_wq: workqueue for TDR. If NULL, an ordered workqueue
> > > + * is allocated and owned by the queue.  If non-NULL, the workqueue
> > > + * must have been allocated with %WQ_MEM_RECLAIM_TAINT;
> > > + * drm_dep_queue_init() returns %-EINVAL otherwise.
> > > + */
> > > + struct workqueue_struct *timeout_wq;
> > > + /** @credit_limit: maximum hardware credits; must be non-zero. */
> > > + u32 credit_limit;
> > > + /**
> > > + * @timeout: per-job TDR timeout in jiffies. Zero means no timeout
> > > + * (%MAX_SCHEDULE_TIMEOUT is used internally).
> > > + */
> > > + long timeout;
> > > + /**
> > > + * @flags: initial queue flags. %DRM_DEP_QUEUE_FLAGS_OWN_SUBMIT_WQ
> > > + * and %DRM_DEP_QUEUE_FLAGS_OWN_TIMEDOUT_WQ are managed internally
> > > + * and will be ignored if set here. Setting
> > > + * %DRM_DEP_QUEUE_FLAGS_HIGHPRI requests a high-priority submit
> > > + * workqueue; drivers must only set this for privileged clients.
> > > + */
> > > + enum drm_dep_queue_flags flags;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_job_ops - driver callbacks for a dep job
> > > + */
> > > +struct drm_dep_job_ops {
> > > + /**
> > > + * @release: called when the last reference to the job is dropped.
> > > + *
> > > + * If set, the driver is responsible for freeing the job. If NULL,
> > 
> > And if they don’t?
> > 

They leak memory.
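For what it's worth, in Rust that failure mode disappears: freeing is tied to ownership via the Drop trait, which the compiler invokes automatically when the last owner lets go of the job. A rough sketch with made-up types:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts releases so the behaviour is observable; purely illustrative.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

struct DepJob {
    credits: u32,
}

impl Drop for DepJob {
    // Runs unconditionally when the job's last owner drops it; there
    // is no "driver forgot to free" path.
    fn drop(&mut self) {
        RELEASED.fetch_add(1, Ordering::Relaxed);
    }
}

fn submit(job: DepJob) -> u32 {
    // Taking the job by value transfers ownership; when this function
    // returns, the job is dropped and its release logic runs.
    job.credits
}
```

With a refcounted job the same applies: Drop on the last Arc clone runs the release, so "If NULL, kfree() directly" vs. "driver frees" stops being a documented convention and becomes a property of the type.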

> > By the way, we can also enforce this in Rust.
> > 
> > > + * drm_dep_job_put() will call kfree() on the job directly.
> > > + */
> > > + void (*release)(struct drm_dep_job *job);
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_job - a unit of work submitted to a dep queue
> > > + *
> > > + * All fields are **opaque to drivers**.  Do not read or write any field
> > > + * directly; use the provided helper functions instead.
> > > + */
> > > +struct drm_dep_job {
> > > + /** @ops: driver callbacks for this job. */
> > > + const struct drm_dep_job_ops *ops;
> > > + /** @refcount: reference count, managed by drm_dep_job_get/put(). */
> > > + struct kref refcount;
> > > + /**
> > > + * @dependencies: xarray of &dma_fence dependencies before the job can
> > > + * run.
> > > + */
> > > + struct xarray dependencies;
> > > + /** @q: the queue this job is submitted to. */
> > > + struct drm_dep_queue *q;
> > > + /** @queue_node: SPSC queue linkage for pending submission. */
> > > + struct spsc_node queue_node;
> > > + /**
> > > + * @pending_link: list entry in the queue's pending job list. Protected
> > > + * by @job.q->job.lock.
> > > + */
> > > + struct list_head pending_link;
> > > + /** @dfence: finished fence for this job. */
> > > + struct drm_dep_fence *dfence;
> > > + /** @cb: fence callback used to watch for dependency completion. */
> > > + struct dma_fence_cb cb;
> > > + /** @credits: number of credits this job consumes from the queue. */
> > > + u32 credits;
> > > + /**
> > > + * @last_dependency: index into @dependencies of the next fence to
> > > + * check. Advanced by drm_dep_queue_job_dependency() as each
> > > + * dependency is consumed.
> > > + */
> > > + u32 last_dependency;
> > > + /**
> > > + * @invalidate_count: number of times this job has been invalidated.
> > > + * Incremented by drm_dep_job_invalidate_job().
> > > + */
> > > + u32 invalidate_count;
> > > + /**
> > > + * @signalling_cookie: return value of dma_fence_begin_signalling()
> > > + * captured in drm_dep_job_arm() and consumed by drm_dep_job_push().
> > > + * Not valid outside the arm→push window.
> > > + */
> > > + bool signalling_cookie;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_dep_job_init_args - arguments for drm_dep_job_init()
> > > + */
> > > +struct drm_dep_job_init_args {
> > > + /**
> > > + * @ops: driver callbacks for the job, or NULL for default behaviour.
> > > + */
> > > + const struct drm_dep_job_ops *ops;
> > > + /** @q: the queue to associate the job with. A reference is taken. */
> > > + struct drm_dep_queue *q;
> > > + /** @credits: number of credits this job consumes; must be non-zero. */
> > > + u32 credits;
> > > +};
> > > +
> > > +/* Queue API */
> > > +
> > > +/**
> > > + * drm_dep_queue_sched_guard() - acquire the queue scheduler lock as a guard
> > > + * @__q: dep queue whose scheduler lock to acquire
> > > + *
> > > + * Acquires @__q->sched.lock as a scoped mutex guard (released automatically
> > > + * when the enclosing scope exits).  This lock serialises all scheduler state
> > > + * transitions — stop/start/kill flag changes, bypass-path decisions, and the
> > > + * run-job worker — so it must be held when the driver needs to atomically
> > > + * inspect or modify queue state in relation to job submission.
> > > + *
> > > + * **When to use**
> > > + *
> > > + * Drivers that set %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED and wish to
> > > + * serialise their own submit work against the bypass path must acquire this
> > > + * guard.  Without it, a concurrent caller of drm_dep_job_push() could take
> > > + * the bypass path and call ops->run_job() inline between the driver's
> > > + * eligibility check and its corresponding action, producing a race.
> > 
> > So if you’re not careful, you have just introduced a race :/
> > 

Luckily I’m careful. The use case here is compositors, compute
workloads, or servicing KMD page faults—none of which have input
dependencies and all of which require very low latency and minimal
jitter.

> > > + *
> > > + * **Constraint: only from submit_wq worker context**
> > > + *
> > > + * This guard must only be acquired from a work item running on the queue's
> > > + * submit workqueue (@q->sched.submit_wq) by drivers.
> > > + *
> > > + * Context: Process context only; must be called from submit_wq work by
> > > + * drivers.
> > > + */
> > > +#define drm_dep_queue_sched_guard(__q) \
> > > + guard(mutex)(&(__q)->sched.lock)
> > > +
> > > +int drm_dep_queue_init(struct drm_dep_queue *q,
> > > +       const struct drm_dep_queue_init_args *args);
> > > +void drm_dep_queue_fini(struct drm_dep_queue *q);
> > > +void drm_dep_queue_release(struct drm_dep_queue *q);
> > > +struct drm_dep_queue *drm_dep_queue_get(struct drm_dep_queue *q);
> > > +bool drm_dep_queue_get_unless_zero(struct drm_dep_queue *q);
> > > +void drm_dep_queue_put(struct drm_dep_queue *q);
> > > +void drm_dep_queue_stop(struct drm_dep_queue *q);
> > > +void drm_dep_queue_start(struct drm_dep_queue *q);
> > > +void drm_dep_queue_kill(struct drm_dep_queue *q);
> > > +void drm_dep_queue_trigger_timeout(struct drm_dep_queue *q);
> > > +void drm_dep_queue_cancel_tdr_sync(struct drm_dep_queue *q);
> > > +void drm_dep_queue_resume_timeout(struct drm_dep_queue *q);
> > > +bool drm_dep_queue_work_enqueue(struct drm_dep_queue *q,
> > > + struct work_struct *work);
> > > +bool drm_dep_queue_is_stopped(struct drm_dep_queue *q);
> > > +bool drm_dep_queue_is_killed(struct drm_dep_queue *q);
> > > +bool drm_dep_queue_is_initialized(struct drm_dep_queue *q);
> > > +void drm_dep_queue_set_stopped(struct drm_dep_queue *q);
> > > +unsigned int drm_dep_queue_refcount(const struct drm_dep_queue *q);
> > > +long drm_dep_queue_timeout(const struct drm_dep_queue *q);
> > > +struct workqueue_struct *drm_dep_queue_submit_wq(struct drm_dep_queue *q);
> > > +struct workqueue_struct *drm_dep_queue_timeout_wq(struct drm_dep_queue *q);
> > > +
> > > +/* Job API */
> > > +
> > > +/**
> > > + * DRM_DEP_JOB_FENCE_PREALLOC - sentinel value for pre-allocating a dependency slot
> > > + *
> > > + * Pass this to drm_dep_job_add_dependency() instead of a real fence to
> > > + * pre-allocate a slot in the job's dependency xarray during the preparation
> > > + * phase (where GFP_KERNEL is available).  The returned xarray index identifies
> > > + * the slot.  Call drm_dep_job_replace_dependency() later — inside a
> > > + * dma_fence_begin_signalling() region if needed — to swap in the real fence
> > > + * without further allocation.
> > > + *
> > > + * This sentinel is never treated as a dma_fence; it carries no reference count
> > > + * and must not be passed to dma_fence_put().  It is only valid as an argument
> > > + * to drm_dep_job_add_dependency() and as the expected stored value checked by
> > > + * drm_dep_job_replace_dependency().
> > > + */
> > > +#define DRM_DEP_JOB_FENCE_PREALLOC ((struct dma_fence *)-1)
> > > +
> > > +int drm_dep_job_init(struct drm_dep_job *job,
> > > +     const struct drm_dep_job_init_args *args);
> > > +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job);
> > > +void drm_dep_job_put(struct drm_dep_job *job);
> > > +void drm_dep_job_arm(struct drm_dep_job *job);
> > > +void drm_dep_job_push(struct drm_dep_job *job);
> > > +int drm_dep_job_add_dependency(struct drm_dep_job *job,
> > > +       struct dma_fence *fence);
> > > +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> > > +    struct dma_fence *fence);
> > > +int drm_dep_job_add_syncobj_dependency(struct drm_dep_job *job,
> > > +       struct drm_file *file, u32 handle,
> > > +       u32 point);
> > > +int drm_dep_job_add_resv_dependencies(struct drm_dep_job *job,
> > > +      struct dma_resv *resv,
> > > +      enum dma_resv_usage usage);
> > > +int drm_dep_job_add_implicit_dependencies(struct drm_dep_job *job,
> > > +  struct drm_gem_object *obj,
> > > +  bool write);
> > > +bool drm_dep_job_is_signaled(struct drm_dep_job *job);
> > > +bool drm_dep_job_is_finished(struct drm_dep_job *job);
> > > +bool drm_dep_job_invalidate_job(struct drm_dep_job *job, int threshold);
> > > +struct dma_fence *drm_dep_job_finished_fence(struct drm_dep_job *job);
> > > +
> > > +/**
> > > + * struct drm_dep_queue_pending_job_iter - iterator state for
> > > + *   drm_dep_queue_for_each_pending_job()
> > > + * @q: queue being iterated
> > > + */
> > > +struct drm_dep_queue_pending_job_iter {
> > > + struct drm_dep_queue *q;
> > > +};
> > > +
> > > +/* Drivers should never call this directly */
> > 
> > Not enforceable in C.
> > 
> > > +static inline struct drm_dep_queue_pending_job_iter
> > > +__drm_dep_queue_pending_job_iter_begin(struct drm_dep_queue *q)
> > > +{
> > > + struct drm_dep_queue_pending_job_iter iter = {
> > > + .q = q,
> > > + };
> > > +
> > > + WARN_ON(!drm_dep_queue_is_stopped(q));
> > > + return iter;
> > > +}
> > > +
> > > +/* Drivers should never call this directly */
> > > +static inline void
> > > +__drm_dep_queue_pending_job_iter_end(struct drm_dep_queue_pending_job_iter iter)
> > > +{
> > > + WARN_ON(!drm_dep_queue_is_stopped(iter.q));
> > > +}
> > > +
> > > +/* clang-format off */
> > > +DEFINE_CLASS(drm_dep_queue_pending_job_iter,
> > > +     struct drm_dep_queue_pending_job_iter,
> > > +     __drm_dep_queue_pending_job_iter_end(_T),
> > > +     __drm_dep_queue_pending_job_iter_begin(__q),
> > > +     struct drm_dep_queue *__q);
> > > +/* clang-format on */
> > > +static inline void *
> > > +class_drm_dep_queue_pending_job_iter_lock_ptr(
> > > + class_drm_dep_queue_pending_job_iter_t *_T)
> > > +{ return _T; }
> > > +#define class_drm_dep_queue_pending_job_iter_is_conditional false
> > > +
> > > +/**
> > > + * drm_dep_queue_for_each_pending_job() - iterate over all pending jobs
> > > + *   in a queue
> > > + * @__job: loop cursor, a &struct drm_dep_job pointer
> > > + * @__q: &struct drm_dep_queue to iterate
> > > + *
> > > + * Iterates over every job currently on @__q->job.pending. The queue must be
> > > + * stopped (drm_dep_queue_stop() called) before using this iterator; a WARN_ON
> > > + * fires at the start and end of the scope if it is not.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +#define drm_dep_queue_for_each_pending_job(__job, __q) \
> > > + scoped_guard(drm_dep_queue_pending_job_iter, (__q)) \
> > > + list_for_each_entry((__job), &(__q)->job.pending, pending_link)
> > > +
> > > +#endif
> > > -- 
> > > 2.34.1
> > > 
> > 
> > 
> > By the way:
> > 
> > I invite you to have a look at this implementation [0]. It currently works in real
> > hardware i.e.: our downstream "Tyr" driver for Arm Mali is using that at the
> > moment. It is a mere prototype that we’ve put together to test different
> > approaches, so it’s not meant to be a “solution” at all. It’s a mere data point
> > for further discussion.

I think some of the things I pointed out—async teardown, bypass paths,
and dropping job refs in IRQ context—would still need to be added,
though.

> > 
> > Philip Stanner is working on this “Job Queue” concept too, but from an upstream
> > perspective.
> > 
> > [0]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/61

I scanned [0]; it looks significantly better than what was posted
upstream. Let me dig in a bit more.

Matt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17  8:26         ` Matthew Brost
  2026-03-17 12:04           ` Daniel Almeida
@ 2026-03-17 19:41           ` Miguel Ojeda
  2026-03-23 17:31             ` Matthew Brost
  1 sibling, 1 reply; 21+ messages in thread
From: Miguel Ojeda @ 2026-03-17 19:41 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

On Tue, Mar 17, 2026 at 9:27 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> I hate cut-offs in threads.
>
> I get it — you’re a Rust zealot.

Cut off? Zealot?

Look, I got the email in my inbox, so I skimmed it to understand why I
got it and why the Rust list was Cc'd. I happened to notice your
(quite surprising) claims about Rust, so I decided to reply to a
couple of those, since I proposed Rust for the kernel.

How is that a cut off and how does that make a maintainer a zealot?

Anyway, my understanding is that we agreed that the cleanup attribute
in C doesn't enforce much of anything. We also agreed that it is
important to think about ownership and lifetimes and to enforce the
rules and to be disciplined. All good so far.

Now, what I said is simply that Rust fundamentally improves the
situation -- C "RAII" not doing so is not comparable. For instance,
that statically enforcing things is a meaningful improvement over
runtime approaches (which generally require to trigger an issue, and
which in some cases are not suitable for production settings).

Really, I just said Rust would help with things you already stated you
care about. And nobody claims "Rust solves everything" as you stated.
So I don't see zealots here, and insulting others doesn't help your
argument.

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 18:14       ` Matthew Brost
@ 2026-03-17 19:48         ` Daniel Almeida
  2026-03-17 20:43         ` Boris Brezillon
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Almeida @ 2026-03-17 19:48 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, dri-devel, Boris Brezillon, Tvrtko Ursulin,
	Rodrigo Vivi, Thomas Hellström, Christian König,
	Danilo Krummrich, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Philipp Stanner, Simona Vetter, Sumit Semwal, Thomas Zimmermann,
	linux-kernel, Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl,
	Daniel Stone, Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

I still need to digest your answers above, there's quite a bit of information.
Thanks for that. I'll do a pass on it tomorrow.

>>> By the way:
>>> 
>>> I invite you to have a look at this implementation [0]. It currently works in real
>>> hardware i.e.: our downstream "Tyr" driver for Arm Mali is using that at the
>>> moment. It is a mere prototype that we’ve put together to test different
>>> approaches, so it’s not meant to be a “solution” at all. It’s a mere data point
>>> for further discussion.
> 
> I think some of the things I pointed out—async teardown, bypass paths,
> and dropping job refs in IRQ context—would still need to be added,
> though.

That’s ok, I suppose we can find a way to add these things if they’re
needed in order to support other FW scheduling GPUs (i.e.: other than Mali,
which is the only thing I tested on, and Nova, which I assume has very similar
requirements).

> 
>>> 
>>> Philip Stanner is working on this “Job Queue” concept too, but from an upstream
>>> perspective.
>>> 
>>> [0]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/61
> 
> I scanned [0], it looks signicantly better than the post upstream. Let
> me dig in a bit more.
> 
> Matt

One thing that is missing is that, at the moment, submit() is fallible and
there is no preallocation. This can be added to the current design rather
easily (i.e. by splitting into two different steps, a fallible prepare() where
rollback is possible, and an infallible commit(), or whatever names get
chosen).
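A toy sketch of that split (placeholder names, not the actual jobq API): the fallible step does all the allocation and returns a token, and the token is the only way to reach the infallible step.

```rust
// Fallible-prepare / infallible-commit split. prepare() does all
// allocation and can fail with clean rollback; the token it returns
// proves preparation happened, so commit() takes no fallible path
// and cannot be called without a prior prepare().
struct JobQueue {
    slots: Vec<u64>,
}

// Token tying a reservation to the queue; commit() consumes it, so
// double-commit of the same reservation is a compile error.
struct Prepared {
    slot: usize,
}

impl JobQueue {
    fn prepare(&mut self) -> Result<Prepared, &'static str> {
        // All fallible work (here, the Vec growth) happens up front.
        self.slots.push(0);
        Ok(Prepared { slot: self.slots.len() - 1 })
    }

    fn commit(&mut self, p: Prepared, payload: u64) {
        // Infallible: the slot already exists, nothing can fail.
        self.slots[p.slot] = payload;
    }
}
```

This mirrors the arm/push window in the C design, except that the "must prepare before commit" rule is carried by the token type rather than by documentation.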

Perhaps we can also split this into two types, AtomicJobQueue and JobQueue,
where only the first allows refs to be dropped in IRQ context; we do not
need this in Tyr, and disallowing it makes the design of the "non-atomic"
version much simpler. Or perhaps we can figure out a way to ensure that we
don't drop the last ref in IRQ context. I am just brainstorming some ideas
at this point, and again, I still need to go through your explanations above.

— Daniel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 18:14       ` Matthew Brost
  2026-03-17 19:48         ` Daniel Almeida
@ 2026-03-17 20:43         ` Boris Brezillon
  2026-03-18 22:40           ` Matthew Brost
  1 sibling, 1 reply; 21+ messages in thread
From: Boris Brezillon @ 2026-03-17 20:43 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

Hi Matthew,

Just a few drive-by comments.

On Tue, 17 Mar 2026 11:14:36 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> > > > Timeout Detection and Recovery (TDR): a per-queue delayed work item
> > > > fires when the head pending job exceeds q->job.timeout jiffies, calling
> > > > ops->timedout_job(). drm_dep_queue_trigger_timeout() forces immediate
> > > > expiry for device teardown.
> > > > 
> > > > IRQ-safe completion: queues flagged DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE
> > > > allow drm_dep_job_done() to be called from hardirq context (e.g. a
> > > > dma_fence callback). Dependency cleanup is deferred to process context
> > > > after ops->run_job() returns to avoid calling xa_destroy() from IRQ.
> > > > 
> > > > Zombie-state guard: workers use kref_get_unless_zero() on entry and
> > > > bail immediately if the queue refcount has already reached zero and
> > > > async teardown is in flight, preventing use-after-free.  
> > > 
> > > In rust, when you queue work, you have to pass a reference-counted pointer
> > > (Arc<T>). We simply never have this problem in a Rust design. If there is work
> > > queued, the queue is alive.
> > > 
> > > By the way, why can’t we simply require synchronous teardowns?  
> 
> Consider the case where the DRM dep queue’s refcount drops to zero, but
> the device firmware still holds references to the associated queue.
> These are resources that must be torn down asynchronously. In Xe, I need
> to send two asynchronous firmware commands before I can safely remove
> the memory associated with the queue (faulting on this kind of global
> memory will take down the device) and recycle the firmware ID tied to
> the queue. These async commands are issued on the driver side, on the
> DRM dep queue’s workqueue as well.

Asynchronous teardown is okay, but I'm not too sure using the refcnt to
know that the queue is no longer usable is the way to go. To me the
refcnt is what determines when the SW object is no longer referenced by
any other item in the code, and a work item acting on the queue counts
as one owner of this queue. If you want to cancel the work in order to
speed up the destruction of the queue, you can call
{cancel,disable}_work[_sync](), and have the ref dropped if the
cancel/disable was effective. Multi-step teardown is also an option,
but again, the state of the queue shouldn't be determined from its
refcnt IMHO.

> 
> Now consider a scenario where something goes wrong and those firmware
> commands never complete, and a device reset is required to recover. The
> driver’s per-queue tracking logic stops all queues (including zombie
> ones), determines which commands were lost, cleans up the side effects
> of that lost state, and then restarts all queues. That is how we would
> end up in this work item with a zombie queue. The restart logic could
> probably be made smart enough to avoid queueing work for zombie queues,
> but in my opinion it’s safe enough to use kref_get_unless_zero() in the
> work items.

Well, that only works for single-step teardown, or when you enter the
last step. At which point, I'm not too sure it's significantly better
than encoding the state of the queue through a separate field, and having
the job queue logic reject new jobs if the queue is no longer usable
(shouldn't even be exposed to userland at this point though).

> 
> It should also be clear that a DRM dep queue is primarily intended to be
> embedded inside the driver’s own queue object, even though it is valid
> to use it as a standalone object. The async teardown flows are also
> optional features.
> 
> Let’s also consider a case where you do not need the async firmware
> flows described above, but the DRM dep queue is still embedded in a
> driver-side object that owns memory via dma-resv. The final queue put
> may occur in IRQ context (as an opt-in, DRM dep avoids kicking a worker
> just to drop a ref), or in the reclaim path (any scheduler workqueue is
> in the reclaim path). In either case, you cannot free memory that
> requires taking a dma-resv lock, which is why all DRM dep queues ultimately free their
> resources in a work item outside of reclaim. Many drivers already follow
> this pattern, but in DRM dep this behavior is built-in.

I agree deferred cleanup is the way to go.

> 
> So I don’t think Rust natively solves these types of problems, although
> I’ll concede that it does make refcounting a bit more sane.

Rust won't magically defer the cleanup, nor will it dictate how you want
to do the queue teardown, those are things you need to implement. But it
should give visibility about object lifetimes, and guarantee that an
object that's still visible to some owners is usable (the notion of
usable is highly dependent on the object implementation).

Just a purely theoretical example of a multi-step queue teardown that
might be possible to encode in rust:

- MyJobQueue<Usable>: The job queue is currently exposed and usable.
  There's a ::destroy() method consuming 'self' and returning a
  MyJobQueue<Destroyed> object
- MyJobQueue<Destroyed>: The user asked for the workqueue to be
  destroyed. No new job can be pushed. Existing jobs that didn't make
  it to the FW queue are cancelled, jobs that are in-flight are
  cancelled if they can, or are just waited upon if they can't. When
  the whole destruction step is done, ::destroyed() is called, it
  consumes 'self' and returns a MyJobQueue<Inactive> object.
- MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
  any resources on this queue). It's ready to be cleaned up.
  ::cleanup() (or just ::drop()) defers the cleanup of some inner
  object that has been passed around between the various
  MyJobQueue<State> wrappers.

Each of the state transitions can happen asynchronously. A state
transition consumes the object in one state and returns a new object
in its new state. None of the transitions involves dropping a refcnt;
ownership is just transferred. The final MyJobQueue<Inactive> object is
the object we'll defer cleanup on.

It's a very high-level view of one way this can be implemented (I'm
sure there are others, probably better than my suggestion) in order to
make sure the object doesn't go away without the compiler enforcing
proper state transitions.
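
[Editorial illustration] As a rough sketch of the typestate idea above, here is a compilable toy version in plain Rust. MyJobQueue, the state markers, and the method names follow the description but are purely hypothetical, and the real cancellation/wait logic is elided.

```rust
use std::marker::PhantomData;

// Zero-sized state markers; the compiler enforces the transition order.
struct Usable;
struct Destroyed;
struct Inactive;

struct MyJobQueue<State> {
    // Inner resource handed from state to state (e.g. a firmware ID).
    fw_id: u32,
    _state: PhantomData<State>,
}

impl MyJobQueue<Usable> {
    fn new(fw_id: u32) -> Self {
        MyJobQueue { fw_id, _state: PhantomData }
    }

    // Consumes the usable queue; no new jobs can be pushed afterwards
    // because the Usable-typed handle no longer exists.
    fn destroy(self) -> MyJobQueue<Destroyed> {
        // ... cancel queued jobs, cancel or wait on in-flight ones ...
        MyJobQueue { fw_id: self.fw_id, _state: PhantomData }
    }
}

impl MyJobQueue<Destroyed> {
    // Called once the HW no longer holds resources for this queue.
    fn destroyed(self) -> MyJobQueue<Inactive> {
        MyJobQueue { fw_id: self.fw_id, _state: PhantomData }
    }
}

impl MyJobQueue<Inactive> {
    // Final deferred cleanup of the inner object.
    fn cleanup(self) -> u32 {
        self.fw_id // e.g. recycle the firmware ID
    }
}
```

Calling `cleanup()` on a `MyJobQueue<Usable>` is a compile error, which is exactly the "compiler enforcing proper state transitions" point.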

> > > > +/**
> > > > + * DOC: DRM dependency fence
> > > > + *
> > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > + * completes the job.
> > > > + *
> > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > + * NULL or an error).  
> > > 
> > > I thought this fence proxy mechanism was going away due to recent work being
> > > carried out by Christian?
> > >   
> 
> Consider the case where a driver’s hardware fence is implemented as a
> dma-fence-array or dma-fence-chain. You cannot install these types of
> fences into a dma-resv or into syncobjs, so a proxy fence is useful
> here.

Hm, so that's a driver returning a dma_fence_array/chain through
::run_job()? Why would we not want to have them directly exposed and
split up into singular fence objects at resv insertion time (I don't
think syncobjs care, but I might be wrong). I mean, one of the point
behind the container extraction is so fences coming from the same
context/timeline can be detected and merged. If you insert the
container through a proxy, you're defeating the whole fence merging
optimization.

The second thing is that I'm not sure drivers were ever supposed to
return fence containers in the first place, because the whole idea
behind a fence context is that fences are emitted/signalled in
seqno-order, and if the fence is encoding the state of multiple
timelines that progress at their own pace, it becomes tricky to control
that. I guess if it's always the same set of timelines that are
combined, that would work.

> One example is when a single job submits work to multiple rings
> that are flipped in hardware at the same time.

We do have that in Panthor, but that's all explicit: in a single
SUBMIT, you can have multiple jobs targeting different queues, each of
them having their own set of deps/signal ops. The combination of all the
signal ops into a container is left to the UMD. It could be automated
kernel side, but that would be a flag on the SIGNAL op leading to the
creation of a fence_array containing fences from multiple submitted
jobs, rather than the driver combining stuff in the fence it returns in
::run_job().

> 
> Another case is late arming of hardware fences in run_job (which many
> drivers do). The proxy fence is immediately available at arm time and
> can be installed into dma-resv or syncobjs even though the actual
> hardware fence is not yet available. I think most drivers could be
> refactored to make the hardware fence immediately available at run_job,
> though.

Yep, I also think we can arm the driver fence early in the case of
JobQueue. The reason it couldn't be done before is because the
scheduler was in the middle, deciding which entity to pull the next job
from, which was changing the seqno a job driver-fence would be assigned
(you can't guess that at queue time in that case).

[...]

> > > > + * **Reference counting**
> > > > + *
> > > > + * Jobs and queues are both reference counted.
> > > > + *
> > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > + * job that was submitted to it.
> > > > + *
> > > > + * The queue holds its own reference to a job for as long as the job is
> > > > + * internally tracked: from the moment the job is added to the pending list
> > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > + * worker, which calls drm_dep_job_put() to release that reference.  
> > > 
> > > Why not simply keep track that the job was completed, instead of relinquishing
> > > the reference? We can then release the reference once the job is cleaned up
> > > (by the queue, using a worker) in process context.  
> 
> I think that’s what I’m doing, while also allowing an opt-in path to
> drop the job reference when it signals (in IRQ context)

Did you mean in !IRQ (or !atomic) context here? Feels weird to not
defer the cleanup when you're in an IRQ/atomic context, but defer it
when you're in a thread context.

> so we avoid
> switching to a work item just to drop a ref. That seems like a
> significant win in terms of CPU cycles.

Well, the cleanup path is probably not where latency matters the most.
It's adding scheduling overhead, sure, but given all the stuff we defer
already, I'm not too sure it's worth saving a few cycles by getting the
cleanup done immediately. What's important to have is a way to signal
fences in an atomic context, because this has an impact on latency.

[...]

> > > > + /*
> > > > + * Drop all input dependency fences now, in process context, before the
> > > > + * final job put. Once the job is on the pending list its last reference
> > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > + * xa_destroy() would be unsafe.
> > > > + */  
> > > 
> > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > via ops->run_job()?
> > > 
> > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > other than scheduling the queue worker?
> > >   
> 
> Yes, this code is required to support dropping job refs directly in the
> dma-fence callback (an opt-in feature). Again, this seems like a
> significant win in terms of CPU cycles, although I haven’t collected
> data yet.

If it significantly hurts the perf, I'd like to understand why, because
to me it looks like pure-cleanup (no signaling involved), and thus no
other process waiting for us to do the cleanup. The only thing that
might have an impact is how fast you release the resources, and given
it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
like to understand which part of the immediate cleanup is causing a
contention (basically which kind of resources the system is starving of)

Regards,

Boris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 20:43         ` Boris Brezillon
@ 2026-03-18 22:40           ` Matthew Brost
  2026-03-19  9:57             ` Boris Brezillon
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2026-03-18 22:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Tue, Mar 17, 2026 at 09:43:20PM +0100, Boris Brezillon wrote:
> Hi Matthew,
> 
> Just a few drive-by comments.
> 
> On Tue, 17 Mar 2026 11:14:36 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > > > > Timeout Detection and Recovery (TDR): a per-queue delayed work item
> > > > > fires when the head pending job exceeds q->job.timeout jiffies, calling
> > > > > ops->timedout_job(). drm_dep_queue_trigger_timeout() forces immediate
> > > > > expiry for device teardown.
> > > > > 
> > > > > IRQ-safe completion: queues flagged DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE
> > > > > allow drm_dep_job_done() to be called from hardirq context (e.g. a
> > > > > dma_fence callback). Dependency cleanup is deferred to process context
> > > > > after ops->run_job() returns to avoid calling xa_destroy() from IRQ.
> > > > > 
> > > > > Zombie-state guard: workers use kref_get_unless_zero() on entry and
> > > > > bail immediately if the queue refcount has already reached zero and
> > > > > async teardown is in flight, preventing use-after-free.  
> > > > 
> > > > In rust, when you queue work, you have to pass a reference-counted pointer
> > > > (Arc<T>). We simply never have this problem in a Rust design. If there is work
> > > > queued, the queue is alive.
> > > > 
> > > > By the way, why can’t we simply require synchronous teardowns?  
> > 
> > Consider the case where the DRM dep queue’s refcount drops to zero, but
> > the device firmware still holds references to the associated queue.
> > These are resources that must be torn down asynchronously. In Xe, I need
> > to send two asynchronous firmware commands before I can safely remove
> > the memory associated with the queue (faulting on this kind of global
> > memory will take down the device) and recycle the firmware ID tied to
> > the queue. These async commands are issued on the driver side, on the
> > DRM dep queue’s workqueue as well.
> 
> Asynchronous teardown is okay, but I'm not too sure using the refcnt to
> know that the queue is no longer usable is the way to go. To me the
> refcnt is what determines when the SW object is no longer referenced by
> any other item in the code, and a work item acting on the queue counts
> as one owner of this queue. If you want to cancel the work in order to
> speed up the destruction of the queue, you can call
> {cancel,disable}_work[_sync](), and have the ref dropped if the
> cancel/disable was effective. Multi-step teardown is also an option,
> but again, the state of the queue shouldn't be determined from its
> refcnt IMHO.
> 
> > 
> > Now consider a scenario where something goes wrong and those firmware
> > commands never complete, and a device reset is required to recover. The
> > driver’s per-queue tracking logic stops all queues (including zombie
> > ones), determines which commands were lost, cleans up the side effects
> > of that lost state, and then restarts all queues. That is how we would
> > end up in this work item with a zombie queue. The restart logic could
> > probably be made smart enough to avoid queueing work for zombie queues,
> > but in my opinion it’s safe enough to use kref_get_unless_zero() in the
> > work items.
> 
> Well, that only works for single-step teardown, or when you enter the
> last step. At which point, I'm not too sure it's significantly better
> than encoding the state of the queue through a separate field, and have
> the job queue logic reject new jobs if the queue is no longer usable
> (shouldn't even be exposed to userland at this point though).
> 

'shouldn't even be exposed to userland at this point though' - Yes.

The philosophy of the refcounting design is roughly:

- When a queue is created by userland, call drm_dep_queue_init().
- All jobs hold a ref to the drm_dep_queue.
- When userland closes the queue, remove it from the FD and call
  drm_dep_queue_put() + initiate teardown (I'd recommend just setting the
  TDR to fire immediately, kicking the queue off the device on the first
  fire + signaling all fences).
- When the queue refcount goes to zero, optionally implement
  drm_dep_queue_ops.fini to keep the drm_dep_queue (and the object it is
  embedded in) around a bit longer if additional firmware / device-side
  resources are still around, and call drm_dep_queue_fini() when this part
  completes. If drm_dep_queue_ops.fini isn't implemented, the core
  implementation just calls drm_dep_queue_fini().
- A work item releases the drm_dep_queue outside the dma-fence signaling
  path so memory can be released safely (e.g., while taking dma-resv
  locks).
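
[Editorial illustration] The last bullet can be sketched as a toy model in plain Rust (hypothetical names, with a standard-library thread and channel standing in for a kernel workqueue): the final put hands ownership of the queue's resources to a dedicated cleanup worker instead of freeing them inline, so the actual release always happens in a context where taking locks is safe.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for whatever the queue owns at final put time.
struct QueueResources {
    name: &'static str,
}

fn spawn_cleanup_worker() -> (
    mpsc::Sender<QueueResources>,
    thread::JoinHandle<Vec<&'static str>>,
) {
    let (tx, rx) = mpsc::channel::<QueueResources>();
    let handle = thread::spawn(move || {
        let mut freed = Vec::new();
        // Process context: safe to take dma-resv-like locks and free memory.
        for res in rx {
            freed.push(res.name);
        }
        freed
    });
    (tx, handle)
}
```

The "final put" (the send) can happen from any context; only the worker ever touches the resources.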


> > 
> > It should also be clear that a DRM dep queue is primarily intended to be
> > embedded inside the driver’s own queue object, even though it is valid
> > to use it as a standalone object. The async teardown flows are also
> > optional features.
> > 
> > Let’s also consider a case where you do not need the async firmware
> > flows described above, but the DRM dep queue is still embedded in a
> > driver-side object that owns memory via dma-resv. The final queue put
> > may occur in IRQ context (DRM dep avoids kicking a worker just to drop a
> > ref, as an opt-in), or in the reclaim path (any scheduler workqueue is in
> > the reclaim path). In either case, you cannot free memory there, as that
> > requires taking a dma-resv lock, which is why all DRM dep queues ultimately free their
> > resources in a work item outside of reclaim. Many drivers already follow
> > this pattern, but in DRM dep this behavior is built-in.
> 
> I agree deferred cleanup is the way to go.
> 

+1. Yes. I spotted a bunch of drivers that open-code this part on the
driver side, Xe included.

> > 
> > So I don’t think Rust natively solves these types of problems, although
> > I’ll concede that it does make refcounting a bit more sane.
> 
> Rust won't magically defer the cleanup, nor will it dictate how you want
> to do the queue teardown, those are things you need to implement. But it
> should give visibility about object lifetimes, and guarantee that an
> object that's still visible to some owners is usable (the notion of
> usable is highly dependent on the object implementation).
> 
> Just a purely theoretical example of a multi-step queue teardown that
> might be possible to encode in rust:
> 
> - MyJobQueue<Usable>: The job queue is currently exposed and usable.
>   There's a ::destroy() method consuming 'self' and returning a
>   MyJobQueue<Destroyed> object
> - MyJobQueue<Destroyed>: The user asked for the workqueue to be
>   destroyed. No new job can be pushed. Existing jobs that didn't make
>   it to the FW queue are cancelled, jobs that are in-flight are
>   cancelled if they can, or are just waited upon if they can't. When
>   the whole destruction step is done, ::destroyed() is called, it
>   consumes 'self' and returns a MyJobQueue<Inactive> object.
> - MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
>   any resources on this queue). It's ready to be cleaned up.
>   ::cleanup() (or just ::drop()) defers the cleanup of some inner
>   object that has been passed around between the various
>   MyJobQueue<State> wrappers.
> 
> Each of the state transitions can happen asynchronously. A state
> transition consumes the object in one state and returns a new object
> in its new state. None of the transitions involves dropping a refcnt;
> ownership is just transferred. The final MyJobQueue<Inactive> object is
> the object we'll defer cleanup on.
> 
> It's a very high-level view of one way this can be implemented (I'm
> sure there are others, probably better than my suggestion) in order to
> make sure the object doesn't go away without the compiler enforcing
> proper state transitions.
> 

I'm sure Rust can implement this. My point about Rust is that it doesn't
magically solve hard software architecture problems, but I will admit the
ownership model and the way it can enforce locking at compile time are
pretty cool.

> > > > > +/**
> > > > > + * DOC: DRM dependency fence
> > > > > + *
> > > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > > + * completes the job.
> > > > > + *
> > > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > > + * NULL or an error).  
> > > > 
> > > > I thought this fence proxy mechanism was going away due to recent work being
> > > > carried out by Christian?
> > > >   
> > 
> > Consider the case where a driver’s hardware fence is implemented as a
> > dma-fence-array or dma-fence-chain. You cannot install these types of
> > fences into a dma-resv or into syncobjs, so a proxy fence is useful
> > here.
> 
> Hm, so that's a driver returning a dma_fence_array/chain through
> ::run_job()? Why would we not want to have them directly exposed and
> split up into singular fence objects at resv insertion time (I don't
> think syncobjs care, but I might be wrong). I mean, one of the point

You can stick dma-fence-arrays in syncobjs, but not chains.

Neither dma-fence-arrays/chain can go into dma-resv.

Hence why disconnecting a job's finished fence from the hardware fence
is, IMO, a good idea to keep, as it gives drivers flexibility on the
hardware fences. E.g., if this design didn't have a job's finished fence,
I'd have to open-code one on the Xe side.
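
[Editorial illustration] A toy model of the finished/parent split, in plain Rust rather than the dma_fence API (all types and methods here are hypothetical): the finished fence exists immediately and can be handed out early, while the parent, possibly a fence container, is chained later and propagates its signal, mirroring the @finished/@parent chaining in the DOC comment.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical proxy: the stable object handed to resv/syncobj-like users.
#[derive(Default)]
struct FinishedFence {
    signaled: Mutex<bool>,
}

impl FinishedFence {
    fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
    }
    fn is_signaled(&self) -> bool {
        *self.signaled.lock().unwrap()
    }
}

// Hypothetical hardware fence (could model an array/chain container).
struct ParentFence {
    chained: Mutex<Vec<Arc<FinishedFence>>>,
}

impl ParentFence {
    fn new() -> Self {
        ParentFence { chained: Mutex::new(Vec::new()) }
    }
    // Mirrors chaining @finished to @parent via a completion callback.
    fn chain(&self, f: Arc<FinishedFence>) {
        self.chained.lock().unwrap().push(f);
    }
    // When the parent signals, every chained finished fence signals too.
    fn signal(&self) {
        for f in self.chained.lock().unwrap().iter() {
            f.signal();
        }
    }
}
```

The point of the indirection: the finished fence can be installed somewhere before the parent even exists, and the parent's type (single fence, array, chain) is invisible to whoever holds the finished fence.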

> behind the container extraction is so fences coming from the same
> context/timeline can be detected and merged. If you insert the
> container through a proxy, you're defeating the whole fence merging
> optimization.

Right. Finished fences have a single timeline too...

> 
> The second thing is that I'm not sure drivers were ever supposed to
> return fence containers in the first place, because the whole idea
> behind a fence context is that fences are emitted/signalled in
> seqno-order, and if the fence is encoding the state of multiple
> timelines that progress at their own pace, it becomes tricky to control
> that. I guess if it's always the same set of timelines that are
> combined, that would work.

Xe does this and it definitely works. We submit to multiple rings, and
when all rings signal a seqno, a chain or array signals -> the finished
fence signals. The queues used in this manner can only submit multi-ring
jobs, so the finished fence timeline stays intact. If you could mix
multi-ring submissions with single-ring submissions on the same queue,
then yes, this could break.

> 
> > One example is when a single job submits work to multiple rings
> > that are flipped in hardware at the same time.
> 
> We do have that in Panthor, but that's all explicit: in a single
> SUBMIT, you can have multiple jobs targeting different queues, each of
> them having their own set of deps/signal ops. The combination of all the
> signal ops into a container is left to the UMD. It could be automated
> kernel side, but that would be a flag on the SIGNAL op leading to the
> creation of a fence_array containing fences from multiple submitted
> jobs, rather than the driver combining stuff in the fence it returns in
> ::run_job().

See above. We have a dedicated queue type for these types of submissions
and a single job that submits to all the rings. We had multiple queues /
jobs in i915 to implement this, but it turns out it is much cleaner with
a single queue / single job / multiple rings model.

> 
> > 
> > Another case is late arming of hardware fences in run_job (which many
> > drivers do). The proxy fence is immediately available at arm time and
> > can be installed into dma-resv or syncobjs even though the actual
> > hardware fence is not yet available. I think most drivers could be
> > refactored to make the hardware fence immediately available at run_job,
> > though.
> 
> Yep, I also think we can arm the driver fence early in the case of
> JobQueue. The reason it couldn't be done before is because the
> scheduler was in the middle, deciding which entity to pull the next job
> from, which was changing the seqno a job driver-fence would be assigned
> (you can't guess that at queue time in that case).
> 

Xe doesn't need late arming, but it looks like multiple drivers
implement late arming, which may be required (?).

> [...]
> 
> > > > > + * **Reference counting**
> > > > > + *
> > > > > + * Jobs and queues are both reference counted.
> > > > > + *
> > > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > > + * job that was submitted to it.
> > > > > + *
> > > > > + * The queue holds its own reference to a job for as long as the job is
> > > > > + * internally tracked: from the moment the job is added to the pending list
> > > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > > + * worker, which calls drm_dep_job_put() to release that reference.  
> > > > 
> > > > Why not simply keep track that the job was completed, instead of relinquishing
> > > > the reference? We can then release the reference once the job is cleaned up
> > > > (by the queue, using a worker) in process context.  
> > 
> > I think that’s what I’m doing, while also allowing an opt-in path to
> > drop the job reference when it signals (in IRQ context)
> 
> Did you mean in !IRQ (or !atomic) context here? Feels weird to not
> defer the cleanup when you're in an IRQ/atomic context, but defer it
> when you're in a thread context.
> 

The final put of a job in this design can happen from an IRQ context (an
opt-in feature). xa_destroy() blows up if it is called from an IRQ
context, although maybe that could be worked around.

> > so we avoid
> > switching to a work item just to drop a ref. That seems like a
> > significant win in terms of CPU cycles.
> 
> Well, the cleanup path is probably not where latency matters the most.

Agree. But I do think avoiding a CPU context switch (work item) for a
very lightweight job cleanup (usually just dropping refs) will save CPU
cycles, and thus also things like power, etc...

> It's adding scheduling overhead, sure, but given all the stuff we defer
> already, I'm not too sure we're at saving a few cycles to get the
> cleanup done immediately. What's important to have is a way to signal
> fences in an atomic context, because this has an impact on latency.
> 

Yes. The signaling happens first, then drm_dep_job_put(), if the IRQ
opt-in is used.

> [...]
> 
> > > > > + /*
> > > > > + * Drop all input dependency fences now, in process context, before the
> > > > > + * final job put. Once the job is on the pending list its last reference
> > > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > > + * xa_destroy() would be unsafe.
> > > > > + */  
> > > > 
> > > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > > via ops->run_job()?
> > > > 
> > > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > > other than scheduling the queue worker?
> > > >   
> > 
> > Yes, this code is required to support dropping job refs directly in the
> > dma-fence callback (an opt-in feature). Again, this seems like a
> > significant win in terms of CPU cycles, although I haven’t collected
> > data yet.
> 
> If it significantly hurts the perf, I'd like to understand why, because
> to me it looks like pure-cleanup (no signaling involved), and thus no
> other process waiting for us to do the cleanup. The only thing that
> might have an impact is how fast you release the resources, and given
> it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
> like to understand which part of the immediate cleanup is causing a
> contention (basically which kind of resources the system is starving of)
> 

It was more that once we moved to a refcounted model, it is pretty
trivial to allow drm_dep_job_put() while the fence is signaling. It
doesn't really add any complexity either, which is why I added it.

Matt

> Regards,
> 
> Boris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 14:33         ` Danilo Krummrich
@ 2026-03-18 22:50           ` Matthew Brost
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Brost @ 2026-03-18 22:50 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, David Airlie, Maarten Lankhorst,
	Maxime Ripard, Philipp Stanner, Simona Vetter, Sumit Semwal,
	Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux, Miguel Ojeda

On Tue, Mar 17, 2026 at 03:33:13PM +0100, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 3:25 PM CET, Daniel Almeida wrote:
> >
> >
> >> On 17 Mar 2026, at 09:31, Danilo Krummrich <dakr@kernel.org> wrote:
> >> 
> >> On Tue Mar 17, 2026 at 3:47 AM CET, Daniel Almeida wrote:
> >>> I agree with what Danilo said below, i.e.:  IMHO, with the direction that DRM
> >>> is going, it is much more ergonomic to add a Rust component with a nice C
> >>> interface than doing it the other way around.
> >> 
> >> This is not exactly what I said. I was talking about the maintainance aspects
> >> and that a Rust Jobqueue implementation (for the reasons explained in my initial
> >> reply) is easily justifiable in this aspect, whereas another C implementation,
> >> that does *not* replace the existing DRM scheduler entirely, is much harder to
> >> justify from a maintainance perspective.
> >
> > Ok, I misunderstood your point a bit.
> >
> >> 
> >> I'm also not sure whether a C interface from the Rust side is easy to establish.
> >> We don't want to limit ourselves in terms of language capabilities for this and
> >> passing through all the additional infromation Rust carries in the type system
> >> might not be straight forward.
> >> 
> >> It would be an experiment, and it was one of the ideas behind the Rust Jobqueue
> >> to see how it turns if we try. Always with the fallback of having C
> >> infrastructure as an alternative when it doesn't work out well.
> >
> > From previous experience in doing Rust to C FFI in NVK, I don’t see, at
> > first, why this can’t work. But I agree with you, there may very well be
> > unanticipated things here and this part is indeed an experiment. No argument
> > from me here.
> >
> >> 
> >> Having this said, I don't see an issue with the drm_dep thing going forward if
> >> there is a path to replacing DRM sched entirely.

The only weird case I haven't wrapped my head around quite yet is the
ganged submissions that rely on the scheduled fence (PVR and AMDGPU do
this). Pretty much every other driver looks like it could be converted
with what I have in place in this series + local work to provide a
hardware scheduler...

> >
> > The issues I pointed out remain. Even if the plan is to have drm_dep + JobQueue
> > (and no drm_sched). I feel that my point of considering doing it in Rust remains.
> 
> I mean, as mentioned below, we should have a Rust Jobqueue as independent
> component. Or are you saying you'd consdider having only a Rust component with a
> C API eventually? If so, that'd be way too early to consider for various
> reasons.
> 

We need some C story one way or another, as we have C drivers and DRM
sched is not cutting it, nor is it maintainable.

> >> The Rust component should remain independent from this for the reasons mentioned
> >> in [1].
> >> 
> >> [1] https://lore.kernel.org/dri-devel/DH51W6XRQXYX.3M30IRYIWZLFG@kernel.org/

Fair enough. I read through [1], let me respond there.

Matt

> >
> > Ok
> >
> > — Daniel
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-18 22:40           ` Matthew Brost
@ 2026-03-19  9:57             ` Boris Brezillon
  2026-03-22  6:43               ` Matthew Brost
  0 siblings, 1 reply; 21+ messages in thread
From: Boris Brezillon @ 2026-03-19  9:57 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Wed, 18 Mar 2026 15:40:35 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> > > 
> > > So I don’t think Rust natively solves these types of problems, although
> > > I’ll concede that it does make refcounting a bit more sane.  
> > 
> > Rust won't magically defer the cleanup, nor will it dictate how you want
> > to do the queue teardown, those are things you need to implement. But it
> > should give visibility about object lifetimes, and guarantee that an
> > object that's still visible to some owners is usable (the notion of
> > usable is highly dependent on the object implementation).
> > 
> > Just a purely theoretical example of a multi-step queue teardown that
> > might be possible to encode in rust:
> > 
> > - MyJobQueue<Usable>: The job queue is currently exposed and usable.
> >   There's a ::destroy() method consuming 'self' and returning a
> >   MyJobQueue<Destroyed> object
> > - MyJobQueue<Destroyed>: The user asked for the workqueue to be
> >   destroyed. No new job can be pushed. Existing jobs that didn't make
> >   it to the FW queue are cancelled, jobs that are in-flight are
> >   cancelled if they can, or are just waited upon if they can't. When
> >   the whole destruction step is done, ::destroyed() is called, it
> >   consumes 'self' and returns a MyJobQueue<Inactive> object.
> > - MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
> >   any resources on this queue). It's ready to be cleaned up.
> >   ::cleanup() (or just ::drop()) defers the cleanup of some inner
> >   object that has been passed around between the various
> >   MyJobQueue<State> wrappers.
> > 
> > Each of the state transitions can happen asynchronously. A state
> > transition consumes the object in one state and returns a new object
> > in its new state. None of the transitions involves dropping a refcnt;
> > ownership is just transferred. The final MyJobQueue<Inactive> object is
> > the object we'll defer cleanup on.
> > 
> > It's a very high-level view of one way this can be implemented (I'm
> > sure there are others, probably better than my suggestion) in order to
> > make sure the object doesn't go away without the compiler enforcing
> > proper state transitions.
> >   
> 
> I'm sure Rust can implement this. My point about Rust is that it doesn't
> magically solve hard software architecture problems, but I will admit the
> ownership model and the way it can enforce locking at compile time are
> pretty cool.

It's not quite about rust directly solving those problems for you, it's
about rust forcing you to think about those problems in the first
place. So no, rust won't magically solve your multi-step teardown with
crazy CPU <-> Device synchronization etc, but it allows you to clearly
identify those steps, and think about how you want to represent them
without abusing other concepts, like object refcounting/ownership.
Everything I described, you can code it in C BTW, it's just that C is so
lax that you can also abuse other stuff to get to your ends, which might
or might not be safe, but more importantly, will very likely obfuscate
the code (even with good docs).
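
The typestate idea sketched above can be made concrete. Below is a
minimal, self-contained Rust illustration (all names are hypothetical,
no kernel APIs involved) of how the compiler can enforce the
Usable -> Destroyed -> Inactive transitions by consuming `self` at each
step:

```rust
use std::marker::PhantomData;

// Zero-sized marker types encoding the queue state in the type system.
struct Usable;
struct Destroyed;
struct Inactive;

struct MyJobQueue<State> {
    // Inner resource handed from state to state (FW handle, etc.).
    fw_handle: u32,
    _state: PhantomData<State>,
}

impl MyJobQueue<Usable> {
    fn new(fw_handle: u32) -> Self {
        Self { fw_handle, _state: PhantomData }
    }

    // Consumes the usable queue; no new jobs can be pushed afterwards,
    // because no MyJobQueue<Usable> exists anymore.
    fn destroy(self) -> MyJobQueue<Destroyed> {
        MyJobQueue { fw_handle: self.fw_handle, _state: PhantomData }
    }
}

impl MyJobQueue<Destroyed> {
    // Called once pending jobs are cancelled and in-flight jobs are
    // cancelled or waited upon.
    fn destroyed(self) -> MyJobQueue<Inactive> {
        MyJobQueue { fw_handle: self.fw_handle, _state: PhantomData }
    }
}

impl MyJobQueue<Inactive> {
    // Final cleanup; in a real driver this is where the deferred
    // cleanup of the inner object would happen.
    fn cleanup(self) -> u32 {
        self.fw_handle
    }
}
```

Trying to call `destroy()` twice, or `cleanup()` on a queue that was
never destroyed, is a compile error rather than a runtime bug, which is
the property being argued for here.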

> 
> > > > > > +/**
> > > > > > + * DOC: DRM dependency fence
> > > > > > + *
> > > > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > > > + * completes the job.
> > > > > > + *
> > > > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > > > + * NULL or an error).    
> > > > > 
> > > > > I thought this fence proxy mechanism was going away due to recent work being
> > > > > carried out by Christian?
> > > > >     
> > > 
> > > Consider the case where a driver’s hardware fence is implemented as a
> > > dma-fence-array or dma-fence-chain. You cannot install these types of
> > > fences into a dma-resv or into syncobjs, so a proxy fence is useful
> > > here.  
> > 
> > Hm, so that's a driver returning a dma_fence_array/chain through
> > ::run_job()? Why would we not want to have them directly exposed and
> > split up into singular fence objects at resv insertion time (I don't
> > think syncobjs care, but I might be wrong). I mean, one of the points  
> 
> You can stick dma-fence-arrays in syncobjs, but not chains.

Yeah, kinda makes sense, since timeline syncobjs use chains, and if the
chain rejects inner chains, it won't work.

> 
> Neither dma-fence-arrays nor chains can go into dma-resv.

They can't go directly in it, but those can be split into individual
fences and be inserted, which would achieve the same goal.
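
A toy model of that splitting idea (invented types, not the actual
dma-fence API): flatten a nested container into singular fences, then
let resv-style insertion keep only the newest fence per context, which
is the merging optimization a proxy fence would defeat:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Fence {
    context: u64,
    seqno: u64,
}

// A fence is either singular or a container of other fences,
// loosely mimicking dma_fence_array/chain nesting.
enum FenceNode {
    Single(Fence),
    Array(Vec<FenceNode>),
}

// Flatten nested containers into singular fences.
fn flatten(node: &FenceNode, out: &mut Vec<Fence>) {
    match node {
        FenceNode::Single(f) => out.push(*f),
        FenceNode::Array(children) => {
            for c in children {
                flatten(c, out);
            }
        }
    }
}

// Resv-style insertion: fences from the same context are merged,
// keeping only the newest seqno.
fn insert_resv(resv: &mut Vec<Fence>, f: Fence) {
    if let Some(slot) = resv.iter_mut().find(|e| e.context == f.context) {
        if f.seqno > slot.seqno {
            *slot = f;
        }
    } else {
        resv.push(f);
    }
}
```

Inserting the container through an opaque proxy fence instead would
leave `insert_resv` unable to see the per-context components at all.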

> 
> Hence why disconnecting a job's finished fence from the hardware
> fence is, IMO, a good idea to keep: it gives drivers flexibility on
> the hardware fences.

The thing is, I'm not sure drivers were ever meant to expose containers
through ::run_job().

> e.g., If this design didn't have a job's finished fence, I'd have to
> open code one Xe side.

There might be other reasons we'd like to keep the
drm_sched_fence-like proxy that I'm missing. But if it's the only one,
and the fence-combining pattern you're describing is common to multiple
drivers, we can provide a container implementation that's not a
fence_array, so you can use it to insert driver fences into other
containers. This way we wouldn't force the proxy model to all drivers,
but we would keep the code generic/re-usable.

> 
> > behind the container extraction is so fences coming from the same
> > context/timeline can be detected and merged. If you insert the
> > container through a proxy, you're defeating the whole fence merging
> > optimization.  
> 
> Right. Finished fences have a single timeline too...

Aren't you faking a single timeline though if you combine fences from
different engines running at their own pace into a container?

> 
> > 
> > The second thing is that I'm not sure drivers were ever supposed to
> > return fence containers in the first place, because the whole idea
> > behind a fence context is that fences are emitted/signalled in
> > seqno-order, and if the fence is encoding the state of multiple
> > timelines that progress at their own pace, it becomes tricky to control
> > that. I guess if it's always the same set of timelines that are
> > combined, that would work.  
> 
> Xe does this and it definitely works. We submit to multiple rings;
> when all rings signal a seqno, a chain or array signals -> the
> finished fence signals. The queues used in this manner can only
> submit multiple-ring jobs, so the finished fence timeline stays
> intact. If you queued multiple-ring submissions followed by a
> single-ring submission on the same queue, yes, this could break.

Okay, I had the same understanding, thanks for confirming.

> 
> >   
> > > One example is when a single job submits work to multiple rings
> > > that are flipped in hardware at the same time.  
> > 
> > We do have that in Panthor, but that's all explicit: in a single
> > SUBMIT, you can have multiple jobs targeting different queues, each of
> > them having their own set of deps/signal ops. The combination of all the
> > signal ops into a container is left to the UMD. It could be automated
> > kernel side, but that would be a flag on the SIGNAL op leading to the
> > creation of a fence_array containing fences from multiple submitted
> > jobs, rather than the driver combining stuff in the fence it returns in
> > ::run_job().  
> 
> See above. We have a dedicated queue type for these types of
> submissions and a single job that submits to all rings. We had
> multiple queues / jobs in i915 to implement this, but it turns out it
> is much cleaner with a single queue / single job / multiple rings
> model.

Hm, okay. It didn't turn into a mess in Panthor, but Xe is likely an
order of magnitude more complicated than Mali, so I'll refrain from
judging this design decision.

> 
> >   
> > > 
> > > Another case is late arming of hardware fences in run_job (which many
> > > drivers do). The proxy fence is immediately available at arm time and
> > > can be installed into dma-resv or syncobjs even though the actual
> > > hardware fence is not yet available. I think most drivers could be
> > > refactored to make the hardware fence immediately available at run_job,
> > > though.  
> > 
> > Yep, I also think we can arm the driver fence early in the case of
> > JobQueue. The reason it couldn't be done before is because the
> > scheduler was in the middle, deciding which entity to pull the next job
> > from, which was changing the seqno a job driver-fence would be assigned
> > (you can't guess that at queue time in that case).
> >   
> 
> Xe doesn't need late arming, but it looks like multiple drivers
> implement late arming, which may be required (?).

As I said, it's mostly a problem when you have a
single-HW-queue:multiple-contexts model, which is exactly what
drm_sched was designed for. I suspect early arming is not an issue for
any of the HW supporting FW-based scheduling (PVR, Mali, NVidia,
...). If you want to use drm_dep for all drivers currently using
drm_sched (I'm still not convinced it's a good idea to do that just
yet, because then you're going to pull in a lot of the complexity
we're trying to get rid of), then you need late arming of driver fences.

> 
> > [...]
> >   
> > > > > > + * **Reference counting**
> > > > > > + *
> > > > > > + * Jobs and queues are both reference counted.
> > > > > > + *
> > > > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > > > + * job that was submitted to it.
> > > > > > + *
> > > > > > + * The queue holds its own reference to a job for as long as the job is
> > > > > > + * internally tracked: from the moment the job is added to the pending list
> > > > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > > > + * worker, which calls drm_dep_job_put() to release that reference.    
> > > > > 
> > > > > Why not simply keep track that the job was completed, instead of relinquishing
> > > > > the reference? We can then release the reference once the job is cleaned up
> > > > > (by the queue, using a worker) in process context.    
> > > 
> > > I think that’s what I’m doing, while also allowing an opt-in path to
> > > drop the job reference when it signals (in IRQ context)  
> > 
> > Did you mean in !IRQ (or !atomic) context here? Feels weird to not
> > defer the cleanup when you're in an IRQ/atomic context, but defer it
> > when you're in a thread context.
> >   
> 
> The put of a job in this design can be done from an IRQ context as an
> opt-in feature. xa_destroy() blows up if it is called from an IRQ
> context, although maybe that could be worked around.

Making _put() safe to call from IRQ context is fine; what I'm saying is
that instead of doing a partial immediate cleanup, and the rest in a
worker, we can just defer everything: that is, have some
_deref_release() function called by kref_put() that would queue a work
item from which the actual release is done.

> 
> > > so we avoid
> > > switching to a work item just to drop a ref. That seems like a
> > > significant win in terms of CPU cycles.  
> > 
> > Well, the cleanup path is probably not where latency matters the most.  
> 
> Agree. But I do think avoiding a CPU context switch (work item) for a
> very lightweight job cleanup (usually just dropping refs) will save
> CPU cycles, and thus also things like power, etc...

That's the sort of statement I'd like to see backed by actual
numbers/scenarios proving that it actually makes a difference. The
mixed model where things are partially freed immediately and partially
deferred, sometimes even with conditionals on whether the deferral
happens, just makes building a mental model of this thing a nightmare,
which in turn usually leads to subtle bugs.

> 
> > It's adding scheduling overhead, sure, but given all the stuff we defer
> > already, I'm not too sure we're at saving a few cycles to get the
> > cleanup done immediately. What's important to have is a way to signal
> > fences in an atomic context, because this has an impact on latency.
> >   
> 
> Yes. The signaling happens first then drm_dep_job_put if IRQ opt-in.
> 
> > [...]
> >   
> > > > > > + /*
> > > > > > + * Drop all input dependency fences now, in process context, before the
> > > > > > + * final job put. Once the job is on the pending list its last reference
> > > > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > > > + * xa_destroy() would be unsafe.
> > > > > > + */    
> > > > > 
> > > > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > > > via ops->run_job()?
> > > > > 
> > > > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > > > other than scheduling the queue worker?
> > > > >     
> > > 
> > > Yes, this code is required to support dropping job refs directly in the
> > > dma-fence callback (an opt-in feature). Again, this seems like a
> > > significant win in terms of CPU cycles, although I haven’t collected
> > > data yet.  
> > 
> > If it significantly hurts the perf, I'd like to understand why, because
> > to me it looks like pure-cleanup (no signaling involved), and thus no
> > other process waiting for us to do the cleanup. The only thing that
> > might have an impact is how fast you release the resources, and given
> > it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
> > like to understand which part of the immediate cleanup is causing a
> > contention (basically which kind of resources the system is starving of)
> >   
> 
> It was more that, once we moved to a refcounted model, it is pretty
> trivial to allow drm_dep_job_put() when the fence is signaling. It
> doesn't really add any complexity either, which is why I added it.

It's not the refcount model I'm complaining about, it's the "part of it
is always freed immediately, part of it is deferred, but not always ..."
that happens in drm_dep_job_release() I'm questioning. I'd really
prefer something like:

static void drm_dep_job_release()
{
	// do it all unconditionally
}

static void drm_dep_job_defer_release()
{
	queue_work(&job->cleanup_work);
}

static void drm_dep_job_put()
{
	kref_put(job, drm_dep_job_defer_release);
}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-19  9:57             ` Boris Brezillon
@ 2026-03-22  6:43               ` Matthew Brost
  2026-03-23  7:58                 ` Matthew Brost
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2026-03-22  6:43 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Thu, Mar 19, 2026 at 10:57:29AM +0100, Boris Brezillon wrote:
> On Wed, 18 Mar 2026 15:40:35 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > > > 
> > > > So I don’t think Rust natively solves these types of problems, although
> > > > I’ll concede that it does make refcounting a bit more sane.  
> > > 
> > > Rust won't magically defer the cleanup, nor will it dictate how you want
> > > to do the queue teardown, those are things you need to implement. But it
> > > should give visibility about object lifetimes, and guarantee that an
> > > object that's still visible to some owners is usable (the notion of
> > > usable is highly dependent on the object implementation).
> > > 
> > > Just a purely theoretical example of a multi-step queue teardown that
> > > might be possible to encode in rust:
> > > 
> > > - MyJobQueue<Usable>: The job queue is currently exposed and usable.
> > >   There's a ::destroy() method consuming 'self' and returning a
> > >   MyJobQueue<Destroyed> object
> > > - MyJobQueue<Destroyed>: The user asked for the workqueue to be
> > >   destroyed. No new job can be pushed. Existing jobs that didn't make
> > >   it to the FW queue are cancelled, jobs that are in-flight are
> > >   cancelled if they can, or are just waited upon if they can't. When
> > >   the whole destruction step is done, ::destroyed() is called, it
> > >   consumes 'self' and returns a MyJobQueue<Inactive> object.
> > > - MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
> > >   any resources on this queue). It's ready to be cleaned up.
> > >   ::cleanup() (or just ::drop()) defers the cleanup of some inner
> > >   object that has been passed around between the various
> > >   MyJobQueue<State> wrappers.
> > > 
> > > Each of the state transition can happen asynchronously. A state
> > > transition consumes the object in one state, and returns a new object
> > > in its new state. None of the transition involves dropping a refcnt,
> > > ownership is just transferred. The final MyJobQueue<Inactive> object is
> > > the object we'll defer cleanup on.
> > > 
> > > It's a very high-level view of one way this can be implemented (I'm
> > > sure there are others, probably better than my suggestion) in order to
> > > make sure the object doesn't go away without the compiler enforcing
> > > proper state transitions.
> > >   
> > 
> > I'm sure Rust can implement this. My point about Rust is that it
> > doesn't magically solve hard software architecture problems, but I
> > will admit the ownership model and the way it can enforce locking at
> > compile time are pretty cool.
> 
> It's not quite about rust directly solving those problems for you, it's
> about rust forcing you to think about those problems in the first
> place. So no, rust won't magically solve your multi-step teardown with
> crazy CPU <-> Device synchronization etc, but it allows you to clearly
> identify those steps, and think about how you want to represent them
> without abusing other concepts, like object refcounting/ownership.
> Everything I described, you can code it in C BTW, it's just that C is so
> lax that you can also abuse other stuff to get to your ends, which might
> or might not be safe, but more importantly, will very likely obfuscate
> the code (even with good docs).
> 

This is very well put, and I completely agree. Sorry—I get annoyed by
the Rust comments. It solves some classes of problems, but it doesn’t
magically solve complex software architecture issues that need to be
thoughtfully designed.

> > 
> > > > > > > +/**
> > > > > > > + * DOC: DRM dependency fence
> > > > > > > + *
> > > > > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > > > > + * completes the job.
> > > > > > > + *
> > > > > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > > > > + * NULL or an error).    
> > > > > > 
> > > > > > I thought this fence proxy mechanism was going away due to recent work being
> > > > > > carried out by Christian?
> > > > > >     
> > > > 
> > > > Consider the case where a driver’s hardware fence is implemented as a
> > > > dma-fence-array or dma-fence-chain. You cannot install these types of
> > > > fences into a dma-resv or into syncobjs, so a proxy fence is useful
> > > > here.  
> > > 
> > > Hm, so that's a driver returning a dma_fence_array/chain through
> > > ::run_job()? Why would we not want to have them directly exposed and
> > > split up into singular fence objects at resv insertion time (I don't
> > > think syncobjs care, but I might be wrong). I mean, one of the points  
> > 
> > You can stick dma-fence-arrays in syncobjs, but not chains.
> 
> Yeah, kinda makes sense, since timeline syncobjs use chains, and if the
> chain rejects inner chains, it won't work.
> 

+1, Exactly.

> > 
> > Neither dma-fence-arrays nor chains can go into dma-resv.
> 
> They can't go directly in it, but those can be split into individual
> fences and be inserted, which would achieve the same goal.
> 

Yes, but now it becomes a driver problem (maybe only mine) rather than
having an opaque job fence that can be inserted. In my opinion, it’s
best to keep the job vs. hardware fence abstraction.

> > 
> > Hence why disconnecting a job's finished fence from the hardware
> > fence is, IMO, a good idea to keep: it gives drivers flexibility on
> > the hardware fences.
> 
> The thing is, I'm not sure drivers were ever meant to expose containers
> through ::run_job().
> 

Well there haven't been any rules...

> > e.g., If this design didn't have a job's finished fence, I'd have to
> > open code one Xe side.
> 
> There might be other reasons we'd like to keep the
> drm_sched_fence-like proxy that I'm missing. But if it's the only one,
> and the fence-combining pattern you're describing is common to multiple
> drivers, we can provide a container implementation that's not a
> fence_array, so you can use it to insert driver fences into other
> containers. This way we wouldn't force the proxy model to all drivers,
> but we would keep the code generic/re-usable.
> 
> > 
> > > behind the container extraction is so fences coming from the same
> > > context/timeline can be detected and merged. If you insert the
> > > container through a proxy, you're defeating the whole fence merging
> > > optimization.  
> > 
> > Right. Finished fences have a single timeline too...
> 
> Aren't you faking a single timeline though if you combine fences from
> different engines running at their own pace into a container?
> 
> > 
> > > 
> > > The second thing is that I'm not sure drivers were ever supposed to
> > > return fence containers in the first place, because the whole idea
> > > behind a fence context is that fences are emitted/signalled in
> > > seqno-order, and if the fence is encoding the state of multiple
> > > timelines that progress at their own pace, it becomes tricky to control
> > > that. I guess if it's always the same set of timelines that are
> > > combined, that would work.  
> > 
> > Xe does this and it definitely works. We submit to multiple rings;
> > when all rings signal a seqno, a chain or array signals -> the
> > finished fence signals. The queues used in this manner can only
> > submit multiple-ring jobs, so the finished fence timeline stays
> > intact. If you queued multiple-ring submissions followed by a
> > single-ring submission on the same queue, yes, this could break.
> 
> Okay, I had the same understanding, thanks for confirming.
> 

I think the last three comments are resolved here—it’s a queue timeline.
As long as the queue has consistent rules (i.e., submits to a consistent
set of rings), this whole approach makes sense?

> > 
> > >   
> > > > One example is when a single job submits work to multiple rings
> > > > that are flipped in hardware at the same time.  
> > > 
> > > We do have that in Panthor, but that's all explicit: in a single
> > > SUBMIT, you can have multiple jobs targeting different queues, each of
> > > them having their own set of deps/signal ops. The combination of all the
> > > signal ops into a container is left to the UMD. It could be automated
> > > kernel side, but that would be a flag on the SIGNAL op leading to the
> > > creation of a fence_array containing fences from multiple submitted
> > > jobs, rather than the driver combining stuff in the fence it returns in
> > > ::run_job().  
> > 
> > See above. We have a dedicated queue type for these types of
> > submissions and a single job that submits to all rings. We had
> > multiple queues / jobs in i915 to implement this, but it turns out
> > it is much cleaner with a single queue / single job / multiple rings
> > model.
> 
> Hm, okay. It didn't turn into a mess in Panthor, but Xe is likely an
> order of magnitude more complicated than Mali, so I'll refrain from
> judging this design decision.
> 

Yes, Xe is a beast, but we tend to build complexity into components and
layers to manage it. That is what I’m attempting to do here.

> > 
> > >   
> > > > 
> > > > Another case is late arming of hardware fences in run_job (which many
> > > > drivers do). The proxy fence is immediately available at arm time and
> > > > can be installed into dma-resv or syncobjs even though the actual
> > > > hardware fence is not yet available. I think most drivers could be
> > > > refactored to make the hardware fence immediately available at run_job,
> > > > though.  
> > > 
> > > Yep, I also think we can arm the driver fence early in the case of
> > > JobQueue. The reason it couldn't be done before is because the
> > > scheduler was in the middle, deciding which entity to pull the next job
> > > from, which was changing the seqno a job driver-fence would be assigned
> > > (you can't guess that at queue time in that case).
> > >   
> > 
> > Xe doesn't need late arming, but it looks like multiple drivers
> > implement late arming, which may be required (?).
> 
> As I said, it's mostly a problem when you have a
> single-HW-queue:multiple-contexts model, which is exactly what
> drm_sched was designed for. I suspect early arming is not an issue for
> any of the HW supporting FW-based scheduling (PVR, Mali, NVidia,
> ...). If you want to use drm_dep for all drivers currently using
> drm_sched (I'm still not convinced it's a good idea to do that just
> yet, because then you're going to pull in a lot of the complexity
> we're trying to get rid of), then you need late arming of driver fences.
> 

Yes, even the hardware scheduling component [1] I hacked together relied
on no late arming. But even then, you can arm a dma-fence early and
assign a hardware seqno later in run_job()—those are two different
things.

[1] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/22c8aa993b5c9e4ad0c312af2f3e032273d20966#line_7c49af3ee_A319
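
That distinction can be sketched in a few lines (hypothetical names,
not dma-fence code): the fence object is armed and shareable early,
while the hardware seqno is only assigned later, from run_job():

```rust
use std::sync::{Arc, Mutex};

struct HwFence {
    // The fence context is known at arm time.
    context: u64,
    // The hardware seqno is unknown until run_job() fills it in.
    seqno: Mutex<Option<u64>>,
}

impl HwFence {
    // Arm early: the fence exists and can be handed out (e.g. to a
    // resv/syncobj stand-in) before the hardware seqno is known.
    fn arm(context: u64) -> Arc<Self> {
        Arc::new(Self { context, seqno: Mutex::new(None) })
    }

    // Late seqno assignment, done from run_job() once the ring
    // position is known.
    fn assign_seqno(&self, seqno: u64) {
        *self.seqno.lock().unwrap() = Some(seqno);
    }

    fn seqno(&self) -> Option<u64> {
        *self.seqno.lock().unwrap()
    }

    fn context(&self) -> u64 {
        self.context
    }
}
```

Holders of the `Arc` installed at arm time observe the seqno appear
later, which is the "arm early, assign late" split described above.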

> > 
> > > [...]
> > >   
> > > > > > > + * **Reference counting**
> > > > > > > + *
> > > > > > > + * Jobs and queues are both reference counted.
> > > > > > > + *
> > > > > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > > > > + * job that was submitted to it.
> > > > > > > + *
> > > > > > > + * The queue holds its own reference to a job for as long as the job is
> > > > > > > + * internally tracked: from the moment the job is added to the pending list
> > > > > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > > > > + * worker, which calls drm_dep_job_put() to release that reference.    
> > > > > > 
> > > > > > Why not simply keep track that the job was completed, instead of relinquishing
> > > > > > the reference? We can then release the reference once the job is cleaned up
> > > > > > (by the queue, using a worker) in process context.    
> > > > 
> > > > I think that’s what I’m doing, while also allowing an opt-in path to
> > > > drop the job reference when it signals (in IRQ context)  
> > > 
> > > Did you mean in !IRQ (or !atomic) context here? Feels weird to not
> > > defer the cleanup when you're in an IRQ/atomic context, but defer it
> > > when you're in a thread context.
> > >   
> > 
> > The put of a job in this design can be done from an IRQ context as
> > an opt-in feature. xa_destroy() blows up if it is called from an IRQ
> > context, although maybe that could be worked around.
> 
> Making _put() safe to call from IRQ context is fine; what I'm saying is
> that instead of doing a partial immediate cleanup, and the rest in a
> worker, we can just defer everything: that is, have some
> _deref_release() function called by kref_put() that would queue a work
> item from which the actual release is done.
> 

See below.

> > 
> > > > so we avoid
> > > > switching to a work item just to drop a ref. That seems like a
> > > > significant win in terms of CPU cycles.  
> > > 
> > > Well, the cleanup path is probably not where latency matters the most.  
> > 
> > Agree. But I do think avoiding a CPU context switch (work item) for
> > a very lightweight job cleanup (usually just dropping refs) will
> > save CPU cycles, and thus also things like power, etc...
> 
> That's the sort of statement I'd like to see backed by actual
> numbers/scenarios proving that it actually makes a difference. The

I disagree. This is not a locking micro-optimization, for example. It is
a software architecture choice that says “do not trigger a CPU context
switch to free a job,” where each switch costs thousands of cycles. This
will have an effect on CPU utilization and, thus, power.

> mixed model where things are partially freed immediately and partially
> deferred, sometimes even with conditionals on whether the deferral
> happens, just makes building a mental model of this thing a nightmare,
> which in turn usually leads to subtle bugs.
> 

See above—managing complexity in components. This works in both modes. I
refactored Xe so it also works in IRQ context. If it would make you feel
better, I can ask my company to commit CI resources so that non-IRQ mode
consistently works too—it’s just a single API flag on the queue. But
then maybe other companies should also commit to public CI.

> > 
> > > It's adding scheduling overhead, sure, but given all the stuff we defer
> > > already, I'm not too sure we're at saving a few cycles to get the
> > > cleanup done immediately. What's important to have is a way to signal
> > > fences in an atomic context, because this has an impact on latency.
> > >   
> > 
> > Yes. The signaling happens first then drm_dep_job_put if IRQ opt-in.
> > 
> > > [...]
> > >   
> > > > > > > + /*
> > > > > > > + * Drop all input dependency fences now, in process context, before the
> > > > > > > + * final job put. Once the job is on the pending list its last reference
> > > > > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > > > > + * xa_destroy() would be unsafe.
> > > > > > > + */    
> > > > > > 
> > > > > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > > > > via ops->run_job()?
> > > > > > 
> > > > > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > > > > other than scheduling the queue worker?
> > > > > >     
> > > > 
> > > > Yes, this code is required to support dropping job refs directly in the
> > > > dma-fence callback (an opt-in feature). Again, this seems like a
> > > > significant win in terms of CPU cycles, although I haven’t collected
> > > > data yet.  
> > > 
> > > If it significantly hurts the perf, I'd like to understand why, because
> > > to me it looks like pure-cleanup (no signaling involved), and thus no
> > > other process waiting for us to do the cleanup. The only thing that
> > > might have an impact is how fast you release the resources, and given
> > > it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
> > > like to understand which part of the immediate cleanup is causing a
> > > contention (basically which kind of resources the system is starving of)
> > >   
> > 
> > It was more that, once we moved to a refcounted model, it is pretty
> > trivial to allow drm_dep_job_put() when the fence is signaling. It
> > doesn't really add any complexity either, which is why I added it.
> 
> It's not the refcount model I'm complaining about, it's the "part of it
> is always freed immediately, part of it is deferred, but not always ..."
> that happens in drm_dep_job_release() I'm questioning. I'd really
> prefer something like:
> 

You are completely missing the point here.

Here is what I’ve reduced my job put to:

	xe_sched_job_free_fences(job);
	dma_fence_put(job->fence);
	job_free(job);
	atomic_dec(&q->job_cnt);
	xe_pm_runtime_put(xe);

These are lightweight (IRQ-safe) operations that never need to be done
in a work item—so why kick one?
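
As a toy model of the opt-in split being discussed (all names invented,
nothing here is kernel code): a per-queue flag decides whether the
final put releases the job inline, as in the IRQ-safe path above, or
bounces everything to a work item, as Boris proposes:

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy)]
enum PutMode {
    Inline,   // lightweight, IRQ-safe release at the final put
    Deferred, // always bounce the release to a work item
}

struct Job {
    id: u32,
}

struct Queue {
    mode: PutMode,
    // Stand-in for a kernel workqueue holding deferred release items.
    work: VecDeque<Job>,
    // Records which jobs have been fully released, for illustration.
    released: Vec<u32>,
}

impl Queue {
    // Final put: either release now or queue the cleanup work.
    fn put_job(&mut self, job: Job) {
        match self.mode {
            PutMode::Inline => self.release(job),
            PutMode::Deferred => self.work.push_back(job),
        }
    }

    // Process context: drain the deferred work items.
    fn run_work(&mut self) {
        while let Some(job) = self.work.pop_front() {
            self.release(job);
        }
    }

    fn release(&mut self, job: Job) {
        // Drop fences, decrement counters, runtime PM put, ...
        self.released.push(job.id);
    }
}
```

The argument in this subthread is whether the `Inline` arm is worth the
extra mental model, or whether everything should take the `Deferred`
path unconditionally.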

Matt

> static void drm_dep_job_release()
> {
> 	// do it all unconditionally
> }
> 
> static void drm_dep_job_defer_release()
> {
> 	queue_work(&job->cleanup_work);
> }
> 
> static void drm_dep_job_put()
> {
> 	kref_put(job, drm_dep_job_defer_release);
> }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-22  6:43               ` Matthew Brost
@ 2026-03-23  7:58                 ` Matthew Brost
  2026-03-23 10:06                   ` Boris Brezillon
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2026-03-23  7:58 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Sat, Mar 21, 2026 at 11:43:12PM -0700, Matthew Brost wrote:
> On Thu, Mar 19, 2026 at 10:57:29AM +0100, Boris Brezillon wrote:
> > On Wed, 18 Mar 2026 15:40:35 -0700
> > Matthew Brost <matthew.brost@intel.com> wrote:
> > 
> > > > > 
> > > > > So I don’t think Rust natively solves these types of problems, although
> > > > > I’ll concede that it does make refcounting a bit more sane.  
> > > > 
> > > > Rust won't magically defer the cleanup, nor will it dictate how you want
> > > > to do the queue teardown, those are things you need to implement. But it
> > > > should give visibility about object lifetimes, and guarantee that an
> > > > object that's still visible to some owners is usable (the notion of
> > > > usable is highly dependent on the object implementation).
> > > > 
> > > > Just a purely theoretical example of a multi-step queue teardown that
> > > > might be possible to encode in rust:
> > > > 
> > > > - MyJobQueue<Usable>: The job queue is currently exposed and usable.
> > > >   There's a ::destroy() method consuming 'self' and returning a
> > > >   MyJobQueue<Destroyed> object
> > > > - MyJobQueue<Destroyed>: The user asked for the workqueue to be
> > > >   destroyed. No new job can be pushed. Existing jobs that didn't make
> > > >   it to the FW queue are cancelled, jobs that are in-flight are
> > > >   cancelled if they can, or are just waited upon if they can't. When
> > > >   the whole destruction step is done, ::destroyed() is called, it
> > > >   consumes 'self' and returns a MyJobQueue<Inactive> object.
> > > > - MyJobQueue<Inactive>: The queue is no longer active (HW doesn't have
> > > >   any resources on this queue). It's ready to be cleaned up.
> > > >   ::cleanup() (or just ::drop()) defers the cleanup of some inner
> > > >   object that has been passed around between the various
> > > >   MyJobQueue<State> wrappers.
> > > > 
> > > > Each of the state transitions can happen asynchronously. A state
> > > > transition consumes the object in one state, and returns a new
> > > > object in its new state. None of the transitions involves dropping
> > > > a refcnt; ownership is just transferred. The final
> > > > MyJobQueue<Inactive> object is the object we'll defer cleanup on.
> > > > 
> > > > It's a very high-level view of one way this can be implemented (I'm
> > > > sure there are others, probably better than my suggestion) in order to
> > > > make sure the object doesn't go away without the compiler enforcing
> > > > proper state transitions.
> > > >   
> > > 
> > > I'm sure Rust can implement this. My point about Rust is that it
> > > doesn't magically solve hard software architecture problems, but I
> > > will admit the ownership model and the way it can enforce locking at
> > > compile time are pretty cool.
> > 
> > It's not quite about rust directly solving those problems for you, it's
> > about rust forcing you to think about those problems in the first
> > place. So no, rust won't magically solve your multi-step teardown with
> > crazy CPU <-> Device synchronization etc, but it allows you to clearly
> > identify those steps, and think about how you want to represent them
> > without abusing other concepts, like object refcounting/ownership.
> > Everything I described, you can code it in C BTW, it's just that C is so
> > lax that you can also abuse other stuff to get to your ends, which might
> > or might not be safe, but more importantly, will very likely obfuscate
> > the code (even with good docs).
> > 
> 
> This is very well put, and I completely agree. Sorry—I get annoyed by
> the Rust comments. It solves some classes of problems, but it doesn’t
> magically solve complex software architecture issues that need to be
> thoughtfully designed.
> 
> > > 
> > > > > > > > +/**
> > > > > > > > + * DOC: DRM dependency fence
> > > > > > > > + *
> > > > > > > > + * Each struct drm_dep_job has an associated struct drm_dep_fence that
> > > > > > > > + * provides a single dma_fence (@finished) signalled when the hardware
> > > > > > > > + * completes the job.
> > > > > > > > + *
> > > > > > > > + * The hardware fence returned by &drm_dep_queue_ops.run_job is stored as
> > > > > > > > + * @parent. @finished is chained to @parent via drm_dep_job_done_cb() and
> > > > > > > > + * is signalled once @parent signals (or immediately if run_job() returns
> > > > > > > > + * NULL or an error).    
> > > > > > > 
> > > > > > > I thought this fence proxy mechanism was going away due to recent work being
> > > > > > > carried out by Christian?
> > > > > > >     
> > > > > 
> > > > > Consider the case where a driver’s hardware fence is implemented as a
> > > > > dma-fence-array or dma-fence-chain. You cannot install these types of
> > > > > fences into a dma-resv or into syncobjs, so a proxy fence is useful
> > > > > here.  
> > > > 
> > > > Hm, so that's a driver returning a dma_fence_array/chain through
> > > > ::run_job()? Why would we not want to have them directly exposed and
> > > > split up into singular fence objects at resv insertion time (I don't
> > > > think syncobjs care, but I might be wrong). I mean, one of the point  
> > > 
> > > You can stick dma-fence-arrays in syncobjs, but not chains.
> > 
> > Yeah, kinda makes sense, since timeline syncobjs use chains, and if the
> > chain rejects inner chains, it won't work.
> > 
> 
> +1, Exactly.
> 
> > > 
> > > Neither dma-fence-arrays/chain can go into dma-resv.
> > 
> > They can't go directly in it, but those can be split into individual
> > fences and be inserted, which would achieve the same goal.
> > 
> 
> Yes, but now it becomes a driver problem (maybe only mine) rather than
> an opaque job fence that can be inserted. In my opinion, it’s best to
> keep the job vs. hardware fence abstraction.
> 
> > > 
> > > Hence why disconnecting a job's finished fence from the hardware
> > > fence is, IMO, a good idea to keep, as it gives drivers flexibility
> > > on the hardware fences.
> > 
> > The thing is, I'm not sure drivers were ever meant to expose containers
> > through ::run_job().
> > 
> 
> Well there haven't been any rules...
> 
> > > e.g., If this design didn't have a job's finished fence, I'd have to
> > > open code one Xe side.
> > 
> > There might be other reasons we'd like to keep the
> > drm_sched_fence-like proxy that I'm missing. But if it's the only one,
> > and the fence-combining pattern you're describing is common to multiple
> > drivers, we can provide a container implementation that's not a
> > fence_array, so you can use it to insert driver fences into other
> > containers. This way we wouldn't force the proxy model to all drivers,
> > but we would keep the code generic/re-usable.
> > 
> > > 
> > > > behind the container extraction is so fences coming from the same
> > > > context/timeline can be detected and merged. If you insert the
> > > > container through a proxy, you're defeating the whole fence merging
> > > > optimization.  
> > > 
> > > Right. Finished fences have a single timeline too...
> > 
> > Aren't you faking a single timeline though if you combine fences from
> > different engines running at their own pace into a container?
> > 
> > > 
> > > > 
> > > > The second thing is that I'm not sure drivers were ever supposed to
> > > > return fence containers in the first place, because the whole idea
> > > > behind a fence context is that fences are emitted/signalled in
> > > > seqno-order, and if the fence is encoding the state of multiple
> > > > timelines that progress at their own pace, it becomes tricky to control
> > > > that. I guess if it's always the same set of timelines that are
> > > > combined, that would work.  
> > > 
> > > Xe does this and it definitely works. We submit to multiple rings;
> > > when all rings signal a seqno, a chain or array signals -> the
> > > finished fence signals. The queues used in this manner can only
> > > submit multiple-ring jobs, so the finished fence timeline stays
> > > intact. If you could mix a multiple-ring and a single-ring
> > > submission on the same queue, yes, this could break.
> > 
> > Okay, I had the same understanding, thanks for confirming.
> > 
> 
> I think the last three comments are resolved here—it’s a queue timeline.
> As long as the queue has consistent rules (i.e., submits to a consistent
> set of rings), this whole approach makes sense?
> 
> > > 
> > > >   
> > > > > One example is when a single job submits work to multiple rings
> > > > > that are flipped in hardware at the same time.  
> > > > 
> > > > We do have that in Panthor, but that's all explicit: in a single
> > > > SUBMIT, you can have multiple jobs targeting different queues, each of
> > > > them having their own set of deps/signal ops. The combination of all the
> > > > signal ops into a container is left to the UMD. It could be automated
> > > > kernel side, but that would be a flag on the SIGNAL op leading to the
> > > > creation of a fence_array containing fences from multiple submitted
> > > > jobs, rather than the driver combining stuff in the fence it returns in
> > > > ::run_job().  
> > > 
> > > See above. We have a dedicated queue type for these types of
> > > submissions and a single job that submits to all the rings. We had
> > > multiple queues / jobs in i915 to implement this, but it turns out
> > > it is much cleaner with a single queue / single job / multiple
> > > rings model.
> > 
> > Hm, okay. It didn't turn into a mess in Panthor, but Xe is likely an
> > order of magnitude more complicated than Mali, so I'll refrain from
> > judging this design decision.
> > 
> 
> Yes, Xe is a beast, but we tend to build complexity into components and
> layers to manage it. That is what I’m attempting to do here.
> 
> > > 
> > > >   
> > > > > 
> > > > > Another case is late arming of hardware fences in run_job (which many
> > > > > drivers do). The proxy fence is immediately available at arm time and
> > > > > can be installed into dma-resv or syncobjs even though the actual
> > > > > hardware fence is not yet available. I think most drivers could be
> > > > > refactored to make the hardware fence immediately available at run_job,
> > > > > though.  
> > > > 
> > > > Yep, I also think we can arm the driver fence early in the case of
> > > > JobQueue. The reason it couldn't be done before is because the
> > > > scheduler was in the middle, deciding which entity to pull the next job
> > > > from, which was changing the seqno a job driver-fence would be assigned
> > > > (you can't guess that at queue time in that case).
> > > >   
> > > 
> > > Xe doesn't need late arming, but it looks like multiple drivers
> > > implement late arming, which may be required (?).
> > 
> > As I said, it's mostly a problem when you have a
> > single-HW-queue:multiple-contexts model, which is exactly what
> > drm_sched was designed for. I suspect early arming is not an issue for
> > any of the HW supporting FW-based scheduling (PVR, Mali, NVidia,
> > ...). If you want to use drm_dep for all drivers currently using
> > drm_sched (I'm still not convinced it is a good idea to do that
> > just yet, because then you're going to pull a lot of the complexity
> > we're trying to get rid of), then you need late arming of driver fences.
> > 
> 
> Yes, even the hardware scheduling component [1] I hacked together relied
> on no late arming. But even then, you can arm a dma-fence early and
> assign a hardware seqno later in run_job()—those are two different
> things.
> 
> [1] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/22c8aa993b5c9e4ad0c312af2f3e032273d20966#line_7c49af3ee_A319
> 
> > > 
> > > > [...]
> > > >   
> > > > > > > > + * **Reference counting**
> > > > > > > > + *
> > > > > > > > + * Jobs and queues are both reference counted.
> > > > > > > > + *
> > > > > > > > + * A job holds a reference to its queue from drm_dep_job_init() until
> > > > > > > > + * drm_dep_job_put() drops the job's last reference and its release callback
> > > > > > > > + * runs. This ensures the queue remains valid for the entire lifetime of any
> > > > > > > > + * job that was submitted to it.
> > > > > > > > + *
> > > > > > > > + * The queue holds its own reference to a job for as long as the job is
> > > > > > > > + * internally tracked: from the moment the job is added to the pending list
> > > > > > > > + * in drm_dep_queue_run_job() until drm_dep_job_done() kicks the put_job
> > > > > > > > + * worker, which calls drm_dep_job_put() to release that reference.    
> > > > > > > 
> > > > > > > Why not simply keep track that the job was completed, instead of relinquishing
> > > > > > > the reference? We can then release the reference once the job is cleaned up
> > > > > > > (by the queue, using a worker) in process context.    
> > > > > 
> > > > > I think that’s what I’m doing, while also allowing an opt-in path to
> > > > > drop the job reference when it signals (in IRQ context)  
> > > > 
> > > > Did you mean in !IRQ (or !atomic) context here? Feels weird to not
> > > > defer the cleanup when you're in an IRQ/atomic context, but defer it
> > > > when you're in a thread context.
> > > >   
> > > 
> > > The put of a job in this design can happen from an IRQ context (an
> > > opt-in feature). xa_destroy() blows up if it is called from an IRQ
> > > context, although maybe that could be worked around.
> > 
> > Making it so _put() in IRQ context is safe is fine, what I'm saying is
> > that instead of doing a partial immediate cleanup, and the rest in a
> > worker, we can just defer everything: that is, have some
> > _deref_release() function called by kref_put() that would queue a work
> > item from which the actual release is done.
> > 
> 
> See below.
> 
> > > 
> > > > > so we avoid
> > > > > switching to a work item just to drop a ref. That seems like a
> > > > > significant win in terms of CPU cycles.  
> > > > 
> > > > Well, the cleanup path is probably not where latency matters the most.  
> > > 
> > > Agree. But I do think avoiding a CPU context switch (work item) for a
> > > very lightweight job cleanup (usually just drop refs) will save CPU
> > > cycles, thus also things like power, etc...
> > 
> > That's the sort of statement I'd like to be backed by actual
> > numbers/scenarios proving that it actually makes a difference. The
> 
> I disagree. This is not a locking micro-optimization, for example. It is
> a software architecture choice that says “do not trigger a CPU context
> switch to free a job,” which costs thousands of cycles. This will have an
> effect on CPU utilization and, thus, power.
> 
> > mixed model where things are partially freed immediately/partially
> > deferred, and sometimes even with conditionals for whether the deferral
> > happens or not, it just makes building a mental model of this thing a
> > nightmare, which in turn usually leads to subtle bugs.
> > 
> 
> See above—managing complexity in components. This works in both modes. I
> refactored Xe so it also works in IRQ context. If it would make you feel
> better, I can ask my company to commit CI resources so non-IRQ mode
> consistently works too—it’s just a single API flag on the queue. But
> then maybe other companies should also commit to public CI.
> 
> > > 
> > > > It's adding scheduling overhead, sure, but given all the stuff we defer
> > > > already, I'm not too sure we're at saving a few cycles to get the
> > > > cleanup done immediately. What's important to have is a way to signal
> > > > fences in an atomic context, because this has an impact on latency.
> > > >   
> > > 
> > > Yes. The signaling happens first, then drm_dep_job_put() if the IRQ
> > > opt-in is set.
> > > 
> > > > [...]
> > > >   
> > > > > > > > + /*
> > > > > > > > + * Drop all input dependency fences now, in process context, before the
> > > > > > > > + * final job put. Once the job is on the pending list its last reference
> > > > > > > > + * may be dropped from a dma_fence callback (IRQ context), where calling
> > > > > > > > + * xa_destroy() would be unsafe.
> > > > > > > > + */    
> > > > > > > 
> > > > > > > I assume that “pending” is the list of jobs that have been handed to the driver
> > > > > > > via ops->run_job()?
> > > > > > > 
> > > > > > > Can’t this problem be solved by not doing anything inside a dma_fence callback
> > > > > > > other than scheduling the queue worker?
> > > > > > >     
> > > > > 
> > > > > Yes, this code is required to support dropping job refs directly in the
> > > > > dma-fence callback (an opt-in feature). Again, this seems like a
> > > > > significant win in terms of CPU cycles, although I haven’t collected
> > > > > data yet.  
> > > > 
> > > > If it significantly hurts the perf, I'd like to understand why, because
> > > > to me it looks like pure-cleanup (no signaling involved), and thus no
> > > > other process waiting for us to do the cleanup. The only thing that
> > > > might have an impact is how fast you release the resources, and given
> > > > it's only a partial cleanup (xa_destroy() still has to be deferred), I'd
> > > > like to understand which part of the immediate cleanup is causing a
> > > > contention (basically which kind of resources the system is starving of)
> > > >   
> > > 
> > > It was more that once we moved to a refcounted model, it is pretty
> > > trivial to allow drm_dep_job_put() when the fence is signaling. It
> > > doesn't really add any complexity either, which is why I added it.
> > 
> > It's not the refcount model I'm complaining about, it's the "part of it
> > is always freed immediately, part of it is deferred, but not always ..."
> > that happens in drm_dep_job_release() I'm questioning. I'd really
> > prefer something like:
> > 
> 
> You are completely missing the point here.
> 

Let me rephrase this — I realize this may come across as rude, which is
not my intent. I believe there is simply a disconnect in understanding
the constraints.

In my example below, the job release completes within bounded time
constraints, which makes it suitable for direct release in IRQ context,
bypassing the need for a work item that would otherwise incur a costly
CPU context switch.
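
To make the shape of this concrete, here is a compressed userspace
model of the two release paths (the flag and all names are made up for
the sketch; this is not the actual drm_dep API). The final put either
releases inline, or pushes the job onto a list that stands in for the
cleanup work item:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the opt-in release path; not the real drm_dep API. */
struct model_job {
	int refcount;
	bool irq_safe_release;		/* the hypothetical opt-in flag */
	bool released;
	struct model_job *next;		/* link for the deferred list */
};

static struct model_job *deferred;	/* stands in for the cleanup workqueue */

static void model_job_release(struct model_job *job)
{
	/* drop fences, free memory, decrement counters: all bounded time */
	job->released = true;
}

static void model_job_put(struct model_job *job)
{
	if (--job->refcount)
		return;

	if (job->irq_safe_release) {
		model_job_release(job);	/* direct, safe from atomic context */
	} else {
		job->next = deferred;	/* defer to process context */
		deferred = job;
	}
}

static void model_cleanup_worker(void)	/* the work item the flag avoids */
{
	while (deferred) {
		struct model_job *job = deferred;

		deferred = job->next;
		model_job_release(job);
	}
}
```

With the flag set, the final put completes entirely in the calling
(possibly atomic) context and the worker is never involved.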

Matt

> Here is what I’ve reduced my job put to:
> 
> 188         xe_sched_job_free_fences(job);
> 189         dma_fence_put(job->fence);
> 190         job_free(job);
> 191         atomic_dec(&q->job_cnt);
> 192         xe_pm_runtime_put(xe);
> 
> These are lightweight (IRQ-safe) operations that never need to be done
> in a work item—so why kick one?
> 
> Matt
> 
> > static void drm_dep_job_release()
> > {
> > 	// do it all unconditionally
> > }
> > 
> > static void drm_dep_job_defer_release()
> > {
> > 	queue_work(&job->cleanup_work);
> > }
> > 
> > static void drm_dep_job_put()
> > {
> > 	kref_put(job, drm_dep_job_defer_release);
> > }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-23  7:58                 ` Matthew Brost
@ 2026-03-23 10:06                   ` Boris Brezillon
  2026-03-23 17:11                     ` Matthew Brost
  0 siblings, 1 reply; 21+ messages in thread
From: Boris Brezillon @ 2026-03-23 10:06 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Mon, 23 Mar 2026 00:58:51 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> > > It's not the refcount model I'm complaining about, it's the "part of it
> > > is always freed immediately, part of it is deferred, but not always ..."
> > > that happens in drm_dep_job_release() I'm questioning. I'd really
> > > prefer something like:
> > >   
> > 
> > You are completely missing the point here.
> >   
> 
> Let me rephrase this — I realize this may come across as rude, which is
> not my intent.

No offense taken ;-).

> I believe there is simply a disconnect in understanding
> the constraints.
> 
> In my example below, the job release completes within bounded time
> constraints, which makes it suitable for direct release in IRQ context,
> bypassing the need for a work item that would otherwise incur a costly
> CPU context switch.

In the other thread, I've explained in more detail why I think
deferred cleanup of jobs is not as bad as you make it sound (context
switch amortized by the fact it's already there for queue progress
checking). But let's assume it is, I'd prefer a model where we say
"ops->job_release() has to be IRQ-safe" and have implementations defer
their cleanup if they have to, than this mixed approach with a flag. Of
course, I'd still like to have numbers proving that this job cleanup
deferral actually makes a difference in practice :P.
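
A compressed userspace model of that alternative, with illustrative
names only (not a proposed API): the core has exactly one,
unconditional release path, the job_release() op is documented as
IRQ-safe, and a driver that cannot release inline defers internally:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of "ops->job_release() must be IRQ-safe": the core
 * has a single release path; deferral, if needed, lives in the driver. */
struct alt_job;

struct alt_ops {
	void (*job_release)(struct alt_job *job);	/* must be IRQ-safe */
};

struct alt_job {
	int refcount;
	bool released;
	const struct alt_ops *ops;
	struct alt_job *next;
};

/* Core side: unconditional, no flags, no mixed model. */
static void alt_job_put(struct alt_job *job)
{
	if (!--job->refcount)
		job->ops->job_release(job);
}

/* Driver A: everything is bounded time, release inline. */
static void simple_release(struct alt_job *job)
{
	job->released = true;
}

/* Driver B: cannot release in atomic context, so it defers internally
 * (the list stands in for queue_work() on a driver-owned work item). */
static struct alt_job *driver_b_pending;

static void deferring_release(struct alt_job *job)
{
	job->next = driver_b_pending;
	driver_b_pending = job;
}

static void driver_b_worker(void)
{
	while (driver_b_pending) {
		struct alt_job *job = driver_b_pending;

		driver_b_pending = job->next;
		job->released = true;
	}
}

static const struct alt_ops simple_ops = { .job_release = simple_release };
static const struct alt_ops deferring_ops = { .job_release = deferring_release };
```

The conditional disappears from the core; whether cleanup is immediate
or deferred becomes purely a driver implementation detail.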

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-23 10:06                   ` Boris Brezillon
@ 2026-03-23 17:11                     ` Matthew Brost
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Brost @ 2026-03-23 17:11 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Daniel Almeida, intel-xe, dri-devel, Tvrtko Ursulin, Rodrigo Vivi,
	Thomas Hellström, Christian König, Danilo Krummrich,
	David Airlie, Maarten Lankhorst, Maxime Ripard, Philipp Stanner,
	Simona Vetter, Sumit Semwal, Thomas Zimmermann, linux-kernel,
	Sami Tolvanen, Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone,
	Alexandre Courbot, John Hubbard, shashanks, jajones,
	Eliot Courtney, Joel Fernandes, rust-for-linux

On Mon, Mar 23, 2026 at 11:06:13AM +0100, Boris Brezillon wrote:
> On Mon, 23 Mar 2026 00:58:51 -0700
> Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > > > It's not the refcount model I'm complaining about, it's the "part of it
> > > > is always freed immediately, part of it is deferred, but not always ..."
> > > > that happens in drm_dep_job_release() I'm questioning. I'd really
> > > > prefer something like:
> > > >   
> > > 
> > > You are completely missing the point here.
> > >   
> > 
> > Let me rephrase this — I realize this may come across as rude, which is
> > not my intent.
> 
> No offense taken ;-).
> 
> > I believe there is simply a disconnect in understanding
> > the constraints.
> > 
> > In my example below, the job release completes within bounded time
> > constraints, which makes it suitable for direct release in IRQ context,
> > bypassing the need for a work item that would otherwise incur a costly
> > CPU context switch.
> 
> In the other thread, I've explained in more detail why I think
> deferred cleanup of jobs is not as bad as you make it sound (context
> switch amortized by the fact it's already there for queue progress
> checking). But let's assume it is, I'd prefer a model where we say
> "ops->job_release() has to be IRQ-safe" and have implementations defer
> their cleanup if they have to, than this mixed approach with a flag. Of
> course, I'd still like to have numbers proving that this job cleanup
> deferral actually makes a difference in practice :P.

Yes, I replied there that I will either drop this or have solid numbers
showing that the CPU utilization makes this worthwhile.

Matt 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-17 19:41           ` Miguel Ojeda
@ 2026-03-23 17:31             ` Matthew Brost
  2026-03-23 17:42               ` Miguel Ojeda
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2026-03-23 17:31 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

On Tue, Mar 17, 2026 at 08:41:24PM +0100, Miguel Ojeda wrote:
> On Tue, Mar 17, 2026 at 9:27 AM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > I hate being cut off in threads.
> >
> > I get it — you’re a Rust zealot.
> 
> Cut off? Zealot?
> 

I apologize here; I shouldn't type when I get annoyed. This is the 2nd
comment pointing out differences between C and Rust, which really
wasn't the direction I was hoping this thread would take.

> Look, I got the email in my inbox, so I skimmed it to understand why I
> got it and why the Rust list was Cc'd. I happened to notice your
> (quite surprising) claims about Rust, so I decided to reply to a
> couple of those, since I proposed Rust for the kernel.
> 

Again my mistake.

> How is that a cut off and how does that make a maintainer a zealot?
> 
> Anyway, my understanding is that we agreed that the cleanup attribute
> in C doesn't enforce much of anything. We also agreed that it is
> important to think about ownership and lifetimes and to enforce the
> rules and to be disciplined. All good so far.
> 
> Now, what I said is simply that Rust fundamentally improves the
> situation -- C "RAII" not doing so is not comparable. For instance,
> that statically enforcing things is a meaningful improvement over
> runtime approaches (which generally require to trigger an issue, and
> which in some cases are not suitable for production settings).
> 

I agree the static checking in Rust is a very nice feature.

> Really, I just said Rust would help with things you already stated you
> care about. And nobody claims "Rust solves everything" as you stated.
> So I don't see zealots here, and insulting others doesn't help your
> argument.

I know, I apologize.

Matt

> 
> Cheers,
> Miguel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
  2026-03-23 17:31             ` Matthew Brost
@ 2026-03-23 17:42               ` Miguel Ojeda
  0 siblings, 0 replies; 21+ messages in thread
From: Miguel Ojeda @ 2026-03-23 17:42 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Daniel Almeida, intel-xe, dri-devel, Boris Brezillon,
	Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström,
	Christian König, Danilo Krummrich, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Philipp Stanner, Simona Vetter,
	Sumit Semwal, Thomas Zimmermann, linux-kernel, Sami Tolvanen,
	Jeffrey Vander Stoep, Alice Ryhl, Daniel Stone, Alexandre Courbot,
	John Hubbard, shashanks, jajones, Eliot Courtney, Joel Fernandes,
	rust-for-linux

On Mon, Mar 23, 2026 at 6:31 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> I apologize here; I shouldn't type when I get annoyed. This is the 2nd
> comment pointing out differences between C and Rust, which really
> wasn't the direction I was hoping this thread would take.

No worries, it happens to everyone from time to time.

Thanks!

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2026-03-23 17:42 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20260316043255.226352-1-matthew.brost@intel.com>
     [not found] ` <20260316043255.226352-3-matthew.brost@intel.com>
2026-03-17  2:47   ` [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer Daniel Almeida
2026-03-17  5:45     ` Matthew Brost
2026-03-17  7:17       ` Miguel Ojeda
2026-03-17  8:26         ` Matthew Brost
2026-03-17 12:04           ` Daniel Almeida
2026-03-17 19:41           ` Miguel Ojeda
2026-03-23 17:31             ` Matthew Brost
2026-03-23 17:42               ` Miguel Ojeda
2026-03-17 18:14       ` Matthew Brost
2026-03-17 19:48         ` Daniel Almeida
2026-03-17 20:43         ` Boris Brezillon
2026-03-18 22:40           ` Matthew Brost
2026-03-19  9:57             ` Boris Brezillon
2026-03-22  6:43               ` Matthew Brost
2026-03-23  7:58                 ` Matthew Brost
2026-03-23 10:06                   ` Boris Brezillon
2026-03-23 17:11                     ` Matthew Brost
2026-03-17 12:31     ` Danilo Krummrich
2026-03-17 14:25       ` Daniel Almeida
2026-03-17 14:33         ` Danilo Krummrich
2026-03-18 22:50           ` Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox