public inbox for linux-kernel@vger.kernel.org
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Philipp Stanner" <phasta@kernel.org>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
Date: Thu, 19 Mar 2026 10:11:53 +0100	[thread overview]
Message-ID: <20260319101153.169c7f36@fedora> (raw)
In-Reply-To: <abs1Da8ntfAQISL6@lstrano-desk.jf.intel.com>

Hi Matthew,

On Wed, 18 Mar 2026 16:28:13 -0700
Matthew Brost <matthew.brost@intel.com> wrote:

> > - fence must be signaled for dma_fence::ops to be set back to NULL
> > - no .cleanup and no .wait implementation
> > 
> > There might be an interest in having HW submission fences reflecting
> > when the job is passed to the FW/HW queue, but that can be done as a
> > separate fence implementation using a different fence timeline/context.
> >   
> 
> Yes, I removed the scheduled side of the drm sched fence as I figured it
> could be implemented driver side (or as an optional API in drm dep). Only
> AMDGPU / PVR use these for ganged submissions, which I need to wrap
> my head around. My initial thought is that both of these implementations
> likely could be simplified.

IIRC, PVR was also relying on it to allow native FW waits: when a job
has deps backed by fences emitted by the same driver, they are detected
and lowered to waits on the "scheduled" fence, while the wait on the
finished fence is done FW side.

> 
> > > diff --git a/drivers/gpu/drm/dep/drm_dep_job.c b/drivers/gpu/drm/dep/drm_dep_job.c
> > > new file mode 100644
> > > index 000000000000..2d012b29a5fc
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/dep/drm_dep_job.c
> > > @@ -0,0 +1,675 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright 2015 Advanced Micro Devices, Inc.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> > > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > + * OTHER DEALINGS IN THE SOFTWARE.
> > > + *
> > > + * Copyright © 2026 Intel Corporation
> > > + */
> > > +
> > > +/**
> > > + * DOC: DRM dependency job
> > > + *
> > > + * A struct drm_dep_job represents a single unit of GPU work associated with
> > > + * a struct drm_dep_queue. The lifecycle of a job is:
> > > + *
> > > + * 1. **Allocation**: the driver allocates memory for the job (typically by
> > > + *    embedding struct drm_dep_job in a larger structure) and calls
> > > + *    drm_dep_job_init() to initialise it. On success the job holds one
> > > + *    kref reference and a reference to its queue.
> > > + *
> > > + * 2. **Dependency collection**: the driver calls drm_dep_job_add_dependency(),
> > > + *    drm_dep_job_add_syncobj_dependency(), drm_dep_job_add_resv_dependencies(),
> > > + *    or drm_dep_job_add_implicit_dependencies() to register dma_fence objects
> > > + *    that must be signalled before the job can run. Duplicate fences from the
> > > + *    same fence context are deduplicated automatically.
> > > + *
> > > + * 3. **Arming**: drm_dep_job_arm() initialises the job's finished fence,
> > > + *    consuming a sequence number from the queue. After arming,
> > > + *    drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > > + *    userspace or used as a dependency by other jobs.
> > > + *
> > > + * 4. **Submission**: drm_dep_job_push() submits the job to the queue. The
> > > + *    queue takes a reference that it holds until the job's finished fence
> > > + *    signals and the job is freed by the put_job worker.
> > > + *
> > > + * 5. **Completion**: when the job's hardware work finishes its finished fence
> > > + *    is signalled and drm_dep_job_put() is called by the queue. The driver
> > > + *    must release any driver-private resources in &drm_dep_job_ops.release.
> > > + *
> > > + * Reference counting uses drm_dep_job_get() / drm_dep_job_put(). The
> > > + * internal drm_dep_job_fini() tears down the dependency xarray and fence
> > > + * objects before the driver's release callback is invoked.
> > > + */
> > > +
> > > +#include <linux/dma-resv.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/slab.h>
> > > +#include <drm/drm_dep.h>
> > > +#include <drm/drm_file.h>
> > > +#include <drm/drm_gem.h>
> > > +#include <drm/drm_syncobj.h>
> > > +#include "drm_dep_fence.h"
> > > +#include "drm_dep_job.h"
> > > +#include "drm_dep_queue.h"
> > > +
> > > +/**
> > > + * drm_dep_job_init() - initialise a dep job
> > > + * @job: dep job to initialise
> > > + * @args: initialisation arguments
> > > + *
> > > + * Initialises @job with the queue, ops and credit count from @args.  Acquires
> > > + * a reference to @args->q via drm_dep_queue_get(); this reference is held for
> > > + * the lifetime of the job and released by drm_dep_job_release() when the last
> > > + * job reference is dropped.
> > > + *
> > > + * Resources are released automatically when the last reference is dropped
> > > + * via drm_dep_job_put(), which must be called to release the job; drivers
> > > + * must not free the job directly.
> > > + *
> > > + * Context: Process context. Allocates memory with GFP_KERNEL.
> > > + * Return: 0 on success, -%EINVAL if credits is 0,
> > > + *   -%ENOMEM on fence allocation failure.
> > > + */
> > > +int drm_dep_job_init(struct drm_dep_job *job,
> > > +		     const struct drm_dep_job_init_args *args)
> > > +{
> > > +	if (unlikely(!args->credits)) {
> > > +		pr_err("drm_dep: %s: credits cannot be 0\n", __func__);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	memset(job, 0, sizeof(*job));
> > > +
> > > +	job->dfence = drm_dep_fence_alloc();
> > > +	if (!job->dfence)
> > > +		return -ENOMEM;
> > > +
> > > +	job->ops = args->ops;
> > > +	job->q = drm_dep_queue_get(args->q);
> > > +	job->credits = args->credits;
> > > +
> > > +	kref_init(&job->refcount);
> > > +	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> > > +	INIT_LIST_HEAD(&job->pending_link);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_init);
> > > +
> > > +/**
> > > + * drm_dep_job_drop_dependencies() - release all input dependency fences
> > > + * @job: dep job whose dependency xarray to drain
> > > + *
> > > + * Walks @job->dependencies, puts each fence, and destroys the xarray.
> > > + * Any slots still holding a %DRM_DEP_JOB_FENCE_PREALLOC sentinel —
> > > + * i.e. slots that were pre-allocated but never replaced — are silently
> > > + * skipped; the sentinel carries no reference.  Called from
> > > + * drm_dep_queue_run_job() in process context immediately after
> > > + * @ops->run_job() returns, before the final drm_dep_job_put().  Releasing
> > > + * dependencies here — while still in process context — avoids calling
> > > + * xa_destroy() from IRQ context if the job's last reference is later
> > > + * dropped from a dma_fence callback.
> > > + *
> > > + * Context: Process context.
> > > + */
> > > +void drm_dep_job_drop_dependencies(struct drm_dep_job *job)
> > > +{
> > > +	struct dma_fence *fence;
> > > +	unsigned long index;
> > > +
> > > +	xa_for_each(&job->dependencies, index, fence) {
> > > +		if (unlikely(fence == DRM_DEP_JOB_FENCE_PREALLOC))
> > > +			continue;
> > > +		dma_fence_put(fence);
> > > +	}
> > > +	xa_destroy(&job->dependencies);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_job_fini() - clean up a dep job
> > > + * @job: dep job to clean up
> > > + *
> > > + * Cleans up the dep fence and drops the queue reference held by @job.
> > > + *
> > > + * If the job was never armed (e.g. init failed before drm_dep_job_arm()),
> > > + * the dependency xarray is also released here.  For armed jobs the xarray
> > > + * has already been drained by drm_dep_job_drop_dependencies() in process
> > > + * context immediately after run_job(), so it is left untouched to avoid
> > > + * calling xa_destroy() from IRQ context.
> > > + *
> > > + * Warns if @job is still linked on the queue's pending list, which would
> > > + * indicate a bug in the teardown ordering.
> > > + *
> > > + * Context: Any context.
> > > + */
> > > +static void drm_dep_job_fini(struct drm_dep_job *job)
> > > +{
> > > +	bool armed = drm_dep_fence_is_armed(job->dfence);
> > > +
> > > +	WARN_ON(!list_empty(&job->pending_link));
> > > +
> > > +	drm_dep_fence_cleanup(job->dfence);
> > > +	job->dfence = NULL;
> > > +
> > > +	/*
> > > +	 * Armed jobs have their dependencies drained by
> > > +	 * drm_dep_job_drop_dependencies() in process context after run_job().  
> > 
> > Just want to clear the confusion and make sure I get this right at the
> > same time. To me, "process context" means a user thread entering some
> > syscall(). What you call "process context" is more a "thread context" to
> > me. I'm actually almost certain it's always a kernel thread (a workqueue
> > worker thread to be accurate) that executes the drop_deps() after a
> > run_job().  
> 
> Some of the context comments likely could be cleaned up. 'process
> context' here is either user context (bypass path) or the run-job work
> item.
> 
> >   
> > > +	 * Skip here to avoid calling xa_destroy() from IRQ context.
> > > +	 */
> > > +	if (!armed)
> > > +		drm_dep_job_drop_dependencies(job);  
> > 
> > Why do we need to make a distinction here? Can't we just assume that the
> > whole drm_dep_job_fini() call is unsafe in atomic context, and have a
> > work item embedded in the job to defer its destruction when _put() is
> > called in a context where the destruction is not allowed?
> >   
> 
> We already touched on this, but the design currently allows the last job
> put from dma-fence signaling path (IRQ).

It's not so much about the last _put and more about what happens in the
_release() you pass to kref_put(). My point being, if you assume
something in _release() is not safe to be done in an atomic context,
and _put() is assumed to be called from any context, you might as well
just defer the cleanup (AKA the stuff you currently have in _release())
so everything is always cleaned up in a thread context. Yes, there's
scheduling overhead and extra latency, but it's also simpler, because
there's just one path. So, as long as the latency and the overhead are
not proven to be a problem (and they rarely are for cleanup operations),
I'm still convinced that always deferring the cleanup makes for a
simpler design.

> If we dropped that, then yes, this could change. The reason for the if
> statement is the case where a user is building a job and needs to abort
> prior to calling arm() (e.g., a memory allocation fails) via
> drm_dep_job_put().

But even in that context, it could still be deferred and work just
fine, no?

> 
> Once arm() is called there is a guarantee the run_job path is called,
> either via bypass or the run-job work item.

Sure.

> 
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_job_get() - acquire a reference to a dep job
> > > + * @job: dep job to acquire a reference on, or NULL
> > > + *
> > > + * Context: Any context.
> > > + * Return: @job with an additional reference held, or NULL if @job is NULL.
> > > + */
> > > +struct drm_dep_job *drm_dep_job_get(struct drm_dep_job *job)
> > > +{
> > > +	if (job)
> > > +		kref_get(&job->refcount);
> > > +	return job;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_get);
> > > +
> > > +/**
> > > + * drm_dep_job_release() - kref release callback for a dep job
> > > + * @kref: kref embedded in the dep job
> > > + *
> > > + * Calls drm_dep_job_fini(), then invokes &drm_dep_job_ops.release if set,
> > > + * otherwise frees @job with kfree().  Finally, releases the queue reference
> > > + * that was acquired by drm_dep_job_init() via drm_dep_queue_put().  The
> > > + * queue put is performed last to ensure no queue state is accessed after
> > > + * the job memory is freed.
> > > + *
> > > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > > + *   job's queue; otherwise process context only, as the release callback may
> > > + *   sleep.
> > > + */
> > > +static void drm_dep_job_release(struct kref *kref)
> > > +{
> > > +	struct drm_dep_job *job =
> > > +		container_of(kref, struct drm_dep_job, refcount);
> > > +	struct drm_dep_queue *q = job->q;
> > > +
> > > +	drm_dep_job_fini(job);
> > > +
> > > +	if (job->ops && job->ops->release)
> > > +		job->ops->release(job);
> > > +	else
> > > +		kfree(job);
> > > +
> > > +	drm_dep_queue_put(q);
> > > +}
> > > +
> > > +/**
> > > + * drm_dep_job_put() - release a reference to a dep job
> > > + * @job: dep job to release a reference on, or NULL
> > > + *
> > > + * When the last reference is dropped, calls &drm_dep_job_ops.release if set,
> > > + * otherwise frees @job with kfree(). Does nothing if @job is NULL.
> > > + *
> > > + * Context: Any context if %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE is set on the
> > > + *   job's queue; otherwise process context only, as the release callback may
> > > + *   sleep.
> > > + */
> > > +void drm_dep_job_put(struct drm_dep_job *job)
> > > +{
> > > +	if (job)
> > > +		kref_put(&job->refcount, drm_dep_job_release);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_put);
> > > +
> > > +/**
> > > + * drm_dep_job_arm() - arm a dep job for submission
> > > + * @job: dep job to arm
> > > + *
> > > + * Initialises the finished fence on @job->dfence, assigning
> > > + * it a sequence number from the job's queue. Must be called after
> > > + * drm_dep_job_init() and before drm_dep_job_push(). Once armed,
> > > + * drm_dep_job_finished_fence() returns a valid fence that may be passed to
> > > + * userspace or used as a dependency by other jobs.
> > > + *
> > > + * Begins the DMA fence signalling path via dma_fence_begin_signalling().
> > > + * After this point, memory allocations that could trigger reclaim are
> > > + * forbidden; lockdep enforces this. arm() must always be paired with
> > > + * drm_dep_job_push(); lockdep also enforces this pairing.
> > > + *
> > > + * Warns if the job has already been armed.
> > > + *
> > > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > > + *   path.
> > > + */
> > > +void drm_dep_job_arm(struct drm_dep_job *job)
> > > +{
> > > +	drm_dep_queue_push_job_begin(job->q);
> > > +	WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > > +	drm_dep_fence_init(job->dfence, job->q);
> > > +	job->signalling_cookie = dma_fence_begin_signalling();  
> > 
> > I'd really like DMA-signalling-path annotation to be something that
> > doesn't leak to the job object. The way I see it, in the submit path,
> > it should be some sort of block initializing an opaque token, and
> > drm_dep_job_arm() should expect a valid token to be passed, thus
> > guaranteeing that anything between arm and push, and more generally
> > anything in that section is safe.
> >   
> 
> Yes. drm_dep_queue_push_job_begin() internally creates a token (current)
> that is paired with drm_dep_queue_push_job_end(). If you ever have an
> imbalance between arm() and push() you will get complaints.
> 
> > 	struct drm_job_submit_context submit_ctx;
> > 
> > 	// Do all the prep stuff, pre-alloc, resv setup, ...
> > 
> > 	// Non-faillible section of the submit starts here.
> > 	// This is properly annotated with
> > 	// dma_fence_{begin,end}_signalling() to ensure we're
> > 	// not taking locks or doing allocations forbidden in
> > 	// the signalling path
> > 	drm_job_submit_non_faillible_section(&submit_ctx) {
> > 		for_each_job() {
> > 			drm_dep_job_arm(&submit_ctx, &job);
> > 
> > 			// pass the armed fence around, if needed
> > 
> > 			drm_dep_job_push(&submit_ctx, &job);
> > 		}
> > 	}
> > 
> > With the current solution, there's no control that
> > drm_dep_job_{arm,push}() calls are balanced, with the risk of leaving a
> > DMA-signalling annotation behind.  
> 
> See above, that is what drm_dep_queue_push_job_begin/end do.

That's still error-prone, and the kind of error you only detect at
runtime. Let alone the fact you might not even notice the imbalance if
it is triggered by error paths that are rarely tested. I'm proposing
something that's designed so you can't make those mistakes unless you
really want to:

- drm_job_submit_non_faillible_section() is a block-like macro
  with a clear scope before/after which the token is invalid
- drm_job_submit_non_faillible_section() is the only place that can
  produce a valid token (not enforceable in C, but with an
  __drm_dep_queue_create_submit_token() and proper disclaimer, I guess
  we can discourage people to inadvertently use it)
- drm_dep_job_{arm,push}() calls require a valid token to work, and
  with the two points mentioned above, that means you can't call
  drm_dep_job_{arm,push}() outside a
  drm_job_submit_non_faillible_section() block

It's not quite the compile-time checks rust would enforce, but it's a
model that forces people to do it the right way, with extra runtime
checks for the case where they still got it wrong (like, putting the
_arm() and _push() in two different
drm_job_submit_non_faillible_section() blocks).

> 
> >   
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_arm);
> > > +
> > > +/**
> > > + * drm_dep_job_push() - submit a job to its queue for execution
> > > + * @job: dep job to push
> > > + *
> > > + * Submits @job to the queue it was initialised with. Must be called after
> > > + * drm_dep_job_arm(). Acquires a reference on @job on behalf of the queue,
> > > + * held until the queue is fully done with it. The reference is released
> > > + * directly in the finished-fence dma_fence callback for queues with
> > > + * %DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE (where drm_dep_job_done() may run
> > > + * from hardirq context), or via the put_job work item on the submit
> > > + * workqueue otherwise.
> > > + *
> > > + * Ends the DMA fence signalling path begun by drm_dep_job_arm() via
> > > + * dma_fence_end_signalling(). This must be paired with arm(); lockdep
> > > + * enforces the pairing.
> > > + *
> > > + * Once pushed, &drm_dep_queue_ops.run_job is guaranteed to be called for
> > > + * @job exactly once, even if the queue is killed or torn down before the
> > > + * job reaches the head of the queue. Drivers can use this guarantee to
> > > + * perform bookkeeping cleanup; the actual backend operation should be
> > > + * skipped when drm_dep_queue_is_killed() returns true.
> > > + *
> > > + * If the queue does not support the bypass path, the job is pushed directly
> > > + * onto the SPSC submission queue via drm_dep_queue_push_job() without holding
> > > + * @q->sched.lock. Otherwise, @q->sched.lock is taken and the job is either
> > > + * run immediately via drm_dep_queue_run_job() if it qualifies for bypass, or
> > > + * enqueued via drm_dep_queue_push_job() for dispatch by the run_job work item.
> > > + *
> > > + * Warns if the job has not been armed.
> > > + *
> > > + * Context: Process context if %DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED is set
> > > + *   (takes @q->sched.lock, a mutex); any context otherwise. DMA fence signaling
> > > + *   path.
> > > + */
> > > +void drm_dep_job_push(struct drm_dep_job *job)
> > > +{
> > > +	struct drm_dep_queue *q = job->q;
> > > +
> > > +	WARN_ON(!drm_dep_fence_is_armed(job->dfence));
> > > +
> > > +	drm_dep_job_get(job);
> > > +
> > > +	if (!(q->sched.flags & DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED)) {
> > > +		drm_dep_queue_push_job(q, job);
> > > +		dma_fence_end_signalling(job->signalling_cookie);
> > > +		drm_dep_queue_push_job_end(job->q);
> > > +		return;
> > > +	}
> > > +
> > > +	scoped_guard(mutex, &q->sched.lock) {
> > > +		if (drm_dep_queue_can_job_bypass(q, job))
> > > +			drm_dep_queue_run_job(q, job);
> > > +		else
> > > +			drm_dep_queue_push_job(q, job);
> > > +	}
> > > +
> > > +	dma_fence_end_signalling(job->signalling_cookie);
> > > +	drm_dep_queue_push_job_end(job->q);
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_push);
> > > +
> > > +/**
> > > + * drm_dep_job_add_dependency() - adds the fence as a job dependency
> > > + * @job: dep job to add the dependencies to
> > > + * @fence: the dma_fence to add to the list of dependencies, or
> > > + *         %DRM_DEP_JOB_FENCE_PREALLOC to reserve a slot for later.
> > > + *
> > > + * Note that @fence is consumed in both the success and error cases (except
> > > + * when @fence is %DRM_DEP_JOB_FENCE_PREALLOC, which carries no reference).
> > > + *
> > > + * Signalled fences and fences belonging to the same queue as @job (i.e. where
> > > + * fence->context matches the queue's finished fence context) are silently
> > > + * dropped; the job need not wait on its own queue's output.
> > > + *
> > > + * Warns if the job has already been armed (dependencies must be added before
> > > + * drm_dep_job_arm()).
> > > + *
> > > + * **Pre-allocation pattern**
> > > + *
> > > + * When multiple jobs across different queues must be prepared and submitted
> > > + * together in a single atomic commit — for example, where job A's finished
> > > + * fence is an input dependency of job B — all jobs must be armed and pushed
> > > + * within a single dma_fence_begin_signalling() / dma_fence_end_signalling()
> > > + * region.  Once that region has started no memory allocation is permitted.
> > > + *
> > > + * To handle this, pass %DRM_DEP_JOB_FENCE_PREALLOC during the preparation
> > > + * phase (before arming any job, while GFP_KERNEL allocation is still allowed)
> > > + * to pre-allocate a slot in @job->dependencies.  The slot index assigned by
> > > + * the underlying xarray must be tracked by the caller separately (e.g. it is
> > > + * always index 0 when the dependency array is empty, as Xe relies on).
> > > + * After all jobs have been armed and the finished fences are available, call
> > > + * drm_dep_job_replace_dependency() with that index and the real fence.
> > > + * drm_dep_job_replace_dependency() uses GFP_NOWAIT internally and may be
> > > + * called from atomic or signalling context.
> > > + *
> > > + * The sentinel slot is never skipped by the signalled-fence fast-path,
> > > + * ensuring a slot is always allocated even when the real fence is not yet
> > > + * known.
> > > + *
> > > + * **Example: bind job feeding TLB invalidation jobs**
> > > + *
> > > + * Consider a GPU with separate queues for page-table bind operations and for
> > > + * TLB invalidation.  A single atomic commit must:
> > > + *
> > > + *  1. Run a bind job that modifies page tables.
> > > + *  2. Run one TLB-invalidation job per MMU that depends on the bind
> > > + *     completing, so stale translations are flushed before the engines
> > > + *     continue.
> > > + *
> > > + * Because all jobs must be armed and pushed inside a signalling region (where
> > > + * GFP_KERNEL is forbidden), pre-allocate slots before entering the region::
> > > + *
> > > + *   // Phase 1 — process context, GFP_KERNEL allowed
> > > + *   drm_dep_job_init(bind_job, bind_queue, ops);
> > > + *   for_each_mmu(mmu) {
> > > + *       drm_dep_job_init(tlb_job[mmu], tlb_queue[mmu], ops);
> > > + *       // Pre-allocate slot at index 0; real fence not available yet
> > > + *       drm_dep_job_add_dependency(tlb_job[mmu], DRM_DEP_JOB_FENCE_PREALLOC);
> > > + *   }
> > > + *
> > > + *   // Phase 2 — inside signalling region, no GFP_KERNEL
> > > + *   dma_fence_begin_signalling();
> > > + *   drm_dep_job_arm(bind_job);
> > > + *   for_each_mmu(mmu) {
> > > + *       // Swap sentinel for bind job's finished fence
> > > + *       drm_dep_job_replace_dependency(tlb_job[mmu], 0,
> > > + *                                      dma_fence_get(bind_job->finished));  
> > 
> > Just FYI, Panthor doesn't have this {begin,end}_signalling() in the
> > submit path. If we were to add it, it would be around the
> > panthor_submit_ctx_push_jobs() call, which might seem broken. In  
> 
> Yes, I noticed that. I put an XXX comment in my port [1] around this.
> 
> [1] https://patchwork.freedesktop.org/patch/711952/?series=163245&rev=1
> 
> > practice I don't think it is because we don't expose fences to the
> > outside world until all jobs have been pushed. So what happens is that
> > a job depending on a previous job in the same batch-submit has the
> > armed-but-not-yet-pushed fence in its deps, and that's the only place
> > where this fence is present. If something fails on a subsequent job
> > preparation in the next batch submit, the rollback logic will just drop
> > the jobs on the floor, and release the armed-but-not-pushed-fence,
> > meaning we're not leaking a fence that will never be signalled. I'm in
> > no way saying this design is sane, just trying to explain why it's
> > currently safe and works fine.  
> 
> Yep, I think it would be better to have no failure points between arm
> and push, which again I do my best to enforce via lockdep/warnings.

I'm still not entirely convinced by that. To me _arm() is not quite the
moment you make your fence public, and I'm not sure the extra complexity
added for intra-batch dependencies (one job in a SUBMIT depending on a
previous job in the same SUBMIT) is justified, because what really
matters is not that we leave dangling/unsignalled dma_fence objects
around, the problem is when you do so on an object that has been
exposed publicly (syncobj, dma_resv, sync_file, ...).

> 
> > 
> > In general, I wonder if we should distinguish between "armed" and
> > "publicly exposed" to help deal with this intra-batch dep thing without
> > resorting to reservation and other tricks like that.
> >   
> 
> I'm not exactly sure what you're suggesting, but I'm always open to ideas.

Right now _arm() is what does the dma_fence_init(). But there's an
extra step between initializing the fence object and making it
visible to the outside world. In order for the dep to be added to the
job, you need the fence to be initialized, but that's not quite
external visibility, because the job is still very much a driver
object, and if something fails, the rollback mechanism makes it so all
the deps are dropped on the floor along with the job being destroyed.
So we won't really wait on this fence that's never going to be
signalled.

I see what's appealing in pretending that _arm() == externally-visible,
but it's also forcing us to do extra pre-alloc (or other pre-init)
operations that would otherwise not be required in the submit path. Not
a hill I'm willing to die on, but I just thought I'd mention the fact I
find it weird that we put extra constraints on ourselves that are not
strictly needed, just because we fail to properly flag the dma_fence
visibility transitions.

On the rust side it would be directly described through the type
system (see the Visibility attribute in Daniel's branch[1]). On C side,
this could take the form of a new DMA_FENCE_FLAG_INACTIVE (or whichever
name you want to give it). Any operation pushing the fence to public
container (dma_resv, syncobj, sync_file, ...) would be rejected when
that flag is set. At _push() time, we'd clear that flag with a
dma_fence_set_active() helper, which would reflect the fact the fence
can now be observed and exposed to the outside world.

> 
> > > + *       drm_dep_job_arm(tlb_job[mmu]);
> > > + *   }
> > > + *   drm_dep_job_push(bind_job);
> > > + *   for_each_mmu(mmu)
> > > + *       drm_dep_job_push(tlb_job[mmu]);
> > > + *   dma_fence_end_signalling();
> > > + *
> > > + * Context: Process context. May allocate memory with GFP_KERNEL.
> > > + * Return: If fence == DRM_DEP_JOB_FENCE_PREALLOC index of allocation on
> > > + * success, else 0 on success, or a negative error code.
> > > + */
> > > +int drm_dep_job_add_dependency(struct drm_dep_job *job, struct dma_fence *fence)
> > > +{
> > > +	struct drm_dep_queue *q = job->q;
> > > +	struct dma_fence *entry;
> > > +	unsigned long index;
> > > +	u32 id = 0;
> > > +	int ret;
> > > +
> > > +	WARN_ON(drm_dep_fence_is_armed(job->dfence));
> > > +	might_alloc(GFP_KERNEL);
> > > +
> > > +	if (!fence)
> > > +		return 0;
> > > +
> > > +	if (fence == DRM_DEP_JOB_FENCE_PREALLOC)
> > > +		goto add_fence;
> > > +
> > > +	/*
> > > +	 * Ignore signalled fences or fences from our own queue — finished
> > > +	 * fences use q->fence.context.
> > > +	 */
> > > +	if (dma_fence_test_signaled_flag(fence) ||
> > > +	    fence->context == q->fence.context) {
> > > +		dma_fence_put(fence);
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* Deduplicate if we already depend on a fence from the same context.
> > > +	 * This lets the size of the array of deps scale with the number of
> > > +	 * engines involved, rather than the number of BOs.
> > > +	 */
> > > +	xa_for_each(&job->dependencies, index, entry) {
> > > +		if (entry == DRM_DEP_JOB_FENCE_PREALLOC ||
> > > +		    entry->context != fence->context)
> > > +			continue;
> > > +
> > > +		if (dma_fence_is_later(fence, entry)) {
> > > +			dma_fence_put(entry);
> > > +			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > > +		} else {
> > > +			dma_fence_put(fence);
> > > +		}
> > > +		return 0;
> > > +	}
> > > +
> > > +add_fence:
> > > +	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> > > +		       GFP_KERNEL);
> > > +	if (ret != 0) {
> > > +		if (fence != DRM_DEP_JOB_FENCE_PREALLOC)
> > > +			dma_fence_put(fence);
> > > +		return ret;
> > > +	}
> > > +
> > > +	return (fence == DRM_DEP_JOB_FENCE_PREALLOC) ? id : 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_dep_job_add_dependency);
> > > +
> > > +/**
> > > + * drm_dep_job_replace_dependency() - replace a pre-allocated dependency slot
> > > + * @job: dep job to update
> > > + * @index: xarray index of the slot to replace, as returned when the sentinel
> > > + *         was originally inserted via drm_dep_job_add_dependency()
> > > + * @fence: the real dma_fence to store; its reference is always consumed
> > > + *
> > > + * Replaces the %DRM_DEP_JOB_FENCE_PREALLOC sentinel at @index in
> > > + * @job->dependencies with @fence.  The slot must have been pre-allocated by
> > > + * passing %DRM_DEP_JOB_FENCE_PREALLOC to drm_dep_job_add_dependency(); the
> > > + * existing entry is asserted to be the sentinel.
> > > + *
> > > + * This is the second half of the pre-allocation pattern described in
> > > + * drm_dep_job_add_dependency().  It is intended to be called inside a
> > > + * dma_fence_begin_signalling() / dma_fence_end_signalling() region where
> > > + * memory allocation with GFP_KERNEL is forbidden.  It uses GFP_NOWAIT
> > > + * internally so it is safe to call from atomic or signalling context, but
> > > + * since the slot has been pre-allocated no actual memory allocation occurs.
> > > + *
> > > + * If @fence is already signalled the slot is erased rather than storing a
> > > + * redundant dependency.  The successful store is asserted — if the store
> > > + * fails it indicates a programming error (slot index out of range or
> > > + * concurrent modification).
> > > + *
> > > + * Must be called before drm_dep_job_arm(). @fence is consumed in all cases.
> > > + *
> > > + * Context: Any context. DMA fence signaling path.
> > > + */
> > > +void drm_dep_job_replace_dependency(struct drm_dep_job *job, u32 index,
> > > +				    struct dma_fence *fence)
> > > +{
> > > +	WARN_ON(xa_load(&job->dependencies, index) !=
> > > +		DRM_DEP_JOB_FENCE_PREALLOC);
> > > +
> > > +	if (dma_fence_test_signaled_flag(fence)) {
> > > +		xa_erase(&job->dependencies, index);
> > > +		dma_fence_put(fence);
> > > +		return;
> > > +	}
> > > +
> > > +	if (WARN_ON(xa_is_err(xa_store(&job->dependencies, index, fence,
> > > +				       GFP_NOWAIT)))) {
> > > +		dma_fence_put(fence);
> > > +		return;
> > > +	}  
> > 
> > You don't seem to go for the
> > replace-if-earlier-fence-on-same-context-exists optimization that we
> > have in drm_dep_job_add_dependency(). Any reason not to?
> >   
> 
> No, that could be added in. My reasoning for omitting it was that if
> you are pre-allocating a slot, you likely know the same timeline hasn't
> already been added, but maybe that is a bad assumption.

Hm, in Panthor that would mean extra checks on the driver side, because
at the moment we don't check where deps come from. I'd be tempted to say
the more we can automate the better, but I don't feel strongly about it.

Regards,

Boris

[1]https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/61/diffs#a5a71f917ff65cfe4c1a341fa7e55ae149d22863_300_693

Thread overview: 50+ messages
     [not found] <20260316043255.226352-1-matthew.brost@intel.com>
2026-03-16  4:32 ` [RFC PATCH 01/12] workqueue: Add interface to teach lockdep to warn on reclaim violations Matthew Brost
2026-03-25 15:59   ` Tejun Heo
2026-03-26  1:49     ` Matthew Brost
2026-03-26  2:19       ` Tejun Heo
2026-03-27  4:33         ` Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer Matthew Brost
2026-03-16  9:16   ` Boris Brezillon
2026-03-17  5:22     ` Matthew Brost
2026-03-17  8:48       ` Boris Brezillon
2026-03-16 10:25   ` Danilo Krummrich
2026-03-17  5:10     ` Matthew Brost
2026-03-17 12:19       ` Danilo Krummrich
2026-03-18 23:02         ` Matthew Brost
2026-03-17  2:47   ` Daniel Almeida
2026-03-17  5:45     ` Matthew Brost
2026-03-17  7:17       ` Miguel Ojeda
2026-03-17  8:26         ` Matthew Brost
2026-03-17 12:04           ` Daniel Almeida
2026-03-17 19:41           ` Miguel Ojeda
2026-03-23 17:31             ` Matthew Brost
2026-03-23 17:42               ` Miguel Ojeda
2026-03-17 18:14       ` Matthew Brost
2026-03-17 19:48         ` Daniel Almeida
2026-03-17 20:43         ` Boris Brezillon
2026-03-18 22:40           ` Matthew Brost
2026-03-19  9:57             ` Boris Brezillon
2026-03-22  6:43               ` Matthew Brost
2026-03-23  7:58                 ` Matthew Brost
2026-03-23 10:06                   ` Boris Brezillon
2026-03-23 17:11                     ` Matthew Brost
2026-03-17 12:31     ` Danilo Krummrich
2026-03-17 14:25       ` Daniel Almeida
2026-03-17 14:33         ` Danilo Krummrich
2026-03-18 22:50           ` Matthew Brost
2026-03-17  8:47   ` Christian König
2026-03-17 14:55   ` Boris Brezillon
2026-03-18 23:28     ` Matthew Brost
2026-03-19  9:11       ` Boris Brezillon [this message]
2026-03-23  4:50         ` Matthew Brost
2026-03-23  9:55           ` Boris Brezillon
2026-03-23 17:08             ` Matthew Brost
2026-03-23 18:38               ` Matthew Brost
2026-03-24  9:23                 ` Boris Brezillon
2026-03-24 16:06                   ` Matthew Brost
2026-03-25  2:33                     ` Matthew Brost
2026-03-24  8:49               ` Boris Brezillon
2026-03-24 16:51                 ` Matthew Brost
2026-03-17 16:30   ` Shashank Sharma
2026-03-16  4:32 ` [RFC PATCH 11/12] accel/amdxdna: Convert to drm_dep scheduler layer Matthew Brost
2026-03-16  4:32 ` [RFC PATCH 12/12] drm/panthor: " Matthew Brost
