* Independence for dma_fences! v7
@ 2026-02-10 10:01 Christian König
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Christian König @ 2026-02-10 10:01 UTC
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Hi everyone,

dma_fences have always lived under the tyranny dictated by the module
lifetime of their issuer, leading to crashes whenever somebody still holds
a reference to a dma_fence after the module of the issuer was unloaded.

The basic problem is that when buffers are shared between drivers,
dma_fence objects can leak into external drivers and stay there even
after they are signaled. The dma_resv object, for example, releases
dma_fences only lazily.

So when the module which originally created the dma_fence unloads, the
dma_fence_ops function table becomes unavailable as well, and any attempt
to release the fence crashes the system.

Various approaches have been discussed previously, including changing the
locking semantics of the dma_fence callbacks (by me) as well as using the
drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
from their actual users, but none of them actually solves all problems.

Tvrtko did some really nice prerequisite work by protecting the returned
strings of the dma_fence_ops by RCU. This way dma_fence creators were
able to just wait for an RCU grace period after fence signaling before
it was safe to free those data structures.

Now this patch set goes a step further and protects the whole
dma_fence_ops structure by RCU, so that after the fence signals the
pointer to the dma_fence_ops is set to NULL when neither a wait nor a
release callback is given. All functionality which uses the dma_fence_ops
reference is put inside an RCU critical section, except for the
deprecated issuer specific wait and of course the optional release
callback.
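
To make the detach rule concrete, here is a minimal userspace sketch. It is
not kernel code: the toy_* names are hypothetical, a C11 atomic pointer
stands in for the __rcu annotated ops pointer, and no grace period is
modelled. It only shows when the ops pointer is dropped on signal and how
a reader copes with a NULL ops pointer afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fence;

/* Hypothetical stand-in for dma_fence_ops; only two callbacks are modelled. */
struct toy_fence_ops {
	long (*wait)(struct toy_fence *f);
	void (*release)(struct toy_fence *f);
};

struct toy_fence {
	/* An atomic pointer plays the role of the __rcu annotated ops pointer. */
	_Atomic(const struct toy_fence_ops *) ops;
	atomic_bool signaled;
};

static int toy_release_calls;

static void toy_release(struct toy_fence *f)
{
	(void)f;
	toy_release_calls++;
}

/* One table without callbacks, one that keeps the issuer involved. */
static const struct toy_fence_ops toy_plain_ops = { .wait = NULL, .release = NULL };
static const struct toy_fence_ops toy_release_ops = { .wait = NULL, .release = toy_release };

static void toy_fence_init(struct toy_fence *f, const struct toy_fence_ops *ops)
{
	assert(ops != NULL);
	atomic_init(&f->ops, ops);
	atomic_init(&f->signaled, false);
}

/* Signal side: detach only when neither wait nor release is implemented. */
static void toy_fence_signal(struct toy_fence *f)
{
	const struct toy_fence_ops *ops = atomic_load(&f->ops);

	if (atomic_exchange(&f->signaled, true))
		return; /* already signaled */

	if (ops && !ops->wait && !ops->release)
		atomic_store(&f->ops, (const struct toy_fence_ops *)NULL);
}

/* Reader side: snapshot the pointer once, then tolerate NULL. */
static bool toy_fence_release(struct toy_fence *f)
{
	const struct toy_fence_ops *ops = atomic_load(&f->ops);

	if (ops && ops->release) {
		ops->release(f);
		return true;	/* issuer code ran */
	}
	return false;		/* default path, no issuer code needed */
}
```

In this model a fence created with toy_plain_ops no longer references the
issuer once it has signaled, while one created with toy_release_ops keeps
its ops pointer, so its issuer has to stay around.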

In addition to the RCU changes, the lock protecting the dma_fence state
previously had to be allocated externally. This set now makes that
external lock optional and allows dma_fences to use an inline lock and
be self-contained.
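
As a rough illustration of the optional external lock, the following
userspace sketch picks between a shared lock and a lock embedded in the
fence. Everything here is an assumption made for illustration: the toy_*
names, the atomic_flag based spinlock, and in particular the layout with a
NULL extern_lock meaning "use the inline lock". The series itself may
encode the choice differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy spinlock; a stand-in for spinlock_t, not the kernel type. */
struct toy_spinlock {
	atomic_flag held;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the previous holder clears the flag */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical layout: NULL extern_lock selects the inline lock. */
struct toy_locked_fence {
	struct toy_spinlock *extern_lock;
	struct toy_spinlock inline_lock;
};

static void toy_locked_fence_init(struct toy_locked_fence *f,
				  struct toy_spinlock *extern_lock)
{
	f->extern_lock = extern_lock;
	atomic_flag_clear(&f->inline_lock.held);
}

/* Single place that knows where the lock lives. */
static struct toy_spinlock *toy_fence_spinlock(struct toy_locked_fence *f)
{
	return f->extern_lock ? f->extern_lock : &f->inline_lock;
}
```

A fence initialized with NULL carries everything it needs inside its own
allocation, which is the self-contained property described above.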

v4:

Rebases the whole set on upstream changes, especially the cleanup
from Philipp in patch "drm/amdgpu: independence for the amdkfd_fence!".

Adds two patches which bring the DMA-fence selftests up to date.
The first selftest change removes the mock_wait and so actually starts
testing the default behavior instead of some hacky implementation in the
test. This one got upstreamed independently of this set.
The second drops the mock_fence as well and tests the new RCU and inline
spinlock functionality.

v5:

Rebase on top of drm-misc-next instead of drm-tip, leave out all driver
changes for now since those should go through the driver specific paths
anyway.

Address a few more review comments, especially some rebase mess and
typos. And finally fix one more bug found by AMD's CI system.

v6:

Minor style changes, re-ordered patch #1, dropped the scheduler fence
change for now.

v7:

The patch adding the dma_fence_was_initialized() function was pushed
upstream individually since that is really an independent cleanup.

Fixed some missing i915 bits in patch "dma-buf: abstract fence locking".

Please review and comment,
Christian.



* [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
@ 2026-02-10 10:01 ` Christian König
  2026-02-11 10:06   ` Philipp Stanner
                     ` (2 more replies)
  2026-02-10 10:01 ` [PATCH 2/8] dma-buf: detach fence ops on signal v2 Christian König
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 33+ messages in thread
From: Christian König @ 2026-02-10 10:01 UTC
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

At first glance it is counterintuitive to protect a constant function
pointer table by RCU, but this allows modules providing the function
table to unload by waiting for an RCU grace period.

v2: make one of the now duplicated lockdep warnings a comment instead.
v3: Add more documentation to ->wait and ->release callback.
v4: fix typo in documentation
v5: rebased on drm-tip

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
 include/linux/dma-fence.h   | 29 ++++++++++++++--
 2 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index e05beae6e407..de9bf18be3d4 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
 signed long
 dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 {
+	const struct dma_fence_ops *ops;
 	signed long ret;
 
 	if (WARN_ON(timeout < 0))
@@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 
 	dma_fence_enable_sw_signaling(fence);
 
-	if (trace_dma_fence_wait_start_enabled()) {
-		rcu_read_lock();
-		trace_dma_fence_wait_start(fence);
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	trace_dma_fence_wait_start(fence);
+	if (ops->wait) {
+		/*
+		 * Implementing the wait ops is deprecated and not supported for
+		 * issuer independent fences, so it is ok to use the ops outside
+		 * the RCU protected section.
+		 */
+		rcu_read_unlock();
+		ret = ops->wait(fence, intr, timeout);
+	} else {
 		rcu_read_unlock();
-	}
-	if (fence->ops->wait)
-		ret = fence->ops->wait(fence, intr, timeout);
-	else
 		ret = dma_fence_default_wait(fence, intr, timeout);
+	}
 	if (trace_dma_fence_wait_end_enabled()) {
 		rcu_read_lock();
 		trace_dma_fence_wait_end(fence);
@@ -562,6 +569,7 @@ void dma_fence_release(struct kref *kref)
 {
 	struct dma_fence *fence =
 		container_of(kref, struct dma_fence, refcount);
+	const struct dma_fence_ops *ops;
 
 	rcu_read_lock();
 	trace_dma_fence_destroy(fence);
@@ -593,12 +601,12 @@ void dma_fence_release(struct kref *kref)
 		spin_unlock_irqrestore(fence->lock, flags);
 	}
 
-	rcu_read_unlock();
-
-	if (fence->ops->release)
-		fence->ops->release(fence);
+	ops = rcu_dereference(fence->ops);
+	if (ops->release)
+		ops->release(fence);
 	else
 		dma_fence_free(fence);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(dma_fence_release);
 
@@ -617,6 +625,7 @@ EXPORT_SYMBOL(dma_fence_free);
 
 static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
 	bool was_set;
 
 	lockdep_assert_held(fence->lock);
@@ -627,14 +636,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 	if (dma_fence_test_signaled_flag(fence))
 		return false;
 
-	if (!was_set && fence->ops->enable_signaling) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (!was_set && ops->enable_signaling) {
 		trace_dma_fence_enable_signal(fence);
 
-		if (!fence->ops->enable_signaling(fence)) {
+		if (!ops->enable_signaling(fence)) {
+			rcu_read_unlock();
 			dma_fence_signal_locked(fence);
 			return false;
 		}
 	}
+	rcu_read_unlock();
 
 	return true;
 }
@@ -1007,8 +1020,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
  */
 void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 {
-	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
-		fence->ops->set_deadline(fence, deadline);
+	const struct dma_fence_ops *ops;
+
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->set_deadline && !dma_fence_is_signaled(fence))
+		ops->set_deadline(fence, deadline);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(dma_fence_set_deadline);
 
@@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
 
 	kref_init(&fence->refcount);
-	fence->ops = ops;
+	/*
+	 * At first glance it is counter intuitive to protect a constant
+	 * function pointer table by RCU, but this allows modules providing the
+	 * function table to unload by waiting for an RCU grace period.
+	 */
+	RCU_INIT_POINTER(fence->ops, ops);
 	INIT_LIST_HEAD(&fence->cb_list);
 	fence->lock = lock;
 	fence->context = context;
@@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
  */
 const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
-			 "RCU protection is required for safe access to returned string");
+	const struct dma_fence_ops *ops;
 
+	/* RCU protection is required for safe access to returned string */
+	ops = rcu_dereference(fence->ops);
 	if (!dma_fence_test_signaled_flag(fence))
-		return (const char __rcu *)fence->ops->get_driver_name(fence);
+		return (const char __rcu *)ops->get_driver_name(fence);
 	else
 		return (const char __rcu *)"detached-driver";
 }
@@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
  */
 const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
-			 "RCU protection is required for safe access to returned string");
+	const struct dma_fence_ops *ops;
 
+	/* RCU protection is required for safe access to returned string */
+	ops = rcu_dereference(fence->ops);
 	if (!dma_fence_test_signaled_flag(fence))
-		return (const char __rcu *)fence->ops->get_timeline_name(fence);
+		return (const char __rcu *)ops->get_timeline_name(fence);
 	else
 		return (const char __rcu *)"signaled-timeline";
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 9c4d25289239..6bf4feb0e01f 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -67,7 +67,7 @@ struct seq_file;
  */
 struct dma_fence {
 	spinlock_t *lock;
-	const struct dma_fence_ops *ops;
+	const struct dma_fence_ops __rcu *ops;
 	/*
 	 * We clear the callback list on kref_put so that by the time we
 	 * release the fence it is unused. No one should be adding to the
@@ -220,6 +220,10 @@ struct dma_fence_ops {
 	 * timed out. Can also return other error values on custom implementations,
 	 * which should be treated as if the fence is signaled. For example a hardware
 	 * lockup could be reported like that.
+	 *
+	 * Implementing this callback prevents the fence from detaching after
+	 * signaling and so it is mandatory for the module providing the
+	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
 	 */
 	signed long (*wait)(struct dma_fence *fence,
 			    bool intr, signed long timeout);
@@ -231,6 +235,13 @@ struct dma_fence_ops {
 	 * Can be called from irq context.  This callback is optional. If it is
 	 * NULL, then dma_fence_free() is instead called as the default
 	 * implementation.
+	 *
+	 * Implementing this callback prevents the fence from detaching after
+	 * signaling and so it is mandatory for the module providing the
+	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
+	 *
+	 * If the callback is implemented the memory backing the dma_fence
+	 * object must be freed RCU safe.
 	 */
 	void (*release)(struct dma_fence *fence);
 
@@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
 static inline bool
 dma_fence_is_signaled_locked(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	if (dma_fence_test_signaled_flag(fence))
 		return true;
 
-	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->signaled && ops->signaled(fence)) {
+		rcu_read_unlock();
 		dma_fence_signal_locked(fence);
 		return true;
 	}
+	rcu_read_unlock();
 
 	return false;
 }
@@ -484,13 +501,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 static inline bool
 dma_fence_is_signaled(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	if (dma_fence_test_signaled_flag(fence))
 		return true;
 
-	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->signaled && ops->signaled(fence)) {
+		rcu_read_unlock();
 		dma_fence_signal(fence);
 		return true;
 	}
+	rcu_read_unlock();
 
 	return false;
 }
-- 
2.43.0



* [PATCH 2/8] dma-buf: detach fence ops on signal v2
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
@ 2026-02-10 10:01 ` Christian König
  2026-02-13 14:22   ` Boris Brezillon
  2026-02-10 10:01 ` [PATCH 3/8] dma-buf: abstract fence locking v2 Christian König
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-10 10:01 UTC
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

When neither a release nor a wait backend op is specified it is possible
to let the dma_fence live on independently of the module which issued it.

This makes it possible to unload drivers and only wait for all their
fences to signal.

v2: fix typo in comment

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
 include/linux/dma-fence.h   |  4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index de9bf18be3d4..ba02321bef0b 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -371,6 +371,14 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 				      &fence->flags)))
 		return;
 
+	/*
+	 * When neither a release nor a wait operation is specified set the ops
+	 * pointer to NULL to allow the fence structure to become independent
+	 * from who originally issued it.
+	 */
+	if (!fence->ops->release && !fence->ops->wait)
+		RCU_INIT_POINTER(fence->ops, NULL);
+
 	/* Stash the cb_list before replacing it with the timestamp */
 	list_replace(&fence->cb_list, &cb_list);
 
@@ -537,7 +545,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
 	trace_dma_fence_wait_start(fence);
-	if (ops->wait) {
+	if (ops && ops->wait) {
 		/*
 		 * Implementing the wait ops is deprecated and not supported for
 		 * issuer independent fences, so it is ok to use the ops outside
@@ -602,7 +610,7 @@ void dma_fence_release(struct kref *kref)
 	}
 
 	ops = rcu_dereference(fence->ops);
-	if (ops->release)
+	if (ops && ops->release)
 		ops->release(fence);
 	else
 		dma_fence_free(fence);
@@ -638,7 +646,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (!was_set && ops->enable_signaling) {
+	if (!was_set && ops && ops->enable_signaling) {
 		trace_dma_fence_enable_signal(fence);
 
 		if (!ops->enable_signaling(fence)) {
@@ -1024,7 +1032,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->set_deadline && !dma_fence_is_signaled(fence))
+	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
 		ops->set_deadline(fence, deadline);
 	rcu_read_unlock();
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6bf4feb0e01f..e1afbb5909f9 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -472,7 +472,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->signaled && ops->signaled(fence)) {
+	if (ops && ops->signaled && ops->signaled(fence)) {
 		rcu_read_unlock();
 		dma_fence_signal_locked(fence);
 		return true;
@@ -508,7 +508,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->signaled && ops->signaled(fence)) {
+	if (ops && ops->signaled && ops->signaled(fence)) {
 		rcu_read_unlock();
 		dma_fence_signal(fence);
 		return true;
-- 
2.43.0



* [PATCH 3/8] dma-buf: abstract fence locking v2
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
  2026-02-10 10:01 ` [PATCH 2/8] dma-buf: detach fence ops on signal v2 Christian König
@ 2026-02-10 10:01 ` Christian König
  2026-02-12  9:07   ` Tvrtko Ursulin
  2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-10 10:01 UTC
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Add dma_fence_lock_irqsave() and dma_fence_unlock_irqrestore() wrappers
and mechanically apply them everywhere.

Just a prerequisite cleanup for a follow-up patch.

v2: add some missing i915 bits, add abstraction for lockdep assertion as
    well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> (v1)
---
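For readers skimming the mechanical conversion, a small single-threaded
userspace model of the wrapper idea may help. The toy_* names are
hypothetical and the "lock" is just a flag with a counter. The sketch
shows call sites going through fence-level wrappers instead of touching
the lock member, and that statement-like macros should not carry their
own trailing semicolon so they stay usable in an un-braced if/else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded toy lock; it only tracks state for the demonstration. */
struct toy_lock {
	bool held;
	int acquisitions;
};

struct toy_fence {
	struct toy_lock *lock;
};

/* The only helper that knows where the lock lives. */
static inline struct toy_lock *toy_fence_lockptr(struct toy_fence *f)
{
	return f->lock;
}

static inline void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static inline void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Wrappers expand to a single expression, without a trailing ';'. */
#define toy_fence_lock(fence)		toy_lock_acquire(toy_fence_lockptr(fence))
#define toy_fence_unlock(fence)		toy_lock_release(toy_fence_lockptr(fence))
#define toy_fence_assert_held(fence)	assert(toy_fence_lockptr(fence)->held)

/* Empty lock/unlock pair, similar in spirit to flushing pending callbacks. */
static int toy_fence_flush(struct toy_fence *f)
{
	if (f)
		toy_fence_lock(f);	/* fine un-braced: the macro adds no ';' */
	else
		return -1;

	toy_fence_assert_held(f);
	toy_fence_unlock(f);
	return toy_fence_lockptr(f)->acquisitions;
}
```

Because only toy_fence_lockptr() knows the layout, a later change to where
the lock is stored would not need to touch any caller in this model.
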
 drivers/dma-buf/dma-fence.c                 | 48 ++++++++++-----------
 drivers/dma-buf/st-dma-fence.c              |  6 ++-
 drivers/dma-buf/sw_sync.c                   | 14 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  4 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  2 +-
 drivers/gpu/drm/i915/i915_active.c          | 19 ++++----
 drivers/gpu/drm/nouveau/nouveau_drm.c       |  5 ++-
 drivers/gpu/drm/scheduler/sched_fence.c     |  6 +--
 drivers/gpu/drm/xe/xe_sched_job.c           |  4 +-
 include/linux/dma-fence.h                   | 38 ++++++++++++++++
 11 files changed, 95 insertions(+), 55 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ba02321bef0b..56aa59867eaa 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -365,7 +365,7 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 	struct dma_fence_cb *cur, *tmp;
 	struct list_head cb_list;
 
-	lockdep_assert_held(fence->lock);
+	dma_fence_assert_held(fence);
 
 	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
 				      &fence->flags)))
@@ -412,9 +412,9 @@ void dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
 	if (WARN_ON(!fence))
 		return;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	dma_fence_signal_timestamp_locked(fence, timestamp);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 }
 EXPORT_SYMBOL(dma_fence_signal_timestamp);
 
@@ -473,9 +473,9 @@ bool dma_fence_check_and_signal(struct dma_fence *fence)
 	unsigned long flags;
 	bool ret;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	ret = dma_fence_check_and_signal_locked(fence);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	return ret;
 }
@@ -501,9 +501,9 @@ void dma_fence_signal(struct dma_fence *fence)
 
 	tmp = dma_fence_begin_signalling();
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	dma_fence_signal_timestamp_locked(fence, ktime_get());
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	dma_fence_end_signalling(tmp);
 }
@@ -603,10 +603,10 @@ void dma_fence_release(struct kref *kref)
 		 * don't leave chains dangling. We set the error flag first
 		 * so that the callbacks know this signal is due to an error.
 		 */
-		spin_lock_irqsave(fence->lock, flags);
+		dma_fence_lock_irqsave(fence, flags);
 		fence->error = -EDEADLK;
 		dma_fence_signal_locked(fence);
-		spin_unlock_irqrestore(fence->lock, flags);
+		dma_fence_unlock_irqrestore(fence, flags);
 	}
 
 	ops = rcu_dereference(fence->ops);
@@ -636,7 +636,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 	const struct dma_fence_ops *ops;
 	bool was_set;
 
-	lockdep_assert_held(fence->lock);
+	dma_fence_assert_held(fence);
 
 	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
 				   &fence->flags);
@@ -672,9 +672,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	__dma_fence_enable_signaling(fence);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 }
 EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
 
@@ -714,8 +714,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 		return -ENOENT;
 	}
 
-	spin_lock_irqsave(fence->lock, flags);
-
+	dma_fence_lock_irqsave(fence, flags);
 	if (__dma_fence_enable_signaling(fence)) {
 		cb->func = func;
 		list_add_tail(&cb->node, &fence->cb_list);
@@ -723,8 +722,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 		INIT_LIST_HEAD(&cb->node);
 		ret = -ENOENT;
 	}
-
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	return ret;
 }
@@ -747,9 +745,9 @@ int dma_fence_get_status(struct dma_fence *fence)
 	unsigned long flags;
 	int status;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	status = dma_fence_get_status_locked(fence);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	return status;
 }
@@ -779,13 +777,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
 	unsigned long flags;
 	bool ret;
 
-	spin_lock_irqsave(fence->lock, flags);
-
+	dma_fence_lock_irqsave(fence, flags);
 	ret = !list_empty(&cb->node);
 	if (ret)
 		list_del_init(&cb->node);
-
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	return ret;
 }
@@ -824,7 +820,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 	unsigned long flags;
 	signed long ret = timeout ? timeout : 1;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 
 	if (dma_fence_test_signaled_flag(fence))
 		goto out;
@@ -848,11 +844,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 			__set_current_state(TASK_INTERRUPTIBLE);
 		else
 			__set_current_state(TASK_UNINTERRUPTIBLE);
-		spin_unlock_irqrestore(fence->lock, flags);
+		dma_fence_unlock_irqrestore(fence, flags);
 
 		ret = schedule_timeout(ret);
 
-		spin_lock_irqsave(fence->lock, flags);
+		dma_fence_lock_irqsave(fence, flags);
 		if (ret > 0 && intr && signal_pending(current))
 			ret = -ERESTARTSYS;
 	}
@@ -862,7 +858,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 	__set_current_state(TASK_RUNNING);
 
 out:
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 	return ret;
 }
 EXPORT_SYMBOL(dma_fence_default_wait);
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index 73ed6fd48a13..5d0d9abc6e21 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -410,8 +410,10 @@ struct race_thread {
 
 static void __wait_for_callbacks(struct dma_fence *f)
 {
-	spin_lock_irq(f->lock);
-	spin_unlock_irq(f->lock);
+	unsigned long flags;
+
+	dma_fence_lock_irqsave(f, flags);
+	dma_fence_unlock_irqrestore(f, flags);
 }
 
 static int thread_signal_callback(void *arg)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 6f09d13be6b6..4c81a37dd682 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -156,12 +156,12 @@ static void timeline_fence_release(struct dma_fence *fence)
 	struct sync_timeline *parent = dma_fence_parent(fence);
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	if (!list_empty(&pt->link)) {
 		list_del(&pt->link);
 		rb_erase(&pt->node, &parent->pt_tree);
 	}
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	sync_timeline_put(parent);
 	dma_fence_free(fence);
@@ -179,7 +179,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
 	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
 		if (ktime_before(deadline, pt->deadline))
 			pt->deadline = deadline;
@@ -187,7 +187,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
 		pt->deadline = deadline;
 		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
 	}
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 }
 
 static const struct dma_fence_ops timeline_fence_ops = {
@@ -431,13 +431,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
 		goto put_fence;
 	}
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
 		ret = -ENOENT;
 		goto unlock;
 	}
 	data.deadline_ns = ktime_to_ns(pt->deadline);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	dma_fence_put(fence);
 
@@ -450,7 +450,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
 	return 0;
 
 unlock:
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 put_fence:
 	dma_fence_put(fence);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index b82357c65723..1404e1fe62a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -479,10 +479,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	if (!dma_fence_is_signaled_locked(fence))
 		dma_fence_set_error(fence, -ENODATA);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	while (!dma_fence_is_signaled(fence) &&
 	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6a2ea200d90c..4761e7486811 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2802,8 +2802,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	dma_fence_put(vm->last_unlocked);
 	dma_fence_wait(vm->last_tlb_flush, false);
 	/* Make sure that all fence callbacks have completed */
-	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
-	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
+	dma_fence_lock_irqsave(vm->last_tlb_flush, flags);
+	dma_fence_unlock_irqrestore(vm->last_tlb_flush, flags);
 	dma_fence_put(vm->last_tlb_flush);
 
 	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index bf6117d5fc57..78ea2d9ccedf 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -148,7 +148,7 @@ __dma_fence_signal__notify(struct dma_fence *fence,
 {
 	struct dma_fence_cb *cur, *tmp;
 
-	lockdep_assert_held(fence->lock);
+	dma_fence_assert_held(fence);
 
 	list_for_each_entry_safe(cur, tmp, list, node) {
 		INIT_LIST_HEAD(&cur->node);
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 6b0c1162505a..9d41e052ab65 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -1045,9 +1045,10 @@ __i915_active_fence_set(struct i915_active_fence *active,
 	 * nesting rules for the fence->lock; the inner lock is always the
 	 * older lock.
 	 */
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock_irqsave(fence, flags);
 	if (prev)
-		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+		spin_lock_nested(dma_fence_spinlock(prev),
+				 SINGLE_DEPTH_NESTING);
 
 	/*
 	 * A does the cmpxchg first, and so it sees C or NULL, as before, or
@@ -1061,17 +1062,18 @@ __i915_active_fence_set(struct i915_active_fence *active,
 	 */
 	while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
 		if (prev) {
-			spin_unlock(prev->lock);
+			spin_unlock(dma_fence_spinlock(prev));
 			dma_fence_put(prev);
 		}
-		spin_unlock_irqrestore(fence->lock, flags);
+		dma_fence_unlock_irqrestore(fence, flags);
 
 		prev = i915_active_fence_get(active);
 		GEM_BUG_ON(prev == fence);
 
-		spin_lock_irqsave(fence->lock, flags);
+		dma_fence_lock_irqsave(fence, flags);
 		if (prev)
-			spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+			spin_lock_nested(dma_fence_spinlock(prev),
+					 SINGLE_DEPTH_NESTING);
 	}
 
 	/*
@@ -1088,10 +1090,11 @@ __i915_active_fence_set(struct i915_active_fence *active,
 	 */
 	if (prev) {
 		__list_del_entry(&active->cb.node);
-		spin_unlock(prev->lock); /* serialise with prev->cb_list */
+		/* serialise with prev->cb_list */
+		spin_unlock(dma_fence_spinlock(prev));
 	}
 	list_add_tail(&active->cb.node, &fence->cb_list);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	return prev;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 1527b801f013..ec4dfa3ea725 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
 static inline bool
 nouveau_cli_work_ready(struct dma_fence *fence)
 {
+	unsigned long flags;
 	bool ret = true;
 
-	spin_lock_irq(fence->lock);
+	dma_fence_lock_irqsave(fence, flags);
 	if (!dma_fence_is_signaled_locked(fence))
 		ret = false;
-	spin_unlock_irq(fence->lock);
+	dma_fence_unlock_irqrestore(fence, flags);
 
 	if (ret == true)
 		dma_fence_put(fence);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 9391d6f0dc01..724d77694246 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
 	struct dma_fence *parent;
 	unsigned long flags;
 
-	spin_lock_irqsave(&fence->lock, flags);
+	dma_fence_lock_irqsave(f, flags);
 
 	/* If we already have an earlier deadline, keep it: */
 	if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
 	    ktime_before(fence->deadline, deadline)) {
-		spin_unlock_irqrestore(&fence->lock, flags);
+		dma_fence_unlock_irqrestore(f, flags);
 		return;
 	}
 
 	fence->deadline = deadline;
 	set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
 
-	spin_unlock_irqrestore(&fence->lock, flags);
+	dma_fence_unlock_irqrestore(f, flags);
 
 	/*
 	 * smp_load_aquire() to ensure that if we are racing another
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 3927666fe556..ae5b38b2a884 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -190,11 +190,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
 	unsigned long irq_flags;
 	bool signaled;
 
-	spin_lock_irqsave(fence->lock, irq_flags);
+	dma_fence_lock_irqsave(fence, irq_flags);
 	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
 	if (!signaled)
 		dma_fence_set_error(fence, error);
-	spin_unlock_irqrestore(fence->lock, irq_flags);
+	dma_fence_unlock_irqrestore(fence, irq_flags);
 
 	return signaled;
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index e1afbb5909f9..88c842fc35d5 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -377,6 +377,44 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
 	} while (1);
 }
 
+/**
+ * dma_fence_spinlock - return pointer to the spinlock protecting the fence
+ * @fence: the fence to get the lock from
+ *
+ * Return the pointer to the extern lock.
+ */
+static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
+{
+	return fence->lock;
+}
+
+/**
+ * dma_fence_lock_irqsave - irqsave lock the fence
+ * @fence: the fence to lock
+ * @flags: where to store the CPU flags.
+ *
+ * Lock the fence, preventing it from changing to the signaled state.
+ */
+#define dma_fence_lock_irqsave(fence, flags)	\
+	spin_lock_irqsave(fence->lock, flags)
+
+/**
+ * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
+ * @fence: the fence to unlock
+ * @flags the CPU flags to restore
+ *
+ * Unlock the fence, allowing it to change it's state to signaled again.
+ */
+#define dma_fence_unlock_irqrestore(fence, flags)	\
+	spin_unlock_irqrestore(fence->lock, flags)
+
+/**
+ * dma_fence_assert_held - lockdep assertion that fence is locked
+ * @fence: the fence which should be locked
+ */
+#define dma_fence_assert_held(fence)	\
+	lockdep_assert_held(dma_fence_spinlock(fence));
+
 #ifdef CONFIG_LOCKDEP
 bool dma_fence_begin_signalling(void);
 void dma_fence_end_signalling(bool cookie);
-- 
2.43.0



* [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
                   ` (2 preceding siblings ...)
  2026-02-10 10:01 ` [PATCH 3/8] dma-buf: abstract fence locking v2 Christian König
@ 2026-02-10 10:01 ` Christian König
  2026-02-11  9:50   ` Philipp Stanner
                     ` (2 more replies)
  2026-02-10 10:02 ` [PATCH 5/8] dma-buf/selftests: test RCU ops and inline lock v2 Christian König
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 33+ messages in thread
From: Christian König @ 2026-02-10 10:01 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Implement per-fence spinlocks, allowing implementations to not give an
external spinlock to protect the fence internal statei. Instead a spinlock
embedded into the fence structure itself is used in this case.

Shared spinlocks have the problem that implementations need to guarantee
that the lock live at least as long all fences referencing them.

Using a per-fence spinlock allows completely decoupling spinlock producer
and consumer life times, simplifying the handling in most use cases.
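
For readers who want the idea without the kernel context, here is a minimal
userspace sketch of the lock selection. It is only an illustration, not the
code from this patch: model_fence, model_lock_t and MODEL_FLAG_INLINE_LOCK
are hypothetical stand-ins for struct dma_fence, spinlock_t and
DMA_FENCE_FLAG_INLINE_LOCK_BIT, and no real locking is performed.

```c
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t; only its address matters here. */
typedef struct { int locked; } model_lock_t;

/* Hypothetical stand-in for DMA_FENCE_FLAG_INLINE_LOCK_BIT. */
#define MODEL_FLAG_INLINE_LOCK (1UL << 0)

struct model_fence {
	union {
		model_lock_t *extern_lock;	/* caller-provided, possibly shared */
		model_lock_t inline_lock;	/* embedded in the fence itself */
	};
	unsigned long flags;
};

/* Models the init path: a NULL lock selects the embedded lock. */
static inline void model_fence_init(struct model_fence *f, model_lock_t *lock)
{
	f->flags = 0;
	if (lock) {
		f->extern_lock = lock;
	} else {
		f->inline_lock.locked = 0;
		f->flags |= MODEL_FLAG_INLINE_LOCK;
	}
}

/* Models the accessor: the flag decides which union member is valid. */
static inline model_lock_t *model_fence_lock(struct model_fence *f)
{
	return (f->flags & MODEL_FLAG_INLINE_LOCK) ?
		&f->inline_lock : f->extern_lock;
}
```

Because both members share storage, code that reads extern_lock directly
(for example through container_of) is only meaningful for fences that were
initialized with an external lock; everything else should go through the
accessor.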

v2: improve naming, coverage and function documentation
v3: fix one additional locking in the selftests
v4: separate out some changes to make the patch smaller,
    fix one amdgpu crash found by CI systems

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
 drivers/dma-buf/sync_debug.h            |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
 drivers/gpu/drm/drm_crtc.c              |  2 +-
 drivers/gpu/drm/drm_writeback.c         |  2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
 drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
 drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
 include/linux/dma-fence.h               | 19 +++++++++++++------
 10 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 56aa59867eaa..1833889e7466 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
 }
 #endif
 
-
 /**
  * dma_fence_signal_timestamp_locked - signal completion of a fence
  * @fence: the fence to signal
@@ -1067,7 +1066,6 @@ static void
 __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
 {
-	BUG_ON(!lock);
 	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
 
 	kref_init(&fence->refcount);
@@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	 */
 	RCU_INIT_POINTER(fence->ops, ops);
 	INIT_LIST_HEAD(&fence->cb_list);
-	fence->lock = lock;
 	fence->context = context;
 	fence->seqno = seqno;
 	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
+	if (lock) {
+		fence->extern_lock = lock;
+	} else {
+		spin_lock_init(&fence->inline_lock);
+		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
+	}
 	fence->error = 0;
 
 	trace_dma_fence_init(fence);
@@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
  * dma_fence_init - Initialize a custom fence.
  * @fence: the fence to initialize
  * @ops: the dma_fence_ops for operations on this fence
- * @lock: the irqsafe spinlock to use for locking this fence
+ * @lock: optional irqsafe spinlock to use for locking this fence
  * @context: the execution context this fence is run on
  * @seqno: a linear increasing sequence number for this context
  *
@@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
  *
  * context and seqno are used for easy comparison between fences, allowing
  * to check which fence is later by simply using dma_fence_later().
+ *
+ * It is strongly discouraged to provide an external lock. This is only allowed
+ * for legacy use cases when multiple fences need to be prevented from
+ * signaling out of order.
  */
 void
 dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -1114,7 +1121,7 @@ EXPORT_SYMBOL(dma_fence_init);
  * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support.
  * @fence: the fence to initialize
  * @ops: the dma_fence_ops for operations on this fence
- * @lock: the irqsafe spinlock to use for locking this fence
+ * @lock: optional irqsafe spinlock to use for locking this fence
  * @context: the execution context this fence is run on
  * @seqno: a linear increasing sequence number for this context
  *
@@ -1124,6 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
  *
  * Context and seqno are used for easy comparison between fences, allowing
  * to check which fence is later by simply using dma_fence_later().
+ *
+ * It is strongly discouraged to provide an external lock. This is only allowed
+ * for legacy use cases when multiple fences need to be prevented from
+ * signaling out of order.
  */
 void
 dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
index 02af347293d0..c49324505b20 100644
--- a/drivers/dma-buf/sync_debug.h
+++ b/drivers/dma-buf/sync_debug.h
@@ -47,7 +47,7 @@ struct sync_timeline {
 
 static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
 {
-	return container_of(fence->lock, struct sync_timeline, lock);
+	return container_of(fence->extern_lock, struct sync_timeline, lock);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 139642eacdd0..d5c41e24fb51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -638,7 +638,7 @@ static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
 	 * sure that the dma_fence structure isn't freed up.
 	 */
 	rcu_read_lock();
-	lock = vm->last_tlb_flush->lock;
+	lock = dma_fence_spinlock(vm->last_tlb_flush);
 	rcu_read_unlock();
 
 	spin_lock_irqsave(lock, flags);
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index a7797d260f1e..17472915842f 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
 static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
 {
 	BUG_ON(fence->ops != &drm_crtc_fence_ops);
-	return container_of(fence->lock, struct drm_crtc, fence_lock);
+	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
 }
 
 static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
index 95b8a2e4bda6..624a4e8b6c99 100644
--- a/drivers/gpu/drm/drm_writeback.c
+++ b/drivers/gpu/drm/drm_writeback.c
@@ -81,7 +81,7 @@
  *	From userspace, this property will always read as zero.
  */
 
-#define fence_to_wb_connector(x) container_of(x->lock, \
+#define fence_to_wb_connector(x) container_of(x->extern_lock, \
 					      struct drm_writeback_connector, \
 					      fence_lock)
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 4a193b7d6d9e..c282c94138b2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
 static inline struct nouveau_fence_chan *
 nouveau_fctx(struct nouveau_fence *fence)
 {
-	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
+	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
+			    lock);
 }
 
 static bool
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 06b0b2aa7953..37d4ae0faf0d 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 	struct qxl_device *qdev;
 	unsigned long cur, end = jiffies + timeout;
 
-	qdev = container_of(fence->lock, struct qxl_device, release_lock);
+	qdev = container_of(fence->extern_lock, struct qxl_device,
+			    release_lock);
 
 	if (!wait_event_timeout(qdev->release_event,
 				(dma_fence_is_signaled(fence) ||
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 85795082fef9..d251eec57df9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -47,7 +47,8 @@ struct vmw_event_fence_action {
 static struct vmw_fence_manager *
 fman_from_fence(struct vmw_fence_obj *fence)
 {
-	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
+			    lock);
 }
 
 static void vmw_fence_obj_destroy(struct dma_fence *f)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index ae8ed15b64c5..14720623ad00 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -124,7 +124,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
 
 static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
 {
-	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
+	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
+			    lock);
 }
 
 static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 88c842fc35d5..6eabbb1c471c 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -34,7 +34,8 @@ struct seq_file;
  * @ops: dma_fence_ops associated with this fence
  * @rcu: used for releasing fence with kfree_rcu
  * @cb_list: list of all callbacks to call
- * @lock: spin_lock_irqsave used for locking
+ * @extern_lock: external spin_lock_irqsave used for locking
+ * @inline_lock: alternative internal spin_lock_irqsave used for locking
  * @context: execution context this fence belongs to, returned by
  *           dma_fence_context_alloc()
  * @seqno: the sequence number of this fence inside the execution context,
@@ -49,6 +50,7 @@ struct seq_file;
  * of the time.
  *
  * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
+ * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
  * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
  * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
  * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
@@ -66,7 +68,10 @@ struct seq_file;
  * been completed, or never called at all.
  */
 struct dma_fence {
-	spinlock_t *lock;
+	union {
+		spinlock_t *extern_lock;
+		spinlock_t inline_lock;
+	};
 	const struct dma_fence_ops __rcu *ops;
 	/*
 	 * We clear the callback list on kref_put so that by the time we
@@ -100,6 +105,7 @@ struct dma_fence {
 
 enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_INITIALIZED_BIT,
+	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
 	DMA_FENCE_FLAG_SEQNO64_BIT,
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
@@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
  * dma_fence_spinlock - return pointer to the spinlock protecting the fence
  * @fence: the fence to get the lock from
  *
- * Return the pointer to the extern lock.
+ * Return either the pointer to the embedded or the external spin lock.
  */
 static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
 {
-	return fence->lock;
+	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
+		&fence->inline_lock : fence->extern_lock;
 }
 
 /**
@@ -396,7 +403,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
  * Lock the fence, preventing it from changing to the signaled state.
  */
 #define dma_fence_lock_irqsave(fence, flags)	\
-	spin_lock_irqsave(fence->lock, flags)
+	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
 
 /**
  * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
@@ -406,7 +413,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
  * Unlock the fence, allowing it to change it's state to signaled again.
  */
 #define dma_fence_unlock_irqrestore(fence, flags)	\
-	spin_unlock_irqrestore(fence->lock, flags)
+	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
 
 /**
  * dma_fence_assert_held - lockdep assertion that fence is locked
-- 
2.43.0



* [PATCH 5/8] dma-buf/selftests: test RCU ops and inline lock v2
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
                   ` (3 preceding siblings ...)
  2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
@ 2026-02-10 10:02 ` Christian König
  2026-02-10 10:02 ` [PATCH 6/8] dma-buf: use inline lock for the stub fence v2 Christian König
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Christian König @ 2026-02-10 10:02 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Drop the mock_fence and the kmem_cache, instead use the inline lock and
test if the ops are properly dropped after signaling.
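
The property being tested can be pictured with the following userspace
sketch. It is an assumption-laden model, not the kernel implementation:
sketch_fence, sketch_fence_ops and sketch_fence_signal are hypothetical
names, the ops table is reduced to three callbacks, and RCU is ignored
entirely. It only shows the rule from the cover letter that the ops pointer
is cleared on signaling when neither a wait nor a release callback is given.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence;

/* Hypothetical, reduced ops table: only the callbacks relevant here. */
struct sketch_fence_ops {
	const char *(*get_driver_name)(struct sketch_fence *f);
	void (*release)(struct sketch_fence *f);	/* optional */
	long (*wait)(struct sketch_fence *f);		/* optional */
};

struct sketch_fence {
	const struct sketch_fence_ops *ops;
	bool signaled;
};

/* Mark the fence signaled; drop ops unless wait or release still need it. */
static inline void sketch_fence_signal(struct sketch_fence *f)
{
	const struct sketch_fence_ops *ops = f->ops;

	f->signaled = true;
	if (ops && !ops->wait && !ops->release)
		f->ops = NULL;
}

static inline const char *sketch_name(struct sketch_fence *f)
{
	(void)f;
	return "mock";
}

/* No-op release callback, used to show the case where ops is kept. */
static inline void sketch_noop_release(struct sketch_fence *f)
{
	(void)f;
}
```

In this model a fence whose ops provide only the name callback ends up with
a NULL ops pointer after signaling, which corresponds to the check the
selftest adds at the end of test_signaling().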

v2: move the RCU check to the end of the test

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/dma-buf/st-dma-fence.c | 44 ++++++++--------------------------
 1 file changed, 10 insertions(+), 34 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index 5d0d9abc6e21..0d9d524d79b6 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -14,43 +14,26 @@
 
 #include "selftest.h"
 
-static struct kmem_cache *slab_fences;
-
-static struct mock_fence {
-	struct dma_fence base;
-	struct spinlock lock;
-} *to_mock_fence(struct dma_fence *f) {
-	return container_of(f, struct mock_fence, base);
-}
-
 static const char *mock_name(struct dma_fence *f)
 {
 	return "mock";
 }
 
-static void mock_fence_release(struct dma_fence *f)
-{
-	kmem_cache_free(slab_fences, to_mock_fence(f));
-}
-
 static const struct dma_fence_ops mock_ops = {
 	.get_driver_name = mock_name,
 	.get_timeline_name = mock_name,
-	.release = mock_fence_release,
 };
 
 static struct dma_fence *mock_fence(void)
 {
-	struct mock_fence *f;
+	struct dma_fence *f;
 
-	f = kmem_cache_alloc(slab_fences, GFP_KERNEL);
+	f = kmalloc(sizeof(*f), GFP_KERNEL);
 	if (!f)
 		return NULL;
 
-	spin_lock_init(&f->lock);
-	dma_fence_init(&f->base, &mock_ops, &f->lock, 0, 0);
-
-	return &f->base;
+	dma_fence_init(f, &mock_ops, NULL, 0, 0);
+	return f;
 }
 
 static int sanitycheck(void *arg)
@@ -100,6 +83,11 @@ static int test_signaling(void *arg)
 		goto err_free;
 	}
 
+	if (rcu_dereference_protected(f->ops, true)) {
+		pr_err("Fence ops not cleared on signal\n");
+		goto err_free;
+	}
+
 	err = 0;
 err_free:
 	dma_fence_put(f);
@@ -540,19 +528,7 @@ int dma_fence(void)
 		SUBTEST(test_stub),
 		SUBTEST(race_signal_callback),
 	};
-	int ret;
 
 	pr_info("sizeof(dma_fence)=%zu\n", sizeof(struct dma_fence));
-
-	slab_fences = KMEM_CACHE(mock_fence,
-				 SLAB_TYPESAFE_BY_RCU |
-				 SLAB_HWCACHE_ALIGN);
-	if (!slab_fences)
-		return -ENOMEM;
-
-	ret = subtests(tests, NULL);
-
-	kmem_cache_destroy(slab_fences);
-
-	return ret;
+	return subtests(tests, NULL);
 }
-- 
2.43.0



* [PATCH 6/8] dma-buf: use inline lock for the stub fence v2
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
                   ` (4 preceding siblings ...)
  2026-02-10 10:02 ` [PATCH 5/8] dma-buf/selftests: test RCU ops and inline lock v2 Christian König
@ 2026-02-10 10:02 ` Christian König
  2026-02-13 14:32   ` Boris Brezillon
  2026-02-10 10:02 ` [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array Christian König
  2026-02-10 10:02 ` [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain Christian König
  7 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-10 10:02 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Using the inline lock is now the recommended way for dma_fence
implementations.

So use this approach for the framework's internal fences as well.

Also saves about 4 bytes for the external spinlock.

v2: drop unnecessary changes

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/dma-buf/dma-fence.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1833889e7466..541e20aa4e6c 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -24,7 +24,6 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
 
-static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
 
 /*
@@ -123,12 +122,9 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
 
 static int __init dma_fence_init_stub(void)
 {
-	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
-		       &dma_fence_stub_lock, 0, 0);
-
+	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, NULL, 0, 0);
 	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
 		&dma_fence_stub.flags);
-
 	dma_fence_signal(&dma_fence_stub);
 	return 0;
 }
@@ -160,11 +156,7 @@ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp)
 	if (fence == NULL)
 		return NULL;
 
-	dma_fence_init(fence,
-		       &dma_fence_stub_ops,
-		       &dma_fence_stub_lock,
-		       0, 0);
-
+	dma_fence_init(fence, &dma_fence_stub_ops, NULL, 0, 0);
 	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
 		&fence->flags);
 
-- 
2.43.0



* [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
                   ` (5 preceding siblings ...)
  2026-02-10 10:02 ` [PATCH 6/8] dma-buf: use inline lock for the stub fence v2 Christian König
@ 2026-02-10 10:02 ` Christian König
  2026-02-13 14:33   ` Boris Brezillon
  2026-02-10 10:02 ` [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain Christian König
  7 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-10 10:02 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Using the inline lock is now the recommended way for dma_fence
implementations.

So use this approach for the framework's internal fences as well.

Also saves about 4 bytes for the external spinlock.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/dma-buf/dma-fence-array.c | 5 ++---
 include/linux/dma-fence-array.h   | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 6657d4b30af9..c2119a8049fe 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -204,9 +204,8 @@ void dma_fence_array_init(struct dma_fence_array *array,
 
 	array->num_fences = num_fences;
 
-	spin_lock_init(&array->lock);
-	dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
-		       context, seqno);
+	dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
+		       seqno);
 	init_irq_work(&array->work, irq_dma_fence_array_work);
 
 	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 079b3dec0a16..370b3d2bba37 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -38,7 +38,6 @@ struct dma_fence_array_cb {
 struct dma_fence_array {
 	struct dma_fence base;
 
-	spinlock_t lock;
 	unsigned num_fences;
 	atomic_t num_pending;
 	struct dma_fence **fences;
-- 
2.43.0



* [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain
  2026-02-10 10:01 Independence for dma_fences! v7 Christian König
                   ` (6 preceding siblings ...)
  2026-02-10 10:02 ` [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array Christian König
@ 2026-02-10 10:02 ` Christian König
  2026-02-13 14:33   ` Boris Brezillon
  7 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-10 10:02 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

Using the inline lock is now the recommended way for dma_fence
implementations.

So use this approach for the framework's internal fences as well.

Also saves about 4 bytes for the external spinlock.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/dma-buf/dma-fence-chain.c | 3 +--
 include/linux/dma-fence-chain.h   | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a8a90acf4f34..a707792b6025 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -245,7 +245,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 	struct dma_fence_chain *prev_chain = to_dma_fence_chain(prev);
 	uint64_t context;
 
-	spin_lock_init(&chain->lock);
 	rcu_assign_pointer(chain->prev, prev);
 	chain->fence = fence;
 	chain->prev_seqno = 0;
@@ -261,7 +260,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 			seqno = max(prev->seqno, seqno);
 	}
 
-	dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
+	dma_fence_init64(&chain->base, &dma_fence_chain_ops, NULL,
 			 context, seqno);
 
 	/*
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 68c3c1e41014..d39ce7a2e599 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -46,7 +46,6 @@ struct dma_fence_chain {
 		 */
 		struct irq_work work;
 	};
-	spinlock_t lock;
 };
 
 
-- 
2.43.0



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
@ 2026-02-11  9:50   ` Philipp Stanner
  2026-02-11 14:59     ` Christian König
  2026-02-12  9:16   ` Tvrtko Ursulin
  2026-02-13 14:27   ` Boris Brezillon
  2 siblings, 1 reply; 33+ messages in thread
From: Philipp Stanner @ 2026-02-11  9:50 UTC (permalink / raw)
  To: Christian König, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig

On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
> Implement per-fence spinlocks, allowing implementations to not give an
> external spinlock to protect the fence internal statei. Instead a spinlock

s/statei/state

> embedded into the fence structure itself is used in this case.
> 
> Shared spinlocks have the problem that implementations need to guarantee
> that the lock live at least as long all fences referencing them.

s/live/lives

> 
> Using a per-fence spinlock allows completely decoupling spinlock producer
> and consumer life times, simplifying the handling in most use cases.

That's a good commit message btw, detailing what the motivation is.
Would be great to see messages like that more frequently :]

> 
> v2: improve naming, coverage and function documentation
> v3: fix one additional locking in the selftests
> v4: separate out some changes to make the patch smaller,
>     fix one amdgpu crash found by CI systems
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
>  drivers/dma-buf/sync_debug.h            |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
>  drivers/gpu/drm/drm_crtc.c              |  2 +-
>  drivers/gpu/drm/drm_writeback.c         |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
>  drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
>  drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
>  include/linux/dma-fence.h               | 19 +++++++++++++------
>  10 files changed, 41 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 56aa59867eaa..1833889e7466 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>  }
>  #endif
>  
> -
>  /**
>   * dma_fence_signal_timestamp_locked - signal completion of a fence
>   * @fence: the fence to signal
> @@ -1067,7 +1066,6 @@ static void
>  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>  {
> -	BUG_ON(!lock);
>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>  
>  	kref_init(&fence->refcount);
> @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	 */
>  	RCU_INIT_POINTER(fence->ops, ops);
>  	INIT_LIST_HEAD(&fence->cb_list);
> -	fence->lock = lock;
>  	fence->context = context;
>  	fence->seqno = seqno;
>  	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> +	if (lock) {
> +		fence->extern_lock = lock;
> +	} else {
> +		spin_lock_init(&fence->inline_lock);
> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
> +	}
>  	fence->error = 0;
>  
>  	trace_dma_fence_init(fence);
> @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   * dma_fence_init - Initialize a custom fence.
>   * @fence: the fence to initialize
>   * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>   * @context: the execution context this fence is run on
>   * @seqno: a linear increasing sequence number for this context
>   *
> @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   *
>   * context and seqno are used for easy comparison between fences, allowing
>   * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed

"strongly discouraged […] because this does not decouple lock and fence
life times." ?

> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.

I think our previous discussions revealed that the external lock does
not even help with that, does it?

>   */
>  void
>  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -1114,7 +1121,7 @@ EXPORT_SYMBOL(dma_fence_init);
>   * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support.
>   * @fence: the fence to initialize
>   * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>   * @context: the execution context this fence is run on
>   * @seqno: a linear increasing sequence number for this context
>   *
> @@ -1124,6 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
>   *
>   * Context and seqno are used for easy comparison between fences, allowing
>   * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed

same

> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.
>   */
>  void
>  dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 02af347293d0..c49324505b20 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -47,7 +47,7 @@ struct sync_timeline {
>  
>  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>  {
> -	return container_of(fence->lock, struct sync_timeline, lock);
> +	return container_of(fence->extern_lock, struct sync_timeline, lock);

You're sure that this will never have to check for the flag?

>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 139642eacdd0..d5c41e24fb51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -638,7 +638,7 @@ static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>  	 * sure that the dma_fence structure isn't freed up.
>  	 */
>  	rcu_read_lock();
> -	lock = vm->last_tlb_flush->lock;
> +	lock = dma_fence_spinlock(vm->last_tlb_flush);
>  	rcu_read_unlock();
>  
>  	spin_lock_irqsave(lock, flags);
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index a7797d260f1e..17472915842f 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>  static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>  {
>  	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>  }
>  
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> index 95b8a2e4bda6..624a4e8b6c99 100644
> --- a/drivers/gpu/drm/drm_writeback.c
> +++ b/drivers/gpu/drm/drm_writeback.c
> @@ -81,7 +81,7 @@
>   *	From userspace, this property will always read as zero.
>   */
>  
> -#define fence_to_wb_connector(x) container_of(x->lock, \
> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>  					      struct drm_writeback_connector, \
>  					      fence_lock)
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 4a193b7d6d9e..c282c94138b2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
>  static inline struct nouveau_fence_chan *
>  nouveau_fctx(struct nouveau_fence *fence)
>  {
> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> +			    lock);
>  }
>  
>  static bool
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 06b0b2aa7953..37d4ae0faf0d 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>  	struct qxl_device *qdev;
>  	unsigned long cur, end = jiffies + timeout;
>  
> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> +	qdev = container_of(fence->extern_lock, struct qxl_device,
> +			    release_lock);
>  
>  	if (!wait_event_timeout(qdev->release_event,
>  				(dma_fence_is_signaled(fence) ||
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 85795082fef9..d251eec57df9 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>  static struct vmw_fence_manager *
>  fman_from_fence(struct vmw_fence_obj *fence)
>  {
> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> +			    lock);
>  }
>  
>  static void vmw_fence_obj_destroy(struct dma_fence *f)
> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> index ae8ed15b64c5..14720623ad00 100644
> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> @@ -124,7 +124,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>  
>  static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>  {
> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> +			    lock);
>  }
>  
>  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 88c842fc35d5..6eabbb1c471c 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -34,7 +34,8 @@ struct seq_file;
>   * @ops: dma_fence_ops associated with this fence
>   * @rcu: used for releasing fence with kfree_rcu
>   * @cb_list: list of all callbacks to call
> - * @lock: spin_lock_irqsave used for locking
> + * @extern_lock: external spin_lock_irqsave used for locking

Add a "(deprecated)" ?

> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>   * @context: execution context this fence belongs to, returned by
>   *           dma_fence_context_alloc()
>   * @seqno: the sequence number of this fence inside the execution context,
> @@ -49,6 +50,7 @@ struct seq_file;
>   * of the time.
>   *
>   * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> @@ -66,7 +68,10 @@ struct seq_file;
>   * been completed, or never called at all.
>   */
>  struct dma_fence {
> -	spinlock_t *lock;
> +	union {
> +		spinlock_t *extern_lock;
> +		spinlock_t inline_lock;
> +	};
>  	const struct dma_fence_ops __rcu *ops;
>  	/*
>  	 * We clear the callback list on kref_put so that by the time we
> @@ -100,6 +105,7 @@ struct dma_fence {
>  
>  enum dma_fence_flag_bits {
>  	DMA_FENCE_FLAG_INITIALIZED_BIT,
> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,

Just asking about a nit: what's the order here, always alphabetically?

>  	DMA_FENCE_FLAG_SEQNO64_BIT,
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>   * dma_fence_spinlock - return pointer to the spinlock protecting the fence
>   * @fence: the fence to get the lock from
>   *
> - * Return the pointer to the extern lock.
> + * Return either the pointer to the embedded or the external spin lock.
>   */
>  static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>  {
> -	return fence->lock;
> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> +		&fence->inline_lock : fence->extern_lock;

I personally am not a fan of using '?' for anything longer than 1 line
and think that

if (condition)
  return a;

return b;

is much more readable.



P.

>  }
>  
>  /**
> @@ -396,7 +403,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>   * Lock the fence, preventing it from changing to the signaled state.
>   */
>  #define dma_fence_lock_irqsave(fence, flags)	\
> -	spin_lock_irqsave(fence->lock, flags)
> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
>  
>  /**
>   * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
> @@ -406,7 +413,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>   * Unlock the fence, allowing it to change it's state to signaled again.
>   */
>  #define dma_fence_unlock_irqrestore(fence, flags)	\
> -	spin_unlock_irqrestore(fence->lock, flags)
> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
>  
>  /**
>   * dma_fence_assert_held - lockdep assertion that fence is locked



* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
@ 2026-02-11 10:06   ` Philipp Stanner
  2026-02-11 15:43     ` Christian König
  2026-02-12  9:31   ` Tvrtko Ursulin
  2026-02-13 14:20   ` Boris Brezillon
  2 siblings, 1 reply; 33+ messages in thread
From: Philipp Stanner @ 2026-02-11 10:06 UTC (permalink / raw)
  To: Christian König, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig

On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
> At first glance it is counter intuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.

I think that someone who does not already have a deep understanding
of dma-buf and fences will have a lot of trouble understanding *why*
this patch is in the log and *what it achieves*.

Good commit messages are at least as important as good code. In
drm/sched for example I've tried so many times to figure out why
certain hacks and changes were implemented, but all that git-blame ever
gave me was one-liners, often hinting at some driver-internal
workaround ._.

> 
> v2: make one the now duplicated lockdep warnings a comment instead.
> v3: Add more documentation to ->wait and ->release callback.
> v4: fix typo in documentation
> v5: rebased on drm-tip
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>  include/linux/dma-fence.h   | 29 ++++++++++++++--
>  2 files changed, 73 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index e05beae6e407..de9bf18be3d4 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>  signed long
>  dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  {
> +	const struct dma_fence_ops *ops;
>  	signed long ret;
>  
>  	if (WARN_ON(timeout < 0))
> @@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  
>  	dma_fence_enable_sw_signaling(fence);
>  
> -	if (trace_dma_fence_wait_start_enabled()) {

Why can wait_start_enabled() be removed? Is that related to the lifetime
decoupling or is it a separate topic?

> -		rcu_read_lock();
> -		trace_dma_fence_wait_start(fence);
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	trace_dma_fence_wait_start(fence);
> +	if (ops->wait) {
> +		/*
> +		 * Implementing the wait ops is deprecated and not supported for
> +		 * issuer independent fences, so it is ok to use the ops outside

s/issuer/issuers of

And how do we know that this here is an independent fence?
What even is an "independent fence" – one with internal spinlock?

> +		 * the RCU protected section.
> +		 */
> +		rcu_read_unlock();
> +		ret = ops->wait(fence, intr, timeout);
> +	} else {
>  		rcu_read_unlock();
> -	}
> -	if (fence->ops->wait)
> -		ret = fence->ops->wait(fence, intr, timeout);
> -	else
>  		ret = dma_fence_default_wait(fence, intr, timeout);
> +	}

The git diff here looks awkward. Do you use
git format-patch --histogram?

>  	if (trace_dma_fence_wait_end_enabled()) {
>  		rcu_read_lock();
>  		trace_dma_fence_wait_end(fence);
> @@ -562,6 +569,7 @@ void dma_fence_release(struct kref *kref)
>  {
>  	struct dma_fence *fence =
>  		container_of(kref, struct dma_fence, refcount);
> +	const struct dma_fence_ops *ops;
>  
>  	rcu_read_lock();
>  	trace_dma_fence_destroy(fence);
> @@ -593,12 +601,12 @@ void dma_fence_release(struct kref *kref)
>  		spin_unlock_irqrestore(fence->lock, flags);
>  	}
>  
> -	rcu_read_unlock();
> -
> -	if (fence->ops->release)
> -		fence->ops->release(fence);
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->release)
> +		ops->release(fence);
>  	else
>  		dma_fence_free(fence);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(dma_fence_release);
>  
> @@ -617,6 +625,7 @@ EXPORT_SYMBOL(dma_fence_free);
>  
>  static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
>  	bool was_set;
>  
>  	lockdep_assert_held(fence->lock);
> @@ -627,14 +636,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  	if (dma_fence_test_signaled_flag(fence))
>  		return false;
>  
> -	if (!was_set && fence->ops->enable_signaling) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (!was_set && ops->enable_signaling) {
>  		trace_dma_fence_enable_signal(fence);
>  
> -		if (!fence->ops->enable_signaling(fence)) {
> +		if (!ops->enable_signaling(fence)) {
> +			rcu_read_unlock();
>  			dma_fence_signal_locked(fence);
>  			return false;
>  		}
>  	}
> +	rcu_read_unlock();
>  
>  	return true;
>  }
> @@ -1007,8 +1020,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>   */
>  void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>  {
> -	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> -		fence->ops->set_deadline(fence, deadline);
> +	const struct dma_fence_ops *ops;
> +
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +		ops->set_deadline(fence, deadline);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(dma_fence_set_deadline);
>  
> @@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>  
>  	kref_init(&fence->refcount);
> -	fence->ops = ops;
> +	/*
> +	 * At first glance it is counter intuitive to protect a constant
> +	 * function pointer table by RCU, but this allows modules providing the
> +	 * function table to unload by waiting for an RCU grace period.

Maybe add a sentence like "Fences can live longer than the module which
issued them."

> +	 */
> +	RCU_INIT_POINTER(fence->ops, ops);
>  	INIT_LIST_HEAD(&fence->cb_list);
>  	fence->lock = lock;
>  	fence->context = context;
> @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>   */
>  const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>  
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>  	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
>  		return (const char __rcu *)"detached-driver";
>  }
> @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>   */
>  const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>  
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>  	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
>  		return (const char __rcu *)"signaled-timeline";
>  }

Did we make any progress in our conversation about removing those two
functions and callbacks? They're only used by i915.


> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 9c4d25289239..6bf4feb0e01f 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -67,7 +67,7 @@ struct seq_file;
>   */
>  struct dma_fence {
>  	spinlock_t *lock;
> -	const struct dma_fence_ops *ops;
> +	const struct dma_fence_ops __rcu *ops;
>  	/*
>  	 * We clear the callback list on kref_put so that by the time we
>  	 * release the fence it is unused. No one should be adding to the
> @@ -220,6 +220,10 @@ struct dma_fence_ops {
>  	 * timed out. Can also return other error values on custom implementations,
>  	 * which should be treated as if the fence is signaled. For example a hardware
>  	 * lockup could be reported like that.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the

s/mandatory/necessary ?

> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>  	 */
>  	signed long (*wait)(struct dma_fence *fence,
>  			    bool intr, signed long timeout);
> @@ -231,6 +235,13 @@ struct dma_fence_ops {
>  	 * Can be called from irq context.  This callback is optional. If it is
>  	 * NULL, then dma_fence_free() is instead called as the default
>  	 * implementation.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the

same

> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
> +	 *
> +	 * If the callback is implemented the memory backing the dma_fence
> +	 * object must be freed RCU safe.
>  	 */
>  	void (*release)(struct dma_fence *fence);
>  
> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>  static inline bool
>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
> +
>  	if (dma_fence_test_signaled_flag(fence))
>  		return true;
>  
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {

Maybe you can educate me a bit about RCU here – couldn't this still
race? If the ops were unloaded before you take rcu_read_lock(),
rcu_dereference() would give you an invalid pointer here since you
don't check for !ops, no?


> +		rcu_read_unlock();
>  		dma_fence_signal_locked(fence);
>  		return true;
>  	}
> +	rcu_read_unlock();
>  
>  	return false;
>  }
> @@ -484,13 +501,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
> +
>  	if (dma_fence_test_signaled_flag(fence))
>  		return true;
>  
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {

same


Danke,
P.

> +		rcu_read_unlock();
>  		dma_fence_signal(fence);
>  		return true;
>  	}
> +	rcu_read_unlock();
>  
>  	return false;
>  }



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-11  9:50   ` Philipp Stanner
@ 2026-02-11 14:59     ` Christian König
  2026-02-12  9:01       ` Philipp Stanner
  0 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-11 14:59 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

On 2/11/26 10:50, Philipp Stanner wrote:
> On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
...
>> Using a per-fence spinlock allows completely decoupling spinlock producer
>> and consumer life times, simplifying the handling in most use cases.
> 
> That's a good commit message btw, detailing what the motivation is.
> Would be great to see messages like that more frequently :]

Yeah, but they are not so easy to write.

>>  	trace_dma_fence_init(fence);
>> @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>   * dma_fence_init - Initialize a custom fence.
>>   * @fence: the fence to initialize
>>   * @ops: the dma_fence_ops for operations on this fence
>> - * @lock: the irqsafe spinlock to use for locking this fence
>> + * @lock: optional irqsafe spinlock to use for locking this fence
>>   * @context: the execution context this fence is run on
>>   * @seqno: a linear increasing sequence number for this context
>>   *
>> @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>   *
>>   * context and seqno are used for easy comparison between fences, allowing
>>   * to check which fence is later by simply using dma_fence_later().
>> + *
>> + * It is strongly discouraged to provide an external lock. This is only allowed
> 
> "strongly discouraged […] because this does not decouple lock and fence
> life times." ?

Good point, added some more text.
 
>> + * for legacy use cases when multiple fences need to be prevented from
>> + * signaling out of order.
> 
> I think our previous discussions revealed that the external lock does
> not even help with that, does it?

Well, only when you provide a ->signaled() callback in the dma_fence_ops.

The reason we have so many different approaches in the dma_fence handling is that it is basically the unification of multiple different driver implementations, which all targeted more or less different use cases.

>> + * for legacy use cases when multiple fences need to be prevented from
>> + * signaling out of order.
>>   */
>>  void
>>  dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
>> index 02af347293d0..c49324505b20 100644
>> --- a/drivers/dma-buf/sync_debug.h
>> +++ b/drivers/dma-buf/sync_debug.h
>> @@ -47,7 +47,7 @@ struct sync_timeline {
>>  
>>  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>>  {
>> -	return container_of(fence->lock, struct sync_timeline, lock);
>> +	return container_of(fence->extern_lock, struct sync_timeline, lock);
> 
> You're sure that this will never have to check for the flag?

Yes, the code would have crashed before if anything other than a sync_pt created by sync_pt_create() was encountered here.

We could drop the wrapper, move the cast to the only place where it matters and document the why and what with a code comment... but this is all dead code which breaks some of the fundamental dma-fence rules, and it is only left here because we can't break the UAPI.
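
For illustration, here is a minimal userspace sketch of why the container_of() cast only works for fences that use the external lock. Everything prefixed mock_ is hypothetical, the one-field spinlock_t is a stand-in for the kernel type, and uses_inline_lock merely models DMA_FENCE_FLAG_INLINE_LOCK_BIT; this is not the sync_debug code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in; the kernel's spinlock_t is more complex. */
typedef struct { int locked; } spinlock_t;

struct mock_fence {
	union {
		spinlock_t *extern_lock;	/* points into the parent object */
		spinlock_t inline_lock;		/* lives inside the fence itself */
	};
	bool uses_inline_lock;	/* models DMA_FENCE_FLAG_INLINE_LOCK_BIT */
};

struct mock_timeline {
	int id;
	spinlock_t lock;
};

static struct mock_timeline *mock_fence_parent(struct mock_fence *fence)
{
	/* Only valid when the fence was set up with the timeline's lock. */
	assert(!fence->uses_inline_lock);
	return container_of(fence->extern_lock, struct mock_timeline, lock);
}
```

With an inline lock the union would hold the lock itself rather than a pointer into the timeline, so the cast would compute a bogus address. The assert stands in for the guarantee that only fences created with the timeline's lock reach this helper.
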
>>  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 88c842fc35d5..6eabbb1c471c 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -34,7 +34,8 @@ struct seq_file;
>>   * @ops: dma_fence_ops associated with this fence
>>   * @rcu: used for releasing fence with kfree_rcu
>>   * @cb_list: list of all callbacks to call
>> - * @lock: spin_lock_irqsave used for locking
>> + * @extern_lock: external spin_lock_irqsave used for locking
> 
> Add a "(deprecated)" ?

Done.

> 
>> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>>   * @context: execution context this fence belongs to, returned by
>>   *           dma_fence_context_alloc()
>>   * @seqno: the sequence number of this fence inside the execution context,
>> @@ -49,6 +50,7 @@ struct seq_file;
>>   * of the time.
>>   *
>>   * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
>> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>>   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>>   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>>   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
>> @@ -66,7 +68,10 @@ struct seq_file;
>>   * been completed, or never called at all.
>>   */
>>  struct dma_fence {
>> -	spinlock_t *lock;
>> +	union {
>> +		spinlock_t *extern_lock;
>> +		spinlock_t inline_lock;
>> +	};
>>  	const struct dma_fence_ops __rcu *ops;
>>  	/*
>>  	 * We clear the callback list on kref_put so that by the time we
>> @@ -100,6 +105,7 @@ struct dma_fence {
>>  
>>  enum dma_fence_flag_bits {
>>  	DMA_FENCE_FLAG_INITIALIZED_BIT,
>> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
> 
> Just asking about a nit: what's the order here, always alphabetically?

They are ordered by when the flags are used in the code flow.

>>  	DMA_FENCE_FLAG_SEQNO64_BIT,
>>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>> @@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>>   * dma_fence_spinlock - return pointer to the spinlock protecting the fence
>>   * @fence: the fence to get the lock from
>>   *
>> - * Return the pointer to the extern lock.
>> + * Return either the pointer to the embedded or the external spin lock.
>>   */
>>  static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>>  {
>> -	return fence->lock;
>> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
>> +		&fence->inline_lock : fence->extern_lock;
> 
> I personally am not a fan of using '?' for anything longer than 1 line
> and think that
> 
> if (condition)
>   return a;
> 
> return b;
> 
> is much more readable.

Mhm, I disagree in this particular case. Having both possibilities side by side makes it more readable, I think.
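
For comparison, a small userspace sketch of both spellings under discussion. The mock_ names, the one-field spinlock_t and the bit number are assumptions made for the example; only the union layout and the flag test mirror the patch.

```c
/* Hypothetical stand-in for the kernel's spinlock_t. */
typedef struct { int locked; } spinlock_t;

/* Assumed bit position, chosen only for this example. */
#define MOCK_FLAG_INLINE_LOCK_BIT 1

struct mock_fence {
	union {
		spinlock_t *extern_lock;
		spinlock_t inline_lock;
	};
	unsigned long flags;
};

/* Simplified, non-atomic stand-in for the kernel's test_bit(). */
static inline int mock_test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Spelling used in the patch: both outcomes side by side. */
static inline spinlock_t *mock_spinlock_ternary(struct mock_fence *fence)
{
	return mock_test_bit(MOCK_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
		&fence->inline_lock : fence->extern_lock;
}

/* Spelling suggested in review: early return for the inline case. */
static inline spinlock_t *mock_spinlock_branch(struct mock_fence *fence)
{
	if (mock_test_bit(MOCK_FLAG_INLINE_LOCK_BIT, &fence->flags))
		return &fence->inline_lock;

	return fence->extern_lock;
}
```

Both helpers return the same pointer for either flag state, so the choice between them is purely one of readability.
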

Regards,
Christian.



* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-11 10:06   ` Philipp Stanner
@ 2026-02-11 15:43     ` Christian König
  2026-02-12  8:56       ` Philipp Stanner
  2026-02-12  9:03       ` Tvrtko Ursulin
  0 siblings, 2 replies; 33+ messages in thread
From: Christian König @ 2026-02-11 15:43 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

On 2/11/26 11:06, Philipp Stanner wrote:
> On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
>> At first glance it is counter intuitive to protect a constant function
>> pointer table by RCU, but this allows modules providing the function
>> table to unload by waiting for an RCU grace period.
> 
> I think that someone who does not already have a deep understanding
> of dma-buf and fences will have a lot of trouble understanding *why*
> this patch is in the log and *what it achieves*.
> 
> Good commit messages are at least as important as good code. In
> drm/sched for example I've tried so many times to figure out why
> certain hacks and changes were implemented, but all that git-blame ever
> gave me was one-liners, often hinting at some driver-internal
> workaround ._.

How about something like this:

The fence ops of a dma_fence currently need to live as long as the dma_fence is alive.

This means that the module which originally issued a dma_fence can't unload until all of its fences are freed up.

As a first step to solve this issue, protect the fence ops by RCU.

While it is counterintuitive to protect a constant function pointer table by RCU, it allows modules to wait for an RCU grace period to make sure that nobody is executing their functions any more.


> 
>>
>> v2: make one the now duplicated lockdep warnings a comment instead.
>> v3: Add more documentation to ->wait and ->release callback.
>> v4: fix typo in documentation
>> v5: rebased on drm-tip
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>>  include/linux/dma-fence.h   | 29 ++++++++++++++--
>>  2 files changed, 73 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index e05beae6e407..de9bf18be3d4 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>>  signed long
>>  dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>  {
>> +	const struct dma_fence_ops *ops;
>>  	signed long ret;
>>  
>>  	if (WARN_ON(timeout < 0))
>> @@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>  
>>  	dma_fence_enable_sw_signaling(fence);
>>  
>> -	if (trace_dma_fence_wait_start_enabled()) {
> 
> Why can wait_start_enabled() be removed? Is that related to the lifetime
> decoupling or is it a separate topic?

It isn't removed. I've only dropped the "if (trace_dma_fence_wait_start_enabled())" optimization, which the tracing subsystem implements as self-patching code (longer story).

The trace_dma_fence_wait_start() trace point function is still called a few lines below.

>> -		rcu_read_lock();
>> -		trace_dma_fence_wait_start(fence);
>> +	rcu_read_lock();
>> +	ops = rcu_dereference(fence->ops);
>> +	trace_dma_fence_wait_start(fence);
>> +	if (ops->wait) {
>> +		/*
>> +		 * Implementing the wait ops is deprecated and not supported for
>> +		 * issuer independent fences, so it is ok to use the ops outside
> 
> s/issuer/issuers of

Fixed.

> And how do we know that this here is an independent fence?
> What even is an "independent fence" – one with internal spinlock?

I rephrased the sentence a bit to make that clearer:

                /*
                 * Implementing the wait ops is deprecated and not supported for
                 * issuers of fences who want them to be independent of their
                 * module after they signal, so it is ok to use the ops outside
                 * the RCU protected section.
                 */


> 
>> +		 * the RCU protected section.
>> +		 */
>> +		rcu_read_unlock();
>> +		ret = ops->wait(fence, intr, timeout);
>> +	} else {
>>  		rcu_read_unlock();
>> -	}
>> -	if (fence->ops->wait)
>> -		ret = fence->ops->wait(fence, intr, timeout);
>> -	else
>>  		ret = dma_fence_default_wait(fence, intr, timeout);
>> +	}
> 
> The git diff here looks awkward. Do you use
> git format-patch --histogram?

Nope, what's the matter?

>>  	if (trace_dma_fence_wait_end_enabled()) {
>>  		rcu_read_lock();
>>  		trace_dma_fence_wait_end(fence);

>>  
>> @@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>>  
>>  	kref_init(&fence->refcount);
>> -	fence->ops = ops;
>> +	/*
>> +	 * At first glance it is counter intuitive to protect a constant
>> +	 * function pointer table by RCU, but this allows modules providing the
>> +	 * function table to unload by waiting for an RCU grace period.
> 
> Maybe add a sentence like "Fences can live longer than the module which
> issued them."

Going to use the same text as the commit message here as soon as we have synced up on that.

> 
>> +	 */
>> +	RCU_INIT_POINTER(fence->ops, ops);
>>  	INIT_LIST_HEAD(&fence->cb_list);
>>  	fence->lock = lock;
>>  	fence->context = context;
>> @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>>   */
>>  const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>>  {
>> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> -			 "RCU protection is required for safe access to returned string");
>> +	const struct dma_fence_ops *ops;
>>  
>> +	/* RCU protection is required for safe access to returned string */
>> +	ops = rcu_dereference(fence->ops);
>>  	if (!dma_fence_test_signaled_flag(fence))
>> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
>> +		return (const char __rcu *)ops->get_driver_name(fence);
>>  	else
>>  		return (const char __rcu *)"detached-driver";
>>  }
>> @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>>   */
>>  const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>>  {
>> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> -			 "RCU protection is required for safe access to returned string");
>> +	const struct dma_fence_ops *ops;
>>  
>> +	/* RCU protection is required for safe access to returned string */
>> +	ops = rcu_dereference(fence->ops);
>>  	if (!dma_fence_test_signaled_flag(fence))
>> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
>> +		return (const char __rcu *)ops->get_driver_name(fence);
>>  	else
>>  		return (const char __rcu *)"signaled-timeline";
>>  }
> 
> Did we make any progress in our conversation about removing those two
> functions and callbacks? They're only used by i915.

Actually they are mostly used by the trace points and debugfs, so we certainly can't remove them.

But I'm really wondering why the heck i915 is using them?

>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 9c4d25289239..6bf4feb0e01f 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -67,7 +67,7 @@ struct seq_file;
>>   */
>>  struct dma_fence {
>>  	spinlock_t *lock;
>> -	const struct dma_fence_ops *ops;
>> +	const struct dma_fence_ops __rcu *ops;
>>  	/*
>>  	 * We clear the callback list on kref_put so that by the time we
>>  	 * release the fence it is unused. No one should be adding to the
>> @@ -220,6 +220,10 @@ struct dma_fence_ops {
>>  	 * timed out. Can also return other error values on custom implementations,
>>  	 * which should be treated as if the fence is signaled. For example a hardware
>>  	 * lockup could be reported like that.
>> +	 *
>> +	 * Implementing this callback prevents the fence from detaching after
>> +	 * signaling and so it is mandatory for the module providing the
> 
> s/mandatory/necessary ?

Fixed.

> 
>> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>>  	 */
>>  	signed long (*wait)(struct dma_fence *fence,
>>  			    bool intr, signed long timeout);
>> @@ -231,6 +235,13 @@ struct dma_fence_ops {
>>  	 * Can be called from irq context.  This callback is optional. If it is
>>  	 * NULL, then dma_fence_free() is instead called as the default
>>  	 * implementation.
>> +	 *
>> +	 * Implementing this callback prevents the fence from detaching after
>> +	 * signaling and so it is mandatory for the module providing the
> 
> same

Fixed.

> 
>> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>> +	 *
>> +	 * If the callback is implemented the memory backing the dma_fence
>> +	 * object must be freed RCU safe.
>>  	 */
>>  	void (*release)(struct dma_fence *fence);
>>  
>> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>>  static inline bool
>>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>>  {
>> +	const struct dma_fence_ops *ops;
>> +
>>  	if (dma_fence_test_signaled_flag(fence))
>>  		return true;
>>  
>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> +	rcu_read_lock();
>> +	ops = rcu_dereference(fence->ops);
>> +	if (ops->signaled && ops->signaled(fence)) {
> 
> Maybe you can educate me a bit about RCU here – couldn't this still
> race? If the ops were unloaded before you take rcu_read_lock(),
> rcu_dereference() would give you an invalid pointer here since you
> don't check for !ops, no?

Perfectly correct thinking, yes.

But the check for !ops is added in patch #2 when we actually start to set ops = NULL when the fence signals.

I intentionally separated that because it is basically the second step in making the RCU-based detaching of the fence ops from the module work.

We could merge the two patches together, but I think the separation actually makes sense should anybody start to complain about the additional RCU overhead.
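
For illustration only, the shape of that follow-up check could be sketched in userspace C roughly like this. This is not the code from patch #2: the rcu_*() macros are no-op stand-ins so the sketch compiles outside the kernel, and every sketch_* name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumption): the real kernel primitives do far more. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)
#define rcu_dereference(p)	(p)

struct sketch_fence;

/* Hypothetical, trimmed-down analogue of struct dma_fence_ops. */
struct sketch_fence_ops {
	bool (*signaled)(struct sketch_fence *fence);
};

/* Hypothetical, trimmed-down analogue of struct dma_fence. */
struct sketch_fence {
	const struct sketch_fence_ops *ops;	/* NULL once detached */
	bool signaled_flag;
};

/* Mirrors the shape of dma_fence_is_signaled(), plus the !ops check. */
static inline bool sketch_fence_is_signaled(struct sketch_fence *fence)
{
	const struct sketch_fence_ops *ops;
	bool ret = false;

	if (fence->signaled_flag)
		return true;

	rcu_read_lock();
	ops = rcu_dereference(fence->ops);
	/* Without the !ops test a detached fence would be dereferenced here. */
	if (ops && ops->signaled && ops->signaled(fence))
		ret = true;
	rcu_read_unlock();

	if (ret)
		fence->signaled_flag = true;
	return ret;
}

/* Example callback that always reports completion. */
static inline bool sketch_always_signaled(struct sketch_fence *fence)
{
	(void)fence;
	return true;
}
```

The point of the sketch is only the ordering: the pointer is sampled once inside the read-side section and tested for NULL before any callback is used.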

Thanks,
Christian.

> 
> 
>> +		rcu_read_unlock();
>>  		dma_fence_signal_locked(fence);
>>  		return true;
>>  	}
>> +	rcu_read_unlock();
>>  
>>  	return false;
>>  }
>> @@ -484,13 +501,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>  static inline bool
>>  dma_fence_is_signaled(struct dma_fence *fence)
>>  {
>> +	const struct dma_fence_ops *ops;
>> +
>>  	if (dma_fence_test_signaled_flag(fence))
>>  		return true;
>>  
>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> +	rcu_read_lock();
>> +	ops = rcu_dereference(fence->ops);
>> +	if (ops->signaled && ops->signaled(fence)) {
> 
> same
> 
> 
> Danke,
> P.
> 
>> +		rcu_read_unlock();
>>  		dma_fence_signal(fence);
>>  		return true;
>>  	}
>> +	rcu_read_unlock();
>>  
>>  	return false;
>>  }
> 



* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-11 15:43     ` Christian König
@ 2026-02-12  8:56       ` Philipp Stanner
  2026-02-19 10:23         ` Christian König
  2026-02-12  9:03       ` Tvrtko Ursulin
  1 sibling, 1 reply; 33+ messages in thread
From: Philipp Stanner @ 2026-02-12  8:56 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig

On Wed, 2026-02-11 at 16:43 +0100, Christian König wrote:
> On 2/11/26 11:06, Philipp Stanner wrote:
> > On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
> > > At first glance it is counter intuitive to protect a constant function
> > > pointer table by RCU, but this allows modules providing the function
> > > table to unload by waiting for an RCU grace period.
> > 
> > I think that someone who does not already have a deep understanding
> > about dma-buf and fences will have much trouble understanding *why*
> > this patch is in the log and *what it achieves*.
> > 
> > Good commit messages are at least as important as good code. In
> > drm/sched for example I've been trying so many times to figure out why
> > certain hacks and changes were implemented, but all that git-blame ever
> > gave me was one liners, often hinting at some driver internal work
> > around ._.
> 
> How about something like this:
> 
> The fence ops of a dma_fence currently need to live as long as the dma_fence is alive.
> 
> This means that the module who originally issued a dma_fence can't unload unless all of them are freed up.

s/who/which
s/of them/fences

> 
> As first step to solve this issue protect the fence ops by RCU.
> 
> While it is counter intuitive to protect a constant function pointer table by RCU it allows modules to wait for an RCU grace period to make sure that nobody is executing their functions any more.

I'd say "… allows modules to wait for an RCU grace period before they
unload, to make sure that …"

As for the commit's purpose, see bottom of my reply

> 
> 
> > 
> > > 
> > > v2: make one of the now duplicated lockdep warnings a comment instead.
> > > v3: Add more documentation to ->wait and ->release callback.
> > > v4: fix typo in documentation
> > > v5: rebased on drm-tip
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >  drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
> > >  include/linux/dma-fence.h   | 29 ++++++++++++++--
> > >  2 files changed, 73 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > index e05beae6e407..de9bf18be3d4 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
> > >  signed long
> > >  dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
> > >  {
> > > +	const struct dma_fence_ops *ops;
> > >  	signed long ret;
> > >  
> > >  	if (WARN_ON(timeout < 0))
> > > @@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
> > >  
> > >  	dma_fence_enable_sw_signaling(fence);
> > >  
> > > -	if (trace_dma_fence_wait_start_enabled()) {
> > 
> > Why can wait_start_enabled() be removed? Is that related to the life
> > time decoupling or is it a separate topic?
> 
> It isn't removed, I've just removed the "if (trace_dma_fence_wait_start_enabled())" optimization which is used by the tracing subsystem as self-patching code (longer story).
> 
> The trace_dma_fence_wait_start() trace point function is still called a few lines below.

OK.

> 
> > > -		rcu_read_lock();
> > > -		trace_dma_fence_wait_start(fence);
> > > +	rcu_read_lock();
> > > +	ops = rcu_dereference(fence->ops);
> > > +	trace_dma_fence_wait_start(fence);
> > > +	if (ops->wait) {
> > > +		/*
> > > +		 * Implementing the wait ops is deprecated and not supported for
> > > +		 * issuer independent fences, so it is ok to use the ops outside
> > 
> > s/issuer/issuers of
> 
> Fixed.
> 
> > And how do we know that this here is an independent fence?
> > What even is an "independent fence" – one with internal spinlock?
> 
> I rephrased the sentence a bit to make that clearer:
> 
>                 /*
>                  * Implementing the wait ops is deprecated and not supported for
>                  * issuers of fences who wants them to be independent of their

s/wants/need their lifetime to be

>                  * module after they signal, so it is ok to use the ops outside
>                  * the RCU protected section.
>                  */
> 
> 
> > 
> > > +		 * the RCU protected section.
> > > +		 */
> > > +		rcu_read_unlock();
> > > +		ret = ops->wait(fence, intr, timeout);
> > > +	} else {
> > >  		rcu_read_unlock();
> > > -	}
> > > -	if (fence->ops->wait)
> > > -		ret = fence->ops->wait(fence, intr, timeout);
> > > -	else
> > >  		ret = dma_fence_default_wait(fence, intr, timeout);
> > > +	}
> > 
> > The git diff here looks awkward. Do you use git format-patch --
> > histogram?
> 
> Nope, what's the matter?

The '}' is removed and then added again.

> 
> > >  	if (trace_dma_fence_wait_end_enabled()) {
> > >  		rcu_read_lock();
> > >  		trace_dma_fence_wait_end(fence);
> 
> > >  
> > > @@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > >  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
> > >  
> > >  	kref_init(&fence->refcount);
> > > -	fence->ops = ops;
> > > +	/*
> > > +	 * At first glance it is counter intuitive to protect a constant
> > > +	 * function pointer table by RCU, but this allows modules providing the
> > > +	 * function table to unload by waiting for an RCU grace period.
> > 
> > Maybe add a sentence like "Fences can live longer than the module which
> > issued them."
> 
> Going to use the same as the commit message here as soon as we synced up on that.

Jawohl.

> 
> > 
> > > +	 */
> > > +	RCU_INIT_POINTER(fence->ops, ops);
> > >  	INIT_LIST_HEAD(&fence->cb_list);
> > >  	fence->lock = lock;
> > >  	fence->context = context;
> > > @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
> > >   */
> > >  const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> > >  {
> > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> > > -			 "RCU protection is required for safe access to returned string");
> > > +	const struct dma_fence_ops *ops;
> > >  
> > > +	/* RCU protection is required for safe access to returned string */
> > > +	ops = rcu_dereference(fence->ops);
> > >  	if (!dma_fence_test_signaled_flag(fence))
> > > -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> > > +		return (const char __rcu *)ops->get_driver_name(fence);
> > >  	else
> > >  		return (const char __rcu *)"detached-driver";
> > >  }
> > > @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
> > >   */
> > >  const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
> > >  {
> > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> > > -			 "RCU protection is required for safe access to returned string");
> > > +	const struct dma_fence_ops *ops;
> > >  
> > > +	/* RCU protection is required for safe access to returned string */
> > > +	ops = rcu_dereference(fence->ops);
> > >  	if (!dma_fence_test_signaled_flag(fence))
> > > -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> > > +		return (const char __rcu *)ops->get_driver_name(fence);
> > >  	else
> > >  		return (const char __rcu *)"signaled-timeline";
> > >  }
> > 
> > Did we make any progress in our conversation about removing those two
> > functions and callbacks? They're only used by i915.
> 
> Actually they are mostly used by the trace points and debugfs, so we certainly can't remove them.

._.

> 
> But I'm really wondering why the heck i915 is using them?

I just got confused because I couldn't find the place anymore; they
have removed it since then.

In older kernels it was used for driver logging:

https://elixir.bootlin.com/linux/v6.15.11/source/drivers/gpu/drm/i915/i915_sw_fence.c#L437


> 
> > > 
> > > @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
> > >  static inline bool
> > >  dma_fence_is_signaled_locked(struct dma_fence *fence)
> > >  {
> > > +	const struct dma_fence_ops *ops;
> > > +
> > >  	if (dma_fence_test_signaled_flag(fence))
> > >  		return true;
> > >  
> > > -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> > > +	rcu_read_lock();
> > > +	ops = rcu_dereference(fence->ops);
> > > +	if (ops->signaled && ops->signaled(fence)) {
> > 
> > Maybe you can educate me a bit about RCU here – couldn't this still
> > race? If the ops were unloaded before you take rcu_read_lock(),
> > rcu_dereference() would give you an invalid pointer here since you
> > don't check for !ops, no?
> 
> Perfectly correct thinking, yes.
> 
> But the check for !ops is added in patch #2 when we actually start to set ops = NULL when the fence signals.
> 
> I intentionally separated that because it is basically the second step in making the RCU-based detaching of the fence ops from the module work.
> 
> We could merge the two patches together, but I think the separation actually makes sense should anybody start to complain about the additional RCU overhead.
> 

Alright, makes sense. However, the above does not read correctly.

But then my question would be: What's the purpose of this patch, what
does it solve or address atomically?

Adding RCU here does not yet change behavior and it does not solve the
unloading problem, does it?


If it's a mere preparatory step and the patches should not be merged,
I'd guard the above with a simple comment like "Cleanup preparation.
'ops' cannot be NULL yet, but this will be the case subsequently."


P.



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-11 14:59     ` Christian König
@ 2026-02-12  9:01       ` Philipp Stanner
  0 siblings, 0 replies; 33+ messages in thread
From: Philipp Stanner @ 2026-02-12  9:01 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig

On Wed, 2026-02-11 at 15:59 +0100, Christian König wrote:
> On 2/11/26 10:50, Philipp Stanner wrote:
> > On Tue, 2026-02-10 at 11:01 +0100, Christian König wrote:
> ...
> > > Using a per-fence spinlock allows completely decoupling spinlock producer
> > > and consumer life times, simplifying the handling in most use cases.
> > 
> > That's a good commit message btw, detailing what the motivation is.
> > Would be great to see messages like that more frequently :]
> 
> Yeah, but they are not so easy to write.

Valuable things are rarely easy :}

> 
> > >  	trace_dma_fence_init(fence);
> > > @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > >   * dma_fence_init - Initialize a custom fence.
> > >   * @fence: the fence to initialize
> > >   * @ops: the dma_fence_ops for operations on this fence
> > > - * @lock: the irqsafe spinlock to use for locking this fence
> > > + * @lock: optional irqsafe spinlock to use for locking this fence
> > >   * @context: the execution context this fence is run on
> > >   * @seqno: a linear increasing sequence number for this context
> > >   *
> > > @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > >   *
> > >   * context and seqno are used for easy comparison between fences, allowing
> > >   * to check which fence is later by simply using dma_fence_later().
> > > + *
> > > + * It is strongly discouraged to provide an external lock. This is only allowed
> > 
> > "strongly discouraged […] because this does not decouple lock and fence
> > life times." ?
> 
> Good point, added some more text.
>  
> > > + * for legacy use cases when multiple fences need to be prevented from
> > > + * signaling out of order.
> > 
> > I think our previous discussions revealed that the external lock does
> > not even help with that, does it?
> 
> Well only when you provide a ->signaled() callback in the dma_fence_ops.

Mhm, no?

The external lock does not protect against signaling out of order,
independently of that callback, because a driver can take and release
that lock in between signaling.
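
A minimal userspace sketch of that argument, under loud assumptions: the lock is a trivial stand-in for a shared spinlock, and all sketch_* names are hypothetical, not dma_fence API. Each signal takes and drops the shared lock correctly, yet nothing stops seqno 2 from being signaled before seqno 1.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock shared by all fences of a timeline. */
struct sketch_lock {
	bool held;
};

struct sketch_timeline {
	struct sketch_lock lock;
	unsigned long last_signaled;	/* highest seqno signaled so far */
	bool saw_out_of_order;		/* set when an older seqno signals late */
};

struct sketch_seq_fence {
	struct sketch_timeline *tl;
	unsigned long seqno;
	bool signaled;
};

static inline void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = true;
}

static inline void sketch_lock_release(struct sketch_lock *l)
{
	l->held = false;
}

/* Signal one fence under the shared lock, as a driver would. */
static inline void sketch_signal(struct sketch_seq_fence *f)
{
	sketch_lock_acquire(&f->tl->lock);
	if (f->seqno < f->tl->last_signaled)
		f->tl->saw_out_of_order = true;	/* the lock did not prevent this */
	else
		f->tl->last_signaled = f->seqno;
	f->signaled = true;
	sketch_lock_release(&f->tl->lock);
}
```

Calling sketch_signal() on the fence with seqno 2 and then on the one with seqno 1 leaves saw_out_of_order set, although the lock was held for every individual signal. Ordering has to be enforced by whoever owns the timeline, not by the lock.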

The way to get this right is to make the fence context an actual
object with actual rules. In Rust, it could also house timeline and
driver name strings, requiring two fewer callbacks.

> 
> The reason we have so many different approaches in the dma_fence handling is that it is basically the unification of multiple different driver implementations which all targeted more or less different use cases.
> 

When did dma_fence actually come to be? I suppose at some point we
discovered that all drivers basically have very similar requirements
regarding their job completion signaling.

> > > 

[…]

> > >  
> > >  enum dma_fence_flag_bits {
> > >  	DMA_FENCE_FLAG_INITIALIZED_BIT,
> > > +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
> > 
> > Just asking about a nit: what's the order here, always alphabetically?
> 
> In which the flags are used in the code flow.

Not intuitive, but it's OK, no big deal



P.


* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-11 15:43     ` Christian König
  2026-02-12  8:56       ` Philipp Stanner
@ 2026-02-12  9:03       ` Tvrtko Ursulin
  1 sibling, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2026-02-12  9:03 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig


On 11/02/2026 15:43, Christian König wrote:

8><

>>> +	 */
>>> +	RCU_INIT_POINTER(fence->ops, ops);
>>>   	INIT_LIST_HEAD(&fence->cb_list);
>>>   	fence->lock = lock;
>>>   	fence->context = context;
>>> @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>>>    */
>>>   const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>>>   {
>>> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>>> -			 "RCU protection is required for safe access to returned string");
>>> +	const struct dma_fence_ops *ops;
>>>   
>>> +	/* RCU protection is required for safe access to returned string */
>>> +	ops = rcu_dereference(fence->ops);
>>>   	if (!dma_fence_test_signaled_flag(fence))
>>> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
>>> +		return (const char __rcu *)ops->get_driver_name(fence);
>>>   	else
>>>   		return (const char __rcu *)"detached-driver";
>>>   }
>>> @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>>>    */
>>>   const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>>>   {
>>> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>>> -			 "RCU protection is required for safe access to returned string");
>>> +	const struct dma_fence_ops *ops;
>>>   
>>> +	/* RCU protection is required for safe access to returned string */
>>> +	ops = rcu_dereference(fence->ops);
>>>   	if (!dma_fence_test_signaled_flag(fence))
>>> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
>>> +		return (const char __rcu *)ops->get_driver_name(fence);
>>>   	else
>>>   		return (const char __rcu *)"signaled-timeline";
>>>   }
>>
>> Did we make any progress in our conversation about removing those two
>> functions and callbacks? They're only used by i915.
> 
> Actually they are mostly used by the trace points and debugfs, so we certainly can't remove them.
> 
> But I'm really wondering why the heck i915 is using them?

Mostly directed to Philipp - by using you mean calling the helpers? I
thought I mentioned before that the sync fence uapi (SYNC_IOC_FILE_INFO)
actually relies on the names. Sync fence was in fact the easiest way to
trigger the use after free, as I posted the IGT to show it last year. So
to remove them we would need to prove no existing userspace uses that.

Regards,

Tvrtko



* Re: [PATCH 3/8] dma-buf: abstract fence locking v2
  2026-02-10 10:01 ` [PATCH 3/8] dma-buf: abstract fence locking v2 Christian König
@ 2026-02-12  9:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2026-02-12  9:07 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig


On 10/02/2026 10:01, Christian König wrote:
> Add dma_fence_lock_irqsave() and dma_fence_unlock_irqrestore() wrappers
> and mechanically apply them everywhere.
> 
> Just a pre-requisite cleanup for a follow up patch.
> 
> v2: add some missing i915 bits, add abstraction for lockdep assertion as
>      well
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> (v1)

LGTM, can upgrade the r-b:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko

> ---
>   drivers/dma-buf/dma-fence.c                 | 48 ++++++++++-----------
>   drivers/dma-buf/st-dma-fence.c              |  6 ++-
>   drivers/dma-buf/sw_sync.c                   | 14 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  4 +-
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  2 +-
>   drivers/gpu/drm/i915/i915_active.c          | 19 ++++----
>   drivers/gpu/drm/nouveau/nouveau_drm.c       |  5 ++-
>   drivers/gpu/drm/scheduler/sched_fence.c     |  6 +--
>   drivers/gpu/drm/xe/xe_sched_job.c           |  4 +-
>   include/linux/dma-fence.h                   | 38 ++++++++++++++++
>   11 files changed, 95 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ba02321bef0b..56aa59867eaa 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -365,7 +365,7 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>   	struct dma_fence_cb *cur, *tmp;
>   	struct list_head cb_list;
>   
> -	lockdep_assert_held(fence->lock);
> +	dma_fence_assert_held(fence);
>   
>   	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>   				      &fence->flags)))
> @@ -412,9 +412,9 @@ void dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
>   	if (WARN_ON(!fence))
>   		return;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	dma_fence_signal_timestamp_locked(fence, timestamp);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   }
>   EXPORT_SYMBOL(dma_fence_signal_timestamp);
>   
> @@ -473,9 +473,9 @@ bool dma_fence_check_and_signal(struct dma_fence *fence)
>   	unsigned long flags;
>   	bool ret;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	ret = dma_fence_check_and_signal_locked(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	return ret;
>   }
> @@ -501,9 +501,9 @@ void dma_fence_signal(struct dma_fence *fence)
>   
>   	tmp = dma_fence_begin_signalling();
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	dma_fence_signal_timestamp_locked(fence, ktime_get());
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	dma_fence_end_signalling(tmp);
>   }
> @@ -603,10 +603,10 @@ void dma_fence_release(struct kref *kref)
>   		 * don't leave chains dangling. We set the error flag first
>   		 * so that the callbacks know this signal is due to an error.
>   		 */
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock_irqsave(fence, flags);
>   		fence->error = -EDEADLK;
>   		dma_fence_signal_locked(fence);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock_irqrestore(fence, flags);
>   	}
>   
>   	ops = rcu_dereference(fence->ops);
> @@ -636,7 +636,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   	const struct dma_fence_ops *ops;
>   	bool was_set;
>   
> -	lockdep_assert_held(fence->lock);
> +	dma_fence_assert_held(fence);
>   
>   	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>   				   &fence->flags);
> @@ -672,9 +672,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
>   {
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	__dma_fence_enable_signaling(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   }
>   EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>   
> @@ -714,8 +714,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>   		return -ENOENT;
>   	}
>   
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (__dma_fence_enable_signaling(fence)) {
>   		cb->func = func;
>   		list_add_tail(&cb->node, &fence->cb_list);
> @@ -723,8 +722,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>   		INIT_LIST_HEAD(&cb->node);
>   		ret = -ENOENT;
>   	}
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	return ret;
>   }
> @@ -747,9 +745,9 @@ int dma_fence_get_status(struct dma_fence *fence)
>   	unsigned long flags;
>   	int status;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	status = dma_fence_get_status_locked(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	return status;
>   }
> @@ -779,13 +777,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
>   	unsigned long flags;
>   	bool ret;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock_irqsave(fence, flags);
>   	ret = !list_empty(&cb->node);
>   	if (ret)
>   		list_del_init(&cb->node);
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	return ret;
>   }
> @@ -824,7 +820,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   	unsigned long flags;
>   	signed long ret = timeout ? timeout : 1;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   
>   	if (dma_fence_test_signaled_flag(fence))
>   		goto out;
> @@ -848,11 +844,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   			__set_current_state(TASK_INTERRUPTIBLE);
>   		else
>   			__set_current_state(TASK_UNINTERRUPTIBLE);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock_irqrestore(fence, flags);
>   
>   		ret = schedule_timeout(ret);
>   
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock_irqsave(fence, flags);
>   		if (ret > 0 && intr && signal_pending(current))
>   			ret = -ERESTARTSYS;
>   	}
> @@ -862,7 +858,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   	__set_current_state(TASK_RUNNING);
>   
>   out:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   	return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_default_wait);
> diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
> index 73ed6fd48a13..5d0d9abc6e21 100644
> --- a/drivers/dma-buf/st-dma-fence.c
> +++ b/drivers/dma-buf/st-dma-fence.c
> @@ -410,8 +410,10 @@ struct race_thread {
>   
>   static void __wait_for_callbacks(struct dma_fence *f)
>   {
> -	spin_lock_irq(f->lock);
> -	spin_unlock_irq(f->lock);
> +	unsigned long flags;
> +
> +	dma_fence_lock_irqsave(f, flags);
> +	dma_fence_unlock_irqrestore(f, flags);
>   }
>   
>   static int thread_signal_callback(void *arg)
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 6f09d13be6b6..4c81a37dd682 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -156,12 +156,12 @@ static void timeline_fence_release(struct dma_fence *fence)
>   	struct sync_timeline *parent = dma_fence_parent(fence);
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (!list_empty(&pt->link)) {
>   		list_del(&pt->link);
>   		rb_erase(&pt->node, &parent->pt_tree);
>   	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	sync_timeline_put(parent);
>   	dma_fence_free(fence);
> @@ -179,7 +179,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>   	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>   		if (ktime_before(deadline, pt->deadline))
>   			pt->deadline = deadline;
> @@ -187,7 +187,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>   		pt->deadline = deadline;
>   		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
>   	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   }
>   
>   static const struct dma_fence_ops timeline_fence_ops = {
> @@ -431,13 +431,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>   		goto put_fence;
>   	}
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>   		ret = -ENOENT;
>   		goto unlock;
>   	}
>   	data.deadline_ns = ktime_to_ns(pt->deadline);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	dma_fence_put(fence);
>   
> @@ -450,7 +450,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>   	return 0;
>   
>   unlock:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   put_fence:
>   	dma_fence_put(fence);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index b82357c65723..1404e1fe62a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -479,10 +479,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
>   	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
>   		return false;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (!dma_fence_is_signaled_locked(fence))
>   		dma_fence_set_error(fence, -ENODATA);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	while (!dma_fence_is_signaled(fence) &&
>   	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 6a2ea200d90c..4761e7486811 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2802,8 +2802,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	dma_fence_put(vm->last_unlocked);
>   	dma_fence_wait(vm->last_tlb_flush, false);
>   	/* Make sure that all fence callbacks have completed */
> -	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
> -	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
> +	dma_fence_lock_irqsave(vm->last_tlb_flush, flags);
> +	dma_fence_unlock_irqrestore(vm->last_tlb_flush, flags);
>   	dma_fence_put(vm->last_tlb_flush);
>   
>   	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index bf6117d5fc57..78ea2d9ccedf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -148,7 +148,7 @@ __dma_fence_signal__notify(struct dma_fence *fence,
>   {
>   	struct dma_fence_cb *cur, *tmp;
>   
> -	lockdep_assert_held(fence->lock);
> +	dma_fence_assert_held(fence);
>   
>   	list_for_each_entry_safe(cur, tmp, list, node) {
>   		INIT_LIST_HEAD(&cur->node);
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 6b0c1162505a..9d41e052ab65 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -1045,9 +1045,10 @@ __i915_active_fence_set(struct i915_active_fence *active,
>   	 * nesting rules for the fence->lock; the inner lock is always the
>   	 * older lock.
>   	 */
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (prev)
> -		spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
> +		spin_lock_nested(dma_fence_spinlock(prev),
> +				 SINGLE_DEPTH_NESTING);
>   
>   	/*
>   	 * A does the cmpxchg first, and so it sees C or NULL, as before, or
> @@ -1061,17 +1062,18 @@ __i915_active_fence_set(struct i915_active_fence *active,
>   	 */
>   	while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
>   		if (prev) {
> -			spin_unlock(prev->lock);
> +			spin_unlock(dma_fence_spinlock(prev));
>   			dma_fence_put(prev);
>   		}
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock_irqrestore(fence, flags);
>   
>   		prev = i915_active_fence_get(active);
>   		GEM_BUG_ON(prev == fence);
>   
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock_irqsave(fence, flags);
>   		if (prev)
> -			spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
> +			spin_lock_nested(dma_fence_spinlock(prev),
> +					 SINGLE_DEPTH_NESTING);
>   	}
>   
>   	/*
> @@ -1088,10 +1090,11 @@ __i915_active_fence_set(struct i915_active_fence *active,
>   	 */
>   	if (prev) {
>   		__list_del_entry(&active->cb.node);
> -		spin_unlock(prev->lock); /* serialise with prev->cb_list */
> +		/* serialise with prev->cb_list */
> +		spin_unlock(dma_fence_spinlock(prev));
>   	}
>   	list_add_tail(&active->cb.node, &fence->cb_list);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	return prev;
>   }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 1527b801f013..ec4dfa3ea725 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
>   static inline bool
>   nouveau_cli_work_ready(struct dma_fence *fence)
>   {
> +	unsigned long flags;
>   	bool ret = true;
>   
> -	spin_lock_irq(fence->lock);
> +	dma_fence_lock_irqsave(fence, flags);
>   	if (!dma_fence_is_signaled_locked(fence))
>   		ret = false;
> -	spin_unlock_irq(fence->lock);
> +	dma_fence_unlock_irqrestore(fence, flags);
>   
>   	if (ret == true)
>   		dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 9391d6f0dc01..724d77694246 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>   	struct dma_fence *parent;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&fence->lock, flags);
> +	dma_fence_lock_irqsave(f, flags);
>   
>   	/* If we already have an earlier deadline, keep it: */
>   	if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>   	    ktime_before(fence->deadline, deadline)) {
> -		spin_unlock_irqrestore(&fence->lock, flags);
> +		dma_fence_unlock_irqrestore(f, flags);
>   		return;
>   	}
>   
>   	fence->deadline = deadline;
>   	set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>   
> -	spin_unlock_irqrestore(&fence->lock, flags);
> +	dma_fence_unlock_irqrestore(f, flags);
>   
>   	/*
>   	 * smp_load_aquire() to ensure that if we are racing another
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index 3927666fe556..ae5b38b2a884 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -190,11 +190,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
>   	unsigned long irq_flags;
>   	bool signaled;
>   
> -	spin_lock_irqsave(fence->lock, irq_flags);
> +	dma_fence_lock_irqsave(fence, irq_flags);
>   	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>   	if (!signaled)
>   		dma_fence_set_error(fence, error);
> -	spin_unlock_irqrestore(fence->lock, irq_flags);
> +	dma_fence_unlock_irqrestore(fence, irq_flags);
>   
>   	return signaled;
>   }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index e1afbb5909f9..88c842fc35d5 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -377,6 +377,44 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>   	} while (1);
>   }
>   
> +/**
> + * dma_fence_spinlock - return pointer to the spinlock protecting the fence
> + * @fence: the fence to get the lock from
> + *
> + * Return the pointer to the extern lock.
> + */
> +static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> +{
> +	return fence->lock;
> +}
> +
> +/**
> + * dma_fence_lock_irqsave - irqsave lock the fence
> + * @fence: the fence to lock
> + * @flags: where to store the CPU flags.
> + *
> + * Lock the fence, preventing it from changing to the signaled state.
> + */
> +#define dma_fence_lock_irqsave(fence, flags)	\
> +	spin_lock_irqsave(fence->lock, flags)
> +
> +/**
> + * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
> + * @fence: the fence to unlock
> + * @flags: the CPU flags to restore
> + *
> + * Unlock the fence, allowing it to change its state to signaled again.
> + */
> +#define dma_fence_unlock_irqrestore(fence, flags)	\
> +	spin_unlock_irqrestore(fence->lock, flags)
> +
> +/**
> + * dma_fence_assert_held - lockdep assertion that fence is locked
> + * @fence: the fence which should be locked
> + */
> +#define dma_fence_assert_held(fence)	\
> +	lockdep_assert_held(dma_fence_spinlock(fence));
> +
>   #ifdef CONFIG_LOCKDEP
>   bool dma_fence_begin_signalling(void);
>   void dma_fence_end_signalling(bool cookie);



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
  2026-02-11  9:50   ` Philipp Stanner
@ 2026-02-12  9:16   ` Tvrtko Ursulin
  2026-02-13 14:27   ` Boris Brezillon
  2 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2026-02-12  9:16 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig


On 10/02/2026 10:01, Christian König wrote:
> Implement per-fence spinlocks, allowing implementations to not give an
> external spinlock to protect the fence's internal state. Instead, a spinlock
> embedded into the fence structure itself is used in this case.
> 
> Shared spinlocks have the problem that implementations need to guarantee
> that the lock lives at least as long as all fences referencing it.
> 
> Using a per-fence spinlock allows completely decoupling spinlock producer
> and consumer lifetimes, simplifying the handling in most use cases.
> 
> v2: improve naming, coverage and function documentation
> v3: fix one additional locking in the selftests
> v4: separate out some changes to make the patch smaller,
>      fix one amdgpu crash found by CI systems
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
>   drivers/dma-buf/sync_debug.h            |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
>   drivers/gpu/drm/drm_crtc.c              |  2 +-
>   drivers/gpu/drm/drm_writeback.c         |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
>   drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
>   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
>   drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
>   include/linux/dma-fence.h               | 19 +++++++++++++------
>   10 files changed, 41 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 56aa59867eaa..1833889e7466 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>   }
>   #endif
>   
> -
>   /**
>    * dma_fence_signal_timestamp_locked - signal completion of a fence
>    * @fence: the fence to signal
> @@ -1067,7 +1066,6 @@ static void
>   __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>   {
> -	BUG_ON(!lock);
>   	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>   
>   	kref_init(&fence->refcount);
> @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	 */
>   	RCU_INIT_POINTER(fence->ops, ops);
>   	INIT_LIST_HEAD(&fence->cb_list);
> -	fence->lock = lock;
>   	fence->context = context;
>   	fence->seqno = seqno;
>   	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> +	if (lock) {
> +		fence->extern_lock = lock;
> +	} else {
> +		spin_lock_init(&fence->inline_lock);
> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
> +	}
>   	fence->error = 0;
>   
>   	trace_dma_fence_init(fence);
> @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>    * dma_fence_init - Initialize a custom fence.
>    * @fence: the fence to initialize
>    * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>    * @context: the execution context this fence is run on
>    * @seqno: a linear increasing sequence number for this context
>    *
> @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>    *
>    * context and seqno are used for easy comparison between fences, allowing
>    * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed
> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.
>    */
>   void
>   dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -1114,7 +1121,7 @@ EXPORT_SYMBOL(dma_fence_init);
>    * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support.
>    * @fence: the fence to initialize
>    * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>    * @context: the execution context this fence is run on
>    * @seqno: a linear increasing sequence number for this context
>    *
> @@ -1124,6 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
>    *
>    * Context and seqno are used for easy comparison between fences, allowing
>    * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed
> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.
>    */
>   void
>   dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 02af347293d0..c49324505b20 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -47,7 +47,7 @@ struct sync_timeline {
>   
>   static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>   {
> -	return container_of(fence->lock, struct sync_timeline, lock);
> +	return container_of(fence->extern_lock, struct sync_timeline, lock);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 139642eacdd0..d5c41e24fb51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -638,7 +638,7 @@ static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>   	 * sure that the dma_fence structure isn't freed up.
>   	 */
>   	rcu_read_lock();
> -	lock = vm->last_tlb_flush->lock;
> +	lock = dma_fence_spinlock(vm->last_tlb_flush);

This hunk should go into the patch which adds the dma_fence_spinlock()
helper. With that, and with the typo fixes and comment improvements Philipp
is pointing out:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko

>   	rcu_read_unlock();
>   
>   	spin_lock_irqsave(lock, flags);
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index a7797d260f1e..17472915842f 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>   static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>   {
>   	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>   }
>   
>   static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> index 95b8a2e4bda6..624a4e8b6c99 100644
> --- a/drivers/gpu/drm/drm_writeback.c
> +++ b/drivers/gpu/drm/drm_writeback.c
> @@ -81,7 +81,7 @@
>    *	From userspace, this property will always read as zero.
>    */
>   
> -#define fence_to_wb_connector(x) container_of(x->lock, \
> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>   					      struct drm_writeback_connector, \
>   					      fence_lock)
>   
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 4a193b7d6d9e..c282c94138b2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
>   static inline struct nouveau_fence_chan *
>   nouveau_fctx(struct nouveau_fence *fence)
>   {
> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> +			    lock);
>   }
>   
>   static bool
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 06b0b2aa7953..37d4ae0faf0d 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>   	struct qxl_device *qdev;
>   	unsigned long cur, end = jiffies + timeout;
>   
> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> +	qdev = container_of(fence->extern_lock, struct qxl_device,
> +			    release_lock);
>   
>   	if (!wait_event_timeout(qdev->release_event,
>   				(dma_fence_is_signaled(fence) ||
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 85795082fef9..d251eec57df9 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>   static struct vmw_fence_manager *
>   fman_from_fence(struct vmw_fence_obj *fence)
>   {
> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> +			    lock);
>   }
>   
>   static void vmw_fence_obj_destroy(struct dma_fence *f)
> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> index ae8ed15b64c5..14720623ad00 100644
> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> @@ -124,7 +124,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>   
>   static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>   {
> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> +			    lock);
>   }
>   
>   static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 88c842fc35d5..6eabbb1c471c 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -34,7 +34,8 @@ struct seq_file;
>    * @ops: dma_fence_ops associated with this fence
>    * @rcu: used for releasing fence with kfree_rcu
>    * @cb_list: list of all callbacks to call
> - * @lock: spin_lock_irqsave used for locking
> + * @extern_lock: external spin_lock_irqsave used for locking
> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>    * @context: execution context this fence belongs to, returned by
>    *           dma_fence_context_alloc()
>    * @seqno: the sequence number of this fence inside the execution context,
> @@ -49,6 +50,7 @@ struct seq_file;
>    * of the time.
>    *
>    * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>    * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>    * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>    * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> @@ -66,7 +68,10 @@ struct seq_file;
>    * been completed, or never called at all.
>    */
>   struct dma_fence {
> -	spinlock_t *lock;
> +	union {
> +		spinlock_t *extern_lock;
> +		spinlock_t inline_lock;
> +	};
>   	const struct dma_fence_ops __rcu *ops;
>   	/*
>   	 * We clear the callback list on kref_put so that by the time we
> @@ -100,6 +105,7 @@ struct dma_fence {
>   
>   enum dma_fence_flag_bits {
>   	DMA_FENCE_FLAG_INITIALIZED_BIT,
> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
>   	DMA_FENCE_FLAG_SEQNO64_BIT,
>   	DMA_FENCE_FLAG_SIGNALED_BIT,
>   	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>    * dma_fence_spinlock - return pointer to the spinlock protecting the fence
>    * @fence: the fence to get the lock from
>    *
> - * Return the pointer to the extern lock.
> + * Return either the pointer to the embedded or the external spin lock.
>    */
>   static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>   {
> -	return fence->lock;
> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> +		&fence->inline_lock : fence->extern_lock;
>   }
>   
>   /**
> @@ -396,7 +403,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>    * Lock the fence, preventing it from changing to the signaled state.
>    */
>   #define dma_fence_lock_irqsave(fence, flags)	\
> -	spin_lock_irqsave(fence->lock, flags)
> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
>   
>   /**
>    * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
> @@ -406,7 +413,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>    * Unlock the fence, allowing it to change its state to signaled again.
>    */
>   #define dma_fence_unlock_irqrestore(fence, flags)	\
> -	spin_unlock_irqrestore(fence->lock, flags)
> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
>   
>   /**
>    * dma_fence_assert_held - lockdep assertion that fence is locked
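
For readers following along, the lock selection in the hunks above can be
sketched in userspace roughly as follows. This is only a model under stated
assumptions, not the kernel implementation: toy_spinlock_t, struct
model_fence, MODEL_FLAG_INLINE_LOCK and the model_fence_*() helpers are
hypothetical stand-ins for spinlock_t, struct dma_fence,
DMA_FENCE_FLAG_INLINE_LOCK_BIT and the dma_fence helpers, and the toy lock
does not disable interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy spinlock; a hypothetical stand-in for the kernel's spinlock_t. */
typedef struct {
	atomic_flag locked;
} toy_spinlock_t;

static void toy_spin_init(toy_spinlock_t *l)
{
	atomic_flag_clear(&l->locked);
}

static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		; /* busy-wait until the current holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical flag mirroring the role of the inline-lock bit. */
#define MODEL_FLAG_INLINE_LOCK (1UL << 0)

struct model_fence {
	union {
		toy_spinlock_t *extern_lock;	/* valid while the flag is clear */
		toy_spinlock_t inline_lock;	/* valid while the flag is set */
	};
	unsigned long flags;
};

/* A NULL lock selects the embedded lock, as in the patched init path. */
static void model_fence_init(struct model_fence *f, toy_spinlock_t *lock)
{
	assert(f);
	f->flags = 0;
	if (lock) {
		f->extern_lock = lock;
	} else {
		toy_spin_init(&f->inline_lock);
		f->flags |= MODEL_FLAG_INLINE_LOCK;
	}
}

/* Single accessor so callers never touch the union members directly. */
static toy_spinlock_t *model_fence_spinlock(struct model_fence *f)
{
	return (f->flags & MODEL_FLAG_INLINE_LOCK) ?
		&f->inline_lock : f->extern_lock;
}
```

Callers would then lock through the accessor, e.g.
toy_spin_lock(model_fence_spinlock(f)), regardless of which variant the fence
uses. Note that the container_of() conversions in the driver hunks read
extern_lock directly, which is only meaningful for fences that were
initialised with an external lock.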



* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
  2026-02-11 10:06   ` Philipp Stanner
@ 2026-02-12  9:31   ` Tvrtko Ursulin
  2026-02-13 14:20   ` Boris Brezillon
  2 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2026-02-12  9:31 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig


On 10/02/2026 10:01, Christian König wrote:
> At first glance it is counterintuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.
> 
> v2: make one of the now duplicated lockdep warnings a comment instead.
> v3: Add more documentation to ->wait and ->release callback.
> v4: fix typo in documentation
> v5: rebased on drm-tip
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>   include/linux/dma-fence.h   | 29 ++++++++++++++--
>   2 files changed, 73 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index e05beae6e407..de9bf18be3d4 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>   signed long
>   dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   {
> +	const struct dma_fence_ops *ops;
>   	signed long ret;
>   
>   	if (WARN_ON(timeout < 0))
> @@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   
>   	dma_fence_enable_sw_signaling(fence);
>   
> -	if (trace_dma_fence_wait_start_enabled()) {
> -		rcu_read_lock();
> -		trace_dma_fence_wait_start(fence);
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	trace_dma_fence_wait_start(fence);
> +	if (ops->wait) {
> +		/*
> +		 * Implementing the wait ops is deprecated and not supported for
> +		 * issuer independent fences, so it is ok to use the ops outside
> +		 * the RCU protected section.
> +		 */
> +		rcu_read_unlock();
> +		ret = ops->wait(fence, intr, timeout);
> +	} else {
>   		rcu_read_unlock();
> -	}
> -	if (fence->ops->wait)
> -		ret = fence->ops->wait(fence, intr, timeout);
> -	else
>   		ret = dma_fence_default_wait(fence, intr, timeout);
> +	}
>   	if (trace_dma_fence_wait_end_enabled()) {
>   		rcu_read_lock();
>   		trace_dma_fence_wait_end(fence);
> @@ -562,6 +569,7 @@ void dma_fence_release(struct kref *kref)
>   {
>   	struct dma_fence *fence =
>   		container_of(kref, struct dma_fence, refcount);
> +	const struct dma_fence_ops *ops;
>   
>   	rcu_read_lock();
>   	trace_dma_fence_destroy(fence);
> @@ -593,12 +601,12 @@ void dma_fence_release(struct kref *kref)
>   		spin_unlock_irqrestore(fence->lock, flags);
>   	}
>   
> -	rcu_read_unlock();
> -
> -	if (fence->ops->release)
> -		fence->ops->release(fence);
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->release)
> +		ops->release(fence);
>   	else
>   		dma_fence_free(fence);
> +	rcu_read_unlock();
>   }
>   EXPORT_SYMBOL(dma_fence_release);
>   
> @@ -617,6 +625,7 @@ EXPORT_SYMBOL(dma_fence_free);
>   
>   static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
>   	bool was_set;
>   
>   	lockdep_assert_held(fence->lock);
> @@ -627,14 +636,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   	if (dma_fence_test_signaled_flag(fence))
>   		return false;
>   
> -	if (!was_set && fence->ops->enable_signaling) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (!was_set && ops->enable_signaling) {
>   		trace_dma_fence_enable_signal(fence);
>   
> -		if (!fence->ops->enable_signaling(fence)) {
> +		if (!ops->enable_signaling(fence)) {
> +			rcu_read_unlock();
>   			dma_fence_signal_locked(fence);
>   			return false;
>   		}
>   	}
> +	rcu_read_unlock();
>   
>   	return true;
>   }
> @@ -1007,8 +1020,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>    */
>   void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>   {
> -	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> -		fence->ops->set_deadline(fence, deadline);
> +	const struct dma_fence_ops *ops;
> +
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +		ops->set_deadline(fence, deadline);
> +	rcu_read_unlock();
>   }
>   EXPORT_SYMBOL(dma_fence_set_deadline);
>   
> @@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>   
>   	kref_init(&fence->refcount);
> -	fence->ops = ops;
> +	/*
> +	 * At first glance it is counter intuitive to protect a constant
> +	 * function pointer table by RCU, but this allows modules providing the
> +	 * function table to unload by waiting for an RCU grace period.
> +	 */
> +	RCU_INIT_POINTER(fence->ops, ops);
>   	INIT_LIST_HEAD(&fence->cb_list);
>   	fence->lock = lock;
>   	fence->context = context;
> @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>    */
>   const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>   {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>   
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>   	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>   	else
>   		return (const char __rcu *)"detached-driver";
>   }
> @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>    */
>   const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>   {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>   
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>   	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>   	else
>   		return (const char __rcu *)"signaled-timeline";
>   }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 9c4d25289239..6bf4feb0e01f 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -67,7 +67,7 @@ struct seq_file;
>    */
>   struct dma_fence {
>   	spinlock_t *lock;
> -	const struct dma_fence_ops *ops;
> +	const struct dma_fence_ops __rcu *ops;
>   	/*
>   	 * We clear the callback list on kref_put so that by the time we
>   	 * release the fence it is unused. No one should be adding to the
> @@ -220,6 +220,10 @@ struct dma_fence_ops {
>   	 * timed out. Can also return other error values on custom implementations,
>   	 * which should be treated as if the fence is signaled. For example a hardware
>   	 * lockup could be reported like that.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the
> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>   	 */
>   	signed long (*wait)(struct dma_fence *fence,
>   			    bool intr, signed long timeout);
> @@ -231,6 +235,13 @@ struct dma_fence_ops {
>   	 * Can be called from irq context.  This callback is optional. If it is
>   	 * NULL, then dma_fence_free() is instead called as the default
>   	 * implementation.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the
> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
> +	 *
> +	 * If the callback is implemented the memory backing the dma_fence
> +	 * object must be freed RCU safe.
>   	 */
>   	void (*release)(struct dma_fence *fence);
>   
> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>   static inline bool
>   dma_fence_is_signaled_locked(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	if (dma_fence_test_signaled_flag(fence))
>   		return true;
>   
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>   		dma_fence_signal_locked(fence);
>   		return true;
>   	}
> +	rcu_read_unlock();
>   
>   	return false;
>   }
> @@ -484,13 +501,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>   static inline bool
>   dma_fence_is_signaled(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	if (dma_fence_test_signaled_flag(fence))
>   		return true;
>   
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>   		dma_fence_signal(fence);
>   		return true;
>   	}
> +	rcu_read_unlock();
>   
>   	return false;
>   }

Pending parallel discussion on the comment tweaks, the logic and 
implementation look good to me:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko
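
As an aside for readers, the access pattern the patch applies throughout
(read the ops pointer once into a local, then only call through that local)
can be modelled in userspace roughly like this. It is a sketch under
assumptions, not kernel code: C11 atomics stand in for rcu_dereference() and
RCU_INIT_POINTER(), grace periods are not modelled at all, and struct
model_fence, struct model_ops, model_detach() and the hw_done field are
hypothetical names. The NULL check follows the cover letter's description of
the ops pointer being cleared after signaling; patch 1 itself does not yet
add it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fence;

/* Hypothetical stand-in for struct dma_fence_ops. */
struct model_ops {
	bool (*signaled)(struct model_fence *f);	/* optional, may be NULL */
};

/* Hypothetical stand-in for struct dma_fence. */
struct model_fence {
	_Atomic(const struct model_ops *) ops;
	atomic_bool signaled_flag;
	bool hw_done;		/* pretend hardware completion state */
};

static bool model_hw_signaled(struct model_fence *f)
{
	return f->hw_done;
}

static const struct model_ops model_driver_ops = {
	.signaled = model_hw_signaled,
};

static void model_fence_init(struct model_fence *f,
			     const struct model_ops *ops)
{
	atomic_init(&f->ops, ops);
	atomic_init(&f->signaled_flag, false);
	f->hw_done = false;
}

static bool model_is_signaled(struct model_fence *f)
{
	const struct model_ops *ops;

	if (atomic_load(&f->signaled_flag))
		return true;

	/* Snapshot the pointer once; never re-read f->ops below. */
	ops = atomic_load_explicit(&f->ops, memory_order_acquire);
	if (ops && ops->signaled && ops->signaled(f)) {
		atomic_store(&f->signaled_flag, true);
		return true;
	}
	return false;
}

/* Assumed detach step: mark signaled first, then drop the ops pointer. */
static void model_detach(struct model_fence *f)
{
	atomic_store(&f->signaled_flag, true);
	atomic_store_explicit(&f->ops, NULL, memory_order_release);
}
```

The point of the local copy is that a reader which raced with the detach
either sees the signaled flag and never touches the ops, or works on one
consistent pointer value for the whole call. In the kernel, the module
additionally has to wait for an RCU grace period before the table may go
away, which this model leaves out.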



* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
  2026-02-11 10:06   ` Philipp Stanner
  2026-02-12  9:31   ` Tvrtko Ursulin
@ 2026-02-13 14:20   ` Boris Brezillon
  2 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:20 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:01:56 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> At first glance it is counterintuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.
> 
> v2: make one of the now duplicated lockdep warnings a comment instead.
> v3: Add more documentation to ->wait and ->release callback.
> v4: fix typo in documentation
> v5: rebased on drm-tip
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>  include/linux/dma-fence.h   | 29 ++++++++++++++--
>  2 files changed, 73 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index e05beae6e407..de9bf18be3d4 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -522,6 +522,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>  signed long
>  dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  {
> +	const struct dma_fence_ops *ops;
>  	signed long ret;
>  
>  	if (WARN_ON(timeout < 0))
> @@ -533,15 +534,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  
>  	dma_fence_enable_sw_signaling(fence);
>  
> -	if (trace_dma_fence_wait_start_enabled()) {
> -		rcu_read_lock();
> -		trace_dma_fence_wait_start(fence);
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	trace_dma_fence_wait_start(fence);
> +	if (ops->wait) {
> +		/*
> +		 * Implementing the wait ops is deprecated and not supported for
> +		 * issuer independent fences, so it is ok to use the ops outside
> +		 * the RCU protected section.
> +		 */
> +		rcu_read_unlock();
> +		ret = ops->wait(fence, intr, timeout);
> +	} else {
>  		rcu_read_unlock();
> -	}
> -	if (fence->ops->wait)
> -		ret = fence->ops->wait(fence, intr, timeout);
> -	else
>  		ret = dma_fence_default_wait(fence, intr, timeout);
> +	}
>  	if (trace_dma_fence_wait_end_enabled()) {
>  		rcu_read_lock();
>  		trace_dma_fence_wait_end(fence);
> @@ -562,6 +569,7 @@ void dma_fence_release(struct kref *kref)
>  {
>  	struct dma_fence *fence =
>  		container_of(kref, struct dma_fence, refcount);
> +	const struct dma_fence_ops *ops;
>  
>  	rcu_read_lock();
>  	trace_dma_fence_destroy(fence);
> @@ -593,12 +601,12 @@ void dma_fence_release(struct kref *kref)
>  		spin_unlock_irqrestore(fence->lock, flags);
>  	}
>  
> -	rcu_read_unlock();
> -
> -	if (fence->ops->release)
> -		fence->ops->release(fence);
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->release)
> +		ops->release(fence);
>  	else
>  		dma_fence_free(fence);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(dma_fence_release);
>  
> @@ -617,6 +625,7 @@ EXPORT_SYMBOL(dma_fence_free);
>  
>  static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
>  	bool was_set;
>  
>  	lockdep_assert_held(fence->lock);
> @@ -627,14 +636,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  	if (dma_fence_test_signaled_flag(fence))
>  		return false;
>  
> -	if (!was_set && fence->ops->enable_signaling) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (!was_set && ops->enable_signaling) {
>  		trace_dma_fence_enable_signal(fence);
>  
> -		if (!fence->ops->enable_signaling(fence)) {
> +		if (!ops->enable_signaling(fence)) {
> +			rcu_read_unlock();
>  			dma_fence_signal_locked(fence);
>  			return false;
>  		}
>  	}
> +	rcu_read_unlock();
>  
>  	return true;
>  }
> @@ -1007,8 +1020,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>   */
>  void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>  {
> -	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> -		fence->ops->set_deadline(fence, deadline);
> +	const struct dma_fence_ops *ops;
> +
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +		ops->set_deadline(fence, deadline);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(dma_fence_set_deadline);
>  
> @@ -1049,7 +1067,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>  
>  	kref_init(&fence->refcount);
> -	fence->ops = ops;
> +	/*
> +	 * At first glance it is counter intuitive to protect a constant
> +	 * function pointer table by RCU, but this allows modules providing the
> +	 * function table to unload by waiting for an RCU grace period.
> +	 */
> +	RCU_INIT_POINTER(fence->ops, ops);
>  	INIT_LIST_HEAD(&fence->cb_list);
>  	fence->lock = lock;
>  	fence->context = context;
> @@ -1129,11 +1152,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>   */
>  const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>  
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>  	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
>  		return (const char __rcu *)"detached-driver";
>  }
> @@ -1161,11 +1185,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>   */
>  const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> -			 "RCU protection is required for safe access to returned string");
> +	const struct dma_fence_ops *ops;
>  
> +	/* RCU protection is required for safe access to returned string */
> +	ops = rcu_dereference(fence->ops);
>  	if (!dma_fence_test_signaled_flag(fence))
> -		return (const char __rcu *)fence->ops->get_driver_name(fence);
> +		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
>  		return (const char __rcu *)"signaled-timeline";
>  }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 9c4d25289239..6bf4feb0e01f 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -67,7 +67,7 @@ struct seq_file;
>   */
>  struct dma_fence {
>  	spinlock_t *lock;
> -	const struct dma_fence_ops *ops;
> +	const struct dma_fence_ops __rcu *ops;
>  	/*
>  	 * We clear the callback list on kref_put so that by the time we
>  	 * release the fence it is unused. No one should be adding to the
> @@ -220,6 +220,10 @@ struct dma_fence_ops {
>  	 * timed out. Can also return other error values on custom implementations,
>  	 * which should be treated as if the fence is signaled. For example a hardware
>  	 * lockup could be reported like that.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the
> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>  	 */
>  	signed long (*wait)(struct dma_fence *fence,
>  			    bool intr, signed long timeout);
> @@ -231,6 +235,13 @@ struct dma_fence_ops {
>  	 * Can be called from irq context.  This callback is optional. If it is
>  	 * NULL, then dma_fence_free() is instead called as the default
>  	 * implementation.
> +	 *
> +	 * Implementing this callback prevents the fence from detaching after
> +	 * signaling and so it is mandatory for the module providing the
> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
> +	 *
> +	 * If the callback is implemented the memory backing the dma_fence
> +	 * object must be freed RCU safe.
>  	 */
>  	void (*release)(struct dma_fence *fence);
>  
> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>  static inline bool
>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
> +
>  	if (dma_fence_test_signaled_flag(fence))
>  		return true;
>  
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>  		dma_fence_signal_locked(fence);
>  		return true;
>  	}
> +	rcu_read_unlock();
>  
>  	return false;
>  }
> @@ -484,13 +501,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +	const struct dma_fence_ops *ops;
> +
>  	if (dma_fence_test_signaled_flag(fence))
>  		return true;
>  
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>  		dma_fence_signal(fence);
>  		return true;
>  	}
> +	rcu_read_unlock();
>  
>  	return false;
>  }



* Re: [PATCH 2/8] dma-buf: detach fence ops on signal v2
  2026-02-10 10:01 ` [PATCH 2/8] dma-buf: detach fence ops on signal v2 Christian König
@ 2026-02-13 14:22   ` Boris Brezillon
  2026-02-19 12:52     ` Christian König
  0 siblings, 1 reply; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:22 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:01:57 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> When neither a release nor a wait backend op is specified it is possible
> to let the dma_fence live on independently of the module that issued it.
> 
> This makes it possible to unload drivers and only wait for all their
> fences to signal.
> 
> v2: fix typo in comment
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Philipp Stanner <phasta@kernel.org>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

One nit below.

> ---
>  drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>  include/linux/dma-fence.h   |  4 ++--
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index de9bf18be3d4..ba02321bef0b 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -371,6 +371,14 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return;
>  
> +	/*
> +	 * When neither a release nor a wait operation is specified set the ops
> +	 * pointer to NULL to allow the fence structure to become independent
> +	 * from who originally issued it.

I think this deserves some comment in the dma_fence_ops doc, so that
people know what to expect when they implement this interface.

> +	 */
> +	if (!fence->ops->release && !fence->ops->wait)
> +		RCU_INIT_POINTER(fence->ops, NULL);
> +
>  	/* Stash the cb_list before replacing it with the timestamp */
>  	list_replace(&fence->cb_list, &cb_list);
>  
> @@ -537,7 +545,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
>  	trace_dma_fence_wait_start(fence);
> -	if (ops->wait) {
> +	if (ops && ops->wait) {
>  		/*
>  		 * Implementing the wait ops is deprecated and not supported for
>  		 * issuer independent fences, so it is ok to use the ops outside
> @@ -602,7 +610,7 @@ void dma_fence_release(struct kref *kref)
>  	}
>  
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->release)
> +	if (ops && ops->release)
>  		ops->release(fence);
>  	else
>  		dma_fence_free(fence);
> @@ -638,7 +646,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (!was_set && ops->enable_signaling) {
> +	if (!was_set && ops && ops->enable_signaling) {
>  		trace_dma_fence_enable_signal(fence);
>  
>  		if (!ops->enable_signaling(fence)) {
> @@ -1024,7 +1032,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>  		ops->set_deadline(fence, deadline);
>  	rcu_read_unlock();
>  }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6bf4feb0e01f..e1afbb5909f9 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -472,7 +472,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>  		rcu_read_unlock();
>  		dma_fence_signal_locked(fence);
>  		return true;
> @@ -508,7 +508,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>  		rcu_read_unlock();
>  		dma_fence_signal(fence);
>  		return true;
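
To illustrate the contract such a doc comment would need to spell out, here
is a minimal userspace model of the detach-on-signal behaviour. It is only a
sketch: the demo_* names are hypothetical, plain pointer accesses stand in
for RCU_INIT_POINTER()/rcu_dereference(), and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ops_fence;

/* Hypothetical, trimmed-down ops table; only the members the sketch needs. */
struct demo_fence_ops {
	bool (*signaled)(struct demo_ops_fence *f);
	void (*release)(struct demo_ops_fence *f);
	long (*wait)(struct demo_ops_fence *f, long timeout);
};

struct demo_ops_fence {
	const struct demo_fence_ops *ops;	/* RCU protected in the kernel */
	bool signaled_flag;
};

/* Models the new hunk in dma_fence_signal_timestamp_locked(). */
static void demo_fence_signal(struct demo_ops_fence *f)
{
	if (f->signaled_flag)
		return;
	f->signaled_flag = true;

	/* No release and no wait callback: the issuer is not needed anymore. */
	if (!f->ops->release && !f->ops->wait)
		f->ops = NULL;
}

/* Models dma_fence_is_signaled(): ops can be NULL once the fence detached. */
static bool demo_fence_is_signaled(struct demo_ops_fence *f)
{
	const struct demo_fence_ops *ops;

	if (f->signaled_flag)
		return true;

	ops = f->ops;
	if (ops && ops->signaled && ops->signaled(f)) {
		demo_fence_signal(f);
		return true;
	}
	return false;
}

static void demo_release(struct demo_ops_fence *f)
{
	(void)f;
}

/* Returns 0 when the model behaves as described in the commit message. */
int demo_detach_on_signal(void)
{
	static const struct demo_fence_ops plain_ops = { 0 };
	static const struct demo_fence_ops release_ops = {
		.release = demo_release,
	};
	struct demo_ops_fence a = { .ops = &plain_ops };
	struct demo_ops_fence b = { .ops = &release_ops };

	assert(!demo_fence_is_signaled(&a));

	demo_fence_signal(&a);
	demo_fence_signal(&b);

	assert(a.ops == NULL);		/* detached from its issuer */
	assert(b.ops == &release_ops);	/* release callback keeps it attached */
	assert(demo_fence_is_signaled(&a));
	return 0;
}
```

In this model, fence "a" no longer references its issuer after signaling,
while fence "b" keeps the ops pointer because it implements release, which is
the case where the issuing module has to stay loaded.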



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
  2026-02-11  9:50   ` Philipp Stanner
  2026-02-12  9:16   ` Tvrtko Ursulin
@ 2026-02-13 14:27   ` Boris Brezillon
  2026-02-15  8:48     ` Boris Brezillon
  2026-02-16  7:33     ` Philipp Stanner
  2 siblings, 2 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:27 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:01:59 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> Implement per-fence spinlocks, allowing implementations to not give an
> external spinlock to protect the fence internal state. Instead a spinlock
> embedded into the fence structure itself is used in this case.
> 
> Shared spinlocks have the problem that implementations need to guarantee
> that the lock lives at least as long as all fences referencing it.
> 
> Using a per-fence spinlock allows completely decoupling spinlock producer
> and consumer life times, simplifying the handling in most use cases.
> 
> v2: improve naming, coverage and function documentation
> v3: fix one additional locking in the selftests
> v4: separate out some changes to make the patch smaller,
>     fix one amdgpu crash found by CI systems
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
>  drivers/dma-buf/sync_debug.h            |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
>  drivers/gpu/drm/drm_crtc.c              |  2 +-
>  drivers/gpu/drm/drm_writeback.c         |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
>  drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
>  drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
>  include/linux/dma-fence.h               | 19 +++++++++++++------
>  10 files changed, 41 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 56aa59867eaa..1833889e7466 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>  }
>  #endif
>  
> -
>  /**
>   * dma_fence_signal_timestamp_locked - signal completion of a fence
>   * @fence: the fence to signal
> @@ -1067,7 +1066,6 @@ static void
>  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>  {
> -	BUG_ON(!lock);
>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>  
>  	kref_init(&fence->refcount);
> @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	 */
>  	RCU_INIT_POINTER(fence->ops, ops);
>  	INIT_LIST_HEAD(&fence->cb_list);
> -	fence->lock = lock;
>  	fence->context = context;
>  	fence->seqno = seqno;
>  	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> +	if (lock) {
> +		fence->extern_lock = lock;
> +	} else {
> +		spin_lock_init(&fence->inline_lock);
> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);

Hm, does this even make a difference in terms of instructions to check for
a bit instead of extern_lock == NULL? If not, I'd be in favor of
killing this redundancy and checking extern_lock against NULL in
dma_fence_spinlock().

> +	}
>  	fence->error = 0;
>  
>  	trace_dma_fence_init(fence);
> @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   * dma_fence_init - Initialize a custom fence.
>   * @fence: the fence to initialize
>   * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>   * @context: the execution context this fence is run on
>   * @seqno: a linear increasing sequence number for this context
>   *
> @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   *
>   * context and seqno are used for easy comparison between fences, allowing
>   * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed
> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.
>   */
>  void
>  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -1114,7 +1121,7 @@ EXPORT_SYMBOL(dma_fence_init);
>   * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support.
>   * @fence: the fence to initialize
>   * @ops: the dma_fence_ops for operations on this fence
> - * @lock: the irqsafe spinlock to use for locking this fence
> + * @lock: optional irqsafe spinlock to use for locking this fence
>   * @context: the execution context this fence is run on
>   * @seqno: a linear increasing sequence number for this context
>   *
> @@ -1124,6 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
>   *
>   * Context and seqno are used for easy comparison between fences, allowing
>   * to check which fence is later by simply using dma_fence_later().
> + *
> + * It is strongly discouraged to provide an external lock. This is only allowed
> + * for legacy use cases when multiple fences need to be prevented from
> + * signaling out of order.
>   */
>  void
>  dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 02af347293d0..c49324505b20 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -47,7 +47,7 @@ struct sync_timeline {
>  
>  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>  {
> -	return container_of(fence->lock, struct sync_timeline, lock);
> +	return container_of(fence->extern_lock, struct sync_timeline, lock);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 139642eacdd0..d5c41e24fb51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -638,7 +638,7 @@ static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>  	 * sure that the dma_fence structure isn't freed up.
>  	 */
>  	rcu_read_lock();
> -	lock = vm->last_tlb_flush->lock;
> +	lock = dma_fence_spinlock(vm->last_tlb_flush);
>  	rcu_read_unlock();
>  
>  	spin_lock_irqsave(lock, flags);
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index a7797d260f1e..17472915842f 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>  static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>  {
>  	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>  }
>  
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> index 95b8a2e4bda6..624a4e8b6c99 100644
> --- a/drivers/gpu/drm/drm_writeback.c
> +++ b/drivers/gpu/drm/drm_writeback.c
> @@ -81,7 +81,7 @@
>   *	From userspace, this property will always read as zero.
>   */
>  
> -#define fence_to_wb_connector(x) container_of(x->lock, \
> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>  					      struct drm_writeback_connector, \
>  					      fence_lock)
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 4a193b7d6d9e..c282c94138b2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
>  static inline struct nouveau_fence_chan *
>  nouveau_fctx(struct nouveau_fence *fence)
>  {
> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> +			    lock);
>  }
>  
>  static bool
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 06b0b2aa7953..37d4ae0faf0d 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>  	struct qxl_device *qdev;
>  	unsigned long cur, end = jiffies + timeout;
>  
> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> +	qdev = container_of(fence->extern_lock, struct qxl_device,
> +			    release_lock);
>  
>  	if (!wait_event_timeout(qdev->release_event,
>  				(dma_fence_is_signaled(fence) ||
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 85795082fef9..d251eec57df9 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>  static struct vmw_fence_manager *
>  fman_from_fence(struct vmw_fence_obj *fence)
>  {
> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> +			    lock);
>  }
>  
>  static void vmw_fence_obj_destroy(struct dma_fence *f)
> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> index ae8ed15b64c5..14720623ad00 100644
> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> @@ -124,7 +124,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>  
>  static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>  {
> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> +			    lock);
>  }
>  
>  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 88c842fc35d5..6eabbb1c471c 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -34,7 +34,8 @@ struct seq_file;
>   * @ops: dma_fence_ops associated with this fence
>   * @rcu: used for releasing fence with kfree_rcu
>   * @cb_list: list of all callbacks to call
> - * @lock: spin_lock_irqsave used for locking
> + * @extern_lock: external spin_lock_irqsave used for locking
> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>   * @context: execution context this fence belongs to, returned by
>   *           dma_fence_context_alloc()
>   * @seqno: the sequence number of this fence inside the execution context,
> @@ -49,6 +50,7 @@ struct seq_file;
>   * of the time.
>   *
>   * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> @@ -66,7 +68,10 @@ struct seq_file;
>   * been completed, or never called at all.
>   */
>  struct dma_fence {
> -	spinlock_t *lock;
> +	union {
> +		spinlock_t *extern_lock;
> +		spinlock_t inline_lock;
> +	};
>  	const struct dma_fence_ops __rcu *ops;
>  	/*
>  	 * We clear the callback list on kref_put so that by the time we
> @@ -100,6 +105,7 @@ struct dma_fence {
>  
>  enum dma_fence_flag_bits {
>  	DMA_FENCE_FLAG_INITIALIZED_BIT,
> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
>  	DMA_FENCE_FLAG_SEQNO64_BIT,
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>   * dma_fence_spinlock - return pointer to the spinlock protecting the fence
>   * @fence: the fence to get the lock from
>   *
> - * Return the pointer to the extern lock.
> + * Return either the pointer to the embedded or the external spin lock.
>   */
>  static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>  {
> -	return fence->lock;
> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> +		&fence->inline_lock : fence->extern_lock;
>  }
>  
>  /**
> @@ -396,7 +403,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>   * Lock the fence, preventing it from changing to the signaled state.
>   */
>  #define dma_fence_lock_irqsave(fence, flags)	\
> -	spin_lock_irqsave(fence->lock, flags)
> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
>  
>  /**
>   * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
> @@ -406,7 +413,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>   * Unlock the fence, allowing it to change it's state to signaled again.
>   */
>  #define dma_fence_unlock_irqrestore(fence, flags)	\
> -	spin_unlock_irqrestore(fence->lock, flags)
> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
>  
>  /**
>   * dma_fence_assert_held - lockdep assertion that fence is locked



* Re: [PATCH 6/8] dma-buf: use inline lock for the stub fence v2
  2026-02-10 10:02 ` [PATCH 6/8] dma-buf: use inline lock for the stub fence v2 Christian König
@ 2026-02-13 14:32   ` Boris Brezillon
  0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:32 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:02:01 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> Using the inline lock is now the recommended way for dma_fence
> implementations.
> 
> So use this approach for the framework's internal fences as well.
> 
> Also saves about 4 bytes for the external spinlock.
> 
> v2: drop unnecessary changes
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Philipp Stanner <phasta@kernel.org>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/dma-buf/dma-fence.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1833889e7466..541e20aa4e6c 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -24,7 +24,6 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>  
> -static DEFINE_SPINLOCK(dma_fence_stub_lock);
>  static struct dma_fence dma_fence_stub;
>  
>  /*
> @@ -123,12 +122,9 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
>  
>  static int __init dma_fence_init_stub(void)
>  {
> -	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> -		       &dma_fence_stub_lock, 0, 0);
> -
> +	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, NULL, 0, 0);
>  	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>  		&dma_fence_stub.flags);
> -
>  	dma_fence_signal(&dma_fence_stub);
>  	return 0;
>  }
> @@ -160,11 +156,7 @@ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp)
>  	if (fence == NULL)
>  		return NULL;
>  
> -	dma_fence_init(fence,
> -		       &dma_fence_stub_ops,
> -		       &dma_fence_stub_lock,
> -		       0, 0);
> -
> +	dma_fence_init(fence, &dma_fence_stub_ops, NULL, 0, 0);
>  	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>  		&fence->flags);
>  



* Re: [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array
  2026-02-10 10:02 ` [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array Christian König
@ 2026-02-13 14:33   ` Boris Brezillon
  0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:33 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:02:02 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> Using the inline lock is now the recommended way for dma_fence
> implementations.
> 
> So use this approach for the framework's internal fences as well.
> 
> Also saves about 4 bytes for the external spinlock.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Philipp Stanner <phasta@kernel.org>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/dma-buf/dma-fence-array.c | 5 ++---
>  include/linux/dma-fence-array.h   | 1 -
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index 6657d4b30af9..c2119a8049fe 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -204,9 +204,8 @@ void dma_fence_array_init(struct dma_fence_array *array,
>  
>  	array->num_fences = num_fences;
>  
> -	spin_lock_init(&array->lock);
> -	dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
> -		       context, seqno);
> +	dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
> +		       seqno);
>  	init_irq_work(&array->work, irq_dma_fence_array_work);
>  
>  	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
> diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
> index 079b3dec0a16..370b3d2bba37 100644
> --- a/include/linux/dma-fence-array.h
> +++ b/include/linux/dma-fence-array.h
> @@ -38,7 +38,6 @@ struct dma_fence_array_cb {
>  struct dma_fence_array {
>  	struct dma_fence base;
>  
> -	spinlock_t lock;
>  	unsigned num_fences;
>  	atomic_t num_pending;
>  	struct dma_fence **fences;



* Re: [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain
  2026-02-10 10:02 ` [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain Christian König
@ 2026-02-13 14:33   ` Boris Brezillon
  0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-13 14:33 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Tue, 10 Feb 2026 11:02:03 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> Using the inline lock is now the recommended way for dma_fence
> implementations.
> 
> So use this approach for the framework's internal fences as well.
> 
> Also saves about 4 bytes for the external spinlock.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Philipp Stanner <phasta@kernel.org>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/dma-buf/dma-fence-chain.c | 3 +--
>  include/linux/dma-fence-chain.h   | 1 -
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index a8a90acf4f34..a707792b6025 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -245,7 +245,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
>  	struct dma_fence_chain *prev_chain = to_dma_fence_chain(prev);
>  	uint64_t context;
>  
> -	spin_lock_init(&chain->lock);
>  	rcu_assign_pointer(chain->prev, prev);
>  	chain->fence = fence;
>  	chain->prev_seqno = 0;
> @@ -261,7 +260,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
>  			seqno = max(prev->seqno, seqno);
>  	}
>  
> -	dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
> +	dma_fence_init64(&chain->base, &dma_fence_chain_ops, NULL,
>  			 context, seqno);
>  
>  	/*
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 68c3c1e41014..d39ce7a2e599 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -46,7 +46,6 @@ struct dma_fence_chain {
>  		 */
>  		struct irq_work work;
>  	};
> -	spinlock_t lock;
>  };
>  
>  



* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-13 14:27   ` Boris Brezillon
@ 2026-02-15  8:48     ` Boris Brezillon
  2026-02-16  7:33     ` Philipp Stanner
  1 sibling, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-15  8:48 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Fri, 13 Feb 2026 15:27:33 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Tue, 10 Feb 2026 11:01:59 +0100
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> 
> > Implement per-fence spinlocks, allowing implementations to not give an
> > external spinlock to protect the fence internal state. Instead a spinlock
> > embedded into the fence structure itself is used in this case.
> > 
> > Shared spinlocks have the problem that implementations need to guarantee
> > that the lock lives at least as long as all fences referencing it.
> > 
> > Using a per-fence spinlock allows completely decoupling spinlock producer
> > and consumer life times, simplifying the handling in most use cases.
> > 
> > v2: improve naming, coverage and function documentation
> > v3: fix one additional locking in the selftests
> > v4: separate out some changes to make the patch smaller,
> >     fix one amdgpu crash found by CI systems
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
> >  drivers/dma-buf/sync_debug.h            |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
> >  drivers/gpu/drm/drm_crtc.c              |  2 +-
> >  drivers/gpu/drm/drm_writeback.c         |  2 +-
> >  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
> >  drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
> >  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
> >  drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
> >  include/linux/dma-fence.h               | 19 +++++++++++++------
> >  10 files changed, 41 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 56aa59867eaa..1833889e7466 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
> >  }
> >  #endif
> >  
> > -
> >  /**
> >   * dma_fence_signal_timestamp_locked - signal completion of a fence
> >   * @fence: the fence to signal
> > @@ -1067,7 +1066,6 @@ static void
> >  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
> >  {
> > -	BUG_ON(!lock);
> >  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
> >  
> >  	kref_init(&fence->refcount);
> > @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >  	 */
> >  	RCU_INIT_POINTER(fence->ops, ops);
> >  	INIT_LIST_HEAD(&fence->cb_list);
> > -	fence->lock = lock;
> >  	fence->context = context;
> >  	fence->seqno = seqno;
> >  	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> > +	if (lock) {
> > +		fence->extern_lock = lock;
> > +	} else {
> > +		spin_lock_init(&fence->inline_lock);
> > +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);  
> 
> Hm, does this even make a difference in terms of instructions to check for
> a bit instead of extern_lock == NULL? If not, I'd be in favor of
> killing this redundancy and checking extern_lock against NULL in
> dma_fence_spinlock().

Scratch that, I didn't notice {extern,inline}_lock were under a union
(which makes sense). Looks all good to me.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> 
> > +	}
> >  	fence->error = 0;
> >  
> >  	trace_dma_fence_init(fence);
> > @@ -1091,7 +1094,7 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >   * dma_fence_init - Initialize a custom fence.
> >   * @fence: the fence to initialize
> >   * @ops: the dma_fence_ops for operations on this fence
> > - * @lock: the irqsafe spinlock to use for locking this fence
> > + * @lock: optional irqsafe spinlock to use for locking this fence
> >   * @context: the execution context this fence is run on
> >   * @seqno: a linear increasing sequence number for this context
> >   *
> > @@ -1101,6 +1104,10 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >   *
> >   * context and seqno are used for easy comparison between fences, allowing
> >   * to check which fence is later by simply using dma_fence_later().
> > + *
> > + * It is strongly discouraged to provide an external lock. This is only allowed
> > + * for legacy use cases when multiple fences need to be prevented from
> > + * signaling out of order.
> >   */
> >  void
> >  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > @@ -1114,7 +1121,7 @@ EXPORT_SYMBOL(dma_fence_init);
> >   * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support.
> >   * @fence: the fence to initialize
> >   * @ops: the dma_fence_ops for operations on this fence
> > - * @lock: the irqsafe spinlock to use for locking this fence
> > + * @lock: optional irqsafe spinlock to use for locking this fence
> >   * @context: the execution context this fence is run on
> >   * @seqno: a linear increasing sequence number for this context
> >   *
> > @@ -1124,6 +1131,10 @@ EXPORT_SYMBOL(dma_fence_init);
> >   *
> >   * Context and seqno are used for easy comparison between fences, allowing
> >   * to check which fence is later by simply using dma_fence_later().
> > + *
> > + * It is strongly discouraged to provide an external lock. This is only allowed
> > + * for legacy use cases when multiple fences need to be prevented from
> > + * signaling out of order.
> >   */
> >  void
> >  dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> > index 02af347293d0..c49324505b20 100644
> > --- a/drivers/dma-buf/sync_debug.h
> > +++ b/drivers/dma-buf/sync_debug.h
> > @@ -47,7 +47,7 @@ struct sync_timeline {
> >  
> >  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
> >  {
> > -	return container_of(fence->lock, struct sync_timeline, lock);
> > +	return container_of(fence->extern_lock, struct sync_timeline, lock);
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > index 139642eacdd0..d5c41e24fb51 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > @@ -638,7 +638,7 @@ static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
> >  	 * sure that the dma_fence structure isn't freed up.
> >  	 */
> >  	rcu_read_lock();
> > -	lock = vm->last_tlb_flush->lock;
> > +	lock = dma_fence_spinlock(vm->last_tlb_flush);
> >  	rcu_read_unlock();
> >  
> >  	spin_lock_irqsave(lock, flags);
> > diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> > index a7797d260f1e..17472915842f 100644
> > --- a/drivers/gpu/drm/drm_crtc.c
> > +++ b/drivers/gpu/drm/drm_crtc.c
> > @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
> >  static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
> >  {
> >  	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> > -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> > +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
> >  }
> >  
> >  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> > diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> > index 95b8a2e4bda6..624a4e8b6c99 100644
> > --- a/drivers/gpu/drm/drm_writeback.c
> > +++ b/drivers/gpu/drm/drm_writeback.c
> > @@ -81,7 +81,7 @@
> >   *	From userspace, this property will always read as zero.
> >   */
> >  
> > -#define fence_to_wb_connector(x) container_of(x->lock, \
> > +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
> >  					      struct drm_writeback_connector, \
> >  					      fence_lock)
> >  
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > index 4a193b7d6d9e..c282c94138b2 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > @@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
> >  static inline struct nouveau_fence_chan *
> >  nouveau_fctx(struct nouveau_fence *fence)
> >  {
> > -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> > +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> > +			    lock);
> >  }
> >  
> >  static bool
> > diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> > index 06b0b2aa7953..37d4ae0faf0d 100644
> > --- a/drivers/gpu/drm/qxl/qxl_release.c
> > +++ b/drivers/gpu/drm/qxl/qxl_release.c
> > @@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
> >  	struct qxl_device *qdev;
> >  	unsigned long cur, end = jiffies + timeout;
> >  
> > -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> > +	qdev = container_of(fence->extern_lock, struct qxl_device,
> > +			    release_lock);
> >  
> >  	if (!wait_event_timeout(qdev->release_event,
> >  				(dma_fence_is_signaled(fence) ||
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> > index 85795082fef9..d251eec57df9 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> > @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
> >  static struct vmw_fence_manager *
> >  fman_from_fence(struct vmw_fence_obj *fence)
> >  {
> > -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> > +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> > +			    lock);
> >  }
> >  
> >  static void vmw_fence_obj_destroy(struct dma_fence *f)
> > diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> > index ae8ed15b64c5..14720623ad00 100644
> > --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> > +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> > @@ -124,7 +124,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
> >  
> >  static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
> >  {
> > -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> > +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> > +			    lock);
> >  }
> >  
> >  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > index 88c842fc35d5..6eabbb1c471c 100644
> > --- a/include/linux/dma-fence.h
> > +++ b/include/linux/dma-fence.h
> > @@ -34,7 +34,8 @@ struct seq_file;
> >   * @ops: dma_fence_ops associated with this fence
> >   * @rcu: used for releasing fence with kfree_rcu
> >   * @cb_list: list of all callbacks to call
> > - * @lock: spin_lock_irqsave used for locking
> > + * @extern_lock: external spin_lock_irqsave used for locking
> > + * @inline_lock: alternative internal spin_lock_irqsave used for locking
> >   * @context: execution context this fence belongs to, returned by
> >   *           dma_fence_context_alloc()
> >   * @seqno: the sequence number of this fence inside the execution context,
> > @@ -49,6 +50,7 @@ struct seq_file;
> >   * of the time.
> >   *
> >   * DMA_FENCE_FLAG_INITIALIZED_BIT - fence was initialized
> > + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
> >   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
> >   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
> >   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> > @@ -66,7 +68,10 @@ struct seq_file;
> >   * been completed, or never called at all.
> >   */
> >  struct dma_fence {
> > -	spinlock_t *lock;
> > +	union {
> > +		spinlock_t *extern_lock;
> > +		spinlock_t inline_lock;
> > +	};
> >  	const struct dma_fence_ops __rcu *ops;
> >  	/*
> >  	 * We clear the callback list on kref_put so that by the time we
> > @@ -100,6 +105,7 @@ struct dma_fence {
> >  
> >  enum dma_fence_flag_bits {
> >  	DMA_FENCE_FLAG_INITIALIZED_BIT,
> > +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
> >  	DMA_FENCE_FLAG_SEQNO64_BIT,
> >  	DMA_FENCE_FLAG_SIGNALED_BIT,
> >  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > @@ -381,11 +387,12 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
> >   * dma_fence_spinlock - return pointer to the spinlock protecting the fence
> >   * @fence: the fence to get the lock from
> >   *
> > - * Return the pointer to the extern lock.
> > + * Return either the pointer to the embedded or the external spin lock.
> >   */
> >  static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> >  {
> > -	return fence->lock;
> > +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> > +		&fence->inline_lock : fence->extern_lock;
> >  }
> >  
> >  /**
> > @@ -396,7 +403,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> >   * Lock the fence, preventing it from changing to the signaled state.
> >   */
> >  #define dma_fence_lock_irqsave(fence, flags)	\
> > -	spin_lock_irqsave(fence->lock, flags)
> > +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
> >  
> >  /**
> >   * dma_fence_unlock_irqrestore - unlock the fence and irqrestore
> > @@ -406,7 +413,7 @@ static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> >   * Unlock the fence, allowing it to change it's state to signaled again.
> >   */
> >  #define dma_fence_unlock_irqrestore(fence, flags)	\
> > -	spin_unlock_irqrestore(fence->lock, flags)
> > +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
> >  
> >  /**
> >   * dma_fence_assert_held - lockdep assertion that fence is locked  
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-13 14:27   ` Boris Brezillon
  2026-02-15  8:48     ` Boris Brezillon
@ 2026-02-16  7:33     ` Philipp Stanner
  2026-02-16  9:48       ` Boris Brezillon
  1 sibling, 1 reply; 33+ messages in thread
From: Philipp Stanner @ 2026-02-16  7:33 UTC (permalink / raw)
  To: Boris Brezillon, Christian König
  Cc: matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Fri, 2026-02-13 at 15:27 +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 11:01:59 +0100
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> 
> > Implement per-fence spinlocks, allowing implementations to not give an
> > external spinlock to protect the fence internal state. Instead a spinlock
> > embedded into the fence structure itself is used in this case.
> > 
> > Shared spinlocks have the problem that implementations need to guarantee
> > that the lock lives at least as long as all fences referencing it.
> > 
> > Using a per-fence spinlock allows completely decoupling spinlock producer
> > and consumer life times, simplifying the handling in most use cases.
> > 
> > v2: improve naming, coverage and function documentation
> > v3: fix one additional locking in the selftests
> > v4: separate out some changes to make the patch smaller,
> >     fix one amdgpu crash found by CI systems
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
> >  drivers/dma-buf/sync_debug.h            |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
> >  drivers/gpu/drm/drm_crtc.c              |  2 +-
> >  drivers/gpu/drm/drm_writeback.c         |  2 +-
> >  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
> >  drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
> >  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
> >  drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
> >  include/linux/dma-fence.h               | 19 +++++++++++++------
> >  10 files changed, 41 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 56aa59867eaa..1833889e7466 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
> >  }
> >  #endif
> >  
> > -
> >  /**
> >   * dma_fence_signal_timestamp_locked - signal completion of a fence
> >   * @fence: the fence to signal
> > @@ -1067,7 +1066,6 @@ static void
> >  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
> >  {
> > -	BUG_ON(!lock);
> >  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
> >  
> >  	kref_init(&fence->refcount);
> > @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >  	 */
> >  	RCU_INIT_POINTER(fence->ops, ops);
> >  	INIT_LIST_HEAD(&fence->cb_list);
> > -	fence->lock = lock;
> >  	fence->context = context;
> >  	fence->seqno = seqno;
> >  	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> > +	if (lock) {
> > +		fence->extern_lock = lock;
> > +	} else {
> > +		spin_lock_init(&fence->inline_lock);
> > +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
> 
> Hm, does this even make a difference in terms of instructions to check for
> a bit instead of extern_lock == NULL? If not, I'd be in favor of
> killing this redundancy and checking extern_lock against NULL in
> dma_fence_spinlock().

extern_lock and inline_lock are a union, so they overlap each other.
inline_lock will only be equivalent to all zeros after initializing a
new fence to 0.
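
To illustrate why a NULL check on extern_lock cannot stand in for the flag
bit, here is a minimal userspace sketch. Everything prefixed with mock_ is a
hypothetical stand-in and not kernel API; the real spinlock_t layout differs
and depends on the kernel configuration, and assert() only plays the role of
BUG_ON() here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for spinlock_t; the real layout is config dependent. */
struct mock_spinlock {
	unsigned int locked;
};

#define MOCK_FLAG_INLINE_LOCK	(1UL << 0)

struct mock_fence {
	union {
		struct mock_spinlock *extern_lock;	/* valid if flag is clear */
		struct mock_spinlock inline_lock;	/* valid if flag is set */
	};
	unsigned long flags;
};

/* Same shape as the init path in the patch: a NULL lock selects the inline one. */
static void mock_fence_init(struct mock_fence *f, struct mock_spinlock *lock)
{
	assert(f);	/* userspace stand-in for BUG_ON() */

	memset(f, 0, sizeof(*f));
	if (lock) {
		f->extern_lock = lock;
	} else {
		f->inline_lock.locked = 0;
		f->flags |= MOCK_FLAG_INLINE_LOCK;
	}
}

/* The flag, not the content of the union, decides which member is valid. */
static struct mock_spinlock *mock_fence_spinlock(struct mock_fence *f)
{
	return (f->flags & MOCK_FLAG_INLINE_LOCK) ?
		&f->inline_lock : f->extern_lock;
}

/* True if the bytes where extern_lock would live are currently all zero. */
static bool mock_extern_bytes_are_null(const struct mock_fence *f)
{
	static const unsigned char zero[sizeof(struct mock_spinlock *)] = { 0 };

	return memcmp(f, zero, sizeof(zero)) == 0;
}
```

In this sketch a freshly initialized inline lock happens to look like a NULL
pointer, but as soon as somebody takes the lock (locked = 1) the same bytes
are no longer zero, so only the flag gives a stable answer.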


P.

PS: Can you terminate messages by a delimiter or by cropping? I give
this tip sometimes, because often the reviewer has to scroll emails
down to the end to see whether there are further comments. I terminate
my messages with "P." for that purpose ;]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/8] dma-buf: inline spinlock for fence protection v4
  2026-02-16  7:33     ` Philipp Stanner
@ 2026-02-16  9:48       ` Boris Brezillon
  0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-16  9:48 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: phasta, Christian König, matthew.brost, sumit.semwal,
	dri-devel, linaro-mm-sig

On Mon, 16 Feb 2026 08:33:21 +0100
Philipp Stanner <phasta@mailbox.org> wrote:

> On Fri, 2026-02-13 at 15:27 +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 11:01:59 +0100
> > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> >   
> > > Implement per-fence spinlocks, allowing implementations to not give an
> > > external spinlock to protect the fence internal state. Instead a spinlock
> > > embedded into the fence structure itself is used in this case.
> > > 
> > > Shared spinlocks have the problem that implementations need to guarantee
> > > that the lock lives at least as long as all fences referencing it.
> > > 
> > > Using a per-fence spinlock allows completely decoupling spinlock producer
> > > and consumer life times, simplifying the handling in most use cases.
> > > 
> > > v2: improve naming, coverage and function documentation
> > > v3: fix one additional locking in the selftests
> > > v4: separate out some changes to make the patch smaller,
> > >     fix one amdgpu crash found by CI systems
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >  drivers/dma-buf/dma-fence.c             | 21 ++++++++++++++++-----
> > >  drivers/dma-buf/sync_debug.h            |  2 +-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  2 +-
> > >  drivers/gpu/drm/drm_crtc.c              |  2 +-
> > >  drivers/gpu/drm/drm_writeback.c         |  2 +-
> > >  drivers/gpu/drm/nouveau/nouveau_fence.c |  3 ++-
> > >  drivers/gpu/drm/qxl/qxl_release.c       |  3 ++-
> > >  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 ++-
> > >  drivers/gpu/drm/xe/xe_hw_fence.c        |  3 ++-
> > >  include/linux/dma-fence.h               | 19 +++++++++++++------
> > >  10 files changed, 41 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > index 56aa59867eaa..1833889e7466 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
> > >  }
> > >  #endif
> > >  
> > > -
> > >  /**
> > >   * dma_fence_signal_timestamp_locked - signal completion of a fence
> > >   * @fence: the fence to signal
> > > @@ -1067,7 +1066,6 @@ static void
> > >  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > >  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
> > >  {
> > > -	BUG_ON(!lock);
> > >  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
> > >  
> > >  	kref_init(&fence->refcount);
> > > @@ -1078,10 +1076,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > >  	 */
> > >  	RCU_INIT_POINTER(fence->ops, ops);
> > >  	INIT_LIST_HEAD(&fence->cb_list);
> > > -	fence->lock = lock;
> > >  	fence->context = context;
> > >  	fence->seqno = seqno;
> > >  	fence->flags = flags | BIT(DMA_FENCE_FLAG_INITIALIZED_BIT);
> > > +	if (lock) {
> > > +		fence->extern_lock = lock;
> > > +	} else {
> > > +		spin_lock_init(&fence->inline_lock);
> > > +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);  
> > 
> > Hm, does this even make a difference in terms of instructions to check for
> > a bit instead of extern_lock == NULL? If not, I'd be in favor of
> > killing this redundancy and checking extern_lock against NULL in
> > dma_fence_spinlock().  
> 
> extern_lock and inline_lock are a union, so they overlap each other.
> inline_lock will only be equivalent to all zeros after initializing a
> new fence to 0.
> 
> 
> P.
> 
> PS: Can you terminate messages by a delimiter or by cropping? I give
> this tip sometimes, because often the reviewer has to scroll emails
> down to the end to see whether there are further comments. I terminate
> my messages with "P." for that purpose ;]

I tend to strip messages and quote only the bits I comment on. I guess
this time I didn't, my bad.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-12  8:56       ` Philipp Stanner
@ 2026-02-19 10:23         ` Christian König
  2026-02-19 10:35           ` Philipp Stanner
  0 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-19 10:23 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

On 2/12/26 09:56, Philipp Stanner wrote:
>>>> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>>>>  static inline bool
>>>>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>>  {
>>>> +	const struct dma_fence_ops *ops;
>>>> +
>>>>  	if (dma_fence_test_signaled_flag(fence))
>>>>  		return true;
>>>>  
>>>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>>> +	rcu_read_lock();
>>>> +	ops = rcu_dereference(fence->ops);
>>>> +	if (ops->signaled && ops->signaled(fence)) {
>>>
>>> Maybe you can educate me a bit about RCU here – couldn't this still
>>> race? If the ops were unloaded before you take rcu_read_lock(),
>>> rcu_dereference() would give you an invalid pointer here since you
>>> don't check for !ops, no?
>>
>> Perfectly correct thinking, yes.
>>
>> But the check for !ops is added in patch #2 when we actually start to set ops = NULL when the fence signals.
>>
>> I intentionally separated that because it is basically the second step in making the solution to detach the fence ops from the module by RCU work.
>>
>> We could merge the two patches together, but I think the separation actually makes sense should anybody start to complain about the additional RCU overhead.
>>
> 
> Alright, makes sense. However, the above does not read correctly.
> 
> But then my question would be: What's the purpose of this patch, what
> does it solve or address atomically?

Adding the RCU annotation and related logic, e.g. rcu_read_lock()/rcu_read_unlock()/rcu_dereference() etc...

This allows the automated static RCU checker to validate what we do here and point out potential mistakes.

In addition, should the rcu_read_lock() protection cause performance problems, it will bisect to this patch alone.

> Adding RCU here does not yet change behavior and it does not solve the
> unloading problem, does it?

Nope, no functional behavior change. It's purely to get the automated checkers going.
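
As a rough illustration of the read-side pattern under discussion, here is a userspace sketch. It is not the kernel implementation: the mock_ names are hypothetical, the lock/unlock helpers are empty placeholders for rcu_read_lock()/rcu_read_unlock(), and an acquire load on a C11 atomic pointer only approximates rcu_dereference(). The !ops test is the check described above as arriving with patch #2.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_fence;

struct mock_fence_ops {
	bool (*signaled)(struct mock_fence *fence);	/* optional */
};

struct mock_fence {
	/* Stands in for "const struct dma_fence_ops __rcu *ops". */
	_Atomic(const struct mock_fence_ops *) ops;
	atomic_bool signaled_flag;
};

/* Placeholders: the real primitives delimit an RCU read-side critical section. */
static inline void mock_rcu_read_lock(void) { }
static inline void mock_rcu_read_unlock(void) { }

/* Rough analogue of rcu_dereference(): an ordered load of the pointer. */
static inline const struct mock_fence_ops *
mock_fence_ops_get(struct mock_fence *fence)
{
	return atomic_load_explicit(&fence->ops, memory_order_acquire);
}

/* Hypothetical driver callback, reporting a made-up hardware state. */
static bool mock_hw_done;

static bool mock_hw_signaled(struct mock_fence *fence)
{
	(void)fence;
	return mock_hw_done;
}

static bool mock_fence_is_signaled(struct mock_fence *fence)
{
	const struct mock_fence_ops *ops;
	bool ret = false;

	if (atomic_load(&fence->signaled_flag))
		return true;

	mock_rcu_read_lock();
	ops = mock_fence_ops_get(fence);
	/* A detached fence has ops == NULL, so no callback may be called. */
	if (ops && ops->signaled && ops->signaled(fence)) {
		atomic_store(&fence->signaled_flag, true);
		ret = true;
	}
	mock_rcu_read_unlock();

	return ret;
}
```

The point of the sketch is only the ordering: load the pointer once inside the critical section, test it, and call through the local copy instead of re-reading fence->ops.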

> If it's a mere preparatory step and the patches should not be merged,
> I'd guard the above with a simple comment like "Cleanup preparation.
> 'ops' cannot be NULL yet, but this will be the case subsequently."

A comment added in this patch and removed in the next one? Na, that sounds like overkill to me.

Christian.

> 
> 
> P.
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-19 10:23         ` Christian König
@ 2026-02-19 10:35           ` Philipp Stanner
  2026-02-19 12:49             ` Christian König
  0 siblings, 1 reply; 33+ messages in thread
From: Philipp Stanner @ 2026-02-19 10:35 UTC (permalink / raw)
  To: Christian König, phasta, matthew.brost, sumit.semwal
  Cc: dri-devel, linaro-mm-sig

On Thu, 2026-02-19 at 11:23 +0100, Christian König wrote:
> On 2/12/26 09:56, Philipp Stanner wrote:
> > > > > @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
> > > > >  static inline bool
> > > > >  dma_fence_is_signaled_locked(struct dma_fence *fence)
> > > > >  {
> > > > > +	const struct dma_fence_ops *ops;
> > > > > +
> > > > >  	if (dma_fence_test_signaled_flag(fence))
> > > > >  		return true;
> > > > >  
> > > > > -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> > > > > +	rcu_read_lock();
> > > > > +	ops = rcu_dereference(fence->ops);
> > > > > +	if (ops->signaled && ops->signaled(fence)) {
> > > > 
> > > > Maybe you can educate me a bit about RCU here – couldn't this still
> > > > race? If the ops were unloaded before you take rcu_read_lock(),
> > > > rcu_dereference() would give you an invalid pointer here since you
> > > > don't check for !ops, no?
> > > 
> > > Perfectly correct thinking, yes.
> > > 
> > > But the check for !ops is added in patch #2 when we actually start to set ops = NULL when the fence signals.
> > > 
> > > I intentionally separated that because it is basically the second step in making the solution to detach the fence ops from the module by RCU work.
> > > 
> > > We could merge the two patches together, but I think the separation actually makes sense should anybody start to complain about the additional RCU overhead.
> > > 
> > 
> > Alright, makes sense. However, the above does not read correctly.
> > 
> > But then my question would be: What's the purpose of this patch, what
> > does it solve or address atomically?
> 
> Adding the RCU annotation and related logic, e.g. rcu_read_lock()/rcu_read_unlock()/rcu_dereference() etc...
> 
> This allows the automated static RCU checker to validate what we do here and point out potential mistakes.
> 
> In addition, should the rcu_read_lock() protection cause performance problems, it will bisect to this patch alone.

Alright, thx for the info. Very useful

> 
> > Adding RCU here does not yet change behavior and it does not solve the
> > unloading problem, does it?
> 
> Nope, no functional behavior change. It's purely to get the automated checkers going.
> 
> > If it's a mere preparatory step and the patches should not be merged,
> > I'd guard the above with a simple comment like "Cleanup preparation.
> > 'ops' cannot be NULL yet, but this will be the case subsequently."
> 
> A comment added in this patch and removed in the next one? Na, that sounds like overkill to me.

ACK.
But then let's do a normalkill by adding the info you provided above
into the commit message, shall we? ^_^

"At first glance it is counter intuitive to protect a constant function
pointer table by RCU, but this allows modules providing the function
table to unload by waiting for an RCU grace period."

This doesn't reveal what the patch is actually about, just that
something is counter-intuitive to someone already very familiar with
the series' intent and the code's deeper background :)

"This or that about dma_fence shall be cleaned up in subsequent
patches. To prepare for that, add … which allows the RCU checker to
validate …"

*Philipp reads that*: ["Ah, this patch is in preparation and allows the
RCU checker to validate everything!"]

;p

P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] dma-buf: protected fence ops by RCU v5
  2026-02-19 10:35           ` Philipp Stanner
@ 2026-02-19 12:49             ` Christian König
  0 siblings, 0 replies; 33+ messages in thread
From: Christian König @ 2026-02-19 12:49 UTC (permalink / raw)
  To: phasta, matthew.brost, sumit.semwal; +Cc: dri-devel, linaro-mm-sig

On 2/19/26 11:35, Philipp Stanner wrote:
> On Thu, 2026-02-19 at 11:23 +0100, Christian König wrote:
>> On 2/12/26 09:56, Philipp Stanner wrote:
>>>>>> @@ -454,13 +465,19 @@ dma_fence_test_signaled_flag(struct dma_fence *fence)
>>>>>>  static inline bool
>>>>>>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>>>>  {
>>>>>> +	const struct dma_fence_ops *ops;
>>>>>> +
>>>>>>  	if (dma_fence_test_signaled_flag(fence))
>>>>>>  		return true;
>>>>>>  
>>>>>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>>>>> +	rcu_read_lock();
>>>>>> +	ops = rcu_dereference(fence->ops);
>>>>>> +	if (ops->signaled && ops->signaled(fence)) {
>>>>>
>>>>> Maybe you can educate me a bit about RCU here – couldn't this still
>>>>> race? If the ops were unloaded before you take rcu_read_lock(),
>>>>> rcu_dereference() would give you an invalid pointer here since you
>>>>> don't check for !ops, no?
>>>>
>>>> Perfectly correct thinking, yes.
>>>>
>>>> But the check for !ops is added in patch #2 when we actually start to set ops = NULL when the fence signals.
>>>>
>>>> I intentionally separated that because it is basically the second step in making the solution to detach the fence ops from the module by RCU work.
>>>>
>>>> We could merge the two patches together, but I think the separation actually makes sense should anybody start to complain about the additional RCU overhead.
>>>>
>>>
>>> Alright, makes sense. However, the above does not read correctly.
>>>
>>> But then my question would be: What's the purpose of this patch, what
>>> does it solve or address atomically?
>>
>> Adding the RCU annotation and related logic, e.g. rcu_read_lock()/rcu_read_unlock()/rcu_dereference() etc...
>>
>> This allows the automated static RCU checker to validate what we do here and point out potential mistakes.
>>
>> In addition, should the rcu_read_lock() protection cause performance problems, it will bisect to this patch alone.
> 
> Alright, thx for the info. Very useful
> 
>>
>>> Adding RCU here does not yet change behavior and it does not solve the
>>> unloading problem, does it?
>>
>> Nope, no functional behavior change. It's purely to get the automated checkers going.
>>
>>> If it's a mere preparatory step and the patches should not be merged,
>>> I'd guard the above with a simple comment like "Cleanup preparation.
>>> 'ops' cannot be NULL yet, but this will be the case subsequently."
>>
>> A comment added in this patch and removed in the next one? Na, that sounds like overkill to me.
> 
> ACK.
> But then let's do a normalkill by adding the info you provided above
> into the commit message, shall we? ^_^
> 
> "At first glance it is counter intuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period."
> 
> This doesn't reveal what the patch is actually about, just that
> something is counter-intuitive to someone already very familiar with
> the series' intent and the code's deeper background :)
> 
> "This or that about dma_fence shall be cleaned up in subsequent
> patches. To prepare for that, add … which allows the RCU checker to
> validate …"

I've already added the sentence "...As first step to solve this issue protect the fence ops by RCU." in the commit message to make it clear that this is not a full solution to the issue.

> *Philipp reads that*: ["Ah, this patch is in preparation and allows the
> RCU checker to validate everything!"]

Yeah, mentioning the RCU checker is clearly a good idea. Going to add that.

Christian.

> 
> ;p
> 
> P.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/8] dma-buf: detach fence ops on signal v2
  2026-02-13 14:22   ` Boris Brezillon
@ 2026-02-19 12:52     ` Christian König
  2026-02-19 15:49       ` Boris Brezillon
  0 siblings, 1 reply; 33+ messages in thread
From: Christian König @ 2026-02-19 12:52 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On 2/13/26 15:22, Boris Brezillon wrote:
>> ---
>>  drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>>  include/linux/dma-fence.h   |  4 ++--
>>  2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index de9bf18be3d4..ba02321bef0b 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -371,6 +371,14 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>  				      &fence->flags)))
>>  		return;
>>  
>> +	/*
>> +	 * When neither a release nor a wait operation is specified set the ops
>> +	 * pointer to NULL to allow the fence structure to become independent
>> +	 * from who originally issued it.
> 
> I think this deserves some comment in the dma_fence_ops doc, so that
> people know what to expect when they implement this interface.

A warning was already added ~5 years ago that implementations shouldn't use the wait callback.

Completely independent of this patch set, we already had tons of trouble with it because it can't take into account when userspace waits for multiple fences from different implementations.

It was potentially never a good idea to have in the first place; we basically only had it because radeon (and IIRC nouveau at that point) depended on it.
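
For reference, the rule from the quoted comment (drop the ops pointer on signal only when neither a wait nor a release callback is provided) could look roughly like this in a userspace sketch. The mock_ names are hypothetical, and a release store on a C11 atomic pointer only approximates what the patch does with the RCU protected pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_fence;

struct mock_fence_ops {
	long (*wait)(struct mock_fence *fence);		/* deprecated, optional */
	void (*release)(struct mock_fence *fence);	/* optional */
};

struct mock_fence {
	_Atomic(const struct mock_fence_ops *) ops;
	atomic_bool signaled;
};

/* Hypothetical driver callback that would live in the issuing module. */
static void mock_release(struct mock_fence *fence)
{
	(void)fence;
}

static void mock_fence_signal(struct mock_fence *fence)
{
	const struct mock_fence_ops *ops;

	/* Only the first signaler proceeds. */
	if (atomic_exchange(&fence->signaled, true))
		return;

	/*
	 * With neither wait nor release the issuer is not needed after
	 * signaling, so the fence can be detached from its ops table.
	 */
	ops = atomic_load_explicit(&fence->ops, memory_order_relaxed);
	if (ops && !ops->wait && !ops->release)
		atomic_store_explicit(&fence->ops,
				      (const struct mock_fence_ops *)NULL,
				      memory_order_release);
}
```

A fence whose ops still provide wait or release keeps its pointer in this sketch, which is why those two callbacks remain the cases where the issuer has to stay around.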

Regards,
Christian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/8] dma-buf: detach fence ops on signal v2
  2026-02-19 12:52     ` Christian König
@ 2026-02-19 15:49       ` Boris Brezillon
  0 siblings, 0 replies; 33+ messages in thread
From: Boris Brezillon @ 2026-02-19 15:49 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, matthew.brost, sumit.semwal, dri-devel, linaro-mm-sig

On Thu, 19 Feb 2026 13:52:43 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 2/13/26 15:22, Boris Brezillon wrote:
> >> ---
> >>  drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
> >>  include/linux/dma-fence.h   |  4 ++--
> >>  2 files changed, 14 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >> index de9bf18be3d4..ba02321bef0b 100644
> >> --- a/drivers/dma-buf/dma-fence.c
> >> +++ b/drivers/dma-buf/dma-fence.c
> >> @@ -371,6 +371,14 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> >>  				      &fence->flags)))
> >>  		return;
> >>  
> >> +	/*
> >> +	 * When neither a release nor a wait operation is specified set the ops
> >> +	 * pointer to NULL to allow the fence structure to become independent
> >> +	 * from who originally issued it.  
> > 
> > I think this deserves some comment in the dma_fence_ops doc, so that
> > people know what to expect when they implement this interface.  
> There was already a warning added like ~5years ago that implementations shouldn't use the wait callback.
> 
> Completely independent of this patch set here we already had tons of trouble with it because it can't take into account when userspace waits for multiple fences from different implementations.
> 
> It potentially was never a good idea to have in the first place, we basically only had it because radeon (and IIRC nouveau at that point) depended on it.

Fair enough. If it's flagged deprecated already, let's keep things like
that.
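
For readers following along, the behaviour described in the quoted patch
comment can be modelled in plain user-space C. This is only a rough sketch
and not the code from the patch: every model_* name is hypothetical, and
the RCU protection and fence locking that the real series relies on are
left out entirely. It only shows the decision of when the ops pointer is
dropped on signaling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's types. */
struct model_fence;

struct model_fence_ops {
	const char *(*get_driver_name)(struct model_fence *f);
	long (*wait)(struct model_fence *f);	/* optional, deprecated */
	void (*release)(struct model_fence *f);	/* optional */
};

struct model_fence {
	const struct model_fence_ops *ops;
	bool signaled;
};

static const char *model_name(struct model_fence *f)
{
	(void)f;
	return "model-driver";
}

static void model_release(struct model_fence *f)
{
	(void)f;
}

/* Issuer without wait/release: the fence can be detached on signal. */
static const struct model_fence_ops model_plain_ops = {
	.get_driver_name = model_name,
};

/* Issuer with a release callback: the fence stays tied to its issuer. */
static const struct model_fence_ops model_release_ops = {
	.get_driver_name = model_name,
	.release = model_release,
};

/* Returns true on the first signal, false if already signaled. */
static bool model_fence_signal(struct model_fence *f)
{
	assert(f != NULL);
	if (f->signaled)
		return false;
	f->signaled = true;

	/* Neither wait nor release given: drop the reference to the ops. */
	if (f->ops && !f->ops->wait && !f->ops->release)
		f->ops = NULL;
	return true;
}

/* Callers have to cope with a NULL ops pointer after signaling. */
static const char *model_fence_driver_name(struct model_fence *f)
{
	const struct model_fence_ops *ops = f->ops;

	if (!ops || !ops->get_driver_name)
		return "detached";
	return ops->get_driver_name(f);
}
```

In this model a fence created with model_plain_ops loses its ops pointer
the first time it is signaled, while one created with model_release_ops
keeps it, which mirrors why implementing wait or release keeps a fence
dependent on its issuer.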


end of thread

Thread overview: 33+ messages
2026-02-10 10:01 Independence for dma_fences! v7 Christian König
2026-02-10 10:01 ` [PATCH 1/8] dma-buf: protected fence ops by RCU v5 Christian König
2026-02-11 10:06   ` Philipp Stanner
2026-02-11 15:43     ` Christian König
2026-02-12  8:56       ` Philipp Stanner
2026-02-19 10:23         ` Christian König
2026-02-19 10:35           ` Philipp Stanner
2026-02-19 12:49             ` Christian König
2026-02-12  9:03       ` Tvrtko Ursulin
2026-02-12  9:31   ` Tvrtko Ursulin
2026-02-13 14:20   ` Boris Brezillon
2026-02-10 10:01 ` [PATCH 2/8] dma-buf: detach fence ops on signal v2 Christian König
2026-02-13 14:22   ` Boris Brezillon
2026-02-19 12:52     ` Christian König
2026-02-19 15:49       ` Boris Brezillon
2026-02-10 10:01 ` [PATCH 3/8] dma-buf: abstract fence locking v2 Christian König
2026-02-12  9:07   ` Tvrtko Ursulin
2026-02-10 10:01 ` [PATCH 4/8] dma-buf: inline spinlock for fence protection v4 Christian König
2026-02-11  9:50   ` Philipp Stanner
2026-02-11 14:59     ` Christian König
2026-02-12  9:01       ` Philipp Stanner
2026-02-12  9:16   ` Tvrtko Ursulin
2026-02-13 14:27   ` Boris Brezillon
2026-02-15  8:48     ` Boris Brezillon
2026-02-16  7:33     ` Philipp Stanner
2026-02-16  9:48       ` Boris Brezillon
2026-02-10 10:02 ` [PATCH 5/8] dma-buf/selftests: test RCU ops and inline lock v2 Christian König
2026-02-10 10:02 ` [PATCH 6/8] dma-buf: use inline lock for the stub fence v2 Christian König
2026-02-13 14:32   ` Boris Brezillon
2026-02-10 10:02 ` [PATCH 7/8] dma-buf: use inline lock for the dma-fence-array Christian König
2026-02-13 14:33   ` Boris Brezillon
2026-02-10 10:02 ` [PATCH 8/8] dma-buf: use inline lock for the dma-fence-chain Christian König
2026-02-13 14:33   ` Boris Brezillon
