AMD-GFX Archive on lore.kernel.org
* Independence for dma_fences!
@ 2025-10-13 13:48 Christian König
  2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
                   ` (16 more replies)
  0 siblings, 17 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Hi everyone,

dma_fences have always lived under the tyranny of their issuer's module
lifetime, leading to crashes should anybody still hold a reference to a
dma_fence when the issuer's module is unloaded.

But those days are over! The patch set following this mail finally
implements a way for issuers to release their dma_fences from this
slavery so they can outlive the module which originally created them.

Previously, various approaches were discussed, including changing the
locking semantics of the dma_fence callbacks (by me) as well as using the
drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
from their actual users.

Changing the locking semantics turned out to be much trickier than
originally thought, because especially older drivers (nouveau, radeon,
but also i915) actually need the current locking semantics for correct
operation.

Using the drm_scheduler as an intermediate layer is still a good idea and
should probably be implemented to make life simpler for some drivers, but
it doesn't work for all use cases. Especially TLB flush fences, preemption
fences and userqueue fences don't go through the drm scheduler because it
doesn't make sense for them.

Tvrtko did some really nice prerequisite work by protecting the strings
returned by the dma_fence_ops with RCU. This way dma_fence creators were
able to just wait for an RCU grace period after fence signaling before it
was safe to free those data structures.

Now this patch set goes a step further and protects the whole
dma_fence_ops structure by RCU, so that after the fence signals the
pointer to the dma_fence_ops is set to NULL when neither a wait nor a
release callback is given. All functionality which uses the dma_fence_ops
reference is put inside an RCU critical section, except for the
deprecated issuer specific wait and of course the optional release
callback.
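The detach-on-signal idea can be sketched as a plain userspace C model.
All names here are illustrative stand-ins and not the kernel API; plain
pointer loads stand in for rcu_dereference(), and the real code brackets
each access with rcu_read_lock()/rcu_read_unlock():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for dma_fence_ops: all callbacks optional. */
struct toy_ops {
	bool (*signaled)(void);	/* optional peek callback */
	void (*release)(void);	/* optional; its presence pins the ops */
};

/* Toy stand-in for dma_fence. */
struct toy_fence {
	const struct toy_ops *ops;	/* __rcu in the real structure */
	bool signaled;
};

static const struct toy_ops no_cb_ops;	/* ops table with no callbacks */

/* On signal, drop the ops reference when no release callback exists,
 * so the fence no longer depends on its issuer. */
static void toy_signal(struct toy_fence *f)
{
	f->signaled = true;
	if (f->ops && !f->ops->release)
		f->ops = NULL;	/* RCU_INIT_POINTER(fence->ops, NULL) */
}

/* Every reader must now tolerate a NULL ops pointer. */
static bool toy_is_signaled(struct toy_fence *f)
{
	const struct toy_ops *ops = f->ops;	/* rcu_dereference() */

	if (f->signaled)
		return true;
	if (ops && ops->signaled && ops->signaled()) {
		f->signaled = true;
		return true;
	}
	return false;
}
```

After this point the issuer only has to wait one RCU grace period before
its ops table (and module text) can safely go away.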

In addition to the RCU changes: the lock protecting the dma_fence state
previously had to be allocated externally. This set makes that external
lock optional and allows dma_fences to use an inline lock and be
self-contained.
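The optional inline lock can be modelled in userspace C as well, with a
pthread mutex standing in for the spinlock and a flag bit selecting which
lock is in use. The names are illustrative, not the actual dma_fence API:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_INLINE_LOCK 0x1u

struct toy_locked_fence {
	unsigned flags;
	union {
		pthread_mutex_t *extern_lock;	/* classic, caller-provided */
		pthread_mutex_t inline_lock;	/* self-contained variant */
	};
};

/* Pick the right lock, mirroring what a dma_fence_spinlock() style
 * helper would do based on the flag bit. */
static pthread_mutex_t *toy_lock_of(struct toy_locked_fence *f)
{
	return (f->flags & TOY_INLINE_LOCK) ? &f->inline_lock
					    : f->extern_lock;
}

/* Passing a NULL lock selects the embedded, self-contained lock. */
static void toy_locked_fence_init(struct toy_locked_fence *f,
				  pthread_mutex_t *lock)
{
	if (lock) {
		f->extern_lock = lock;
		f->flags = 0;
	} else {
		pthread_mutex_init(&f->inline_lock, NULL);
		f->flags = TOY_INLINE_LOCK;
	}
}
```

All lock/unlock sites then go through the helper instead of touching the
lock pointer directly, which is exactly what the dma_fence_lock()/
dma_fence_unlock() conversions later in this series do.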

The new approach is then applied to amdgpu, allowing the module to be
unloaded even while dma_fences issued by it are still around.

Please review and comment,
Christian.



* [PATCH 01/15] dma-buf: cleanup dma_fence_describe
  2025-10-13 13:48 Independence for dma_fences! Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-14 14:37   ` Tvrtko Ursulin
  2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

The driver and timeline name are meaningless for signaled fences.

Drop them and also print the context number.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3f78c56b58dc..f0539c73ed57 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1001,17 +1001,18 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
 {
 	const char __rcu *timeline;
 	const char __rcu *driver;
+	const char *signaled = "un";
 
 	rcu_read_lock();
 
 	timeline = dma_fence_timeline_name(fence);
 	driver = dma_fence_driver_name(fence);
 
-	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
-		   rcu_dereference(driver),
-		   rcu_dereference(timeline),
-		   fence->seqno,
-		   dma_fence_is_signaled(fence) ? "" : "un");
+	if (dma_fence_is_signaled(fence))
+		timeline = driver = signaled = "";
+
+	seq_printf(seq, "%llu %s %s seq %llu %ssignalled\n", fence->context,
+		   timeline, driver, fence->seqno, signaled);
 
 	rcu_read_unlock();
 }
-- 
2.43.0



* [PATCH 02/15] dma-buf: rework stub fence initialisation
  2025-10-13 13:48 Independence for dma_fences! Christian König
  2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-14 15:03   ` Tvrtko Ursulin
  2025-10-24  7:29   ` Tvrtko Ursulin
  2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protect fence ops by RCU Christian König
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Instead of doing this on the first call to the function, initialize the
stub fence during kernel boot.

This has the clear advantage of lower overhead and also no longer relies
on the ops pointer being non-NULL.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f0539c73ed57..51ee13d005bc 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
 	.get_timeline_name = dma_fence_stub_get_name,
 };
 
+static int __init dma_fence_init_stub(void)
+{
+	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
+		       &dma_fence_stub_lock, 0, 0);
+
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+		&dma_fence_stub.flags);
+
+	dma_fence_signal_locked(&dma_fence_stub);
+	return 0;
+}
+subsys_initcall(dma_fence_init_stub);
+
 /**
  * dma_fence_get_stub - return a signaled fence
  *
- * Return a stub fence which is already signaled. The fence's
- * timestamp corresponds to the first time after boot this
- * function is called.
+ * Return a stub fence which is already signaled. The fence's timestamp
+ * corresponds to the initialisation time of the Linux kernel.
  */
 struct dma_fence *dma_fence_get_stub(void)
 {
-	spin_lock(&dma_fence_stub_lock);
-	if (!dma_fence_stub.ops) {
-		dma_fence_init(&dma_fence_stub,
-			       &dma_fence_stub_ops,
-			       &dma_fence_stub_lock,
-			       0, 0);
-
-		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-			&dma_fence_stub.flags);
-
-		dma_fence_signal_locked(&dma_fence_stub);
-	}
-	spin_unlock(&dma_fence_stub_lock);
-
 	return dma_fence_get(&dma_fence_stub);
 }
 EXPORT_SYMBOL(dma_fence_get_stub);
-- 
2.43.0



* [PATCH 03/15] dma-buf: protect fence ops by RCU
  2025-10-13 13:48 Independence for dma_fences! Christian König
  2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
  2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-16 18:04   ` Tvrtko Ursulin
  2025-10-31 10:35   ` Tvrtko Ursulin
  2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

At first glance it is counterintuitive to protect a constant function
pointer table by RCU, but this allows modules providing the function
table to unload by waiting for an RCU grace period.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 65 +++++++++++++++++++++++++++----------
 include/linux/dma-fence.h   | 18 ++++++++--
 2 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 51ee13d005bc..982f2b2a62c0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
 signed long
 dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 {
+	const struct dma_fence_ops *ops;
 	signed long ret;
 
 	if (WARN_ON(timeout < 0))
@@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 
 	dma_fence_enable_sw_signaling(fence);
 
-	if (trace_dma_fence_wait_start_enabled()) {
-		rcu_read_lock();
-		trace_dma_fence_wait_start(fence);
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	trace_dma_fence_wait_start(fence);
+	if (ops->wait) {
+		/*
+		 * Implementing the wait ops is deprecated and not supported for
+		 * issuer independent fences, so it is ok to use the ops outside
+		 * the RCU protected section.
+		 */
+		rcu_read_unlock();
+		ret = ops->wait(fence, intr, timeout);
+	} else {
 		rcu_read_unlock();
-	}
-	if (fence->ops->wait)
-		ret = fence->ops->wait(fence, intr, timeout);
-	else
 		ret = dma_fence_default_wait(fence, intr, timeout);
+	}
 	if (trace_dma_fence_wait_end_enabled()) {
 		rcu_read_lock();
 		trace_dma_fence_wait_end(fence);
@@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
 {
 	struct dma_fence *fence =
 		container_of(kref, struct dma_fence, refcount);
+	const struct dma_fence_ops *ops;
 
 	rcu_read_lock();
 	trace_dma_fence_destroy(fence);
@@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
 		spin_unlock_irqrestore(fence->lock, flags);
 	}
 
-	rcu_read_unlock();
-
-	if (fence->ops->release)
-		fence->ops->release(fence);
+	ops = rcu_dereference(fence->ops);
+	if (ops->release)
+		ops->release(fence);
 	else
 		dma_fence_free(fence);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(dma_fence_release);
 
@@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
 
 static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
 	bool was_set;
 
 	lockdep_assert_held(fence->lock);
@@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return false;
 
-	if (!was_set && fence->ops->enable_signaling) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (!was_set && ops->enable_signaling) {
 		trace_dma_fence_enable_signal(fence);
 
-		if (!fence->ops->enable_signaling(fence)) {
+		if (!ops->enable_signaling(fence)) {
+			rcu_read_unlock();
 			dma_fence_signal_locked(fence);
 			return false;
 		}
 	}
+	rcu_read_unlock();
 
 	return true;
 }
@@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
  */
 void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 {
-	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
-		fence->ops->set_deadline(fence, deadline);
+	const struct dma_fence_ops *ops;
+
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->set_deadline && !dma_fence_is_signaled(fence))
+		ops->set_deadline(fence, deadline);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(dma_fence_set_deadline);
 
@@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
 
 	kref_init(&fence->refcount);
-	fence->ops = ops;
+	/*
+	 * At first glance it is counter intuitive to protect a constant
+	 * function pointer table by RCU, but this allows modules providing the
+	 * function table to unload by waiting for an RCU grace period.
+	 */
+	RCU_INIT_POINTER(fence->ops, ops);
 	INIT_LIST_HEAD(&fence->cb_list);
 	fence->lock = lock;
 	fence->context = context;
@@ -1104,11 +1127,14 @@ EXPORT_SYMBOL(dma_fence_init64);
  */
 const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
 			 "RCU protection is required for safe access to returned string");
 
+	ops = rcu_dereference(fence->ops);
 	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
-		return fence->ops->get_driver_name(fence);
+		return ops->get_driver_name(fence);
 	else
 		return "detached-driver";
 }
@@ -1136,11 +1162,14 @@ EXPORT_SYMBOL(dma_fence_driver_name);
  */
 const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
 			 "RCU protection is required for safe access to returned string");
 
+	ops = rcu_dereference(fence->ops);
 	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
-	return fence->ops->get_timeline_name(fence);
+	return ops->get_timeline_name(fence);
 	else
 		return "signaled-timeline";
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 64639e104110..38421a0c7c5b 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -66,7 +66,7 @@ struct seq_file;
  */
 struct dma_fence {
 	spinlock_t *lock;
-	const struct dma_fence_ops *ops;
+	const struct dma_fence_ops __rcu *ops;
 	/*
 	 * We clear the callback list on kref_put so that by the time we
 	 * release the fence it is unused. No one should be adding to the
@@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
 static inline bool
 dma_fence_is_signaled_locked(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
-	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->signaled && ops->signaled(fence)) {
+		rcu_read_unlock();
 		dma_fence_signal_locked(fence);
 		return true;
 	}
+	rcu_read_unlock();
 
 	return false;
 }
@@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 static inline bool
 dma_fence_is_signaled(struct dma_fence *fence)
 {
+	const struct dma_fence_ops *ops;
+
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
-	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+	rcu_read_lock();
+	ops = rcu_dereference(fence->ops);
+	if (ops->signaled && ops->signaled(fence)) {
+		rcu_read_unlock();
 		dma_fence_signal(fence);
 		return true;
 	}
+	rcu_read_unlock();
 
 	return false;
 }
-- 
2.43.0



* [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (2 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protect fence ops by RCU Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-16  8:56   ` Tvrtko Ursulin
  2025-10-17  9:14   ` Philipp Stanner
  2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

When neither a release nor a wait operation is specified, it is possible
to let the dma_fence live on independent of the module which issued it.

This makes it possible to unload drivers by merely waiting for all their
fences to signal.
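The resulting teardown ordering can be sketched as a minimal userspace
model (illustrative names only; free() stands in for unloading the module
once its signaled fences have dropped their ops reference, and the real
code additionally needs an RCU grace period in between):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mod_ops { int dummy; };	/* stand-in for the issuer's ops table */

struct mod_fence {
	const struct mod_ops *ops;	/* NULLed on signal, as in the patch */
	bool signaled;
};

/* Signalling detaches the fence from its issuer's ops table. */
static void mod_signal(struct mod_fence *f)
{
	f->signaled = true;
	f->ops = NULL;	/* in the kernel: RCU_INIT_POINTER + grace period */
}

/* "Module unload": the ops table can be freed because outstanding but
 * signaled fences no longer reference it. */
static void mod_unload(struct mod_ops *ops)
{
	free(ops);
}
```

The fence object itself stays valid after mod_unload(); only its link to
the issuer is gone, which is exactly the independence this series is
after.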

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
 include/linux/dma-fence.h   |  4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 982f2b2a62c0..39f73edf3a33 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 				      &fence->flags)))
 		return -EINVAL;
 
+	/*
+	 * When neither a release nor a wait operation is specified set the ops
+	 * pointer to NULL to allow the fence structure to become independent
+	 * of the module which originally issued it.
+	 */
+	if (!fence->ops->release && !fence->ops->wait)
+		RCU_INIT_POINTER(fence->ops, NULL);
+
 	/* Stash the cb_list before replacing it with the timestamp */
 	list_replace(&fence->cb_list, &cb_list);
 
@@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
 	trace_dma_fence_wait_start(fence);
-	if (ops->wait) {
+	if (ops && ops->wait) {
 		/*
 		 * Implementing the wait ops is deprecated and not supported for
 		 * issuer independent fences, so it is ok to use the ops outside
@@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
 	}
 
 	ops = rcu_dereference(fence->ops);
-	if (ops->release)
+	if (ops && ops->release)
 		ops->release(fence);
 	else
 		dma_fence_free(fence);
@@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (!was_set && ops->enable_signaling) {
+	if (!was_set && ops && ops->enable_signaling) {
 		trace_dma_fence_enable_signal(fence);
 
 		if (!ops->enable_signaling(fence)) {
@@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->set_deadline && !dma_fence_is_signaled(fence))
+	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
 		ops->set_deadline(fence, deadline);
 	rcu_read_unlock();
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 38421a0c7c5b..e1ba1d53de88 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->signaled && ops->signaled(fence)) {
+	if (ops && ops->signaled && ops->signaled(fence)) {
 		rcu_read_unlock();
 		dma_fence_signal_locked(fence);
 		return true;
@@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
 
 	rcu_read_lock();
 	ops = rcu_dereference(fence->ops);
-	if (ops->signaled && ops->signaled(fence)) {
+	if (ops && ops->signaled && ops->signaled(fence)) {
 		rcu_read_unlock();
 		dma_fence_signal(fence);
 		return true;
-- 
2.43.0



* [PATCH 05/15] dma-buf: inline spinlock for fence protection
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (3 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-16  9:26   ` Tvrtko Ursulin
  2025-10-23 18:09   ` Matthew Brost
  2025-10-13 13:48 ` [PATCH 06/15] dma-buf: use inline lock for the stub fence Christian König
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Allow implementations to omit the spinlock protecting the fence's
internal state; in that case a spinlock embedded in the fence structure
itself is used instead.

Apart from simplifying the handling for containers and the stub fence,
this has the advantage of allowing implementations to issue fences
without caring about their spinlock's lifetime.

That in turn is necessary for independent fences which outlive the module
which originally issued them.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c              | 54 ++++++++++++------------
 drivers/dma-buf/sw_sync.c                | 14 +++---
 drivers/dma-buf/sync_debug.h             |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 +++---
 drivers/gpu/drm/drm_crtc.c               |  2 +-
 drivers/gpu/drm/drm_writeback.c          |  2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c    |  5 ++-
 drivers/gpu/drm/nouveau/nouveau_fence.c  |  3 +-
 drivers/gpu/drm/qxl/qxl_release.c        |  3 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c    |  3 +-
 drivers/gpu/drm/xe/xe_hw_fence.c         |  3 +-
 drivers/gpu/drm/xe/xe_sched_job.c        |  4 +-
 include/linux/dma-fence.h                | 42 +++++++++++++++++-
 15 files changed, 99 insertions(+), 58 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 39f73edf3a33..a0b328fdd90d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
 }
 #endif
 
-
 /**
  * dma_fence_signal_timestamp_locked - signal completion of a fence
  * @fence: the fence to signal
@@ -368,7 +367,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 	struct dma_fence_cb *cur, *tmp;
 	struct list_head cb_list;
 
-	lockdep_assert_held(fence->lock);
+	lockdep_assert_held(dma_fence_spinlock(fence));
 
 	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
 				      &fence->flags)))
@@ -421,9 +420,9 @@ int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
 	if (WARN_ON(!fence))
 		return -EINVAL;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	ret = dma_fence_signal_timestamp_locked(fence, timestamp);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	return ret;
 }
@@ -475,9 +474,9 @@ int dma_fence_signal(struct dma_fence *fence)
 
 	tmp = dma_fence_begin_signalling();
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	dma_fence_end_signalling(tmp);
 
@@ -579,10 +578,10 @@ void dma_fence_release(struct kref *kref)
 		 * don't leave chains dangling. We set the error flag first
 		 * so that the callbacks know this signal is due to an error.
 		 */
-		spin_lock_irqsave(fence->lock, flags);
+		dma_fence_lock(fence, flags);
 		fence->error = -EDEADLK;
 		dma_fence_signal_locked(fence);
-		spin_unlock_irqrestore(fence->lock, flags);
+		dma_fence_unlock(fence, flags);
 	}
 
 	ops = rcu_dereference(fence->ops);
@@ -612,7 +611,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
 	const struct dma_fence_ops *ops;
 	bool was_set;
 
-	lockdep_assert_held(fence->lock);
+	lockdep_assert_held(dma_fence_spinlock(fence));
 
 	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
 				   &fence->flags);
@@ -648,9 +647,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	__dma_fence_enable_signaling(fence);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 }
 EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
 
@@ -690,8 +689,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 		return -ENOENT;
 	}
 
-	spin_lock_irqsave(fence->lock, flags);
-
+	dma_fence_lock(fence, flags);
 	if (__dma_fence_enable_signaling(fence)) {
 		cb->func = func;
 		list_add_tail(&cb->node, &fence->cb_list);
@@ -699,8 +697,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 		INIT_LIST_HEAD(&cb->node);
 		ret = -ENOENT;
 	}
-
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	return ret;
 }
@@ -723,9 +720,9 @@ int dma_fence_get_status(struct dma_fence *fence)
 	unsigned long flags;
 	int status;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	status = dma_fence_get_status_locked(fence);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	return status;
 }
@@ -755,13 +752,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
 	unsigned long flags;
 	bool ret;
 
-	spin_lock_irqsave(fence->lock, flags);
-
+	dma_fence_lock(fence, flags);
 	ret = !list_empty(&cb->node);
 	if (ret)
 		list_del_init(&cb->node);
-
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	return ret;
 }
@@ -800,8 +795,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 	unsigned long flags;
 	signed long ret = timeout ? timeout : 1;
 
-	spin_lock_irqsave(fence->lock, flags);
-
+	dma_fence_lock(fence, flags);
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		goto out;
 
@@ -824,11 +818,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 			__set_current_state(TASK_INTERRUPTIBLE);
 		else
 			__set_current_state(TASK_UNINTERRUPTIBLE);
-		spin_unlock_irqrestore(fence->lock, flags);
+		dma_fence_unlock(fence, flags);
 
 		ret = schedule_timeout(ret);
 
-		spin_lock_irqsave(fence->lock, flags);
+		dma_fence_lock(fence, flags);
 		if (ret > 0 && intr && signal_pending(current))
 			ret = -ERESTARTSYS;
 	}
@@ -838,7 +832,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 	__set_current_state(TASK_RUNNING);
 
 out:
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 	return ret;
 }
 EXPORT_SYMBOL(dma_fence_default_wait);
@@ -1046,7 +1040,6 @@ static void
 __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
 {
-	BUG_ON(!lock);
 	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
 
 	kref_init(&fence->refcount);
@@ -1057,10 +1050,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	 */
 	RCU_INIT_POINTER(fence->ops, ops);
 	INIT_LIST_HEAD(&fence->cb_list);
-	fence->lock = lock;
 	fence->context = context;
 	fence->seqno = seqno;
 	fence->flags = flags;
+	if (lock) {
+		fence->extern_lock = lock;
+	} else {
+		spin_lock_init(&fence->inline_lock);
+		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
+	}
 	fence->error = 0;
 
 	trace_dma_fence_init(fence);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 3c20f1d31cf5..8f48529214a4 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -155,12 +155,12 @@ static void timeline_fence_release(struct dma_fence *fence)
 	struct sync_timeline *parent = dma_fence_parent(fence);
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	if (!list_empty(&pt->link)) {
 		list_del(&pt->link);
 		rb_erase(&pt->node, &parent->pt_tree);
 	}
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	sync_timeline_put(parent);
 	dma_fence_free(fence);
@@ -178,7 +178,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
 	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
 	unsigned long flags;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
 		if (ktime_before(deadline, pt->deadline))
 			pt->deadline = deadline;
@@ -186,7 +186,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
 		pt->deadline = deadline;
 		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
 	}
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 }
 
 static const struct dma_fence_ops timeline_fence_ops = {
@@ -427,13 +427,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
 		goto put_fence;
 	}
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
 		ret = -ENOENT;
 		goto unlock;
 	}
 	data.deadline_ns = ktime_to_ns(pt->deadline);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	dma_fence_put(fence);
 
@@ -446,7 +446,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
 	return 0;
 
 unlock:
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 put_fence:
 	dma_fence_put(fence);
 
diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
index 02af347293d0..c49324505b20 100644
--- a/drivers/dma-buf/sync_debug.h
+++ b/drivers/dma-buf/sync_debug.h
@@ -47,7 +47,7 @@ struct sync_timeline {
 
 static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
 {
-	return container_of(fence->lock, struct sync_timeline, lock);
+	return container_of(fence->extern_lock, struct sync_timeline, lock);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 5ec5c3ff22bb..fcc7a3fb93b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -468,10 +468,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;
 
-	spin_lock_irqsave(fence->lock, flags);
+	dma_fence_lock(fence, flags);
 	if (!dma_fence_is_signaled_locked(fence))
 		dma_fence_set_error(fence, -ENODATA);
-	spin_unlock_irqrestore(fence->lock, flags);
+	dma_fence_unlock(fence, flags);
 
 	while (!dma_fence_is_signaled(fence) &&
 	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index db66b4232de0..db6516ce8335 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2774,8 +2774,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	dma_fence_put(vm->last_unlocked);
 	dma_fence_wait(vm->last_tlb_flush, false);
 	/* Make sure that all fence callbacks have completed */
-	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
-	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
+	dma_fence_lock(vm->last_tlb_flush, flags);
+	dma_fence_unlock(vm->last_tlb_flush, flags);
 	dma_fence_put(vm->last_tlb_flush);
 
 	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 77207f4e448e..4fc7f66b7d13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -631,20 +631,20 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo);
  */
 static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
 {
+	struct dma_fence *fence;
 	unsigned long flags;
-	spinlock_t *lock;
 
 	/*
 	 * Workaround to stop racing between the fence signaling and handling
-	 * the cb. The lock is static after initially setting it up, just make
-	 * sure that the dma_fence structure isn't freed up.
+	 * the cb.
 	 */
 	rcu_read_lock();
-	lock = vm->last_tlb_flush->lock;
+	fence = dma_fence_get_rcu(vm->last_tlb_flush);
 	rcu_read_unlock();
 
-	spin_lock_irqsave(lock, flags);
-	spin_unlock_irqrestore(lock, flags);
+	dma_fence_lock(fence, flags);
+	dma_fence_unlock(fence, flags);
+	dma_fence_put(fence);
 
 	return atomic64_read(&vm->tlb_seq);
 }
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 46655339003d..ad47f58cd159 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
 static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
 {
 	BUG_ON(fence->ops != &drm_crtc_fence_ops);
-	return container_of(fence->lock, struct drm_crtc, fence_lock);
+	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
 }
 
 static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
index 95b8a2e4bda6..624a4e8b6c99 100644
--- a/drivers/gpu/drm/drm_writeback.c
+++ b/drivers/gpu/drm/drm_writeback.c
@@ -81,7 +81,7 @@
  *	From userspace, this property will always read as zero.
  */
 
-#define fence_to_wb_connector(x) container_of(x->lock, \
+#define fence_to_wb_connector(x) container_of(x->extern_lock, \
 					      struct drm_writeback_connector, \
 					      fence_lock)
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 1527b801f013..2956ed2ec073 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
 static inline bool
 nouveau_cli_work_ready(struct dma_fence *fence)
 {
+	unsigned long flags;
 	bool ret = true;
 
-	spin_lock_irq(fence->lock);
+	dma_fence_lock(fence, flags);
 	if (!dma_fence_is_signaled_locked(fence))
 		ret = false;
-	spin_unlock_irq(fence->lock);
+	dma_fence_unlock(fence, flags);
 
 	if (ret == true)
 		dma_fence_put(fence);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index d5654e26d5bc..272b492c4d7c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -47,7 +47,8 @@ from_fence(struct dma_fence *fence)
 static inline struct nouveau_fence_chan *
 nouveau_fctx(struct nouveau_fence *fence)
 {
-	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
+	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
+			    lock);
 }
 
 static bool
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 05204a6a3fa8..1d346822c1f7 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -60,7 +60,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 	struct qxl_device *qdev;
 	unsigned long cur, end = jiffies + timeout;
 
-	qdev = container_of(fence->lock, struct qxl_device, release_lock);
+	qdev = container_of(fence->extern_lock, struct qxl_device,
+			    release_lock);
 
 	if (!wait_event_timeout(qdev->release_event,
 				(dma_fence_is_signaled(fence) ||
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index c2294abbe753..346761172c1b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -47,7 +47,8 @@ struct vmw_event_fence_action {
 static struct vmw_fence_manager *
 fman_from_fence(struct vmw_fence_obj *fence)
 {
-	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
+			    lock);
 }
 
 static void vmw_fence_obj_destroy(struct dma_fence *f)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index b2a0c46dfcd4..3456bec93c70 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -144,7 +144,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
 
 static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
 {
-	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
+	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
+			    lock);
 }
 
 static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index d21bf8f26964..ea7038475b4b 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -187,11 +187,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
 	unsigned long irq_flags;
 	bool signaled;
 
-	spin_lock_irqsave(fence->lock, irq_flags);
+	dma_fence_lock(fence, irq_flags);
 	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
 	if (!signaled)
 		dma_fence_set_error(fence, error);
-	spin_unlock_irqrestore(fence->lock, irq_flags);
+	dma_fence_unlock(fence, irq_flags);
 
 	return signaled;
 }
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index e1ba1d53de88..fb416f500664 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -34,7 +34,8 @@ struct seq_file;
  * @ops: dma_fence_ops associated with this fence
  * @rcu: used for releasing fence with kfree_rcu
  * @cb_list: list of all callbacks to call
- * @lock: spin_lock_irqsave used for locking
+ * @extern_lock: external spin_lock_irqsave used for locking
+ * @inline_lock: alternative internal spin_lock_irqsave used for locking
  * @context: execution context this fence belongs to, returned by
  *           dma_fence_context_alloc()
  * @seqno: the sequence number of this fence inside the execution context,
@@ -48,6 +49,7 @@ struct seq_file;
  * atomic ops (bit_*), so taking the spinlock will not be needed most
  * of the time.
  *
+ * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
  * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
  * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
  * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
@@ -65,7 +67,10 @@ struct seq_file;
  * been completed, or never called at all.
  */
 struct dma_fence {
-	spinlock_t *lock;
+	union {
+		spinlock_t *extern_lock;
+		spinlock_t inline_lock;
+	};
 	const struct dma_fence_ops __rcu *ops;
 	/*
 	 * We clear the callback list on kref_put so that by the time we
@@ -98,6 +103,7 @@ struct dma_fence {
 };
 
 enum dma_fence_flag_bits {
+	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
 	DMA_FENCE_FLAG_SEQNO64_BIT,
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
@@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
 	} while (1);
 }
 
+/**
+ * dma_fence_spinlock - return pointer to the spinlock protecting the fence
+ * @fence: the fence to get the lock from
+ *
+ * Return a pointer to either the embedded or the external spinlock.
+ */
+static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
+{
+	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
+		&fence->inline_lock : fence->extern_lock;
+}
+
+/**
+ * dma_fence_lock - irqsave lock the fence
+ * @fence: the fence to lock
+ * @flags: where to store the CPU flags.
+ *
+ * Lock the fence, preventing it from changing to the signaled state.
+ */
+#define dma_fence_lock(fence, flags)	\
+	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
+
+/**
+ * dma_fence_unlock - unlock the fence and irqrestore
+ * @fence: the fence to unlock
+ * @flags: the CPU flags to restore
+ *
+ * Unlock the fence, allowing it to change its state to signaled again.
+ */
+#define dma_fence_unlock(fence, flags)	\
+	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
+
 #ifdef CONFIG_LOCKDEP
 bool dma_fence_begin_signalling(void);
 void dma_fence_end_signalling(bool cookie);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/15] dma-buf: use inline lock for the stub fence
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (4 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 07/15] dma-buf: use inline lock for the dma-fence-array Christian König
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Just as proof of concept and minor cleanup.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index a0b328fdd90d..a41ff8e8cd25 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -24,7 +24,6 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
 
-static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
 
 /*
@@ -123,12 +122,8 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
 
 static int __init dma_fence_init_stub(void)
 {
-	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
-		       &dma_fence_stub_lock, 0, 0);
-
-	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-		&dma_fence_stub.flags);
-
+	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, NULL, 0, 0);
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &dma_fence_stub.flags);
 	dma_fence_signal_locked(&dma_fence_stub);
 	return 0;
 }
@@ -160,16 +155,9 @@ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp)
 	if (fence == NULL)
 		return NULL;
 
-	dma_fence_init(fence,
-		       &dma_fence_stub_ops,
-		       &dma_fence_stub_lock,
-		       0, 0);
-
-	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-		&fence->flags);
-
+	dma_fence_init(fence, &dma_fence_stub_ops, NULL, 0, 0);
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
 	dma_fence_signal_timestamp(fence, timestamp);
-
 	return fence;
 }
 EXPORT_SYMBOL(dma_fence_allocate_private_stub);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/15] dma-buf: use inline lock for the dma-fence-array
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (5 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 06/15] dma-buf: use inline lock for the stub fence Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 08/15] dma-buf: use inline lock for the dma-fence-chain Christian König
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Just as proof of concept and minor cleanup.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-array.c | 5 ++---
 include/linux/dma-fence-array.h   | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 6657d4b30af9..c2119a8049fe 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -204,9 +204,8 @@ void dma_fence_array_init(struct dma_fence_array *array,
 
 	array->num_fences = num_fences;
 
-	spin_lock_init(&array->lock);
-	dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
-		       context, seqno);
+	dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
+		       seqno);
 	init_irq_work(&array->work, irq_dma_fence_array_work);
 
 	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 079b3dec0a16..370b3d2bba37 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -38,7 +38,6 @@ struct dma_fence_array_cb {
 struct dma_fence_array {
 	struct dma_fence base;
 
-	spinlock_t lock;
 	unsigned num_fences;
 	atomic_t num_pending;
 	struct dma_fence **fences;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/15] dma-buf: use inline lock for the dma-fence-chain
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (6 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 07/15] dma-buf: use inline lock for the dma-fence-array Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 09/15] drm/sched: use inline locks for the drm-sched-fence Christian König
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Just as proof of concept and minor cleanup.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c | 3 +--
 include/linux/dma-fence-chain.h   | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a8a90acf4f34..a707792b6025 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -245,7 +245,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 	struct dma_fence_chain *prev_chain = to_dma_fence_chain(prev);
 	uint64_t context;
 
-	spin_lock_init(&chain->lock);
 	rcu_assign_pointer(chain->prev, prev);
 	chain->fence = fence;
 	chain->prev_seqno = 0;
@@ -261,7 +260,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 			seqno = max(prev->seqno, seqno);
 	}
 
-	dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
+	dma_fence_init64(&chain->base, &dma_fence_chain_ops, NULL,
 			 context, seqno);
 
 	/*
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 68c3c1e41014..d39ce7a2e599 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -46,7 +46,6 @@ struct dma_fence_chain {
 		 */
 		struct irq_work work;
 	};
-	spinlock_t lock;
 };
 
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/15] drm/sched: use inline locks for the drm-sched-fence
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (7 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 08/15] dma-buf: use inline lock for the dma-fence-chain Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 10/15] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Just as proof of concept and minor cleanup.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
 include/drm/gpu_scheduler.h             |  4 ----
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 9391d6f0dc01..7a94e03341cb 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
 	struct dma_fence *parent;
 	unsigned long flags;
 
-	spin_lock_irqsave(&fence->lock, flags);
+	dma_fence_lock(f, flags);
 
 	/* If we already have an earlier deadline, keep it: */
 	if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
 	    ktime_before(fence->deadline, deadline)) {
-		spin_unlock_irqrestore(&fence->lock, flags);
+		dma_fence_unlock(f, flags);
 		return;
 	}
 
 	fence->deadline = deadline;
 	set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
 
-	spin_unlock_irqrestore(&fence->lock, flags);
+	dma_fence_unlock(f, flags);
 
 	/*
 	 * smp_load_aquire() to ensure that if we are racing another
@@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
 
 	fence->owner = owner;
 	fence->drm_client_id = drm_client_id;
-	spin_lock_init(&fence->lock);
 
 	return fence;
 }
@@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 	fence->sched = entity->rq->sched;
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
-		       &fence->lock, entity->fence_context, seq);
+		       NULL, entity->fence_context, seq);
 	dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
-		       &fence->lock, entity->fence_context + 1, seq);
+		       NULL, entity->fence_context + 1, seq);
 }
 
 module_init(drm_sched_fence_slab_init);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e62a7214e052..4478164ea174 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -297,10 +297,6 @@ struct drm_sched_fence {
          * belongs to.
          */
 	struct drm_gpu_scheduler	*sched;
-        /**
-         * @lock: the lock used by the scheduled and the finished fences.
-         */
-	spinlock_t			lock;
         /**
          * @owner: job owner for debugging
          */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/15] drm/amdgpu: fix KFD eviction fence enable_signaling path
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (8 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 09/15] drm/sched: use inline locks for the drm-sched-fence Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 11/15] drm/amdgpu: independence for the amdgpu_fence! Christian König
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

Calling dma_fence_is_signaled() here is illegal!

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 1ef758ac5076..09c919f72b6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
 {
 	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
 
-	if (!fence)
-		return false;
-
-	if (dma_fence_is_signaled(f))
-		return true;
-
 	if (!fence->svm_bo) {
 		if (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
 			return true;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 11/15] drm/amdgpu: independence for the amdgpu_fence!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (9 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 10/15] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 12/15] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

This should allow amdgpu_fences to outlive the amdgpu module.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 63 +++++++----------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  1 -
 2 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 1fe31d2f2706..413f65239ebd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -112,8 +112,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
 	af->ring = ring;
 
 	seq = ++ring->fence_drv.sync_seq;
-	dma_fence_init(fence, &amdgpu_fence_ops,
-		       &ring->fence_drv.lock,
+	dma_fence_init(fence, &amdgpu_fence_ops, NULL,
 		       adev->fence_context + ring->idx, seq);
 
 	amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
@@ -468,7 +467,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring)
 	timer_setup(&ring->fence_drv.fallback_timer, amdgpu_fence_fallback, 0);
 
 	ring->fence_drv.num_fences_mask = ring->num_hw_submission * 2 - 1;
-	spin_lock_init(&ring->fence_drv.lock);
 	ring->fence_drv.fences = kcalloc(ring->num_hw_submission * 2, sizeof(void *),
 					 GFP_KERNEL);
 
@@ -655,16 +653,21 @@ void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error)
 	struct amdgpu_fence_driver *drv = &ring->fence_drv;
 	unsigned long flags;
 
-	spin_lock_irqsave(&drv->lock, flags);
+	rcu_read_lock();
 	for (unsigned int i = 0; i <= drv->num_fences_mask; ++i) {
 		struct dma_fence *fence;
 
-		fence = rcu_dereference_protected(drv->fences[i],
-						  lockdep_is_held(&drv->lock));
-		if (fence && !dma_fence_is_signaled_locked(fence))
+		fence = dma_fence_get_rcu(drv->fences[i]);
+		if (!fence)
+			continue;
+
+		dma_fence_lock(fence, flags);
+		if (!dma_fence_is_signaled_locked(fence))
 			dma_fence_set_error(fence, error);
+		dma_fence_unlock(fence, flags);
+		dma_fence_put(fence);
 	}
-	spin_unlock_irqrestore(&drv->lock, flags);
+	rcu_read_unlock();
 }
 
 /**
@@ -715,16 +717,19 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
 	seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
 
 	/* mark all fences from the guilty context with an error */
-	spin_lock_irqsave(&ring->fence_drv.lock, flags);
+	rcu_read_lock();
 	do {
 		last_seq++;
 		last_seq &= ring->fence_drv.num_fences_mask;
 
 		ptr = &ring->fence_drv.fences[last_seq];
-		rcu_read_lock();
-		unprocessed = rcu_dereference(*ptr);
+		unprocessed = dma_fence_get_rcu(*ptr);
+
+		if (!unprocessed)
+			continue;
 
-		if (unprocessed && !dma_fence_is_signaled_locked(unprocessed)) {
+		dma_fence_lock(unprocessed, flags);
+		if (!dma_fence_is_signaled_locked(unprocessed)) {
 			fence = container_of(unprocessed, struct amdgpu_fence, base);
 
 			if (fence == af)
@@ -732,9 +737,10 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
 			else if (fence->context == af->context)
 				dma_fence_set_error(&fence->base, -ECANCELED);
 		}
-		rcu_read_unlock();
+		dma_fence_unlock(unprocessed, flags);
+		dma_fence_put(unprocessed);
 	} while (last_seq != seq);
-	spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
+	rcu_read_unlock();
 	/* signal the guilty fence */
 	amdgpu_fence_write(ring, (u32)af->base.seqno);
 	amdgpu_fence_process(ring);
@@ -824,39 +830,10 @@ static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
 	return true;
 }
 
-/**
- * amdgpu_fence_free - free up the fence memory
- *
- * @rcu: RCU callback head
- *
- * Free up the fence memory after the RCU grace period.
- */
-static void amdgpu_fence_free(struct rcu_head *rcu)
-{
-	struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
-
-	/* free fence_slab if it's separated fence*/
-	kfree(to_amdgpu_fence(f));
-}
-
-/**
- * amdgpu_fence_release - callback that fence can be freed
- *
- * @f: fence
- *
- * This function is called when the reference count becomes zero.
- * It just RCU schedules freeing up the fence.
- */
-static void amdgpu_fence_release(struct dma_fence *f)
-{
-	call_rcu(&f->rcu, amdgpu_fence_free);
-}
-
 static const struct dma_fence_ops amdgpu_fence_ops = {
 	.get_driver_name = amdgpu_fence_get_driver_name,
 	.get_timeline_name = amdgpu_fence_get_timeline_name,
 	.enable_signaling = amdgpu_fence_enable_signaling,
-	.release = amdgpu_fence_release,
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 87b962df5460..cab59a29b7c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -124,7 +124,6 @@ struct amdgpu_fence_driver {
 	unsigned			irq_type;
 	struct timer_list		fallback_timer;
 	unsigned			num_fences_mask;
-	spinlock_t			lock;
 	struct dma_fence		**fences;
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/15] drm/amdgpu: independence for the amdgpu_eviction_fence!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (10 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 11/15] drm/amdgpu: independence for the amdgpu_fence! Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 13/15] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

This should allow amdgpu_eviction_fences to outlive the amdgpu module.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 23d7d0b0d625..95ee22c43ceb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -167,9 +167,8 @@ amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
 
 	ev_fence->evf_mgr = evf_mgr;
 	get_task_comm(ev_fence->timeline_name, current);
-	spin_lock_init(&ev_fence->lock);
 	dma_fence_init64(&ev_fence->base, &amdgpu_eviction_fence_ops,
-			 &ev_fence->lock, evf_mgr->ev_fence_ctx,
+			 NULL, evf_mgr->ev_fence_ctx,
 			 atomic_inc_return(&evf_mgr->ev_fence_seq));
 	return ev_fence;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
index fcd867b7147d..fb70efb54338 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -27,7 +27,6 @@
 
 struct amdgpu_eviction_fence {
 	struct dma_fence base;
-	spinlock_t	 lock;
 	char		 timeline_name[TASK_COMM_LEN];
 	struct amdgpu_eviction_fence_mgr *evf_mgr;
 };
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/15] drm/amdgpu: independence for the amdgpu_vm_tlb_fence!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (11 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 12/15] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 13:48 ` [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence! Christian König
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

This should allow amdgpu_vm_tlb_fences to outlive the amdgpu module.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
index 5d26797356a3..27bf1f569830 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -33,7 +33,6 @@ struct amdgpu_tlb_fence {
 	struct amdgpu_device	*adev;
 	struct dma_fence	*dependency;
 	struct work_struct	work;
-	spinlock_t		lock;
 	uint16_t		pasid;
 
 };
@@ -98,9 +97,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm
 	f->dependency = *fence;
 	f->pasid = vm->pasid;
 	INIT_WORK(&f->work, amdgpu_tlb_fence_work);
-	spin_lock_init(&f->lock);
 
-	dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock,
+	dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, NULL,
 			 vm->tlb_fence_context, atomic64_read(&vm->tlb_seq));
 
 	/* TODO: We probably need a separate wq here */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (12 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 13/15] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-17 22:22   ` Felix Kuehling
  2025-10-13 13:48 ` [PATCH 15/15] drm/amdgpu: independence for the amdgpu_userq_fence! Christian König
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

This should allow amdkfd_fences to outlive the amdgpu module.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  6 ++++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  | 36 +++++++------------
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  7 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  4 +--
 4 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 9e120c934cc1..35c59c784b7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
 #endif
 #if IS_ENABLED(CONFIG_HSA_AMD)
 bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
+void amdkfd_fence_signal(struct dma_fence *f);
 struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
 void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
 int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
@@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
 	return false;
 }
 
+static inline
+void amdkfd_fence_signal(struct dma_fence *f)
+{
+}
+
 static inline
 struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 09c919f72b6c..69bca4536326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
 		if (!svm_range_schedule_evict_svm_bo(fence))
 			return true;
 	}
-	return false;
-}
-
-/**
- * amdkfd_fence_release - callback that fence can be freed
- *
- * @f: dma_fence
- *
- * This function is called when the reference count becomes zero.
- * Drops the mm_struct reference and RCU schedules freeing up the fence.
- */
-static void amdkfd_fence_release(struct dma_fence *f)
-{
-	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
-
-	/* Unconditionally signal the fence. The process is getting
-	 * terminated.
-	 */
-	if (WARN_ON(!fence))
-		return; /* Not an amdgpu_amdkfd_fence */
-
 	mmdrop(fence->mm);
-	kfree_rcu(f, rcu);
+	fence->mm = NULL;
+	return false;
 }
 
 /**
@@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
 	return false;
 }
 
+void amdkfd_fence_signal(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (fence && fence->mm) {
+		mmdrop(fence->mm);
+		fence->mm = NULL;
+	}
+	dma_fence_signal(f);
+}
+
 static const struct dma_fence_ops amdkfd_fence_ops = {
 	.get_driver_name = amdkfd_fence_get_driver_name,
 	.get_timeline_name = amdkfd_fence_get_timeline_name,
 	.enable_signaling = amdkfd_fence_enable_signaling,
-	.release = amdkfd_fence_release,
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index ddfe30c13e9d..779d7701bac9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
 	synchronize_rcu();
 	ef = rcu_access_pointer(p->ef);
 	if (ef)
-		dma_fence_signal(ef);
+		amdkfd_fence_signal(ef);
 
 	kfd_process_remove_sysfs(p);
 	kfd_debugfs_remove_process(p);
@@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
 static int signal_eviction_fence(struct kfd_process *p)
 {
 	struct dma_fence *ef;
-	int ret;
 
 	rcu_read_lock();
 	ef = dma_fence_get_rcu_safe(&p->ef);
@@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
 	if (!ef)
 		return -EINVAL;
 
-	ret = dma_fence_signal(ef);
+	amdkfd_fence_signal(ef);
 	dma_fence_put(ef);
 
-	return ret;
+	return 0;
 }
 
 static void evict_process_worker(struct work_struct *work)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 91609dd5730f..01ce2d853602 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
 
 	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
 		/* We're not in the eviction worker. Signal the fence. */
-		dma_fence_signal(&svm_bo->eviction_fence->base);
+		amdkfd_fence_signal(&svm_bo->eviction_fence->base);
 	dma_fence_put(&svm_bo->eviction_fence->base);
 	amdgpu_bo_unref(&svm_bo->bo);
 	kfree(svm_bo);
@@ -3628,7 +3628,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
 	mmap_read_unlock(mm);
 	mmput(mm);
 
-	dma_fence_signal(&svm_bo->eviction_fence->base);
+	amdkfd_fence_signal(&svm_bo->eviction_fence->base);
 
 	/* This is the last reference to svm_bo, after svm_range_vram_node_free
 	 * has been called in svm_migrate_vram_to_ram
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/15] drm/amdgpu: independence for the amdgpu_userq__fence!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (13 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence! Christian König
@ 2025-10-13 13:48 ` Christian König
  2025-10-13 14:54 ` Independence for dma_fences! Philipp Stanner
  2025-10-15  0:51 ` Dave Airlie
  16 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-13 13:48 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

This should allow amdgpu_userq_fences to outlive the amdgpu module.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 13 +----
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 54 ++++---------------
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  8 ---
 3 files changed, 11 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b4c41b19cb88..808a5907a325 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3136,11 +3136,7 @@ static int __init amdgpu_init(void)
 
 	r = amdgpu_sync_init();
 	if (r)
-		goto error_sync;
-
-	r = amdgpu_userq_fence_slab_init();
-	if (r)
-		goto error_fence;
+		return r;
 
 	DRM_INFO("amdgpu kernel modesetting enabled.\n");
 	amdgpu_register_atpx_handler();
@@ -3157,12 +3153,6 @@ static int __init amdgpu_init(void)
 
 	/* let modprobe override vga console setting */
 	return pci_register_driver(&amdgpu_kms_pci_driver);
-
-error_fence:
-	amdgpu_sync_fini();
-
-error_sync:
-	return r;
 }
 
 static void __exit amdgpu_exit(void)
@@ -3172,7 +3162,6 @@ static void __exit amdgpu_exit(void)
 	amdgpu_unregister_atpx_handler();
 	amdgpu_acpi_release();
 	amdgpu_sync_fini();
-	amdgpu_userq_fence_slab_fini();
 	mmu_notifier_synchronize();
 	amdgpu_xcp_drv_release();
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 761bad98da3e..9e0d558c1e4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -33,26 +33,6 @@
 #include "amdgpu_userq_fence.h"
 
 static const struct dma_fence_ops amdgpu_userq_fence_ops;
-static struct kmem_cache *amdgpu_userq_fence_slab;
-
-int amdgpu_userq_fence_slab_init(void)
-{
-	amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
-						    sizeof(struct amdgpu_userq_fence),
-						    0,
-						    SLAB_HWCACHE_ALIGN,
-						    NULL);
-	if (!amdgpu_userq_fence_slab)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void amdgpu_userq_fence_slab_fini(void)
-{
-	rcu_barrier();
-	kmem_cache_destroy(amdgpu_userq_fence_slab);
-}
 
 static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
 {
@@ -226,7 +206,7 @@ void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
 
 static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence **userq_fence)
 {
-	*userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
+	*userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
 	return *userq_fence ? 0 : -ENOMEM;
 }
 
@@ -242,12 +222,11 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
 	if (!fence_drv)
 		return -EINVAL;
 
-	spin_lock_init(&userq_fence->lock);
 	INIT_LIST_HEAD(&userq_fence->link);
 	fence = &userq_fence->base;
 	userq_fence->fence_drv = fence_drv;
 
-	dma_fence_init64(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
+	dma_fence_init64(fence, &amdgpu_userq_fence_ops, NULL,
 			 fence_drv->context, seq);
 
 	amdgpu_userq_fence_driver_get(fence_drv);
@@ -317,35 +296,22 @@ static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
 	rptr = amdgpu_userq_fence_read(fence_drv);
 	wptr = fence->base.seqno;
 
-	if (rptr >= wptr)
+	if (rptr >= wptr) {
+		amdgpu_userq_fence_driver_put(fence->fence_drv);
+		fence->fence_drv = NULL;
+
+		kvfree(fence->fence_drv_array);
+		fence->fence_drv_array = NULL;
 		return true;
+	}
 
 	return false;
 }
 
-static void amdgpu_userq_fence_free(struct rcu_head *rcu)
-{
-	struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);
-	struct amdgpu_userq_fence *userq_fence = to_amdgpu_userq_fence(fence);
-	struct amdgpu_userq_fence_driver *fence_drv = userq_fence->fence_drv;
-
-	/* Release the fence driver reference */
-	amdgpu_userq_fence_driver_put(fence_drv);
-
-	kvfree(userq_fence->fence_drv_array);
-	kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
-}
-
-static void amdgpu_userq_fence_release(struct dma_fence *f)
-{
-	call_rcu(&f->rcu, amdgpu_userq_fence_free);
-}
-
 static const struct dma_fence_ops amdgpu_userq_fence_ops = {
 	.get_driver_name = amdgpu_userq_fence_get_driver_name,
 	.get_timeline_name = amdgpu_userq_fence_get_timeline_name,
 	.signaled = amdgpu_userq_fence_signaled,
-	.release = amdgpu_userq_fence_release,
 };
 
 /**
@@ -558,7 +524,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
 	r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
 	if (r) {
 		mutex_unlock(&userq_mgr->userq_mutex);
-		kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
+		kfree(userq_fence);
 		goto put_gobj_write;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
index d76add2afc77..6f04782f3ea9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -31,11 +31,6 @@
 
 struct amdgpu_userq_fence {
 	struct dma_fence base;
-	/*
-	 * This lock is necessary to synchronize the
-	 * userqueue dma fence operations.
-	 */
-	spinlock_t lock;
 	struct list_head link;
 	unsigned long fence_drv_array_count;
 	struct amdgpu_userq_fence_driver *fence_drv;
@@ -58,9 +53,6 @@ struct amdgpu_userq_fence_driver {
 	char timeline_name[TASK_COMM_LEN];
 };
 
-int amdgpu_userq_fence_slab_init(void);
-void amdgpu_userq_fence_slab_fini(void);
-
 void amdgpu_userq_fence_driver_get(struct amdgpu_userq_fence_driver *fence_drv);
 void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
 int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
-- 
2.43.0



* Re: Independence for dma_fences!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (14 preceding siblings ...)
  2025-10-13 13:48 ` [PATCH 15/15] drm/amdgpu: independence for the amdgpu_userq__fence! Christian König
@ 2025-10-13 14:54 ` Philipp Stanner
  2025-10-14 15:54   ` Christian König
  2025-10-15  0:51 ` Dave Airlie
  16 siblings, 1 reply; 47+ messages in thread
From: Philipp Stanner @ 2025-10-13 14:54 UTC (permalink / raw)
  To: Christian König, alexdeucher, simona.vetter, tursulin
  Cc: dri-devel, amd-gfx

On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
> Hi everyone,
> 
> dma_fences have ever lived under the tyranny dictated by the module
> lifetime of their issuer, leading to crashes should anybody still hold
> a reference to a dma_fence when the module of the issuer was unloaded.
> 
> But those days are over! The patch set following this mail finally
> implements a way for issuers to release their dma_fences from this
> slavery and outlive the module that originally created them.
> 
> Previously various approaches have been discussed, including changing the
> locking semantics of the dma_fence callbacks (by me) as well as using the
> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
> from their actual users.
> 
> Changing the locking semantics turned out to be much trickier than
> originally thought because especially on older drivers (nouveau, radeon,
> but also i915) these locking semantics are actually needed for correct
> operation.
> 
> Using the drm_scheduler as intermediate layer is still a good idea and
> should probably be implemented to make life simpler for some drivers, but
> doesn't work for all use cases. Especially TLB flush fences, preemption
> fences and userqueue fences don't go through the drm scheduler because it
> doesn't make sense for them.
> 
> Tvrtko did some really nice prerequisite work by protecting the returned
> strings of the dma_fence_ops by RCU. This way dma_fence creators were
> able to just wait for an RCU grace period after fence signaling before
> it was safe to free those data structures.
> 
> Now this patch set here goes a step further and protects the whole
> dma_fence_ops structure by RCU, so that after the fence signals the
> pointer to the dma_fence_ops is set to NULL when there is no wait nor
> release callback given. All functionality which use the dma_fence_ops
> reference are put inside an RCU critical section, except for the
> deprecated issuer specific wait and of course the optional release
> callback.
> 
> In addition to the RCU changes, the lock protecting the dma_fence state
> previously had to be allocated externally. This set here now changes the
> functionality to make that external lock optional and allows dma_fences
> to use an inline lock and be self contained.

Allowing for an embedded lock, is that actually necessary for the goals
of this series, or is it an optional change / improvement?

If I understood you correctly at XDC you wanted to have an embedded
lock because it improves the memory footprint and because an external
lock couldn't achieve some goals about fence-signaling-order originally
intended. Can you elaborate on that?

P.


> 
> The new approach is then applied to amdgpu allowing the module to be
> unloaded even when dma_fences issued by it are still around.
> 
> Please review and comment,
> Christian.
> 



* Re: [PATCH 01/15] dma-buf: cleanup dma_fence_describe
  2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
@ 2025-10-14 14:37   ` Tvrtko Ursulin
  2025-10-23  3:45     ` Matthew Brost
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-14 14:37 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 13/10/2025 14:48, Christian König wrote:
> The driver and timeline name are meaningless for signaled fences.
> 
> Drop them and also print the context number.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3f78c56b58dc..f0539c73ed57 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -1001,17 +1001,18 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
>   {
>   	const char __rcu *timeline;
>   	const char __rcu *driver;
> +	const char *signaled = "un";
>   
>   	rcu_read_lock();
>   
>   	timeline = dma_fence_timeline_name(fence);
>   	driver = dma_fence_driver_name(fence);
>   
> -	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> -		   rcu_dereference(driver),
> -		   rcu_dereference(timeline),
> -		   fence->seqno,
> -		   dma_fence_is_signaled(fence) ? "" : "un");
> +	if (dma_fence_is_signaled(fence))
> +		timeline = driver = signaled = "";

FWIW you could avoid calling dma_fence_timeline_name() and 
dma_fence_driver_name() since you added the signaled check.

May end up slightly nicer than overriding the strings returned from the
helpers with a chained assignment.

Or even store the signaled status in a local bool and branch off two 
seq_printfs based on it.

> +
> +	seq_printf(seq, "%llu %s %s seq %llu %ssignalled\n", fence->context,
> +		   timeline, driver, fence->seqno, signaled);

I was initially worried whether this string ends up anywhere that could be
considered ABI, but it seems to be debugfs only, so changing the formatting is fine.

How about making dma_fence_describe() conditional on CONFIG_DEBUG_FS to 
set this in stone? (And dma_resv_describe..)

And maybe unify the %llu:%llu context:fence as the tracepoints use?

Altogether something like:

rcu_read_lock();

signaled = dma_fence_is_signaled(fence);

if (signaled)
	seq_printf(seq, "%llu:%llu signalled\n",
		   fence->context, fence->seqno);
else
	seq_printf(seq, "%llu:%llu %s %s unsignalled\n",
		   fence->context, fence->seqno,
		   dma_fence_driver_name(fence),
		   dma_fence_timeline_name(fence));

rcu_read_unlock();

Maybe more readable but up to you.

Regards,

Tvrtko
>   	rcu_read_unlock();
>   }



* Re: [PATCH 02/15] dma-buf: rework stub fence initialisation
  2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
@ 2025-10-14 15:03   ` Tvrtko Ursulin
  2025-10-24  7:29   ` Tvrtko Ursulin
  1 sibling, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-14 15:03 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 13/10/2025 14:48, Christian König wrote:
> Instead of doing this on the first call of the function just initialize
> the stub fence during kernel load.
> 
> This has the clear advantage of lower overhead and also no longer relies
> on the ops not being NULL.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
>   1 file changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index f0539c73ed57..51ee13d005bc 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
>   	.get_timeline_name = dma_fence_stub_get_name,
>   };
>   
> +static int __init dma_fence_init_stub(void)
> +{
> +	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> +		       &dma_fence_stub_lock, 0, 0);
> +
> +	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +		&dma_fence_stub.flags);
> +
> +	dma_fence_signal_locked(&dma_fence_stub);
> +	return 0;
> +}
> +subsys_initcall(dma_fence_init_stub);
> +
>   /**
>    * dma_fence_get_stub - return a signaled fence
>    *
> - * Return a stub fence which is already signaled. The fence's
> - * timestamp corresponds to the first time after boot this
> - * function is called.
> + * Return a stub fence which is already signaled. The fence's timestamp
> + * corresponds to the initialisation time of the linux kernel.

We sure hope it's the Linux kernel and not some imposter! :D (I.e. you can
drop "linux" if you want.)

>    */
>   struct dma_fence *dma_fence_get_stub(void)
>   {
> -	spin_lock(&dma_fence_stub_lock);
> -	if (!dma_fence_stub.ops) {
> -		dma_fence_init(&dma_fence_stub,
> -			       &dma_fence_stub_ops,
> -			       &dma_fence_stub_lock,
> -			       0, 0);
> -
> -		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> -			&dma_fence_stub.flags);
> -
> -		dma_fence_signal_locked(&dma_fence_stub);
> -	}
> -	spin_unlock(&dma_fence_stub_lock);
> -
>   	return dma_fence_get(&dma_fence_stub);
>   }
>   EXPORT_SYMBOL(dma_fence_get_stub);

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko



* Re: Independence for dma_fences!
  2025-10-13 14:54 ` Independence for dma_fences! Philipp Stanner
@ 2025-10-14 15:54   ` Christian König
  2025-10-17  8:32     ` Philipp Stanner
  0 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-14 15:54 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

On 13.10.25 16:54, Philipp Stanner wrote:
> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
>> Hi everyone,
>>
>> dma_fences have ever lived under the tyranny dictated by the module
>> lifetime of their issuer, leading to crashes should anybody still hold
>> a reference to a dma_fence when the module of the issuer was unloaded.
>>
>> But those days are over! The patch set following this mail finally
>> implements a way for issuers to release their dma_fences from this
>> slavery and outlive the module that originally created them.
>>
>> Previously various approaches have been discussed, including changing the
>> locking semantics of the dma_fence callbacks (by me) as well as using the
>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>> from their actual users.
>>
>> Changing the locking semantics turned out to be much trickier than
>> originally thought because especially on older drivers (nouveau, radeon,
>> but also i915) these locking semantics are actually needed for correct
>> operation.
>>
>> Using the drm_scheduler as intermediate layer is still a good idea and
>> should probably be implemented to make life simpler for some drivers, but
>> doesn't work for all use cases. Especially TLB flush fences, preemption
>> fences and userqueue fences don't go through the drm scheduler because it
>> doesn't make sense for them.
>>
>> Tvrtko did some really nice prerequisite work by protecting the returned
>> strings of the dma_fence_ops by RCU. This way dma_fence creators were
>> able to just wait for an RCU grace period after fence signaling before
>> it was safe to free those data structures.
>>
>> Now this patch set here goes a step further and protects the whole
>> dma_fence_ops structure by RCU, so that after the fence signals the
>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>> release callback given. All functionality which uses the dma_fence_ops
>> reference is put inside an RCU critical section, except for the
>> deprecated issuer specific wait and of course the optional release
>> callback.
>>
>> In addition to the RCU changes, the lock protecting the dma_fence state
>> previously had to be allocated externally. This set here now changes the
>> functionality to make that external lock optional and allows dma_fences
>> to use an inline lock and be self contained.
> 
> Allowing for an embedded lock, is that actually necessary for the goals
> of this series, or is it an optional change / improvement?

It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.

The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.

An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.

> If I understood you correctly at XDC you wanted to have an embedded
> lock because it improves the memory footprint and because an external
> lock couldn't achieve some goals about fence-signaling-order originally
> intended. Can you elaborate on that?

The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.

The fence signaling order is important for drivers like radeon, where the external lock protects multiple fences from signaling at the same time and makes sure that everything stays in order.

While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.

Regards,
Christian.

> 
> P.
> 
> 
>>
>> The new approach is then applied to amdgpu allowing the module to be
>> unloaded even when dma_fences issued by it are still around.
>>
>> Please review and comment,
>> Christian.
>>
> 



* Re: Independence for dma_fences!
  2025-10-13 13:48 Independence for dma_fences! Christian König
                   ` (15 preceding siblings ...)
  2025-10-13 14:54 ` Independence for dma_fences! Philipp Stanner
@ 2025-10-15  0:51 ` Dave Airlie
  16 siblings, 0 replies; 47+ messages in thread
From: Dave Airlie @ 2025-10-15  0:51 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On Tue, 14 Oct 2025 at 01:11, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hi everyone,
>
> dma_fences have ever lived under the tyranny dictated by the module
> lifetime of their issuer, leading to crashes should anybody still hold
> a reference to a dma_fence when the module of the issuer was unloaded.
>
> But those days are over! The patch set following this mail finally
> implements a way for issuers to release their dma_fences from this
> slavery and outlive the module that originally created them.
>
> Previously various approaches have been discussed, including changing the
> locking semantics of the dma_fence callbacks (by me) as well as using the
> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
> from their actual users.
>
> Changing the locking semantics turned out to be much trickier than
> originally thought because especially on older drivers (nouveau, radeon,
> but also i915) these locking semantics are actually needed for correct
> operation.
>
> Using the drm_scheduler as intermediate layer is still a good idea and
> should probably be implemented to make life simpler for some drivers, but
> doesn't work for all use cases. Especially TLB flush fences, preemption
> fences and userqueue fences don't go through the drm scheduler because it
> doesn't make sense for them.
>
> Tvrtko did some really nice prerequisite work by protecting the returned
> strings of the dma_fence_ops by RCU. This way dma_fence creators were
> able to just wait for an RCU grace period after fence signaling before
> it was safe to free those data structures.
>
> Now this patch set here goes a step further and protects the whole
> dma_fence_ops structure by RCU, so that after the fence signals the
> pointer to the dma_fence_ops is set to NULL when there is no wait nor
> release callback given. All functionality which uses the dma_fence_ops
> reference is put inside an RCU critical section, except for the
> deprecated issuer specific wait and of course the optional release
> callback.
>
> In addition to the RCU changes, the lock protecting the dma_fence state
> previously had to be allocated externally. This set here now changes the
> functionality to make that external lock optional and allows dma_fences
> to use an inline lock and be self contained.
>
> The new approach is then applied to amdgpu allowing the module to be
> unloaded even when dma_fences issued by it are still around.

Can we add some "why?" in here, like what use cases this enables?

Some more explanation about what these hanging-about fences will be
used for, like if the module has gone away, I have to assume this is for
already signalled fences, so someone is waiting and hasn't cleaned up
yet?

What problem does it solve wrt module unload, what scenario is
unloading amdgpu not possible in now, what scenario will it be able to
unload in after?

Thanks,

Dave.


* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
@ 2025-10-16  8:56   ` Tvrtko Ursulin
  2025-10-16 15:57     ` Tvrtko Ursulin
  2025-10-17  9:14   ` Philipp Stanner
  1 sibling, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-16  8:56 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 13/10/2025 14:48, Christian König wrote:
> When neither a release nor a wait operation is specified it is possible
> to let the dma_fence live on independently of the module that issued it.
> 
> This makes it possible to unload drivers and only wait for all their
> fences to signal.

Have you looked at whether the requirement to not have the release and 
wait callbacks will exclude some drivers from being able to benefit from 
this?

Regards,

Tvrtko
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>   include/linux/dma-fence.h   |  4 ++--
>   2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 982f2b2a62c0..39f73edf3a33 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>   				      &fence->flags)))
>   		return -EINVAL;
>   
> +	/*
> +	 * When neither a release nor a wait operation is specified set the ops
> +	 * pointer to NULL to allow the fence structure to become independent
> + * of whoever originally issued it.
> +	 */
> +	if (!fence->ops->release && !fence->ops->wait)
> +		RCU_INIT_POINTER(fence->ops, NULL);
> +
>   	/* Stash the cb_list before replacing it with the timestamp */
>   	list_replace(&fence->cb_list, &cb_list);
>   
> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   	rcu_read_lock();
>   	ops = rcu_dereference(fence->ops);
>   	trace_dma_fence_wait_start(fence);
> -	if (ops->wait) {
> +	if (ops && ops->wait) {
>   		/*
>   		 * Implementing the wait ops is deprecated and not supported for
>   		 * issuer independent fences, so it is ok to use the ops outside
> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>   	}
>   
>   	ops = rcu_dereference(fence->ops);
> -	if (ops->release)
> +	if (ops && ops->release)
>   		ops->release(fence);
>   	else
>   		dma_fence_free(fence);
> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   
>   	rcu_read_lock();
>   	ops = rcu_dereference(fence->ops);
> -	if (!was_set && ops->enable_signaling) {
> +	if (!was_set && ops && ops->enable_signaling) {
>   		trace_dma_fence_enable_signal(fence);
>   
>   		if (!ops->enable_signaling(fence)) {
> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>   
>   	rcu_read_lock();
>   	ops = rcu_dereference(fence->ops);
> -	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>   		ops->set_deadline(fence, deadline);
>   	rcu_read_unlock();
>   }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 38421a0c7c5b..e1ba1d53de88 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>   
>   	rcu_read_lock();
>   	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>   		rcu_read_unlock();
>   		dma_fence_signal_locked(fence);
>   		return true;
> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>   
>   	rcu_read_lock();
>   	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>   		rcu_read_unlock();
>   		dma_fence_signal(fence);
>   		return true;



* Re: [PATCH 05/15] dma-buf: inline spinlock for fence protection
  2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
@ 2025-10-16  9:26   ` Tvrtko Ursulin
  2025-11-03 13:07     ` Philipp Stanner
  2025-10-23 18:09   ` Matthew Brost
  1 sibling, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-16  9:26 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


Hi Christian,

Only some preliminary comments while I am building a complete picture.

On 13/10/2025 14:48, Christian König wrote:
> Allow implementations to not give a spinlock to protect the fence
> internal state, instead a spinlock embedded into the fence structure
> itself is used in this case.
> 
> Apart from simplifying the handling for containers and the stub fence
> this has the advantage of allowing implementations to issue fences
> without caring about their spinlock lifetime.
> 
> That in turn is necessary for independent fences which outlive the module
> that originally issued them.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c              | 54 ++++++++++++------------
>   drivers/dma-buf/sw_sync.c                | 14 +++---
>   drivers/dma-buf/sync_debug.h             |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 +++---
>   drivers/gpu/drm/drm_crtc.c               |  2 +-
>   drivers/gpu/drm/drm_writeback.c          |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_drm.c    |  5 ++-
>   drivers/gpu/drm/nouveau/nouveau_fence.c  |  3 +-
>   drivers/gpu/drm/qxl/qxl_release.c        |  3 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c    |  3 +-
>   drivers/gpu/drm/xe/xe_hw_fence.c         |  3 +-
>   drivers/gpu/drm/xe/xe_sched_job.c        |  4 +-
>   include/linux/dma-fence.h                | 42 +++++++++++++++++-
>   15 files changed, 99 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 39f73edf3a33..a0b328fdd90d 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>   }
>   #endif
>   
> -
>   /**
>    * dma_fence_signal_timestamp_locked - signal completion of a fence
>    * @fence: the fence to signal
> @@ -368,7 +367,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>   	struct dma_fence_cb *cur, *tmp;
>   	struct list_head cb_list;
>   
> -	lockdep_assert_held(fence->lock);
> +	lockdep_assert_held(dma_fence_spinlock(fence));
>   
>   	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>   				      &fence->flags)))
> @@ -421,9 +420,9 @@ int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
>   	if (WARN_ON(!fence))
>   		return -EINVAL;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	ret = dma_fence_signal_timestamp_locked(fence, timestamp);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	return ret;
>   }
> @@ -475,9 +474,9 @@ int dma_fence_signal(struct dma_fence *fence)
>   
>   	tmp = dma_fence_begin_signalling();
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	dma_fence_end_signalling(tmp);
>   
> @@ -579,10 +578,10 @@ void dma_fence_release(struct kref *kref)
>   		 * don't leave chains dangling. We set the error flag first
>   		 * so that the callbacks know this signal is due to an error.
>   		 */
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock(fence, flags);
>   		fence->error = -EDEADLK;
>   		dma_fence_signal_locked(fence);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock(fence, flags);
>   	}
>   
>   	ops = rcu_dereference(fence->ops);
> @@ -612,7 +611,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   	const struct dma_fence_ops *ops;
>   	bool was_set;
>   
> -	lockdep_assert_held(fence->lock);
> +	lockdep_assert_held(dma_fence_spinlock(fence));
>   
>   	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>   				   &fence->flags);
> @@ -648,9 +647,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
>   {
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	__dma_fence_enable_signaling(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   }
>   EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>   
> @@ -690,8 +689,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>   		return -ENOENT;
>   	}
>   
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>   	if (__dma_fence_enable_signaling(fence)) {
>   		cb->func = func;
>   		list_add_tail(&cb->node, &fence->cb_list);
> @@ -699,8 +697,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>   		INIT_LIST_HEAD(&cb->node);
>   		ret = -ENOENT;
>   	}
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	return ret;
>   }
> @@ -723,9 +720,9 @@ int dma_fence_get_status(struct dma_fence *fence)
>   	unsigned long flags;
>   	int status;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	status = dma_fence_get_status_locked(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	return status;
>   }
> @@ -755,13 +752,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
>   	unsigned long flags;
>   	bool ret;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>   	ret = !list_empty(&cb->node);
>   	if (ret)
>   		list_del_init(&cb->node);
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	return ret;
>   }
> @@ -800,8 +795,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   	unsigned long flags;
>   	signed long ret = timeout ? timeout : 1;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		goto out;
>   
> @@ -824,11 +818,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   			__set_current_state(TASK_INTERRUPTIBLE);
>   		else
>   			__set_current_state(TASK_UNINTERRUPTIBLE);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock(fence, flags);
>   
>   		ret = schedule_timeout(ret);
>   
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock(fence, flags);
>   		if (ret > 0 && intr && signal_pending(current))
>   			ret = -ERESTARTSYS;
>   	}
> @@ -838,7 +832,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   	__set_current_state(TASK_RUNNING);
>   
>   out:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   	return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_default_wait);
> @@ -1046,7 +1040,6 @@ static void
>   __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>   {
> -	BUG_ON(!lock);
>   	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>   
>   	kref_init(&fence->refcount);
> @@ -1057,10 +1050,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	 */
>   	RCU_INIT_POINTER(fence->ops, ops);
>   	INIT_LIST_HEAD(&fence->cb_list);
> -	fence->lock = lock;
>   	fence->context = context;
>   	fence->seqno = seqno;
>   	fence->flags = flags;
> +	if (lock) {
> +		fence->extern_lock = lock;
> +	} else {
> +		spin_lock_init(&fence->inline_lock);
> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
> +	}
>   	fence->error = 0;
>   
>   	trace_dma_fence_init(fence);
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 3c20f1d31cf5..8f48529214a4 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -155,12 +155,12 @@ static void timeline_fence_release(struct dma_fence *fence)
>   	struct sync_timeline *parent = dma_fence_parent(fence);
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	if (!list_empty(&pt->link)) {
>   		list_del(&pt->link);
>   		rb_erase(&pt->node, &parent->pt_tree);
>   	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	sync_timeline_put(parent);
>   	dma_fence_free(fence);
> @@ -178,7 +178,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>   	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>   		if (ktime_before(deadline, pt->deadline))
>   			pt->deadline = deadline;
> @@ -186,7 +186,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>   		pt->deadline = deadline;
>   		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
>   	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   }
>   
>   static const struct dma_fence_ops timeline_fence_ops = {
> @@ -427,13 +427,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>   		goto put_fence;
>   	}
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>   		ret = -ENOENT;
>   		goto unlock;
>   	}
>   	data.deadline_ns = ktime_to_ns(pt->deadline);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	dma_fence_put(fence);
>   
> @@ -446,7 +446,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>   	return 0;
>   
>   unlock:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   put_fence:
>   	dma_fence_put(fence);
>   
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 02af347293d0..c49324505b20 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -47,7 +47,7 @@ struct sync_timeline {
>   
>   static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>   {
> -	return container_of(fence->lock, struct sync_timeline, lock);
> +	return container_of(fence->extern_lock, struct sync_timeline, lock);

These container_ofs are a bit annoying. Maybe even a bit fragile if 
someone switches to the embedded lock and forgets to update them all.

Would a prep patch to first replace them with some dma_fence_container_of 
wrapper make sense? Then it could even have a (debug builds only) assert 
added to check for correct usage.

>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 5ec5c3ff22bb..fcc7a3fb93b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -468,10 +468,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
>   	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
>   		return false;
>   
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>   	if (!dma_fence_is_signaled_locked(fence))
>   		dma_fence_set_error(fence, -ENODATA);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>   
>   	while (!dma_fence_is_signaled(fence) &&
>   	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index db66b4232de0..db6516ce8335 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2774,8 +2774,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	dma_fence_put(vm->last_unlocked);
>   	dma_fence_wait(vm->last_tlb_flush, false);
>   	/* Make sure that all fence callbacks have completed */
> -	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
> -	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
> +	dma_fence_lock(vm->last_tlb_flush, flags);
> +	dma_fence_unlock(vm->last_tlb_flush, flags);
>   	dma_fence_put(vm->last_tlb_flush);
>   
>   	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 77207f4e448e..4fc7f66b7d13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -631,20 +631,20 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>    */
>   static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>   {
> +	struct dma_fence *fence;
>   	unsigned long flags;
> -	spinlock_t *lock;
>   
>   	/*
>   	 * Workaround to stop racing between the fence signaling and handling
> -	 * the cb. The lock is static after initially setting it up, just make
> -	 * sure that the dma_fence structure isn't freed up.
> +	 * the cb.
>   	 */
>   	rcu_read_lock();
> -	lock = vm->last_tlb_flush->lock;
> +	fence = dma_fence_get_rcu(vm->last_tlb_flush);

Split out addition of reference counting to a separate patch?

>   	rcu_read_unlock();
>   
> -	spin_lock_irqsave(lock, flags);
> -	spin_unlock_irqrestore(lock, flags);
> +	dma_fence_lock(fence, flags);
> +	dma_fence_unlock(fence, flags);
> +	dma_fence_put(fence);
>   
>   	return atomic64_read(&vm->tlb_seq);
>   }
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 46655339003d..ad47f58cd159 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>   static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>   {
>   	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>   }
>   
>   static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> index 95b8a2e4bda6..624a4e8b6c99 100644
> --- a/drivers/gpu/drm/drm_writeback.c
> +++ b/drivers/gpu/drm/drm_writeback.c
> @@ -81,7 +81,7 @@
>    *	From userspace, this property will always read as zero.
>    */
>   
> -#define fence_to_wb_connector(x) container_of(x->lock, \
> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>   					      struct drm_writeback_connector, \
>   					      fence_lock)
>   
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 1527b801f013..2956ed2ec073 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
>   static inline bool
>   nouveau_cli_work_ready(struct dma_fence *fence)
>   {
> +	unsigned long flags;
>   	bool ret = true;
>   
> -	spin_lock_irq(fence->lock);
> +	dma_fence_lock(fence, flags);
>   	if (!dma_fence_is_signaled_locked(fence))
>   		ret = false;
> -	spin_unlock_irq(fence->lock);
> +	dma_fence_unlock(fence, flags);
>   
>   	if (ret == true)
>   		dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index d5654e26d5bc..272b492c4d7c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -47,7 +47,8 @@ from_fence(struct dma_fence *fence)
>   static inline struct nouveau_fence_chan *
>   nouveau_fctx(struct nouveau_fence *fence)
>   {
> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> +			    lock);
>   }
>   
>   static bool
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 05204a6a3fa8..1d346822c1f7 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -60,7 +60,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>   	struct qxl_device *qdev;
>   	unsigned long cur, end = jiffies + timeout;
>   
> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> +	qdev = container_of(fence->extern_lock, struct qxl_device,
> +			    release_lock);
>   
>   	if (!wait_event_timeout(qdev->release_event,
>   				(dma_fence_is_signaled(fence) ||
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index c2294abbe753..346761172c1b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>   static struct vmw_fence_manager *
>   fman_from_fence(struct vmw_fence_obj *fence)
>   {
> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> +			    lock);
>   }
>   
>   static void vmw_fence_obj_destroy(struct dma_fence *f)
> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> index b2a0c46dfcd4..3456bec93c70 100644
> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> @@ -144,7 +144,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>   
>   static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>   {
> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> +			    lock);
>   }
>   
>   static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index d21bf8f26964..ea7038475b4b 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -187,11 +187,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
>   	unsigned long irq_flags;
>   	bool signaled;
>   
> -	spin_lock_irqsave(fence->lock, irq_flags);
> +	dma_fence_lock(fence, irq_flags);
>   	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>   	if (!signaled)
>   		dma_fence_set_error(fence, error);
> -	spin_unlock_irqrestore(fence->lock, irq_flags);
> +	dma_fence_unlock(fence, irq_flags);
>   
>   	return signaled;
>   }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index e1ba1d53de88..fb416f500664 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -34,7 +34,8 @@ struct seq_file;
>    * @ops: dma_fence_ops associated with this fence
>    * @rcu: used for releasing fence with kfree_rcu
>    * @cb_list: list of all callbacks to call
> - * @lock: spin_lock_irqsave used for locking
> + * @extern_lock: external spin_lock_irqsave used for locking
> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>    * @context: execution context this fence belongs to, returned by
>    *           dma_fence_context_alloc()
>    * @seqno: the sequence number of this fence inside the execution context,
> @@ -48,6 +49,7 @@ struct seq_file;
>    * atomic ops (bit_*), so taking the spinlock will not be needed most
>    * of the time.
>    *
> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>    * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>    * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>    * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> @@ -65,7 +67,10 @@ struct seq_file;
>    * been completed, or never called at all.
>    */
>   struct dma_fence {
> -	spinlock_t *lock;
> +	union {
> +		spinlock_t *extern_lock;
> +		spinlock_t inline_lock;

This will grow the struct on some architectures so I think, given the 
strong push back in the past against growing the struct past a 64B 
cacheline, it should be called out in the commit message.

> +	};
>   	const struct dma_fence_ops __rcu *ops;
>   	/*
>   	 * We clear the callback list on kref_put so that by the time we
> @@ -98,6 +103,7 @@ struct dma_fence {
>   };
>   
>   enum dma_fence_flag_bits {
> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
>   	DMA_FENCE_FLAG_SEQNO64_BIT,
>   	DMA_FENCE_FLAG_SIGNALED_BIT,
>   	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>   	} while (1);
>   }
>   
> +/**
> + * dma_fence_spinlock - return pointer to the spinlock protecting the fence
> + * @fence: the fence to get the lock from
> + *
> + * Return either the pointer to the embedded or the external spin lock.
> + */
> +static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> +{
> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> +		&fence->inline_lock : fence->extern_lock;

Slightly annoying to have a conditional on every lock operation but I 
suppose not a super huge deal.

If I suggested not having the inline and extern locks as a union but 
always storing the pointer, then the size of the whole struct would spill 
over 64B on many more platforms. But it would be more robust against 
mistakes.

> +}
> +
> +/**
> + * dma_fence_lock - irqsave lock the fence
> + * @fence: the fence to lock
> + * @flags: where to store the CPU flags.
> + *
> + * Lock the fence, preventing it from changing to the signaled state.
> + */
> +#define dma_fence_lock(fence, flags)	\
> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)

Do you think keeping the irqsave/restore naming would be clearer? I am 
leaning towards thinking so. Less mental load to keep it completely 
analogous to spin_lock_irqsave(lock, flags) since it takes the flags 
argument and has the same semantics.

Regards,

Tvrtko

> +
> +/**
> + * dma_fence_unlock - unlock the fence and irqrestore
> + * @fence: the fence to unlock
> + * @flags: the CPU flags to restore
> + *
> + * Unlock the fence, allowing it to change its state to signaled again.
> + */
> +#define dma_fence_unlock(fence, flags)	\
> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
> +
>   #ifdef CONFIG_LOCKDEP
>   bool dma_fence_begin_signalling(void);
>   void dma_fence_end_signalling(bool cookie);


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-16  8:56   ` Tvrtko Ursulin
@ 2025-10-16 15:57     ` Tvrtko Ursulin
  2025-10-23  4:23       ` Matthew Brost
  2025-10-30 13:52       ` Christian König
  0 siblings, 2 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-16 15:57 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 16/10/2025 09:56, Tvrtko Ursulin wrote:
> 
> On 13/10/2025 14:48, Christian König wrote:
>> When neither a release nor a wait operation is specified it is possible
>> to let the dma_fence live on independent of the module who issued it.
>>
>> This makes it possible to unload drivers and only wait for all their
>> fences to signal.
> 
> Have you looked at whether the requirement to not have the release and 
> wait callbacks will exclude some drivers from being able to benefit from 
> this?

I had a browse and this seems to be the situation:

Custom .wait:
  - radeon, qxl, nouveau, i915

Those would therefore still be vulnerable to the unbind->unload 
sequence. Actually not sure about qxl, but the other three are PCI so in 
theory at least. i915 at least supports unbind and unload.

Custom .release:
  - vgem, nouveau, lima, pvr, i915, usb-gadget, industrialio, etnaviv, xe

Out of those, these do not actually need a custom release and could 
probably be weaned off it:
  - usb-gadget, industrialio, etnaviv, xe

(Xe would lose a debug assert and some would have their kfrees replaced 
with kfree_rcu. Plus build time asserts added that the struct dma_fence 
remains first in the respective driver structs. It sounds feasible.)

That would leave us with .release in:
  - vgem, nouveau, lima, pvr, i915

Combined list of custom .wait + .release:
  - radeon, qxl, nouveau, i915, lima, pvr, vgem

From those, the ones which support unbind and module unload would remain 
potentially vulnerable to use-after-free.

It doesn't sound great to only solve it partially but maybe it is a 
reasonable next step. Where could we go from there to solve it for everyone?

Regards,

Tvrtko

>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>>   include/linux/dma-fence.h   |  4 ++--
>>   2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 982f2b2a62c0..39f73edf3a33 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct 
>> dma_fence *fence,
>>                         &fence->flags)))
>>           return -EINVAL;
>> +    /*
>> +     * When neither a release nor a wait operation is specified set the
>> +     * ops pointer to NULL to allow the fence structure to become
>> +     * independent of who originally issued it.
>> +     */
>> +    if (!fence->ops->release && !fence->ops->wait)
>> +        RCU_INIT_POINTER(fence->ops, NULL);
>> +
>>       /* Stash the cb_list before replacing it with the timestamp */
>>       list_replace(&fence->cb_list, &cb_list);
>> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, 
>> bool intr, signed long timeout)
>>       rcu_read_lock();
>>       ops = rcu_dereference(fence->ops);
>>       trace_dma_fence_wait_start(fence);
>> -    if (ops->wait) {
>> +    if (ops && ops->wait) {
>>           /*
>>            * Implementing the wait ops is deprecated and not supported 
>> for
>>            * issuer independent fences, so it is ok to use the ops 
>> outside
>> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>>       }
>>       ops = rcu_dereference(fence->ops);
>> -    if (ops->release)
>> +    if (ops && ops->release)
>>           ops->release(fence);
>>       else
>>           dma_fence_free(fence);
>> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct 
>> dma_fence *fence)
>>       rcu_read_lock();
>>       ops = rcu_dereference(fence->ops);
>> -    if (!was_set && ops->enable_signaling) {
>> +    if (!was_set && ops && ops->enable_signaling) {
>>           trace_dma_fence_enable_signal(fence);
>>           if (!ops->enable_signaling(fence)) {
>> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence 
>> *fence, ktime_t deadline)
>>       rcu_read_lock();
>>       ops = rcu_dereference(fence->ops);
>> -    if (ops->set_deadline && !dma_fence_is_signaled(fence))
>> +    if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>>           ops->set_deadline(fence, deadline);
>>       rcu_read_unlock();
>>   }
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 38421a0c7c5b..e1ba1d53de88 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>       rcu_read_lock();
>>       ops = rcu_dereference(fence->ops);
>> -    if (ops->signaled && ops->signaled(fence)) {
>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>           rcu_read_unlock();
>>           dma_fence_signal_locked(fence);
>>           return true;
>> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>>       rcu_read_lock();
>>       ops = rcu_dereference(fence->ops);
>> -    if (ops->signaled && ops->signaled(fence)) {
>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>           rcu_read_unlock();
>>           dma_fence_signal(fence);
>>           return true;
> 



* Re: [PATCH 03/15] dma-buf: protected fence ops by RCU
  2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protected fence ops by RCU Christian König
@ 2025-10-16 18:04   ` Tvrtko Ursulin
  2025-10-31 10:35   ` Tvrtko Ursulin
  1 sibling, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-16 18:04 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 13/10/2025 14:48, Christian König wrote:
> At first glance it is counter intuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 65 +++++++++++++++++++++++++++----------
>   include/linux/dma-fence.h   | 18 ++++++++--
>   2 files changed, 62 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 51ee13d005bc..982f2b2a62c0 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>   signed long
>   dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   {
> +	const struct dma_fence_ops *ops;
>   	signed long ret;
>   
>   	if (WARN_ON(timeout < 0))
> @@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   
>   	dma_fence_enable_sw_signaling(fence);
>   
> -	if (trace_dma_fence_wait_start_enabled()) {
> -		rcu_read_lock();
> -		trace_dma_fence_wait_start(fence);
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	trace_dma_fence_wait_start(fence);
> +	if (ops->wait) {
> +		/*
> +		 * Implementing the wait ops is deprecated and not supported for
> +		 * issuer independent fences, so it is ok to use the ops outside
> +		 * the RCU protected section.
> +		 */
> +		rcu_read_unlock();
> +		ret = ops->wait(fence, intr, timeout);
> +	} else {
>   		rcu_read_unlock();
> -	}
> -	if (fence->ops->wait)
> -		ret = fence->ops->wait(fence, intr, timeout);
> -	else
>   		ret = dma_fence_default_wait(fence, intr, timeout);
> +	}
>   	if (trace_dma_fence_wait_end_enabled()) {
>   		rcu_read_lock();
>   		trace_dma_fence_wait_end(fence);
> @@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
>   {
>   	struct dma_fence *fence =
>   		container_of(kref, struct dma_fence, refcount);
> +	const struct dma_fence_ops *ops;
>   
>   	rcu_read_lock();
>   	trace_dma_fence_destroy(fence);
> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
>   		spin_unlock_irqrestore(fence->lock, flags);
>   	}
>   
> -	rcu_read_unlock();
> -
> -	if (fence->ops->release)
> -		fence->ops->release(fence);
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->release)
> +		ops->release(fence);
>   	else
>   		dma_fence_free(fence);
> +	rcu_read_unlock();

I had it like this back in May but you were worried the release callback 
can sleep. So I gather since then you figured out that no one sleeps or 
takes a sleeping lock?

I went through them all and it seems that could be (almost) so.

There is only vgem_fence_release() which calls timer_delete_sync(), and 
while __timer_delete_sync() has a comment saying 
del_timer_wait_running() has a sleeping slow path, I think this is only 
due to spinlocks becoming sleeping locks.

Due to this, PREEMPT_RT might be a problem for the RCU approach in 
general, not just for vgem.

Possibly if you enable it you would start seeing warnings fire for 
sleeping while preemption is disabled. Something to double check in case 
I got confused.

Hm actually, do you even need to move the RCU section around the 
.release() and .wait() if the premise of the series is that drivers 
which specify those will not be protected?

Regards,

Tvrtko

>   }
>   EXPORT_SYMBOL(dma_fence_release);
>   
> @@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
>   
>   static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
>   	bool was_set;
>   
>   	lockdep_assert_held(fence->lock);
> @@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return false;
>   
> -	if (!was_set && fence->ops->enable_signaling) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (!was_set && ops->enable_signaling) {
>   		trace_dma_fence_enable_signal(fence);
>   
> -		if (!fence->ops->enable_signaling(fence)) {
> +		if (!ops->enable_signaling(fence)) {
> +			rcu_read_unlock();
>   			dma_fence_signal_locked(fence);
>   			return false;
>   		}
>   	}
> +	rcu_read_unlock();
>   
>   	return true;
>   }
> @@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>    */
>   void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>   {
> -	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> -		fence->ops->set_deadline(fence, deadline);
> +	const struct dma_fence_ops *ops;
> +
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +		ops->set_deadline(fence, deadline);
> +	rcu_read_unlock();
>   }
>   EXPORT_SYMBOL(dma_fence_set_deadline);
>   
> @@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>   
>   	kref_init(&fence->refcount);
> -	fence->ops = ops;
> +	/*
> +	 * At first glance it is counter intuitive to protect a constant
> +	 * function pointer table by RCU, but this allows modules providing the
> +	 * function table to unload by waiting for an RCU grace period.
> +	 */
> +	RCU_INIT_POINTER(fence->ops, ops);
>   	INIT_LIST_HEAD(&fence->cb_list);
>   	fence->lock = lock;
>   	fence->context = context;
> @@ -1104,11 +1127,14 @@ EXPORT_SYMBOL(dma_fence_init64);
>    */
>   const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>   			 "RCU protection is required for safe access to returned string");
>   
> +	ops = rcu_dereference(fence->ops);
>   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> -		return fence->ops->get_driver_name(fence);
> +		return ops->get_driver_name(fence);
>   	else
>   		return "detached-driver";
>   }
> @@ -1136,11 +1162,14 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>    */
>   const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>   			 "RCU protection is required for safe access to returned string");
>   
> +	ops = rcu_dereference(fence->ops);
>   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   -		return fence->ops->get_timeline_name(fence);
>   +		return ops->get_timeline_name(fence);
>   	else
>   		return "signaled-timeline";
>   }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 64639e104110..38421a0c7c5b 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -66,7 +66,7 @@ struct seq_file;
>    */
>   struct dma_fence {
>   	spinlock_t *lock;
> -	const struct dma_fence_ops *ops;
> +	const struct dma_fence_ops __rcu *ops;
>   	/*
>   	 * We clear the callback list on kref_put so that by the time we
>   	 * release the fence it is unused. No one should be adding to the
> @@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>   static inline bool
>   dma_fence_is_signaled_locked(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return true;
>   
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>   		dma_fence_signal_locked(fence);
>   		return true;
>   	}
> +	rcu_read_unlock();
>   
>   	return false;
>   }
> @@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>   static inline bool
>   dma_fence_is_signaled(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return true;
>   
> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +	rcu_read_lock();
> +	ops = rcu_dereference(fence->ops);
> +	if (ops->signaled && ops->signaled(fence)) {
> +		rcu_read_unlock();
>   		dma_fence_signal(fence);
>   		return true;
>   	}
> +	rcu_read_unlock();
>   
>   	return false;
>   }


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Independence for dma_fences!
  2025-10-14 15:54   ` Christian König
@ 2025-10-17  8:32     ` Philipp Stanner
  2025-10-28 14:06       ` Christian König
  0 siblings, 1 reply; 47+ messages in thread
From: Philipp Stanner @ 2025-10-17  8:32 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter,
	tursulin
  Cc: dri-devel, amd-gfx

On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
> On 13.10.25 16:54, Philipp Stanner wrote:
> > On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
> > > Hi everyone,
> > > 
> > > dma_fences have always lived under the tyranny dictated by the module
> > > lifetime of their issuer, leading to crashes should anybody still hold
> > > a reference to a dma_fence when the module of the issuer was unloaded.
> > > 
> > > But those days are over! The patch set following this mail finally
> > > implements a way for issuers to release their dma_fences from this
> > > slavery and let them outlive the module which originally created them.
> > > 
> > > Previously various approaches have been discussed, including changing the
> > > locking semantics of the dma_fence callbacks (by me) as well as using the
> > > drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
> > > from their actual users.
> > > 
> > > Changing the locking semantics turned out to be much trickier than
> > > originally thought because especially on older drivers (nouveau, radeon,
> > > but also i915) these locking semantics are actually needed for correct
> > > operation.
> > > 
> > > Using the drm_scheduler as intermediate layer is still a good idea and
> > > should probably be implemented to make life simpler for some drivers, but
> > > it doesn't work for all use cases. Especially TLB flush fences, preemption
> > > fences and userqueue fences don't go through the drm scheduler because it
> > > doesn't make sense for them.
> > > 
> > > Tvrtko did some really nice prerequisite work by protecting the returned
> > > strings of the dma_fence_ops by RCU. This way dma_fence creators were
> > > able to just wait for an RCU grace period after fence signaling before
> > > it was safe to free those data structures.
> > > 
> > > Now this patch set here goes a step further and protects the whole
> > > dma_fence_ops structure by RCU, so that after the fence signals the
> > > pointer to the dma_fence_ops is set to NULL when neither a wait nor a
> > > release callback is given. All functionality which uses the dma_fence_ops
> > > reference is put inside an RCU critical section, except for the
> > > deprecated issuer specific wait and of course the optional release
> > > callback.
> > > 
> > > In addition to the RCU changes, the lock protecting the dma_fence state
> > > previously had to be allocated externally. This set here now changes the
> > > functionality to make that external lock optional and allows dma_fences
> > > to use an inline lock and be self contained.
> > 
> > Allowing for an embedded lock, is that actually necessary for the goals
> > of this series, or is it an optional change / improvement?
> 
> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
> 
> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
> 
> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
> 
> > If I understood you correctly at XDC you wanted to have an embedded
> > lock because it improves the memory footprint and because an external
> > lock couldn't achieve some goals about fence-signaling-order originally
> > intended. Can you elaborate on that?
> 
> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
> 
> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.

I mean, neither external nor internal lock can somehow force the driver
to signal fences in order, can they?

Only the driver can ensure this.

I am, however, considering modeling something like that on a
FenceContext object:

fctx.signal_all_fences_up_to_ordered(seqno);


P.

> 
> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
> 
> Regards,
> Christian.
> 
> > 
> > P.
> > 
> > 
> > > 
> > > The new approach is then applied to amdgpu allowing the module to be
> > > unloaded even when dma_fences issued by it are still around.
> > > 
> > > Please review and comment,
> > > Christian.
> > > 
> > 
> 



* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
  2025-10-16  8:56   ` Tvrtko Ursulin
@ 2025-10-17  9:14   ` Philipp Stanner
  2025-10-30 15:05     ` Christian König
  1 sibling, 1 reply; 47+ messages in thread
From: Philipp Stanner @ 2025-10-17  9:14 UTC (permalink / raw)
  To: Christian König, alexdeucher, simona.vetter, tursulin
  Cc: dri-devel, amd-gfx

On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
> When neither a release nor a wait operation is specified it is possible
> to let the dma_fence live on independent of the module who issued it.
> 
> This makes it possible to unload drivers and only wait for all their
> fences to signal.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>  include/linux/dma-fence.h   |  4 ++--
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 982f2b2a62c0..39f73edf3a33 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return -EINVAL;
>  
> +	/*
> +	 * When neither a release nor a wait operation is specified, set the ops
> +	 * pointer to NULL to allow the fence structure to become independent of
> +	 * the module which originally issued it.
> +	 */
> +	if (!fence->ops->release && !fence->ops->wait)
> +		RCU_INIT_POINTER(fence->ops, NULL);

OK, so the basic idea is that still living fences can't access driver
data or driver code anymore after the driver is unloaded. Good and
well, nice idea. We need something like that in Rust, too.

That's based on the rule that the driver, on unload, must signal all
the fences. Also OK.

However, how can that possibly fly by relying on the release callback
not being implemented? How many users don't need it, and could those
who implement release() be ported to.. sth else?


P.

> +
>  	/* Stash the cb_list before replacing it with the timestamp */
>  	list_replace(&fence->cb_list, &cb_list);
>  
> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
>  	trace_dma_fence_wait_start(fence);
> -	if (ops->wait) {
> +	if (ops && ops->wait) {
>  		/*
>  		 * Implementing the wait ops is deprecated and not supported for
>  		 * issuer independent fences, so it is ok to use the ops outside
> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>  	}
>  
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->release)
> +	if (ops && ops->release)
>  		ops->release(fence);
>  	else
>  		dma_fence_free(fence);
> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (!was_set && ops->enable_signaling) {
> +	if (!was_set && ops && ops->enable_signaling) {
>  		trace_dma_fence_enable_signal(fence);
>  
>  		if (!ops->enable_signaling(fence)) {
> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->set_deadline && !dma_fence_is_signaled(fence))
> +	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>  		ops->set_deadline(fence, deadline);
>  	rcu_read_unlock();
>  }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 38421a0c7c5b..e1ba1d53de88 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>  		rcu_read_unlock();
>  		dma_fence_signal_locked(fence);
>  		return true;
> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>  
>  	rcu_read_lock();
>  	ops = rcu_dereference(fence->ops);
> -	if (ops->signaled && ops->signaled(fence)) {
> +	if (ops && ops->signaled && ops->signaled(fence)) {
>  		rcu_read_unlock();
>  		dma_fence_signal(fence);
>  		return true;



* Re: [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence!
  2025-10-13 13:48 ` [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence! Christian König
@ 2025-10-17 22:22   ` Felix Kuehling
  2025-10-30 15:07     ` Christian König
  0 siblings, 1 reply; 47+ messages in thread
From: Felix Kuehling @ 2025-10-17 22:22 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter,
	tursulin
  Cc: dri-devel, amd-gfx


On 2025-10-13 09:48, Christian König wrote:
> This should allow amdkfd_fences to outlive the amdgpu module.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  6 ++++
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  | 36 +++++++------------
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  7 ++--
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  4 +--
>   4 files changed, 24 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 9e120c934cc1..35c59c784b7b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
>   #endif
>   #if IS_ENABLED(CONFIG_HSA_AMD)
>   bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
> +void amdkfd_fence_signal(struct dma_fence *f);
>   struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
>   void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
>   int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
> @@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>   	return false;
>   }
>   
> +static inline
> +void amdkfd_fence_signal(struct dma_fence *f)
> +{
> +}
> +
>   static inline
>   struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>   {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> index 09c919f72b6c..69bca4536326 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
>   		if (!svm_range_schedule_evict_svm_bo(fence))
>   			return true;
>   	}
> -	return false;
> -}
> -
> -/**
> - * amdkfd_fence_release - callback that fence can be freed
> - *
> - * @f: dma_fence
> - *
> - * This function is called when the reference count becomes zero.
> - * Drops the mm_struct reference and RCU schedules freeing up the fence.
> - */
> -static void amdkfd_fence_release(struct dma_fence *f)
> -{
> -	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> -
> -	/* Unconditionally signal the fence. The process is getting
> -	 * terminated.
> -	 */
> -	if (WARN_ON(!fence))
> -		return; /* Not an amdgpu_amdkfd_fence */
> -
>   	mmdrop(fence->mm);
> -	kfree_rcu(f, rcu);
> +	fence->mm = NULL;
> +	return false;
>   }
>   
>   /**
> @@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>   	return false;
>   }
>   
> +void amdkfd_fence_signal(struct dma_fence *f)
> +{
> +	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +	if (fence) {
> +		mmdrop(fence->mm);
> +		fence->mm = NULL;

Isn't fence->mm already NULL here if it was dropped in 
amdkfd_fence_enable_signaling?

Regards,
   Felix


> +	}
> +	dma_fence_signal(f);
> +}
> +
>   static const struct dma_fence_ops amdkfd_fence_ops = {
>   	.get_driver_name = amdkfd_fence_get_driver_name,
>   	.get_timeline_name = amdkfd_fence_get_timeline_name,
>   	.enable_signaling = amdkfd_fence_enable_signaling,
> -	.release = amdkfd_fence_release,
>   };
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ddfe30c13e9d..779d7701bac9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
>   	synchronize_rcu();
>   	ef = rcu_access_pointer(p->ef);
>   	if (ef)
> -		dma_fence_signal(ef);
> +		amdkfd_fence_signal(ef);
>   
>   	kfd_process_remove_sysfs(p);
>   	kfd_debugfs_remove_process(p);
> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>   static int signal_eviction_fence(struct kfd_process *p)
>   {
>   	struct dma_fence *ef;
> -	int ret;
>   
>   	rcu_read_lock();
>   	ef = dma_fence_get_rcu_safe(&p->ef);
> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>   	if (!ef)
>   		return -EINVAL;
>   
> -	ret = dma_fence_signal(ef);
> +	amdkfd_fence_signal(ef);
>   	dma_fence_put(ef);
>   
> -	return ret;
> +	return 0;
>   }
>   
>   static void evict_process_worker(struct work_struct *work)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 91609dd5730f..01ce2d853602 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
>   
>   	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
>   		/* We're not in the eviction worker. Signal the fence. */
> -		dma_fence_signal(&svm_bo->eviction_fence->base);
> +		amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>   	dma_fence_put(&svm_bo->eviction_fence->base);
>   	amdgpu_bo_unref(&svm_bo->bo);
>   	kfree(svm_bo);
> @@ -3628,7 +3628,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>   	mmap_read_unlock(mm);
>   	mmput(mm);
>   
> -	dma_fence_signal(&svm_bo->eviction_fence->base);
> +	amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>   
>   	/* This is the last reference to svm_bo, after svm_range_vram_node_free
>   	 * has been called in svm_migrate_vram_to_ram


* Re: [PATCH 01/15] dma-buf: cleanup dma_fence_describe
  2025-10-14 14:37   ` Tvrtko Ursulin
@ 2025-10-23  3:45     ` Matthew Brost
  0 siblings, 0 replies; 47+ messages in thread
From: Matthew Brost @ 2025-10-23  3:45 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Christian König, phasta, alexdeucher, simona.vetter,
	dri-devel, amd-gfx

On Tue, Oct 14, 2025 at 03:37:03PM +0100, Tvrtko Ursulin wrote:
> 
> On 13/10/2025 14:48, Christian König wrote:
> > The driver and timeline name are meaningless for signaled fences.
> > 
> > Drop them and also print the context number.
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >   drivers/dma-buf/dma-fence.c | 11 ++++++-----
> >   1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 3f78c56b58dc..f0539c73ed57 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -1001,17 +1001,18 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> >   {
> >   	const char __rcu *timeline;
> >   	const char __rcu *driver;
> > +	const char *signaled = "un";
> >   	rcu_read_lock();
> >   	timeline = dma_fence_timeline_name(fence);
> >   	driver = dma_fence_driver_name(fence);
> > -	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> > -		   rcu_dereference(driver),
> > -		   rcu_dereference(timeline),
> > -		   fence->seqno,
> > -		   dma_fence_is_signaled(fence) ? "" : "un");
> > +	if (dma_fence_is_signaled(fence))
> > +		timeline = driver = signaled = "";
> 
> FWIW you could avoid calling dma_fence_timeline_name() and
> dma_fence_driver_name() since you added the signaled check.
> 

+1 to avoid calling dma_fence_timeline_name / dma_fence_driver_name on
signaled fences.

Matt

> May end up slightly nicer than to override strings returned from helpers
> with a chained assignment.
> 
> Or even store the signaled status in a local bool and branch off two
> seq_printfs based on it.
> 
> > +
> > +	seq_printf(seq, "%llu %s %s seq %llu %ssignalled\n", fence->context,
> > +		   timeline, driver, fence->seqno, signaled);
> 
> I was initially worried if this string ends up anywhere which could be
> considered ABI but it seems debugfs only so changing the formatting is fine.
> 
> How about making dma_fence_describe() conditional on CONFIG_DEBUG_FS to set
> this in stone? (And dma_resv_describe..)
> 
> And maybe unify the %llu:%llu context:fence as the tracepoints use?
> 
> Altogether something like:
> 
> rcu_read_lock();
> 
> signaled = dma_fence_is_signaled(fence);
> 
> if (signaled)
> 	seq_printf(seq, "%llu:%llu signalled",
> 		   fence->context, fence->seqno);
> else
> 	seq_printf(seq, "%llu:%llu %s %s unsignalled",
> 		   fence->context, fence->seqno,
> 		   dma_fence_driver_name(fence),
> 		   dma_fence_timeline_name(fence));
> 
> Maybe more readable but up to you.
> 
> Regards,
> 
> Tvrtko
> >   	rcu_read_unlock();
> >   }
> 


* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-16 15:57     ` Tvrtko Ursulin
@ 2025-10-23  4:23       ` Matthew Brost
  2025-10-23  4:44         ` Matthew Brost
  2025-10-30 13:52       ` Christian König
  1 sibling, 1 reply; 47+ messages in thread
From: Matthew Brost @ 2025-10-23  4:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Christian König, phasta, alexdeucher, simona.vetter,
	dri-devel, amd-gfx

On Thu, Oct 16, 2025 at 04:57:37PM +0100, Tvrtko Ursulin wrote:
> 
> On 16/10/2025 09:56, Tvrtko Ursulin wrote:
> > 
> > On 13/10/2025 14:48, Christian König wrote:
> > > When neither a release nor a wait operation is specified it is possible
> > > to let the dma_fence live on independent of the module who issued it.
> > > 
> > > This makes it possible to unload drivers and only wait for all their
> > > fences to signal.
> > 
> > Have you looked at whether the requirement to not have the release and
> > wait callbacks will exclude some drivers from being able to benefit from
> > this?
> 
> I had a browse and this seems to be the situation:
> 
> Custom .wait:
>  - radeon, qxl, nouveau, i915
> 
> Those would therefore still be vulnerable to the unbind->unload sequence.
> Actually not sure about qxl, but other three are PCI so in theory at least.
> I915 at least supports unbind and unload.
> 
> Custom .release:
>  - vgem, nouveau, lima, pvr, i915, usb-gadget, industrialio, etnaviv, xe
> 
> Out of those, the following do not actually need a custom release and could
> probably be weaned off it:
>  - usb-gadget, industrialio, etnaviv, xe
> 
> (Xe would lose a debug assert and some would have their kfrees replaced with
> kfree_rcu. Plus build time asserts added so that the struct dma_fence remains
> first in the respective driver structs. It sounds feasible.)

FWIW, I pulled this series from Christian into Xe and attempted to
disconnect fences in Xe [1]. It seems to work in my local testing, but
let’s see what CI says.

I still needed a release callback [2] to maintain an external lock for
our HW fences and the dma-fence signaling IRQ, but it should now be
fully disconnected from the module. I coded this in about an hour, so
take it with a grain of salt.

Matt

[1] https://patchwork.freedesktop.org/series/156388/
[2] https://patchwork.freedesktop.org/patch/682962/?series=156388&rev=1

> 
> That would leave us with .release in:
>  - vgem, nouveau, lima, pvr, i915
> 
> Combined list of custom .wait + .release:
>  - radeon, qxl, nouveau, i915, lima, pvr, vgem
> 
> From those the ones which support unbind and module unload would remain
> potentially vulnerable to use after free.
> 
> It doesn't sound great to only solve it partially but maybe it is a
> reasonable next step. Where could we go from there to solve it for everyone?
> 
> Regards,
> 
> Tvrtko
> 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >   drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
> > >   include/linux/dma-fence.h   |  4 ++--
> > >   2 files changed, 14 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > index 982f2b2a62c0..39f73edf3a33 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct
> > > dma_fence *fence,
> > >                         &fence->flags)))
> > >           return -EINVAL;
> > > +    /*
> > > +     * When neither a release nor a wait operation is specified set
> > > the ops
> > > +     * pointer to NULL to allow the fence structure to become
> > > independent
> > > +     * who originally issued it.
> > > +     */
> > > +    if (!fence->ops->release && !fence->ops->wait)
> > > +        RCU_INIT_POINTER(fence->ops, NULL);
> > > +
> > >       /* Stash the cb_list before replacing it with the timestamp */
> > >       list_replace(&fence->cb_list, &cb_list);
> > > @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence,
> > > bool intr, signed long timeout)
> > >       rcu_read_lock();
> > >       ops = rcu_dereference(fence->ops);
> > >       trace_dma_fence_wait_start(fence);
> > > -    if (ops->wait) {
> > > +    if (ops && ops->wait) {
> > >           /*
> > >            * Implementing the wait ops is deprecated and not
> > > supported for
> > >            * issuer independent fences, so it is ok to use the ops
> > > outside
> > > @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
> > >       }
> > >       ops = rcu_dereference(fence->ops);
> > > -    if (ops->release)
> > > +    if (ops && ops->release)
> > >           ops->release(fence);
> > >       else
> > >           dma_fence_free(fence);
> > > @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct
> > > dma_fence *fence)
> > >       rcu_read_lock();
> > >       ops = rcu_dereference(fence->ops);
> > > -    if (!was_set && ops->enable_signaling) {
> > > +    if (!was_set && ops && ops->enable_signaling) {
> > >           trace_dma_fence_enable_signal(fence);
> > >           if (!ops->enable_signaling(fence)) {
> > > @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence
> > > *fence, ktime_t deadline)
> > >       rcu_read_lock();
> > >       ops = rcu_dereference(fence->ops);
> > > -    if (ops->set_deadline && !dma_fence_is_signaled(fence))
> > > +    if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
> > >           ops->set_deadline(fence, deadline);
> > >       rcu_read_unlock();
> > >   }
> > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > index 38421a0c7c5b..e1ba1d53de88 100644
> > > --- a/include/linux/dma-fence.h
> > > +++ b/include/linux/dma-fence.h
> > > @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
> > >       rcu_read_lock();
> > >       ops = rcu_dereference(fence->ops);
> > > -    if (ops->signaled && ops->signaled(fence)) {
> > > +    if (ops && ops->signaled && ops->signaled(fence)) {
> > >           rcu_read_unlock();
> > >           dma_fence_signal_locked(fence);
> > >           return true;
> > > @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
> > >       rcu_read_lock();
> > >       ops = rcu_dereference(fence->ops);
> > > -    if (ops->signaled && ops->signaled(fence)) {
> > > +    if (ops && ops->signaled && ops->signaled(fence)) {
> > >           rcu_read_unlock();
> > >           dma_fence_signal(fence);
> > >           return true;
> > 
> 


* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-23  4:23       ` Matthew Brost
@ 2025-10-23  4:44         ` Matthew Brost
  0 siblings, 0 replies; 47+ messages in thread
From: Matthew Brost @ 2025-10-23  4:44 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Christian König, phasta, alexdeucher, simona.vetter,
	dri-devel, amd-gfx

On Wed, Oct 22, 2025 at 09:23:50PM -0700, Matthew Brost wrote:
> On Thu, Oct 16, 2025 at 04:57:37PM +0100, Tvrtko Ursulin wrote:
> > 
> > On 16/10/2025 09:56, Tvrtko Ursulin wrote:
> > > 
> > > On 13/10/2025 14:48, Christian König wrote:
> > > > When neither a release nor a wait operation is specified it is possible
> > > > to let the dma_fence live on independent of the module who issued it.
> > > > 
> > > > This makes it possible to unload drivers and only wait for all their
> > > > fences to signal.
> > > 
> > > Have you looked at whether the requirement to not have the release and
> > > wait callbacks will exclude some drivers from being able to benefit from
> > > this?
> > 
> > I had a browse and this seems to be the situation:
> > 
> > Custom .wait:
> >  - radeon, qxl, nouveau, i915
> > 
> > Those would therefore still be vulnerable to the unbind->unload sequence.
> > Actually not sure about qxl, but other three are PCI so in theory at least.
> > I915 at least supports unbind and unload.
> > 
> > Custom .release:
> >  - vgem, nouveau, lima, pvr, i915, usb-gadget, industrialio, etnaviv, xe
> > 
> > Out of those there do not actually need a custom release and could probably
> > be weaned off it:
> >  - usb-gadget, industrialio, etnaviv, xe
> > 
> > (Xe would lose a debug assert and some would have their kfrees replaced with
> > kfree_rcu. Plus build time asserts added the struct dma-fence remains first
> > in the respective driver structs. It sounds feasible.)
> 
> FWIW, I pulled this series from Christian into Xe and attempted to
> disconnect fences in Xe [1]. It seems to work in my local testing, but
> let’s see what CI says.
> 
> I still needed a release callback [2] to maintain an external lock for

I realized after I sent this that the release CB lives in the static ops,
which disappear on driver unload. I can drop the need for a release CB.

Matt

> our HW fences and the dma-fence signaling IRQ, but it should now be
> fully disconnected from the module. I coded this in about an hour, so
> take it with a grain of salt.
> 
> Matt
> 
> [1] https://patchwork.freedesktop.org/series/156388/
> [2] https://patchwork.freedesktop.org/patch/682962/?series=156388&rev=1
> 
> > 
> > That would leave us with .release in:
> >  - vgem, nouveau, lima, pvr, i915
> > 
> > Combined list of custom .wait + .release:
> >  - radeon, qxl, nouveau, i915, lima, pvr, vgem
> > 
> > From those the ones which support unbind and module unload would remain
> > potentially vulnerable to use after free.
> > 
> > It doesn't sound great to only solve it partially but maybe it is a
> > reasonable next step. Where could we go from there to solve it for everyone?
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > ---
> > > >   drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
> > > >   include/linux/dma-fence.h   |  4 ++--
> > > >   2 files changed, 14 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index 982f2b2a62c0..39f73edf3a33 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct
> > > > dma_fence *fence,
> > > >                         &fence->flags)))
> > > >           return -EINVAL;
> > > > +    /*
> > > > +     * When neither a release nor a wait operation is specified set
> > > > the ops
> > > > +     * pointer to NULL to allow the fence structure to become
> > > > independent
> > > > +     * who originally issued it.
> > > > +     */
> > > > +    if (!fence->ops->release && !fence->ops->wait)
> > > > +        RCU_INIT_POINTER(fence->ops, NULL);
> > > > +
> > > >       /* Stash the cb_list before replacing it with the timestamp */
> > > >       list_replace(&fence->cb_list, &cb_list);
> > > > @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence,
> > > > bool intr, signed long timeout)
> > > >       rcu_read_lock();
> > > >       ops = rcu_dereference(fence->ops);
> > > >       trace_dma_fence_wait_start(fence);
> > > > -    if (ops->wait) {
> > > > +    if (ops && ops->wait) {
> > > >           /*
> > > >            * Implementing the wait ops is deprecated and not
> > > > supported for
> > > >            * issuer independent fences, so it is ok to use the ops
> > > > outside
> > > > @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
> > > >       }
> > > >       ops = rcu_dereference(fence->ops);
> > > > -    if (ops->release)
> > > > +    if (ops && ops->release)
> > > >           ops->release(fence);
> > > >       else
> > > >           dma_fence_free(fence);
> > > > @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct
> > > > dma_fence *fence)
> > > >       rcu_read_lock();
> > > >       ops = rcu_dereference(fence->ops);
> > > > -    if (!was_set && ops->enable_signaling) {
> > > > +    if (!was_set && ops && ops->enable_signaling) {
> > > >           trace_dma_fence_enable_signal(fence);
> > > >           if (!ops->enable_signaling(fence)) {
> > > > @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > > >       rcu_read_lock();
> > > >       ops = rcu_dereference(fence->ops);
> > > > -    if (ops->set_deadline && !dma_fence_is_signaled(fence))
> > > > +    if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
> > > >           ops->set_deadline(fence, deadline);
> > > >       rcu_read_unlock();
> > > >   }
> > > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > > index 38421a0c7c5b..e1ba1d53de88 100644
> > > > --- a/include/linux/dma-fence.h
> > > > +++ b/include/linux/dma-fence.h
> > > > @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
> > > >       rcu_read_lock();
> > > >       ops = rcu_dereference(fence->ops);
> > > > -    if (ops->signaled && ops->signaled(fence)) {
> > > > +    if (ops && ops->signaled && ops->signaled(fence)) {
> > > >           rcu_read_unlock();
> > > >           dma_fence_signal_locked(fence);
> > > >           return true;
> > > > @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
> > > >       rcu_read_lock();
> > > >       ops = rcu_dereference(fence->ops);
> > > > -    if (ops->signaled && ops->signaled(fence)) {
> > > > +    if (ops && ops->signaled && ops->signaled(fence)) {
> > > >           rcu_read_unlock();
> > > >           dma_fence_signal(fence);
> > > >           return true;
> > > 
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/15] dma-buf: inline spinlock for fence protection
  2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
  2025-10-16  9:26   ` Tvrtko Ursulin
@ 2025-10-23 18:09   ` Matthew Brost
  2025-10-30 15:14     ` Christian König
  1 sibling, 1 reply; 47+ messages in thread
From: Matthew Brost @ 2025-10-23 18:09 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On Mon, Oct 13, 2025 at 03:48:32PM +0200, Christian König wrote:
> Allow implementations to not give a spinlock to protect the fence
> internal state, instead a spinlock embedded into the fence structure
> itself is used in this case.
> 
> Apart from simplifying the handling for containers and the stub fence,
> this has the advantage of allowing implementations to issue fences
> without caring about their spinlock lifetime.
> 
> That in turn is necessary for independent fences which outlive the
> module that originally issued them.
> 

One thing Xe really wants to do is use a shared lock for HW fences,
since our IRQ handler walks the pending fence list under the shared lock
and signals the fences. I don’t think it would be desirable to split
this into an IRQ handler list lock and individual fence locks.

It would be great if we could come up with a way to support this model,
ensure it's safe for module unload, and document it clearly.

I’ve thought of a few possible ideas:

- After a fence signals, the dma-fence core should not be allowed to touch
  the external lock.
- Only export HW fences to the DRM scheduler, which guarantees that after
  HW signaling, it won’t perform any dangerous operations (e.g., making a
  call that takes the external lock). This really isn't a generic
  solution, though.
- Create an embedded dma-fence spinlock structure with a refcount and a
  dma-fence flag indicating that the external lock is refcounted. In
  dma_fence_free, drop the lock’s refcount.

In all of the above cases, the rule is that all module fences must be
signaled before unloading.
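
The third idea could be sketched roughly like this -- a userspace
stand-in just to illustrate the lifetime rules, where shared_fence_lock
and all helper names are made up, and the kernel version would of course
use kref and spinlock_t instead:

```c
/* Userspace sketch of the refcounted-lock idea above. All names here
 * are invented; in the kernel this would be a kref plus a spinlock_t.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct shared_fence_lock {
	atomic_flag lock;	/* stands in for spinlock_t */
	atomic_int refcount;	/* stands in for struct kref */
};

static struct shared_fence_lock *shared_lock_alloc(void)
{
	struct shared_fence_lock *s = calloc(1, sizeof(*s));

	atomic_flag_clear(&s->lock);
	atomic_init(&s->refcount, 1);	/* the driver's own reference */
	return s;
}

/* Each fence using the lock takes a reference at init time... */
static void shared_lock_get(struct shared_fence_lock *s)
{
	atomic_fetch_add(&s->refcount, 1);
}

/* ...and drops it in dma_fence_free(), so the allocation outlives
 * whichever side -- driver/IRQ handler or last fence -- lets go last.
 */
static void shared_lock_put(struct shared_fence_lock *s)
{
	if (atomic_fetch_sub(&s->refcount, 1) == 1)
		free(s);
}

static void shared_lock_lock(struct shared_fence_lock *s)
{
	while (atomic_flag_test_and_set(&s->lock))
		;	/* spin */
}

static void shared_lock_unlock(struct shared_fence_lock *s)
{
	atomic_flag_clear(&s->lock);
}
```

The "all module fences must be signaled before unloading" rule would
still hold; the refcount only keeps the lock memory valid for stragglers
that hold a fence reference past signaling.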

Thoughts?

Matt

> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c              | 54 ++++++++++++------------
>  drivers/dma-buf/sw_sync.c                | 14 +++---
>  drivers/dma-buf/sync_debug.h             |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 +++---
>  drivers/gpu/drm/drm_crtc.c               |  2 +-
>  drivers/gpu/drm/drm_writeback.c          |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c    |  5 ++-
>  drivers/gpu/drm/nouveau/nouveau_fence.c  |  3 +-
>  drivers/gpu/drm/qxl/qxl_release.c        |  3 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c    |  3 +-
>  drivers/gpu/drm/xe/xe_hw_fence.c         |  3 +-
>  drivers/gpu/drm/xe/xe_sched_job.c        |  4 +-
>  include/linux/dma-fence.h                | 42 +++++++++++++++++-
>  15 files changed, 99 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 39f73edf3a33..a0b328fdd90d 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>  }
>  #endif
>  
> -
>  /**
>   * dma_fence_signal_timestamp_locked - signal completion of a fence
>   * @fence: the fence to signal
> @@ -368,7 +367,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  	struct dma_fence_cb *cur, *tmp;
>  	struct list_head cb_list;
>  
> -	lockdep_assert_held(fence->lock);
> +	lockdep_assert_held(dma_fence_spinlock(fence));
>  
>  	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>  				      &fence->flags)))
> @@ -421,9 +420,9 @@ int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
>  	if (WARN_ON(!fence))
>  		return -EINVAL;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	ret = dma_fence_signal_timestamp_locked(fence, timestamp);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	return ret;
>  }
> @@ -475,9 +474,9 @@ int dma_fence_signal(struct dma_fence *fence)
>  
>  	tmp = dma_fence_begin_signalling();
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	dma_fence_end_signalling(tmp);
>  
> @@ -579,10 +578,10 @@ void dma_fence_release(struct kref *kref)
>  		 * don't leave chains dangling. We set the error flag first
>  		 * so that the callbacks know this signal is due to an error.
>  		 */
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock(fence, flags);
>  		fence->error = -EDEADLK;
>  		dma_fence_signal_locked(fence);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock(fence, flags);
>  	}
>  
>  	ops = rcu_dereference(fence->ops);
> @@ -612,7 +611,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>  	const struct dma_fence_ops *ops;
>  	bool was_set;
>  
> -	lockdep_assert_held(fence->lock);
> +	lockdep_assert_held(dma_fence_spinlock(fence));
>  
>  	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>  				   &fence->flags);
> @@ -648,9 +647,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	__dma_fence_enable_signaling(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  }
>  EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>  
> @@ -690,8 +689,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>  		return -ENOENT;
>  	}
>  
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>  	if (__dma_fence_enable_signaling(fence)) {
>  		cb->func = func;
>  		list_add_tail(&cb->node, &fence->cb_list);
> @@ -699,8 +697,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>  		INIT_LIST_HEAD(&cb->node);
>  		ret = -ENOENT;
>  	}
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	return ret;
>  }
> @@ -723,9 +720,9 @@ int dma_fence_get_status(struct dma_fence *fence)
>  	unsigned long flags;
>  	int status;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	status = dma_fence_get_status_locked(fence);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	return status;
>  }
> @@ -755,13 +752,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
>  	unsigned long flags;
>  	bool ret;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>  	ret = !list_empty(&cb->node);
>  	if (ret)
>  		list_del_init(&cb->node);
> -
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	return ret;
>  }
> @@ -800,8 +795,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>  	unsigned long flags;
>  	signed long ret = timeout ? timeout : 1;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> -
> +	dma_fence_lock(fence, flags);
>  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>  		goto out;
>  
> @@ -824,11 +818,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>  			__set_current_state(TASK_INTERRUPTIBLE);
>  		else
>  			__set_current_state(TASK_UNINTERRUPTIBLE);
> -		spin_unlock_irqrestore(fence->lock, flags);
> +		dma_fence_unlock(fence, flags);
>  
>  		ret = schedule_timeout(ret);
>  
> -		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_lock(fence, flags);
>  		if (ret > 0 && intr && signal_pending(current))
>  			ret = -ERESTARTSYS;
>  	}
> @@ -838,7 +832,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>  	__set_current_state(TASK_RUNNING);
>  
>  out:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  	return ret;
>  }
>  EXPORT_SYMBOL(dma_fence_default_wait);
> @@ -1046,7 +1040,6 @@ static void
>  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>  {
> -	BUG_ON(!lock);
>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>  
>  	kref_init(&fence->refcount);
> @@ -1057,10 +1050,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	 */
>  	RCU_INIT_POINTER(fence->ops, ops);
>  	INIT_LIST_HEAD(&fence->cb_list);
> -	fence->lock = lock;
>  	fence->context = context;
>  	fence->seqno = seqno;
>  	fence->flags = flags;
> +	if (lock) {
> +		fence->extern_lock = lock;
> +	} else {
> +		spin_lock_init(&fence->inline_lock);
> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
> +	}
>  	fence->error = 0;
>  
>  	trace_dma_fence_init(fence);
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 3c20f1d31cf5..8f48529214a4 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -155,12 +155,12 @@ static void timeline_fence_release(struct dma_fence *fence)
>  	struct sync_timeline *parent = dma_fence_parent(fence);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	if (!list_empty(&pt->link)) {
>  		list_del(&pt->link);
>  		rb_erase(&pt->node, &parent->pt_tree);
>  	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	sync_timeline_put(parent);
>  	dma_fence_free(fence);
> @@ -178,7 +178,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>  	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>  		if (ktime_before(deadline, pt->deadline))
>  			pt->deadline = deadline;
> @@ -186,7 +186,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>  		pt->deadline = deadline;
>  		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
>  	}
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  }
>  
>  static const struct dma_fence_ops timeline_fence_ops = {
> @@ -427,13 +427,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>  		goto put_fence;
>  	}
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>  		ret = -ENOENT;
>  		goto unlock;
>  	}
>  	data.deadline_ns = ktime_to_ns(pt->deadline);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	dma_fence_put(fence);
>  
> @@ -446,7 +446,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>  	return 0;
>  
>  unlock:
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  put_fence:
>  	dma_fence_put(fence);
>  
> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
> index 02af347293d0..c49324505b20 100644
> --- a/drivers/dma-buf/sync_debug.h
> +++ b/drivers/dma-buf/sync_debug.h
> @@ -47,7 +47,7 @@ struct sync_timeline {
>  
>  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>  {
> -	return container_of(fence->lock, struct sync_timeline, lock);
> +	return container_of(fence->extern_lock, struct sync_timeline, lock);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 5ec5c3ff22bb..fcc7a3fb93b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -468,10 +468,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
>  	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
>  		return false;
>  
> -	spin_lock_irqsave(fence->lock, flags);
> +	dma_fence_lock(fence, flags);
>  	if (!dma_fence_is_signaled_locked(fence))
>  		dma_fence_set_error(fence, -ENODATA);
> -	spin_unlock_irqrestore(fence->lock, flags);
> +	dma_fence_unlock(fence, flags);
>  
>  	while (!dma_fence_is_signaled(fence) &&
>  	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index db66b4232de0..db6516ce8335 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2774,8 +2774,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  	dma_fence_put(vm->last_unlocked);
>  	dma_fence_wait(vm->last_tlb_flush, false);
>  	/* Make sure that all fence callbacks have completed */
> -	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
> -	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
> +	dma_fence_lock(vm->last_tlb_flush, flags);
> +	dma_fence_unlock(vm->last_tlb_flush, flags);
>  	dma_fence_put(vm->last_tlb_flush);
>  
>  	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 77207f4e448e..4fc7f66b7d13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -631,20 +631,20 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>   */
>  static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>  {
> +	struct dma_fence *fence;
>  	unsigned long flags;
> -	spinlock_t *lock;
>  
>  	/*
>  	 * Workaround to stop racing between the fence signaling and handling
> -	 * the cb. The lock is static after initially setting it up, just make
> -	 * sure that the dma_fence structure isn't freed up.
> +	 * the cb.
>  	 */
>  	rcu_read_lock();
> -	lock = vm->last_tlb_flush->lock;
> +	fence = dma_fence_get_rcu(vm->last_tlb_flush);
>  	rcu_read_unlock();
>  
> -	spin_lock_irqsave(lock, flags);
> -	spin_unlock_irqrestore(lock, flags);
> +	dma_fence_lock(fence, flags);
> +	dma_fence_unlock(fence, flags);
> +	dma_fence_put(fence);
>  
>  	return atomic64_read(&vm->tlb_seq);
>  }
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 46655339003d..ad47f58cd159 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>  static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>  {
>  	BUG_ON(fence->ops != &drm_crtc_fence_ops);
> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>  }
>  
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> index 95b8a2e4bda6..624a4e8b6c99 100644
> --- a/drivers/gpu/drm/drm_writeback.c
> +++ b/drivers/gpu/drm/drm_writeback.c
> @@ -81,7 +81,7 @@
>   *	From userspace, this property will always read as zero.
>   */
>  
> -#define fence_to_wb_connector(x) container_of(x->lock, \
> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>  					      struct drm_writeback_connector, \
>  					      fence_lock)
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 1527b801f013..2956ed2ec073 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> +	unsigned long flags;
>  	bool ret = true;
>  
> -	spin_lock_irq(fence->lock);
> +	dma_fence_lock(fence, flags);
>  	if (!dma_fence_is_signaled_locked(fence))
>  		ret = false;
> -	spin_unlock_irq(fence->lock);
> +	dma_fence_unlock(fence, flags);
>  
>  	if (ret == true)
>  		dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index d5654e26d5bc..272b492c4d7c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -47,7 +47,8 @@ from_fence(struct dma_fence *fence)
>  static inline struct nouveau_fence_chan *
>  nouveau_fctx(struct nouveau_fence *fence)
>  {
> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
> +			    lock);
>  }
>  
>  static bool
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 05204a6a3fa8..1d346822c1f7 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -60,7 +60,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>  	struct qxl_device *qdev;
>  	unsigned long cur, end = jiffies + timeout;
>  
> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
> +	qdev = container_of(fence->extern_lock, struct qxl_device,
> +			    release_lock);
>  
>  	if (!wait_event_timeout(qdev->release_event,
>  				(dma_fence_is_signaled(fence) ||
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index c2294abbe753..346761172c1b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>  static struct vmw_fence_manager *
>  fman_from_fence(struct vmw_fence_obj *fence)
>  {
> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
> +			    lock);
>  }
>  
>  static void vmw_fence_obj_destroy(struct dma_fence *f)
> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
> index b2a0c46dfcd4..3456bec93c70 100644
> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
> @@ -144,7 +144,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>  
>  static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>  {
> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
> +			    lock);
>  }
>  
>  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index d21bf8f26964..ea7038475b4b 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -187,11 +187,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
>  	unsigned long irq_flags;
>  	bool signaled;
>  
> -	spin_lock_irqsave(fence->lock, irq_flags);
> +	dma_fence_lock(fence, irq_flags);
>  	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>  	if (!signaled)
>  		dma_fence_set_error(fence, error);
> -	spin_unlock_irqrestore(fence->lock, irq_flags);
> +	dma_fence_unlock(fence, irq_flags);
>  
>  	return signaled;
>  }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index e1ba1d53de88..fb416f500664 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -34,7 +34,8 @@ struct seq_file;
>   * @ops: dma_fence_ops associated with this fence
>   * @rcu: used for releasing fence with kfree_rcu
>   * @cb_list: list of all callbacks to call
> - * @lock: spin_lock_irqsave used for locking
> + * @extern_lock: external spin_lock_irqsave used for locking
> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>   * @context: execution context this fence belongs to, returned by
>   *           dma_fence_context_alloc()
>   * @seqno: the sequence number of this fence inside the execution context,
> @@ -48,6 +49,7 @@ struct seq_file;
>   * atomic ops (bit_*), so taking the spinlock will not be needed most
>   * of the time.
>   *
> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> @@ -65,7 +67,10 @@ struct seq_file;
>   * been completed, or never called at all.
>   */
>  struct dma_fence {
> -	spinlock_t *lock;
> +	union {
> +		spinlock_t *extern_lock;
> +		spinlock_t inline_lock;
> +	};
>  	const struct dma_fence_ops __rcu *ops;
>  	/*
>  	 * We clear the callback list on kref_put so that by the time we
> @@ -98,6 +103,7 @@ struct dma_fence {
>  };
>  
>  enum dma_fence_flag_bits {
> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
>  	DMA_FENCE_FLAG_SEQNO64_BIT,
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>  	} while (1);
>  }
>  
> +/**
> + * dma_fence_spinlock - return pointer to the spinlock protecting the fence
> + * @fence: the fence to get the lock from
> + *
> + * Return either the pointer to the embedded or the external spin lock.
> + */
> +static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> +{
> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> +		&fence->inline_lock : fence->extern_lock;
> +}
> +
> +/**
> + * dma_fence_lock - irqsave lock the fence
> + * @fence: the fence to lock
> + * @flags: where to store the CPU flags.
> + *
> + * Lock the fence, preventing it from changing to the signaled state.
> + */
> +#define dma_fence_lock(fence, flags)	\
> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
> +
> +/**
> + * dma_fence_unlock - unlock the fence and irqrestore
> + * @fence: the fence to unlock
> + * @flags: the CPU flags to restore
> + *
> + * Unlock the fence, allowing it to change its state to signaled again.
> + */
> +#define dma_fence_unlock(fence, flags)	\
> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
> +
>  #ifdef CONFIG_LOCKDEP
>  bool dma_fence_begin_signalling(void);
>  void dma_fence_end_signalling(bool cookie);
> -- 
> 2.43.0
> 


* Re: [PATCH 02/15] dma-buf: rework stub fence initialisation
  2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
  2025-10-14 15:03   ` Tvrtko Ursulin
@ 2025-10-24  7:29   ` Tvrtko Ursulin
  1 sibling, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-24  7:29 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 13/10/2025 14:48, Christian König wrote:
> Instead of doing this on the first call of the function, just initialize
> the stub fence during kernel load.
> 
> This has the clear advantage of lower overhead and also doesn't rely on
> the ops to not be NULL any more.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
>   1 file changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index f0539c73ed57..51ee13d005bc 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
>   	.get_timeline_name = dma_fence_stub_get_name,
>   };
>   
> +static int __init dma_fence_init_stub(void)
> +{
> +	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> +		       &dma_fence_stub_lock, 0, 0);
> +
> +	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +		&dma_fence_stub.flags);
> +
> +	dma_fence_signal_locked(&dma_fence_stub);

The kernel test robot reports a lockdep_assert_held(fence->lock) splat 
inside here. Probably just a copy-and-paste error; at least I don't see 
a reason why dma_fence_signal() couldn't be called instead.

Regards,

Tvrtko

> +	return 0;
> +}
> +subsys_initcall(dma_fence_init_stub);
> +
>   /**
>    * dma_fence_get_stub - return a signaled fence
>    *
> - * Return a stub fence which is already signaled. The fence's
> - * timestamp corresponds to the first time after boot this
> - * function is called.
> + * Return a stub fence which is already signaled. The fence's timestamp
> + * corresponds to the initialisation time of the linux kernel.
>    */
>   struct dma_fence *dma_fence_get_stub(void)
>   {
> -	spin_lock(&dma_fence_stub_lock);
> -	if (!dma_fence_stub.ops) {
> -		dma_fence_init(&dma_fence_stub,
> -			       &dma_fence_stub_ops,
> -			       &dma_fence_stub_lock,
> -			       0, 0);
> -
> -		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> -			&dma_fence_stub.flags);
> -
> -		dma_fence_signal_locked(&dma_fence_stub);
> -	}
> -	spin_unlock(&dma_fence_stub_lock);
> -
>   	return dma_fence_get(&dma_fence_stub);
>   }
>   EXPORT_SYMBOL(dma_fence_get_stub);



* Re: Independence for dma_fences!
  2025-10-17  8:32     ` Philipp Stanner
@ 2025-10-28 14:06       ` Christian König
  2025-10-29 20:53         ` Matthew Brost
  0 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-28 14:06 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

On 10/17/25 10:32, Philipp Stanner wrote:
> On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
>> On 13.10.25 16:54, Philipp Stanner wrote:
>>> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
>>>> Hi everyone,
>>>>
>>>> dma_fences have ever lived under the tyranny dictated by the module
>>>> lifetime of their issuer, leading to crashes should anybody still holding
>>>> a reference to a dma_fence when the module of the issuer was unloaded.
>>>>
>>>> But those days are over! The patch set following this mail finally
>>>> implements a way for issuers to release their dma_fence out of this
>>>> slavery and outlive the module who originally created them.
>>>>
>>>> Previously various approaches have been discussed, including changing the
>>>> locking semantics of the dma_fence callbacks (by me) as well as using the
>>>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>>>> from their actual users.
>>>>
>>>> Changing the locking semantics turned out to be much more trickier than
>>>> originally thought because especially on older drivers (nouveau, radeon,
>>>> but also i915) this locking semantics is actually needed for correct
>>>> operation.
>>>>
>>>> Using the drm_scheduler as intermediate layer is still a good idea and
>>>> should probably be implemented to make live simpler for some drivers, but
>>>> doesn't work for all use cases. Especially TLB flush fences, preemption
>>>> fences and userqueue fences don't go through the drm scheduler because it
>>>> doesn't make sense for them.
>>>>
>>>> Tvrtko did some really nice prerequisite work by protecting the returned
>>>> strings of the dma_fence_ops by RCU. This way dma_fence creators where
>>>> able to just wait for an RCU grace period after fence signaling before
>>>> they could be save to free those data structures.
>>>>
>>>> Now this patch set here goes a step further and protects the whole
>>>> dma_fence_ops structure by RCU, so that after the fence signals the
>>>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>>>> release callback given. All functionality which use the dma_fence_ops
>>>> reference are put inside an RCU critical section, except for the
>>>> deprecated issuer specific wait and of course the optional release
>>>> callback.
>>>>
>>>> Additional to the RCU changes the lock protecting the dma_fence state
>>>> previously had to be allocated external. This set here now changes the
>>>> functionality to make that external lock optional and allows dma_fences
>>>> to use an inline lock and be self contained.
>>>
>>> Allowing for an embedded lock, is that actually necessary for the goals
>>> of this series, or is it an optional change / improvement?
>>
>> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
>>
>> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
>>
>> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
>>
>>> If I understood you correctly at XDC you wanted to have an embedded
>>> lock because it improves the memory footprint and because an external
>>> lock couldn't achieve some goals about fence-signaling-order originally
>>> intended. Can you elaborate on that?
>>
>> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
>>
>> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.
> 
> I mean, neither external nor internal lock can somehow force the driver
> to signal fences in order, can they?

Nope, as I said before, this approach is actually pretty useless.

> Only the driver can ensure this.

Only when the signaled callback is not implemented, which is the case for basically all drivers.

So the whole point of sharing the lock just doesn't exist any more; it's just that changing it all at once, as I tried before, results in a way too big patch.

> 
> I am, however, considering modeling something like that on a
> FenceContext object:
> 
> fctx.signal_all_fences_up_to_ordered(seqno);

Yeah, I have patches for that as well, but then found that amdgpu's TLB fences trigger that check, and I won't have time to fix it.
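
For reference, the ordering check I mean is roughly the following -- a
standalone toy sketch, where the toy_* names and the helper itself are
invented for illustration and none of this is the actual patch:

```c
/* Toy sketch of a fence context signaling every pending fence with
 * seqno <= the given value, in seqno order. The assert is the ordering
 * invariant such a helper would enforce; in the kernel the context
 * lock would be held across the whole walk.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

struct toy_fence {
	unsigned long long seqno;
	bool signaled;
};

struct toy_fence_context {
	struct toy_fence *pending[MAX_PENDING];	/* kept sorted by seqno */
	size_t nr_pending;
	unsigned long long last_signaled;
};

static void signal_all_fences_up_to_ordered(struct toy_fence_context *ctx,
					    unsigned long long seqno)
{
	size_t i = 0;

	while (i < ctx->nr_pending && ctx->pending[i]->seqno <= seqno) {
		struct toy_fence *f = ctx->pending[i];

		/* fences on one context must signal in seqno order */
		assert(f->seqno > ctx->last_signaled);
		f->signaled = true;
		ctx->last_signaled = f->seqno;
		i++;
	}

	/* compact the list, dropping the signaled entries */
	for (size_t j = i; j < ctx->nr_pending; j++)
		ctx->pending[j - i] = ctx->pending[j];
	ctx->nr_pending -= i;
}
```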



> 
> 
> P.
> 
>>
>> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
>>
>> Regards,
>> Christian.
>>
>>>
>>> P.
>>>
>>>
>>>>
>>>> The new approach is then applied to amdgpu allowing the module to be
>>>> unloaded even when dma_fences issued by it are still around.
>>>>
>>>> Please review and comment,
>>>> Christian.
>>>>
>>>
>>
> 



* Re: Independence for dma_fences!
  2025-10-28 14:06       ` Christian König
@ 2025-10-29 20:53         ` Matthew Brost
  2025-10-30 10:59           ` Christian König
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Brost @ 2025-10-29 20:53 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On Tue, Oct 28, 2025 at 03:06:22PM +0100, Christian König wrote:
> On 10/17/25 10:32, Philipp Stanner wrote:
> > On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
> >> On 13.10.25 16:54, Philipp Stanner wrote:
> >>> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
> >>>> Hi everyone,
> >>>>
> >>>> dma_fences have ever lived under the tyranny dictated by the module
> >>>> lifetime of their issuer, leading to crashes should anybody still holding
> >>>> a reference to a dma_fence when the module of the issuer was unloaded.
> >>>>
> >>>> But those days are over! The patch set following this mail finally
> >>>> implements a way for issuers to release their dma_fence out of this
> >>>> slavery and outlive the module who originally created them.
> >>>>
> >>>> Previously various approaches have been discussed, including changing the
> >>>> locking semantics of the dma_fence callbacks (by me) as well as using the
> >>>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
> >>>> from their actual users.
> >>>>
> >>>> Changing the locking semantics turned out to be much more trickier than
> >>>> originally thought because especially on older drivers (nouveau, radeon,
> >>>> but also i915) this locking semantics is actually needed for correct
> >>>> operation.
> >>>>
> >>>> Using the drm_scheduler as intermediate layer is still a good idea and
> >>>> should probably be implemented to make live simpler for some drivers, but
> >>>> doesn't work for all use cases. Especially TLB flush fences, preemption
> >>>> fences and userqueue fences don't go through the drm scheduler because it
> >>>> doesn't make sense for them.
> >>>>
> >>>> Tvrtko did some really nice prerequisite work by protecting the returned
> >>>> strings of the dma_fence_ops by RCU. This way dma_fence creators where
> >>>> able to just wait for an RCU grace period after fence signaling before
> >>>> they could be save to free those data structures.
> >>>>
> >>>> Now this patch set here goes a step further and protects the whole
> >>>> dma_fence_ops structure by RCU, so that after the fence signals the
> >>>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
> >>>> release callback given. All functionality which use the dma_fence_ops
> >>>> reference are put inside an RCU critical section, except for the
> >>>> deprecated issuer specific wait and of course the optional release
> >>>> callback.
> >>>>
> >>>> Additional to the RCU changes the lock protecting the dma_fence state
> >>>> previously had to be allocated external. This set here now changes the
> >>>> functionality to make that external lock optional and allows dma_fences
> >>>> to use an inline lock and be self contained.
> >>>
> >>> Allowing for an embedded lock, is that actually necessary for the goals
> >>> of this series, or is it an optional change / improvement?
> >>
> >> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
> >>
> >> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
> >>
> >> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
> >>
> >>> If I understood you correctly at XDC you wanted to have an embedded
> >>> lock because it improves the memory footprint and because an external
> >>> lock couldn't achieve some goals about fence-signaling-order originally
> >>> intended. Can you elaborate on that?
> >>
> >> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
> >>
> >> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.

Not to derail the conversation, but I noticed that dma-fence-arrays can,
in fact, signal out of order. The issue lies in the dma-fence-array
callback, which signals the fence using irq_work_queue. Internally,
irq_work uses an llist, a LIFO structure. So, if two dma-fence-arrays
have all their fences signaled from a thread, the IRQ work that signals
each individual dma-fence-array will execute out of order.

We should probably fix this.

Matt

> > 
> > I mean, neither external nor internal lock can somehow force the driver
> > to signal fences in order, can they?
> 
> Nope, as I said before this approach is actually pretty useless.
> 
> > Only the driver can ensure this.
> 
> Only when the signaled callback is not implemented which basically all driver do.
> 
> So the whole point of sharing the lock is just not existent any more, it's just that changing it all at once as I tried before results in a way to big patch.
> 
> > 
> > I am, however, considering modeling something like that on a
> > FenceContext object:
> > 
> > fctx.signal_all_fences_up_to_ordered(seqno);
> 
> Yeah, I have patches for that as well. But then found that amdgpus TLB fences trigger that check and I won't have time to fix it.
> 
> 
> 
> > 
> > 
> > P.
> > 
> >>
> >> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> P.
> >>>
> >>>
> >>>>
> >>>> The new approach is then applied to amdgpu allowing the module to be
> >>>> unloaded even when dma_fences issued by it are still around.
> >>>>
> >>>> Please review and comment,
> >>>> Christian.
> >>>>
> >>>
> >>
> > 
> 


* Re: Independence for dma_fences!
  2025-10-29 20:53         ` Matthew Brost
@ 2025-10-30 10:59           ` Christian König
  2025-10-31 17:44             ` Matthew Brost
  0 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-30 10:59 UTC (permalink / raw)
  To: Matthew Brost
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On 10/29/25 21:53, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 03:06:22PM +0100, Christian König wrote:
>> On 10/17/25 10:32, Philipp Stanner wrote:
>>> On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
>>>> On 13.10.25 16:54, Philipp Stanner wrote:
>>>>> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> dma_fences have ever lived under the tyranny dictated by the module
>>>>>> lifetime of their issuer, leading to crashes should anybody still holding
>>>>>> a reference to a dma_fence when the module of the issuer was unloaded.
>>>>>>
>>>>>> But those days are over! The patch set following this mail finally
>>>>>> implements a way for issuers to release their dma_fence out of this
>>>>>> slavery and outlive the module who originally created them.
>>>>>>
>>>>>> Previously various approaches have been discussed, including changing the
>>>>>> locking semantics of the dma_fence callbacks (by me) as well as using the
>>>>>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>>>>>> from their actual users.
>>>>>>
>>>>>> Changing the locking semantics turned out to be much more trickier than
>>>>>> originally thought because especially on older drivers (nouveau, radeon,
>>>>>> but also i915) this locking semantics is actually needed for correct
>>>>>> operation.
>>>>>>
>>>>>> Using the drm_scheduler as intermediate layer is still a good idea and
>>>>>> should probably be implemented to make live simpler for some drivers, but
>>>>>> doesn't work for all use cases. Especially TLB flush fences, preemption
>>>>>> fences and userqueue fences don't go through the drm scheduler because it
>>>>>> doesn't make sense for them.
>>>>>>
>>>>>> Tvrtko did some really nice prerequisite work by protecting the returned
>>>>>> strings of the dma_fence_ops by RCU. This way dma_fence creators where
>>>>>> able to just wait for an RCU grace period after fence signaling before
>>>>>> they could be save to free those data structures.
>>>>>>
>>>>>> Now this patch set here goes a step further and protects the whole
>>>>>> dma_fence_ops structure by RCU, so that after the fence signals the
>>>>>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>>>>>> release callback given. All functionality which use the dma_fence_ops
>>>>>> reference are put inside an RCU critical section, except for the
>>>>>> deprecated issuer specific wait and of course the optional release
>>>>>> callback.
>>>>>>
>>>>>> Additional to the RCU changes the lock protecting the dma_fence state
>>>>>> previously had to be allocated external. This set here now changes the
>>>>>> functionality to make that external lock optional and allows dma_fences
>>>>>> to use an inline lock and be self contained.
>>>>>
>>>>> Allowing for an embedded lock, is that actually necessary for the goals
>>>>> of this series, or is it an optional change / improvement?
>>>>
>>>> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
>>>>
>>>> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
>>>>
>>>> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
>>>>
>>>>> If I understood you correctly at XDC you wanted to have an embedded
>>>>> lock because it improves the memory footprint and because an external
>>>>> lock couldn't achieve some goals about fence-signaling-order originally
>>>>> intended. Can you elaborate on that?
>>>>
>>>> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
>>>>
>>>> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.
> 
> Not to derail the conversation, but I noticed that dma-fence-arrays can,
> in fact, signal out of order. The issue lies in dma-fence-cb, which
> signals the fence using irq_queue_work. Internally, irq_queue_work uses
> llist, a LIFO structure. So, if two dma-fence-arrays have all their
> fences signaled from a thread, the IRQ work that signals each individual
> dma-fence-array will execute out of order.
> 
> We should probably fix this.

No, we don't. That's what I'm trying to point out all the time.

The original idea of sharing the lock was to guarantee that fences signal in order, but that never worked correctly even for driver fences.

The background is the optimization we do in the signaling fast path. E.g. when dma_fence_is_signaled() is called.

This means that when fences A, B and C are submitted to the HW it is perfectly possible that somebody queries the status of fence B but not of A and C, and that this query is faster than the interrupt which signals A and C.

So in this scenario B signals before A.

The only way to avoid that is to not implement the fast path, and as far as I know no real HW driver does that because it makes your driver horribly slow.

So off to the trash bin with the signaling order; things have worked for over 10 years without it and as far as I know nobody has complained about it.

Regards,
Christian.
 

> 
> Matt
> 
>>>
>>> I mean, neither external nor internal lock can somehow force the driver
>>> to signal fences in order, can they?
>>
>> Nope, as I said before this approach is actually pretty useless.
>>
>>> Only the driver can ensure this.
>>
>> Only when the signaled callback is not implemented which basically all driver do.
>>
>> So the whole point of sharing the lock is just not existent any more, it's just that changing it all at once as I tried before results in a way to big patch.
>>
>>>
>>> I am, however, considering modeling something like that on a
>>> FenceContext object:
>>>
>>> fctx.signal_all_fences_up_to_ordered(seqno);
>>
>> Yeah, I have patches for that as well. But then found that amdgpus TLB fences trigger that check and I won't have time to fix it.
>>
>>
>>
>>>
>>>
>>> P.
>>>
>>>>
>>>> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> P.
>>>>>
>>>>>
>>>>>>
>>>>>> The new approach is then applied to amdgpu allowing the module to be
>>>>>> unloaded even when dma_fences issued by it are still around.
>>>>>>
>>>>>> Please review and comment,
>>>>>> Christian.
>>>>>>
>>>>>
>>>>
>>>
>>



* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-16 15:57     ` Tvrtko Ursulin
  2025-10-23  4:23       ` Matthew Brost
@ 2025-10-30 13:52       ` Christian König
  2025-10-31 10:31         ` Tvrtko Ursulin
  1 sibling, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-30 13:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, phasta, alexdeucher, simona.vetter; +Cc: dri-devel, amd-gfx

Hi Tvrtko,

On 10/16/25 17:57, Tvrtko Ursulin wrote:
> On 16/10/2025 09:56, Tvrtko Ursulin wrote:
>>
>> On 13/10/2025 14:48, Christian König wrote:
>>> When neither a release nor a wait operation is specified it is possible
>>> to let the dma_fence live on independent of the module who issued it.
>>>
>>> This makes it possible to unload drivers and only wait for all their
>>> fences to signal.
>>
>> Have you looked at whether the requirement to not have the release and wait callbacks will exclude some drivers from being able to benefit from this?
> 
> I had a browse and this seems to be the situation:

Oh, thanks a lot for doing that!

> 
> Custom .wait:
>  - radeon, qxl, nouveau, i915
> 
> Those would therefore still be vulnerable to the unbind->unload sequence. Actually not sure about qxl, but other three are PCI so in theory at least. I915 at least supports unbind and unload.

radeon, yeah I know that is because of the reset handling there. Not going to change and as maintainer I honestly don't care.

qxl, pretty outdated as well and probably not worth fixing it.

nouveau, no idea why that is there in the first place. Philipp?

i915, that is really surprising. What is the reason for that?

> Custom .release:
>  - vgem, nouveau, lima, pvr, i915, usb-gadget, industrialio, etnaviv, xe
> 
> Out of those there do not actually need a custom release and could probably be weaned off it:
>  - usb-gadget, industrialio, etnaviv, xe
> 
> (Xe would lose a debug assert and some would have their kfrees replaced with kfree_rcu. Plus build time asserts added the struct dma-fence remains first in the respective driver structs. It sounds feasible.)

Oh, crap! Using kfree_rcu for dma_fences is an absolute must have!

Where have you seen that? This is obviously a bug in the drivers doing that.

> That would leave us with .release in:
>  - vgem, nouveau, lima, pvr, i915
> 
> Combined list of custom .wait + .release:
>  - radeon, qxl, nouveau, i915, lima, pvr, vgem
> 
> From those the ones which support unbind and module unload would remain potentially vulnerable to use after free.
> 
> It doesn't sound great to only solve it partially but maybe it is a reasonable next step. Where could we go from there to solve it for everyone?

Well, I only see the way of getting rid of the legacy stuff (like the ->wait callback) for everybody who cares about their module being unloadable.

But I'm pretty sure that for things like radeon and qxl we don't care.

Regards,
Christian.


> 
> Regards,
> 
> Tvrtko
> 
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>>>   include/linux/dma-fence.h   |  4 ++--
>>>   2 files changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 982f2b2a62c0..39f73edf3a33 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>                         &fence->flags)))
>>>           return -EINVAL;
>>> +    /*
>>> +     * When neither a release nor a wait operation is specified set the ops
>>> +     * pointer to NULL to allow the fence structure to become independent
>>> +     * who originally issued it.
>>> +     */
>>> +    if (!fence->ops->release && !fence->ops->wait)
>>> +        RCU_INIT_POINTER(fence->ops, NULL);
>>> +
>>>       /* Stash the cb_list before replacing it with the timestamp */
>>>       list_replace(&fence->cb_list, &cb_list);
>>> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>>       rcu_read_lock();
>>>       ops = rcu_dereference(fence->ops);
>>>       trace_dma_fence_wait_start(fence);
>>> -    if (ops->wait) {
>>> +    if (ops && ops->wait) {
>>>           /*
>>>            * Implementing the wait ops is deprecated and not supported for
>>>            * issuer independent fences, so it is ok to use the ops outside
>>> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>>>       }
>>>       ops = rcu_dereference(fence->ops);
>>> -    if (ops->release)
>>> +    if (ops && ops->release)
>>>           ops->release(fence);
>>>       else
>>>           dma_fence_free(fence);
>>> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>>       rcu_read_lock();
>>>       ops = rcu_dereference(fence->ops);
>>> -    if (!was_set && ops->enable_signaling) {
>>> +    if (!was_set && ops && ops->enable_signaling) {
>>>           trace_dma_fence_enable_signal(fence);
>>>           if (!ops->enable_signaling(fence)) {
>>> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>       rcu_read_lock();
>>>       ops = rcu_dereference(fence->ops);
>>> -    if (ops->set_deadline && !dma_fence_is_signaled(fence))
>>> +    if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>>>           ops->set_deadline(fence, deadline);
>>>       rcu_read_unlock();
>>>   }
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 38421a0c7c5b..e1ba1d53de88 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>       rcu_read_lock();
>>>       ops = rcu_dereference(fence->ops);
>>> -    if (ops->signaled && ops->signaled(fence)) {
>>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>>           rcu_read_unlock();
>>>           dma_fence_signal_locked(fence);
>>>           return true;
>>> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>>>       rcu_read_lock();
>>>       ops = rcu_dereference(fence->ops);
>>> -    if (ops->signaled && ops->signaled(fence)) {
>>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>>           rcu_read_unlock();
>>>           dma_fence_signal(fence);
>>>           return true;
>>
> 



* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-17  9:14   ` Philipp Stanner
@ 2025-10-30 15:05     ` Christian König
  0 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-30 15:05 UTC (permalink / raw)
  To: phasta, alexdeucher, simona.vetter, tursulin; +Cc: dri-devel, amd-gfx

On 10/17/25 11:14, Philipp Stanner wrote:
> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
>> When neither a release nor a wait operation is specified it is possible
>> to let the dma_fence live on independent of the module who issued it.
>>
>> This makes it possible to unload drivers and only wait for all their
>> fences to signal.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>>  include/linux/dma-fence.h   |  4 ++--
>>  2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 982f2b2a62c0..39f73edf3a33 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>  				      &fence->flags)))
>>  		return -EINVAL;
>>  
>> +	/*
>> +	 * When neither a release nor a wait operation is specified set the ops
>> +	 * pointer to NULL to allow the fence structure to become independent
>> +	 * who originally issued it.
>> +	 */
>> +	if (!fence->ops->release && !fence->ops->wait)
>> +		RCU_INIT_POINTER(fence->ops, NULL);
> 
> OK, so the basic idea is that still living fences can't access driver
> data or driver code anymore after the driver is unloaded. Good and
> well, nice idea. We need something like that in Rust, too.
> 
> That's based on the rule that the driver, on unload, must signal all
> the fences. Also OK.
> 
> However, how can that possibly fly by relying on the release callback
> not being implemented? How many users don't need it, and could those
> who implement release() be ported to.. sth else?

As far as I can see the only ones which really need the ->release callback for technical reasons are the DRM scheduler fence and the dma_fence_array and dma_fence_chain containers.

For the DRM scheduler fence it is just the finished fence which needs to drop the reference to the scheduled fence because we can now be sure that nobody can cast the fence any more.

For the dma_fence_array we could actually clean up the state on signaling, but that would need some more cleanup in the framework.

For the dma_fence_chain it is a must have to avoid potential kernel stack overrun.

Apart from that, all drivers should be able to clean up the internal state necessary for signaling when they actually signal.

Regards,
Christian.

> 
> 
> P.
> 
>> +
>>  	/* Stash the cb_list before replacing it with the timestamp */
>>  	list_replace(&fence->cb_list, &cb_list);
>>  
>> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>  	rcu_read_lock();
>>  	ops = rcu_dereference(fence->ops);
>>  	trace_dma_fence_wait_start(fence);
>> -	if (ops->wait) {
>> +	if (ops && ops->wait) {
>>  		/*
>>  		 * Implementing the wait ops is deprecated and not supported for
>>  		 * issuer independent fences, so it is ok to use the ops outside
>> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>>  	}
>>  
>>  	ops = rcu_dereference(fence->ops);
>> -	if (ops->release)
>> +	if (ops && ops->release)
>>  		ops->release(fence);
>>  	else
>>  		dma_fence_free(fence);
>> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>  
>>  	rcu_read_lock();
>>  	ops = rcu_dereference(fence->ops);
>> -	if (!was_set && ops->enable_signaling) {
>> +	if (!was_set && ops && ops->enable_signaling) {
>>  		trace_dma_fence_enable_signal(fence);
>>  
>>  		if (!ops->enable_signaling(fence)) {
>> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>  
>>  	rcu_read_lock();
>>  	ops = rcu_dereference(fence->ops);
>> -	if (ops->set_deadline && !dma_fence_is_signaled(fence))
>> +	if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>>  		ops->set_deadline(fence, deadline);
>>  	rcu_read_unlock();
>>  }
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 38421a0c7c5b..e1ba1d53de88 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>  
>>  	rcu_read_lock();
>>  	ops = rcu_dereference(fence->ops);
>> -	if (ops->signaled && ops->signaled(fence)) {
>> +	if (ops && ops->signaled && ops->signaled(fence)) {
>>  		rcu_read_unlock();
>>  		dma_fence_signal_locked(fence);
>>  		return true;
>> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>>  
>>  	rcu_read_lock();
>>  	ops = rcu_dereference(fence->ops);
>> -	if (ops->signaled && ops->signaled(fence)) {
>> +	if (ops && ops->signaled && ops->signaled(fence)) {
>>  		rcu_read_unlock();
>>  		dma_fence_signal(fence);
>>  		return true;
> 



* Re: [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence!
  2025-10-17 22:22   ` Felix Kuehling
@ 2025-10-30 15:07     ` Christian König
  2025-10-30 20:04       ` Felix Kuehling
  0 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-10-30 15:07 UTC (permalink / raw)
  To: Felix Kuehling, phasta, alexdeucher, simona.vetter, tursulin
  Cc: dri-devel, amd-gfx



On 10/18/25 00:22, Felix Kuehling wrote:
> 
> On 2025-10-13 09:48, Christian König wrote:
>> This should allow amdkfd_fences to outlive the amdgpu module.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  6 ++++
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  | 36 +++++++------------
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  7 ++--
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  4 +--
>>   4 files changed, 24 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 9e120c934cc1..35c59c784b7b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
>>   #endif
>>   #if IS_ENABLED(CONFIG_HSA_AMD)
>>   bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
>> +void amdkfd_fence_signal(struct dma_fence *f);
>>   struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
>>   void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
>>   int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
>> @@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>>       return false;
>>   }
>>   +static inline
>> +void amdkfd_fence_signal(struct dma_fence *f)
>> +{
>> +}
>> +
>>   static inline
>>   struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>>   {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> index 09c919f72b6c..69bca4536326 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> @@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
>>           if (!svm_range_schedule_evict_svm_bo(fence))
>>               return true;
>>       }
>> -    return false;
>> -}
>> -
>> -/**
>> - * amdkfd_fence_release - callback that fence can be freed
>> - *
>> - * @f: dma_fence
>> - *
>> - * This function is called when the reference count becomes zero.
>> - * Drops the mm_struct reference and RCU schedules freeing up the fence.
>> - */
>> -static void amdkfd_fence_release(struct dma_fence *f)
>> -{
>> -    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> -
>> -    /* Unconditionally signal the fence. The process is getting
>> -     * terminated.
>> -     */
>> -    if (WARN_ON(!fence))
>> -        return; /* Not an amdgpu_amdkfd_fence */
>> -
>>       mmdrop(fence->mm);
>> -    kfree_rcu(f, rcu);
>> +    fence->mm = NULL;
>> +    return false;
>>   }
>>     /**
>> @@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>>       return false;
>>   }
>>   +void amdkfd_fence_signal(struct dma_fence *f)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> +
>> +    if (fence) {
>> +        mmdrop(fence->mm);
>> +        fence->mm = NULL;
> 
> Isn't fence->mm already NULL here if it was dropped in amdkfd_fence_enable_signaling?

It looked like there are some use cases which signal the fence without going through amdkfd_fence_enable_signaling.

E.g. kfd_process_wq_release, which is most likely used on process teardown.

Regards,
Christian.

> 
> Regards,
>   Felix
> 
> 
>> +    }
>> +    dma_fence_signal(f);
>> +}
>> +
>>   static const struct dma_fence_ops amdkfd_fence_ops = {
>>       .get_driver_name = amdkfd_fence_get_driver_name,
>>       .get_timeline_name = amdkfd_fence_get_timeline_name,
>>       .enable_signaling = amdkfd_fence_enable_signaling,
>> -    .release = amdkfd_fence_release,
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index ddfe30c13e9d..779d7701bac9 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
>>       synchronize_rcu();
>>       ef = rcu_access_pointer(p->ef);
>>       if (ef)
>> -        dma_fence_signal(ef);
>> +        amdkfd_fence_signal(ef);
>>         kfd_process_remove_sysfs(p);
>>       kfd_debugfs_remove_process(p);
>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>   static int signal_eviction_fence(struct kfd_process *p)
>>   {
>>       struct dma_fence *ef;
>> -    int ret;
>>         rcu_read_lock();
>>       ef = dma_fence_get_rcu_safe(&p->ef);
>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>       if (!ef)
>>           return -EINVAL;
>>   -    ret = dma_fence_signal(ef);
>> +    amdkfd_fence_signal(ef);
>>       dma_fence_put(ef);
>>   -    return ret;
>> +    return 0;
>>   }
>>     static void evict_process_worker(struct work_struct *work)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 91609dd5730f..01ce2d853602 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
>>         if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
>>           /* We're not in the eviction worker. Signal the fence. */
>> -        dma_fence_signal(&svm_bo->eviction_fence->base);
>> +        amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>>       dma_fence_put(&svm_bo->eviction_fence->base);
>>       amdgpu_bo_unref(&svm_bo->bo);
>>       kfree(svm_bo);
>> @@ -3628,7 +3628,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>>       mmap_read_unlock(mm);
>>       mmput(mm);
>>   -    dma_fence_signal(&svm_bo->eviction_fence->base);
>> +    amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>>         /* This is the last reference to svm_bo, after svm_range_vram_node_free
>>        * has been called in svm_migrate_vram_to_ram



* Re: [PATCH 05/15] dma-buf: inline spinlock for fence protection
  2025-10-23 18:09   ` Matthew Brost
@ 2025-10-30 15:14     ` Christian König
  0 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2025-10-30 15:14 UTC (permalink / raw)
  To: Matthew Brost
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On 10/23/25 20:09, Matthew Brost wrote:
> On Mon, Oct 13, 2025 at 03:48:32PM +0200, Christian König wrote:
>> Allow implementations to not give a spinlock to protect the fence
>> internal state, instead a spinlock embedded into the fence structure
>> itself is used in this case.
>>
>> Apart from simplifying the handling for containers and the stub fence
>> this has the advantage of allowing implementations to issue fences
>> without caring about theit spinlock lifetime.
>>
>> That in turn is necessary for independent fences who outlive the module
>> who originally issued them.
>>
> 
> One thing Xe really wants to do is use a shared lock for HW fences,
> since our IRQ handler walks the pending fence list under the shared lock
> and signals the fences. I don’t think it would be desirable to split
> this into an IRQ handler list lock and individual fence locks.

Why is it desirable to have the same lock for the list in the IRQ handler and the dma_fences?

The lock inside the dma_fences is used both by the CPU cores that want to wait for the event and by the one that signals it.

So what you end up with is that the cache line for this spinlock constantly plays ping-pong between the CPU cores, and that is really undesirable.

It is potentially much better to have one lock for your IRQ handler and a separate lock for the dma_fences.

> It would be great if we could come up with a way to support this model,
> ensure it's safe for module unload, and document it clearly.
> 
> I’ve thought of a few possible ideas:
> 
> - After a fence signals, the dma-fence core should not be allowed to touch
>   the external lock.

I thought about that a lot as well, but as far as I can see it is impossible. The main reason for the lock is to protect the signaled state, so there is no state change without grabbing the lock.

There is the fast path of checking the signaled bit before grabbing the lock, but that is purely an optimization and not needed for correctness.

Regards,
Christian.


> - Only export HW fences to the DRM scheduler, which guarantees that after
>   HW signaling, it won’t perform any dangerous operations (e.g., making a
>   call that takes the external lock). This really isn't a generic
>   solution, though.
> - Create an embedded dma-fence spinlock structure with a refcount and a
>   dma-fence flag indicating that the external lock is refcounted. In
>   dma_fence_free, drop the lock’s refcount.
> 
> In all of the above cases, the rule is that all module fences must be
> signaled before unloading.
> 
> Thoughts?
> 
> Matt
> 
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/dma-buf/dma-fence.c              | 54 ++++++++++++------------
>>  drivers/dma-buf/sw_sync.c                | 14 +++---
>>  drivers/dma-buf/sync_debug.h             |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  4 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 12 +++---
>>  drivers/gpu/drm/drm_crtc.c               |  2 +-
>>  drivers/gpu/drm/drm_writeback.c          |  2 +-
>>  drivers/gpu/drm/nouveau/nouveau_drm.c    |  5 ++-
>>  drivers/gpu/drm/nouveau/nouveau_fence.c  |  3 +-
>>  drivers/gpu/drm/qxl/qxl_release.c        |  3 +-
>>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c    |  3 +-
>>  drivers/gpu/drm/xe/xe_hw_fence.c         |  3 +-
>>  drivers/gpu/drm/xe/xe_sched_job.c        |  4 +-
>>  include/linux/dma-fence.h                | 42 +++++++++++++++++-
>>  15 files changed, 99 insertions(+), 58 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 39f73edf3a33..a0b328fdd90d 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
>>  }
>>  #endif
>>  
>> -
>>  /**
>>   * dma_fence_signal_timestamp_locked - signal completion of a fence
>>   * @fence: the fence to signal
>> @@ -368,7 +367,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>  	struct dma_fence_cb *cur, *tmp;
>>  	struct list_head cb_list;
>>  
>> -	lockdep_assert_held(fence->lock);
>> +	lockdep_assert_held(dma_fence_spinlock(fence));
>>  
>>  	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>  				      &fence->flags)))
>> @@ -421,9 +420,9 @@ int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
>>  	if (WARN_ON(!fence))
>>  		return -EINVAL;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	ret = dma_fence_signal_timestamp_locked(fence, timestamp);
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	return ret;
>>  }
>> @@ -475,9 +474,9 @@ int dma_fence_signal(struct dma_fence *fence)
>>  
>>  	tmp = dma_fence_begin_signalling();
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	dma_fence_end_signalling(tmp);
>>  
>> @@ -579,10 +578,10 @@ void dma_fence_release(struct kref *kref)
>>  		 * don't leave chains dangling. We set the error flag first
>>  		 * so that the callbacks know this signal is due to an error.
>>  		 */
>> -		spin_lock_irqsave(fence->lock, flags);
>> +		dma_fence_lock(fence, flags);
>>  		fence->error = -EDEADLK;
>>  		dma_fence_signal_locked(fence);
>> -		spin_unlock_irqrestore(fence->lock, flags);
>> +		dma_fence_unlock(fence, flags);
>>  	}
>>  
>>  	ops = rcu_dereference(fence->ops);
>> @@ -612,7 +611,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>  	const struct dma_fence_ops *ops;
>>  	bool was_set;
>>  
>> -	lockdep_assert_held(fence->lock);
>> +	lockdep_assert_held(dma_fence_spinlock(fence));
>>  
>>  	was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>>  				   &fence->flags);
>> @@ -648,9 +647,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
>>  {
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	__dma_fence_enable_signaling(fence);
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  }
>>  EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>>  
>> @@ -690,8 +689,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>>  		return -ENOENT;
>>  	}
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> -
>> +	dma_fence_lock(fence, flags);
>>  	if (__dma_fence_enable_signaling(fence)) {
>>  		cb->func = func;
>>  		list_add_tail(&cb->node, &fence->cb_list);
>> @@ -699,8 +697,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>>  		INIT_LIST_HEAD(&cb->node);
>>  		ret = -ENOENT;
>>  	}
>> -
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	return ret;
>>  }
>> @@ -723,9 +720,9 @@ int dma_fence_get_status(struct dma_fence *fence)
>>  	unsigned long flags;
>>  	int status;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	status = dma_fence_get_status_locked(fence);
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	return status;
>>  }
>> @@ -755,13 +752,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
>>  	unsigned long flags;
>>  	bool ret;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> -
>> +	dma_fence_lock(fence, flags);
>>  	ret = !list_empty(&cb->node);
>>  	if (ret)
>>  		list_del_init(&cb->node);
>> -
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	return ret;
>>  }
>> @@ -800,8 +795,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>>  	unsigned long flags;
>>  	signed long ret = timeout ? timeout : 1;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> -
>> +	dma_fence_lock(fence, flags);
>>  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>  		goto out;
>>  
>> @@ -824,11 +818,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>>  			__set_current_state(TASK_INTERRUPTIBLE);
>>  		else
>>  			__set_current_state(TASK_UNINTERRUPTIBLE);
>> -		spin_unlock_irqrestore(fence->lock, flags);
>> +		dma_fence_unlock(fence, flags);
>>  
>>  		ret = schedule_timeout(ret);
>>  
>> -		spin_lock_irqsave(fence->lock, flags);
>> +		dma_fence_lock(fence, flags);
>>  		if (ret > 0 && intr && signal_pending(current))
>>  			ret = -ERESTARTSYS;
>>  	}
>> @@ -838,7 +832,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>>  	__set_current_state(TASK_RUNNING);
>>  
>>  out:
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  	return ret;
>>  }
>>  EXPORT_SYMBOL(dma_fence_default_wait);
>> @@ -1046,7 +1040,6 @@ static void
>>  __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>  	         spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
>>  {
>> -	BUG_ON(!lock);
>>  	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>>  
>>  	kref_init(&fence->refcount);
>> @@ -1057,10 +1050,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>  	 */
>>  	RCU_INIT_POINTER(fence->ops, ops);
>>  	INIT_LIST_HEAD(&fence->cb_list);
>> -	fence->lock = lock;
>>  	fence->context = context;
>>  	fence->seqno = seqno;
>>  	fence->flags = flags;
>> +	if (lock) {
>> +		fence->extern_lock = lock;
>> +	} else {
>> +		spin_lock_init(&fence->inline_lock);
>> +		fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
>> +	}
>>  	fence->error = 0;
>>  
>>  	trace_dma_fence_init(fence);
>> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
>> index 3c20f1d31cf5..8f48529214a4 100644
>> --- a/drivers/dma-buf/sw_sync.c
>> +++ b/drivers/dma-buf/sw_sync.c
>> @@ -155,12 +155,12 @@ static void timeline_fence_release(struct dma_fence *fence)
>>  	struct sync_timeline *parent = dma_fence_parent(fence);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	if (!list_empty(&pt->link)) {
>>  		list_del(&pt->link);
>>  		rb_erase(&pt->node, &parent->pt_tree);
>>  	}
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	sync_timeline_put(parent);
>>  	dma_fence_free(fence);
>> @@ -178,7 +178,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>>  	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>>  		if (ktime_before(deadline, pt->deadline))
>>  			pt->deadline = deadline;
>> @@ -186,7 +186,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
>>  		pt->deadline = deadline;
>>  		__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
>>  	}
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  }
>>  
>>  static const struct dma_fence_ops timeline_fence_ops = {
>> @@ -427,13 +427,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>>  		goto put_fence;
>>  	}
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
>>  		ret = -ENOENT;
>>  		goto unlock;
>>  	}
>>  	data.deadline_ns = ktime_to_ns(pt->deadline);
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	dma_fence_put(fence);
>>  
>> @@ -446,7 +446,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
>>  	return 0;
>>  
>>  unlock:
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  put_fence:
>>  	dma_fence_put(fence);
>>  
>> diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
>> index 02af347293d0..c49324505b20 100644
>> --- a/drivers/dma-buf/sync_debug.h
>> +++ b/drivers/dma-buf/sync_debug.h
>> @@ -47,7 +47,7 @@ struct sync_timeline {
>>  
>>  static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
>>  {
>> -	return container_of(fence->lock, struct sync_timeline, lock);
>> +	return container_of(fence->extern_lock, struct sync_timeline, lock);
>>  }
>>  
>>  /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> index 5ec5c3ff22bb..fcc7a3fb93b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> @@ -468,10 +468,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
>>  	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
>>  		return false;
>>  
>> -	spin_lock_irqsave(fence->lock, flags);
>> +	dma_fence_lock(fence, flags);
>>  	if (!dma_fence_is_signaled_locked(fence))
>>  		dma_fence_set_error(fence, -ENODATA);
>> -	spin_unlock_irqrestore(fence->lock, flags);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	while (!dma_fence_is_signaled(fence) &&
>>  	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index db66b4232de0..db6516ce8335 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -2774,8 +2774,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>  	dma_fence_put(vm->last_unlocked);
>>  	dma_fence_wait(vm->last_tlb_flush, false);
>>  	/* Make sure that all fence callbacks have completed */
>> -	spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
>> -	spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
>> +	dma_fence_lock(vm->last_tlb_flush, flags);
>> +	dma_fence_unlock(vm->last_tlb_flush, flags);
>>  	dma_fence_put(vm->last_tlb_flush);
>>  
>>  	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 77207f4e448e..4fc7f66b7d13 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -631,20 +631,20 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo);
>>   */
>>  static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>>  {
>> +	struct dma_fence *fence;
>>  	unsigned long flags;
>> -	spinlock_t *lock;
>>  
>>  	/*
>>  	 * Workaround to stop racing between the fence signaling and handling
>> -	 * the cb. The lock is static after initially setting it up, just make
>> -	 * sure that the dma_fence structure isn't freed up.
>> +	 * the cb.
>>  	 */
>>  	rcu_read_lock();
>> -	lock = vm->last_tlb_flush->lock;
>> +	fence = dma_fence_get_rcu(vm->last_tlb_flush);
>>  	rcu_read_unlock();
>>  
>> -	spin_lock_irqsave(lock, flags);
>> -	spin_unlock_irqrestore(lock, flags);
>> +	dma_fence_lock(fence, flags);
>> +	dma_fence_unlock(fence, flags);
>> +	dma_fence_put(fence);
>>  
>>  	return atomic64_read(&vm->tlb_seq);
>>  }
>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
>> index 46655339003d..ad47f58cd159 100644
>> --- a/drivers/gpu/drm/drm_crtc.c
>> +++ b/drivers/gpu/drm/drm_crtc.c
>> @@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
>>  static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
>>  {
>>  	BUG_ON(fence->ops != &drm_crtc_fence_ops);
>> -	return container_of(fence->lock, struct drm_crtc, fence_lock);
>> +	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
>>  }
>>  
>>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
>> diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
>> index 95b8a2e4bda6..624a4e8b6c99 100644
>> --- a/drivers/gpu/drm/drm_writeback.c
>> +++ b/drivers/gpu/drm/drm_writeback.c
>> @@ -81,7 +81,7 @@
>>   *	From userspace, this property will always read as zero.
>>   */
>>  
>> -#define fence_to_wb_connector(x) container_of(x->lock, \
>> +#define fence_to_wb_connector(x) container_of(x->extern_lock, \
>>  					      struct drm_writeback_connector, \
>>  					      fence_lock)
>>  
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> index 1527b801f013..2956ed2ec073 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> @@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
>>  static inline bool
>>  nouveau_cli_work_ready(struct dma_fence *fence)
>>  {
>> +	unsigned long flags;
>>  	bool ret = true;
>>  
>> -	spin_lock_irq(fence->lock);
>> +	dma_fence_lock(fence, flags);
>>  	if (!dma_fence_is_signaled_locked(fence))
>>  		ret = false;
>> -	spin_unlock_irq(fence->lock);
>> +	dma_fence_unlock(fence, flags);
>>  
>>  	if (ret == true)
>>  		dma_fence_put(fence);
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> index d5654e26d5bc..272b492c4d7c 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> @@ -47,7 +47,8 @@ from_fence(struct dma_fence *fence)
>>  static inline struct nouveau_fence_chan *
>>  nouveau_fctx(struct nouveau_fence *fence)
>>  {
>> -	return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
>> +	return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
>> +			    lock);
>>  }
>>  
>>  static bool
>> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
>> index 05204a6a3fa8..1d346822c1f7 100644
>> --- a/drivers/gpu/drm/qxl/qxl_release.c
>> +++ b/drivers/gpu/drm/qxl/qxl_release.c
>> @@ -60,7 +60,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
>>  	struct qxl_device *qdev;
>>  	unsigned long cur, end = jiffies + timeout;
>>  
>> -	qdev = container_of(fence->lock, struct qxl_device, release_lock);
>> +	qdev = container_of(fence->extern_lock, struct qxl_device,
>> +			    release_lock);
>>  
>>  	if (!wait_event_timeout(qdev->release_event,
>>  				(dma_fence_is_signaled(fence) ||
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
>> index c2294abbe753..346761172c1b 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
>> @@ -47,7 +47,8 @@ struct vmw_event_fence_action {
>>  static struct vmw_fence_manager *
>>  fman_from_fence(struct vmw_fence_obj *fence)
>>  {
>> -	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
>> +	return container_of(fence->base.extern_lock, struct vmw_fence_manager,
>> +			    lock);
>>  }
>>  
>>  static void vmw_fence_obj_destroy(struct dma_fence *f)
>> diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
>> index b2a0c46dfcd4..3456bec93c70 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_fence.c
>> +++ b/drivers/gpu/drm/xe/xe_hw_fence.c
>> @@ -144,7 +144,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
>>  
>>  static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
>>  {
>> -	return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
>> +	return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
>> +			    lock);
>>  }
>>  
>>  static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>> index d21bf8f26964..ea7038475b4b 100644
>> --- a/drivers/gpu/drm/xe/xe_sched_job.c
>> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
>> @@ -187,11 +187,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
>>  	unsigned long irq_flags;
>>  	bool signaled;
>>  
>> -	spin_lock_irqsave(fence->lock, irq_flags);
>> +	dma_fence_lock(fence, irq_flags);
>>  	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>>  	if (!signaled)
>>  		dma_fence_set_error(fence, error);
>> -	spin_unlock_irqrestore(fence->lock, irq_flags);
>> +	dma_fence_unlock(fence, irq_flags);
>>  
>>  	return signaled;
>>  }
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index e1ba1d53de88..fb416f500664 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -34,7 +34,8 @@ struct seq_file;
>>   * @ops: dma_fence_ops associated with this fence
>>   * @rcu: used for releasing fence with kfree_rcu
>>   * @cb_list: list of all callbacks to call
>> - * @lock: spin_lock_irqsave used for locking
>> + * @extern_lock: external spin_lock_irqsave used for locking
>> + * @inline_lock: alternative internal spin_lock_irqsave used for locking
>>   * @context: execution context this fence belongs to, returned by
>>   *           dma_fence_context_alloc()
>>   * @seqno: the sequence number of this fence inside the execution context,
>> @@ -48,6 +49,7 @@ struct seq_file;
>>   * atomic ops (bit_*), so taking the spinlock will not be needed most
>>   * of the time.
>>   *
>> + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
>>   * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
>>   * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
>>   * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
>> @@ -65,7 +67,10 @@ struct seq_file;
>>   * been completed, or never called at all.
>>   */
>>  struct dma_fence {
>> -	spinlock_t *lock;
>> +	union {
>> +		spinlock_t *extern_lock;
>> +		spinlock_t inline_lock;
>> +	};
>>  	const struct dma_fence_ops __rcu *ops;
>>  	/*
>>  	 * We clear the callback list on kref_put so that by the time we
>> @@ -98,6 +103,7 @@ struct dma_fence {
>>  };
>>  
>>  enum dma_fence_flag_bits {
>> +	DMA_FENCE_FLAG_INLINE_LOCK_BIT,
>>  	DMA_FENCE_FLAG_SEQNO64_BIT,
>>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>> @@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>>  	} while (1);
>>  }
>>  
>> +/**
>> + * dma_fence_spinlock - return pointer to the spinlock protecting the fence
>> + * @fence: the fence to get the lock from
>> + *
>> + * Return either the pointer to the embedded or the external spin lock.
>> + */
>> +static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
>> +{
>> +	return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
>> +		&fence->inline_lock : fence->extern_lock;
>> +}
>> +
>> +/**
>> + * dma_fence_lock - irqsave lock the fence
>> + * @fence: the fence to lock
>> + * @flags: where to store the CPU flags.
>> + *
>> + * Lock the fence, preventing it from changing to the signaled state.
>> + */
>> +#define dma_fence_lock(fence, flags)	\
>> +	spin_lock_irqsave(dma_fence_spinlock(fence), flags)
>> +
>> +/**
>> + * dma_fence_unlock - unlock the fence and irqrestore
>> + * @fence: the fence to unlock
>> + * @flags: the CPU flags to restore
>> + *
>> + * Unlock the fence, allowing it to change its state to signaled again.
>> + */
>> +#define dma_fence_unlock(fence, flags)	\
>> +	spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
>> +
>>  #ifdef CONFIG_LOCKDEP
>>  bool dma_fence_begin_signalling(void);
>>  void dma_fence_end_signalling(bool cookie);
>> -- 
>> 2.43.0
>>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence!
  2025-10-30 15:07     ` Christian König
@ 2025-10-30 20:04       ` Felix Kuehling
  0 siblings, 0 replies; 47+ messages in thread
From: Felix Kuehling @ 2025-10-30 20:04 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter,
	tursulin
  Cc: dri-devel, amd-gfx


On 2025-10-30 11:07, Christian König wrote:
> On 10/18/25 00:22, Felix Kuehling wrote:
>> On 2025-10-13 09:48, Christian König wrote:
>>> This should allow amdkfd_fences to outlive the amdgpu module.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  6 ++++
>>>    .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  | 36 +++++++------------
>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  7 ++--
>>>    drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  4 +--
>>>    4 files changed, 24 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> index 9e120c934cc1..35c59c784b7b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> @@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
>>>    #endif
>>>    #if IS_ENABLED(CONFIG_HSA_AMD)
>>>    bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
>>> +void amdkfd_fence_signal(struct dma_fence *f);
>>>    struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
>>>    void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
>>>    int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
>>> @@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>>>        return false;
>>>    }
>>>    +static inline
>>> +void amdkfd_fence_signal(struct dma_fence *f)
>>> +{
>>> +}
>>> +
>>>    static inline
>>>    struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>>>    {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> index 09c919f72b6c..69bca4536326 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> @@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
>>>            if (!svm_range_schedule_evict_svm_bo(fence))
>>>                return true;
>>>        }
>>> -    return false;
>>> -}
>>> -
>>> -/**
>>> - * amdkfd_fence_release - callback that fence can be freed
>>> - *
>>> - * @f: dma_fence
>>> - *
>>> - * This function is called when the reference count becomes zero.
>>> - * Drops the mm_struct reference and RCU schedules freeing up the fence.
>>> - */
>>> -static void amdkfd_fence_release(struct dma_fence *f)
>>> -{
>>> -    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> -
>>> -    /* Unconditionally signal the fence. The process is getting
>>> -     * terminated.
>>> -     */
>>> -    if (WARN_ON(!fence))
>>> -        return; /* Not an amdgpu_amdkfd_fence */
>>> -
>>>        mmdrop(fence->mm);
>>> -    kfree_rcu(f, rcu);
>>> +    fence->mm = NULL;
>>> +    return false;
>>>    }
>>>      /**
>>> @@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>>>        return false;
>>>    }
>>>    +void amdkfd_fence_signal(struct dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +
>>> +    if (fence) {
>>> +        mmdrop(fence->mm);
>>> +        fence->mm = NULL;
>> Isn't fence->mm already NULL here if it was dropped in amdkfd_fence_enable_signaling?
> It looked like there are some use cases which signal the fence without going through amdkfd_fence_enable_signaling.
>
> E.g. kfd_process_wq_release, which is most likely used on process teardown.

I see. Could there be race conditions here, if enable_signaling happens 
concurrently and we end up calling mmdrop twice?

Regards,
   Felix


>
> Regards,
> Christian.
>
>> Regards,
>>    Felix
>>
>>
>>> +    }
>>> +    dma_fence_signal(f);
>>> +}
>>> +
>>>    static const struct dma_fence_ops amdkfd_fence_ops = {
>>>        .get_driver_name = amdkfd_fence_get_driver_name,
>>>        .get_timeline_name = amdkfd_fence_get_timeline_name,
>>>        .enable_signaling = amdkfd_fence_enable_signaling,
>>> -    .release = amdkfd_fence_release,
>>>    };
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index ddfe30c13e9d..779d7701bac9 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
>>>        synchronize_rcu();
>>>        ef = rcu_access_pointer(p->ef);
>>>        if (ef)
>>> -        dma_fence_signal(ef);
>>> +        amdkfd_fence_signal(ef);
>>>          kfd_process_remove_sysfs(p);
>>>        kfd_debugfs_remove_process(p);
>>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>    static int signal_eviction_fence(struct kfd_process *p)
>>>    {
>>>        struct dma_fence *ef;
>>> -    int ret;
>>>          rcu_read_lock();
>>>        ef = dma_fence_get_rcu_safe(&p->ef);
>>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>>        if (!ef)
>>>            return -EINVAL;
>>>    -    ret = dma_fence_signal(ef);
>>> +    amdkfd_fence_signal(ef);
>>>        dma_fence_put(ef);
>>>    -    return ret;
>>> +    return 0;
>>>    }
>>>      static void evict_process_worker(struct work_struct *work)
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 91609dd5730f..01ce2d853602 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
>>>          if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
>>>            /* We're not in the eviction worker. Signal the fence. */
>>> -        dma_fence_signal(&svm_bo->eviction_fence->base);
>>> +        amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>>>        dma_fence_put(&svm_bo->eviction_fence->base);
>>>        amdgpu_bo_unref(&svm_bo->bo);
>>>        kfree(svm_bo);
>>> @@ -3628,7 +3628,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>>>        mmap_read_unlock(mm);
>>>        mmput(mm);
>>>    -    dma_fence_signal(&svm_bo->eviction_fence->base);
>>> +    amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>>>          /* This is the last reference to svm_bo, after svm_range_vram_node_free
>>>         * has been called in svm_migrate_vram_to_ram

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/15] dma-buf: detach fence ops on signal
  2025-10-30 13:52       ` Christian König
@ 2025-10-31 10:31         ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-31 10:31 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


On 30/10/2025 13:52, Christian König wrote:
> Hi Tvrtko,
> 
> On 10/16/25 17:57, Tvrtko Ursulin wrote:
>> On 16/10/2025 09:56, Tvrtko Ursulin wrote:
>>>
>>> On 13/10/2025 14:48, Christian König wrote:
>>>> When neither a release nor a wait operation is specified it is possible
>>>> to let the dma_fence live on independent of the module who issued it.
>>>>
>>>> This makes it possible to unload drivers and only wait for all their
>>>> fences to signal.
>>>
>>> Have you looked at whether the requirement to not have the release and wait callbacks will exclude some drivers from being able to benefit from this?
>>
>> I had a browse and this seems to be the situation:
> 
> Oh, thanks a lot for doing that!
> 
>>
>> Custom .wait:
>>   - radeon, qxl, nouveau, i915
>>
>> Those would therefore still be vulnerable to the unbind->unload sequence. Actually not sure about qxl, but the other three are PCI, so in theory at least. I915 at least supports unbind and unload.
> 
> radeon, yeah I know that is because of the reset handling there. Not going to change and as maintainer I honestly don't care.
> 
> qxl, pretty outdated as well and probably not worth fixing it.
> 
> nouveau, no idea why that is there in the first place. Philip?
> 
> i915, that is really surprising. What is the reason for that?

I915 has some optimisations on the wait path like a short busy spin 
before going to sleep (under limited conditions) and a way to kick the 
hardware to improve the latencies caused by irq and softirq processing.

But another one, and probably the most important one, is "wait boosting" 
ie. raising GPU clocks if userspace is waiting on a specific GPU job. 
This was a significant win for some workloads in the past.

I tried to move this to generic code a long time ago but AFAIR the 
dma-fence 64B size was a concern. Perhaps now that we are thinking of 
breaking that size barrier we could revisit. Let me try to find this work..

Right, it was this: https://patchwork.freedesktop.org/series/113846/

Executive summary would be: Allowing dma-fence owning drivers to see 
when userspace is waiting on a specific fence.

Longer story was an OpenCL application (IIRC a video conference 
background blurring thingy) and a tale of two OpenCL stacks.

The native Intel OpenCL library uses the i915 ioctls. So when it would 
wait on an OpenCL kernel to complete it would get the waitboost logic 
courtesy of using the i915 wait ioctl.

But then the same application running on the clvk stack would run much, 
much slower, because the waits in that case are going via the DRM 
syncobj route and i915 could not know to waitboost.

And the duration and time distribution of these jobs was such that 
hardware/firmware would not be ramping up the GPU clocks fast enough 
without this external "someone is waiting, hurry up" signal.

It may be worth revisiting this story if we are growing the dma-fence 
struct anyway. With my changes drivers could then choose whether to do 
anything with this info or not, because it is essentially only allowing 
drivers to see if someone is waiting.

An alternative would be to call a new fence ops vfunc from the generic 
dma-fence wait before going to sleep (with the number of waiters), and 
after. But I would need to think more about this, to see if it could 
potentially allow at least i915 to drop the custom wait callback.
>> Custom .release:
>>   - vgem, nouveau, lima, pvr, i915, usb-gadget, industrialio, etnaviv, xe
>>
>> Out of those there do not actually need a custom release and could probably be weaned off it:
>>   - usb-gadget, industrialio, etnaviv, xe
>>
>> (Xe would lose a debug assert and some would have their kfrees replaced with kfree_rcu. Plus build time asserts added the struct dma-fence remains first in the respective driver structs. It sounds feasible.)
> 
> Oh, crap! Using kfree_rcu for dma_fences is an absolutely must have!
> 
> Where have you seen that? This is obviously a bug in the drivers doing that.

Industrialio and usb-gadget use a plain kfree. But both look easily 
fixable by just making sure the dma-fence is first in the inheriting 
object, and then the custom release can be dropped.

Etnaviv and xe aren't broken, they use some variant of RCU, but could 
probably be weaned off the custom release easily. Especially etnaviv.
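For illustration, a userspace toy model of the "base fence first, no custom release" pattern (the mock_*/my_* names are made up, and the real core would of course use kfree_rcu rather than a plain free):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct dma_fence; only the layout matters here. */
struct mock_dma_fence {
	unsigned long flags;
};

/* Hypothetical driver fence with the base embedded as FIRST member. */
struct my_driver_fence {
	struct mock_dma_fence base;  /* must stay first */
	int driver_state;
};

/* The build-time assert suggested above, added per driver struct. */
_Static_assert(offsetof(struct my_driver_fence, base) == 0,
	       "dma_fence must be the first member");

/*
 * Core-side default release: with the base at offset 0 the core can
 * free the whole driver object through the base pointer, so no custom
 * .release callback is needed.
 */
static void mock_fence_free(struct mock_dma_fence *fence)
{
	free(fence);
}
```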

>> That would leave us with .release in:
>>   - vgem, nouveau, lima, pvr, i915
>>
>> Combined list of custom .wait + .release:
>>   - radeon, qxl, nouveau, i915, lima, pvr, vgem
>>
>>  From those the ones which support unbind and module unload would remain potentially vulnerable to use after free.
>>
>> It doesn't sound great to only solve it partially but maybe it is a reasonable next step. Where could we go from there to solve it for everyone?
> Well I only see the way of getting rid of the legacy stuff (like ->wait callbacks) for everybody who cares about their module unload.
> 
> But I'm pretty sure that for things like radeon and qxl we don't care.

Yeah, I agree the proposal moves things in the right direction. I would 
incorporate some of the easily fixable drivers above into the series, 
and for the ones with known unresolved issues just list them in the 
cover letter.

Regards,

Tvrtko

>>>> ---
>>>>    drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
>>>>    include/linux/dma-fence.h   |  4 ++--
>>>>    2 files changed, 14 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>> index 982f2b2a62c0..39f73edf3a33 100644
>>>> --- a/drivers/dma-buf/dma-fence.c
>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>                          &fence->flags)))
>>>>            return -EINVAL;
>>>> +    /*
>>>> +     * When neither a release nor a wait operation is specified set the ops
>>>> +     * pointer to NULL to allow the fence structure to become independent
>>>> +     * who originally issued it.
>>>> +     */
>>>> +    if (!fence->ops->release && !fence->ops->wait)
>>>> +        RCU_INIT_POINTER(fence->ops, NULL);
>>>> +
>>>>        /* Stash the cb_list before replacing it with the timestamp */
>>>>        list_replace(&fence->cb_list, &cb_list);
>>>> @@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>>>        rcu_read_lock();
>>>>        ops = rcu_dereference(fence->ops);
>>>>        trace_dma_fence_wait_start(fence);
>>>> -    if (ops->wait) {
>>>> +    if (ops && ops->wait) {
>>>>            /*
>>>>             * Implementing the wait ops is deprecated and not supported for
>>>>             * issuer independent fences, so it is ok to use the ops outside
>>>> @@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
>>>>        }
>>>>        ops = rcu_dereference(fence->ops);
>>>> -    if (ops->release)
>>>> +    if (ops && ops->release)
>>>>            ops->release(fence);
>>>>        else
>>>>            dma_fence_free(fence);
>>>> @@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>>>        rcu_read_lock();
>>>>        ops = rcu_dereference(fence->ops);
>>>> -    if (!was_set && ops->enable_signaling) {
>>>> +    if (!was_set && ops && ops->enable_signaling) {
>>>>            trace_dma_fence_enable_signal(fence);
>>>>            if (!ops->enable_signaling(fence)) {
>>>> @@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>>        rcu_read_lock();
>>>>        ops = rcu_dereference(fence->ops);
>>>> -    if (ops->set_deadline && !dma_fence_is_signaled(fence))
>>>> +    if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
>>>>            ops->set_deadline(fence, deadline);
>>>>        rcu_read_unlock();
>>>>    }
>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>> index 38421a0c7c5b..e1ba1d53de88 100644
>>>> --- a/include/linux/dma-fence.h
>>>> +++ b/include/linux/dma-fence.h
>>>> @@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>>        rcu_read_lock();
>>>>        ops = rcu_dereference(fence->ops);
>>>> -    if (ops->signaled && ops->signaled(fence)) {
>>>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>>>            rcu_read_unlock();
>>>>            dma_fence_signal_locked(fence);
>>>>            return true;
>>>> @@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
>>>>        rcu_read_lock();
>>>>        ops = rcu_dereference(fence->ops);
>>>> -    if (ops->signaled && ops->signaled(fence)) {
>>>> +    if (ops && ops->signaled && ops->signaled(fence)) {
>>>>            rcu_read_unlock();
>>>>            dma_fence_signal(fence);
>>>>            return true;
>>>
>>
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/15] dma-buf: protected fence ops by RCU
  2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protected fence ops by RCU Christian König
  2025-10-16 18:04   ` Tvrtko Ursulin
@ 2025-10-31 10:35   ` Tvrtko Ursulin
  1 sibling, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2025-10-31 10:35 UTC (permalink / raw)
  To: Christian König, phasta, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx


One additional comment on this patch:

On 13/10/2025 14:48, Christian König wrote:
> At first glance it is counter intuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/dma-fence.c | 65 +++++++++++++++++++++++++++----------
>   include/linux/dma-fence.h   | 18 ++++++++--
>   2 files changed, 62 insertions(+), 21 deletions(-)
> 

8><

> @@ -1104,11 +1127,14 @@ EXPORT_SYMBOL(dma_fence_init64);
>    */
>   const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>   			 "RCU protection is required for safe access to returned string");
>   
> +	ops = rcu_dereference(fence->ops);
>   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> -		return fence->ops->get_driver_name(fence);
> +		return ops->get_driver_name(fence);
>   	else
>   		return "detached-driver";
>   }
> @@ -1136,11 +1162,14 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>    */
>   const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>   {
> +	const struct dma_fence_ops *ops;
> +
>   	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>   			 "RCU protection is required for safe access to returned string");
>   
> +	ops = rcu_dereference(fence->ops);

For the above two functions, the RCU_LOCKDEP_WARN now becomes redundant 
to the one rcu_dererence() would emit. Maybe just move the string into a 
comment?

Regards,

Tvrtko

>   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> -		return fence->ops->get_driver_name(fence);
> +		return ops->get_driver_name(fence);
>   	else
>   		return "signaled-timeline";
>   }


* Re: Independence for dma_fences!
  2025-10-30 10:59           ` Christian König
@ 2025-10-31 17:44             ` Matthew Brost
  2025-11-03 11:43               ` Christian König
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Brost @ 2025-10-31 17:44 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On Thu, Oct 30, 2025 at 11:59:01AM +0100, Christian König wrote:
> On 10/29/25 21:53, Matthew Brost wrote:
> > On Tue, Oct 28, 2025 at 03:06:22PM +0100, Christian König wrote:
> >> On 10/17/25 10:32, Philipp Stanner wrote:
> >>> On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
> >>>> On 13.10.25 16:54, Philipp Stanner wrote:
> >>>>> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> dma_fences have ever lived under the tyranny dictated by the module
> >>>>>> lifetime of their issuer, leading to crashes should anybody still holding
> >>>>>> a reference to a dma_fence when the module of the issuer was unloaded.
> >>>>>>
> >>>>>> But those days are over! The patch set following this mail finally
> >>>>>> implements a way for issuers to release their dma_fence out of this
> >>>>>> slavery and outlive the module who originally created them.
> >>>>>>
> >>>>>> Previously various approaches have been discussed, including changing the
> >>>>>> locking semantics of the dma_fence callbacks (by me) as well as using the
> >>>>>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
> >>>>>> from their actual users.
> >>>>>>
> >>>>>> Changing the locking semantics turned out to be much more trickier than
> >>>>>> originally thought because especially on older drivers (nouveau, radeon,
> >>>>>> but also i915) this locking semantics is actually needed for correct
> >>>>>> operation.
> >>>>>>
> >>>>>> Using the drm_scheduler as intermediate layer is still a good idea and
> >>>>>> should probably be implemented to make live simpler for some drivers, but
> >>>>>> doesn't work for all use cases. Especially TLB flush fences, preemption
> >>>>>> fences and userqueue fences don't go through the drm scheduler because it
> >>>>>> doesn't make sense for them.
> >>>>>>
> >>>>>> Tvrtko did some really nice prerequisite work by protecting the returned
> >>>>>> strings of the dma_fence_ops by RCU. This way dma_fence creators where
> >>>>>> able to just wait for an RCU grace period after fence signaling before
> >>>>>> they could be save to free those data structures.
> >>>>>>
> >>>>>> Now this patch set here goes a step further and protects the whole
> >>>>>> dma_fence_ops structure by RCU, so that after the fence signals the
> >>>>>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
> >>>>>> release callback given. All functionality which use the dma_fence_ops
> >>>>>> reference are put inside an RCU critical section, except for the
> >>>>>> deprecated issuer specific wait and of course the optional release
> >>>>>> callback.
> >>>>>>
> >>>>>> Additional to the RCU changes the lock protecting the dma_fence state
> >>>>>> previously had to be allocated external. This set here now changes the
> >>>>>> functionality to make that external lock optional and allows dma_fences
> >>>>>> to use an inline lock and be self contained.
> >>>>>
> >>>>> Allowing for an embedded lock, is that actually necessary for the goals
> >>>>> of this series, or is it an optional change / improvement?
> >>>>
> >>>> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
> >>>>
> >>>> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
> >>>>
> >>>> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
> >>>>
> >>>>> If I understood you correctly at XDC you wanted to have an embedded
> >>>>> lock because it improves the memory footprint and because an external
> >>>>> lock couldn't achieve some goals about fence-signaling-order originally
> >>>>> intended. Can you elaborate on that?
> >>>>
> >>>> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
> >>>>
> >>>> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.
> > 
> > Not to derail the conversation, but I noticed that dma-fence-arrays can,
> > in fact, signal out of order. The issue lies in dma-fence-cb, which
> > signals the fence using irq_queue_work. Internally, irq_queue_work uses
> > llist, a LIFO structure. So, if two dma-fence-arrays have all their
> > fences signaled from a thread, the IRQ work that signals each individual
> > dma-fence-array will execute out of order.
> > 
> > We should probably fix this.
> 
> No we don't. That's what I'm trying to point out all the time.
> 
> The original idea of sharing the lock was to guarantee that fence signal in order, but that never worked correct even for driver fences.
> 
> The background is the optimization we do in the signaling fast path. E.g. when dma_fence_is_signaled() is called.
> 

Ah, yes—I see this now. I was operating under the assumption that fences
on a timeline must signal in order, but that’s not actually true. What
is true is that if a fence later on a timeline signals, all prior fences
are complete (i.e., the underlying hardware condition is met, even if
the software hasn’t signaled them yet).
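To make that concrete, here is a tiny userspace model of the fast path being described (all mock_* names are made up, nothing here is the real dma-fence API): fences A and B share a hardware sequence counter, and polling B can signal it before the interrupt path ever gets to A.

```c
#include <assert.h>
#include <stdbool.h>

/* Last seqno the hardware has actually completed on this timeline. */
struct mock_timeline {
	unsigned int hw_seqno;
};

struct mock_fence {
	struct mock_timeline *tl;
	unsigned int seqno;
	bool signaled;  /* software signaling state */
};

/* Fast path: peek at the HW counter and signal right from the query. */
static bool mock_fence_is_signaled(struct mock_fence *f)
{
	if (!f->signaled && f->tl->hw_seqno >= f->seqno)
		f->signaled = true;  /* signaled by the query, not the irq */
	return f->signaled;
}
```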

Could we document this somewhere in the dma-fence kernel docs? I can
take a stab at writing it up if you'd like. This is a fairly confusing
aspect of dma-fence behavior.

Matt

> This means that when fence A,B and C are submitted to the HW it is perfectly possible that somebody query the status of fence B but not A and C. And this querying of the status is faster than the interrupt which signals A and C.
> 
> So in this scenario B signals before A.
> 
> The only way to avoid that is to not implement the fast path and as far as I know no real HW driver does that because it makes your driver horrible slow.
> 
> So of to the trash bin with the signaling order, things have worked for over 10 years without it and as far as I know nobody complained about it.
> 
> Regards,
> Christian.
>  
> 
> > 
> > Matt
> > 
> >>>
> >>> I mean, neither external nor internal lock can somehow force the driver
> >>> to signal fences in order, can they?
> >>
> >> Nope, as I said before this approach is actually pretty useless.
> >>
> >>> Only the driver can ensure this.
> >>
> >> Only when the signaled callback is not implemented which basically all driver do.
> >>
> >> So the whole point of sharing the lock is just not existent any more, it's just that changing it all at once as I tried before results in a way to big patch.
> >>
> >>>
> >>> I am, however, considering modeling something like that on a
> >>> FenceContext object:
> >>>
> >>> fctx.signal_all_fences_up_to_ordered(seqno);
> >>
> >> Yeah, I have patches for that as well. But then found that amdgpus TLB fences trigger that check and I won't have time to fix it.
> >>
> >>
> >>
> >>>
> >>>
> >>> P.
> >>>
> >>>>
> >>>> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>>
> >>>>> P.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> The new approach is then applied to amdgpu allowing the module to be
> >>>>>> unloaded even when dma_fences issued by it are still around.
> >>>>>>
> >>>>>> Please review and comment,
> >>>>>> Christian.
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> 


* Re: Independence for dma_fences!
  2025-10-31 17:44             ` Matthew Brost
@ 2025-11-03 11:43               ` Christian König
  2025-11-03 19:32                 ` Matthew Brost
  0 siblings, 1 reply; 47+ messages in thread
From: Christian König @ 2025-11-03 11:43 UTC (permalink / raw)
  To: Matthew Brost
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On 10/31/25 18:44, Matthew Brost wrote:
>>> Not to derail the conversation, but I noticed that dma-fence-arrays can,
>>> in fact, signal out of order. The issue lies in dma-fence-cb, which
>>> signals the fence using irq_queue_work. Internally, irq_queue_work uses
>>> llist, a LIFO structure. So, if two dma-fence-arrays have all their
>>> fences signaled from a thread, the IRQ work that signals each individual
>>> dma-fence-array will execute out of order.
>>>
>>> We should probably fix this.
>>
>> No we don't. That's what I'm trying to point out all the time.
>>
>> The original idea of sharing the lock was to guarantee that fence signal in order, but that never worked correct even for driver fences.
>>
>> The background is the optimization we do in the signaling fast path. E.g. when dma_fence_is_signaled() is called.
>>
> 
> Ah, yes—I see this now. I was operating under the assumption that fences
> on a timeline must signal in order, but that’s not actually true. What
> is true is that if a fence later on a timeline signals, all prior fences
> are complete (i.e., the underlying hardware condition is met, even if
> the software hasn’t signaled them yet).
> 
> Could we document this somewhere in the dma-fence kernel docs? I can
> take a stab at writing it up if you'd like. This is a fairly confusing
> aspect of dma-fence behavior.

We do have some hints in the documentation about that, but nothing which clearly says "don't expect fences submitted in the order A,B,C to also signal in order A,B,C unless signaling of each is enabled".

Where could we add something like that?

Christian.

> 
> Matt


* Re: [PATCH 05/15] dma-buf: inline spinlock for fence protection
  2025-10-16  9:26   ` Tvrtko Ursulin
@ 2025-11-03 13:07     ` Philipp Stanner
  0 siblings, 0 replies; 47+ messages in thread
From: Philipp Stanner @ 2025-11-03 13:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Christian König, alexdeucher, simona.vetter
  Cc: dri-devel, amd-gfx

On Thu, 2025-10-16 at 10:26 +0100, Tvrtko Ursulin wrote:
> 
> Hi Christian,
> 
> Only some preliminary comments while I am building a complete picture.
> 
> On 13/10/2025 14:48, Christian König wrote:
> > Allow implementations to not give a spinlock to protect the fence
> > internal state, instead a spinlock embedded into the fence structure
> > itself is used in this case.
> > 
> > Apart from simplifying the handling for containers and the stub fence
> > this has the advantage of allowing implementations to issue fences
> > without caring about their spinlock lifetime.
> > 
> > That in turn is necessary for independent fences who outlive the module
> > who originally issued them.

I like the overall idea and think that separate locks will help me with
Rust dma_fence, where I had also begun to investigate the issue of
module unload vs backend_ops.

> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > 

[…]

> >   
> >   static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
> >   {
> > -	return container_of(fence->lock, struct sync_timeline, lock);
> > +	return container_of(fence->extern_lock, struct sync_timeline, lock);
> 
> These container_ofs are a bit annoying. Maybe even a bit fragile if 
> someone switches to embedded lock and forgets to update them all.
> 
> Would prep patch to first replace them with some dma_fence_container_of 
> wrapper make sense? 
> 

+1, would be a nice change.

It's not related to the series, though, so should be done in an
independent patch (IMO).
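Roughly something like this, modeled in userspace (all mock_* names are illustrative; the real helper would live in the dma-fence headers and could grow a debug assert):

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef int mock_spinlock_t;

struct mock_dma_fence {
	mock_spinlock_t *lock;  /* the external lock pointer */
};

struct mock_timeline {
	mock_spinlock_t lock;
	int id;
};

/*
 * The one helper that encodes "fence->lock points into the timeline";
 * a later switch to an embedded lock only has to touch this function.
 */
static inline struct mock_timeline *
mock_fence_parent(struct mock_dma_fence *fence)
{
	return container_of(fence->lock, struct mock_timeline, lock);
}
```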

> Then it could even have a (debug builds only) assert 
> added to check for correct usage.
> 
> >   }
> >   
> >   /**
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > 
> > 

[…]

> >    *
> > + * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
> >    * DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
> >    * DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
> >    * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
> > @@ -65,7 +67,10 @@ struct seq_file;
> >    * been completed, or never called at all.
> >    */
> >   struct dma_fence {
> > -	spinlock_t *lock;
> > +	union {
> > +		spinlock_t *extern_lock;
> > +		spinlock_t inline_lock;
> 
> This will grow the struct on some architectures so I think, given the
> strong push back to struct past a 64B cacheline in the past, it should 
> be called out in the commit message.

+1

Although: Christian, you told me at XDC that you did some exact
measurements of the new vs old cache line footprint. Can you help out my
memory here, what were those sizes?


P.



* Re: Independence for dma_fences!
  2025-11-03 11:43               ` Christian König
@ 2025-11-03 19:32                 ` Matthew Brost
  0 siblings, 0 replies; 47+ messages in thread
From: Matthew Brost @ 2025-11-03 19:32 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, alexdeucher, simona.vetter, tursulin, dri-devel, amd-gfx

On Mon, Nov 03, 2025 at 12:43:32PM +0100, Christian König wrote:
> On 10/31/25 18:44, Matthew Brost wrote:
> >>> Not to derail the conversation, but I noticed that dma-fence-arrays can,
> >>> in fact, signal out of order. The issue lies in dma-fence-cb, which
> >>> signals the fence using irq_queue_work. Internally, irq_queue_work uses
> >>> llist, a LIFO structure. So, if two dma-fence-arrays have all their
> >>> fences signaled from a thread, the IRQ work that signals each individual
> >>> dma-fence-array will execute out of order.
> >>>
> >>> We should probably fix this.
> >>
> >> No we don't. That's what I'm trying to point out all the time.
> >>
> >> The original idea of sharing the lock was to guarantee that fence signal in order, but that never worked correct even for driver fences.
> >>
> >> The background is the optimization we do in the signaling fast path. E.g. when dma_fence_is_signaled() is called.
> >>
> > 
> > Ah, yes—I see this now. I was operating under the assumption that fences
> > on a timeline must signal in order, but that’s not actually true. What
> > is true is that if a fence later on a timeline signals, all prior fences
> > are complete (i.e., the underlying hardware condition is met, even if
> > the software hasn’t signaled them yet).
> > 
> > Could we document this somewhere in the dma-fence kernel docs? I can
> > take a stab at writing it up if you'd like. This is a fairly confusing
> > aspect of dma-fence behavior.
> 
> We do have some hints in the documentation about that, but nothing which clearly says "don't expect fences submitted in the order A,B,C to also signal in order A,B,C unless signaling of each is enabled".
> 
> Where could we add something like that?

Yea, let me take a shot at writing something up.

Matt

> 
> Christian.
> 
> > 
> > Matt


end of thread, other threads:[~2025-11-03 19:32 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-13 13:48 Independence for dma_fences! Christian König
2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
2025-10-14 14:37   ` Tvrtko Ursulin
2025-10-23  3:45     ` Matthew Brost
2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
2025-10-14 15:03   ` Tvrtko Ursulin
2025-10-24  7:29   ` Tvrtko Ursulin
2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protected fence ops by RCU Christian König
2025-10-16 18:04   ` Tvrtko Ursulin
2025-10-31 10:35   ` Tvrtko Ursulin
2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
2025-10-16  8:56   ` Tvrtko Ursulin
2025-10-16 15:57     ` Tvrtko Ursulin
2025-10-23  4:23       ` Matthew Brost
2025-10-23  4:44         ` Matthew Brost
2025-10-30 13:52       ` Christian König
2025-10-31 10:31         ` Tvrtko Ursulin
2025-10-17  9:14   ` Philipp Stanner
2025-10-30 15:05     ` Christian König
2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
2025-10-16  9:26   ` Tvrtko Ursulin
2025-11-03 13:07     ` Philipp Stanner
2025-10-23 18:09   ` Matthew Brost
2025-10-30 15:14     ` Christian König
2025-10-13 13:48 ` [PATCH 06/15] dma-buf: use inline lock for the stub fence Christian König
2025-10-13 13:48 ` [PATCH 07/15] dma-buf: use inline lock for the dma-fence-array Christian König
2025-10-13 13:48 ` [PATCH 08/15] dma-buf: use inline lock for the dma-fence-chain Christian König
2025-10-13 13:48 ` [PATCH 09/15] drm/sched: use inline locks for the drm-sched-fence Christian König
2025-10-13 13:48 ` [PATCH 10/15] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
2025-10-13 13:48 ` [PATCH 11/15] drm/amdgpu: independence for the amdgpu_fence! Christian König
2025-10-13 13:48 ` [PATCH 12/15] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
2025-10-13 13:48 ` [PATCH 13/15] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
2025-10-13 13:48 ` [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence! Christian König
2025-10-17 22:22   ` Felix Kuehling
2025-10-30 15:07     ` Christian König
2025-10-30 20:04       ` Felix Kuehling
2025-10-13 13:48 ` [PATCH 15/15] drm/amdgpu: independence for the amdgpu_userq__fence! Christian König
2025-10-13 14:54 ` Independence for dma_fences! Philipp Stanner
2025-10-14 15:54   ` Christian König
2025-10-17  8:32     ` Philipp Stanner
2025-10-28 14:06       ` Christian König
2025-10-29 20:53         ` Matthew Brost
2025-10-30 10:59           ` Christian König
2025-10-31 17:44             ` Matthew Brost
2025-11-03 11:43               ` Christian König
2025-11-03 19:32                 ` Matthew Brost
2025-10-15  0:51 ` Dave Airlie
