* Independence for dma_fences! v2
@ 2025-10-31 13:16 Christian König
2025-10-31 13:16 ` [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2 Christian König
` (19 more replies)
0 siblings, 20 replies; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Hi everyone,
dma_fences have always lived under the tyranny of the module lifetime of
their issuer, leading to crashes should anybody still hold a reference
to a dma_fence when the module of the issuer is unloaded.
The basic problem is that when buffers are shared between drivers,
dma_fence objects can leak into external drivers and stay there even
after they are signaled. The dma_resv object, for example, only lazily
releases its dma_fences.
So what happens is that when the module which originally created the dma_fence
unloads, the dma_fence_ops function table becomes unavailable as well, and so
any attempt to release the fence crashes the system.
Previously various approaches have been discussed, including changing the
locking semantics of the dma_fence callbacks (by me) as well as using the
drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
from their actual users, but none of them actually solved all problems.
Tvrtko did some really nice prerequisite work by protecting the strings
returned by the dma_fence_ops with RCU. This way dma_fence creators were
able to just wait for an RCU grace period after fence signaling before
it was safe to free those data structures.
Now this patch set goes a step further and protects the whole
dma_fence_ops structure by RCU, so that after the fence signals the
pointer to the dma_fence_ops is set to NULL when neither a wait nor a
release callback is given. All functionality which uses the dma_fence_ops
reference is put inside an RCU critical section, except for the
deprecated issuer specific wait and of course the optional release
callback.
In addition to the RCU changes, the lock protecting the dma_fence state
previously had to be allocated externally. This set now makes that
external lock optional and allows dma_fences to use an inline lock and
be self contained.
This patch set addresses all previous code review comments, is now based
on drm-tip instead of amd-staging-drm-next, and includes my changes for
amdgpu as well as Matthew's patches for XE.
Please review and comment,
Christian.
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
@ 2025-10-31 13:16 ` Christian König
2025-10-31 14:04 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 02/20] dma-buf: rework stub fence initialisation v2 Christian König
` (18 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
The driver and timeline name are meaningless for signaled fences.
Drop them and also print the context number.
v2: avoid the calls when the fence is already signaled.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 39e6f93dc310..cfa6ddcc1278 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -999,19 +999,20 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
*/
void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
{
- const char __rcu *timeline;
- const char __rcu *driver;
+ const char __rcu *timeline = "";
+ const char __rcu *driver = "";
+ const char *signaled = "";
rcu_read_lock();
- timeline = dma_fence_timeline_name(fence);
- driver = dma_fence_driver_name(fence);
+ if (!dma_fence_is_signaled(fence)) {
+ timeline = dma_fence_timeline_name(fence);
+ driver = dma_fence_driver_name(fence);
+ signaled = "un";
+ }
- seq_printf(seq, "%s %s seq %llu %ssignalled\n",
- rcu_dereference(driver),
- rcu_dereference(timeline),
- fence->seqno,
- dma_fence_is_signaled(fence) ? "" : "un");
+ seq_printf(seq, "%llu %s %s seq %llu %ssignalled\n", fence->context,
+ timeline, driver, fence->seqno, signaled);
rcu_read_unlock();
}
--
2.43.0
* [PATCH 02/20] dma-buf: rework stub fence initialisation v2
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
2025-10-31 13:16 ` [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2 Christian König
@ 2025-10-31 13:16 ` Christian König
2025-10-31 14:05 ` Tvrtko Ursulin
2025-11-04 15:01 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 03/20] dma-buf: protected fence ops by RCU v2 Christian König
` (17 subsequent siblings)
19 siblings, 2 replies; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Instead of doing this on the first call of the function, just initialize
the stub fence during kernel load.
This has the clear advantage of lower overhead and also no longer relies
on the ops not being NULL.
v2: use correct signal function
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index cfa6ddcc1278..b229d96f551c 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
.get_timeline_name = dma_fence_stub_get_name,
};
+static int __init dma_fence_init_stub(void)
+{
+ dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
+ &dma_fence_stub_lock, 0, 0);
+
+ set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+ &dma_fence_stub.flags);
+
+ dma_fence_signal(&dma_fence_stub);
+ return 0;
+}
+subsys_initcall(dma_fence_init_stub);
+
/**
* dma_fence_get_stub - return a signaled fence
*
- * Return a stub fence which is already signaled. The fence's
- * timestamp corresponds to the first time after boot this
- * function is called.
+ * Return a stub fence which is already signaled. The fence's timestamp
+ * corresponds to the initialisation time of the linux kernel.
*/
struct dma_fence *dma_fence_get_stub(void)
{
- spin_lock(&dma_fence_stub_lock);
- if (!dma_fence_stub.ops) {
- dma_fence_init(&dma_fence_stub,
- &dma_fence_stub_ops,
- &dma_fence_stub_lock,
- 0, 0);
-
- set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
- &dma_fence_stub.flags);
-
- dma_fence_signal_locked(&dma_fence_stub);
- }
- spin_unlock(&dma_fence_stub_lock);
-
return dma_fence_get(&dma_fence_stub);
}
EXPORT_SYMBOL(dma_fence_get_stub);
--
2.43.0
* [PATCH 03/20] dma-buf: protected fence ops by RCU v2
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
2025-10-31 13:16 ` [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2 Christian König
2025-10-31 13:16 ` [PATCH 02/20] dma-buf: rework stub fence initialisation v2 Christian König
@ 2025-10-31 13:16 ` Christian König
2025-10-31 14:29 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 04/20] dma-buf: detach fence ops on signal Christian König
` (16 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
At first glance it is counter intuitive to protect a constant function
pointer table by RCU, but this allows modules providing the function
table to unload by waiting for an RCU grace period.
v2: make one of the now duplicated lockdep warnings a comment instead.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
include/linux/dma-fence.h | 18 ++++++++--
2 files changed, 62 insertions(+), 25 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index b229d96f551c..ed82e8361096 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
signed long
dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
{
+ const struct dma_fence_ops *ops;
signed long ret;
if (WARN_ON(timeout < 0))
@@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
dma_fence_enable_sw_signaling(fence);
- if (trace_dma_fence_wait_start_enabled()) {
- rcu_read_lock();
- trace_dma_fence_wait_start(fence);
+ rcu_read_lock();
+ ops = rcu_dereference(fence->ops);
+ trace_dma_fence_wait_start(fence);
+ if (ops->wait) {
+ /*
+ * Implementing the wait ops is deprecated and not supported for
+ * issuer independent fences, so it is ok to use the ops outside
+ * the RCU protected section.
+ */
+ rcu_read_unlock();
+ ret = ops->wait(fence, intr, timeout);
+ } else {
rcu_read_unlock();
- }
- if (fence->ops->wait)
- ret = fence->ops->wait(fence, intr, timeout);
- else
ret = dma_fence_default_wait(fence, intr, timeout);
+ }
if (trace_dma_fence_wait_end_enabled()) {
rcu_read_lock();
trace_dma_fence_wait_end(fence);
@@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
{
struct dma_fence *fence =
container_of(kref, struct dma_fence, refcount);
+ const struct dma_fence_ops *ops;
rcu_read_lock();
trace_dma_fence_destroy(fence);
@@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
spin_unlock_irqrestore(fence->lock, flags);
}
- rcu_read_unlock();
-
- if (fence->ops->release)
- fence->ops->release(fence);
+ ops = rcu_dereference(fence->ops);
+ if (ops->release)
+ ops->release(fence);
else
dma_fence_free(fence);
+ rcu_read_unlock();
}
EXPORT_SYMBOL(dma_fence_release);
@@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
static bool __dma_fence_enable_signaling(struct dma_fence *fence)
{
+ const struct dma_fence_ops *ops;
bool was_set;
lockdep_assert_held(fence->lock);
@@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return false;
- if (!was_set && fence->ops->enable_signaling) {
+ rcu_read_lock();
+ ops = rcu_dereference(fence->ops);
+ if (!was_set && ops->enable_signaling) {
trace_dma_fence_enable_signal(fence);
- if (!fence->ops->enable_signaling(fence)) {
+ if (!ops->enable_signaling(fence)) {
+ rcu_read_unlock();
dma_fence_signal_locked(fence);
return false;
}
}
+ rcu_read_unlock();
return true;
}
@@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
*/
void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
{
- if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
- fence->ops->set_deadline(fence, deadline);
+ const struct dma_fence_ops *ops;
+
+ rcu_read_lock();
+ ops = rcu_dereference(fence->ops);
+ if (ops->set_deadline && !dma_fence_is_signaled(fence))
+ ops->set_deadline(fence, deadline);
+ rcu_read_unlock();
}
EXPORT_SYMBOL(dma_fence_set_deadline);
@@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
kref_init(&fence->refcount);
- fence->ops = ops;
+ /*
+ * At first glance it is counter intuitive to protect a constant
+ * function pointer table by RCU, but this allows modules providing the
+ * function table to unload by waiting for an RCU grace period.
+ */
+ RCU_INIT_POINTER(fence->ops, ops);
INIT_LIST_HEAD(&fence->cb_list);
fence->lock = lock;
fence->context = context;
@@ -1104,11 +1127,12 @@ EXPORT_SYMBOL(dma_fence_init64);
*/
const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
{
- RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
- "RCU protection is required for safe access to returned string");
+ const struct dma_fence_ops *ops;
+ /* RCU protection is required for safe access to returned string */
+ ops = rcu_dereference(fence->ops);
if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
- return fence->ops->get_driver_name(fence);
+ return ops->get_driver_name(fence);
else
return "detached-driver";
}
@@ -1136,11 +1160,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
*/
const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
{
- RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
- "RCU protection is required for safe access to returned string");
+ const struct dma_fence_ops *ops;
+ /* RCU protection is required for safe access to returned string */
+ ops = rcu_dereference(fence->ops);
if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
- return fence->ops->get_timeline_name(fence);
+ return ops->get_timeline_name(fence);
else
return "signaled-timeline";
}
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 64639e104110..38421a0c7c5b 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -66,7 +66,7 @@ struct seq_file;
*/
struct dma_fence {
spinlock_t *lock;
- const struct dma_fence_ops *ops;
+ const struct dma_fence_ops __rcu *ops;
/*
* We clear the callback list on kref_put so that by the time we
* release the fence it is unused. No one should be adding to the
@@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
static inline bool
dma_fence_is_signaled_locked(struct dma_fence *fence)
{
+ const struct dma_fence_ops *ops;
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;
- if (fence->ops->signaled && fence->ops->signaled(fence)) {
+ rcu_read_lock();
+ ops = rcu_dereference(fence->ops);
+ if (ops->signaled && ops->signaled(fence)) {
+ rcu_read_unlock();
dma_fence_signal_locked(fence);
return true;
}
+ rcu_read_unlock();
return false;
}
@@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
+ const struct dma_fence_ops *ops;
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;
- if (fence->ops->signaled && fence->ops->signaled(fence)) {
+ rcu_read_lock();
+ ops = rcu_dereference(fence->ops);
+ if (ops->signaled && ops->signaled(fence)) {
+ rcu_read_unlock();
dma_fence_signal(fence);
return true;
}
+ rcu_read_unlock();
return false;
}
--
2.43.0
* [PATCH 04/20] dma-buf: detach fence ops on signal
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (2 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 03/20] dma-buf: protected fence ops by RCU v2 Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-07 11:04 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 05/20] dma-buf: inline spinlock for fence protection Christian König
` (15 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
When neither a release nor a wait operation is specified, it is possible
to let the dma_fence live on independently of the module which issued it.
This makes it possible to unload a driver after merely waiting for all
of its fences to signal.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
include/linux/dma-fence.h | 4 ++--
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ed82e8361096..cd222984e2e1 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
&fence->flags)))
return -EINVAL;
+ /*
+ * When neither a release nor a wait operation is specified set the ops
+ * pointer to NULL to allow the fence structure to become independent
+ * who originally issued it.
+ */
+ if (!fence->ops->release && !fence->ops->wait)
+ RCU_INIT_POINTER(fence->ops, NULL);
+
/* Stash the cb_list before replacing it with the timestamp */
list_replace(&fence->cb_list, &cb_list);
@@ -513,7 +521,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
rcu_read_lock();
ops = rcu_dereference(fence->ops);
trace_dma_fence_wait_start(fence);
- if (ops->wait) {
+ if (ops && ops->wait) {
/*
* Implementing the wait ops is deprecated and not supported for
* issuer independent fences, so it is ok to use the ops outside
@@ -578,7 +586,7 @@ void dma_fence_release(struct kref *kref)
}
ops = rcu_dereference(fence->ops);
- if (ops->release)
+ if (ops && ops->release)
ops->release(fence);
else
dma_fence_free(fence);
@@ -614,7 +622,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
rcu_read_lock();
ops = rcu_dereference(fence->ops);
- if (!was_set && ops->enable_signaling) {
+ if (!was_set && ops && ops->enable_signaling) {
trace_dma_fence_enable_signal(fence);
if (!ops->enable_signaling(fence)) {
@@ -1000,7 +1008,7 @@ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
rcu_read_lock();
ops = rcu_dereference(fence->ops);
- if (ops->set_deadline && !dma_fence_is_signaled(fence))
+ if (ops && ops->set_deadline && !dma_fence_is_signaled(fence))
ops->set_deadline(fence, deadline);
rcu_read_unlock();
}
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 38421a0c7c5b..e1ba1d53de88 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -425,7 +425,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
rcu_read_lock();
ops = rcu_dereference(fence->ops);
- if (ops->signaled && ops->signaled(fence)) {
+ if (ops && ops->signaled && ops->signaled(fence)) {
rcu_read_unlock();
dma_fence_signal_locked(fence);
return true;
@@ -461,7 +461,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
rcu_read_lock();
ops = rcu_dereference(fence->ops);
- if (ops->signaled && ops->signaled(fence)) {
+ if (ops && ops->signaled && ops->signaled(fence)) {
rcu_read_unlock();
dma_fence_signal(fence);
return true;
--
2.43.0
* [PATCH 05/20] dma-buf: inline spinlock for fence protection
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (3 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 04/20] dma-buf: detach fence ops on signal Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-07 11:59 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 06/20] dma-buf: use inline lock for the stub fence Christian König
` (14 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Allow implementations to omit the spinlock protecting the fence's
internal state; a spinlock embedded into the fence structure itself is
used in that case.
Apart from simplifying the handling for containers and the stub fence,
this has the advantage of allowing implementations to issue fences
without caring about their spinlock's lifetime.
That in turn is necessary for independent fences which outlive the
module that originally issued them.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 54 ++++++++++++------------
drivers/dma-buf/sw_sync.c | 14 +++---
drivers/dma-buf/sync_debug.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 12 +++---
drivers/gpu/drm/drm_crtc.c | 2 +-
drivers/gpu/drm/drm_writeback.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_drm.c | 5 ++-
drivers/gpu/drm/nouveau/nouveau_fence.c | 3 +-
drivers/gpu/drm/qxl/qxl_release.c | 3 +-
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 3 +-
drivers/gpu/drm/xe/xe_hw_fence.c | 3 +-
drivers/gpu/drm/xe/xe_sched_job.c | 4 +-
include/linux/dma-fence.h | 42 +++++++++++++++++-
15 files changed, 99 insertions(+), 58 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index cd222984e2e1..f9d7439275d1 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -343,7 +343,6 @@ void __dma_fence_might_wait(void)
}
#endif
-
/**
* dma_fence_signal_timestamp_locked - signal completion of a fence
* @fence: the fence to signal
@@ -368,7 +367,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
struct dma_fence_cb *cur, *tmp;
struct list_head cb_list;
- lockdep_assert_held(fence->lock);
+ lockdep_assert_held(dma_fence_spinlock(fence));
if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
&fence->flags)))
@@ -421,9 +420,9 @@ int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
if (WARN_ON(!fence))
return -EINVAL;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
ret = dma_fence_signal_timestamp_locked(fence, timestamp);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
return ret;
}
@@ -475,9 +474,9 @@ int dma_fence_signal(struct dma_fence *fence)
tmp = dma_fence_begin_signalling();
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
dma_fence_end_signalling(tmp);
@@ -579,10 +578,10 @@ void dma_fence_release(struct kref *kref)
* don't leave chains dangling. We set the error flag first
* so that the callbacks know this signal is due to an error.
*/
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
fence->error = -EDEADLK;
dma_fence_signal_locked(fence);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
}
ops = rcu_dereference(fence->ops);
@@ -612,7 +611,7 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
const struct dma_fence_ops *ops;
bool was_set;
- lockdep_assert_held(fence->lock);
+ lockdep_assert_held(dma_fence_spinlock(fence));
was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
&fence->flags);
@@ -648,9 +647,9 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
{
unsigned long flags;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
__dma_fence_enable_signaling(fence);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
}
EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
@@ -690,8 +689,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
return -ENOENT;
}
- spin_lock_irqsave(fence->lock, flags);
-
+ dma_fence_lock(fence, flags);
if (__dma_fence_enable_signaling(fence)) {
cb->func = func;
list_add_tail(&cb->node, &fence->cb_list);
@@ -699,8 +697,7 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
INIT_LIST_HEAD(&cb->node);
ret = -ENOENT;
}
-
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
return ret;
}
@@ -723,9 +720,9 @@ int dma_fence_get_status(struct dma_fence *fence)
unsigned long flags;
int status;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
status = dma_fence_get_status_locked(fence);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
return status;
}
@@ -755,13 +752,11 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
unsigned long flags;
bool ret;
- spin_lock_irqsave(fence->lock, flags);
-
+ dma_fence_lock(fence, flags);
ret = !list_empty(&cb->node);
if (ret)
list_del_init(&cb->node);
-
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
return ret;
}
@@ -800,8 +795,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
unsigned long flags;
signed long ret = timeout ? timeout : 1;
- spin_lock_irqsave(fence->lock, flags);
-
+ dma_fence_lock(fence, flags);
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
goto out;
@@ -824,11 +818,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
__set_current_state(TASK_INTERRUPTIBLE);
else
__set_current_state(TASK_UNINTERRUPTIBLE);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
ret = schedule_timeout(ret);
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
if (ret > 0 && intr && signal_pending(current))
ret = -ERESTARTSYS;
}
@@ -838,7 +832,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
__set_current_state(TASK_RUNNING);
out:
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
return ret;
}
EXPORT_SYMBOL(dma_fence_default_wait);
@@ -1046,7 +1040,6 @@ static void
__dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
spinlock_t *lock, u64 context, u64 seqno, unsigned long flags)
{
- BUG_ON(!lock);
BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
kref_init(&fence->refcount);
@@ -1057,10 +1050,15 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
*/
RCU_INIT_POINTER(fence->ops, ops);
INIT_LIST_HEAD(&fence->cb_list);
- fence->lock = lock;
fence->context = context;
fence->seqno = seqno;
fence->flags = flags;
+ if (lock) {
+ fence->extern_lock = lock;
+ } else {
+ spin_lock_init(&fence->inline_lock);
+ fence->flags |= BIT(DMA_FENCE_FLAG_INLINE_LOCK_BIT);
+ }
fence->error = 0;
trace_dma_fence_init(fence);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 3c20f1d31cf5..8f48529214a4 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -155,12 +155,12 @@ static void timeline_fence_release(struct dma_fence *fence)
struct sync_timeline *parent = dma_fence_parent(fence);
unsigned long flags;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
if (!list_empty(&pt->link)) {
list_del(&pt->link);
rb_erase(&pt->node, &parent->pt_tree);
}
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
sync_timeline_put(parent);
dma_fence_free(fence);
@@ -178,7 +178,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
struct sync_pt *pt = dma_fence_to_sync_pt(fence);
unsigned long flags;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
if (ktime_before(deadline, pt->deadline))
pt->deadline = deadline;
@@ -186,7 +186,7 @@ static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadlin
pt->deadline = deadline;
__set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
}
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
}
static const struct dma_fence_ops timeline_fence_ops = {
@@ -427,13 +427,13 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
goto put_fence;
}
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
if (!test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
ret = -ENOENT;
goto unlock;
}
data.deadline_ns = ktime_to_ns(pt->deadline);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
dma_fence_put(fence);
@@ -446,7 +446,7 @@ static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long a
return 0;
unlock:
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
put_fence:
dma_fence_put(fence);
diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
index 02af347293d0..c49324505b20 100644
--- a/drivers/dma-buf/sync_debug.h
+++ b/drivers/dma-buf/sync_debug.h
@@ -47,7 +47,7 @@ struct sync_timeline {
static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
{
- return container_of(fence->lock, struct sync_timeline, lock);
+ return container_of(fence->extern_lock, struct sync_timeline, lock);
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 5ec5c3ff22bb..fcc7a3fb93b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -468,10 +468,10 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
return false;
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
if (!dma_fence_is_signaled_locked(fence))
dma_fence_set_error(fence, -ENODATA);
- spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_unlock(fence, flags);
while (!dma_fence_is_signaled(fence) &&
ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c1a801203949..61adba6ed071 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2767,8 +2767,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
dma_fence_put(vm->last_unlocked);
dma_fence_wait(vm->last_tlb_flush, false);
/* Make sure that all fence callbacks have completed */
- spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
- spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
+ dma_fence_lock(vm->last_tlb_flush, flags);
+ dma_fence_unlock(vm->last_tlb_flush, flags);
dma_fence_put(vm->last_tlb_flush);
list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index cf0ec94e8a07..d41511f59d15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -631,20 +631,20 @@ bool amdgpu_vm_is_bo_always_valid(struct amdgpu_vm *vm, struct amdgpu_bo *bo);
*/
static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
{
+ struct dma_fence *fence;
unsigned long flags;
- spinlock_t *lock;
/*
* Workaround to stop racing between the fence signaling and handling
- * the cb. The lock is static after initially setting it up, just make
- * sure that the dma_fence structure isn't freed up.
+ * the cb.
*/
rcu_read_lock();
- lock = vm->last_tlb_flush->lock;
+ fence = dma_fence_get_rcu(vm->last_tlb_flush);
rcu_read_unlock();
- spin_lock_irqsave(lock, flags);
- spin_unlock_irqrestore(lock, flags);
+ dma_fence_lock(fence, flags);
+ dma_fence_unlock(fence, flags);
+ dma_fence_put(fence);
return atomic64_read(&vm->tlb_seq);
}
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index a7797d260f1e..17472915842f 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -159,7 +159,7 @@ static const struct dma_fence_ops drm_crtc_fence_ops;
static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
{
BUG_ON(fence->ops != &drm_crtc_fence_ops);
- return container_of(fence->lock, struct drm_crtc, fence_lock);
+ return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
}
static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
index 95b8a2e4bda6..624a4e8b6c99 100644
--- a/drivers/gpu/drm/drm_writeback.c
+++ b/drivers/gpu/drm/drm_writeback.c
@@ -81,7 +81,7 @@
* From userspace, this property will always read as zero.
*/
-#define fence_to_wb_connector(x) container_of(x->lock, \
+#define fence_to_wb_connector(x) container_of(x->extern_lock, \
struct drm_writeback_connector, \
fence_lock)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 1527b801f013..2956ed2ec073 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -156,12 +156,13 @@ nouveau_name(struct drm_device *dev)
static inline bool
nouveau_cli_work_ready(struct dma_fence *fence)
{
+ unsigned long flags;
bool ret = true;
- spin_lock_irq(fence->lock);
+ dma_fence_lock(fence, flags);
if (!dma_fence_is_signaled_locked(fence))
ret = false;
- spin_unlock_irq(fence->lock);
+ dma_fence_unlock(fence, flags);
if (ret == true)
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 869d4335c0f4..a7512c22c85f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -41,7 +41,8 @@ static const struct dma_fence_ops nouveau_fence_ops_legacy;
static inline struct nouveau_fence_chan *
nouveau_fctx(struct nouveau_fence *fence)
{
- return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
+ return container_of(fence->base.extern_lock, struct nouveau_fence_chan,
+ lock);
}
static bool
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 7b3c9a6016db..de05e392df88 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -62,7 +62,8 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
struct qxl_device *qdev;
unsigned long cur, end = jiffies + timeout;
- qdev = container_of(fence->lock, struct qxl_device, release_lock);
+ qdev = container_of(fence->extern_lock, struct qxl_device,
+ release_lock);
if (!wait_event_timeout(qdev->release_event,
(dma_fence_is_signaled(fence) ||
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 00be92da5509..621aa0aa8406 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -47,7 +47,8 @@ struct vmw_event_fence_action {
static struct vmw_fence_manager *
fman_from_fence(struct vmw_fence_obj *fence)
{
- return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+ return container_of(fence->base.extern_lock, struct vmw_fence_manager,
+ lock);
}
static void vmw_fence_obj_destroy(struct dma_fence *f)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index b2a0c46dfcd4..3456bec93c70 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -144,7 +144,8 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence);
static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
{
- return container_of(fence->dma.lock, struct xe_hw_fence_irq, lock);
+ return container_of(fence->dma.extern_lock, struct xe_hw_fence_irq,
+ lock);
}
static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index f1ba9c19e218..415bee1fc0a5 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -189,11 +189,11 @@ static bool xe_fence_set_error(struct dma_fence *fence, int error)
unsigned long irq_flags;
bool signaled;
- spin_lock_irqsave(fence->lock, irq_flags);
+ dma_fence_lock(fence, irq_flags);
signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
if (!signaled)
dma_fence_set_error(fence, error);
- spin_unlock_irqrestore(fence->lock, irq_flags);
+ dma_fence_unlock(fence, irq_flags);
return signaled;
}
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index e1ba1d53de88..fb416f500664 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -34,7 +34,8 @@ struct seq_file;
* @ops: dma_fence_ops associated with this fence
* @rcu: used for releasing fence with kfree_rcu
* @cb_list: list of all callbacks to call
- * @lock: spin_lock_irqsave used for locking
+ * @extern_lock: external spin_lock_irqsave used for locking
+ * @inline_lock: alternative internal spin_lock_irqsave used for locking
* @context: execution context this fence belongs to, returned by
* dma_fence_context_alloc()
* @seqno: the sequence number of this fence inside the execution context,
@@ -48,6 +49,7 @@ struct seq_file;
* atomic ops (bit_*), so taking the spinlock will not be needed most
* of the time.
*
+ * DMA_FENCE_FLAG_INLINE_LOCK_BIT - use inline spinlock instead of external one
* DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
* DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
* DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
@@ -65,7 +67,10 @@ struct seq_file;
* been completed, or never called at all.
*/
struct dma_fence {
- spinlock_t *lock;
+ union {
+ spinlock_t *extern_lock;
+ spinlock_t inline_lock;
+ };
const struct dma_fence_ops __rcu *ops;
/*
* We clear the callback list on kref_put so that by the time we
@@ -98,6 +103,7 @@ struct dma_fence {
};
enum dma_fence_flag_bits {
+ DMA_FENCE_FLAG_INLINE_LOCK_BIT,
DMA_FENCE_FLAG_SEQNO64_BIT,
DMA_FENCE_FLAG_SIGNALED_BIT,
DMA_FENCE_FLAG_TIMESTAMP_BIT,
@@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
} while (1);
}
+/**
+ * dma_fence_spinlock - return pointer to the spinlock protecting the fence
+ * @fence: the fence to get the lock from
+ *
+ * Return a pointer to either the embedded or the external spin lock.
+ */
+static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
+{
+ return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
+ &fence->inline_lock : fence->extern_lock;
+}
+
+/**
+ * dma_fence_lock - irqsave lock the fence
+ * @fence: the fence to lock
+ * @flags: where to store the CPU flags.
+ *
+ * Lock the fence, preventing it from changing to the signaled state.
+ */
+#define dma_fence_lock(fence, flags) \
+ spin_lock_irqsave(dma_fence_spinlock(fence), flags)
+
+/**
+ * dma_fence_unlock - unlock the fence and irqrestore
+ * @fence: the fence to unlock
+ * @flags: the CPU flags to restore
+ *
+ * Unlock the fence, allowing its state to change to signaled again.
+ */
+#define dma_fence_unlock(fence, flags) \
+ spin_unlock_irqrestore(dma_fence_spinlock(fence), flags)
+
#ifdef CONFIG_LOCKDEP
bool dma_fence_begin_signalling(void);
void dma_fence_end_signalling(bool cookie);
--
2.43.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH 06/20] dma-buf: use inline lock for the stub fence
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (4 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 05/20] dma-buf: inline spinlock for fence protection Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-04 15:05 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array Christian König
` (13 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Just as proof of concept and minor cleanup.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f9d7439275d1..0dbd2f641f37 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -24,7 +24,6 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
-static DEFINE_SPINLOCK(dma_fence_stub_lock);
static struct dma_fence dma_fence_stub;
/*
@@ -123,12 +122,8 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
static int __init dma_fence_init_stub(void)
{
- dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
- &dma_fence_stub_lock, 0, 0);
-
- set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
- &dma_fence_stub.flags);
-
+ dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, NULL, 0, 0);
+ set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &dma_fence_stub.flags);
dma_fence_signal(&dma_fence_stub);
return 0;
}
@@ -160,16 +155,9 @@ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp)
if (fence == NULL)
return NULL;
- dma_fence_init(fence,
- &dma_fence_stub_ops,
- &dma_fence_stub_lock,
- 0, 0);
-
- set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
- &fence->flags);
-
+ dma_fence_init(fence, &dma_fence_stub_ops, NULL, 0, 0);
+ set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
dma_fence_signal_timestamp(fence, timestamp);
-
return fence;
}
EXPORT_SYMBOL(dma_fence_allocate_private_stub);
--
2.43.0
* [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (5 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 06/20] dma-buf: use inline lock for the stub fence Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-05 8:50 ` Tvrtko Ursulin
2025-11-07 12:04 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain Christian König
` (12 subsequent siblings)
19 siblings, 2 replies; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Just as proof of concept and minor cleanup.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence-array.c | 5 ++---
include/linux/dma-fence-array.h | 1 -
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 6657d4b30af9..c2119a8049fe 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -204,9 +204,8 @@ void dma_fence_array_init(struct dma_fence_array *array,
array->num_fences = num_fences;
- spin_lock_init(&array->lock);
- dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
- context, seqno);
+ dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
+ seqno);
init_irq_work(&array->work, irq_dma_fence_array_work);
atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 079b3dec0a16..370b3d2bba37 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -38,7 +38,6 @@ struct dma_fence_array_cb {
struct dma_fence_array {
struct dma_fence base;
- spinlock_t lock;
unsigned num_fences;
atomic_t num_pending;
struct dma_fence **fences;
--
2.43.0
* [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (6 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-04 15:08 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence Christian König
` (11 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Just as proof of concept and minor cleanup.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/dma-buf/dma-fence-chain.c | 3 +--
include/linux/dma-fence-chain.h | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a8a90acf4f34..a707792b6025 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -245,7 +245,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
struct dma_fence_chain *prev_chain = to_dma_fence_chain(prev);
uint64_t context;
- spin_lock_init(&chain->lock);
rcu_assign_pointer(chain->prev, prev);
chain->fence = fence;
chain->prev_seqno = 0;
@@ -261,7 +260,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
seqno = max(prev->seqno, seqno);
}
- dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
+ dma_fence_init64(&chain->base, &dma_fence_chain_ops, NULL,
context, seqno);
/*
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 68c3c1e41014..d39ce7a2e599 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -46,7 +46,6 @@ struct dma_fence_chain {
*/
struct irq_work work;
};
- spinlock_t lock;
};
--
2.43.0
* [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (7 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-04 15:12 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 10/20] drm/amdgpu: clean up and unify hw fence handling Christian König
` (10 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
Just as proof of concept and minor cleanup.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
include/drm/gpu_scheduler.h | 4 ----
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 9391d6f0dc01..7a94e03341cb 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
struct dma_fence *parent;
unsigned long flags;
- spin_lock_irqsave(&fence->lock, flags);
+ dma_fence_lock(f, flags);
/* If we already have an earlier deadline, keep it: */
if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
ktime_before(fence->deadline, deadline)) {
- spin_unlock_irqrestore(&fence->lock, flags);
+ dma_fence_unlock(f, flags);
return;
}
fence->deadline = deadline;
set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
- spin_unlock_irqrestore(&fence->lock, flags);
+ dma_fence_unlock(f, flags);
/*
* smp_load_aquire() to ensure that if we are racing another
@@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
fence->owner = owner;
fence->drm_client_id = drm_client_id;
- spin_lock_init(&fence->lock);
return fence;
}
@@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
fence->sched = entity->rq->sched;
seq = atomic_inc_return(&entity->fence_seq);
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
- &fence->lock, entity->fence_context, seq);
+ NULL, entity->fence_context, seq);
dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
- &fence->lock, entity->fence_context + 1, seq);
+ NULL, entity->fence_context + 1, seq);
}
module_init(drm_sched_fence_slab_init);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fb88301b3c45..b77f24a783e3 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -297,10 +297,6 @@ struct drm_sched_fence {
* belongs to.
*/
struct drm_gpu_scheduler *sched;
- /**
- * @lock: the lock used by the scheduled and the finished fences.
- */
- spinlock_t lock;
/**
* @owner: job owner for debugging
*/
--
2.43.0
* [PATCH 10/20] drm/amdgpu: clean up and unify hw fence handling
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (8 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence Christian König
@ 2025-10-31 13:16 ` Christian König
2025-11-04 15:14 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
` (9 subsequent siblings)
19 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
From: Alex Deucher <alexander.deucher@amd.com>
Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences. This also allows us to allocate the
vm fence up front when we allocate the job.
v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
vm fence fix
v5: cast seqno (David)
Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 143 ++------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 17 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 43 ++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +-
8 files changed, 63 insertions(+), 167 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a70651050acf..0a62727e74f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1902,7 +1902,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
continue;
}
job = to_amdgpu_job(s_job);
- if (preempted && (&job->hw_fence.base) == fence)
+ if (preempted && (&job->hw_fence->base) == fence)
/* mark the job as preempted */
job->preemption_status |= AMDGPU_IB_PREEMPTED;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index aa3736de238d..e4301686a8f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5809,11 +5809,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
if (!amdgpu_ring_sched_ready(ring))
continue;
- /* Clear job fence from fence drv to avoid force_completion
- * leave NULL and vm flush fence in fence drv
- */
- amdgpu_fence_driver_clear_job_fences(ring);
-
/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
amdgpu_fence_driver_force_completion(ring);
}
@@ -6542,7 +6537,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
*
* job->base holds a reference to parent fence
*/
- if (job && dma_fence_is_signaled(&job->hw_fence.base)) {
+ if (job && dma_fence_is_signaled(&job->hw_fence->base)) {
job_signaled = true;
dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 18a7829122d2..1fe31d2f2706 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -45,16 +45,11 @@
* Cast helper
*/
static const struct dma_fence_ops amdgpu_fence_ops;
-static const struct dma_fence_ops amdgpu_job_fence_ops;
static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
{
struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
- if (__f->base.ops == &amdgpu_fence_ops ||
- __f->base.ops == &amdgpu_job_fence_ops)
- return __f;
-
- return NULL;
+ return __f;
}
/**
@@ -98,51 +93,32 @@ static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
* amdgpu_fence_emit - emit a fence on the requested ring
*
* @ring: ring the fence is associated with
- * @f: resulting fence object
* @af: amdgpu fence input
* @flags: flags to pass into the subordinate .emit_fence() call
*
* Emits a fence command on the requested ring (all asics).
* Returns 0 on success, -ENOMEM on failure.
*/
-int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
- struct amdgpu_fence *af, unsigned int flags)
+int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
+ unsigned int flags)
{
struct amdgpu_device *adev = ring->adev;
struct dma_fence *fence;
- struct amdgpu_fence *am_fence;
struct dma_fence __rcu **ptr;
uint32_t seq;
int r;
- if (!af) {
- /* create a separate hw fence */
- am_fence = kzalloc(sizeof(*am_fence), GFP_KERNEL);
- if (!am_fence)
- return -ENOMEM;
- } else {
- am_fence = af;
- }
- fence = &am_fence->base;
- am_fence->ring = ring;
+ fence = &af->base;
+ af->ring = ring;
seq = ++ring->fence_drv.sync_seq;
- am_fence->seq = seq;
- if (af) {
- dma_fence_init(fence, &amdgpu_job_fence_ops,
- &ring->fence_drv.lock,
- adev->fence_context + ring->idx, seq);
- /* Against remove in amdgpu_job_{free, free_cb} */
- dma_fence_get(fence);
- } else {
- dma_fence_init(fence, &amdgpu_fence_ops,
- &ring->fence_drv.lock,
- adev->fence_context + ring->idx, seq);
- }
+ dma_fence_init(fence, &amdgpu_fence_ops,
+ &ring->fence_drv.lock,
+ adev->fence_context + ring->idx, seq);
amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
seq, flags | AMDGPU_FENCE_FLAG_INT);
- amdgpu_fence_save_wptr(fence);
+ amdgpu_fence_save_wptr(af);
pm_runtime_get_noresume(adev_to_drm(adev)->dev);
ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
if (unlikely(rcu_dereference_protected(*ptr, 1))) {
@@ -167,8 +143,6 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
*/
rcu_assign_pointer(*ptr, dma_fence_get(fence));
- *f = fence;
-
return 0;
}
@@ -669,36 +643,6 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
}
}
-/**
- * amdgpu_fence_driver_clear_job_fences - clear job embedded fences of ring
- *
- * @ring: fence of the ring to be cleared
- *
- */
-void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
-{
- int i;
- struct dma_fence *old, **ptr;
-
- for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
- ptr = &ring->fence_drv.fences[i];
- old = rcu_dereference_protected(*ptr, 1);
- if (old && old->ops == &amdgpu_job_fence_ops) {
- struct amdgpu_job *job;
-
- /* For non-scheduler bad job, i.e. failed ib test, we need to signal
- * it right here or we won't be able to track them in fence_drv
- * and they will remain unsignaled during sa_bo free.
- */
- job = container_of(old, struct amdgpu_job, hw_fence.base);
- if (!job->base.s_fence && !dma_fence_is_signaled(old))
- dma_fence_signal(old);
- RCU_INIT_POINTER(*ptr, NULL);
- dma_fence_put(old);
- }
- }
-}
-
/**
* amdgpu_fence_driver_set_error - set error code on fences
* @ring: the ring which contains the fences
@@ -755,7 +699,7 @@ void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
/**
* amdgpu_fence_driver_guilty_force_completion - force signal of specified sequence
*
- * @fence: fence of the ring to signal
+ * @af: fence of the ring to signal
*
*/
void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
@@ -792,15 +736,13 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
} while (last_seq != seq);
spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
/* signal the guilty fence */
- amdgpu_fence_write(ring, af->seq);
+ amdgpu_fence_write(ring, (u32)af->base.seqno);
amdgpu_fence_process(ring);
}
-void amdgpu_fence_save_wptr(struct dma_fence *fence)
+void amdgpu_fence_save_wptr(struct amdgpu_fence *af)
{
- struct amdgpu_fence *am_fence = container_of(fence, struct amdgpu_fence, base);
-
- am_fence->wptr = am_fence->ring->wptr;
+ af->wptr = af->ring->wptr;
}
static void amdgpu_ring_backup_unprocessed_command(struct amdgpu_ring *ring,
@@ -866,13 +808,6 @@ static const char *amdgpu_fence_get_timeline_name(struct dma_fence *f)
return (const char *)to_amdgpu_fence(f)->ring->name;
}
-static const char *amdgpu_job_fence_get_timeline_name(struct dma_fence *f)
-{
- struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
-
- return (const char *)to_amdgpu_ring(job->base.sched)->name;
-}
-
/**
* amdgpu_fence_enable_signaling - enable signalling on fence
* @f: fence
@@ -889,23 +824,6 @@ static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
return true;
}
-/**
- * amdgpu_job_fence_enable_signaling - enable signalling on job fence
- * @f: fence
- *
- * This is the simliar function with amdgpu_fence_enable_signaling above, it
- * only handles the job embedded fence.
- */
-static bool amdgpu_job_fence_enable_signaling(struct dma_fence *f)
-{
- struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
-
- if (!timer_pending(&to_amdgpu_ring(job->base.sched)->fence_drv.fallback_timer))
- amdgpu_fence_schedule_fallback(to_amdgpu_ring(job->base.sched));
-
- return true;
-}
-
/**
* amdgpu_fence_free - free up the fence memory
*
@@ -921,21 +839,6 @@ static void amdgpu_fence_free(struct rcu_head *rcu)
kfree(to_amdgpu_fence(f));
}
-/**
- * amdgpu_job_fence_free - free up the job with embedded fence
- *
- * @rcu: RCU callback head
- *
- * Free up the job with embedded fence after the RCU grace period.
- */
-static void amdgpu_job_fence_free(struct rcu_head *rcu)
-{
- struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
-
- /* free job if fence has a parent job */
- kfree(container_of(f, struct amdgpu_job, hw_fence.base));
-}
-
/**
* amdgpu_fence_release - callback that fence can be freed
*
@@ -949,19 +852,6 @@ static void amdgpu_fence_release(struct dma_fence *f)
call_rcu(&f->rcu, amdgpu_fence_free);
}
-/**
- * amdgpu_job_fence_release - callback that job embedded fence can be freed
- *
- * @f: fence
- *
- * This is the simliar function with amdgpu_fence_release above, it
- * only handles the job embedded fence.
- */
-static void amdgpu_job_fence_release(struct dma_fence *f)
-{
- call_rcu(&f->rcu, amdgpu_job_fence_free);
-}
-
static const struct dma_fence_ops amdgpu_fence_ops = {
.get_driver_name = amdgpu_fence_get_driver_name,
.get_timeline_name = amdgpu_fence_get_timeline_name,
@@ -969,13 +859,6 @@ static const struct dma_fence_ops amdgpu_fence_ops = {
.release = amdgpu_fence_release,
};
-static const struct dma_fence_ops amdgpu_job_fence_ops = {
- .get_driver_name = amdgpu_fence_get_driver_name,
- .get_timeline_name = amdgpu_job_fence_get_timeline_name,
- .enable_signaling = amdgpu_job_fence_enable_signaling,
- .release = amdgpu_job_fence_release,
-};
-
/*
* Fence debugfs
*/
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 7d9bcb72e8dd..39229ece83f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -149,17 +149,19 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
if (job) {
vm = job->vm;
fence_ctx = job->base.s_fence ?
- job->base.s_fence->scheduled.context : 0;
+ job->base.s_fence->finished.context : 0;
shadow_va = job->shadow_va;
csa_va = job->csa_va;
gds_va = job->gds_va;
init_shadow = job->init_shadow;
- af = &job->hw_fence;
+ af = job->hw_fence;
/* Save the context of the job for reset handling.
* The driver needs this so it can skip the ring
* contents for guilty contexts.
*/
- af->context = job->base.s_fence ? job->base.s_fence->finished.context : 0;
+ af->context = fence_ctx;
+ /* the vm fence is also part of the job's context */
+ job->hw_vm_fence->context = fence_ctx;
} else {
vm = NULL;
fence_ctx = 0;
@@ -167,7 +169,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
csa_va = 0;
gds_va = 0;
init_shadow = false;
- af = NULL;
+ af = kzalloc(sizeof(*af), GFP_ATOMIC);
+ if (!af)
+ return -ENOMEM;
}
if (!ring->sched.ready) {
@@ -289,7 +293,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
}
- r = amdgpu_fence_emit(ring, f, af, fence_flags);
+ r = amdgpu_fence_emit(ring, af, fence_flags);
if (r) {
dev_err(adev->dev, "failed to emit fence (%d)\n", r);
if (job && job->vmid)
@@ -297,6 +301,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
amdgpu_ring_undo(ring);
return r;
}
+ *f = &af->base;
if (ring->funcs->insert_end)
ring->funcs->insert_end(ring);
@@ -317,7 +322,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
* fence so we know what rings contents to backup
* after we reset the queue.
*/
- amdgpu_fence_save_wptr(*f);
+ amdgpu_fence_save_wptr(af);
amdgpu_ring_ib_end(ring);
amdgpu_ring_commit(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d020a890a0ea..e08d837668f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -137,7 +137,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
ring->funcs->reset) {
dev_err(adev->dev, "Starting %s ring reset\n",
s_job->sched->name);
- r = amdgpu_ring_reset(ring, job->vmid, &job->hw_fence);
+ r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
if (!r) {
atomic_inc(&ring->adev->gpu_reset_counter);
dev_err(adev->dev, "Ring %s reset succeeded\n",
@@ -186,6 +186,9 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
unsigned int num_ibs, struct amdgpu_job **job,
u64 drm_client_id)
{
+ struct amdgpu_fence *af;
+ int r;
+
if (num_ibs == 0)
return -EINVAL;
@@ -193,6 +196,20 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (!*job)
return -ENOMEM;
+ af = kzalloc(sizeof(struct amdgpu_fence), GFP_KERNEL);
+ if (!af) {
+ r = -ENOMEM;
+ goto err_job;
+ }
+ (*job)->hw_fence = af;
+
+ af = kzalloc(sizeof(struct amdgpu_fence), GFP_KERNEL);
+ if (!af) {
+ r = -ENOMEM;
+ goto err_fence;
+ }
+ (*job)->hw_vm_fence = af;
+
(*job)->vm = vm;
amdgpu_sync_create(&(*job)->explicit_sync);
@@ -204,6 +221,13 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
return drm_sched_job_init(&(*job)->base, entity, 1, owner,
drm_client_id);
+
+err_fence:
+ kfree((*job)->hw_fence);
+err_job:
+ kfree(*job);
+
+ return r;
}
int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
@@ -251,11 +275,11 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
struct dma_fence *f;
unsigned i;
- /* Check if any fences where initialized */
+ /* Check if any fences were initialized */
if (job->base.s_fence && job->base.s_fence->finished.ops)
f = &job->base.s_fence->finished;
- else if (job->hw_fence.base.ops)
- f = &job->hw_fence.base;
+ else if (job->hw_fence && job->hw_fence->base.ops)
+ f = &job->hw_fence->base;
else
f = NULL;
@@ -271,11 +295,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
amdgpu_sync_free(&job->explicit_sync);
- /* only put the hw fence if has embedded fence */
- if (!job->hw_fence.base.ops)
- kfree(job);
- else
- dma_fence_put(&job->hw_fence.base);
+ kfree(job);
}
void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
@@ -304,10 +324,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
if (job->gang_submit != &job->base.s_fence->scheduled)
dma_fence_put(job->gang_submit);
- if (!job->hw_fence.base.ops)
- kfree(job);
- else
- dma_fence_put(&job->hw_fence.base);
+ kfree(job);
}
struct dma_fence *amdgpu_job_submit(struct amdgpu_job *job)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 4a6487eb6cb5..7abf069d17d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -64,7 +64,8 @@ struct amdgpu_job {
struct drm_sched_job base;
struct amdgpu_vm *vm;
struct amdgpu_sync explicit_sync;
- struct amdgpu_fence hw_fence;
+ struct amdgpu_fence *hw_fence;
+ struct amdgpu_fence *hw_vm_fence;
struct dma_fence *gang_submit;
uint32_t preamble_status;
uint32_t preemption_status;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4b46e3c26ff3..87b962df5460 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -147,16 +147,14 @@ struct amdgpu_fence {
u64 wptr;
/* fence context for resets */
u64 context;
- uint32_t seq;
};
extern const struct drm_sched_backend_ops amdgpu_sched_ops;
-void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring);
void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af);
-void amdgpu_fence_save_wptr(struct dma_fence *fence);
+void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);
int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
@@ -166,8 +164,8 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
-int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
- struct amdgpu_fence *af, unsigned int flags);
+int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
+ unsigned int flags);
int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
uint32_t timeout);
bool amdgpu_fence_process(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 61adba6ed071..164091eb4e7e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -779,7 +779,6 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
bool cleaner_shader_needed = false;
bool pasid_mapping_needed = false;
struct dma_fence *fence = NULL;
- struct amdgpu_fence *af;
unsigned int patch;
int r;
@@ -842,12 +841,10 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
}
if (vm_flush_needed || pasid_mapping_needed || cleaner_shader_needed) {
- r = amdgpu_fence_emit(ring, &fence, NULL, 0);
+ r = amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
if (r)
return r;
- /* this is part of the job's context */
- af = container_of(fence, struct amdgpu_fence, base);
- af->context = job->base.s_fence ? job->base.s_fence->finished.context : 0;
+ fence = &job->hw_vm_fence->base;
}
if (vm_flush_needed) {
--
2.43.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
@ 2025-10-31 13:16 ` Christian König
Calling dma_fence_is_signaled() here is illegal!
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 1ef758ac5076..09c919f72b6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
{
struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
- if (!fence)
- return false;
-
- if (dma_fence_is_signaled(f))
- return true;
-
if (!fence->svm_bo) {
if (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
return true;
--
2.43.0
* [PATCH 12/20] drm/amdgpu: independence for the amdgpu_fence!
@ 2025-10-31 13:16 ` Christian König
This should allow amdgpu_fences to outlive the amdgpu module.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 63 +++++++----------------
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 -
2 files changed, 20 insertions(+), 44 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 1fe31d2f2706..413f65239ebd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -112,8 +112,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
af->ring = ring;
seq = ++ring->fence_drv.sync_seq;
- dma_fence_init(fence, &amdgpu_fence_ops,
- &ring->fence_drv.lock,
+ dma_fence_init(fence, &amdgpu_fence_ops, NULL,
adev->fence_context + ring->idx, seq);
amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
@@ -468,7 +467,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring)
timer_setup(&ring->fence_drv.fallback_timer, amdgpu_fence_fallback, 0);
ring->fence_drv.num_fences_mask = ring->num_hw_submission * 2 - 1;
- spin_lock_init(&ring->fence_drv.lock);
ring->fence_drv.fences = kcalloc(ring->num_hw_submission * 2, sizeof(void *),
GFP_KERNEL);
@@ -655,16 +653,20 @@ void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error)
struct amdgpu_fence_driver *drv = &ring->fence_drv;
unsigned long flags;
- spin_lock_irqsave(&drv->lock, flags);
+ rcu_read_lock();
for (unsigned int i = 0; i <= drv->num_fences_mask; ++i) {
struct dma_fence *fence;
- fence = rcu_dereference_protected(drv->fences[i],
- lockdep_is_held(&drv->lock));
- if (fence && !dma_fence_is_signaled_locked(fence))
+ fence = dma_fence_get_rcu(drv->fences[i]);
+ if (!fence)
+ continue;
+
+ dma_fence_lock(fence, flags);
+ if (!dma_fence_is_signaled_locked(fence))
dma_fence_set_error(fence, error);
+ dma_fence_unlock(fence, flags);
+ dma_fence_put(fence);
}
- spin_unlock_irqrestore(&drv->lock, flags);
+ rcu_read_unlock();
}
/**
@@ -715,16 +717,19 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
seq = ring->fence_drv.sync_seq & ring->fence_drv.num_fences_mask;
/* mark all fences from the guilty context with an error */
- spin_lock_irqsave(&ring->fence_drv.lock, flags);
+ rcu_read_lock();
do {
last_seq++;
last_seq &= ring->fence_drv.num_fences_mask;
ptr = &ring->fence_drv.fences[last_seq];
- rcu_read_lock();
- unprocessed = rcu_dereference(*ptr);
+ unprocessed = dma_fence_get_rcu(*ptr);
+
+ if (!unprocessed)
+ continue;
- if (unprocessed && !dma_fence_is_signaled_locked(unprocessed)) {
+ dma_fence_lock(unprocessed, flags);
+ if (!dma_fence_is_signaled_locked(unprocessed)) {
fence = container_of(unprocessed, struct amdgpu_fence, base);
if (fence == af)
@@ -732,9 +737,10 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
else if (fence->context == af->context)
dma_fence_set_error(&fence->base, -ECANCELED);
}
- rcu_read_unlock();
+ dma_fence_unlock(unprocessed, flags);
+ dma_fence_put(unprocessed);
} while (last_seq != seq);
- spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
+ rcu_read_unlock();
/* signal the guilty fence */
amdgpu_fence_write(ring, (u32)af->base.seqno);
amdgpu_fence_process(ring);
@@ -824,39 +830,10 @@ static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
return true;
}
-/**
- * amdgpu_fence_free - free up the fence memory
- *
- * @rcu: RCU callback head
- *
- * Free up the fence memory after the RCU grace period.
- */
-static void amdgpu_fence_free(struct rcu_head *rcu)
-{
- struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
-
- /* free fence_slab if it's separated fence*/
- kfree(to_amdgpu_fence(f));
-}
-
-/**
- * amdgpu_fence_release - callback that fence can be freed
- *
- * @f: fence
- *
- * This function is called when the reference count becomes zero.
- * It just RCU schedules freeing up the fence.
- */
-static void amdgpu_fence_release(struct dma_fence *f)
-{
- call_rcu(&f->rcu, amdgpu_fence_free);
-}
-
static const struct dma_fence_ops amdgpu_fence_ops = {
.get_driver_name = amdgpu_fence_get_driver_name,
.get_timeline_name = amdgpu_fence_get_timeline_name,
.enable_signaling = amdgpu_fence_enable_signaling,
- .release = amdgpu_fence_release,
};
/*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 87b962df5460..cab59a29b7c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -124,7 +124,6 @@ struct amdgpu_fence_driver {
unsigned irq_type;
struct timer_list fallback_timer;
unsigned num_fences_mask;
- spinlock_t lock;
struct dma_fence **fences;
};
--
2.43.0
* [PATCH 13/20] drm/amdgpu: independence for the amdgpu_eviction_fence!
@ 2025-10-31 13:16 ` Christian König
This should allow amdgpu_eviction_fences to outlive the amdgpu module.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 23d7d0b0d625..95ee22c43ceb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -167,9 +167,8 @@ amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
ev_fence->evf_mgr = evf_mgr;
get_task_comm(ev_fence->timeline_name, current);
- spin_lock_init(&ev_fence->lock);
dma_fence_init64(&ev_fence->base, &amdgpu_eviction_fence_ops,
- &ev_fence->lock, evf_mgr->ev_fence_ctx,
+ NULL, evf_mgr->ev_fence_ctx,
atomic_inc_return(&evf_mgr->ev_fence_seq));
return ev_fence;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
index fcd867b7147d..fb70efb54338 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -27,7 +27,6 @@
struct amdgpu_eviction_fence {
struct dma_fence base;
- spinlock_t lock;
char timeline_name[TASK_COMM_LEN];
struct amdgpu_eviction_fence_mgr *evf_mgr;
};
--
2.43.0
* [PATCH 14/20] drm/amdgpu: independence for the amdgpu_vm_tlb_fence!
@ 2025-10-31 13:16 ` Christian König
This should allow amdgpu_vm_tlb_fences to outlive the amdgpu module.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
index 5d26797356a3..27bf1f569830 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -33,7 +33,6 @@ struct amdgpu_tlb_fence {
struct amdgpu_device *adev;
struct dma_fence *dependency;
struct work_struct work;
- spinlock_t lock;
uint16_t pasid;
};
@@ -98,9 +97,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm
f->dependency = *fence;
f->pasid = vm->pasid;
INIT_WORK(&f->work, amdgpu_tlb_fence_work);
- spin_lock_init(&f->lock);
- dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock,
+ dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, NULL,
vm->tlb_fence_context, atomic64_read(&vm->tlb_seq));
/* TODO: We probably need a separate wq here */
--
2.43.0
* [PATCH 15/20] drm/amdgpu: independence for the amdkfd_fence!
@ 2025-10-31 13:16 ` Christian König
This should allow amdkfd_fences to outlive the amdgpu module.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 6 ++++
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 36 +++++++------------
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++--
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +--
4 files changed, 24 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 9e120c934cc1..35c59c784b7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
#endif
#if IS_ENABLED(CONFIG_HSA_AMD)
bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
+void amdkfd_fence_signal(struct dma_fence *f);
struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
@@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
return false;
}
+static inline
+void amdkfd_fence_signal(struct dma_fence *f)
+{
+}
+
static inline
struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 09c919f72b6c..69bca4536326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
if (!svm_range_schedule_evict_svm_bo(fence))
return true;
}
- return false;
-}
-
-/**
- * amdkfd_fence_release - callback that fence can be freed
- *
- * @f: dma_fence
- *
- * This function is called when the reference count becomes zero.
- * Drops the mm_struct reference and RCU schedules freeing up the fence.
- */
-static void amdkfd_fence_release(struct dma_fence *f)
-{
- struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
-
- /* Unconditionally signal the fence. The process is getting
- * terminated.
- */
- if (WARN_ON(!fence))
- return; /* Not an amdgpu_amdkfd_fence */
-
mmdrop(fence->mm);
- kfree_rcu(f, rcu);
+ fence->mm = NULL;
+ return false;
}
/**
@@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
return false;
}
+void amdkfd_fence_signal(struct dma_fence *f)
+{
+ struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+ if (fence && fence->mm) {
+ mmdrop(fence->mm);
+ fence->mm = NULL;
+ }
+ dma_fence_signal(f);
+}
+
static const struct dma_fence_ops amdkfd_fence_ops = {
.get_driver_name = amdkfd_fence_get_driver_name,
.get_timeline_name = amdkfd_fence_get_timeline_name,
.enable_signaling = amdkfd_fence_enable_signaling,
- .release = amdkfd_fence_release,
};
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index ddfe30c13e9d..779d7701bac9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
synchronize_rcu();
ef = rcu_access_pointer(p->ef);
if (ef)
- dma_fence_signal(ef);
+ amdkfd_fence_signal(ef);
kfd_process_remove_sysfs(p);
kfd_debugfs_remove_process(p);
@@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
static int signal_eviction_fence(struct kfd_process *p)
{
struct dma_fence *ef;
- int ret;
rcu_read_lock();
ef = dma_fence_get_rcu_safe(&p->ef);
@@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
if (!ef)
return -EINVAL;
- ret = dma_fence_signal(ef);
+ amdkfd_fence_signal(ef);
dma_fence_put(ef);
- return ret;
+ return 0;
}
static void evict_process_worker(struct work_struct *work)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 9d72411c3379..5d62d231a865 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
/* We're not in the eviction worker. Signal the fence. */
- dma_fence_signal(&svm_bo->eviction_fence->base);
+ amdkfd_fence_signal(&svm_bo->eviction_fence->base);
dma_fence_put(&svm_bo->eviction_fence->base);
amdgpu_bo_unref(&svm_bo->bo);
kfree(svm_bo);
@@ -3622,7 +3622,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
mmap_read_unlock(mm);
mmput(mm);
- dma_fence_signal(&svm_bo->eviction_fence->base);
+ amdkfd_fence_signal(&svm_bo->eviction_fence->base);
/* This is the last reference to svm_bo, after svm_range_vram_node_free
* has been called in svm_migrate_vram_to_ram
--
2.43.0
* [PATCH 16/20] drm/amdgpu: independence for the amdgpu_userq_fence!
@ 2025-10-31 13:16 ` Christian König
This should allow amdgpu_userq_fences to outlive the amdgpu module.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +----
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 54 ++++---------------
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 8 ---
3 files changed, 11 insertions(+), 64 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 61268aa82df4..2ec4ffd7002a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3147,11 +3147,7 @@ static int __init amdgpu_init(void)
r = amdgpu_sync_init();
if (r)
- goto error_sync;
-
- r = amdgpu_userq_fence_slab_init();
- if (r)
- goto error_fence;
+ return r;
DRM_INFO("amdgpu kernel modesetting enabled.\n");
amdgpu_register_atpx_handler();
@@ -3168,12 +3164,6 @@ static int __init amdgpu_init(void)
/* let modprobe override vga console setting */
return pci_register_driver(&amdgpu_kms_pci_driver);
-
-error_fence:
- amdgpu_sync_fini();
-
-error_sync:
- return r;
}
static void __exit amdgpu_exit(void)
@@ -3183,7 +3173,6 @@ static void __exit amdgpu_exit(void)
amdgpu_unregister_atpx_handler();
amdgpu_acpi_release();
amdgpu_sync_fini();
- amdgpu_userq_fence_slab_fini();
mmu_notifier_synchronize();
amdgpu_xcp_drv_release();
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 761bad98da3e..9e0d558c1e4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -33,26 +33,6 @@
#include "amdgpu_userq_fence.h"
static const struct dma_fence_ops amdgpu_userq_fence_ops;
-static struct kmem_cache *amdgpu_userq_fence_slab;
-
-int amdgpu_userq_fence_slab_init(void)
-{
- amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
- sizeof(struct amdgpu_userq_fence),
- 0,
- SLAB_HWCACHE_ALIGN,
- NULL);
- if (!amdgpu_userq_fence_slab)
- return -ENOMEM;
-
- return 0;
-}
-
-void amdgpu_userq_fence_slab_fini(void)
-{
- rcu_barrier();
- kmem_cache_destroy(amdgpu_userq_fence_slab);
-}
static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
{
@@ -226,7 +206,7 @@ void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence **userq_fence)
{
- *userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
+ *userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
return *userq_fence ? 0 : -ENOMEM;
}
@@ -242,12 +222,11 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
if (!fence_drv)
return -EINVAL;
- spin_lock_init(&userq_fence->lock);
INIT_LIST_HEAD(&userq_fence->link);
fence = &userq_fence->base;
userq_fence->fence_drv = fence_drv;
- dma_fence_init64(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
+ dma_fence_init64(fence, &amdgpu_userq_fence_ops, NULL,
fence_drv->context, seq);
amdgpu_userq_fence_driver_get(fence_drv);
@@ -317,35 +296,22 @@ static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
rptr = amdgpu_userq_fence_read(fence_drv);
wptr = fence->base.seqno;
- if (rptr >= wptr)
+ if (rptr >= wptr) {
+ amdgpu_userq_fence_driver_put(fence->fence_drv);
+ fence->fence_drv = NULL;
+
+ kvfree(fence->fence_drv_array);
+ fence->fence_drv_array = NULL;
return true;
+ }
return false;
}
-static void amdgpu_userq_fence_free(struct rcu_head *rcu)
-{
- struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);
- struct amdgpu_userq_fence *userq_fence = to_amdgpu_userq_fence(fence);
- struct amdgpu_userq_fence_driver *fence_drv = userq_fence->fence_drv;
-
- /* Release the fence driver reference */
- amdgpu_userq_fence_driver_put(fence_drv);
-
- kvfree(userq_fence->fence_drv_array);
- kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
-}
-
-static void amdgpu_userq_fence_release(struct dma_fence *f)
-{
- call_rcu(&f->rcu, amdgpu_userq_fence_free);
-}
-
static const struct dma_fence_ops amdgpu_userq_fence_ops = {
.get_driver_name = amdgpu_userq_fence_get_driver_name,
.get_timeline_name = amdgpu_userq_fence_get_timeline_name,
.signaled = amdgpu_userq_fence_signaled,
- .release = amdgpu_userq_fence_release,
};
/**
@@ -558,7 +524,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
if (r) {
mutex_unlock(&userq_mgr->userq_mutex);
- kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
+ kfree(userq_fence);
goto put_gobj_write;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
index d76add2afc77..6f04782f3ea9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -31,11 +31,6 @@
struct amdgpu_userq_fence {
struct dma_fence base;
- /*
- * This lock is necessary to synchronize the
- * userqueue dma fence operations.
- */
- spinlock_t lock;
struct list_head link;
unsigned long fence_drv_array_count;
struct amdgpu_userq_fence_driver *fence_drv;
@@ -58,9 +53,6 @@ struct amdgpu_userq_fence_driver {
char timeline_name[TASK_COMM_LEN];
};
-int amdgpu_userq_fence_slab_init(void);
-void amdgpu_userq_fence_slab_fini(void);
-
void amdgpu_userq_fence_driver_get(struct amdgpu_userq_fence_driver *fence_drv);
void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
--
2.43.0
* [PATCH 17/20] drm/xe: Disconnect the low hanging fences from Xe module
@ 2025-10-31 13:16 ` Christian König
From: Matthew Brost <matthew.brost@intel.com>
Preempt, TLB invalidation, and OA fences now use the embedded fence lock.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_oa.c | 5 +----
drivers/gpu/drm/xe/xe_preempt_fence.c | 3 +--
drivers/gpu/drm/xe/xe_preempt_fence_types.h | 2 --
drivers/gpu/drm/xe/xe_tlb_inval.c | 5 +----
drivers/gpu/drm/xe/xe_tlb_inval_types.h | 2 --
5 files changed, 3 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index f901ba52b403..c8613bafb8db 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -111,8 +111,6 @@ struct xe_oa_config_bo {
struct xe_oa_fence {
/* @base: dma fence base */
struct dma_fence base;
- /* @lock: lock for the fence */
- spinlock_t lock;
/* @work: work to signal @base */
struct delayed_work work;
/* @cb: callback to schedule @work */
@@ -1016,8 +1014,7 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config
}
/* Point of no return: initialize and set fence to signal */
- spin_lock_init(&ofence->lock);
- dma_fence_init(&ofence->base, &xe_oa_fence_ops, &ofence->lock, 0, 0);
+ dma_fence_init(&ofence->base, &xe_oa_fence_ops, NULL, 0, 0);
for (i = 0; i < stream->num_syncs; i++) {
if (stream->syncs[i].flags & DRM_XE_SYNC_FLAG_SIGNAL)
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
index 7f587ca3947d..75f433aee747 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.c
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
@@ -145,9 +145,8 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q,
{
list_del_init(&pfence->link);
pfence->q = xe_exec_queue_get(q);
- spin_lock_init(&pfence->lock);
dma_fence_init(&pfence->base, &preempt_fence_ops,
- &pfence->lock, context, seqno);
+ NULL, context, seqno);
return &pfence->base;
}
diff --git a/drivers/gpu/drm/xe/xe_preempt_fence_types.h b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
index ac125c697a41..a98de8d1c723 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence_types.h
+++ b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
@@ -25,8 +25,6 @@ struct xe_preempt_fence {
struct xe_exec_queue *q;
/** @preempt_work: work struct which issues preemption */
struct work_struct preempt_work;
- /** @lock: dma-fence fence lock */
- spinlock_t lock;
/** @error: preempt fence is in error state */
int error;
};
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 918a59e686ea..5c23e76b0241 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -133,7 +133,6 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
tlb_inval->seqno = 1;
INIT_LIST_HEAD(&tlb_inval->pending_fences);
spin_lock_init(&tlb_inval->pending_lock);
- spin_lock_init(&tlb_inval->lock);
INIT_DELAYED_WORK(&tlb_inval->fence_tdr, xe_tlb_inval_fence_timeout);
err = drmm_mutex_init(&xe->drm, &tlb_inval->seqno_lock);
@@ -420,10 +419,8 @@ void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
{
xe_pm_runtime_get_noresume(tlb_inval->xe);
- spin_lock_irq(&tlb_inval->lock);
- dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
+ dma_fence_init(&fence->base, &inval_fence_ops, NULL,
dma_fence_context_alloc(1), 1);
- spin_unlock_irq(&tlb_inval->lock);
INIT_LIST_HEAD(&fence->link);
if (stack)
set_bit(FENCE_STACK_BIT, &fence->base.flags);
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 8f8b060e9005..80e893950099 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -104,8 +104,6 @@ struct xe_tlb_inval {
struct delayed_work fence_tdr;
/** @job_wq: schedules TLB invalidation jobs */
struct workqueue_struct *job_wq;
- /** @tlb_inval.lock: protects TLB invalidation fences */
- spinlock_t lock;
};
/**
--
2.43.0
* [PATCH 18/20] drm/xe: Drop HW fence slab
@ 2025-10-31 13:16 ` Christian König
From: Matthew Brost <matthew.brost@intel.com>
Helps with disconnecting fences from the Xe module.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_hw_fence.c | 25 ++-----------------------
drivers/gpu/drm/xe/xe_hw_fence.h | 3 ---
drivers/gpu/drm/xe/xe_module.c | 5 -----
3 files changed, 2 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index 3456bec93c70..5edcf057aceb 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -6,7 +6,6 @@
#include "xe_hw_fence.h"
#include <linux/device.h>
-#include <linux/slab.h>
#include "xe_bo.h"
#include "xe_device.h"
@@ -16,28 +15,9 @@
#include "xe_map.h"
#include "xe_trace.h"
-static struct kmem_cache *xe_hw_fence_slab;
-
-int __init xe_hw_fence_module_init(void)
-{
- xe_hw_fence_slab = kmem_cache_create("xe_hw_fence",
- sizeof(struct xe_hw_fence), 0,
- SLAB_HWCACHE_ALIGN, NULL);
- if (!xe_hw_fence_slab)
- return -ENOMEM;
-
- return 0;
-}
-
-void xe_hw_fence_module_exit(void)
-{
- rcu_barrier();
- kmem_cache_destroy(xe_hw_fence_slab);
-}
-
static struct xe_hw_fence *fence_alloc(void)
{
- return kmem_cache_zalloc(xe_hw_fence_slab, GFP_KERNEL);
+ return kzalloc(sizeof(struct xe_hw_fence), GFP_KERNEL);
}
static void fence_free(struct rcu_head *rcu)
@@ -45,8 +25,7 @@ static void fence_free(struct rcu_head *rcu)
struct xe_hw_fence *fence =
container_of(rcu, struct xe_hw_fence, dma.rcu);
- if (!WARN_ON_ONCE(!fence))
- kmem_cache_free(xe_hw_fence_slab, fence);
+ kfree(fence);
}
static void hw_fence_irq_run_cb(struct irq_work *work)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
index f13a1c4982c7..96f34332fd8d 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
@@ -11,9 +11,6 @@
/* Cause an early wrap to catch wrapping errors */
#define XE_FENCE_INITIAL_SEQNO (-127)
-int xe_hw_fence_module_init(void);
-void xe_hw_fence_module_exit(void);
-
void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq);
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index d08338fc3bc1..32517bcd533c 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -12,7 +12,6 @@
#include "xe_drv.h"
#include "xe_configfs.h"
-#include "xe_hw_fence.h"
#include "xe_pci.h"
#include "xe_pm.h"
#include "xe_observation.h"
@@ -114,10 +113,6 @@ static const struct init_funcs init_funcs[] = {
.init = xe_configfs_init,
.exit = xe_configfs_exit,
},
- {
- .init = xe_hw_fence_module_init,
- .exit = xe_hw_fence_module_exit,
- },
{
.init = xe_sched_job_module_init,
.exit = xe_sched_job_module_exit,
--
2.43.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH 19/20] drm/xe: Promote xe_hw_fence_irq to a ref counted object
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (17 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 18/20] drm/xe: Drop HW fence slab Christian König
@ 2025-10-31 13:16 ` Christian König
2025-10-31 13:16 ` [PATCH 20/20] drm/xe: Finish disconnecting HW fences from the module Christian König
19 siblings, 0 replies; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
From: Matthew Brost <matthew.brost@intel.com>
Help disconnect fences from the Xe module.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 2 +-
drivers/gpu/drm/xe/xe_gt.c | 7 ++++--
drivers/gpu/drm/xe/xe_gt_types.h | 2 +-
drivers/gpu/drm/xe/xe_hw_engine.c | 2 +-
drivers/gpu/drm/xe/xe_hw_fence.c | 35 ++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_hw_fence.h | 2 +-
drivers/gpu/drm/xe/xe_hw_fence_types.h | 4 +++
7 files changed, 46 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 1b57d7c2cc94..1ccc7f2ce75e 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -139,7 +139,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
q->width = width;
q->msix_vec = XE_IRQ_DEFAULT_MSIX;
q->logical_mask = logical_mask;
- q->fence_irq = &gt->fence_irq[hwe->class];
+ q->fence_irq = gt->fence_irq[hwe->class];
q->ring_ops = gt->ring_ops[hwe->class];
q->ops = gt->exec_queue_ops;
INIT_LIST_HEAD(&q->lr.link);
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 89808b33d0a8..d12b4669ebb6 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -608,7 +608,8 @@ static void xe_gt_fini(void *arg)
int i;
for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
- xe_hw_fence_irq_finish(&gt->fence_irq[i]);
+ if (gt->fence_irq[i])
+ xe_hw_fence_irq_finish(gt->fence_irq[i]);
xe_gt_disable_host_l2_vram(gt);
}
@@ -622,7 +623,9 @@ int xe_gt_init(struct xe_gt *gt)
for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i) {
gt->ring_ops[i] = xe_ring_ops_get(gt, i);
- xe_hw_fence_irq_init(&gt->fence_irq[i]);
+ gt->fence_irq[i] = xe_hw_fence_irq_init();
+ if (!gt->fence_irq[i])
+ return -ENOMEM;
}
err = devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, xe_gt_fini, gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 0b525643a048..2e6066bc5056 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -305,7 +305,7 @@ struct xe_gt {
const struct xe_ring_ops *ring_ops[XE_ENGINE_CLASS_MAX];
/** @fence_irq: fence IRQs (1 per engine class) */
- struct xe_hw_fence_irq fence_irq[XE_ENGINE_CLASS_MAX];
+ struct xe_hw_fence_irq *fence_irq[XE_ENGINE_CLASS_MAX];
/** @default_lrc: default LRC state */
void *default_lrc[XE_ENGINE_CLASS_MAX];
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 6a9e2a4272dd..480972c3da84 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -524,7 +524,7 @@ static void hw_engine_init_early(struct xe_gt *gt, struct xe_hw_engine *hwe,
info->irq_offset;
hwe->domain = info->domain;
hwe->name = info->name;
- hwe->fence_irq = &gt->fence_irq[info->class];
+ hwe->fence_irq = gt->fence_irq[info->class];
hwe->engine_id = id;
hwe->eclass = &gt->eclass[hwe->class];
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index 5edcf057aceb..f5fad4426729 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -15,6 +15,25 @@
#include "xe_map.h"
#include "xe_trace.h"
+static void xe_hw_fence_irq_destroy(struct kref *ref)
+{
+ struct xe_hw_fence_irq *irq = container_of(ref, typeof(*irq), refcount);
+
+ kfree(irq);
+}
+
+static void xe_hw_fence_irq_put(struct xe_hw_fence_irq *irq)
+{
+ if (irq)
+ kref_put(&irq->refcount, xe_hw_fence_irq_destroy);
+}
+
+static struct xe_hw_fence_irq *xe_hw_fence_irq_get(struct xe_hw_fence_irq *irq)
+{
+ kref_get(&irq->refcount);
+ return irq;
+}
+
static struct xe_hw_fence *fence_alloc(void)
{
return kzalloc(sizeof(struct xe_hw_fence), GFP_KERNEL);
@@ -25,6 +44,7 @@ static void fence_free(struct rcu_head *rcu)
struct xe_hw_fence *fence =
container_of(rcu, struct xe_hw_fence, dma.rcu);
+ xe_hw_fence_irq_put(fence->irq);
kfree(fence);
}
@@ -52,12 +72,20 @@ static void hw_fence_irq_run_cb(struct irq_work *work)
dma_fence_end_signalling(tmp);
}
-void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq)
+struct xe_hw_fence_irq *xe_hw_fence_irq_init(void)
{
+ struct xe_hw_fence_irq *irq = kzalloc(sizeof(*irq), GFP_KERNEL);
+
+ if (!irq)
+ return NULL;
+
+ kref_init(&irq->refcount);
spin_lock_init(&irq->lock);
init_irq_work(&irq->work, hw_fence_irq_run_cb);
INIT_LIST_HEAD(&irq->pending);
irq->enabled = true;
+
+ return irq;
}
void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq)
@@ -82,6 +110,8 @@ void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq)
/* Safe release of the irq->lock used in dma_fence_init. */
synchronize_rcu();
+
+ xe_hw_fence_irq_put(irq);
}
void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq)
@@ -233,13 +263,14 @@ void xe_hw_fence_free(struct dma_fence *fence)
void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx,
struct iosys_map seqno_map)
{
- struct xe_hw_fence *hw_fence =
+ struct xe_hw_fence *hw_fence =
container_of(fence, typeof(*hw_fence), dma);
hw_fence->xe = gt_to_xe(ctx->gt);
snprintf(hw_fence->name, sizeof(hw_fence->name), "%s", ctx->name);
hw_fence->seqno_map = seqno_map;
INIT_LIST_HEAD(&hw_fence->irq_link);
+ hw_fence->irq = xe_hw_fence_irq_get(ctx->irq);
dma_fence_init(fence, &xe_hw_fence_ops, &ctx->irq->lock,
ctx->dma_fence_ctx, ctx->next_seqno++);
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
index 96f34332fd8d..fa1620203b90 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
@@ -11,7 +11,7 @@
/* Cause an early wrap to catch wrapping errors */
#define XE_FENCE_INITIAL_SEQNO (-127)
-void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq);
+struct xe_hw_fence_irq *xe_hw_fence_irq_init(void);
void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq);
diff --git a/drivers/gpu/drm/xe/xe_hw_fence_types.h b/drivers/gpu/drm/xe/xe_hw_fence_types.h
index 58a8d09afe5c..0682c12520e9 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence_types.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence_types.h
@@ -28,6 +28,8 @@ struct xe_hw_fence_irq {
struct irq_work work;
/** @pending: list of pending xe_hw_fences */
struct list_head pending;
+ /** @refcount: reference count of this fence irq object */
+ struct kref refcount;
/** @enabled: fence signaling enabled */
bool enabled;
};
@@ -62,6 +64,8 @@ struct xe_hw_fence_ctx {
struct xe_hw_fence {
/** @dma: base dma fence for hardware fence context */
struct dma_fence dma;
+ /** @irq: fence irq handler */
+ struct xe_hw_fence_irq *irq;
/** @xe: Xe device for hw fence driver name */
struct xe_device *xe;
/** @name: name of hardware fence context */
--
2.43.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH 20/20] drm/xe: Finish disconnecting HW fences from the module
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
` (18 preceding siblings ...)
2025-10-31 13:16 ` [PATCH 19/20] drm/xe: Promote xe_hw_fence_irq to a ref counted object Christian König
@ 2025-10-31 13:16 ` Christian König
19 siblings, 0 replies; 55+ messages in thread
From: Christian König @ 2025-10-31 13:16 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
From: Matthew Brost <matthew.brost@intel.com>
Be safe when dereferencing fence->xe.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_hw_fence.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index f5fad4426729..8181dfc628e4 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -159,9 +159,7 @@ static struct xe_hw_fence_irq *xe_hw_fence_irq(struct xe_hw_fence *fence)
static const char *xe_hw_fence_get_driver_name(struct dma_fence *dma_fence)
{
- struct xe_hw_fence *fence = to_xe_hw_fence(dma_fence);
-
- return dev_name(fence->xe->drm.dev);
+ return "xe";
}
static const char *xe_hw_fence_get_timeline_name(struct dma_fence *dma_fence)
@@ -175,10 +173,13 @@ static bool xe_hw_fence_signaled(struct dma_fence *dma_fence)
{
struct xe_hw_fence *fence = to_xe_hw_fence(dma_fence);
struct xe_device *xe = fence->xe;
- u32 seqno = xe_map_rd(xe, &fence->seqno_map, 0, u32);
+ u32 seqno;
+
+ if (dma_fence->error)
+ return true;
- return dma_fence->error ||
- !__dma_fence_is_later(dma_fence, dma_fence->seqno, seqno);
+ seqno = xe_map_rd(xe, &fence->seqno_map, 0, u32);
+ return !__dma_fence_is_later(dma_fence, dma_fence->seqno, seqno);
}
static bool xe_hw_fence_enable_signaling(struct dma_fence *dma_fence)
--
2.43.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2
2025-10-31 13:16 ` [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2 Christian König
@ 2025-10-31 14:04 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-10-31 14:04 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> The driver and timeline name are meaningless for signaled fences.
>
> Drop them and also print the context number.
>
> v2: avoid the calls when the BO is already signaled.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence.c | 19 ++++++++++---------
> 1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 39e6f93dc310..cfa6ddcc1278 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -999,19 +999,20 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
> */
> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> {
> - const char __rcu *timeline;
> - const char __rcu *driver;
> + const char __rcu *timeline = "";
> + const char __rcu *driver = "";
> + const char *signaled = "";
>
> rcu_read_lock();
>
> - timeline = dma_fence_timeline_name(fence);
> - driver = dma_fence_driver_name(fence);
> + if (!dma_fence_is_signaled(fence)) {
> + timeline = dma_fence_timeline_name(fence);
> + driver = dma_fence_driver_name(fence);
> + signaled = "un";
> + }
>
> - seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> - rcu_dereference(driver),
> - rcu_dereference(timeline),
> - fence->seqno,
> - dma_fence_is_signaled(fence) ? "" : "un");
> + seq_printf(seq, "%llu %s %s seq %llu %ssignalled\n", fence->context,
> + timeline, driver, fence->seqno, signaled);
I still prefer my version, so if I am allowed to bike shed:
a) %llu:%llu for context:seqno to align with the tracepoints would
have been nicer.
b) Why move the driver name in between the timeline and seqno?
c) Signalled case will have triple space which is odd.
But it's debugfs so who cares.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
>
> rcu_read_unlock();
> }
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 02/20] dma-buf: rework stub fence initialisation v2
2025-10-31 13:16 ` [PATCH 02/20] dma-buf: rework stub fence initialisation v2 Christian König
@ 2025-10-31 14:05 ` Tvrtko Ursulin
2025-11-04 15:01 ` Tvrtko Ursulin
1 sibling, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-10-31 14:05 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Instead of doing this on the first call of the function just initialize
> the stub fence during kernel load.
>
> This has the clear advantage of lower overhead and also doesn't rely on
> the ops to not be NULL any more.
>
> v2: use correct signal function
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
> ---
> drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
> 1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index cfa6ddcc1278..b229d96f551c 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
> .get_timeline_name = dma_fence_stub_get_name,
> };
>
> +static int __init dma_fence_init_stub(void)
> +{
> + dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> + &dma_fence_stub_lock, 0, 0);
> +
> + set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> + &dma_fence_stub.flags);
> +
> + dma_fence_signal(&dma_fence_stub);
> + return 0;
> +}
> +subsys_initcall(dma_fence_init_stub);
> +
> /**
> * dma_fence_get_stub - return a signaled fence
> *
> - * Return a stub fence which is already signaled. The fence's
> - * timestamp corresponds to the first time after boot this
> - * function is called.
> + * Return a stub fence which is already signaled. The fence's timestamp
> + * corresponds to the initialisation time of the linux kernel.
> */
> struct dma_fence *dma_fence_get_stub(void)
> {
> - spin_lock(&dma_fence_stub_lock);
> - if (!dma_fence_stub.ops) {
> - dma_fence_init(&dma_fence_stub,
> - &dma_fence_stub_ops,
> - &dma_fence_stub_lock,
> - 0, 0);
> -
> - set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> - &dma_fence_stub.flags);
> -
> - dma_fence_signal_locked(&dma_fence_stub);
> - }
> - spin_unlock(&dma_fence_stub_lock);
> -
> return dma_fence_get(&dma_fence_stub);
> }
> EXPORT_SYMBOL(dma_fence_get_stub);
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 03/20] dma-buf: protected fence ops by RCU v2
2025-10-31 13:16 ` [PATCH 03/20] dma-buf: protected fence ops by RCU v2 Christian König
@ 2025-10-31 14:29 ` Tvrtko Ursulin
2025-11-06 13:14 ` Christian König
0 siblings, 1 reply; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-10-31 14:29 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> At first glance it is counter intuitive to protect a constant function
> pointer table by RCU, but this allows modules providing the function
> table to unload by waiting for an RCU grace period.
>
> v2: make one the now duplicated lockdep warnings a comment instead.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
> include/linux/dma-fence.h | 18 ++++++++--
> 2 files changed, 62 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index b229d96f551c..ed82e8361096 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
> signed long
> dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
> {
> + const struct dma_fence_ops *ops;
> signed long ret;
>
> if (WARN_ON(timeout < 0))
> @@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>
> dma_fence_enable_sw_signaling(fence);
>
> - if (trace_dma_fence_wait_start_enabled()) {
> - rcu_read_lock();
> - trace_dma_fence_wait_start(fence);
> + rcu_read_lock();
> + ops = rcu_dereference(fence->ops);
> + trace_dma_fence_wait_start(fence);
> + if (ops->wait) {
> + /*
> + * Implementing the wait ops is deprecated and not supported for
> + * issuer independent fences, so it is ok to use the ops outside
> + * the RCU protected section.
> + */
Probably a good idea to put this explanation about issuer-independent
fences into the struct dma_fence_ops kerneldoc. At the moment only .wait
is documented as deprecated, so both it and .release could be expanded
with this additional angle.
> + rcu_read_unlock();
> + ret = ops->wait(fence, intr, timeout);
> + } else {
> rcu_read_unlock();
> - }
> - if (fence->ops->wait)
> - ret = fence->ops->wait(fence, intr, timeout);
> - else
> ret = dma_fence_default_wait(fence, intr, timeout);
> + }
> if (trace_dma_fence_wait_end_enabled()) {
> rcu_read_lock();
> trace_dma_fence_wait_end(fence);
> @@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
> {
> struct dma_fence *fence =
> container_of(kref, struct dma_fence, refcount);
> + const struct dma_fence_ops *ops;
>
> rcu_read_lock();
> trace_dma_fence_destroy(fence);
> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
> spin_unlock_irqrestore(fence->lock, flags);
> }
>
> - rcu_read_unlock();
> -
> - if (fence->ops->release)
> - fence->ops->release(fence);
> + ops = rcu_dereference(fence->ops);
> + if (ops->release)
> + ops->release(fence);
> else
> dma_fence_free(fence);
> + rcu_read_unlock();
> }
> EXPORT_SYMBOL(dma_fence_release);
>
> @@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
>
> static bool __dma_fence_enable_signaling(struct dma_fence *fence)
> {
> + const struct dma_fence_ops *ops;
> bool was_set;
>
> lockdep_assert_held(fence->lock);
> @@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return false;
>
> - if (!was_set && fence->ops->enable_signaling) {
> + rcu_read_lock();
> + ops = rcu_dereference(fence->ops);
> + if (!was_set && ops->enable_signaling) {
> trace_dma_fence_enable_signal(fence);
>
> - if (!fence->ops->enable_signaling(fence)) {
> + if (!ops->enable_signaling(fence)) {
Have you tried the series with PREEMPT_RT enabled? I am worried about
spin locks in any fence ops callbacks which now run with preemption
disabled.
Regards,
Tvrtko
> + rcu_read_unlock();
> dma_fence_signal_locked(fence);
> return false;
> }
> }
> + rcu_read_unlock();
>
> return true;
> }
> @@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> */
> void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> {
> - if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> - fence->ops->set_deadline(fence, deadline);
> + const struct dma_fence_ops *ops;
> +
> + rcu_read_lock();
> + ops = rcu_dereference(fence->ops);
> + if (ops->set_deadline && !dma_fence_is_signaled(fence))
> + ops->set_deadline(fence, deadline);
> + rcu_read_unlock();
> }
> EXPORT_SYMBOL(dma_fence_set_deadline);
>
> @@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>
> kref_init(&fence->refcount);
> - fence->ops = ops;
> + /*
> + * At first glance it is counter intuitive to protect a constant
> + * function pointer table by RCU, but this allows modules providing the
> + * function table to unload by waiting for an RCU grace period.
> + */
> + RCU_INIT_POINTER(fence->ops, ops);
> INIT_LIST_HEAD(&fence->cb_list);
> fence->lock = lock;
> fence->context = context;
> @@ -1104,11 +1127,12 @@ EXPORT_SYMBOL(dma_fence_init64);
> */
> const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> {
> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> - "RCU protection is required for safe access to returned string");
> + const struct dma_fence_ops *ops;
>
> + /* RCU protection is required for safe access to returned string */
> + ops = rcu_dereference(fence->ops);
> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> - return fence->ops->get_driver_name(fence);
> + return ops->get_driver_name(fence);
> else
> return "detached-driver";
> }
> @@ -1136,11 +1160,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
> */
> const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
> {
> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> - "RCU protection is required for safe access to returned string");
> + const struct dma_fence_ops *ops;
>
> + /* RCU protection is required for safe access to returned string */
> + ops = rcu_dereference(fence->ops);
> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> - return fence->ops->get_timeline_name(fence);
> + return ops->get_timeline_name(fence);
> else
> return "signaled-timeline";
> }
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 64639e104110..38421a0c7c5b 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -66,7 +66,7 @@ struct seq_file;
> */
> struct dma_fence {
> spinlock_t *lock;
> - const struct dma_fence_ops *ops;
> + const struct dma_fence_ops __rcu *ops;
> /*
> * We clear the callback list on kref_put so that by the time we
> * release the fence it is unused. No one should be adding to the
> @@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
> static inline bool
> dma_fence_is_signaled_locked(struct dma_fence *fence)
> {
> + const struct dma_fence_ops *ops;
> +
> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return true;
>
> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
> + rcu_read_lock();
> + ops = rcu_dereference(fence->ops);
> + if (ops->signaled && ops->signaled(fence)) {
> + rcu_read_unlock();
> dma_fence_signal_locked(fence);
> return true;
> }
> + rcu_read_unlock();
>
> return false;
> }
> @@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
> static inline bool
> dma_fence_is_signaled(struct dma_fence *fence)
> {
> + const struct dma_fence_ops *ops;
> +
> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return true;
>
> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
> + rcu_read_lock();
> + ops = rcu_dereference(fence->ops);
> + if (ops->signaled && ops->signaled(fence)) {
> + rcu_read_unlock();
> dma_fence_signal(fence);
> return true;
> }
> + rcu_read_unlock();
>
> return false;
> }
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 15/20] drm/amdgpu: independence for the amdkfd_fence!
2025-10-31 13:16 ` [PATCH 15/20] drm/amdgpu: independence for the amdkfd_fence! Christian König
@ 2025-10-31 14:34 ` Kuehling, Felix
0 siblings, 0 replies; 55+ messages in thread
From: Kuehling, Felix @ 2025-10-31 14:34 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 2025-10-31 09:16, Christian König wrote:
> This should allow amdkfd_fences to outlive the amdgpu module.
>
> Signed-off-by: Christian König<christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 6 ++++
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 36 +++++++------------
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++--
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +--
> 4 files changed, 24 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 9e120c934cc1..35c59c784b7b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
> #endif
> #if IS_ENABLED(CONFIG_HSA_AMD)
> bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
> +void amdkfd_fence_signal(struct dma_fence *f);
> struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
> void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
> int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
> @@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
> return false;
> }
>
> +static inline
> +void amdkfd_fence_signal(struct dma_fence *f)
> +{
> +}
> +
> static inline
> struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
> {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> index 09c919f72b6c..69bca4536326 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
> if (!svm_range_schedule_evict_svm_bo(fence))
> return true;
> }
> - return false;
> -}
> -
> -/**
> - * amdkfd_fence_release - callback that fence can be freed
> - *
> - * @f: dma_fence
> - *
> - * This function is called when the reference count becomes zero.
> - * Drops the mm_struct reference and RCU schedules freeing up the fence.
> - */
> -static void amdkfd_fence_release(struct dma_fence *f)
> -{
> - struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> -
> - /* Unconditionally signal the fence. The process is getting
> - * terminated.
> - */
> - if (WARN_ON(!fence))
> - return; /* Not an amdgpu_amdkfd_fence */
> -
> mmdrop(fence->mm);
> - kfree_rcu(f, rcu);
> + fence->mm = NULL;
> + return false;
> }
>
> /**
> @@ -174,9 +154,19 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
> return false;
> }
>
> +void amdkfd_fence_signal(struct dma_fence *f)
> +{
> + struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> + if (fence) {
> + mmdrop(fence->mm);
> + fence->mm = NULL;
> + }
> + dma_fence_signal(f);
> +}
> +
I'm still concerned about possible race conditions between
amdkfd_fence_signal and amdkfd_fence_enable_signaling. I think the
latter is always called with the fence->lock held. So this could be
fixed by taking the fence->lock in amdkfd_fence_signal:
void amdkfd_fence_signal(struct dma_fence *f)
{
	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
	unsigned long flags;

	spin_lock_irqsave(f->lock, flags);
	if (fence && fence->mm) {
		mmdrop(fence->mm);
		fence->mm = NULL;
	}
	dma_fence_signal_locked(f);
	spin_unlock_irqrestore(f->lock, flags);
}
Also note that you need to NULL-check fence->mm (here and in
enable_signaling) because mmdrop doesn't have a check.
Regards,
Felix
> static const struct dma_fence_ops amdkfd_fence_ops = {
> .get_driver_name = amdkfd_fence_get_driver_name,
> .get_timeline_name = amdkfd_fence_get_timeline_name,
> .enable_signaling = amdkfd_fence_enable_signaling,
> - .release = amdkfd_fence_release,
> };
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ddfe30c13e9d..779d7701bac9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1177,7 +1177,7 @@ static void kfd_process_wq_release(struct work_struct *work)
> synchronize_rcu();
> ef = rcu_access_pointer(p->ef);
> if (ef)
> - dma_fence_signal(ef);
> + amdkfd_fence_signal(ef);
>
> kfd_process_remove_sysfs(p);
> kfd_debugfs_remove_process(p);
> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
> static int signal_eviction_fence(struct kfd_process *p)
> {
> struct dma_fence *ef;
> - int ret;
>
> rcu_read_lock();
> ef = dma_fence_get_rcu_safe(&p->ef);
> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
> if (!ef)
> return -EINVAL;
>
> - ret = dma_fence_signal(ef);
> + amdkfd_fence_signal(ef);
> dma_fence_put(ef);
>
> - return ret;
> + return 0;
> }
>
> static void evict_process_worker(struct work_struct *work)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 9d72411c3379..5d62d231a865 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
>
> if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
> /* We're not in the eviction worker. Signal the fence. */
> - dma_fence_signal(&svm_bo->eviction_fence->base);
> + amdkfd_fence_signal(&svm_bo->eviction_fence->base);
> dma_fence_put(&svm_bo->eviction_fence->base);
> amdgpu_bo_unref(&svm_bo->bo);
> kfree(svm_bo);
> @@ -3622,7 +3622,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
> mmap_read_unlock(mm);
> mmput(mm);
>
> - dma_fence_signal(&svm_bo->eviction_fence->base);
> + amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>
> /* This is the last reference to svm_bo, after svm_range_vram_node_free
> * has been called in svm_migrate_vram_to_ram
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 02/20] dma-buf: rework stub fence initialisation v2
2025-10-31 13:16 ` [PATCH 02/20] dma-buf: rework stub fence initialisation v2 Christian König
2025-10-31 14:05 ` Tvrtko Ursulin
@ 2025-11-04 15:01 ` Tvrtko Ursulin
2025-11-06 13:16 ` Christian König
1 sibling, 1 reply; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:01 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Instead of doing this on the first call of the function just initialize
> the stub fence during kernel load.
>
> This has the clear advantage of lower overhead and also doesn't rely on
> the ops to not be NULL any more.
>
> v2: use correct signal function
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence.c | 32 +++++++++++++++-----------------
> 1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index cfa6ddcc1278..b229d96f551c 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -121,29 +121,27 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
> .get_timeline_name = dma_fence_stub_get_name,
> };
>
> +static int __init dma_fence_init_stub(void)
> +{
> + dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> + &dma_fence_stub_lock, 0, 0);
> +
> + set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> + &dma_fence_stub.flags);
> +
> + dma_fence_signal(&dma_fence_stub);
> + return 0;
> +}
> +subsys_initcall(dma_fence_init_stub);
> +
> /**
> * dma_fence_get_stub - return a signaled fence
> *
> - * Return a stub fence which is already signaled. The fence's
> - * timestamp corresponds to the first time after boot this
> - * function is called.
> + * Return a stub fence which is already signaled. The fence's timestamp
> + * corresponds to the initialisation time of the linux kernel.
> */
> struct dma_fence *dma_fence_get_stub(void)
> {
> - spin_lock(&dma_fence_stub_lock);
> - if (!dma_fence_stub.ops) {
> - dma_fence_init(&dma_fence_stub,
> - &dma_fence_stub_ops,
> - &dma_fence_stub_lock,
> - 0, 0);
> -
> - set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> - &dma_fence_stub.flags);
> -
> - dma_fence_signal_locked(&dma_fence_stub);
> - }
> - spin_unlock(&dma_fence_stub_lock);
> -
> return dma_fence_get(&dma_fence_stub);
Actually, could you check whether this could be demoted to a static
inline? It strikes me as pointless to export a symbol for such a trivial
wrapper.
Regards,
Tvrtko
> }
> EXPORT_SYMBOL(dma_fence_get_stub);
* Re: [PATCH 06/20] dma-buf: use inline lock for the stub fence
2025-10-31 13:16 ` [PATCH 06/20] dma-buf: use inline lock for the stub fence Christian König
@ 2025-11-04 15:05 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:05 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Just as proof of concept and minor cleanup.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence.c | 20 ++++----------------
> 1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index f9d7439275d1..0dbd2f641f37 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -24,7 +24,6 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>
> -static DEFINE_SPINLOCK(dma_fence_stub_lock);
> static struct dma_fence dma_fence_stub;
>
> /*
> @@ -123,12 +122,8 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
>
> static int __init dma_fence_init_stub(void)
> {
> - dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
> - &dma_fence_stub_lock, 0, 0);
> -
> - set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> - &dma_fence_stub.flags);
> -
> + dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, NULL, 0, 0);
> + set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &dma_fence_stub.flags);
> dma_fence_signal(&dma_fence_stub);
> return 0;
> }
> @@ -160,16 +155,9 @@ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp)
> if (fence == NULL)
> return NULL;
>
> - dma_fence_init(fence,
> - &dma_fence_stub_ops,
> - &dma_fence_stub_lock,
> - 0, 0);
> -
> - set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> - &fence->flags);
> -
> + dma_fence_init(fence, &dma_fence_stub_ops, NULL, 0, 0);
> + set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
> dma_fence_signal_timestamp(fence, timestamp);
> -
> return fence;
> }
> EXPORT_SYMBOL(dma_fence_allocate_private_stub);
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain
2025-10-31 13:16 ` [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain Christian König
@ 2025-11-04 15:08 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:08 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Just as proof of concept and minor cleanup.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence-chain.c | 3 +--
> include/linux/dma-fence-chain.h | 1 -
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index a8a90acf4f34..a707792b6025 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -245,7 +245,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
> struct dma_fence_chain *prev_chain = to_dma_fence_chain(prev);
> uint64_t context;
>
> - spin_lock_init(&chain->lock);
> rcu_assign_pointer(chain->prev, prev);
> chain->fence = fence;
> chain->prev_seqno = 0;
> @@ -261,7 +260,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
> seqno = max(prev->seqno, seqno);
> }
>
> - dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
> + dma_fence_init64(&chain->base, &dma_fence_chain_ops, NULL,
> context, seqno);
>
> /*
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 68c3c1e41014..d39ce7a2e599 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -46,7 +46,6 @@ struct dma_fence_chain {
> */
> struct irq_work work;
> };
> - spinlock_t lock;
> };
>
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-10-31 13:16 ` [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence Christian König
@ 2025-11-04 15:12 ` Tvrtko Ursulin
2025-11-06 13:23 ` Christian König
2025-11-07 8:33 ` Philipp Stanner
0 siblings, 2 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:12 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Just as proof of concept and minor cleanup.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
> include/drm/gpu_scheduler.h | 4 ----
> 2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 9391d6f0dc01..7a94e03341cb 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
> struct dma_fence *parent;
> unsigned long flags;
>
> - spin_lock_irqsave(&fence->lock, flags);
> + dma_fence_lock(f, flags);
Moving to dma_fence_lock should either be a separate patch or be squashed
into the one which converts the many other drivers. A separate patch
before that previous one would be even better.
Naming-wise, however, I still think dma_fence_lock_irqsave would probably
be better, to stick with the pattern everyone is so used to.
Regards,
Tvrtko
>
> /* If we already have an earlier deadline, keep it: */
> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> ktime_before(fence->deadline, deadline)) {
> - spin_unlock_irqrestore(&fence->lock, flags);
> + dma_fence_unlock(f, flags);
> return;
> }
>
> fence->deadline = deadline;
> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>
> - spin_unlock_irqrestore(&fence->lock, flags);
> + dma_fence_unlock(f, flags);
>
> /*
> * smp_load_aquire() to ensure that if we are racing another
> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>
> fence->owner = owner;
> fence->drm_client_id = drm_client_id;
> - spin_lock_init(&fence->lock);
>
> return fence;
> }
> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
> fence->sched = entity->rq->sched;
> seq = atomic_inc_return(&entity->fence_seq);
> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> - &fence->lock, entity->fence_context, seq);
> + NULL, entity->fence_context, seq);
> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> - &fence->lock, entity->fence_context + 1, seq);
> + NULL, entity->fence_context + 1, seq);
> }
>
> module_init(drm_sched_fence_slab_init);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index fb88301b3c45..b77f24a783e3 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -297,10 +297,6 @@ struct drm_sched_fence {
> * belongs to.
> */
> struct drm_gpu_scheduler *sched;
> - /**
> - * @lock: the lock used by the scheduled and the finished fences.
> - */
> - spinlock_t lock;
> /**
> * @owner: job owner for debugging
> */
* Re: [PATCH 10/20] drm/amdgpu: clean up and unify hw fence handling
2025-10-31 13:16 ` [PATCH 10/20] drm/amdgpu: clean up and unify hw fence handling Christian König
@ 2025-11-04 15:14 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:14 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> From: Alex Deucher <alexander.deucher@amd.com>
>
> Decouple the amdgpu fence from the amdgpu_job structure.
> This lets us clean up the separate fence ops for the embedded
> fence and other fences. This also allows us to allocate the
> vm fence up front when we allocate the job.
This is the already merged broken patch that is causing kmemleak to go
crazy, right? Is it useful to read the series further, or will you send
out a rebased version once that is fixed?
Regards,
Tvrtko
>
> v2: Additional cleanup suggested by Christian
> v3: Additional cleanups suggested by Christian
> v4: Additional cleanups suggested by David and
> vm fence fix
> v5: cast seqno (David)
>
> Cc: David.Wu3@amd.com
> Cc: christian.koenig@amd.com
> Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 143 ++------------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 17 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 43 ++++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +-
> 8 files changed, 63 insertions(+), 167 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index a70651050acf..0a62727e74f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1902,7 +1902,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring)
> continue;
> }
> job = to_amdgpu_job(s_job);
> - if (preempted && (&job->hw_fence.base) == fence)
> + if (preempted && (&job->hw_fence->base) == fence)
> /* mark the job as preempted */
> job->preemption_status |= AMDGPU_IB_PREEMPTED;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index aa3736de238d..e4301686a8f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5809,11 +5809,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
> if (!amdgpu_ring_sched_ready(ring))
> continue;
>
> - /* Clear job fence from fence drv to avoid force_completion
> - * leave NULL and vm flush fence in fence drv
> - */
> - amdgpu_fence_driver_clear_job_fences(ring);
> -
> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */
> amdgpu_fence_driver_force_completion(ring);
> }
> @@ -6542,7 +6537,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> *
> * job->base holds a reference to parent fence
> */
> - if (job && dma_fence_is_signaled(&job->hw_fence.base)) {
> + if (job && dma_fence_is_signaled(&job->hw_fence->base)) {
> job_signaled = true;
> dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
> goto skip_hw_reset;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 18a7829122d2..1fe31d2f2706 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -45,16 +45,11 @@
> * Cast helper
> */
> static const struct dma_fence_ops amdgpu_fence_ops;
> -static const struct dma_fence_ops amdgpu_job_fence_ops;
> static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
> {
> struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
>
> - if (__f->base.ops == &amdgpu_fence_ops ||
> - __f->base.ops == &amdgpu_job_fence_ops)
> - return __f;
> -
> - return NULL;
> + return __f;
> }
>
> /**
> @@ -98,51 +93,32 @@ static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
> * amdgpu_fence_emit - emit a fence on the requested ring
> *
> * @ring: ring the fence is associated with
> - * @f: resulting fence object
> * @af: amdgpu fence input
> * @flags: flags to pass into the subordinate .emit_fence() call
> *
> * Emits a fence command on the requested ring (all asics).
> * Returns 0 on success, -ENOMEM on failure.
> */
> -int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> - struct amdgpu_fence *af, unsigned int flags)
> +int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
> + unsigned int flags)
> {
> struct amdgpu_device *adev = ring->adev;
> struct dma_fence *fence;
> - struct amdgpu_fence *am_fence;
> struct dma_fence __rcu **ptr;
> uint32_t seq;
> int r;
>
> - if (!af) {
> - /* create a separate hw fence */
> - am_fence = kzalloc(sizeof(*am_fence), GFP_KERNEL);
> - if (!am_fence)
> - return -ENOMEM;
> - } else {
> - am_fence = af;
> - }
> - fence = &am_fence->base;
> - am_fence->ring = ring;
> + fence = &af->base;
> + af->ring = ring;
>
> seq = ++ring->fence_drv.sync_seq;
> - am_fence->seq = seq;
> - if (af) {
> - dma_fence_init(fence, &amdgpu_job_fence_ops,
> - &ring->fence_drv.lock,
> - adev->fence_context + ring->idx, seq);
> - /* Against remove in amdgpu_job_{free, free_cb} */
> - dma_fence_get(fence);
> - } else {
> - dma_fence_init(fence, &amdgpu_fence_ops,
> - &ring->fence_drv.lock,
> - adev->fence_context + ring->idx, seq);
> - }
> + dma_fence_init(fence, &amdgpu_fence_ops,
> + &ring->fence_drv.lock,
> + adev->fence_context + ring->idx, seq);
>
> amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
> seq, flags | AMDGPU_FENCE_FLAG_INT);
> - amdgpu_fence_save_wptr(fence);
> + amdgpu_fence_save_wptr(af);
> pm_runtime_get_noresume(adev_to_drm(adev)->dev);
> ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
> if (unlikely(rcu_dereference_protected(*ptr, 1))) {
> @@ -167,8 +143,6 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> */
> rcu_assign_pointer(*ptr, dma_fence_get(fence));
>
> - *f = fence;
> -
> return 0;
> }
>
> @@ -669,36 +643,6 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
> }
> }
>
> -/**
> - * amdgpu_fence_driver_clear_job_fences - clear job embedded fences of ring
> - *
> - * @ring: fence of the ring to be cleared
> - *
> - */
> -void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
> -{
> - int i;
> - struct dma_fence *old, **ptr;
> -
> - for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
> - ptr = &ring->fence_drv.fences[i];
> - old = rcu_dereference_protected(*ptr, 1);
> - if (old && old->ops == &amdgpu_job_fence_ops) {
> - struct amdgpu_job *job;
> -
> - /* For non-scheduler bad job, i.e. failed ib test, we need to signal
> - * it right here or we won't be able to track them in fence_drv
> - * and they will remain unsignaled during sa_bo free.
> - */
> - job = container_of(old, struct amdgpu_job, hw_fence.base);
> - if (!job->base.s_fence && !dma_fence_is_signaled(old))
> - dma_fence_signal(old);
> - RCU_INIT_POINTER(*ptr, NULL);
> - dma_fence_put(old);
> - }
> - }
> -}
> -
> /**
> * amdgpu_fence_driver_set_error - set error code on fences
> * @ring: the ring which contains the fences
> @@ -755,7 +699,7 @@ void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
> /**
> * amdgpu_fence_driver_guilty_force_completion - force signal of specified sequence
> *
> - * @fence: fence of the ring to signal
> + * @af: fence of the ring to signal
> *
> */
> void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
> @@ -792,15 +736,13 @@ void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af)
> } while (last_seq != seq);
> spin_unlock_irqrestore(&ring->fence_drv.lock, flags);
> /* signal the guilty fence */
> - amdgpu_fence_write(ring, af->seq);
> + amdgpu_fence_write(ring, (u32)af->base.seqno);
> amdgpu_fence_process(ring);
> }
>
> -void amdgpu_fence_save_wptr(struct dma_fence *fence)
> +void amdgpu_fence_save_wptr(struct amdgpu_fence *af)
> {
> - struct amdgpu_fence *am_fence = container_of(fence, struct amdgpu_fence, base);
> -
> - am_fence->wptr = am_fence->ring->wptr;
> + af->wptr = af->ring->wptr;
> }
>
> static void amdgpu_ring_backup_unprocessed_command(struct amdgpu_ring *ring,
> @@ -866,13 +808,6 @@ static const char *amdgpu_fence_get_timeline_name(struct dma_fence *f)
> return (const char *)to_amdgpu_fence(f)->ring->name;
> }
>
> -static const char *amdgpu_job_fence_get_timeline_name(struct dma_fence *f)
> -{
> - struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
> -
> - return (const char *)to_amdgpu_ring(job->base.sched)->name;
> -}
> -
> /**
> * amdgpu_fence_enable_signaling - enable signalling on fence
> * @f: fence
> @@ -889,23 +824,6 @@ static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
> return true;
> }
>
> -/**
> - * amdgpu_job_fence_enable_signaling - enable signalling on job fence
> - * @f: fence
> - *
> - * This is the simliar function with amdgpu_fence_enable_signaling above, it
> - * only handles the job embedded fence.
> - */
> -static bool amdgpu_job_fence_enable_signaling(struct dma_fence *f)
> -{
> - struct amdgpu_job *job = container_of(f, struct amdgpu_job, hw_fence.base);
> -
> - if (!timer_pending(&to_amdgpu_ring(job->base.sched)->fence_drv.fallback_timer))
> - amdgpu_fence_schedule_fallback(to_amdgpu_ring(job->base.sched));
> -
> - return true;
> -}
> -
> /**
> * amdgpu_fence_free - free up the fence memory
> *
> @@ -921,21 +839,6 @@ static void amdgpu_fence_free(struct rcu_head *rcu)
> kfree(to_amdgpu_fence(f));
> }
>
> -/**
> - * amdgpu_job_fence_free - free up the job with embedded fence
> - *
> - * @rcu: RCU callback head
> - *
> - * Free up the job with embedded fence after the RCU grace period.
> - */
> -static void amdgpu_job_fence_free(struct rcu_head *rcu)
> -{
> - struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> -
> - /* free job if fence has a parent job */
> - kfree(container_of(f, struct amdgpu_job, hw_fence.base));
> -}
> -
> /**
> * amdgpu_fence_release - callback that fence can be freed
> *
> @@ -949,19 +852,6 @@ static void amdgpu_fence_release(struct dma_fence *f)
> call_rcu(&f->rcu, amdgpu_fence_free);
> }
>
> -/**
> - * amdgpu_job_fence_release - callback that job embedded fence can be freed
> - *
> - * @f: fence
> - *
> - * This is the simliar function with amdgpu_fence_release above, it
> - * only handles the job embedded fence.
> - */
> -static void amdgpu_job_fence_release(struct dma_fence *f)
> -{
> - call_rcu(&f->rcu, amdgpu_job_fence_free);
> -}
> -
> static const struct dma_fence_ops amdgpu_fence_ops = {
> .get_driver_name = amdgpu_fence_get_driver_name,
> .get_timeline_name = amdgpu_fence_get_timeline_name,
> @@ -969,13 +859,6 @@ static const struct dma_fence_ops amdgpu_fence_ops = {
> .release = amdgpu_fence_release,
> };
>
> -static const struct dma_fence_ops amdgpu_job_fence_ops = {
> - .get_driver_name = amdgpu_fence_get_driver_name,
> - .get_timeline_name = amdgpu_job_fence_get_timeline_name,
> - .enable_signaling = amdgpu_job_fence_enable_signaling,
> - .release = amdgpu_job_fence_release,
> -};
> -
> /*
> * Fence debugfs
> */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 7d9bcb72e8dd..39229ece83f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -149,17 +149,19 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> if (job) {
> vm = job->vm;
> fence_ctx = job->base.s_fence ?
> - job->base.s_fence->scheduled.context : 0;
> + job->base.s_fence->finished.context : 0;
> shadow_va = job->shadow_va;
> csa_va = job->csa_va;
> gds_va = job->gds_va;
> init_shadow = job->init_shadow;
> - af = &job->hw_fence;
> + af = job->hw_fence;
> /* Save the context of the job for reset handling.
> * The driver needs this so it can skip the ring
> * contents for guilty contexts.
> */
> - af->context = job->base.s_fence ? job->base.s_fence->finished.context : 0;
> + af->context = fence_ctx;
> + /* the vm fence is also part of the job's context */
> + job->hw_vm_fence->context = fence_ctx;
> } else {
> vm = NULL;
> fence_ctx = 0;
> @@ -167,7 +169,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> csa_va = 0;
> gds_va = 0;
> init_shadow = false;
> - af = NULL;
> + af = kzalloc(sizeof(*af), GFP_ATOMIC);
> + if (!af)
> + return -ENOMEM;
> }
>
> if (!ring->sched.ready) {
> @@ -289,7 +293,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
> }
>
> - r = amdgpu_fence_emit(ring, f, af, fence_flags);
> + r = amdgpu_fence_emit(ring, af, fence_flags);
> if (r) {
> dev_err(adev->dev, "failed to emit fence (%d)\n", r);
> if (job && job->vmid)
> @@ -297,6 +301,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> amdgpu_ring_undo(ring);
> return r;
> }
> + *f = &af->base;
>
> if (ring->funcs->insert_end)
> ring->funcs->insert_end(ring);
> @@ -317,7 +322,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
> * fence so we know what rings contents to backup
> * after we reset the queue.
> */
> - amdgpu_fence_save_wptr(*f);
> + amdgpu_fence_save_wptr(af);
>
> amdgpu_ring_ib_end(ring);
> amdgpu_ring_commit(ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index d020a890a0ea..e08d837668f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -137,7 +137,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> ring->funcs->reset) {
> dev_err(adev->dev, "Starting %s ring reset\n",
> s_job->sched->name);
> - r = amdgpu_ring_reset(ring, job->vmid, &job->hw_fence);
> + r = amdgpu_ring_reset(ring, job->vmid, job->hw_fence);
> if (!r) {
> atomic_inc(&ring->adev->gpu_reset_counter);
> dev_err(adev->dev, "Ring %s reset succeeded\n",
> @@ -186,6 +186,9 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> unsigned int num_ibs, struct amdgpu_job **job,
> u64 drm_client_id)
> {
> + struct amdgpu_fence *af;
> + int r;
> +
> if (num_ibs == 0)
> return -EINVAL;
>
> @@ -193,6 +196,20 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> if (!*job)
> return -ENOMEM;
>
> + af = kzalloc(sizeof(struct amdgpu_fence), GFP_KERNEL);
> + if (!af) {
> + r = -ENOMEM;
> + goto err_job;
> + }
> + (*job)->hw_fence = af;
> +
> + af = kzalloc(sizeof(struct amdgpu_fence), GFP_KERNEL);
> + if (!af) {
> + r = -ENOMEM;
> + goto err_fence;
> + }
> + (*job)->hw_vm_fence = af;
> +
> (*job)->vm = vm;
>
> amdgpu_sync_create(&(*job)->explicit_sync);
> @@ -204,6 +221,13 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>
> return drm_sched_job_init(&(*job)->base, entity, 1, owner,
> drm_client_id);
> +
> +err_fence:
> + kfree((*job)->hw_fence);
> +err_job:
> + kfree(*job);
> +
> + return r;
> }
>
> int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> @@ -251,11 +275,11 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
> struct dma_fence *f;
> unsigned i;
>
> - /* Check if any fences where initialized */
> + /* Check if any fences were initialized */
> if (job->base.s_fence && job->base.s_fence->finished.ops)
> f = &job->base.s_fence->finished;
> - else if (job->hw_fence.base.ops)
> - f = &job->hw_fence.base;
> + else if (job->hw_fence && job->hw_fence->base.ops)
> + f = &job->hw_fence->base;
> else
> f = NULL;
>
> @@ -271,11 +295,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
>
> amdgpu_sync_free(&job->explicit_sync);
>
> - /* only put the hw fence if has embedded fence */
> - if (!job->hw_fence.base.ops)
> - kfree(job);
> - else
> - dma_fence_put(&job->hw_fence.base);
> + kfree(job);
> }
>
> void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
> @@ -304,10 +324,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
> if (job->gang_submit != &job->base.s_fence->scheduled)
> dma_fence_put(job->gang_submit);
>
> - if (!job->hw_fence.base.ops)
> - kfree(job);
> - else
> - dma_fence_put(&job->hw_fence.base);
> + kfree(job);
> }
>
> struct dma_fence *amdgpu_job_submit(struct amdgpu_job *job)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index 4a6487eb6cb5..7abf069d17d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -64,7 +64,8 @@ struct amdgpu_job {
> struct drm_sched_job base;
> struct amdgpu_vm *vm;
> struct amdgpu_sync explicit_sync;
> - struct amdgpu_fence hw_fence;
> + struct amdgpu_fence *hw_fence;
> + struct amdgpu_fence *hw_vm_fence;
> struct dma_fence *gang_submit;
> uint32_t preamble_status;
> uint32_t preemption_status;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 4b46e3c26ff3..87b962df5460 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -147,16 +147,14 @@ struct amdgpu_fence {
> u64 wptr;
> /* fence context for resets */
> u64 context;
> - uint32_t seq;
> };
>
> extern const struct drm_sched_backend_ops amdgpu_sched_ops;
>
> -void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring);
> void amdgpu_fence_driver_set_error(struct amdgpu_ring *ring, int error);
> void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
> void amdgpu_fence_driver_guilty_force_completion(struct amdgpu_fence *af);
> -void amdgpu_fence_save_wptr(struct dma_fence *fence);
> +void amdgpu_fence_save_wptr(struct amdgpu_fence *af);
>
> int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);
> int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
> @@ -166,8 +164,8 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
> void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
> int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
> void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
> -int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> - struct amdgpu_fence *af, unsigned int flags);
> +int amdgpu_fence_emit(struct amdgpu_ring *ring, struct amdgpu_fence *af,
> + unsigned int flags);
> int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
> uint32_t timeout);
> bool amdgpu_fence_process(struct amdgpu_ring *ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 61adba6ed071..164091eb4e7e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -779,7 +779,6 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
> bool cleaner_shader_needed = false;
> bool pasid_mapping_needed = false;
> struct dma_fence *fence = NULL;
> - struct amdgpu_fence *af;
> unsigned int patch;
> int r;
>
> @@ -842,12 +841,10 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
> }
>
> if (vm_flush_needed || pasid_mapping_needed || cleaner_shader_needed) {
> - r = amdgpu_fence_emit(ring, &fence, NULL, 0);
> + r = amdgpu_fence_emit(ring, job->hw_vm_fence, 0);
> if (r)
> return r;
> - /* this is part of the job's context */
> - af = container_of(fence, struct amdgpu_fence, base);
> - af->context = job->base.s_fence ? job->base.s_fence->finished.context : 0;
> + fence = &job->hw_vm_fence->base;
> }
>
> if (vm_flush_needed) {
* Re: [PATCH 13/20] drm/amdgpu: independence for the amdgpu_eviction_fence!
2025-10-31 13:16 ` [PATCH 13/20] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
@ 2025-11-04 15:45 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:45 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> This should allow amdgpu_fences to outlive the amdgpu module.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 -
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 23d7d0b0d625..95ee22c43ceb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -167,9 +167,8 @@ amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr)
>
> ev_fence->evf_mgr = evf_mgr;
> get_task_comm(ev_fence->timeline_name, current);
> - spin_lock_init(&ev_fence->lock);
> dma_fence_init64(&ev_fence->base, &amdgpu_eviction_fence_ops,
> - &ev_fence->lock, evf_mgr->ev_fence_ctx,
> + NULL, evf_mgr->ev_fence_ctx,
> atomic_inc_return(&evf_mgr->ev_fence_seq));
> return ev_fence;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> index fcd867b7147d..fb70efb54338 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> @@ -27,7 +27,6 @@
>
> struct amdgpu_eviction_fence {
> struct dma_fence base;
> - spinlock_t lock;
> char timeline_name[TASK_COMM_LEN];
> struct amdgpu_eviction_fence_mgr *evf_mgr;
> };
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 14/20] drm/amdgpu: independence for the amdgpu_vm_tlb_fence!
2025-10-31 13:16 ` [PATCH 14/20] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
@ 2025-11-04 15:45 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:45 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> This should allow amdgpu_vm_tlb_fences to outlive the amdgpu module.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
> index 5d26797356a3..27bf1f569830 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
> @@ -33,7 +33,6 @@ struct amdgpu_tlb_fence {
> struct amdgpu_device *adev;
> struct dma_fence *dependency;
> struct work_struct work;
> - spinlock_t lock;
> uint16_t pasid;
>
> };
> @@ -98,9 +97,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm
> f->dependency = *fence;
> f->pasid = vm->pasid;
> INIT_WORK(&f->work, amdgpu_tlb_fence_work);
> - spin_lock_init(&f->lock);
>
> - dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock,
> + dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, NULL,
> vm->tlb_fence_context, atomic64_read(&vm->tlb_seq));
>
> /* TODO: We probably need a separate wq here */
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 16/20] drm/amdgpu: independence for the amdgpu_userq__fence!
2025-10-31 13:16 ` [PATCH 16/20] drm/amdgpu: independence for the amdgpu_userq__fence! Christian König
@ 2025-11-04 15:59 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-04 15:59 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> This should allow amdgpu_userq_fences to outlive the amdgpu module.
A commit message explaining the what and the how would be nice! ;)
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +----
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 54 ++++---------------
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 8 ---
> 3 files changed, 11 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 61268aa82df4..2ec4ffd7002a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -3147,11 +3147,7 @@ static int __init amdgpu_init(void)
>
> r = amdgpu_sync_init();
> if (r)
> - goto error_sync;
> -
> - r = amdgpu_userq_fence_slab_init();
> - if (r)
> - goto error_fence;
> + return r;
>
> DRM_INFO("amdgpu kernel modesetting enabled.\n");
> amdgpu_register_atpx_handler();
> @@ -3168,12 +3164,6 @@ static int __init amdgpu_init(void)
>
> /* let modprobe override vga console setting */
> return pci_register_driver(&amdgpu_kms_pci_driver);
> -
> -error_fence:
> - amdgpu_sync_fini();
> -
> -error_sync:
> - return r;
> }
>
> static void __exit amdgpu_exit(void)
> @@ -3183,7 +3173,6 @@ static void __exit amdgpu_exit(void)
> amdgpu_unregister_atpx_handler();
> amdgpu_acpi_release();
> amdgpu_sync_fini();
> - amdgpu_userq_fence_slab_fini();
> mmu_notifier_synchronize();
> amdgpu_xcp_drv_release();
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 761bad98da3e..9e0d558c1e4c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -33,26 +33,6 @@
> #include "amdgpu_userq_fence.h"
>
> static const struct dma_fence_ops amdgpu_userq_fence_ops;
> -static struct kmem_cache *amdgpu_userq_fence_slab;
> -
> -int amdgpu_userq_fence_slab_init(void)
> -{
> - amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
> - sizeof(struct amdgpu_userq_fence),
> - 0,
> - SLAB_HWCACHE_ALIGN,
> - NULL);
> - if (!amdgpu_userq_fence_slab)
> - return -ENOMEM;
> -
> - return 0;
> -}
> -
> -void amdgpu_userq_fence_slab_fini(void)
> -{
> - rcu_barrier();
> - kmem_cache_destroy(amdgpu_userq_fence_slab);
> -}
>
> static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
> {
> @@ -226,7 +206,7 @@ void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
>
> static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence **userq_fence)
> {
> - *userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
> + *userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
> A side question - why atomic when the only caller is the ioctl context?
> return *userq_fence ? 0 : -ENOMEM;
> }
>
> @@ -242,12 +222,11 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> if (!fence_drv)
> return -EINVAL;
>
> - spin_lock_init(&userq_fence->lock);
> INIT_LIST_HEAD(&userq_fence->link);
> fence = &userq_fence->base;
> userq_fence->fence_drv = fence_drv;
>
> - dma_fence_init64(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
> + dma_fence_init64(fence, &amdgpu_userq_fence_ops, NULL,
> fence_drv->context, seq);
>
> amdgpu_userq_fence_driver_get(fence_drv);
> @@ -317,35 +296,22 @@ static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
> rptr = amdgpu_userq_fence_read(fence_drv);
> wptr = fence->base.seqno;
>
> - if (rptr >= wptr)
> + if (rptr >= wptr) {
> + amdgpu_userq_fence_driver_put(fence->fence_drv);
> + fence->fence_drv = NULL;
> +
> + kvfree(fence->fence_drv_array);
> + fence->fence_drv_array = NULL;
Setting pointers to NULL after freeing them makes me shudder.
->signaled() can also run unlocked, no?
Regards,
Tvrtko
> return true;
> + }
>
> return false;
> }
>
> -static void amdgpu_userq_fence_free(struct rcu_head *rcu)
> -{
> - struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);
> - struct amdgpu_userq_fence *userq_fence = to_amdgpu_userq_fence(fence);
> - struct amdgpu_userq_fence_driver *fence_drv = userq_fence->fence_drv;
> -
> - /* Release the fence driver reference */
> - amdgpu_userq_fence_driver_put(fence_drv);
> -
> - kvfree(userq_fence->fence_drv_array);
> - kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
> -}
> -
> -static void amdgpu_userq_fence_release(struct dma_fence *f)
> -{
> - call_rcu(&f->rcu, amdgpu_userq_fence_free);
> -}
> -
> static const struct dma_fence_ops amdgpu_userq_fence_ops = {
> .get_driver_name = amdgpu_userq_fence_get_driver_name,
> .get_timeline_name = amdgpu_userq_fence_get_timeline_name,
> .signaled = amdgpu_userq_fence_signaled,
> - .release = amdgpu_userq_fence_release,
> };
>
> /**
> @@ -558,7 +524,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
> if (r) {
> mutex_unlock(&userq_mgr->userq_mutex);
> - kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
> + kfree(userq_fence);
> goto put_gobj_write;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> index d76add2afc77..6f04782f3ea9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> @@ -31,11 +31,6 @@
>
> struct amdgpu_userq_fence {
> struct dma_fence base;
> - /*
> - * This lock is necessary to synchronize the
> - * userqueue dma fence operations.
> - */
> - spinlock_t lock;
> struct list_head link;
> unsigned long fence_drv_array_count;
> struct amdgpu_userq_fence_driver *fence_drv;
> @@ -58,9 +53,6 @@ struct amdgpu_userq_fence_driver {
> char timeline_name[TASK_COMM_LEN];
> };
>
> -int amdgpu_userq_fence_slab_init(void);
> -void amdgpu_userq_fence_slab_fini(void);
> -
> void amdgpu_userq_fence_driver_get(struct amdgpu_userq_fence_driver *fence_drv);
> void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
> int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-10-31 13:16 ` [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
@ 2025-11-04 16:28 ` Philipp Stanner
2025-11-06 13:43 ` Christian König
0 siblings, 1 reply; 55+ messages in thread
From: Philipp Stanner @ 2025-11-04 16:28 UTC (permalink / raw)
To: Christian König, alexdeucher, simona.vetter, tursulin,
airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
> Calling dma_fence_is_signaled() here is illegal!
The series was sent as a v2. But is this still an RFC?
If not, more detailed commit messages are a desirable thing.
P.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> index 1ef758ac5076..09c919f72b6c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
> dma_fence *f)
> {
> struct amdgpu_amdkfd_fence *fence =
> to_amdgpu_amdkfd_fence(f);
>
> - if (!fence)
> - return false;
> -
> - if (dma_fence_is_signaled(f))
> - return true;
> -
> if (!fence->svm_bo) {
> if
> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
> return true;
* Re: [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array
2025-10-31 13:16 ` [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array Christian König
@ 2025-11-05 8:50 ` Tvrtko Ursulin
2025-11-07 12:04 ` Philipp Stanner
1 sibling, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-05 8:50 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 31/10/2025 13:16, Christian König wrote:
> Just as proof of concept and minor cleanup.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence-array.c | 5 ++---
> include/linux/dma-fence-array.h | 1 -
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index 6657d4b30af9..c2119a8049fe 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -204,9 +204,8 @@ void dma_fence_array_init(struct dma_fence_array *array,
>
> array->num_fences = num_fences;
>
> - spin_lock_init(&array->lock);
> - dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
> - context, seqno);
> + dma_fence_init(&array->base, &dma_fence_array_ops, NULL, context,
> + seqno);
> init_irq_work(&array->work, irq_dma_fence_array_work);
>
> atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
> diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
> index 079b3dec0a16..370b3d2bba37 100644
> --- a/include/linux/dma-fence-array.h
> +++ b/include/linux/dma-fence-array.h
> @@ -38,7 +38,6 @@ struct dma_fence_array_cb {
> struct dma_fence_array {
> struct dma_fence base;
>
> - spinlock_t lock;
> unsigned num_fences;
> atomic_t num_pending;
> struct dma_fence **fences;
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 03/20] dma-buf: protected fence ops by RCU v2
2025-10-31 14:29 ` Tvrtko Ursulin
@ 2025-11-06 13:14 ` Christian König
2025-11-07 11:09 ` Tvrtko Ursulin
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-06 13:14 UTC (permalink / raw)
To: Tvrtko Ursulin, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 10/31/25 15:29, Tvrtko Ursulin wrote:
> On 31/10/2025 13:16, Christian König wrote:
>> At first glance it is counter intuitive to protect a constant function
>> pointer table by RCU, but this allows modules providing the function
>> table to unload by waiting for an RCU grace period.
>>
>> v2: make one the now duplicated lockdep warnings a comment instead.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>> include/linux/dma-fence.h | 18 ++++++++--
>> 2 files changed, 62 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index b229d96f551c..ed82e8361096 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>> signed long
>> dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>> {
>> + const struct dma_fence_ops *ops;
>> signed long ret;
>> if (WARN_ON(timeout < 0))
>> @@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>> dma_fence_enable_sw_signaling(fence);
>> - if (trace_dma_fence_wait_start_enabled()) {
>> - rcu_read_lock();
>> - trace_dma_fence_wait_start(fence);
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + trace_dma_fence_wait_start(fence);
>> + if (ops->wait) {
>> + /*
>> + * Implementing the wait ops is deprecated and not supported for
>> + * issuer independent fences, so it is ok to use the ops outside
>> + * the RCU protected section.
>> + */
>
> Probably a good idea to put this explanation about issuer independent fences into the struct dma_fence_ops kerneldoc. At the moment only .wait is documented as deprecated, so both it and .release can be expanded with this additional angle.
Done, but I'm not sure if my documentation is sufficient. You should probably take a look at the next version.
>
>> + rcu_read_unlock();
>> + ret = ops->wait(fence, intr, timeout);
>> + } else {
>> rcu_read_unlock();
>> - }
>> - if (fence->ops->wait)
>> - ret = fence->ops->wait(fence, intr, timeout);
>> - else
>> ret = dma_fence_default_wait(fence, intr, timeout);
>> + }
>> if (trace_dma_fence_wait_end_enabled()) {
>> rcu_read_lock();
>> trace_dma_fence_wait_end(fence);
>> @@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
>> {
>> struct dma_fence *fence =
>> container_of(kref, struct dma_fence, refcount);
>> + const struct dma_fence_ops *ops;
>> rcu_read_lock();
>> trace_dma_fence_destroy(fence);
>> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
>> spin_unlock_irqrestore(fence->lock, flags);
>> }
>> - rcu_read_unlock();
>> -
>> - if (fence->ops->release)
>> - fence->ops->release(fence);
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->release)
>> + ops->release(fence);
>> else
>> dma_fence_free(fence);
>> + rcu_read_unlock();
>> }
>> EXPORT_SYMBOL(dma_fence_release);
>> @@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
>> static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>> {
>> + const struct dma_fence_ops *ops;
>> bool was_set;
>> lockdep_assert_held(fence->lock);
>> @@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return false;
>> - if (!was_set && fence->ops->enable_signaling) {
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (!was_set && ops->enable_signaling) {
>> trace_dma_fence_enable_signal(fence);
>> - if (!fence->ops->enable_signaling(fence)) {
>> + if (!ops->enable_signaling(fence)) {
>
> Have you tried the series with PREEMPT_RT enabled?
No, that is not something we usually test with.
> I am worried about spin locks in any fence ops callbacks which now run with preemption disabled.
Huh? Why would spinlocks be problematic here?
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> + rcu_read_unlock();
>> dma_fence_signal_locked(fence);
>> return false;
>> }
>> }
>> + rcu_read_unlock();
>> return true;
>> }
>> @@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>> */
>> void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>> {
>> - if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
>> - fence->ops->set_deadline(fence, deadline);
>> + const struct dma_fence_ops *ops;
>> +
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->set_deadline && !dma_fence_is_signaled(fence))
>> + ops->set_deadline(fence, deadline);
>> + rcu_read_unlock();
>> }
>> EXPORT_SYMBOL(dma_fence_set_deadline);
>> @@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>> kref_init(&fence->refcount);
>> - fence->ops = ops;
>> + /*
>> + * At first glance it is counter intuitive to protect a constant
>> + * function pointer table by RCU, but this allows modules providing the
>> + * function table to unload by waiting for an RCU grace period.
>> + */
>> + RCU_INIT_POINTER(fence->ops, ops);
>> INIT_LIST_HEAD(&fence->cb_list);
>> fence->lock = lock;
>> fence->context = context;
>> @@ -1104,11 +1127,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>> */
>> const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>> {
>> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> - "RCU protection is required for safe access to returned string");
>> + const struct dma_fence_ops *ops;
>> + /* RCU protection is required for safe access to returned string */
>> + ops = rcu_dereference(fence->ops);
>> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> - return fence->ops->get_driver_name(fence);
>> + return ops->get_driver_name(fence);
>> else
>> return "detached-driver";
>> }
>> @@ -1136,11 +1160,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>> */
>> const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>> {
>> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> - "RCU protection is required for safe access to returned string");
>> + const struct dma_fence_ops *ops;
>> + /* RCU protection is required for safe access to returned string */
>> + ops = rcu_dereference(fence->ops);
>> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> - return fence->ops->get_timeline_name(fence);
>> + return ops->get_timeline_name(fence);
>> else
>> return "signaled-timeline";
>> }
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 64639e104110..38421a0c7c5b 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -66,7 +66,7 @@ struct seq_file;
>> */
>> struct dma_fence {
>> spinlock_t *lock;
>> - const struct dma_fence_ops *ops;
>> + const struct dma_fence_ops __rcu *ops;
>> /*
>> * We clear the callback list on kref_put so that by the time we
>> * release the fence it is unused. No one should be adding to the
>> @@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>> static inline bool
>> dma_fence_is_signaled_locked(struct dma_fence *fence)
>> {
>> + const struct dma_fence_ops *ops;
>> +
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return true;
>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->signaled && ops->signaled(fence)) {
>> + rcu_read_unlock();
>> dma_fence_signal_locked(fence);
>> return true;
>> }
>> + rcu_read_unlock();
>> return false;
>> }
>> @@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>> static inline bool
>> dma_fence_is_signaled(struct dma_fence *fence)
>> {
>> + const struct dma_fence_ops *ops;
>> +
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return true;
>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->signaled && ops->signaled(fence)) {
>> + rcu_read_unlock();
>> dma_fence_signal(fence);
>> return true;
>> }
>> + rcu_read_unlock();
>> return false;
>> }
>
* Re: [PATCH 02/20] dma-buf: rework stub fence initialisation v2
2025-11-04 15:01 ` Tvrtko Ursulin
@ 2025-11-06 13:16 ` Christian König
0 siblings, 0 replies; 55+ messages in thread
From: Christian König @ 2025-11-06 13:16 UTC (permalink / raw)
To: Tvrtko Ursulin, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 11/4/25 16:01, Tvrtko Ursulin wrote:
> On 31/10/2025 13:16, Christian König wrote:
>> /**
>> * dma_fence_get_stub - return a signaled fence
>> *
>> - * Return a stub fence which is already signaled. The fence's
>> - * timestamp corresponds to the first time after boot this
>> - * function is called.
>> + * Return a stub fence which is already signaled. The fence's timestamp
>> + * corresponds to the initialisation time of the linux kernel.
>> */
>> struct dma_fence *dma_fence_get_stub(void)
>> {
>> - spin_lock(&dma_fence_stub_lock);
>> - if (!dma_fence_stub.ops) {
>> - dma_fence_init(&dma_fence_stub,
>> - &dma_fence_stub_ops,
>> - &dma_fence_stub_lock,
>> - 0, 0);
>> -
>> - set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>> - &dma_fence_stub.flags);
>> -
>> - dma_fence_signal_locked(&dma_fence_stub);
>> - }
>> - spin_unlock(&dma_fence_stub_lock);
>> -
>> return dma_fence_get(&dma_fence_stub);
>
> Actually could you check if this could be demoted to static inline? It strikes me pointless to export a symbol for such a trivial wrapper.
I thought about that previously as well, but the answer is "No it can't".
We would need to expose the dma_fence_stub symbol then and the reason we have the function in the first place is to avoid that.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> }
>> EXPORT_SYMBOL(dma_fence_get_stub);
>
* Re: [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-11-04 15:12 ` Tvrtko Ursulin
@ 2025-11-06 13:23 ` Christian König
2025-11-06 13:45 ` Tvrtko Ursulin
2025-11-07 8:33 ` Philipp Stanner
1 sibling, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-06 13:23 UTC (permalink / raw)
To: Tvrtko Ursulin, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 11/4/25 16:12, Tvrtko Ursulin wrote:
>
> On 31/10/2025 13:16, Christian König wrote:
>> Just as proof of concept and minor cleanup.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
>> include/drm/gpu_scheduler.h | 4 ----
>> 2 files changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>> index 9391d6f0dc01..7a94e03341cb 100644
>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>> @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>> struct dma_fence *parent;
>> unsigned long flags;
>> - spin_lock_irqsave(&fence->lock, flags);
>> + dma_fence_lock(f, flags);
>
> Moving to dma_fence_lock should either be a separate patch or squashed into the one which converts many other drivers. Even a separate patch before that previous patch would be better.
As far as I can see that won't work or would be at least rather tricky.
Previously spin_lock_irqsave() locked drm_sched_fence->lock, but now it locks dma_fence->lock.
That only works because we switched to using the internal lock.
> Naming wise, I however still think dma_fence_lock_irqsave would probably be better to stick with the same pattern everyone is so used to.
Oh, that is a good idea. Going to apply this to the patch set.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> /* If we already have an earlier deadline, keep it: */
>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>> ktime_before(fence->deadline, deadline)) {
>> - spin_unlock_irqrestore(&fence->lock, flags);
>> + dma_fence_unlock(f, flags);
>> return;
>> }
>> fence->deadline = deadline;
>> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>> - spin_unlock_irqrestore(&fence->lock, flags);
>> + dma_fence_unlock(f, flags);
>> /*
>> * smp_load_aquire() to ensure that if we are racing another
>> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>> fence->owner = owner;
>> fence->drm_client_id = drm_client_id;
>> - spin_lock_init(&fence->lock);
>> return fence;
>> }
>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>> fence->sched = entity->rq->sched;
>> seq = atomic_inc_return(&entity->fence_seq);
>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>> - &fence->lock, entity->fence_context, seq);
>> + NULL, entity->fence_context, seq);
>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>> - &fence->lock, entity->fence_context + 1, seq);
>> + NULL, entity->fence_context + 1, seq);
>> }
>> module_init(drm_sched_fence_slab_init);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index fb88301b3c45..b77f24a783e3 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -297,10 +297,6 @@ struct drm_sched_fence {
>> * belongs to.
>> */
>> struct drm_gpu_scheduler *sched;
>> - /**
>> - * @lock: the lock used by the scheduled and the finished fences.
>> - */
>> - spinlock_t lock;
>> /**
>> * @owner: job owner for debugging
>> */
>
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-04 16:28 ` Philipp Stanner
@ 2025-11-06 13:43 ` Christian König
2025-11-06 16:37 ` Kuehling, Felix
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-06 13:43 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 11/4/25 17:28, Philipp Stanner wrote:
> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>> Calling dma_fence_is_signaled() here is illegal!
>
> The series was sent as a v2. But is this still an RFC?
I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>
> If not, more detailed commit messages are a desirable thing.
Good point, how about:
The enable_signaling callback is called with the same irqsave spinlock held as the one dma_fence_is_signaled() tries to grab. That will deadlock with 100% reliability if that happens.
Thanks,
Christian.
>
>
> P.
>
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> index 1ef758ac5076..09c919f72b6c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>> dma_fence *f)
>> {
>> struct amdgpu_amdkfd_fence *fence =
>> to_amdgpu_amdkfd_fence(f);
>>
>> - if (!fence)
>> - return false;
>> -
>> - if (dma_fence_is_signaled(f))
>> - return true;
>> -
>> if (!fence->svm_bo) {
>> if
>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>> return true;
>
* Re: [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-11-06 13:23 ` Christian König
@ 2025-11-06 13:45 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-06 13:45 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 06/11/2025 13:23, Christian König wrote:
> On 11/4/25 16:12, Tvrtko Ursulin wrote:
>>
>> On 31/10/2025 13:16, Christian König wrote:
>>> Just as proof of concept and minor cleanup.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
>>> include/drm/gpu_scheduler.h | 4 ----
>>> 2 files changed, 5 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index 9391d6f0dc01..7a94e03341cb 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>>> struct dma_fence *parent;
>>> unsigned long flags;
>>> - spin_lock_irqsave(&fence->lock, flags);
>>> + dma_fence_lock(f, flags);
>>
>> Moving to dma_fence_lock should either be a separate patch or squashed into the one which converts many other drivers. Even a separate patch before that previous patch would be better.
>
> As far as I can see that won't work or would be at least rather tricky.
>
> Previously from spin_lock_irqsave() locked drm_sched_fence->lock, but now it is locking dma_fence->lock.
>
> That only works because we switched to using the internal lock.
What I meant here is to add a patch before the current 5/20.
Because in 5/20 we have a lot of:
- spin_lock_irqsave(fence->lock, flags);
+ dma_fence_lock(fence, flags);
If before it you insert a patch like "dma-fence: Add lock/unlock helper"
as a standalone patch, then 5/20 simply touches the internals of the helper and
becomes smaller.
That new patch would also include the drm/sched part where it would be a
straight replacement. At that point dma_fence->lock ==
drm_sched_fence->lock.
Then at the point of this patch you remove the drm_sched_fence->lock member
but dma_fence_lock_* already does the right thing before it, no?
Sorry if I got confused somehow, I am jumping between topics.
Regards,
Tvrtko
>> Naming wise, I however still think dma_fence_lock_irqsave would probably be better to stick with the same pattern everyone is so used to.
>
> Oh, that is a good idea. Going to apply this to the patch set.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> /* If we already have an earlier deadline, keep it: */
>>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>>> ktime_before(fence->deadline, deadline)) {
>>> - spin_unlock_irqrestore(&fence->lock, flags);
>>> + dma_fence_unlock(f, flags);
>>> return;
>>> }
>>> fence->deadline = deadline;
>>> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>>> - spin_unlock_irqrestore(&fence->lock, flags);
>>> + dma_fence_unlock(f, flags);
>>> /*
>>> * smp_load_aquire() to ensure that if we are racing another
>>> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>> fence->owner = owner;
>>> fence->drm_client_id = drm_client_id;
>>> - spin_lock_init(&fence->lock);
>>> return fence;
>>> }
>>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> fence->sched = entity->rq->sched;
>>> seq = atomic_inc_return(&entity->fence_seq);
>>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>> - &fence->lock, entity->fence_context, seq);
>>> + NULL, entity->fence_context, seq);
>>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>> - &fence->lock, entity->fence_context + 1, seq);
>>> + NULL, entity->fence_context + 1, seq);
>>> }
>>> module_init(drm_sched_fence_slab_init);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index fb88301b3c45..b77f24a783e3 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -297,10 +297,6 @@ struct drm_sched_fence {
>>> * belongs to.
>>> */
>>> struct drm_gpu_scheduler *sched;
>>> - /**
>>> - * @lock: the lock used by the scheduled and the finished fences.
>>> - */
>>> - spinlock_t lock;
>>> /**
>>> * @owner: job owner for debugging
>>> */
>>
>
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 13:43 ` Christian König
@ 2025-11-06 16:37 ` Kuehling, Felix
2025-11-06 16:46 ` Christian König
0 siblings, 1 reply; 55+ messages in thread
From: Kuehling, Felix @ 2025-11-06 16:37 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 2025-11-06 08:43, Christian König wrote:
> On 11/4/25 17:28, Philipp Stanner wrote:
>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>> Calling dma_fence_is_signaled() here is illegal!
>> The series was sent as a v2. But is this still an RFC?
> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>
>> If not, more detailed commit messages are a desirable thing.
> Good point, how about:
>
> The enable_signaling callback is called with the same irqsave spinlock held as the one dma_fence_is_signaled() tries to grab. That will deadlock with 100% reliability if that happens.
I guess we could use dma_fence_is_signaled_locked() instead. That said, it
only tries to take the lock (in dma_fence_signal()) if fence->ops->signaled
is set, which isn't the case for these fences. That's why this has never
caused a problem up till now.
Regards,
Felix
>
> Thanks,
> Christian.
>
>>
>> P.
>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>> 1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> index 1ef758ac5076..09c919f72b6c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>> dma_fence *f)
>>> {
>>> struct amdgpu_amdkfd_fence *fence =
>>> to_amdgpu_amdkfd_fence(f);
>>>
>>> - if (!fence)
>>> - return false;
>>> -
>>> - if (dma_fence_is_signaled(f))
>>> - return true;
>>> -
>>> if (!fence->svm_bo) {
>>> if
>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>> return true;
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 16:37 ` Kuehling, Felix
@ 2025-11-06 16:46 ` Christian König
2025-11-06 17:07 ` Kuehling, Felix
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-06 16:46 UTC (permalink / raw)
To: Kuehling, Felix, phasta, alexdeucher, simona.vetter, tursulin,
airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 11/6/25 17:37, Kuehling, Felix wrote:
>
> On 2025-11-06 08:43, Christian König wrote:
>> On 11/4/25 17:28, Philipp Stanner wrote:
>>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>>> Calling dma_fence_is_signaled() here is illegal!
>>> The series was sent as a v2. But is this still an RFC?
>> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>>
>>> If not, more detailed commit messages are a desirable thing.
>> Good point, how about:
>>
>> The enable_signaling callback is called with the same irqsave spinlock held that dma_fence_is_signaled() tries to grab. That will 100% reliably deadlock if that happens.
>
> I guess we could use dma_fence_is_signaled_locked instead. That said, it only tries to take the lock (in dma_fence_signal) if fence->ops->signaled is set, which isn't the case for these fences. That's why this has never caused a problem up till now.
But when fence->ops->signaled isn't set then why are we calling this?
Regards,
Christian.
>
> Regards,
> Felix
>
>
>>
>> Thanks,
>> Christian.
>>
>>>
>>> P.
>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>> 1 file changed, 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>> dma_fence *f)
>>>> {
>>>> struct amdgpu_amdkfd_fence *fence =
>>>> to_amdgpu_amdkfd_fence(f);
>>>> - if (!fence)
>>>> - return false;
>>>> -
>>>> - if (dma_fence_is_signaled(f))
>>>> - return true;
>>>> -
>>>> if (!fence->svm_bo) {
>>>> if
>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>> return true;
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 16:46 ` Christian König
@ 2025-11-06 17:07 ` Kuehling, Felix
2025-11-06 17:09 ` Christian König
0 siblings, 1 reply; 55+ messages in thread
From: Kuehling, Felix @ 2025-11-06 17:07 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 2025-11-06 11:46, Christian König wrote:
> On 11/6/25 17:37, Kuehling, Felix wrote:
>> On 2025-11-06 08:43, Christian König wrote:
>>> On 11/4/25 17:28, Philipp Stanner wrote:
>>>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>>>> Calling dma_fence_is_signaled() here is illegal!
>>>> The series was sent as a v2. But is this still an RFC?
>>> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>>>
>>>> If not, more detailed commit messages are a desirable thing.
>>> Good point, how about:
>>>
>>> The enable_signaling callback is called with the same irqsave spinlock held that dma_fence_is_signaled() tries to grab. That will 100% reliably deadlock if that happens.
>> I guess we could use dma_fence_is_signaled_locked instead. That said, it only tries to take the lock (in dma_fence_signal) if fence->ops->signaled is set, which isn't the case for these fences. That's why this has never caused a problem up till now.
> But when fence->ops->signaled isn't set then why are we calling this?
There is no need to enable signaling (and trigger a preemption) if the
eviction fence has already signaled.
Regards,
Felix
>
> Regards,
> Christian.
>
>> Regards,
>> Felix
>>
>>
>>> Thanks,
>>> Christian.
>>>
>>>> P.
>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>>> 1 file changed, 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>>> dma_fence *f)
>>>>> {
>>>>> struct amdgpu_amdkfd_fence *fence =
>>>>> to_amdgpu_amdkfd_fence(f);
>>>>> - if (!fence)
>>>>> - return false;
>>>>> -
>>>>> - if (dma_fence_is_signaled(f))
>>>>> - return true;
>>>>> -
>>>>> if (!fence->svm_bo) {
>>>>> if
>>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>>> return true;
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 17:07 ` Kuehling, Felix
@ 2025-11-06 17:09 ` Christian König
2025-11-06 17:25 ` Kuehling, Felix
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-06 17:09 UTC (permalink / raw)
To: Kuehling, Felix, phasta, alexdeucher, simona.vetter, tursulin,
airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 11/6/25 18:07, Kuehling, Felix wrote:
>
> On 2025-11-06 11:46, Christian König wrote:
>> On 11/6/25 17:37, Kuehling, Felix wrote:
>>> On 2025-11-06 08:43, Christian König wrote:
>>>> On 11/4/25 17:28, Philipp Stanner wrote:
>>>>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>>>>> Calling dma_fence_is_signaled() here is illegal!
>>>>> The series was sent as a v2. But is this still an RFC?
>>>> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>>>>
>>>>> If not, more detailed commit messages are a desirable thing.
>>>> Good point, how about:
>>>>
>>>> The enable_signaling callback is called with the same irqsave spinlock held that dma_fence_is_signaled() tries to grab. That will 100% reliably deadlock if that happens.
>>> I guess we could use dma_fence_is_signaled_locked instead. That said, it only tries to take the lock (in dma_fence_signal) if fence->ops->signaled is set, which isn't the case for these fences. That's why this has never caused a problem up till now.
>> But when fence->ops->signaled isn't set then why are we calling this?
>
> There is no need to enable signaling (and trigger a preemption) if the eviction fence has already signaled.
But when the eviction fence has already been signaled then enable_signaling is not called in the first place:
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return false;

	if (!was_set && fence->ops->enable_signaling) {
		trace_dma_fence_enable_signal(fence);

		if (!fence->ops->enable_signaling(fence)) {
...
So the extra check is actually completely superfluous as far as I can see.
Regards,
Christian.
>
> Regards,
> Felix
>
>
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>>> P.
>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>>>> 1 file changed, 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>>>> dma_fence *f)
>>>>>> {
>>>>>> struct amdgpu_amdkfd_fence *fence =
>>>>>> to_amdgpu_amdkfd_fence(f);
>>>>>> - if (!fence)
>>>>>> - return false;
>>>>>> -
>>>>>> - if (dma_fence_is_signaled(f))
>>>>>> - return true;
>>>>>> -
>>>>>> if (!fence->svm_bo) {
>>>>>> if
>>>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>>>> return true;
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 17:09 ` Christian König
@ 2025-11-06 17:25 ` Kuehling, Felix
2025-11-13 14:37 ` Christian König
0 siblings, 1 reply; 55+ messages in thread
From: Kuehling, Felix @ 2025-11-06 17:25 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 2025-11-06 12:09, Christian König wrote:
>
> On 11/6/25 18:07, Kuehling, Felix wrote:
>> On 2025-11-06 11:46, Christian König wrote:
>>> On 11/6/25 17:37, Kuehling, Felix wrote:
>>>> On 2025-11-06 08:43, Christian König wrote:
>>>>> On 11/4/25 17:28, Philipp Stanner wrote:
>>>>>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>>>>>> Calling dma_fence_is_signaled() here is illegal!
>>>>>> The series was sent as a v2. But is this still an RFC?
>>>>> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>>>>>
>>>>>> If not, more detailed commit messages are a desirable thing.
>>>>> Good point, how about:
>>>>>
>>>>> The enable_signaling callback is called with the same irqsave spinlock held that dma_fence_is_signaled() tries to grab. That will 100% reliably deadlock if that happens.
>>>> I guess we could use dma_fence_is_signaled_locked instead. That said, it only tries to take the lock (in dma_fence_signal) if fence->ops->signaled is set, which isn't the case for these fences. That's why this has never caused a problem up till now.
>>> But when fence->ops->signaled isn't set then why are we calling this?
>> There is no need to enable signaling (and trigger a preemption) if the eviction fence has already signaled.
> But when the eviction fence has already been signaled then enable_signaling is not called in the first place:
>
> 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> 		return false;
>
> 	if (!was_set && fence->ops->enable_signaling) {
> 		trace_dma_fence_enable_signal(fence);
>
> 		if (!fence->ops->enable_signaling(fence)) {
> ...
>
> So the extra check is actually completely superfluous as far as I can see.
Thanks. I agree.
Regards,
Felix
>
> Regards,
> Christian.
>
>> Regards,
>> Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>>> P.
>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>>>>> 1 file changed, 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>>>>> dma_fence *f)
>>>>>>> {
>>>>>>> struct amdgpu_amdkfd_fence *fence =
>>>>>>> to_amdgpu_amdkfd_fence(f);
>>>>>>> - if (!fence)
>>>>>>> - return false;
>>>>>>> -
>>>>>>> - if (dma_fence_is_signaled(f))
>>>>>>> - return true;
>>>>>>> -
>>>>>>> if (!fence->svm_bo) {
>>>>>>> if
>>>>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>>>>> return true;
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-11-04 15:12 ` Tvrtko Ursulin
2025-11-06 13:23 ` Christian König
@ 2025-11-07 8:33 ` Philipp Stanner
2025-11-12 13:58 ` Christian König
1 sibling, 1 reply; 55+ messages in thread
From: Philipp Stanner @ 2025-11-07 8:33 UTC (permalink / raw)
To: Tvrtko Ursulin, Christian König, alexdeucher, simona.vetter,
airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Tue, 2025-11-04 at 15:12 +0000, Tvrtko Ursulin wrote:
>
> On 31/10/2025 13:16, Christian König wrote:
> > Just as proof of concept and minor cleanup.
That's by the way why I'm asking whether this series is intended as an
RFC.
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
> > include/drm/gpu_scheduler.h | 4 ----
> > 2 files changed, 5 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > index 9391d6f0dc01..7a94e03341cb 100644
> > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
> > struct dma_fence *parent;
> > unsigned long flags;
> >
> > - spin_lock_irqsave(&fence->lock, flags);
> > + dma_fence_lock(f, flags);
>
> Moving to dma_fence_lock should either be a separate patch or squashed
> into the one which converts many other drivers. Even a separate patch
> before that previous patch would be better.
Yes. +1
>
> Naming wise, I however still think dma_fence_lock_irqsave would probably
> be better to stick with the same pattern everyone is so used to.
I also think this would be better. Who knows, one day maybe someone
really wants a lock that definitely must not be irqsave for some
reason, and then the naming pattern would completely break.
> >
> >
[…]
> > @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
> > fence->sched = entity->rq->sched;
> > seq = atomic_inc_return(&entity->fence_seq);
> > dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> > - &fence->lock, entity->fence_context, seq);
> > + NULL, entity->fence_context, seq);
> > dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> > - &fence->lock, entity->fence_context + 1, seq);
> > + NULL, entity->fence_context + 1, seq);
> > }
Do we agree that there is no benefit in porting the scheduler to the
non-shared spinlock?
If so, I really prefer to not do it.
P.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 04/20] dma-buf: detach fence ops on signal
2025-10-31 13:16 ` [PATCH 04/20] dma-buf: detach fence ops on signal Christian König
@ 2025-11-07 11:04 ` Philipp Stanner
0 siblings, 0 replies; 55+ messages in thread
From: Philipp Stanner @ 2025-11-07 11:04 UTC (permalink / raw)
To: Christian König, alexdeucher, simona.vetter, tursulin,
airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
> When neither a release nor a wait operation is specified it is possible
Maybe call it "backend ops" instead of "operation"?
> to let the dma_fence live on independent of the module who issued it.
s/independent/independently
>
> This makes it possible to unload drivers and only wait for all their
> fences to signal.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/dma-buf/dma-fence.c | 16 ++++++++++++----
> include/linux/dma-fence.h | 4 ++--
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ed82e8361096..cd222984e2e1 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -374,6 +374,14 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> &fence->flags)))
> return -EINVAL;
>
> + /*
> + * When neither a release nor a wait operation is specified set the ops
> + * pointer to NULL to allow the fence structure to become independent
missing "from"
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 03/20] dma-buf: protected fence ops by RCU v2
2025-11-06 13:14 ` Christian König
@ 2025-11-07 11:09 ` Tvrtko Ursulin
0 siblings, 0 replies; 55+ messages in thread
From: Tvrtko Ursulin @ 2025-11-07 11:09 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 06/11/2025 13:14, Christian König wrote:
> On 10/31/25 15:29, Tvrtko Ursulin wrote:
>> On 31/10/2025 13:16, Christian König wrote:
>>> At first glance it is counter intuitive to protect a constant function
>>> pointer table by RCU, but this allows modules providing the function
>>> table to unload by waiting for an RCU grace period.
>>>
>>> v2: make one the now duplicated lockdep warnings a comment instead.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++------------
>>> include/linux/dma-fence.h | 18 ++++++++--
>>> 2 files changed, 62 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index b229d96f551c..ed82e8361096 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -498,6 +498,7 @@ EXPORT_SYMBOL(dma_fence_signal);
>>> signed long
>>> dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>> {
>>> + const struct dma_fence_ops *ops;
>>> signed long ret;
>>> if (WARN_ON(timeout < 0))
>>> @@ -509,15 +510,21 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>>> dma_fence_enable_sw_signaling(fence);
>>> - if (trace_dma_fence_wait_start_enabled()) {
>>> - rcu_read_lock();
>>> - trace_dma_fence_wait_start(fence);
>>> + rcu_read_lock();
>>> + ops = rcu_dereference(fence->ops);
>>> + trace_dma_fence_wait_start(fence);
>>> + if (ops->wait) {
>>> + /*
>>> + * Implementing the wait ops is deprecated and not supported for
>>> + * issuer independent fences, so it is ok to use the ops outside
>>> + * the RCU protected section.
>>> + */
>>
>> Probably a good idea to put this explanation about issuer independent fences into the struct dma_fence_ops kerneldoc. At the moment only .wait is documented as deprecated, so both it and .release can be expanded with this additional angle.
>
> Done, but I'm not sure if my documentation is sufficient. You should probably take a look at the next version.
Will do.
>>> + rcu_read_unlock();
>>> + ret = ops->wait(fence, intr, timeout);
>>> + } else {
>>> rcu_read_unlock();
>>> - }
>>> - if (fence->ops->wait)
>>> - ret = fence->ops->wait(fence, intr, timeout);
>>> - else
>>> ret = dma_fence_default_wait(fence, intr, timeout);
>>> + }
>>> if (trace_dma_fence_wait_end_enabled()) {
>>> rcu_read_lock();
>>> trace_dma_fence_wait_end(fence);
>>> @@ -538,6 +545,7 @@ void dma_fence_release(struct kref *kref)
>>> {
>>> struct dma_fence *fence =
>>> container_of(kref, struct dma_fence, refcount);
>>> + const struct dma_fence_ops *ops;
>>> rcu_read_lock();
>>> trace_dma_fence_destroy(fence);
>>> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
>>> spin_unlock_irqrestore(fence->lock, flags);
>>> }
>>> - rcu_read_unlock();
>>> -
>>> - if (fence->ops->release)
>>> - fence->ops->release(fence);
>>> + ops = rcu_dereference(fence->ops);
>>> + if (ops->release)
>>> + ops->release(fence);
>>> else
>>> dma_fence_free(fence);
>>> + rcu_read_unlock();
>>> }
>>> EXPORT_SYMBOL(dma_fence_release);
>>> @@ -593,6 +601,7 @@ EXPORT_SYMBOL(dma_fence_free);
>>> static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>> {
>>> + const struct dma_fence_ops *ops;
>>> bool was_set;
>>> lockdep_assert_held(fence->lock);
>>> @@ -603,14 +612,18 @@ static bool __dma_fence_enable_signaling(struct dma_fence *fence)
>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return false;
>>> - if (!was_set && fence->ops->enable_signaling) {
>>> + rcu_read_lock();
>>> + ops = rcu_dereference(fence->ops);
>>> + if (!was_set && ops->enable_signaling) {
>>> trace_dma_fence_enable_signal(fence);
>>> - if (!fence->ops->enable_signaling(fence)) {
>>> + if (!ops->enable_signaling(fence)) {
>>
>> Have you tried the series with PREEMPT_RT enabled?
>
> No, that is not something we usually test with.
>
>> I am worried about spin locks in any fence ops callbacks which now run with preemption disabled.
>
> Hui? Why would spin_locks be problematic here?
They become sleeping locks and IIRC there's a might_sleep equivalent in
there somewhere which fires when inside the preempt_disable section.
Regards,
Tvrtko
>>> + rcu_read_unlock();
>>> dma_fence_signal_locked(fence);
>>> return false;
>>> }
>>> }
>>> + rcu_read_unlock();
>>> return true;
>>> }
>>> @@ -983,8 +996,13 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>>> */
>>> void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>> {
>>> - if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
>>> - fence->ops->set_deadline(fence, deadline);
>>> + const struct dma_fence_ops *ops;
>>> +
>>> + rcu_read_lock();
>>> + ops = rcu_dereference(fence->ops);
>>> + if (ops->set_deadline && !dma_fence_is_signaled(fence))
>>> + ops->set_deadline(fence, deadline);
>>> + rcu_read_unlock();
>>> }
>>> EXPORT_SYMBOL(dma_fence_set_deadline);
>>> @@ -1024,7 +1042,12 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>> BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>>> kref_init(&fence->refcount);
>>> - fence->ops = ops;
>>> + /*
>>> + * At first glance it is counter intuitive to protect a constant
>>> + * function pointer table by RCU, but this allows modules providing the
>>> + * function table to unload by waiting for an RCU grace period.
>>> + */
>>> + RCU_INIT_POINTER(fence->ops, ops);
>>> INIT_LIST_HEAD(&fence->cb_list);
>>> fence->lock = lock;
>>> fence->context = context;
>>> @@ -1104,11 +1127,12 @@ EXPORT_SYMBOL(dma_fence_init64);
>>> */
>>> const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>>> {
>>> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>>> - "RCU protection is required for safe access to returned string");
>>> + const struct dma_fence_ops *ops;
>>> + /* RCU protection is required for safe access to returned string */
>>> + ops = rcu_dereference(fence->ops);
>>> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> - return fence->ops->get_driver_name(fence);
>>> + return ops->get_driver_name(fence);
>>> else
>>> return "detached-driver";
>>> }
>>> @@ -1136,11 +1160,12 @@ EXPORT_SYMBOL(dma_fence_driver_name);
>>> */
>>> const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>>> {
>>> - RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>>> - "RCU protection is required for safe access to returned string");
>>> + const struct dma_fence_ops *ops;
>>> + /* RCU protection is required for safe access to returned string */
>>> + ops = rcu_dereference(fence->ops);
>>> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> - return fence->ops->get_timeline_name(fence);
>>> + return ops->get_timeline_name(fence);
>>> else
>>> return "signaled-timeline";
>>> }
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 64639e104110..38421a0c7c5b 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -66,7 +66,7 @@ struct seq_file;
>>> */
>>> struct dma_fence {
>>> spinlock_t *lock;
>>> - const struct dma_fence_ops *ops;
>>> + const struct dma_fence_ops __rcu *ops;
>>> /*
>>> * We clear the callback list on kref_put so that by the time we
>>> * release the fence it is unused. No one should be adding to the
>>> @@ -418,13 +418,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>>> static inline bool
>>> dma_fence_is_signaled_locked(struct dma_fence *fence)
>>> {
>>> + const struct dma_fence_ops *ops;
>>> +
>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return true;
>>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>> + rcu_read_lock();
>>> + ops = rcu_dereference(fence->ops);
>>> + if (ops->signaled && ops->signaled(fence)) {
>>> + rcu_read_unlock();
>>> dma_fence_signal_locked(fence);
>>> return true;
>>> }
>>> + rcu_read_unlock();
>>> return false;
>>> }
>>> @@ -448,13 +454,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>> static inline bool
>>> dma_fence_is_signaled(struct dma_fence *fence)
>>> {
>>> + const struct dma_fence_ops *ops;
>>> +
>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return true;
>>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>> + rcu_read_lock();
>>> + ops = rcu_dereference(fence->ops);
>>> + if (ops->signaled && ops->signaled(fence)) {
>>> + rcu_read_unlock();
>>> dma_fence_signal(fence);
>>> return true;
>>> }
>>> + rcu_read_unlock();
>>> return false;
>>> }
>>
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 05/20] dma-buf: inline spinlock for fence protection
2025-10-31 13:16 ` [PATCH 05/20] dma-buf: inline spinlock for fence protection Christian König
@ 2025-11-07 11:59 ` Philipp Stanner
0 siblings, 0 replies; 55+ messages in thread
From: Philipp Stanner @ 2025-11-07 11:59 UTC (permalink / raw)
To: Christian König, alexdeucher, simona.vetter, tursulin,
airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
> Allow implementations to not give a spinlock to protect the fence
> internal state, instead a spinlock embedded into the fence structure
> itself is used in this case.
I'd rather say sth like "Implement per-fence spinlocks. Shared
spinlocks have the problem that […]"
>
> Apart from simplifying the handling for containers and the stub fence
> this has the advantage of allowing implementations to issue fences
> without caring about theit spinlock lifetime.
s/theit/their.
But rather "it allows for completely decoupling spinlock producer and
consumer lifetimes"
>
> That in turn is necessary for independent fences who outlive the module
> who originally issued them.
s/who/which
>
>
[…]
>
> enum dma_fence_flag_bits {
> + DMA_FENCE_FLAG_INLINE_LOCK_BIT,
> DMA_FENCE_FLAG_SEQNO64_BIT,
> DMA_FENCE_FLAG_SIGNALED_BIT,
> DMA_FENCE_FLAG_TIMESTAMP_BIT,
> @@ -351,6 +357,38 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
> } while (1);
> }
>
> +/**
> + * dma_fence_spinlock - return pointer to the spinlock protecting the fence
> + * @fence: the fence to get the lock from
> + *
> + * Return either the pointer to the embedded or the external spin lock.
> + */
> +static inline spinlock_t *dma_fence_spinlock(struct dma_fence *fence)
> +{
> + return test_bit(DMA_FENCE_FLAG_INLINE_LOCK_BIT, &fence->flags) ?
> + &fence->inline_lock : fence->extern_lock;
Wouldn't you agree that an 'if' is much more readable? I prefer using
the ternary operator only for tiny operations.
if test_bit(…)
return &fence->inline_lock;
return fence->extern_lock;
Besides, I agree with the idea and I think I actually need a feature like that in dma_fence.
P.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array
2025-10-31 13:16 ` [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array Christian König
2025-11-05 8:50 ` Tvrtko Ursulin
@ 2025-11-07 12:04 ` Philipp Stanner
2025-11-12 13:53 ` Christian König
1 sibling, 1 reply; 55+ messages in thread
From: Philipp Stanner @ 2025-11-07 12:04 UTC (permalink / raw)
To: Christian König, alexdeucher, simona.vetter, tursulin,
airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
> Just as proof of concept and minor cleanup.
I maintain that even relatively simple commits should give a new reader
or one who's browsing through the log in 3 years a rough idea what's
going on.
That is: quickly describe what the situation (motivation) is and what
the commit does.
At the very least "proof of concept" is nothing anyone would expect in
a non-RFC patch. To me as a non-expert in dma-buf it's not clear at all
whether this patch here is actually necessary, i.e., solves a problem.
I also don't see how replacing one lock position with another is a
"cleanup". Sharing spinlocks is perfectly legal and will remain so, no?
P.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array
2025-11-07 12:04 ` Philipp Stanner
@ 2025-11-12 13:53 ` Christian König
2025-11-12 14:00 ` Philipp Stanner
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-12 13:53 UTC (permalink / raw)
To: phasta, alexdeucher, simona.vetter, tursulin, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 11/7/25 13:04, Philipp Stanner wrote:
> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>> Just as proof of concept and minor cleanup.
>
> I maintain that even relatively simple commits should give a new reader
> or one who's browsing through the log in 3 years a rough idea what's
> going on.
>
> That is: quickly describe what the situation (motivation) is and what
> the commit does.
>
> At the very least "proof of concept" is nothing anyone would expect in
> a non-RFC patch. To me as a non-expert in dma-buf it's not clear at all
> whether this patch here is actually necessary, i.e., solves a problem.
Proof of concept in the sense "I use this patch to test the concept with the kernel unit tests and robots".
> I also don't see how replacing one lock position with another is a
> "cleanup". Sharing spinlocks is perfectly legal and will remain so, no?
Well that's the more interesting question.
On the one hand I'm now pretty sure that allowing those shared fence locks was a really bad idea; there is simply no valid use case for them.
On the other hand changing all the existing implementations would be tons of work with limited gain. I already tried that before and without some intermediate solution like this here it would be an enormous patch set touching all current implementations at the same time.
Regards,
Christian.
>
> P.
>
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence
2025-11-07 8:33 ` Philipp Stanner
@ 2025-11-12 13:58 ` Christian König
0 siblings, 0 replies; 55+ messages in thread
From: Christian König @ 2025-11-12 13:58 UTC (permalink / raw)
To: phasta, Tvrtko Ursulin, alexdeucher, simona.vetter, airlied,
felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On 11/7/25 09:33, Philipp Stanner wrote:
> On Tue, 2025-11-04 at 15:12 +0000, Tvrtko Ursulin wrote:
>>
>> On 31/10/2025 13:16, Christian König wrote:
>>> Just as proof of concept and minor cleanup.
>
> That's by the way why I'm asking whether this series is intended as an
> RFC.
>
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_fence.c | 11 +++++------
>>> include/drm/gpu_scheduler.h | 4 ----
>>> 2 files changed, 5 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index 9391d6f0dc01..7a94e03341cb 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -156,19 +156,19 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>>> struct dma_fence *parent;
>>> unsigned long flags;
>>>
>>> - spin_lock_irqsave(&fence->lock, flags);
>>> + dma_fence_lock(f, flags);
>>
>> Moving to dma_fence_lock should either be a separate patch or squashed
>> into the one which converts many other drivers. Even a separate patch
>> before that previous patch would be better.
>
> Yes. +1
Now that I understand what was meant here, I agree as well.
>
>>
>> Naming wise, I however still think dma_fence_lock_irqsave would probably
>> be better to stick with the same pattern everyone is so used to.
>
> I also think this would be better. Who knows, one day maybe someone
> really wants a lock that definitely must not be irqsave for some
> reason, and then the naming pattern would completely break.
Already done.
>
>>>
>>>
>
> […]
>
>>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> fence->sched = entity->rq->sched;
>>> seq = atomic_inc_return(&entity->fence_seq);
>>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>> - &fence->lock, entity->fence_context, seq);
>>> + NULL, entity->fence_context, seq);
>>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>> - &fence->lock, entity->fence_context + 1, seq);
>>> + NULL, entity->fence_context + 1, seq);
>>> }
>
> Do we agree that there is no benefit in porting the scheduler to the
> non-shared spinlock?
It saves something like 4 bytes :)
>
> If so, I really prefer to not do it.
OK, going to drop it. On the other hand, if I'm not completely mistaken, it slightly reduces the complexity.
Regards,
Christian.
>
>
> P.
* Re: [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array
2025-11-12 13:53 ` Christian König
@ 2025-11-12 14:00 ` Philipp Stanner
0 siblings, 0 replies; 55+ messages in thread
From: Philipp Stanner @ 2025-11-12 14:00 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, felix.kuehling, matthew.brost
Cc: dri-devel, amd-gfx
On Wed, 2025-11-12 at 14:53 +0100, Christian König wrote:
> On 11/7/25 13:04, Philipp Stanner wrote:
> > On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
> > > Just as proof of concept and minor cleanup.
> >
> > I maintain that even relatively simple commits should give a new
> > reader
> > or one who's browsing through the log in 3 years a rough idea
> > what's
> > going on.
> >
> > That is: quickly describe what the situation (motivation) is and
> > what
> > the commit does.
> >
> > At the very least "proof of concept" is nothing anyone would expect
> > in
> > a non-RFC patch. To me as a non-expert in dma-buf it's not clear at
> > all
> > whether this patch here is actually necessary, i.e., solves a
> > problem.
>
> Proof of concept in the sense "I use this patch to test the concept
> with the kernel unit tests and robots".
I think since independent fences are the recommendation (see below),
it's better to have this commit be about moving users to the
recommended usage.
>
> > I also don't see how replacing one lock position with another is a
> > "cleanup". Sharing spinlocks is perfectly legal and will remain so,
> > no?
>
> Well that's the more interesting question.
>
> On the one hand I'm now pretty sure that allowing those shared fences
> was a really bad idea; there is simply no valid use case for them.
>
> On the other hand changing all the existing implementations would be
> tons of work with limited gain. I already tried that before and
> without some intermediate solution like this here it would be an
> enormous patch set touching all current implementations at the same
> time.
I also prefer (want) non-shared fences for Rust.
What we can do is explicitly document that independent fences are the
strong recommendation for new users. "It is recommended that you pass
NULL for the fence so that fences get separate locks, which allows for
cleanly decoupling fences from their issuer and even the fence context
object (being protected by the same spinlock)."
I think that's not included in the series yet.
P.
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-06 17:25 ` Kuehling, Felix
@ 2025-11-13 14:37 ` Christian König
2025-11-13 17:46 ` Kuehling, Felix
0 siblings, 1 reply; 55+ messages in thread
From: Christian König @ 2025-11-13 14:37 UTC (permalink / raw)
To: Kuehling, Felix, phasta, alexdeucher, simona.vetter, tursulin,
airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 11/6/25 18:25, Kuehling, Felix wrote:
>
> On 2025-11-06 12:09, Christian König wrote:
>>
>> On 11/6/25 18:07, Kuehling, Felix wrote:
>>> On 2025-11-06 11:46, Christian König wrote:
>>>> On 11/6/25 17:37, Kuehling, Felix wrote:
>>>>> On 2025-11-06 08:43, Christian König wrote:
>>>>>> On 11/4/25 17:28, Philipp Stanner wrote:
>>>>>>> On Fri, 2025-10-31 at 14:16 +0100, Christian König wrote:
>>>>>>>> Calling dma_fence_is_signaled() here is illegal!
>>>>>>> The series was sent as a v2. But is this still an RFC?
>>>>>> I think when Matthew came up with the XE patches we pretty much agreed that this is the way to go.
>>>>>>
>>>>>>> If not, more detailed commit messages are a desirable thing.
>>>>>> Good point, how about:
>>>>>>
>>>>>> The enable_signaling callback is called with the same irqsave spinlock held as the one dma_fence_is_signaled() tries to grab. That will reliably deadlock if it happens.
>>>>> I guess we could use dma_fence_is_signaled_locked instead. That said, it only tries to take the lock (in dma_fence_signal) if fence->ops->signaled is set, which isn't the case for these fences. That's why this has never caused a problem up till now.
>>>> But when fence->ops->signaled isn't set, then why are we calling this?
>>> There is no need to enable signaling (and trigger a preemption) if the eviction fence has already signaled.
>> But when the eviction fence has already been signaled then enable_signaling is not called in the first place:
>>
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return false;
>>
>> if (!was_set && fence->ops->enable_signaling) {
>> trace_dma_fence_enable_signal(fence);
>>
>> if (!fence->ops->enable_signaling(fence)) {
>> ...
>>
>> So the extra check is actually completely superfluous as far as I can see.
>
> Thanks. I agree.
Can I take that as reviewed-by you for this patch?
Thanks,
Christian.
>
> Regards,
> Felix
>
>
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>> Felix
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>>> P.
>>>>>>>
>>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>>>>>> 1 file changed, 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>>>>>> dma_fence *f)
>>>>>>>> {
>>>>>>>> struct amdgpu_amdkfd_fence *fence =
>>>>>>>> to_amdgpu_amdkfd_fence(f);
>>>>>>>> - if (!fence)
>>>>>>>> - return false;
>>>>>>>> -
>>>>>>>> - if (dma_fence_is_signaled(f))
>>>>>>>> - return true;
>>>>>>>> -
>>>>>>>> if (!fence->svm_bo) {
>>>>>>>> if
>>>>>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>>>>>> return true;
>>
* Re: [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path
2025-11-13 14:37 ` Christian König
@ 2025-11-13 17:46 ` Kuehling, Felix
0 siblings, 0 replies; 55+ messages in thread
From: Kuehling, Felix @ 2025-11-13 17:46 UTC (permalink / raw)
To: Christian König, phasta, alexdeucher, simona.vetter,
tursulin, airlied, matthew.brost
Cc: dri-devel, amd-gfx
On 2025-11-13 09:37, Christian König wrote:
>>>> There is no need to enable signaling (and trigger a preemption) if the eviction fence has already signaled.
>>> But when the eviction fence has already been signaled then enable_signaling is not called in the first place:
>>>
>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return false;
>>>
>>> if (!was_set && fence->ops->enable_signaling) {
>>> trace_dma_fence_enable_signal(fence);
>>>
>>> if (!fence->ops->enable_signaling(fence)) {
>>> ...
>>>
>>> So the extra check is actually completely superfluous as far as I can see.
>> Thanks. I agree.
> Can I take that as reviewed-by you for this patch?
Yes.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
>
> Thanks,
> Christian.
>
>> Regards,
>> Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Regards,
>>>>>> Felix
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> P.
>>>>>>>>
>>>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>>>> ---
>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ------
>>>>>>>>> 1 file changed, 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>>> index 1ef758ac5076..09c919f72b6c 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>>>>>>>>> @@ -120,12 +120,6 @@ static bool amdkfd_fence_enable_signaling(struct
>>>>>>>>> dma_fence *f)
>>>>>>>>> {
>>>>>>>>> struct amdgpu_amdkfd_fence *fence =
>>>>>>>>> to_amdgpu_amdkfd_fence(f);
>>>>>>>>> - if (!fence)
>>>>>>>>> - return false;
>>>>>>>>> -
>>>>>>>>> - if (dma_fence_is_signaled(f))
>>>>>>>>> - return true;
>>>>>>>>> -
>>>>>>>>> if (!fence->svm_bo) {
>>>>>>>>> if
>>>>>>>>> (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
>>>>>>>>> return true;
end of thread, other threads:[~2025-11-13 17:46 UTC | newest]
Thread overview: 55+ messages
2025-10-31 13:16 Independence for dma_fences! v2 Christian König
2025-10-31 13:16 ` [PATCH 01/20] dma-buf: cleanup dma_fence_describe v2 Christian König
2025-10-31 14:04 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 02/20] dma-buf: rework stub fence initialisation v2 Christian König
2025-10-31 14:05 ` Tvrtko Ursulin
2025-11-04 15:01 ` Tvrtko Ursulin
2025-11-06 13:16 ` Christian König
2025-10-31 13:16 ` [PATCH 03/20] dma-buf: protected fence ops by RCU v2 Christian König
2025-10-31 14:29 ` Tvrtko Ursulin
2025-11-06 13:14 ` Christian König
2025-11-07 11:09 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 04/20] dma-buf: detach fence ops on signal Christian König
2025-11-07 11:04 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 05/20] dma-buf: inline spinlock for fence protection Christian König
2025-11-07 11:59 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 06/20] dma-buf: use inline lock for the stub fence Christian König
2025-11-04 15:05 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 07/20] dma-buf: use inline lock for the dma-fence-array Christian König
2025-11-05 8:50 ` Tvrtko Ursulin
2025-11-07 12:04 ` Philipp Stanner
2025-11-12 13:53 ` Christian König
2025-11-12 14:00 ` Philipp Stanner
2025-10-31 13:16 ` [PATCH 08/20] dma-buf: use inline lock for the dma-fence-chain Christian König
2025-11-04 15:08 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 09/20] drm/sched: use inline locks for the drm-sched-fence Christian König
2025-11-04 15:12 ` Tvrtko Ursulin
2025-11-06 13:23 ` Christian König
2025-11-06 13:45 ` Tvrtko Ursulin
2025-11-07 8:33 ` Philipp Stanner
2025-11-12 13:58 ` Christian König
2025-10-31 13:16 ` [PATCH 10/20] drm/amdgpu: clean up and unify hw fence handling Christian König
2025-11-04 15:14 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 11/20] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
2025-11-04 16:28 ` Philipp Stanner
2025-11-06 13:43 ` Christian König
2025-11-06 16:37 ` Kuehling, Felix
2025-11-06 16:46 ` Christian König
2025-11-06 17:07 ` Kuehling, Felix
2025-11-06 17:09 ` Christian König
2025-11-06 17:25 ` Kuehling, Felix
2025-11-13 14:37 ` Christian König
2025-11-13 17:46 ` Kuehling, Felix
2025-10-31 13:16 ` [PATCH 12/20] drm/amdgpu: independence for the amdgpu_fence! Christian König
2025-10-31 13:16 ` [PATCH 13/20] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
2025-11-04 15:45 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 14/20] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
2025-11-04 15:45 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 15/20] drm/amdgpu: independence for the amdkfd_fence! Christian König
2025-10-31 14:34 ` Kuehling, Felix
2025-10-31 13:16 ` [PATCH 16/20] drm/amdgpu: independence for the amdgpu_userq_fence! Christian König
2025-11-04 15:59 ` Tvrtko Ursulin
2025-10-31 13:16 ` [PATCH 17/20] drm/xe: Disconnect the low hanging fences from Xe module Christian König
2025-10-31 13:16 ` [PATCH 18/20] drm/xe: Drop HW fence slab Christian König
2025-10-31 13:16 ` [PATCH 19/20] drm/xe: Promote xe_hw_fence_irq to an ref counted object Christian König
2025-10-31 13:16 ` [PATCH 20/20] drm/xe: Finish disconnect HW fences from module Christian König