dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues
@ 2025-05-13  7:45 Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 01/10] dma-fence: Change signature of __dma_fence_is_later Tvrtko Ursulin
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Hi all,

tl;dr;
Xe and probably some other drivers can tear down the internal state referenced
by an exported sync_file fence which then causes a null pointer derefences on
accessing said fence.

IGT that exploits the problem:
https://patchwork.freedesktop.org/patch/642709/?series=146211&rev=2

It seems there is a consensus this is a known problem with the dma-fence design,
where internal state shouldn't really be accessed after the fence has been
signaled. However currently the code is mostly unaware of that hence the use-
after-free potential.

To fix it, between the option of adding more reference counting and trying to
"revoke" the fence, suggestion is to focus on the later.

Reference to the recent discussion:
https://lore.kernel.org/dri-devel/20250418164246.72426-1-tvrtko.ursulin@igalia.com/

This series therefore attempts to implement a solution along those lines.

Most of the description what and how can be found in:
 "dma-fence: Add safe access helpers and document the rules"

Other than that, the series starts with some cleanups, with the general goal of
hiding more of the dma-fence implementation details behind explicit API. This
means adding helpers for access to driver and timeline name, and also moving as
much as it is easily possible of driver allocated state into the fence object
itself. Because dma-fence is already reference counted, any state we can embed
automatically becomes safe.

Having said that, the series only addreses the runtime use-after-free scenarios,
such as the above explained situation with the xe driver. For now the module
unload problem is deliberately left for later. (Although again, some of the
early patches do make it safer, and will make future improvements easier due
fewer accesses to fence->ops.)

Final patch in the series is the one which makes xe compliant with the rules
and API proposed earlier in the series. It does so by ensuring there is at least
one RCU grace period between fences being signaled and driver allocated memory
accessible from xe fences getting freed. Which couples with the earlier (from
the series) refactors which added dma_fence_access_begin/end() protection to
the relevant call sites.

If this approach is acceptable the next steps will be to see if any other
drivers will need similar changes. And also to discuss whether we want to go a
step futher and later move to SRCU, so code would be protected against module
unload as well.

v2:
 * Dropped module unload handling.
 * Proposing real API instead of hacks.

v3:
 * Dropped the dma_fence_is_array|chain ops to flags conversion.
 * Dropped the i915 cleanup patch which is now independent.
 * Squashed dma-fence helpers with internal usage patches.
 * Restored missing hunk in "dma-fence: Use a flag for 64-bit seqnos".
 * Removed the AMDGPU_JOB_GET_TIMELINE_NAME macro.
 * Applied some r-b tags.

Tvrtko Ursulin (10):
  dma-fence: Change signature of __dma_fence_is_later
  dma-fence: Use a flag for 64-bit seqnos
  dma-fence: Add helpers for accessing driver and timeline name
  sync_file: Use dma-fence driver and timeline name helpers
  drm/amdgpu: Use dma-fence driver and timeline name helpers
  drm/i915: Use dma-fence driver and timeline name helpers
  dma-fence: Add safe access helpers and document the rules
  sync_file: Protect access to driver and timeline name
  drm/i915: Protect access to driver and timeline name
  drm/xe: Make dma-fences compliant with the safe access rules

 drivers/dma-buf/dma-fence-chain.c             |  7 +-
 drivers/dma-buf/dma-fence.c                   | 87 ++++++++++++++++++-
 drivers/dma-buf/sw_sync.c                     |  2 +-
 drivers/dma-buf/sync_file.c                   | 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  9 +-
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  |  5 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  6 +-
 drivers/gpu/drm/i915/i915_request.c           |  5 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          |  6 +-
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h  |  2 +
 drivers/gpu/drm/xe/xe_guc_submit.c            |  7 +-
 drivers/gpu/drm/xe/xe_hw_fence.c              |  5 +-
 drivers/gpu/drm/xe/xe_sched_job.c             | 14 +--
 include/linux/dma-fence.h                     | 47 +++++++---
 include/trace/events/dma_fence.h              |  4 +-
 15 files changed, 168 insertions(+), 52 deletions(-)

-- 
2.48.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC v3 01/10] dma-fence: Change signature of __dma_fence_is_later
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 02/10] dma-fence: Use a flag for 64-bit seqnos Tvrtko Ursulin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

With the goal of reducing the need for drivers to touch (and dereference)
fence->ops, we change the prototype of __dma_fence_is_later() to take
fence instead of fence->ops.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c |  2 +-
 drivers/dma-buf/sw_sync.c         |  2 +-
 drivers/gpu/drm/xe/xe_hw_fence.c  |  2 +-
 drivers/gpu/drm/xe/xe_sched_job.c | 14 ++++++++------
 include/linux/dma-fence.h         |  9 ++++-----
 5 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 9663ba1bb6ac..90424f23fd73 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -252,7 +252,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 	chain->prev_seqno = 0;
 
 	/* Try to reuse the context of the previous chain node. */
-	if (prev_chain && __dma_fence_is_later(seqno, prev->seqno, prev->ops)) {
+	if (prev_chain && __dma_fence_is_later(prev, seqno, prev->seqno)) {
 		context = prev->context;
 		chain->prev_seqno = prev->seqno;
 	} else {
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 4f27ee93a00c..3c20f1d31cf5 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -170,7 +170,7 @@ static bool timeline_fence_signaled(struct dma_fence *fence)
 {
 	struct sync_timeline *parent = dma_fence_parent(fence);
 
-	return !__dma_fence_is_later(fence->seqno, parent->value, fence->ops);
+	return !__dma_fence_is_later(fence, fence->seqno, parent->value);
 }
 
 static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index 0b4f12be3692..03eb8c6d1616 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -165,7 +165,7 @@ static bool xe_hw_fence_signaled(struct dma_fence *dma_fence)
 	u32 seqno = xe_map_rd(xe, &fence->seqno_map, 0, u32);
 
 	return dma_fence->error ||
-		!__dma_fence_is_later(dma_fence->seqno, seqno, dma_fence->ops);
+		!__dma_fence_is_later(dma_fence, dma_fence->seqno, seqno);
 }
 
 static bool xe_hw_fence_enable_signaling(struct dma_fence *dma_fence)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 1905ca590965..f0a6ce610948 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -216,15 +216,17 @@ void xe_sched_job_set_error(struct xe_sched_job *job, int error)
 
 bool xe_sched_job_started(struct xe_sched_job *job)
 {
+	struct dma_fence *fence = dma_fence_chain_contained(job->fence);
 	struct xe_lrc *lrc = job->q->lrc[0];
 
-	return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job),
-				     xe_lrc_start_seqno(lrc),
-				     dma_fence_chain_contained(job->fence)->ops);
+	return !__dma_fence_is_later(fence,
+				     xe_sched_job_lrc_seqno(job),
+				     xe_lrc_start_seqno(lrc));
 }
 
 bool xe_sched_job_completed(struct xe_sched_job *job)
 {
+	struct dma_fence *fence = dma_fence_chain_contained(job->fence);
 	struct xe_lrc *lrc = job->q->lrc[0];
 
 	/*
@@ -232,9 +234,9 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
 	 * parallel handshake is done.
 	 */
 
-	return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job),
-				     xe_lrc_seqno(lrc),
-				     dma_fence_chain_contained(job->fence)->ops);
+	return !__dma_fence_is_later(fence,
+				     xe_sched_job_lrc_seqno(job),
+				     xe_lrc_seqno(lrc));
 }
 
 void xe_sched_job_arm(struct xe_sched_job *job)
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index b12776883d14..48b5202c531d 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -441,21 +441,20 @@ dma_fence_is_signaled(struct dma_fence *fence)
 
 /**
  * __dma_fence_is_later - return if f1 is chronologically later than f2
+ * @fence: fence in whose context to do the comparison
  * @f1: the first fence's seqno
  * @f2: the second fence's seqno from the same context
- * @ops: dma_fence_ops associated with the seqno
  *
  * Returns true if f1 is chronologically later than f2. Both fences must be
  * from the same context, since a seqno is not common across contexts.
  */
-static inline bool __dma_fence_is_later(u64 f1, u64 f2,
-					const struct dma_fence_ops *ops)
+static inline bool __dma_fence_is_later(struct dma_fence *fence, u64 f1, u64 f2)
 {
 	/* This is for backward compatibility with drivers which can only handle
 	 * 32bit sequence numbers. Use a 64bit compare when the driver says to
 	 * do so.
 	 */
-	if (ops->use_64bit_seqno)
+	if (fence->ops->use_64bit_seqno)
 		return f1 > f2;
 
 	return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0;
@@ -475,7 +474,7 @@ static inline bool dma_fence_is_later(struct dma_fence *f1,
 	if (WARN_ON(f1->context != f2->context))
 		return false;
 
-	return __dma_fence_is_later(f1->seqno, f2->seqno, f1->ops);
+	return __dma_fence_is_later(f1, f1->seqno, f2->seqno);
 }
 
 /**
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 02/10] dma-fence: Use a flag for 64-bit seqnos
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 01/10] dma-fence: Change signature of __dma_fence_is_later Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 03/10] dma-fence: Add helpers for accessing driver and timeline name Tvrtko Ursulin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

With the goal of reducing the need for drivers to touch (and dereference)
fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops to
the fence->flags.

Drivers which were setting this flag are changed to use new
dma_fence_init64() instead of dma_fence_init().

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c                |  5 ++---
 drivers/dma-buf/dma-fence.c                      |  9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c |  5 ++---
 include/linux/dma-fence.h                        | 14 +++++---------
 4 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 90424f23fd73..a8a90acf4f34 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -218,7 +218,6 @@ static void dma_fence_chain_set_deadline(struct dma_fence *fence,
 }
 
 const struct dma_fence_ops dma_fence_chain_ops = {
-	.use_64bit_seqno = true,
 	.get_driver_name = dma_fence_chain_get_driver_name,
 	.get_timeline_name = dma_fence_chain_get_timeline_name,
 	.enable_signaling = dma_fence_chain_enable_signaling,
@@ -262,8 +261,8 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 			seqno = max(prev->seqno, seqno);
 	}
 
-	dma_fence_init(&chain->base, &dma_fence_chain_ops,
-		       &chain->lock, context, seqno);
+	dma_fence_init64(&chain->base, &dma_fence_chain_ops, &chain->lock,
+			 context, seqno);
 
 	/*
 	 * Chaining dma_fence_chain container together is only allowed through
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f0cdd3e99d36..33661658f684 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1023,3 +1023,12 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	trace_dma_fence_init(fence);
 }
 EXPORT_SYMBOL(dma_fence_init);
+
+void
+dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
+		 spinlock_t *lock, u64 context, u64 seqno)
+{
+	dma_fence_init(fence, ops, lock, context, seqno);
+	__set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
+}
+EXPORT_SYMBOL(dma_fence_init64);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
index 51cddfa3f1e8..5d26797356a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -71,7 +71,6 @@ static void amdgpu_tlb_fence_work(struct work_struct *work)
 }
 
 static const struct dma_fence_ops amdgpu_tlb_fence_ops = {
-	.use_64bit_seqno = true,
 	.get_driver_name = amdgpu_tlb_fence_get_driver_name,
 	.get_timeline_name = amdgpu_tlb_fence_get_timeline_name
 };
@@ -101,8 +100,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm
 	INIT_WORK(&f->work, amdgpu_tlb_fence_work);
 	spin_lock_init(&f->lock);
 
-	dma_fence_init(&f->base, &amdgpu_tlb_fence_ops, &f->lock,
-		       vm->tlb_fence_context, atomic64_read(&vm->tlb_seq));
+	dma_fence_init64(&f->base, &amdgpu_tlb_fence_ops, &f->lock,
+			 vm->tlb_fence_context, atomic64_read(&vm->tlb_seq));
 
 	/* TODO: We probably need a separate wq here */
 	dma_fence_get(&f->base);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 48b5202c531d..a34a0dcdc446 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -97,6 +97,7 @@ struct dma_fence {
 };
 
 enum dma_fence_flag_bits {
+	DMA_FENCE_FLAG_SEQNO64_BIT,
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
@@ -124,14 +125,6 @@ struct dma_fence_cb {
  *
  */
 struct dma_fence_ops {
-	/**
-	 * @use_64bit_seqno:
-	 *
-	 * True if this dma_fence implementation uses 64bit seqno, false
-	 * otherwise.
-	 */
-	bool use_64bit_seqno;
-
 	/**
 	 * @get_driver_name:
 	 *
@@ -262,6 +255,9 @@ struct dma_fence_ops {
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 		    spinlock_t *lock, u64 context, u64 seqno);
 
+void dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
+		      spinlock_t *lock, u64 context, u64 seqno);
+
 void dma_fence_release(struct kref *kref);
 void dma_fence_free(struct dma_fence *fence);
 void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq);
@@ -454,7 +450,7 @@ static inline bool __dma_fence_is_later(struct dma_fence *fence, u64 f1, u64 f2)
 	 * 32bit sequence numbers. Use a 64bit compare when the driver says to
 	 * do so.
 	 */
-	if (fence->ops->use_64bit_seqno)
+	if (test_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags))
 		return f1 > f2;
 
 	return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 03/10] dma-fence: Add helpers for accessing driver and timeline name
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 01/10] dma-fence: Change signature of __dma_fence_is_later Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 02/10] dma-fence: Use a flag for 64-bit seqnos Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 04/10] sync_file: Use dma-fence driver and timeline name helpers Tvrtko Ursulin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Add some helpers in order to enable preventing dma-fence users accessing
the implementation details directly and make the implementation itself use
them.

This will also enable later adding some asserts to a consolidated
location.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c      |  9 +++++----
 include/linux/dma-fence.h        | 10 ++++++++++
 include/trace/events/dma_fence.h |  4 ++--
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 33661658f684..dc2456f68685 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -538,8 +538,8 @@ void dma_fence_release(struct kref *kref)
 	if (WARN(!list_empty(&fence->cb_list) &&
 		 !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags),
 		 "Fence %s:%s:%llx:%llx released with pending signals!\n",
-		 fence->ops->get_driver_name(fence),
-		 fence->ops->get_timeline_name(fence),
+		 dma_fence_driver_name(fence),
+		 dma_fence_timeline_name(fence),
 		 fence->context, fence->seqno)) {
 		unsigned long flags;
 
@@ -983,8 +983,9 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
 void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
 {
 	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
-		   fence->ops->get_driver_name(fence),
-		   fence->ops->get_timeline_name(fence), fence->seqno,
+		   dma_fence_driver_name(fence),
+		   dma_fence_timeline_name(fence),
+		   fence->seqno,
 		   dma_fence_is_signaled(fence) ? "" : "un");
 }
 EXPORT_SYMBOL(dma_fence_describe);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index a34a0dcdc446..c5ac37e10d85 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -377,6 +377,16 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
 			       struct dma_fence_cb *cb);
 void dma_fence_enable_sw_signaling(struct dma_fence *fence);
 
+static inline const char *dma_fence_driver_name(struct dma_fence *fence)
+{
+	return fence->ops->get_driver_name(fence);
+}
+
+static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
+{
+	return fence->ops->get_timeline_name(fence);
+}
+
 /**
  * dma_fence_is_signaled_locked - Return an indication if the fence
  *                                is signaled yet.
diff --git a/include/trace/events/dma_fence.h b/include/trace/events/dma_fence.h
index a4de3df8500b..84c83074ee81 100644
--- a/include/trace/events/dma_fence.h
+++ b/include/trace/events/dma_fence.h
@@ -16,8 +16,8 @@ DECLARE_EVENT_CLASS(dma_fence,
 	TP_ARGS(fence),
 
 	TP_STRUCT__entry(
-		__string(driver, fence->ops->get_driver_name(fence))
-		__string(timeline, fence->ops->get_timeline_name(fence))
+		__string(driver, dma_fence_driver_name(fence))
+		__string(timeline, dma_fence_timeline_name(fence))
 		__field(unsigned int, context)
 		__field(unsigned int, seqno)
 	),
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 04/10] sync_file: Use dma-fence driver and timeline name helpers
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 03/10] dma-fence: Add helpers for accessing driver and timeline name Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 05/10] drm/amdgpu: " Tvrtko Ursulin
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Access the dma-fence internals via the previously added helpers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/sync_file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index d9b1c1b2a72b..212df4b849fe 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -137,8 +137,8 @@ char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len)
 		struct dma_fence *fence = sync_file->fence;
 
 		snprintf(buf, len, "%s-%s%llu-%lld",
-			 fence->ops->get_driver_name(fence),
-			 fence->ops->get_timeline_name(fence),
+			 dma_fence_driver_name(fence),
+			 dma_fence_timeline_name(fence),
 			 fence->context,
 			 fence->seqno);
 	}
@@ -262,9 +262,9 @@ static long sync_file_ioctl_merge(struct sync_file *sync_file,
 static int sync_fill_fence_info(struct dma_fence *fence,
 				 struct sync_fence_info *info)
 {
-	strscpy(info->obj_name, fence->ops->get_timeline_name(fence),
+	strscpy(info->obj_name, dma_fence_timeline_name(fence),
 		sizeof(info->obj_name));
-	strscpy(info->driver_name, fence->ops->get_driver_name(fence),
+	strscpy(info->driver_name, dma_fence_driver_name(fence),
 		sizeof(info->driver_name));
 
 	info->status = dma_fence_get_status(fence);
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 05/10] drm/amdgpu: Use dma-fence driver and timeline name helpers
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 04/10] sync_file: Use dma-fence driver and timeline name helpers Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-14 13:35   ` Christian König
  2025-05-13  7:45 ` [RFC v3 06/10] drm/i915: " Tvrtko Ursulin
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Access the dma-fence internals via the previously added helpers.

Drop the macro while at it, since the length is now more manageable.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 11dd2e0f7979..4c61e4168f23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -32,9 +32,6 @@
 #define TRACE_SYSTEM amdgpu
 #define TRACE_INCLUDE_FILE amdgpu_trace
 
-#define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
-	 job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
-
 TRACE_EVENT(amdgpu_device_rreg,
 	    TP_PROTO(unsigned did, uint32_t reg, uint32_t value),
 	    TP_ARGS(did, reg, value),
@@ -168,7 +165,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
 	    TP_ARGS(job),
 	    TP_STRUCT__entry(
 			     __field(uint64_t, sched_job_id)
-			     __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
+			     __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished))
 			     __field(unsigned int, context)
 			     __field(unsigned int, seqno)
 			     __field(struct dma_fence *, fence)
@@ -194,7 +191,7 @@ TRACE_EVENT(amdgpu_sched_run_job,
 	    TP_ARGS(job),
 	    TP_STRUCT__entry(
 			     __field(uint64_t, sched_job_id)
-			     __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
+			     __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished))
 			     __field(unsigned int, context)
 			     __field(unsigned int, seqno)
 			     __string(ring, to_amdgpu_ring(job->base.sched)->name)
@@ -585,8 +582,6 @@ TRACE_EVENT(amdgpu_reset_reg_dumps,
 		      __entry->address,
 		      __entry->value)
 );
-
-#undef AMDGPU_JOB_GET_TIMELINE_NAME
 #endif
 
 /* This part must be outside protection */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 06/10] drm/i915: Use dma-fence driver and timeline name helpers
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 05/10] drm/amdgpu: " Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules Tvrtko Ursulin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Access the dma-fence internals via the previously added helpers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 4 ++--
 drivers/gpu/drm/i915/i915_request.c         | 2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c        | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index d1a382dfaa1d..ae3557ed6c1e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -252,8 +252,8 @@ void intel_gt_watchdog_work(struct work_struct *work)
 			struct dma_fence *f = &rq->fence;
 
 			pr_notice("Fence expiration time out i915-%s:%s:%llx!\n",
-				  f->ops->get_driver_name(f),
-				  f->ops->get_timeline_name(f),
+				  dma_fence_driver_name(f),
+				  dma_fence_timeline_name(f),
 				  f->seqno);
 			i915_request_cancel(rq, -EINTR);
 		}
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c3d27eadc0a7..4874c4f1e4ab 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2184,7 +2184,7 @@ void i915_request_show(struct drm_printer *m,
 		       const char *prefix,
 		       int indent)
 {
-	const char *name = rq->fence.ops->get_timeline_name((struct dma_fence *)&rq->fence);
+	const char *name = dma_fence_timeline_name((struct dma_fence *)&rq->fence);
 	char buf[80] = "";
 	int x = 0;
 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 1d4cc91c0e40..e51ca7e50a4e 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -435,8 +435,8 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
 		return;
 
 	pr_notice("Asynchronous wait on fence %s:%s:%llx timed out (hint:%ps)\n",
-		  cb->dma->ops->get_driver_name(cb->dma),
-		  cb->dma->ops->get_timeline_name(cb->dma),
+		  dma_fence_driver_name(cb->dma),
+		  dma_fence_timeline_name(cb->dma),
 		  cb->dma->seqno,
 		  i915_sw_fence_debug_hint(fence));
 
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 06/10] drm/i915: " Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-14 13:50   ` Christian König
  2025-05-13  7:45 ` [RFC v3 08/10] sync_file: Protect access to driver and timeline name Tvrtko Ursulin
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Dma-fence objects currently suffer from a potential use after free problem
where fences exported to userspace and other drivers can outlive the
exporting driver, or the associated data structures.

The discussion on how to address this concluded that adding reference
counting to all the involved objects is not desirable, since it would need
to be very wide reaching and could cause unloadable drivers if another
entity would be holding onto a signaled fence reference potentially
indefinitely.

This patch enables the safe access by introducing and documenting a
contract between fence exporters and users. It documents a set of
contraints and adds helpers which a) drivers with potential to suffer from
the use after free must use and b) users of the dma-fence API must use as
well.

Premise of the design has multiple sides:

1. Drivers (fence exporters) MUST ensure a RCU grace period between
signalling a fence and freeing the driver private data associated with it.

The grace period does not have to follow the signalling immediately but
HAS to happen before data is freed.

2. Users of the dma-fence API marked with such requirement MUST contain
the complete access to the data within a single code block guarded by the
new dma_fence_access_begin() and dma_fence_access_end() helpers.

The combination of the two ensures that whoever sees the
DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
valid fence->lock and valid data potentially accessed by the fence->ops
virtual functions, until the call to dma_fence_access_end().

3. Module unload (fence->ops) disappearing is for now explicitly not
handled. That would required a more complex protection, possibly needing
SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
where race between dma_fence_enable_sw_signaling, signalling, and
dereference of fence->ops->wait() would need a sleeping SRCU context.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-fence.h   | 32 ++++++++++++-----
 2 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index dc2456f68685..cfe1d7b79c22 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
 	struct dma_fence *fence =
 		container_of(kref, struct dma_fence, refcount);
 
+	dma_fence_access_begin();
 	trace_dma_fence_destroy(fence);
 
 	if (WARN(!list_empty(&fence->cb_list) &&
@@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
 		fence->ops->release(fence);
 	else
 		dma_fence_free(fence);
+
+	dma_fence_access_end();
 }
 EXPORT_SYMBOL(dma_fence_release);
 
@@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
  */
 void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
 {
+	dma_fence_access_begin();
 	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
 		   dma_fence_driver_name(fence),
 		   dma_fence_timeline_name(fence),
 		   fence->seqno,
 		   dma_fence_is_signaled(fence) ? "" : "un");
+	dma_fence_access_end();
 }
 EXPORT_SYMBOL(dma_fence_describe);
 
@@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	__set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
 }
 EXPORT_SYMBOL(dma_fence_init64);
+
+/**
+ * dma_fence_driver_name - Access the driver name
+ * @fence: the fence to query
+ *
+ * Returns a driver name backing the dma-fence implementation.
+ *
+ * IMPORTANT CONSIDERATION:
+ * Dma-fence contract stipulates that access to driver provided data (data not
+ * directly embedded into the object itself), such as the &dma_fence.lock and
+ * memory potentially accessed by the &dma_fence.ops functions, is forbidden
+ * after the fence has been signalled. Drivers are allowed to free that data,
+ * and some do.
+ *
+ * To allow safe access drivers are mandated to guarantee a RCU grace period
+ * between signalling the fence and freeing said data.
+ *
+ * As such access to the driver name is only valid inside a RCU locked section.
+ * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
+ * by the &dma_fence_access_being and &dma_fence_access_end pair.
+ */
+const char *dma_fence_driver_name(struct dma_fence *fence)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "rcu_read_lock() required for safe access to returned string");
+
+	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return fence->ops->get_driver_name(fence);
+	else
+		return "detached-driver";
+}
+EXPORT_SYMBOL(dma_fence_driver_name);
+
+/**
+ * dma_fence_timeline_name - Access the timeline name
+ * @fence: the fence to query
+ *
+ * Returns a timeline name provided by the dma-fence implementation.
+ *
+ * IMPORTANT CONSIDERATION:
+ * Dma-fence contract stipulates that access to driver provided data (data not
+ * directly embedded into the object itself), such as the &dma_fence.lock and
+ * memory potentially accessed by the &dma_fence.ops functions, is forbidden
+ * after the fence has been signalled. Drivers are allowed to free that data,
+ * and some do.
+ *
+ * To allow safe access drivers are mandated to guarantee a RCU grace period
+ * between signalling the fence and freeing said data.
+ *
+ * As such access to the driver name is only valid inside a RCU locked section.
+ * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
+ * by the &dma_fence_access_being and &dma_fence_access_end pair.
+ */
+const char *dma_fence_timeline_name(struct dma_fence *fence)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "rcu_read_lock() required for safe access to returned string");
+
+	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return fence->ops->get_driver_name(fence);
+	else
+		return "signaled-timeline";
+}
+EXPORT_SYMBOL(dma_fence_timeline_name);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index c5ac37e10d85..b39e430142ea 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -377,15 +377,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
 			       struct dma_fence_cb *cb);
 void dma_fence_enable_sw_signaling(struct dma_fence *fence);
 
-static inline const char *dma_fence_driver_name(struct dma_fence *fence)
-{
-	return fence->ops->get_driver_name(fence);
-}
+/**
+ * DOC: Safe external access to driver provided object members
+ *
+ * All data not stored directly in the dma-fence object, such as the
+ * &dma_fence.lock and memory potentially accessed by functions in the
+ * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled
+ * because after that point drivers are allowed to free it.
+ *
+ * All code accessing that data via the dma-fence API (or directly, which is
+ * discouraged), MUST make sure to contain the complete access within a
+ * &dma_fence_access_begin and &dma_fence_access_end pair.
+ *
+ * Some dma-fence API handles this automatically, while other, as for example
+ * &dma_fence_driver_name and &dma_fence_timeline_name, leave that
+ * responsibility to the caller.
+ *
+ * To enable this scheme to work drivers MUST ensure a RCU grace period elapses
+ * between signalling the fence and freeing the said data.
+ *
+ */
+#define dma_fence_access_begin	rcu_read_lock
+#define dma_fence_access_end	rcu_read_unlock
 
-static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
-{
-	return fence->ops->get_timeline_name(fence);
-}
+const char *dma_fence_driver_name(struct dma_fence *fence);
+const char *dma_fence_timeline_name(struct dma_fence *fence);
 
 /**
  * dma_fence_is_signaled_locked - Return an indication if the fence
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 08/10] sync_file: Protect access to driver and timeline name
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 09/10] drm/i915: " Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 10/10] drm/xe: Make dma-fences compliant with the safe access rules Tvrtko Ursulin
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Protect the access to driver and timeline name which otherwise could be
freed as dma-fence exported is signalling fences.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/dma-buf/sync_file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 212df4b849fe..ad87116baa24 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -136,11 +136,13 @@ char *sync_file_get_name(struct sync_file *sync_file, char *buf, int len)
 	} else {
 		struct dma_fence *fence = sync_file->fence;
 
+		dma_fence_access_begin();
 		snprintf(buf, len, "%s-%s%llu-%lld",
 			 dma_fence_driver_name(fence),
 			 dma_fence_timeline_name(fence),
 			 fence->context,
 			 fence->seqno);
+		dma_fence_access_end();
 	}
 
 	return buf;
@@ -262,6 +264,8 @@ static long sync_file_ioctl_merge(struct sync_file *sync_file,
 static int sync_fill_fence_info(struct dma_fence *fence,
 				 struct sync_fence_info *info)
 {
+	dma_fence_access_begin();
+
 	strscpy(info->obj_name, dma_fence_timeline_name(fence),
 		sizeof(info->obj_name));
 	strscpy(info->driver_name, dma_fence_driver_name(fence),
@@ -273,6 +277,8 @@ static int sync_fill_fence_info(struct dma_fence *fence,
 			ktime_to_ns(dma_fence_timestamp(fence)) :
 			ktime_set(0, 0);
 
+	dma_fence_access_end();
+
 	return info->status;
 }
 
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 09/10] drm/i915: Protect access to driver and timeline name
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 08/10] sync_file: Protect access to driver and timeline name Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  2025-05-13  7:45 ` [RFC v3 10/10] drm/xe: Make dma-fences compliant with the safe access rules Tvrtko Ursulin
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Protect the access to driver and timeline name which otherwise could be
freed as dma-fence exported is signalling fences.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 ++
 drivers/gpu/drm/i915/i915_request.c         | 5 +++--
 drivers/gpu/drm/i915/i915_sw_fence.c        | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index ae3557ed6c1e..11fca24c8b5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -251,10 +251,12 @@ void intel_gt_watchdog_work(struct work_struct *work)
 		if (!i915_request_completed(rq)) {
 			struct dma_fence *f = &rq->fence;
 
+			dma_fence_access_begin();
 			pr_notice("Fence expiration time out i915-%s:%s:%llx!\n",
 				  dma_fence_driver_name(f),
 				  dma_fence_timeline_name(f),
 				  f->seqno);
+			dma_fence_access_end();
 			i915_request_cancel(rq, -EINTR);
 		}
 		i915_request_put(rq);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4874c4f1e4ab..8008b7767641 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2184,7 +2184,6 @@ void i915_request_show(struct drm_printer *m,
 		       const char *prefix,
 		       int indent)
 {
-	const char *name = dma_fence_timeline_name((struct dma_fence *)&rq->fence);
 	char buf[80] = "";
 	int x = 0;
 
@@ -2220,6 +2219,7 @@ void i915_request_show(struct drm_printer *m,
 
 	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
 
+	dma_fence_access_begin();
 	drm_printf(m, "%s%.*s%c %llx:%lld%s%s %s @ %dms: %s\n",
 		   prefix, indent, "                ",
 		   queue_status(rq),
@@ -2228,7 +2228,8 @@ void i915_request_show(struct drm_printer *m,
 		   fence_status(rq),
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
-		   name);
+		   dma_fence_timeline_name((struct dma_fence *)&rq->fence));
+	dma_fence_access_end();
 }
 
 static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index e51ca7e50a4e..e7bdc1165b90 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -434,11 +434,13 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
 	if (!fence)
 		return;
 
+	dma_fence_access_begin();
 	pr_notice("Asynchronous wait on fence %s:%s:%llx timed out (hint:%ps)\n",
 		  dma_fence_driver_name(cb->dma),
 		  dma_fence_timeline_name(cb->dma),
 		  cb->dma->seqno,
 		  i915_sw_fence_debug_hint(fence));
+	dma_fence_access_end();
 
 	i915_sw_fence_set_error_once(fence, -ETIMEDOUT);
 	i915_sw_fence_complete(fence);
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC v3 10/10] drm/xe: Make dma-fences compliant with the safe access rules
  2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2025-05-13  7:45 ` [RFC v3 09/10] drm/i915: " Tvrtko Ursulin
@ 2025-05-13  7:45 ` Tvrtko Ursulin
  9 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-13  7:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Xe can free some of the data pointed to by the dma-fences it exports. Most
notably the timeline name can get freed if userspace closes the associated
submit queue. At the same time the fence could have been exported to a
third party (for example a sync_fence fd) which will then cause an use-
after-free on subsequent access.

To make this safe we need to make the driver compliant with the newly
documented dma-fence rules. Driver has to ensure a RCU grace period
between signalling a fence and freeing any data pointed to by said fence.

For the timeline name we simply make the queue be freed via kfree_rcu and
for the shared lock associated with multiple queues we add a RCU grace
period before freeing the per GT structure holding the lock.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 ++
 drivers/gpu/drm/xe/xe_guc_submit.c           | 7 ++++++-
 drivers/gpu/drm/xe/xe_hw_fence.c             | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
index 4c39f01e4f52..a3f421e2adc0 100644
--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
@@ -20,6 +20,8 @@ struct xe_exec_queue;
 struct xe_guc_exec_queue {
 	/** @q: Backpointer to parent xe_exec_queue */
 	struct xe_exec_queue *q;
+	/** @rcu: For safe freeing of exported dma fences */
+	struct rcu_head rcu;
 	/** @sched: GPU scheduler for this xe_exec_queue */
 	struct xe_gpu_scheduler sched;
 	/** @entity: Scheduler entity for this xe_exec_queue */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index fb125f940de8..879a4474bf51 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1291,7 +1291,11 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
 	xe_sched_entity_fini(&ge->entity);
 	xe_sched_fini(&ge->sched);
 
-	kfree(ge);
+	/*
+	 * RCU free due sched being exported via DRM scheduler fences
+	 * (timeline name).
+	 */
+	kfree_rcu(ge, rcu);
 	xe_exec_queue_fini(q);
 	xe_pm_runtime_put(guc_to_xe(guc));
 }
@@ -1474,6 +1478,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
 
 	q->guc = ge;
 	ge->q = q;
+	init_rcu_head(&ge->rcu);
 	init_waitqueue_head(&ge->suspend_wait);
 
 	for (i = 0; i < MAX_STATIC_MSG_TYPE; ++i)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index 03eb8c6d1616..b2a0c46dfcd4 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -100,6 +100,9 @@ void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq)
 		spin_unlock_irqrestore(&irq->lock, flags);
 		dma_fence_end_signalling(tmp);
 	}
+
+	/* Safe release of the irq->lock used in dma_fence_init. */
+	synchronize_rcu();
 }
 
 void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq)
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC v3 05/10] drm/amdgpu: Use dma-fence driver and timeline name helpers
  2025-05-13  7:45 ` [RFC v3 05/10] drm/amdgpu: " Tvrtko Ursulin
@ 2025-05-14 13:35   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2025-05-14 13:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Matthew Brost,
	Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe, intel-gfx,
	linux-media, linaro-mm-sig, kernel-dev

On 5/13/25 09:45, Tvrtko Ursulin wrote:
> Access the dma-fence internals via the previously added helpers.
> 
> Drop the macro while at it, since the length is now more manageable.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 11dd2e0f7979..4c61e4168f23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -32,9 +32,6 @@
>  #define TRACE_SYSTEM amdgpu
>  #define TRACE_INCLUDE_FILE amdgpu_trace
>  
> -#define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
> -	 job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
> -
>  TRACE_EVENT(amdgpu_device_rreg,
>  	    TP_PROTO(unsigned did, uint32_t reg, uint32_t value),
>  	    TP_ARGS(did, reg, value),
> @@ -168,7 +165,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
>  	    TP_ARGS(job),
>  	    TP_STRUCT__entry(
>  			     __field(uint64_t, sched_job_id)
> -			     __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
> +			     __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished))
>  			     __field(unsigned int, context)
>  			     __field(unsigned int, seqno)
>  			     __field(struct dma_fence *, fence)
> @@ -194,7 +191,7 @@ TRACE_EVENT(amdgpu_sched_run_job,
>  	    TP_ARGS(job),
>  	    TP_STRUCT__entry(
>  			     __field(uint64_t, sched_job_id)
> -			     __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
> +			     __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished))
>  			     __field(unsigned int, context)
>  			     __field(unsigned int, seqno)
>  			     __string(ring, to_amdgpu_ring(job->base.sched)->name)
> @@ -585,8 +582,6 @@ TRACE_EVENT(amdgpu_reset_reg_dumps,
>  		      __entry->address,
>  		      __entry->value)
>  );
> -
> -#undef AMDGPU_JOB_GET_TIMELINE_NAME
>  #endif
>  
>  /* This part must be outside protection */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules
  2025-05-13  7:45 ` [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules Tvrtko Ursulin
@ 2025-05-14 13:50   ` Christian König
  2025-05-14 14:46     ` Tvrtko Ursulin
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2025-05-14 13:50 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Matthew Brost,
	Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe, intel-gfx,
	linux-media, linaro-mm-sig, kernel-dev

I'm going to push patches #1-#6 to drm-misc-next.

They make sense as a stand alone cleanups anyway.

But that here needs a bit more documentation I think.

On 5/13/25 09:45, Tvrtko Ursulin wrote:
> Dma-fence objects currently suffer from a potential use after free problem
> where fences exported to userspace and other drivers can outlive the
> exporting driver, or the associated data structures.
> 
> The discussion on how to address this concluded that adding reference
> counting to all the involved objects is not desirable, since it would need
> to be very wide reaching and could cause unloadable drivers if another
> entity would be holding onto a signaled fence reference potentially
> indefinitely.
> 
> This patch enables the safe access by introducing and documenting a
> contract between fence exporters and users. It documents a set of
> contraints and adds helpers which a) drivers with potential to suffer from
> the use after free must use and b) users of the dma-fence API must use as
> well.
> 
> Premise of the design has multiple sides:
> 
> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
> signalling a fence and freeing the driver private data associated with it.

That's a must have anyway, otherwise functions like dma_fence_get_rcu() won't work.

I hope that we have documented that somewhere, but I'm not 100% sure to be honest.

> The grace period does not have to follow the signalling immediately but
> HAS to happen before data is freed.

That is the new requirement we have to document somehow.

I'm not 100% sure but I think module unloading waits for an RCU grace period anyway.


> 2. Users of the dma-fence API marked with such requirement MUST contain
> the complete access to the data within a single code block guarded by the
> new dma_fence_access_begin() and dma_fence_access_end() helpers.
> 
> The combination of the two ensures that whoever sees the
> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
> valid fence->lock and valid data potentially accessed by the fence->ops
> virtual functions, until the call to dma_fence_access_end().

Mhm, how about returning copies of the string?

This is only for debugging anyway and kstrdup_const() isn't that costly.

Regards,
Christian.


> 
> 3. Module unload (fence->ops) disappearing is for now explicitly not
> handled. That would required a more complex protection, possibly needing
> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
> where race between dma_fence_enable_sw_signaling, signalling, and
> dereference of fence->ops->wait() would need a sleeping SRCU context.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> ---
>  drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
>  include/linux/dma-fence.h   | 32 ++++++++++++-----
>  2 files changed, 93 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index dc2456f68685..cfe1d7b79c22 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
>  	struct dma_fence *fence =
>  		container_of(kref, struct dma_fence, refcount);
>  
> +	dma_fence_access_begin();
>  	trace_dma_fence_destroy(fence);
>  
>  	if (WARN(!list_empty(&fence->cb_list) &&
> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
>  		fence->ops->release(fence);
>  	else
>  		dma_fence_free(fence);
> +
> +	dma_fence_access_end();
>  }
>  EXPORT_SYMBOL(dma_fence_release);
>  
> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
>   */
>  void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
>  {
> +	dma_fence_access_begin();
>  	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
>  		   dma_fence_driver_name(fence),
>  		   dma_fence_timeline_name(fence),
>  		   fence->seqno,
>  		   dma_fence_is_signaled(fence) ? "" : "un");
> +	dma_fence_access_end();
>  }
>  EXPORT_SYMBOL(dma_fence_describe);
>  
> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
>  	__set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
>  }
>  EXPORT_SYMBOL(dma_fence_init64);
> +
> +/**
> + * dma_fence_driver_name - Access the driver name
> + * @fence: the fence to query
> + *
> + * Returns a driver name backing the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the driver name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &dma_fence_access_being and &dma_fence_access_end pair.
> + */
> +const char *dma_fence_driver_name(struct dma_fence *fence)
> +{
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> +			 "rcu_read_lock() required for safe access to returned string");
> +
> +	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +		return fence->ops->get_driver_name(fence);
> +	else
> +		return "detached-driver";
> +}
> +EXPORT_SYMBOL(dma_fence_driver_name);
> +
> +/**
> + * dma_fence_timeline_name - Access the timeline name
> + * @fence: the fence to query
> + *
> + * Returns a timeline name provided by the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the driver name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &dma_fence_access_being and &dma_fence_access_end pair.
> + */
> +const char *dma_fence_timeline_name(struct dma_fence *fence)
> +{
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> +			 "rcu_read_lock() required for safe access to returned string");
> +
> +	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +		return fence->ops->get_driver_name(fence);
> +	else
> +		return "signaled-timeline";
> +}
> +EXPORT_SYMBOL(dma_fence_timeline_name);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index c5ac37e10d85..b39e430142ea 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -377,15 +377,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
>  			       struct dma_fence_cb *cb);
>  void dma_fence_enable_sw_signaling(struct dma_fence *fence);
>  
> -static inline const char *dma_fence_driver_name(struct dma_fence *fence)
> -{
> -	return fence->ops->get_driver_name(fence);
> -}
> +/**
> + * DOC: Safe external access to driver provided object members
> + *
> + * All data not stored directly in the dma-fence object, such as the
> + * &dma_fence.lock and memory potentially accessed by functions in the
> + * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled
> + * because after that point drivers are allowed to free it.
> + *
> + * All code accessing that data via the dma-fence API (or directly, which is
> + * discouraged), MUST make sure to contain the complete access within a
> + * &dma_fence_access_begin and &dma_fence_access_end pair.
> + *
> + * Some dma-fence API handles this automatically, while other, as for example
> + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that
> + * responsibility to the caller.
> + *
> + * To enable this scheme to work drivers MUST ensure a RCU grace period elapses
> + * between signalling the fence and freeing the said data.
> + *
> + */
> +#define dma_fence_access_begin	rcu_read_lock
> +#define dma_fence_access_end	rcu_read_unlock
>  
> -static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
> -{
> -	return fence->ops->get_timeline_name(fence);
> -}
> +const char *dma_fence_driver_name(struct dma_fence *fence);
> +const char *dma_fence_timeline_name(struct dma_fence *fence);
>  
>  /**
>   * dma_fence_is_signaled_locked - Return an indication if the fence


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules
  2025-05-14 13:50   ` Christian König
@ 2025-05-14 14:46     ` Tvrtko Ursulin
  0 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 14:46 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Rob Clark, Sumit Semwal, Gustavo Padovan, Matthew Brost,
	Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe, intel-gfx,
	linux-media, linaro-mm-sig, kernel-dev


On 14/05/2025 14:50, Christian König wrote:
> I'm going to push patches #1-#6 to drm-misc-next.
> 
> They make sense as a stand alone cleanups anyway.

Holding off for now because the 64 bit seqno flag needs a build fix. I 
have it locally but was waiting for more feedback on the series.
  > But that here needs a bit more documentation I think.
> 
> On 5/13/25 09:45, Tvrtko Ursulin wrote:
>> Dma-fence objects currently suffer from a potential use after free problem
>> where fences exported to userspace and other drivers can outlive the
>> exporting driver, or the associated data structures.
>>
>> The discussion on how to address this concluded that adding reference
>> counting to all the involved objects is not desirable, since it would need
>> to be very wide reaching and could cause unloadable drivers if another
>> entity would be holding onto a signaled fence reference potentially
>> indefinitely.
>>
>> This patch enables the safe access by introducing and documenting a
>> contract between fence exporters and users. It documents a set of
>> contraints and adds helpers which a) drivers with potential to suffer from
>> the use after free must use and b) users of the dma-fence API must use as
>> well.
>>
>> Premise of the design has multiple sides:
>>
>> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
>> signalling a fence and freeing the driver private data associated with it.
> 
> That's a must have anyway, otherwise functions like dma_fence_get_rcu() won't work.

That one is for the fence object itself, right? Above I was specifically 
talking about the data pointed to from the fence such as the lock and 
and driver data accessible via the ops vfuncs.

> I hope that we have documented that somewhere, but I'm not 100% sure to be honest.
> 
>> The grace period does not have to follow the signalling immediately but
>> HAS to happen before data is freed.
> 
> That is the new requirement we have to document somehow.

It is still the same number 1) bullet point. Just clarifying that one 
does not need engineer for grace periods right after signalling.

> I'm not 100% sure but I think module unloading waits for an RCU grace period anyway.
> 
> 
>> 2. Users of the dma-fence API marked with such requirement MUST contain
>> the complete access to the data within a single code block guarded by the
>> new dma_fence_access_begin() and dma_fence_access_end() helpers.
>>
>> The combination of the two ensures that whoever sees the
>> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
>> valid fence->lock and valid data potentially accessed by the fence->ops
>> virtual functions, until the call to dma_fence_access_end().
> 
> Mhm, how about returning copies of the string?
> 
> This is only for debugging anyway and kstrdup_const() isn't that costly.

Not sure exactly how that would work. Wouldn't it imply either 
GFP_ATOMIC or upgrading the RCU requirements to SRCU?

Regards,

Tvrtko

>> 3. Module unload (fence->ops) disappearing is for now explicitly not
>> handled. That would required a more complex protection, possibly needing
>> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
>> where race between dma_fence_enable_sw_signaling, signalling, and
>> dereference of fence->ops->wait() would need a sleeping SRCU context.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> ---
>>   drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
>>   include/linux/dma-fence.h   | 32 ++++++++++++-----
>>   2 files changed, 93 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index dc2456f68685..cfe1d7b79c22 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
>>   	struct dma_fence *fence =
>>   		container_of(kref, struct dma_fence, refcount);
>>   
>> +	dma_fence_access_begin();
>>   	trace_dma_fence_destroy(fence);
>>   
>>   	if (WARN(!list_empty(&fence->cb_list) &&
>> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
>>   		fence->ops->release(fence);
>>   	else
>>   		dma_fence_free(fence);
>> +
>> +	dma_fence_access_end();
>>   }
>>   EXPORT_SYMBOL(dma_fence_release);
>>   
>> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
>>    */
>>   void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
>>   {
>> +	dma_fence_access_begin();
>>   	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
>>   		   dma_fence_driver_name(fence),
>>   		   dma_fence_timeline_name(fence),
>>   		   fence->seqno,
>>   		   dma_fence_is_signaled(fence) ? "" : "un");
>> +	dma_fence_access_end();
>>   }
>>   EXPORT_SYMBOL(dma_fence_describe);
>>   
>> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>   	__set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
>>   }
>>   EXPORT_SYMBOL(dma_fence_init64);
>> +
>> +/**
>> + * dma_fence_driver_name - Access the driver name
>> + * @fence: the fence to query
>> + *
>> + * Returns a driver name backing the dma-fence implementation.
>> + *
>> + * IMPORTANT CONSIDERATION:
>> + * Dma-fence contract stipulates that access to driver provided data (data not
>> + * directly embedded into the object itself), such as the &dma_fence.lock and
>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
>> + * after the fence has been signalled. Drivers are allowed to free that data,
>> + * and some do.
>> + *
>> + * To allow safe access drivers are mandated to guarantee a RCU grace period
>> + * between signalling the fence and freeing said data.
>> + *
>> + * As such access to the driver name is only valid inside a RCU locked section.
>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
>> + * by the &dma_fence_access_being and &dma_fence_access_end pair.
>> + */
>> +const char *dma_fence_driver_name(struct dma_fence *fence)
>> +{
>> +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> +			 "rcu_read_lock() required for safe access to returned string");
>> +
>> +	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> +		return fence->ops->get_driver_name(fence);
>> +	else
>> +		return "detached-driver";
>> +}
>> +EXPORT_SYMBOL(dma_fence_driver_name);
>> +
>> +/**
>> + * dma_fence_timeline_name - Access the timeline name
>> + * @fence: the fence to query
>> + *
>> + * Returns a timeline name provided by the dma-fence implementation.
>> + *
>> + * IMPORTANT CONSIDERATION:
>> + * Dma-fence contract stipulates that access to driver provided data (data not
>> + * directly embedded into the object itself), such as the &dma_fence.lock and
>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
>> + * after the fence has been signalled. Drivers are allowed to free that data,
>> + * and some do.
>> + *
>> + * To allow safe access drivers are mandated to guarantee a RCU grace period
>> + * between signalling the fence and freeing said data.
>> + *
>> + * As such access to the driver name is only valid inside a RCU locked section.
>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
>> + * by the &dma_fence_access_being and &dma_fence_access_end pair.
>> + */
>> +const char *dma_fence_timeline_name(struct dma_fence *fence)
>> +{
>> +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> +			 "rcu_read_lock() required for safe access to returned string");
>> +
>> +	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> +		return fence->ops->get_driver_name(fence);
>> +	else
>> +		return "signaled-timeline";
>> +}
>> +EXPORT_SYMBOL(dma_fence_timeline_name);
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index c5ac37e10d85..b39e430142ea 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -377,15 +377,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
>>   			       struct dma_fence_cb *cb);
>>   void dma_fence_enable_sw_signaling(struct dma_fence *fence);
>>   
>> -static inline const char *dma_fence_driver_name(struct dma_fence *fence)
>> -{
>> -	return fence->ops->get_driver_name(fence);
>> -}
>> +/**
>> + * DOC: Safe external access to driver provided object members
>> + *
>> + * All data not stored directly in the dma-fence object, such as the
>> + * &dma_fence.lock and memory potentially accessed by functions in the
>> + * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled
>> + * because after that point drivers are allowed to free it.
>> + *
>> + * All code accessing that data via the dma-fence API (or directly, which is
>> + * discouraged), MUST make sure to contain the complete access within a
>> + * &dma_fence_access_begin and &dma_fence_access_end pair.
>> + *
>> + * Some dma-fence API handles this automatically, while other, as for example
>> + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that
>> + * responsibility to the caller.
>> + *
>> + * To enable this scheme to work drivers MUST ensure a RCU grace period elapses
>> + * between signalling the fence and freeing the said data.
>> + *
>> + */
>> +#define dma_fence_access_begin	rcu_read_lock
>> +#define dma_fence_access_end	rcu_read_unlock
>>   
>> -static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
>> -{
>> -	return fence->ops->get_timeline_name(fence);
>> -}
>> +const char *dma_fence_driver_name(struct dma_fence *fence);
>> +const char *dma_fence_timeline_name(struct dma_fence *fence);
>>   
>>   /**
>>    * dma_fence_is_signaled_locked - Return an indication if the fence
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-05-14 14:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-13  7:45 [RFC v3 00/10] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 01/10] dma-fence: Change signature of __dma_fence_is_later Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 02/10] dma-fence: Use a flag for 64-bit seqnos Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 03/10] dma-fence: Add helpers for accessing driver and timeline name Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 04/10] sync_file: Use dma-fence driver and timeline name helpers Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 05/10] drm/amdgpu: " Tvrtko Ursulin
2025-05-14 13:35   ` Christian König
2025-05-13  7:45 ` [RFC v3 06/10] drm/i915: " Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules Tvrtko Ursulin
2025-05-14 13:50   ` Christian König
2025-05-14 14:46     ` Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 08/10] sync_file: Protect access to driver and timeline name Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 09/10] drm/i915: " Tvrtko Ursulin
2025-05-13  7:45 ` [RFC v3 10/10] drm/xe: Make dma-fences compliant with the safe access rules Tvrtko Ursulin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).