All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely
@ 2026-06-12 10:42 Philipp Stanner
  2026-06-12 10:56 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: Philipp Stanner @ 2026-06-12 10:42 UTC (permalink / raw)
  To: Danilo Krummrich, Maarten Lankhorst, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König, Tvrtko Ursulin,
	Boris Brezillon, Paul E . McKenney
  Cc: dri-devel, linux-kernel, Philipp Stanner

dma_fence_is_signaled() returns whether a fence has been signaled
already. That function contains a fast path opportunistic check which is
not guarded by the lock and, according to Christian, cannot be guarded
by the lock without causing a massive performance regression.

This now means that dma_fence_is_signaled() can return true WHILE the
fence callbacks are still being executed. This is razy and has lead to
at least one bug solved in:

commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")

Make this race impossible, by simply setting the bit only once the
callbacks are actually completed.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/dma-buf/dma-fence.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index c7ea1e75d38a..2416cc86ce93 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 
 	dma_fence_assert_held(fence);
 
-	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				      &fence->flags)))
+	/*
+	 * First test the bit, so we don't signal an already signaled fence again.
+	 * The lock protects against multiple parties setting the bit. The bit
+	 * is then set at the end of the function.
+	 *
+	 * The background is that there is a fast path check in
+	 * dma_fence_is_signaled() which does not use lock protection and can
+	 * return true *while* the fence callbacks are still executing.
+	 *
+	 * This fast path check supposedly cannot be guarded by the lock because
+	 * of significant performance regressions.
+	 */
+	if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
 		return;
 
 	trace_dma_fence_signaled(fence);
@@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 		INIT_LIST_HEAD(&cur->node);
 		cur->func(fence, cur);
 	}
+
+	// TODO: we need some barrier here, don't we?
+	set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
 }
 EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely
  2026-06-12 10:42 [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely Philipp Stanner
@ 2026-06-12 10:56 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-12 10:56 UTC (permalink / raw)
  To: Philipp Stanner; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Moving the signaled bit update creates a window where `fence->ops` is cleared to NULL but the fence is not yet marked as signaled, causing a NULL pointer dereference in lockless accessors.
- [High] Missing release memory barrier before setting the signaled bit, defeating the purpose of the patch on weakly-ordered architectures and risking use-after-free.
- [High] Delaying the signaled bit breaks the fast-path lockless check in `dma_fence_is_signaled()`, leading to lock contention, recursive deadlocks, and API contract violations.
--

commit 0ba6de5d00b831f763affe10965d5c591e6f0b5f
Author: Philipp Stanner <phasta@kernel.org>

dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely

This commit addresses a race condition where dma_fence_is_signaled() can
return true while fence callbacks are still executing. It delays setting
the DMA_FENCE_FLAG_SIGNALED_BIT until after the callbacks have completed,
instead of setting it at the beginning of dma_fence_signal_timestamp_locked().

> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -359,8 +359,19 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  
>  	dma_fence_assert_held(fence);
>  
> -	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -				      &fence->flags)))
> +	/*
> +	 * First test the bit, so we don't signal an already signaled fence again.
> +	 * The lock protects against multiple parties setting the bit. The bit
> +	 * is then set at the end of the function.
> +	 *
> +	 * The background is that there is a fast path check in
> +	 * dma_fence_is_signaled() which does not use lock protection and can
> +	 * return true *while* the fence callbacks are still executing.
> +	 *
> +	 * This fast path check supposedly cannot be guarded by the lock because
> +	 * of significant performance regressions.
> +	 */
> +	if (unlikely(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
>  		return;
>  
>  	trace_dma_fence_signaled(fence);

[Severity: Critical]
Because the signaled bit is no longer set here, is there a risk of a NULL
pointer dereference for concurrent lockless readers?

Just below this in dma_fence_signal_timestamp_locked(), fence->ops can be
cleared:

    ops = rcu_dereference_protected(fence->ops, true);
    if (!ops->release && !ops->wait)
        RCU_INIT_POINTER(fence->ops, NULL);

With the signaled bit still false, could lockless accessors like
dma_fence_driver_name() see the unsignaled state and evaluate
ops->get_driver_name(fence) on the NULL ops pointer?

[ ... ]
> @@ -384,6 +395,9 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  		INIT_LIST_HEAD(&cur->node);
>  		cur->func(fence, cur);
>  	}

[Severity: High]
If a callback executes while the signaled bit is still false, won't this
break the lockless fast-path check in dma_fence_is_signaled()?

If dma_fence_is_signaled() or dma_fence_timestamp() is called from within
a callback, the unsignaled state would cause dma_fence_timestamp() to hit
a WARN_ON(!test_bit(...)).

Additionally, dma_fence_is_signaled() would fall back to dma_fence_signal(),
which attempts to acquire the fence->lock. Since the lock is already held
by the current thread executing the callbacks, would this result in a
recursive deadlock?

> +
> +	// TODO: we need some barrier here, don't we?
> +	set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);

[Severity: High]
Does this missing memory barrier reintroduce the use-after-free risk on
weakly-ordered architectures?

Since set_bit() does not imply a memory barrier, could the CPU reorder
this assignment to become globally visible before the memory writes performed
by the callbacks?

If a concurrent lockless reader using dma_fence_test_signaled_flag()
observes the bit as set but reads stale data, this might defeat the purpose
of the patch. Is an explicit smp_mb__before_atomic() needed here?

>  }
>  EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612104251.2264707-2-phasta@kernel.org?part=1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-12 10:56 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 10:42 [RFC PATCH] dma-buf/dma_fence: Make races for dma_fence_is_signaled() less likely Philipp Stanner
2026-06-12 10:56 ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.