All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference
@ 2026-06-26 10:04 Philipp Stanner
  2026-06-26 10:16 ` sashiko-bot
  2026-06-26 12:21 ` Christian König
  0 siblings, 2 replies; 3+ messages in thread
From: Philipp Stanner @ 2026-06-26 10:04 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Boris Brezillon,
	Tvrtko Ursulin, André Draszik, Danilo Krummrich, Gary Guo,
	Paul E . McKenney, Boqun Feng
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel,
	Philipp Stanner, stable

The commit mentioned in the fixes tag below introduced a mechanism
through which fence producers can fully decouple from fence consumers.
This, desirable, mechanism is based on the fence's signaled-bit as the
"decoupling point".

A sophisticated interaction between RCU and atomic instructions attempts
to ensure that fence consumers can still interact with fence producers
through the dma_fence_ops, callback pointers into the producer.

This is the desired behavior: to check for decoupling, the signaled-bit
is first checked. If it's not yet signaled, RCU ensures that the ops
pointer cannot yet be NULL.

Hereby, dma_fence_signal_timestamp_locked() first sets the signaled-bit,
and then sets the ops pointer to NULL. Readers first load the ops
pointer, and then check through the signaled-bit whether the pointer can
legally be accessed.

These set and load operations could occur out of order on weakly ordered
platforms. Hence, we need to enforce strict ordering all the time.

Add the appropriate memory barriers.

Cc: stable@vger.kernel.org
Fixes: f4cc3ab824d6 ("dma-buf: protected fence ops by RCU v8")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
Tested with dmabuf and drm_sched unit tests.

Memory barriers are notoriously difficult, so I would appreciate if some
of the more experienced folks can check this. Notably, I am not sure
whether the smp_wmb() is necessary.

The documentation for test_and_set_bit() makes the mysterious statement
"This is an atomic fully-ordered operation (implied full memory
barrier)", but the kcsan_mb() seems to be some sort of debugging
barrier, and in any case the docu doesn't make it obvious to me whether
that "full barrier" comes before or after the bit setting takes place.

Moreover, in my opinion we should order dma_fence_is_signaled(), too –
but if we agree to merge Christian's new series [1] that need should
disappear.


[1] https://lore.kernel.org/dri-devel/20260624122917.2483-1-christian.koenig@amd.com/ 
---
 drivers/dma-buf/dma-fence.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index c7ea1e75d38a..2e80b01499de 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -363,6 +363,18 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 				      &fence->flags)))
 		return;
 
+	/*
+	 * Fully order setting of the bit above with setting of the ops pointer
+	 * to NULL below, so that all parties can use the signaled flag to
+	 * detect that the fence decoupled from its ops in a safe manner.
+	 *
+	 * The counter parts of this barrier are in dma_fence_timeline_name()
+	 * and dma_fence_driver_name(). All other future parties that rely on
+	 * the signaled flag for valid access to the ops pointer will need a
+	 * memory barrier.
+	 */
+	smp_wmb();
+
 	trace_dma_fence_signaled(fence);
 
 	/*
@@ -1170,6 +1182,12 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
 
 	/* RCU protection is required for safe access to returned string */
 	ops = rcu_dereference(fence->ops);
+	/*
+	 * Fully order the dereference above with the flag check. Otherwise,
+	 * ops could be dereferenced as a NULL pointer. The barrier's
+	 * counterpart is in dma_fence_signal_timestamp_locked().
+	 */
+	smp_rmb();
 	if (!dma_fence_test_signaled_flag(fence))
 		return (const char __rcu *)ops->get_driver_name(fence);
 	else
@@ -1203,6 +1221,12 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
 
 	/* RCU protection is required for safe access to returned string */
 	ops = rcu_dereference(fence->ops);
+	/*
+	 * Fully order the dereference above with the flag check. Otherwise,
+	 * ops could be dereferenced as a NULL pointer. The barrier's
+	 * counterpart is in dma_fence_signal_timestamp_locked().
+	 */
+	smp_rmb();
 	if (!dma_fence_test_signaled_flag(fence))
 		return (const char __rcu *)ops->get_driver_name(fence);
 	else

base-commit: cdeb2ccd993ed8647adbbda2c3b103aa717fd6f7
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference
  2026-06-26 10:04 [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference Philipp Stanner
@ 2026-06-26 10:16 ` sashiko-bot
  2026-06-26 12:21 ` Christian König
  1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-26 10:16 UTC (permalink / raw)
  To: Philipp Stanner; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] TOCTOU race condition in DRM CRTC fence operations triggers a kernel panic via BUG_ON.
--

commit 00a837e98e1d45bf6a3df4ddad52336211392885
Author: Philipp Stanner <phasta@kernel.org>

dma-buf: dma-fence: Fix potential NULL pointer dereference

This commit adds memory barriers to the fence signaling and name retrieval
functions. It ensures that the signaled bit is set before the ops pointer
is cleared, and that readers correctly order the ops pointer load before
checking the signaled bit to safely handle decoupled fences.

> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index c7ea1e75d38a2..2e80b01499de8 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c

[ ... ]

> @@ -1170,6 +1182,12 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>  
>  	/* RCU protection is required for safe access to returned string */
>  	ops = rcu_dereference(fence->ops);
> +	/*
> +	 * Fully order the dereference above with the flag check. Otherwise,
> +	 * ops could be dereferenced as a NULL pointer. The barrier's
> +	 * counterpart is in dma_fence_signal_timestamp_locked().
> +	 */
> +	smp_rmb();
>  	if (!dma_fence_test_signaled_flag(fence))
>  		return (const char __rcu *)ops->get_driver_name(fence);

[Severity: High]
This is a pre-existing issue, but can this lead to a kernel panic during
concurrent fence signaling?

If a concurrent thread signals the fence after the flag is checked here,
the ops pointer can be set to NULL. When the callback resolves to
drm_crtc_fence_get_driver_name() in the drm crtc code, it calls
fence_to_crtc(), which re-reads the ops pointer:

drivers/gpu/drm/drm_crtc.c:fence_to_crtc() {
        BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);
        return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
}

Since the decoupling mechanism can set the ops pointer to NULL, won't this
BUG_ON trigger and crash the system? Do the fence callbacks need to be
updated to avoid re-reading the ops pointer?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626100442.2202221-2-phasta@kernel.org?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference
  2026-06-26 10:04 [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference Philipp Stanner
  2026-06-26 10:16 ` sashiko-bot
@ 2026-06-26 12:21 ` Christian König
  1 sibling, 0 replies; 3+ messages in thread
From: Christian König @ 2026-06-26 12:21 UTC (permalink / raw)
  To: Philipp Stanner, Sumit Semwal, Boris Brezillon, Tvrtko Ursulin,
	André Draszik, Danilo Krummrich, Gary Guo, Paul E . McKenney,
	Boqun Feng
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, stable

On 6/26/26 12:04, Philipp Stanner wrote:
> The commit mentioned in the fixes tag below introduced a mechanism
> through which fence producers can fully decouple from fence consumers.
> This, desirable, mechanism is based on the fence's signaled-bit as the
> "decoupling point".
> 
> A sophisticated interaction between RCU and atomic instructions attempts
> to ensure that fence consumers can still interact with fence producers
> through the dma_fence_ops, callback pointers into the producer.
> 
> This is the desired behavior: to check for decoupling, the signaled-bit
> is first checked. If it's not yet signaled, RCU ensures that the ops
> pointer cannot yet be NULL.
> 
> Hereby, dma_fence_signal_timestamp_locked() first sets the signaled-bit,
> and then sets the ops pointer to NULL. Readers first load the ops
> pointer, and then check through the signaled-bit whether the pointer can
> legally be accessed.
> 
> These set and load operations could occur out of order on weakly ordered
> platforms. Hence, we need to enforce strict ordering all the time.

Ah! Good catch, now I've got what you mean with that.

> 
> Add the appropriate memory barriers.
> 
> Cc: stable@vger.kernel.org
> Fixes: f4cc3ab824d6 ("dma-buf: protected fence ops by RCU v8")
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> Tested with dmabuf and drm_sched unit tests.
> 
> Memory barriers are notoriously difficult, so I would appreciate if some
> of the more experienced folks can check this. Notably, I am not sure
> whether the smp_wmb() is necessary.
> 
> The documentation for test_and_set_bit() makes the mysterious statement
> "This is an atomic fully-ordered operation (implied full memory
> barrier)", but the kcsan_mb() seems to be some sort of debugging
> barrier, and in any case the docu doesn't make it obvious to me whether
> that "full barrier" comes before or after the bit setting takes place.
> 
> Moreover, in my opinion we should order dma_fence_is_signaled(), too –
> but if we agree to merge Christian's new series [1] that need should
> disappear.
> 
> 
> [1] https://lore.kernel.org/dri-devel/20260624122917.2483-1-christian.koenig@amd.com/ 
> ---
>  drivers/dma-buf/dma-fence.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index c7ea1e75d38a..2e80b01499de 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -363,6 +363,18 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return;
>  
> +	/*
> +	 * Fully order setting of the bit above with setting of the ops pointer
> +	 * to NULL below, so that all parties can use the signaled flag to
> +	 * detect that the fence decoupled from its ops in a safe manner.
> +	 *
> +	 * The counter parts of this barrier are in dma_fence_timeline_name()
> +	 * and dma_fence_driver_name(). All other future parties that rely on
> +	 * the signaled flag for valid access to the ops pointer will need a
> +	 * memory barrier.
> +	 */
> +	smp_wmb();
> +
>  	trace_dma_fence_signaled(fence);
>  
>  	/*
> @@ -1170,6 +1182,12 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>  
>  	/* RCU protection is required for safe access to returned string */
>  	ops = rcu_dereference(fence->ops);
> +	/*
> +	 * Fully order the dereference above with the flag check. Otherwise,
> +	 * ops could be dereferenced as a NULL pointer. The barrier's
> +	 * counterpart is in dma_fence_signal_timestamp_locked().
> +	 */
> +	smp_rmb();
>  	if (!dma_fence_test_signaled_flag(fence))

Instead of adding the smp_rmb() I think we should check the ops pointer here for consistency with the other cases where we call the ops functions.

>  		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
> @@ -1203,6 +1221,12 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
>  
>  	/* RCU protection is required for safe access to returned string */
>  	ops = rcu_dereference(fence->ops);
> +	/*
> +	 * Fully order the dereference above with the flag check. Otherwise,
> +	 * ops could be dereferenced as a NULL pointer. The barrier's
> +	 * counterpart is in dma_fence_signal_timestamp_locked().
> +	 */
> +	smp_rmb();
>  	if (!dma_fence_test_signaled_flag(fence))

Same of course here as well.

Thanks,
Christian.

>  		return (const char __rcu *)ops->get_driver_name(fence);
>  	else
> 
> base-commit: cdeb2ccd993ed8647adbbda2c3b103aa717fd6f7


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-26 12:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-26 10:04 [PATCH] dma-buf: dma-fence: Fix potential NULL pointer dereference Philipp Stanner
2026-06-26 10:16 ` sashiko-bot
2026-06-26 12:21 ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.