public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
@ 2026-03-17 14:48 Christian König
  2026-03-17 15:21 ` Boris Brezillon
  2026-03-18 12:25 ` Alice Ryhl
  0 siblings, 2 replies; 6+ messages in thread
From: Christian König @ 2026-03-17 14:48 UTC (permalink / raw)
  To: phasta, aliceryhl, boris.brezillon, gary, lossin, daniel.almeida,
	joelagnelf, sumit.semwal
  Cc: dri-devel, linux-media, linaro-mm-sig

In case of a refcounting bug dma_fence_release() can be called before the
fence was even signaled.

Previously the dma_fence framework then force signaled the fence to make
sure to unblock waiters, but that can potentially lead to random memory
corruption when the DMA operation continues. So be more defensive here and
pick the lesser evil.

Instead of force signaling the fence set an error code on the fence,
re-initialize the refcount to something large and taint the kernel.

This will leak memory and eventually can cause a deadlock when the fence
is never signaled, but at least we won't run into an use after free or
random memory corruption.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1826ba73094c..8bf07685a053 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
 		/*
 		 * Failed to signal before release, likely a refcounting issue.
 		 *
-		 * This should never happen, but if it does make sure that we
-		 * don't leave chains dangling. We set the error flag first
-		 * so that the callbacks know this signal is due to an error.
+		 * This should never happen, but if try to be defensive and take
+		 * the lesser evil. Initialize the refcount to something large,
+		 * but not so large that it can overflow.
+		 *
+		 * That will leak memory and could deadlock if the fence never
+		 * signals, but at least it doesn't cause an use after free or
+		 * random memory corruption.
+		 *
+		 * Also taint the kernel to note that it is rather unreliable to
+		 * continue.
 		 */
 		dma_fence_lock_irqsave(fence, flags);
 		fence->error = -EDEADLK;
-		dma_fence_signal_locked(fence);
+		refcount_set(&fence->refcount.refcount, INT_MAX);
 		dma_fence_unlock_irqrestore(fence, flags);
+		rcu_read_unlock();
+		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+		return;
 	}
 
 	ops = rcu_dereference(fence->ops);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
  2026-03-17 14:48 [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release Christian König
@ 2026-03-17 15:21 ` Boris Brezillon
  2026-03-18  8:21   ` Christian König
  2026-03-18 12:25 ` Alice Ryhl
  1 sibling, 1 reply; 6+ messages in thread
From: Boris Brezillon @ 2026-03-17 15:21 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, aliceryhl, gary, lossin, daniel.almeida, joelagnelf,
	sumit.semwal, dri-devel, linux-media, linaro-mm-sig

On Tue, 17 Mar 2026 15:48:25 +0100
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> In case of a refcounting bug dma_fence_release() can be called before the
> fence was even signaled.
> 
> Previously the dma_fence framework then force signaled the fence to make
> sure to unblock waiters, but that can potentially lead to random memory
> corruption when the DMA operation continues. So be more defensive here and
> pick the lesser evil.
> 
> Instead of force signaling the fence set an error code on the fence,
> re-initialize the refcount to something large and taint the kernel.
> 
> This will leak memory and eventually can cause a deadlock when the fence
> is never signaled, but at least we won't run into an use after free or
> random memory corruption.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..8bf07685a053 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>  		/*
>  		 * Failed to signal before release, likely a refcounting issue.
>  		 *
> -		 * This should never happen, but if it does make sure that we
> -		 * don't leave chains dangling. We set the error flag first
> -		 * so that the callbacks know this signal is due to an error.
> +		 * This should never happen, but if try to be defensive and take
> +		 * the lesser evil. Initialize the refcount to something large,
> +		 * but not so large that it can overflow.
> +		 *
> +		 * That will leak memory and could deadlock if the fence never
> +		 * signals, but at least it doesn't cause an use after free or
> +		 * random memory corruption.
> +		 *
> +		 * Also taint the kernel to note that it is rather unreliable to
> +		 * continue.
>  		 */
>  		dma_fence_lock_irqsave(fence, flags);
>  		fence->error = -EDEADLK;
> -		dma_fence_signal_locked(fence);
> +		refcount_set(&fence->refcount.refcount, INT_MAX);

I'm not convinced this is useful. If we leak the object, no one should
have a ref to release anyway. This does raise a question though. The
case we're trying to protect against is fence_callback being registered
to this fence and waiting for an event to signal another proxy fence.
How can the refcnt drop to zero in that case? Isn't the proxy supposed
to own a ref on the fence. Before we go further, I'd like to understand
what we're trying to do.

The original discussion that led you to write this patch was about
detecting when a fence emitter/producer would leave unsignalled fences
behind, and the problem we have is when such unsignalled fences have
observers waiting for a "signalled" event. If the refcnt drops to zero
and the fence is released, we're already passed that point,
unfortunately. It can be that:

- the fence was never exposed -> this is fine
- the fence was exposed but never observed -> this is broken, because if
  it had been observed it would have led to a deadlock
- the fence was exposed, observed for some time, but the observer got
  bored, stopped waiting and:
  * decided to go and execute its stuff anyway -> use-before-ready
    situation
  * gave up -> kinda okay, but we should still consider the fence
    emitter broken
- the fence observer registered a callback but didn't take a ref on the
  object -> this is potential UAF on the dma_fence, which can also lead
  to a VRAM/system-mem UAF if the emitter drops the dma_fence without
  signalling, because of the auto-signal you're getting rid of in this
  patch.  But the latter is just a side effect of the dma_fence UAF,
  which I'm not convinced we should try to protect against.

>  		dma_fence_unlock_irqrestore(fence, flags);
> +		rcu_read_unlock();
> +		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +		return;
>  	}
>  
>  	ops = rcu_dereference(fence->ops);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
  2026-03-17 15:21 ` Boris Brezillon
@ 2026-03-18  8:21   ` Christian König
  2026-03-18  9:18     ` Boris Brezillon
  0 siblings, 1 reply; 6+ messages in thread
From: Christian König @ 2026-03-18  8:21 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: phasta, aliceryhl, gary, lossin, daniel.almeida, joelagnelf,
	sumit.semwal, dri-devel, linux-media, linaro-mm-sig

On 3/17/26 16:21, Boris Brezillon wrote:
> On Tue, 17 Mar 2026 15:48:25 +0100
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> 
>> In case of a refcounting bug dma_fence_release() can be called before the
>> fence was even signaled.
>>
>> Previously the dma_fence framework then force signaled the fence to make
>> sure to unblock waiters, but that can potentially lead to random memory
>> corruption when the DMA operation continues. So be more defensive here and
>> pick the lesser evil.
>>
>> Instead of force signaling the fence set an error code on the fence,
>> re-initialize the refcount to something large and taint the kernel.
>>
>> This will leak memory and eventually can cause a deadlock when the fence
>> is never signaled, but at least we won't run into an use after free or
>> random memory corruption.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 1826ba73094c..8bf07685a053 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>>  		/*
>>  		 * Failed to signal before release, likely a refcounting issue.
>>  		 *
>> -		 * This should never happen, but if it does make sure that we
>> -		 * don't leave chains dangling. We set the error flag first
>> -		 * so that the callbacks know this signal is due to an error.
>> +		 * This should never happen, but if try to be defensive and take
>> +		 * the lesser evil. Initialize the refcount to something large,
>> +		 * but not so large that it can overflow.
>> +		 *
>> +		 * That will leak memory and could deadlock if the fence never
>> +		 * signals, but at least it doesn't cause an use after free or
>> +		 * random memory corruption.
>> +		 *
>> +		 * Also taint the kernel to note that it is rather unreliable to
>> +		 * continue.
>>  		 */
>>  		dma_fence_lock_irqsave(fence, flags);
>>  		fence->error = -EDEADLK;
>> -		dma_fence_signal_locked(fence);
>> +		refcount_set(&fence->refcount.refcount, INT_MAX);
> 
> I'm not convinced this is useful. If we leak the object, no one should
> have a ref to release anyway. This does raise a question though. The
> case we're trying to protect against is fence_callback being registered
> to this fence and waiting for an event to signal another proxy fence.

Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held.

So it can be that a simple memory allocation cycles back and depends on the fence to signal.

> How can the refcnt drop to zero in that case? Isn't the proxy supposed
> to own a ref on the fence. Before we go further, I'd like to understand
> what we're trying to do.

Well we are in C here, so its simply coding errors. An unecessary dma_fence_put() in an error path is enough to trigger this.

> The original discussion that led you to write this patch was about
> detecting when a fence emitter/producer would leave unsignalled fences
> behind, and the problem we have is when such unsignalled fences have
> observers waiting for a "signalled" event. If the refcnt drops to zero
> and the fence is released, we're already passed that point,
> unfortunately.

Well that is not quite correct.

The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it.

See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed.

So when a DMA fence signals to early it means that the HW is still writing to that memory but we already potentially re-using the memory ending in random memory corruption.

UAF issues are harmless compared to that.

Regards,
Christian.

> It can be that:
> 
> - the fence was never exposed -> this is fine
> - the fence was exposed but never observed -> this is broken, because if
>   it had been observed it would have led to a deadlock
> - the fence was exposed, observed for some time, but the observer got
>   bored, stopped waiting and:
>   * decided to go and execute its stuff anyway -> use-before-ready
>     situation
>   * gave up -> kinda okay, but we should still consider the fence
>     emitter broken
> - the fence observer registered a callback but didn't take a ref on the
>   object -> this is potential UAF on the dma_fence, which can also lead
>   to a VRAM/system-mem UAF if the emitter drops the dma_fence without
>   signalling, because of the auto-signal you're getting rid of in this
>   patch.  But the latter is just a side effect of the dma_fence UAF,
>   which I'm not convinced we should try to protect against.
> 
>>  		dma_fence_unlock_irqrestore(fence, flags);
>> +		rcu_read_unlock();
>> +		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>> +		return;
>>  	}
>>  
>>  	ops = rcu_dereference(fence->ops);
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
  2026-03-18  8:21   ` Christian König
@ 2026-03-18  9:18     ` Boris Brezillon
  2026-03-18  9:50       ` Christian König
  0 siblings, 1 reply; 6+ messages in thread
From: Boris Brezillon @ 2026-03-18  9:18 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, aliceryhl, gary, lossin, daniel.almeida, joelagnelf,
	sumit.semwal, dri-devel, linux-media, linaro-mm-sig

Hi Christian,

On Wed, 18 Mar 2026 09:21:34 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 3/17/26 16:21, Boris Brezillon wrote:
> > On Tue, 17 Mar 2026 15:48:25 +0100
> > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> >   
> >> In case of a refcounting bug dma_fence_release() can be called
> >> before the fence was even signaled.
> >>
> >> Previously the dma_fence framework then force signaled the fence
> >> to make sure to unblock waiters, but that can potentially lead to
> >> random memory corruption when the DMA operation continues. So be
> >> more defensive here and pick the lesser evil.
> >>
> >> Instead of force signaling the fence set an error code on the
> >> fence, re-initialize the refcount to something large and taint the
> >> kernel.
> >>
> >> This will leak memory and eventually can cause a deadlock when the
> >> fence is never signaled, but at least we won't run into an use
> >> after free or random memory corruption.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> >> ---
> >>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/dma-fence.c
> >> b/drivers/dma-buf/dma-fence.c index 1826ba73094c..8bf07685a053
> >> 100644 --- a/drivers/dma-buf/dma-fence.c
> >> +++ b/drivers/dma-buf/dma-fence.c
> >> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
> >>  		/*
> >>  		 * Failed to signal before release, likely a
> >> refcounting issue. *
> >> -		 * This should never happen, but if it does make
> >> sure that we
> >> -		 * don't leave chains dangling. We set the error
> >> flag first
> >> -		 * so that the callbacks know this signal is due
> >> to an error.
> >> +		 * This should never happen, but if try to be
> >> defensive and take
> >> +		 * the lesser evil. Initialize the refcount to
> >> something large,
> >> +		 * but not so large that it can overflow.
> >> +		 *
> >> +		 * That will leak memory and could deadlock if
> >> the fence never
> >> +		 * signals, but at least it doesn't cause an use
> >> after free or
> >> +		 * random memory corruption.
> >> +		 *
> >> +		 * Also taint the kernel to note that it is
> >> rather unreliable to
> >> +		 * continue.
> >>  		 */
> >>  		dma_fence_lock_irqsave(fence, flags);
> >>  		fence->error = -EDEADLK;
> >> -		dma_fence_signal_locked(fence);
> >> +		refcount_set(&fence->refcount.refcount, INT_MAX);
> >>  
> > 
> > I'm not convinced this is useful. If we leak the object, no one
> > should have a ref to release anyway. This does raise a question
> > though. The case we're trying to protect against is fence_callback
> > being registered to this fence and waiting for an event to signal
> > another proxy fence.  
> 
> Not quite. The real problematic case is that it is necessary to wait
> for a fence to signal with tons of memory management locks held.
> 
> So it can be that a simple memory allocation cycles back and depends
> on the fence to signal.
> 
> > How can the refcnt drop to zero in that case? Isn't the proxy
> > supposed to own a ref on the fence. Before we go further, I'd like
> > to understand what we're trying to do.  
> 
> Well we are in C here, so its simply coding errors. An unecessary
> dma_fence_put() in an error path is enough to trigger this.
> 
> > The original discussion that led you to write this patch was about
> > detecting when a fence emitter/producer would leave unsignalled
> > fences behind, and the problem we have is when such unsignalled
> > fences have observers waiting for a "signalled" event. If the
> > refcnt drops to zero and the fence is released, we're already
> > passed that point, unfortunately.  
> 
> Well that is not quite correct.
> 
> The most common problem is that we have unbalanced
> dma_fence_get()/dma_fence_put() and we end up in dma_fence_release()
> before the issuer of the dma_fence has a chance to signal it.

Okay, so that's clearly not solving the problem we were discussing on
[1], I thought it was related. Also, I'm still skeptical that we should
try and harden security for a situation that's already covered by
refcount overflow detection. I get why you want to do that, but it
feels like the wrong tool to me. I mean, we wouldn't even see it as
an unbalanced dma_fence_get/put() now that you manually set the refcount
to INT_MAX, which is the bug you're trying to cover for in the first
place.

> 
> See the main purpose of DMA fences is to prevent releasing memory
> back into the core memory management before the DMA operation is
> completed.

That's a UAF, just a differnt kind (device UAF instead of CPU UAF).
Anyway, my point remains, the root of the issue you're covering for is
a dma_fence UAF (more put()s than get()s, and the CPU still has a ref
on a released dma_fence object). The outcome of this might be device
UAF because of the auto-signalling, but that's still just another
symptom of the dma_fence UAF (with wider consequences, admittedly).

> 
> So when a DMA fence signals to early it means that the HW is still
> writing to that memory but we already potentially re-using the memory
> ending in random memory corruption.

Yep, I'm well aware of that.

> 
> UAF issues are harmless compared to that.

That's not what I'm arguing against. What I'm saying is that you just
paper over an issue by messing up with the refcount, and now it's hard
to tell what the root cause is. 

Regards,

Boris

[1]https://yhbt.net/lore/all/8bac1559-e139-4a74-a6e8-c2846093db72@amd.com/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
  2026-03-18  9:18     ` Boris Brezillon
@ 2026-03-18  9:50       ` Christian König
  0 siblings, 0 replies; 6+ messages in thread
From: Christian König @ 2026-03-18  9:50 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: phasta, aliceryhl, gary, lossin, daniel.almeida, joelagnelf,
	sumit.semwal, dri-devel, linux-media, linaro-mm-sig

Hi Boris,

On 3/18/26 10:18, Boris Brezillon wrote:
> Hi Christian,
> 
> On Wed, 18 Mar 2026 09:21:34 +0100
> Christian König <christian.koenig@amd.com> wrote:
> 
>> On 3/17/26 16:21, Boris Brezillon wrote:
>>> On Tue, 17 Mar 2026 15:48:25 +0100
>>> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>>>   
>>>> In case of a refcounting bug dma_fence_release() can be called
>>>> before the fence was even signaled.
>>>>
>>>> Previously the dma_fence framework then force signaled the fence
>>>> to make sure to unblock waiters, but that can potentially lead to
>>>> random memory corruption when the DMA operation continues. So be
>>>> more defensive here and pick the lesser evil.
>>>>
>>>> Instead of force signaling the fence set an error code on the
>>>> fence, re-initialize the refcount to something large and taint the
>>>> kernel.
>>>>
>>>> This will leak memory and eventually can cause a deadlock when the
>>>> fence is never signaled, but at least we won't run into an use
>>>> after free or random memory corruption.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>>>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-fence.c
>>>> b/drivers/dma-buf/dma-fence.c index 1826ba73094c..8bf07685a053
>>>> 100644 --- a/drivers/dma-buf/dma-fence.c
>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>>>>  		/*
>>>>  		 * Failed to signal before release, likely a
>>>> refcounting issue. *
>>>> -		 * This should never happen, but if it does make
>>>> sure that we
>>>> -		 * don't leave chains dangling. We set the error
>>>> flag first
>>>> -		 * so that the callbacks know this signal is due
>>>> to an error.
>>>> +		 * This should never happen, but if try to be
>>>> defensive and take
>>>> +		 * the lesser evil. Initialize the refcount to
>>>> something large,
>>>> +		 * but not so large that it can overflow.
>>>> +		 *
>>>> +		 * That will leak memory and could deadlock if
>>>> the fence never
>>>> +		 * signals, but at least it doesn't cause an use
>>>> after free or
>>>> +		 * random memory corruption.
>>>> +		 *
>>>> +		 * Also taint the kernel to note that it is
>>>> rather unreliable to
>>>> +		 * continue.
>>>>  		 */
>>>>  		dma_fence_lock_irqsave(fence, flags);
>>>>  		fence->error = -EDEADLK;
>>>> -		dma_fence_signal_locked(fence);
>>>> +		refcount_set(&fence->refcount.refcount, INT_MAX);
>>>>  
>>>
>>> I'm not convinced this is useful. If we leak the object, no one
>>> should have a ref to release anyway. This does raise a question
>>> though. The case we're trying to protect against is fence_callback
>>> being registered to this fence and waiting for an event to signal
>>> another proxy fence.  
>>
>> Not quite. The real problematic case is that it is necessary to wait
>> for a fence to signal with tons of memory management locks held.
>>
>> So it can be that a simple memory allocation cycles back and depends
>> on the fence to signal.
>>
>>> How can the refcnt drop to zero in that case? Isn't the proxy
>>> supposed to own a ref on the fence. Before we go further, I'd like
>>> to understand what we're trying to do.  
>>
>> Well we are in C here, so its simply coding errors. An unecessary
>> dma_fence_put() in an error path is enough to trigger this.
>>
>>> The original discussion that led you to write this patch was about
>>> detecting when a fence emitter/producer would leave unsignalled
>>> fences behind, and the problem we have is when such unsignalled
>>> fences have observers waiting for a "signalled" event. If the
>>> refcnt drops to zero and the fence is released, we're already
>>> passed that point, unfortunately.  
>>
>> Well that is not quite correct.
>>
>> The most common problem is that we have unbalanced
>> dma_fence_get()/dma_fence_put() and we end up in dma_fence_release()
>> before the issuer of the dma_fence has a chance to signal it.
> 
> Okay, so that's clearly not solving the problem we were discussing on
> [1], I thought it was related.

Yeah, correct. The situation on the Rust side is clearly different, you simply doesn't have incorrect refcounting issues there.

> Also, I'm still skeptical that we should
> try and harden security for a situation that's already covered by
> refcount overflow detection.

Refcount overflow detection is unfortunately not enabled everywhere and even if it is enabled it doesn't protect against such issues here, it only points them out when it is already to late.

> I get why you want to do that, but it
> feels like the wrong tool to me. I mean, we wouldn't even see it as
> an unbalanced dma_fence_get/put() now that you manually set the refcount
> to INT_MAX, which is the bug you're trying to cover for in the first
> place.
> 
>>
>> See the main purpose of DMA fences is to prevent releasing memory
>> back into the core memory management before the DMA operation is
>> completed.
> 
> That's a UAF, just a differnt kind (device UAF instead of CPU UAF).

Yeah agree completely.

The problem is that SW UAF issues are preventable by using something like Rust while HW UAF issues can only be found by an IOMMU and that in turn is disabled more often than not.

Especially GPUs and accelerators usually use pass through mode for IOMMU because of both HW bugs as well as performance overhead.

> Anyway, my point remains, the root of the issue you're covering for is
> a dma_fence UAF (more put()s than get()s, and the CPU still has a ref
> on a released dma_fence object). The outcome of this might be device
> UAF because of the auto-signalling, but that's still just another
> symptom of the dma_fence UAF (with wider consequences, admittedly).
> 
>>
>> So when a DMA fence signals to early it means that the HW is still
>> writing to that memory but we already potentially re-using the memory
>> ending in random memory corruption.
> 
> Yep, I'm well aware of that.
> 
>>
>> UAF issues are harmless compared to that.
> 
> That's not what I'm arguing against. What I'm saying is that you just
> paper over an issue by messing up with the refcount, and now it's hard
> to tell what the root cause is. 

Completely agree as well. It's not a real solution, but only the lesser evil.

Regards,
Christian.

> 
> Regards,
> 
> Boris
> 
> [1]https://yhbt.net/lore/all/8bac1559-e139-4a74-a6e8-c2846093db72@amd.com/


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
  2026-03-17 14:48 [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release Christian König
  2026-03-17 15:21 ` Boris Brezillon
@ 2026-03-18 12:25 ` Alice Ryhl
  1 sibling, 0 replies; 6+ messages in thread
From: Alice Ryhl @ 2026-03-18 12:25 UTC (permalink / raw)
  To: Christian König
  Cc: phasta, boris.brezillon, gary, lossin, daniel.almeida, joelagnelf,
	sumit.semwal, dri-devel, linux-media, linaro-mm-sig

On Tue, Mar 17, 2026 at 03:48:25PM +0100, Christian König wrote:
> In case of a refcounting bug dma_fence_release() can be called before the
> fence was even signaled.
> 
> Previously the dma_fence framework then force signaled the fence to make
> sure to unblock waiters, but that can potentially lead to random memory
> corruption when the DMA operation continues. So be more defensive here and
> pick the lesser evil.
> 
> Instead of force signaling the fence set an error code on the fence,
> re-initialize the refcount to something large and taint the kernel.
> 
> This will leak memory and eventually can cause a deadlock when the fence
> is never signaled, but at least we won't run into an use after free or
> random memory corruption.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..8bf07685a053 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>  		/*
>  		 * Failed to signal before release, likely a refcounting issue.
>  		 *
> -		 * This should never happen, but if it does make sure that we
> -		 * don't leave chains dangling. We set the error flag first
> -		 * so that the callbacks know this signal is due to an error.
> +		 * This should never happen, but if try to be defensive and take
> +		 * the lesser evil. Initialize the refcount to something large,
> +		 * but not so large that it can overflow.
> +		 *
> +		 * That will leak memory and could deadlock if the fence never
> +		 * signals, but at least it doesn't cause an use after free or
> +		 * random memory corruption.
> +		 *
> +		 * Also taint the kernel to note that it is rather unreliable to
> +		 * continue.
>  		 */
>  		dma_fence_lock_irqsave(fence, flags);
>  		fence->error = -EDEADLK;
> -		dma_fence_signal_locked(fence);
> +		refcount_set(&fence->refcount.refcount, INT_MAX);

It's much better to leave the refcount with a value of zero here. That
way, when the refcount is decremented next time, the usual underflow
detection checks will trigger.

You can still skip the kfree() to avoid use-after-free.

Alice

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-18 12:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-17 14:48 [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release Christian König
2026-03-17 15:21 ` Boris Brezillon
2026-03-18  8:21   ` Christian König
2026-03-18  9:18     ` Boris Brezillon
2026-03-18  9:50       ` Christian König
2026-03-18 12:25 ` Alice Ryhl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox