public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: phasta@mailbox.org, aliceryhl@google.com, gary@garyguo.net,
	lossin@kernel.org, daniel.almeida@collabora.com,
	joelagnelf@nvidia.com, sumit.semwal@linaro.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
Date: Wed, 18 Mar 2026 09:21:34 +0100	[thread overview]
Message-ID: <3e7ce7bb-75ab-4f73-8a5d-dabd5d56cd61@amd.com> (raw)
In-Reply-To: <20260317162147.4a7f03ff@fedora>

On 3/17/26 16:21, Boris Brezillon wrote:
> On Tue, 17 Mar 2026 15:48:25 +0100
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> 
>> In case of a refcounting bug dma_fence_release() can be called before the
>> fence was even signaled.
>>
>> Previously the dma_fence framework then force signaled the fence to make
>> sure to unblock waiters, but that can potentially lead to random memory
>> corruption when the DMA operation continues. So be more defensive here and
>> pick the lesser evil.
>>
>> Instead of force signaling the fence set an error code on the fence,
>> re-initialize the refcount to something large and taint the kernel.
>>
>> This will leak memory and eventually can cause a deadlock when the fence
>> is never signaled, but at least we won't run into an use after free or
>> random memory corruption.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 1826ba73094c..8bf07685a053 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>>  		/*
>>  		 * Failed to signal before release, likely a refcounting issue.
>>  		 *
>> -		 * This should never happen, but if it does make sure that we
>> -		 * don't leave chains dangling. We set the error flag first
>> -		 * so that the callbacks know this signal is due to an error.
>> +		 * This should never happen, but if try to be defensive and take
>> +		 * the lesser evil. Initialize the refcount to something large,
>> +		 * but not so large that it can overflow.
>> +		 *
>> +		 * That will leak memory and could deadlock if the fence never
>> +		 * signals, but at least it doesn't cause an use after free or
>> +		 * random memory corruption.
>> +		 *
>> +		 * Also taint the kernel to note that it is rather unreliable to
>> +		 * continue.
>>  		 */
>>  		dma_fence_lock_irqsave(fence, flags);
>>  		fence->error = -EDEADLK;
>> -		dma_fence_signal_locked(fence);
>> +		refcount_set(&fence->refcount.refcount, INT_MAX);
> 
> I'm not convinced this is useful. If we leak the object, no one should
> have a ref to release anyway. This does raise a question though. The
> case we're trying to protect against is fence_callback being registered
> to this fence and waiting for an event to signal another proxy fence.

Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held.

So it can be that a simple memory allocation cycles back and depends on the fence to signal.

> How can the refcnt drop to zero in that case? Isn't the proxy supposed
> to own a ref on the fence. Before we go further, I'd like to understand
> what we're trying to do.

Well we are in C here, so its simply coding errors. An unecessary dma_fence_put() in an error path is enough to trigger this.

> The original discussion that led you to write this patch was about
> detecting when a fence emitter/producer would leave unsignalled fences
> behind, and the problem we have is when such unsignalled fences have
> observers waiting for a "signalled" event. If the refcnt drops to zero
> and the fence is released, we're already passed that point,
> unfortunately.

Well that is not quite correct.

The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it.

See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed.

So when a DMA fence signals to early it means that the HW is still writing to that memory but we already potentially re-using the memory ending in random memory corruption.

UAF issues are harmless compared to that.

Regards,
Christian.

> It can be that:
> 
> - the fence was never exposed -> this is fine
> - the fence was exposed but never observed -> this is broken, because if
>   it had been observed it would have led to a deadlock
> - the fence was exposed, observed for some time, but the observer got
>   bored, stopped waiting and:
>   * decided to go and execute its stuff anyway -> use-before-ready
>     situation
>   * gave up -> kinda okay, but we should still consider the fence
>     emitter broken
> - the fence observer registered a callback but didn't take a ref on the
>   object -> this is potential UAF on the dma_fence, which can also lead
>   to a VRAM/system-mem UAF if the emitter drops the dma_fence without
>   signalling, because of the auto-signal you're getting rid of in this
>   patch.  But the latter is just a side effect of the dma_fence UAF,
>   which I'm not convinced we should try to protect against.
> 
>>  		dma_fence_unlock_irqrestore(fence, flags);
>> +		rcu_read_unlock();
>> +		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>> +		return;
>>  	}
>>  
>>  	ops = rcu_dereference(fence->ops);
> 


  reply	other threads:[~2026-03-18  8:21 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 14:48 [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release Christian König
2026-03-17 15:21 ` Boris Brezillon
2026-03-18  8:21   ` Christian König [this message]
2026-03-18  9:18     ` Boris Brezillon
2026-03-18  9:50       ` Christian König
2026-03-18 12:25 ` Alice Ryhl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e7ce7bb-75ab-4f73-8a5d-dabd5d56cd61@amd.com \
    --to=christian.koenig@amd.com \
    --cc=aliceryhl@google.com \
    --cc=boris.brezillon@collabora.com \
    --cc=daniel.almeida@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gary@garyguo.net \
    --cc=joelagnelf@nvidia.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-media@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=phasta@mailbox.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox