From: "Christian König" <christian.koenig@amd.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: phasta@mailbox.org, aliceryhl@google.com, gary@garyguo.net,
lossin@kernel.org, daniel.almeida@collabora.com,
joelagnelf@nvidia.com, sumit.semwal@linaro.org,
dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release
Date: Wed, 18 Mar 2026 09:21:34 +0100 [thread overview]
Message-ID: <3e7ce7bb-75ab-4f73-8a5d-dabd5d56cd61@amd.com> (raw)
In-Reply-To: <20260317162147.4a7f03ff@fedora>
On 3/17/26 16:21, Boris Brezillon wrote:
> On Tue, 17 Mar 2026 15:48:25 +0100
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> In case of a refcounting bug dma_fence_release() can be called before the
>> fence was even signaled.
>>
>> Previously the dma_fence framework then force signaled the fence to make
>> sure to unblock waiters, but that can potentially lead to random memory
>> corruption when the DMA operation continues. So be more defensive here and
>> pick the lesser evil.
>>
>> Instead of force signaling the fence set an error code on the fence,
>> re-initialize the refcount to something large and taint the kernel.
>>
>> This will leak memory and eventually can cause a deadlock when the fence
>> is never signaled, but at least we won't run into an use after free or
>> random memory corruption.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 1826ba73094c..8bf07685a053 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>> /*
>> * Failed to signal before release, likely a refcounting issue.
>> *
>> - * This should never happen, but if it does make sure that we
>> - * don't leave chains dangling. We set the error flag first
>> - * so that the callbacks know this signal is due to an error.
>> + * This should never happen, but if try to be defensive and take
>> + * the lesser evil. Initialize the refcount to something large,
>> + * but not so large that it can overflow.
>> + *
>> + * That will leak memory and could deadlock if the fence never
>> + * signals, but at least it doesn't cause an use after free or
>> + * random memory corruption.
>> + *
>> + * Also taint the kernel to note that it is rather unreliable to
>> + * continue.
>> */
>> dma_fence_lock_irqsave(fence, flags);
>> fence->error = -EDEADLK;
>> - dma_fence_signal_locked(fence);
>> + refcount_set(&fence->refcount.refcount, INT_MAX);
>
> I'm not convinced this is useful. If we leak the object, no one should
> have a ref to release anyway. This does raise a question though. The
> case we're trying to protect against is fence_callback being registered
> to this fence and waiting for an event to signal another proxy fence.
Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held.
So it can be that a simple memory allocation cycles back and depends on the fence to signal.
> How can the refcnt drop to zero in that case? Isn't the proxy supposed
> to own a ref on the fence. Before we go further, I'd like to understand
> what we're trying to do.
Well we are in C here, so its simply coding errors. An unecessary dma_fence_put() in an error path is enough to trigger this.
> The original discussion that led you to write this patch was about
> detecting when a fence emitter/producer would leave unsignalled fences
> behind, and the problem we have is when such unsignalled fences have
> observers waiting for a "signalled" event. If the refcnt drops to zero
> and the fence is released, we're already passed that point,
> unfortunately.
Well that is not quite correct.
The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it.
See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed.
So when a DMA fence signals to early it means that the HW is still writing to that memory but we already potentially re-using the memory ending in random memory corruption.
UAF issues are harmless compared to that.
Regards,
Christian.
> It can be that:
>
> - the fence was never exposed -> this is fine
> - the fence was exposed but never observed -> this is broken, because if
> it had been observed it would have led to a deadlock
> - the fence was exposed, observed for some time, but the observer got
> bored, stopped waiting and:
> * decided to go and execute its stuff anyway -> use-before-ready
> situation
> * gave up -> kinda okay, but we should still consider the fence
> emitter broken
> - the fence observer registered a callback but didn't take a ref on the
> object -> this is potential UAF on the dma_fence, which can also lead
> to a VRAM/system-mem UAF if the emitter drops the dma_fence without
> signalling, because of the auto-signal you're getting rid of in this
> patch. But the latter is just a side effect of the dma_fence UAF,
> which I'm not convinced we should try to protect against.
>
>> dma_fence_unlock_irqrestore(fence, flags);
>> + rcu_read_unlock();
>> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>> + return;
>> }
>>
>> ops = rcu_dereference(fence->ops);
>
next prev parent reply other threads:[~2026-03-18 8:21 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 14:48 [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release Christian König
2026-03-17 15:21 ` Boris Brezillon
2026-03-18 8:21 ` Christian König [this message]
2026-03-18 9:18 ` Boris Brezillon
2026-03-18 9:50 ` Christian König
2026-03-18 12:25 ` Alice Ryhl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3e7ce7bb-75ab-4f73-8a5d-dabd5d56cd61@amd.com \
--to=christian.koenig@amd.com \
--cc=aliceryhl@google.com \
--cc=boris.brezillon@collabora.com \
--cc=daniel.almeida@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=gary@garyguo.net \
--cc=joelagnelf@nvidia.com \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-media@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=phasta@mailbox.org \
--cc=sumit.semwal@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox