From: "Christian König" <christian.koenig@amd.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: phasta@kernel.org, alexdeucher@gmail.com, simona.vetter@ffwll.ch,
tursulin@ursulin.net, dri-devel@lists.freedesktop.org,
amd-gfx@lists.freedesktop.org
Subject: Re: Independence for dma_fences!
Date: Thu, 30 Oct 2025 11:59:01 +0100 [thread overview]
Message-ID: <6739aebe-45eb-44a5-a539-209fc3ed809b@amd.com> (raw)
In-Reply-To: <aQJ+08BFLtmsM8LQ@lstrano-desk.jf.intel.com>
On 10/29/25 21:53, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 03:06:22PM +0100, Christian König wrote:
>> On 10/17/25 10:32, Philipp Stanner wrote:
>>> On Tue, 2025-10-14 at 17:54 +0200, Christian König wrote:
>>>> On 13.10.25 16:54, Philipp Stanner wrote:
>>>>> On Mon, 2025-10-13 at 15:48 +0200, Christian König wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> dma_fences have always lived under the tyranny dictated by the module
>>>>>> lifetime of their issuer, leading to crashes if anybody was still holding
>>>>>> a reference to a dma_fence when the issuer's module was unloaded.
>>>>>>
>>>>>> But those days are over! The patch set following this mail finally
>>>>>> implements a way for issuers to release their dma_fences from this
>>>>>> slavery and let them outlive the module which originally created them.
>>>>>>
>>>>>> Previously, various approaches have been discussed, including changing
>>>>>> the locking semantics of the dma_fence callbacks (by me) as well as using
>>>>>> the drm scheduler as an intermediate layer (by Sima) to disconnect
>>>>>> dma_fences from their actual users.
>>>>>>
>>>>>> Changing the locking semantics turned out to be much trickier than
>>>>>> originally thought, because especially on older drivers (nouveau, radeon,
>>>>>> but also i915) these locking semantics are actually needed for correct
>>>>>> operation.
>>>>>>
>>>>>> Using the drm_scheduler as an intermediate layer is still a good idea and
>>>>>> should probably be implemented to make life simpler for some drivers, but
>>>>>> it doesn't work for all use cases. Especially TLB flush fences, preemption
>>>>>> fences and userqueue fences don't go through the drm scheduler because it
>>>>>> doesn't make sense for them.
>>>>>>
>>>>>> Tvrtko did some really nice prerequisite work by protecting the returned
>>>>>> strings of the dma_fence_ops by RCU. This way dma_fence creators were
>>>>>> able to just wait for an RCU grace period after fence signaling before
>>>>>> it was safe to free those data structures.
>>>>>>
>>>>>> Now this patch set goes a step further and protects the whole
>>>>>> dma_fence_ops structure by RCU, so that after the fence signals the
>>>>>> pointer to the dma_fence_ops is set to NULL when neither a wait nor a
>>>>>> release callback is given. All functionality which uses the dma_fence_ops
>>>>>> reference is put inside an RCU critical section, except for the
>>>>>> deprecated issuer-specific wait and of course the optional release
>>>>>> callback.
>>>>>>
>>>>>> In addition to the RCU changes, the lock protecting the dma_fence state
>>>>>> previously had to be allocated externally. This set now changes the
>>>>>> functionality to make that external lock optional, allowing dma_fences
>>>>>> to use an inline lock and be self-contained.
>>>>>
>>>>> Allowing for an embedded lock, is that actually necessary for the goals
>>>>> of this series, or is it an optional change / improvement?
>>>>
>>>> It is kind of necessary because otherwise you can't fully determine the lifetime of the lock.
>>>>
>>>> The lock is used to avoid signaling a dma_fence when you modify the linked list of callbacks for example.
>>>>
>>>> An alternative would be to protect the lock by RCU as well instead of embedding it in the structure, but that would make things even more complicated.
>>>>
>>>>> If I understood you correctly at XDC you wanted to have an embedded
>>>>> lock because it improves the memory footprint and because an external
>>>>> lock couldn't achieve some goals about fence-signaling-order originally
>>>>> intended. Can you elaborate on that?
>>>>
>>>> The embedded lock is also nice to have for the dma_fence_array, dma_fence_chain and drm_sched_fence, but that just saves a few cache lines in some use cases.
>>>>
>>>> The fence-signaling-order is important for drivers like radeon where the external lock is protecting multiple fences from signaling at the same time and makes sure that everything stays in order.
>
> Not to derail the conversation, but I noticed that dma-fence-arrays can,
> in fact, signal out of order. The issue lies in dma-fence-cb, which
> signals the fence using irq_queue_work. Internally, irq_queue_work uses
> llist, a LIFO structure. So, if two dma-fence-arrays have all their
> fences signaled from a thread, the IRQ work that signals each individual
> dma-fence-array will execute out of order.
>
> We should probably fix this.
No, we don't. That's what I've been trying to point out all along.
The original idea of sharing the lock was to guarantee that fences signal in order, but that never worked correctly, even for driver fences.
The background is the optimization we do in the signaling fast path, e.g. when dma_fence_is_signaled() is called.
This means that when fences A, B and C are submitted to the HW, it is perfectly possible that somebody queries the status of fence B but not A and C, and this querying of the status is faster than the interrupt which signals A and C.
So in this scenario B signals before A.
The only way to avoid that is to not implement the fast path, and as far as I know no real HW driver does that because it makes your driver horribly slow.
So off to the trash bin with the signaling order; things have worked for over 10 years without it, and as far as I know nobody has complained about it.
Regards,
Christian.
>
> Matt
>
>>>
>>> I mean, neither external nor internal lock can somehow force the driver
>>> to signal fences in order, can they?
>>
>> Nope, as I said before this approach is actually pretty useless.
>>
>>> Only the driver can ensure this.
>>
>> Only when the signaled callback is not implemented, which is the case for basically all drivers.
>>
>> So the whole point of sharing the lock just doesn't exist any more; it's just that changing it all at once, as I tried before, results in way too big a patch.
>>
>>>
>>> I am, however, considering modeling something like that on a
>>> FenceContext object:
>>>
>>> fctx.signal_all_fences_up_to_ordered(seqno);
>>
>> Yeah, I have patches for that as well. But I then found that amdgpu's TLB fences trigger that check, and I won't have time to fix it.
>>
>>
>>
>>>
>>>
>>> P.
>>>
>>>>
>>>> While it is possible to change the locking semantics on such old drivers, it's probably just better to stay away from it.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> P.
>>>>>
>>>>>
>>>>>>
>>>>>> The new approach is then applied to amdgpu allowing the module to be
>>>>>> unloaded even when dma_fences issued by it are still around.
>>>>>>
>>>>>> Please review and comment,
>>>>>> Christian.
>>>>>>
>>>>>
>>>>
>>>
>>
Thread overview: 47+ messages
2025-10-13 13:48 Independence for dma_fences! Christian König
2025-10-13 13:48 ` [PATCH 01/15] dma-buf: cleanup dma_fence_describe Christian König
2025-10-14 14:37 ` Tvrtko Ursulin
2025-10-23 3:45 ` Matthew Brost
2025-10-13 13:48 ` [PATCH 02/15] dma-buf: rework stub fence initialisation Christian König
2025-10-14 15:03 ` Tvrtko Ursulin
2025-10-24 7:29 ` Tvrtko Ursulin
2025-10-13 13:48 ` [PATCH 03/15] dma-buf: protected fence ops by RCU Christian König
2025-10-16 18:04 ` Tvrtko Ursulin
2025-10-31 10:35 ` Tvrtko Ursulin
2025-10-13 13:48 ` [PATCH 04/15] dma-buf: detach fence ops on signal Christian König
2025-10-16 8:56 ` Tvrtko Ursulin
2025-10-16 15:57 ` Tvrtko Ursulin
2025-10-23 4:23 ` Matthew Brost
2025-10-23 4:44 ` Matthew Brost
2025-10-30 13:52 ` Christian König
2025-10-31 10:31 ` Tvrtko Ursulin
2025-10-17 9:14 ` Philipp Stanner
2025-10-30 15:05 ` Christian König
2025-10-13 13:48 ` [PATCH 05/15] dma-buf: inline spinlock for fence protection Christian König
2025-10-16 9:26 ` Tvrtko Ursulin
2025-11-03 13:07 ` Philipp Stanner
2025-10-23 18:09 ` Matthew Brost
2025-10-30 15:14 ` Christian König
2025-10-13 13:48 ` [PATCH 06/15] dma-buf: use inline lock for the stub fence Christian König
2025-10-13 13:48 ` [PATCH 07/15] dma-buf: use inline lock for the dma-fence-array Christian König
2025-10-13 13:48 ` [PATCH 08/15] dma-buf: use inline lock for the dma-fence-chain Christian König
2025-10-13 13:48 ` [PATCH 09/15] drm/sched: use inline locks for the drm-sched-fence Christian König
2025-10-13 13:48 ` [PATCH 10/15] drm/amdgpu: fix KFD eviction fence enable_signaling path Christian König
2025-10-13 13:48 ` [PATCH 11/15] drm/amdgpu: independence for the amdgpu_fence! Christian König
2025-10-13 13:48 ` [PATCH 12/15] drm/amdgpu: independence for the amdgpu_eviction_fence! Christian König
2025-10-13 13:48 ` [PATCH 13/15] drm/amdgpu: independence for the amdgpu_vm_tlb_fence! Christian König
2025-10-13 13:48 ` [PATCH 14/15] drm/amdgpu: independence for the amdkfd_fence! Christian König
2025-10-17 22:22 ` Felix Kuehling
2025-10-30 15:07 ` Christian König
2025-10-30 20:04 ` Felix Kuehling
2025-10-13 13:48 ` [PATCH 15/15] drm/amdgpu: independence for the amdgpu_userq__fence! Christian König
2025-10-13 14:54 ` Independence for dma_fences! Philipp Stanner
2025-10-14 15:54 ` Christian König
2025-10-17 8:32 ` Philipp Stanner
2025-10-28 14:06 ` Christian König
2025-10-29 20:53 ` Matthew Brost
2025-10-30 10:59 ` Christian König [this message]
2025-10-31 17:44 ` Matthew Brost
2025-11-03 11:43 ` Christian König
2025-11-03 19:32 ` Matthew Brost
2025-10-15 0:51 ` Dave Airlie