From: "Christian König" <christian.koenig@amd.com>
To: phasta@kernel.org, alexdeucher@gmail.com, simona.vetter@ffwll.ch,
tursulin@ursulin.net, matthew.brost@intel.com,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
linaro-mm-sig@lists.linaro.org, sumit.semwal@linaro.org
Subject: Re: [PATCH 13/18] drm/amdgpu: independence for the amdkfd_fence! v2
Date: Fri, 28 Nov 2025 11:06:18 +0100
Message-ID: <30c8a395-6870-4787-a954-6c9cbc68be62@amd.com>
In-Reply-To: <3cf92ff5fa9c9c73c8464434b0e8e13e402091fd.camel@mailbox.org>
On 11/27/25 12:10, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> This should allow amdkfd_fences to outlive the amdgpu module.
>>
>> v2: implement Felix suggestion to lock the fence while signaling it.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 6 +++
>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 39 ++++++++-----------
>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++--
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +-
>> 4 files changed, 27 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 8bdfcde2029b..6254cef04213 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -196,6 +196,7 @@ int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
>> #endif
>> #if IS_ENABLED(CONFIG_HSA_AMD)
>> bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
>> +void amdkfd_fence_signal(struct dma_fence *f);
>> struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
>> void amdgpu_amdkfd_remove_all_eviction_fences(struct amdgpu_bo *bo);
>> int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
>> @@ -210,6 +211,11 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>> return false;
>> }
>>
>> +static inline
>> +void amdkfd_fence_signal(struct dma_fence *f)
>> +{
>
> I would add a short comment here: "Empty function because …"
>
>> +}
>> +
>> static inline
>> struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>> {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> index 09c919f72b6c..f76c3c52a2a1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> @@ -127,29 +127,9 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
>> if (!svm_range_schedule_evict_svm_bo(fence))
>> return true;
>> }
>> - return false;
>> -}
>> -
>> -/**
>> - * amdkfd_fence_release - callback that fence can be freed
>> - *
>> - * @f: dma_fence
>> - *
>> - * This function is called when the reference count becomes zero.
>> - * Drops the mm_struct reference and RCU schedules freeing up the fence.
>> - */
>> -static void amdkfd_fence_release(struct dma_fence *f)
>> -{
>> - struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> -
>> - /* Unconditionally signal the fence. The process is getting
>> - * terminated.
>> - */
>> - if (WARN_ON(!fence))
>> - return; /* Not an amdgpu_amdkfd_fence */
>> -
>> mmdrop(fence->mm);
>> - kfree_rcu(f, rcu);
>> + fence->mm = NULL;
>
> That the storage actually takes place is guaranteed by the lock taken
> when calling the fence ops?
>
>> + return false;
>> }
>>
>> /**
>> @@ -174,9 +154,22 @@ bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>> return false;
>> }
>>
>> +void amdkfd_fence_signal(struct dma_fence *f)
>> +{
>> + struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> + long flags;
>> +
>> + dma_fence_lock_irqsafe(f, flags)
>> + if (fence->mm) {
>> + mmdrop(fence->mm);
>> + fence->mm = NULL;
>> + }
>> + dma_fence_signal_locked(f);
>> + dma_fence_unlock_irqrestore(f, flags)
>> +}
>> +
>> static const struct dma_fence_ops amdkfd_fence_ops = {
>> .get_driver_name = amdkfd_fence_get_driver_name,
>> .get_timeline_name = amdkfd_fence_get_timeline_name,
>> .enable_signaling = amdkfd_fence_enable_signaling,
>> - .release = amdkfd_fence_release,
>> };
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index a085faac9fe1..8fac70b839ed 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -1173,7 +1173,7 @@ static void kfd_process_wq_release(struct work_struct *work)
>> synchronize_rcu();
>> ef = rcu_access_pointer(p->ef);
>> if (ef)
>> - dma_fence_signal(ef);
>> + amdkfd_fence_signal(ef);
>>
>> kfd_process_remove_sysfs(p);
>> kfd_debugfs_remove_process(p);
>> @@ -1990,7 +1990,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>> static int signal_eviction_fence(struct kfd_process *p)
>> {
>> struct dma_fence *ef;
>> - int ret;
>>
>> rcu_read_lock();
>> ef = dma_fence_get_rcu_safe(&p->ef);
>> @@ -1998,10 +1997,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>> if (!ef)
>> return -EINVAL;
>>
>> - ret = dma_fence_signal(ef);
>> + amdkfd_fence_signal(ef);
>> dma_fence_put(ef);
>>
>> - return ret;
>> + return 0;
>
> Oh wait, that's the code I'm also touching in my return code series!
>
> https://lore.kernel.org/dri-devel/cef83fed-5994-4c77-962c-9c7aac9f7306@amd.com/
>
>
> Does this series then solve the problem Felix pointed out in
> evict_process_worker()?

No, it doesn't. I wasn't aware that the higher-level code actually needs the status. After all, Felix is the maintainer of this part.

This patch needs to be rebased on top of yours and changed so that it still returns the fence status correctly.

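As a rough illustration of what "still returns the fence status" could mean, here is a minimal userspace sketch. It is not the rebased patch and not kernel code: `struct mock_fence`, `mock_fence_init()` and `mock_fence_signal()` are hypothetical names, the C11 `atomic_flag` only stands in for the fence spinlock, and the integer counter stands in for the mm_struct reference. It assumes the convention dma_fence_signal() had when this patch was posted (0 on the first signal, -EINVAL if the fence was already signaled); the return code series may well change that.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in, NOT the kernel's struct dma_fence. */
struct mock_fence {
	atomic_flag lock;	/* models the fence spinlock */
	bool signaled;		/* models the "signaled" flag bit */
	int *mm_refcount;	/* models fence->mm; NULL once dropped */
};

static void mock_fence_init(struct mock_fence *f, int *mm_refcount)
{
	atomic_flag_clear(&f->lock);
	f->signaled = false;
	f->mm_refcount = mm_refcount;
}

/*
 * Drop the mm reference and signal the fence under one lock, then
 * hand the signaling status back to the caller.
 *
 * Assumed convention: 0 on the first signal, -EINVAL if the fence
 * had already been signaled.
 */
static int mock_fence_signal(struct mock_fence *f)
{
	int ret;

	/* Spin until the lock is acquired. */
	while (atomic_flag_test_and_set(&f->lock))
		;

	/* Drop the reference at most once. */
	if (f->mm_refcount) {
		(*f->mm_refcount)--;
		f->mm_refcount = NULL;
	}

	ret = f->signaled ? -EINVAL : 0;
	f->signaled = true;

	atomic_flag_clear(&f->lock);
	return ret;
}
```

With a helper shaped like this, a caller such as signal_eviction_fence() could keep propagating the value instead of unconditionally returning 0.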
But thanks for pointing that out.
Regards,
Christian.
>
>
> P.
>
>
>> }
>>
>> static void evict_process_worker(struct work_struct *work)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index c30dfb8ec236..566950702b7d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -428,7 +428,7 @@ static void svm_range_bo_release(struct kref *kref)
>>
>> if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
>> /* We're not in the eviction worker. Signal the fence. */
>> - dma_fence_signal(&svm_bo->eviction_fence->base);
>> + amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>> dma_fence_put(&svm_bo->eviction_fence->base);
>> amdgpu_bo_unref(&svm_bo->bo);
>> kfree(svm_bo);
>> @@ -3628,7 +3628,7 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>> mmap_read_unlock(mm);
>> mmput(mm);
>>
>> - dma_fence_signal(&svm_bo->eviction_fence->base);
>> + amdkfd_fence_signal(&svm_bo->eviction_fence->base);
>>
>> /* This is the last reference to svm_bo, after svm_range_vram_node_free
>> * has been called in svm_migrate_vram_to_ram
>