Re: [PATCH 03/11] drm/amdgpu: nuke amdgpu_userq_fence_free

public inbox for amd-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	alexander.deucher@amd.com, Prike.Liang@amd.com,
	amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 03/11] drm/amdgpu: nuke amdgpu_userq_fence_free
Date: Wed, 22 Apr 2026 20:02:32 +0530	[thread overview]
Message-ID: <6aa57b50-52e0-4a53-bd68-c6fb8d484dce@amd.com> (raw)
In-Reply-To: <ffed97ca-b8f4-414c-964c-3153df1d87e7@amd.com>


On 22-04-2026 03:42 pm, Christian König wrote:
> On 4/22/26 11:40, Khatri, Sunil wrote:
>> On 22-04-2026 02:56 pm, Christian König wrote:
>>> On 4/22/26 10:29, Khatri, Sunil wrote:
>>>> On 21-04-2026 06:25 pm, Christian König wrote:
>>>>> As preparation for independent fences remove the function and do all of
>>>>> it's cleanup directly after signaling.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 13 +--
>>>>>    .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 79 +++++++------------
>>>>>    .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  3 -
>>>>>    3 files changed, 31 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index c6546a858597..1b15b51dc3f4 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -3162,11 +3162,7 @@ static int __init amdgpu_init(void)
>>>>>          r = amdgpu_sync_init();
>>>>>        if (r)
>>>>> -        goto error_sync;
>>>>> -
>>>>> -    r = amdgpu_userq_fence_slab_init();
>>>>> -    if (r)
>>>>> -        goto error_fence;
>>>>> +        return r;
>>>>>          amdgpu_register_atpx_handler();
>>>>>        amdgpu_acpi_detect();
>>>>> @@ -3182,12 +3178,6 @@ static int __init amdgpu_init(void)
>>>>>          /* let modprobe override vga console setting */
>>>>>        return pci_register_driver(&amdgpu_kms_pci_driver);
>>>>> -
>>>>> -error_fence:
>>>>> -    amdgpu_sync_fini();
>>>>> -
>>>>> -error_sync:
>>>>> -    return r;
>>>>>    }
>>>>>      static void __exit amdgpu_exit(void)
>>>>> @@ -3197,7 +3187,6 @@ static void __exit amdgpu_exit(void)
>>>>>        amdgpu_unregister_atpx_handler();
>>>>>        amdgpu_acpi_release();
>>>>>        amdgpu_sync_fini();
>>>>> -    amdgpu_userq_fence_slab_fini();
>>>>>        mmu_notifier_synchronize();
>>>>>        amdgpu_xcp_drv_release();
>>>>>    }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> index a58342c2ac44..909bdccc2a92 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> @@ -32,29 +32,9 @@
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_userq_fence.h"
>>>>>    -static const struct dma_fence_ops amdgpu_userq_fence_ops;
>>>>> -static struct kmem_cache *amdgpu_userq_fence_slab;
>>>>> -
>>>>>    #define AMDGPU_USERQ_MAX_HANDLES    (1U << 16)
>>>>>    -int amdgpu_userq_fence_slab_init(void)
>>>>> -{
>>>>> -    amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
>>>>> -                            sizeof(struct amdgpu_userq_fence),
>>>>> -                            0,
>>>>> -                            SLAB_HWCACHE_ALIGN,
>>>>> -                            NULL);
>>>> Are we not having benefit enough to continue create a cache here ? If that is fine that LGTM,
>>> Using all those kmem_cache instances was a bad idea to begin with.
>>>
>>> See the idea of a kmem_cache is to reduce the number of CPU cache lines and memory you need for certain number of objects when the object size is not a power of two.
>>>
>>> So for exampe two objects with 96 bytes only take 3 cache lines and 192 bytes instead of 256 bytes and 4 cache lines.
>>>
>>> But that difference is so marginally for most use cases that you absolutely don't need it.
>> Thanks for the explanation. Also if i am not wrong in cases where last no of such objects are often created and deleted kmem_cache helps in reuse and helps with internal fragmentation too.
> Nope that's not correct. See kmalloc() and co also uses kmem_cache underneath, they just round up the size of the allocated object to the next power of two.
>
> As long as you don't give special flags kmem_cache object storages are also shared among users, e.g. when one driver creates a kmem_cache for an 96 byte sized object and another driver does the same they will actually share their allocations which each other.
>
> Regards,
> Christian.
>
>> Regards
>>
>> Sunil Khatri
>>
>>> Regards,
>>> Christian.
>>>
>>>> Acked-by: Sunil Khatri <sunil.khatri@amd.com>
>>>>
>>>> Regards
>>>> Sunil
>>>>
>>>>
>>>>
>>>>> -    if (!amdgpu_userq_fence_slab)
>>>>> -        return -ENOMEM;
>>>>> -
>>>>> -    return 0;
>>>>> -}
>>>>> -
>>>>> -void amdgpu_userq_fence_slab_fini(void)
>>>>> -{
>>>>> -    rcu_barrier();
>>>>> -    kmem_cache_destroy(amdgpu_userq_fence_slab);
>>>>> -}
>>>>> +static const struct dma_fence_ops amdgpu_userq_fence_ops;
>>>>>      static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
>>>>>    {
>>>>> @@ -146,12 +126,18 @@ amdgpu_userq_fence_driver_free(struct amdgpu_usermode_queue *userq)
>>>>>    }
>>>>>      static void
>>>>> -amdgpu_userq_fence_put_fence_drv_array(struct amdgpu_userq_fence *userq_fence)
>>>>> +amdgpu_userq_fence_put_fence_drv_refs(struct amdgpu_userq_fence *userq_fence)
>>>>>    {
>>>>>        unsigned long i;
>>>>> +
>>>>>        for (i = 0; i < userq_fence->fence_drv_array_count; i++)
>>>>>            amdgpu_userq_fence_driver_put(userq_fence->fence_drv_array[i]);
>>>>>        userq_fence->fence_drv_array_count = 0;
>>>>> +    kfree(userq_fence->fence_drv_array);
>>>>> +    userq_fence->fence_drv_array = NULL;
>>>>> +
>>>>> +    amdgpu_userq_fence_driver_put(userq_fence->fence_drv);
>>>>> +    userq_fence->fence_drv = NULL;

Setting this to NULL is hitting a NULL dereference in, so either we add 
a NULL check in the amdgpu_userq_fence_driver_process or lets not set 
the NULL.

Regards
Sunil Khatri

>>>>>    }
>>>>>      void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_drv)
>>>>> @@ -181,10 +167,11 @@ void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_d
>>>>>            fence = &userq_fence->base;
>>>>>            list_del_init(&userq_fence->link);
>>>>>            dma_fence_signal(fence);
>>>>> -        /* Drop fence_drv_array outside fence_list_lock
>>>>> +        /*
>>>>> +         * Drop fence_drv_array outside fence_list_lock
>>>>>             * to avoid the recursion lock.
>>>>>             */
>>>>> -        amdgpu_userq_fence_put_fence_drv_array(userq_fence);
>>>>> +        amdgpu_userq_fence_put_fence_drv_refs(userq_fence);
>>>>>            dma_fence_put(fence);
>>>>>        }
>>>>>    @@ -231,7 +218,7 @@ void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
>>>>>      static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence **userq_fence)
>>>>>    {
>>>>> -    *userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
>>>>> +    *userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
>>>>>        return *userq_fence ? 0 : -ENOMEM;
>>>>>    }
>>>>>    @@ -299,7 +286,7 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
>>>>>        spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
>>>>>          if (signaled)
>>>>> -        amdgpu_userq_fence_put_fence_drv_array(userq_fence);
>>>>> +        amdgpu_userq_fence_put_fence_drv_refs(userq_fence);
>>>>>          *f = fence;
>>>>>    @@ -333,29 +320,10 @@ static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
>>>>>        return false;
>>>>>    }
>>>>>    -static void amdgpu_userq_fence_free(struct rcu_head *rcu)
>>>>> -{
>>>>> -    struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);
>>>>> -    struct amdgpu_userq_fence *userq_fence = to_amdgpu_userq_fence(fence);
>>>>> -    struct amdgpu_userq_fence_driver *fence_drv = userq_fence->fence_drv;
>>>>> -
>>>>> -    /* Release the fence driver reference */
>>>>> -    amdgpu_userq_fence_driver_put(fence_drv);
>>>>> -
>>>>> -    kvfree(userq_fence->fence_drv_array);
>>>>> -    kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
>>>>> -}
>>>>> -
>>>>> -static void amdgpu_userq_fence_release(struct dma_fence *f)
>>>>> -{
>>>>> -    call_rcu(&f->rcu, amdgpu_userq_fence_free);
>>>>> -}
>>>>> -
>>>>>    static const struct dma_fence_ops amdgpu_userq_fence_ops = {
>>>>>        .get_driver_name = amdgpu_userq_fence_get_driver_name,
>>>>>        .get_timeline_name = amdgpu_userq_fence_get_timeline_name,
>>>>>        .signaled = amdgpu_userq_fence_signaled,
>>>>> -    .release = amdgpu_userq_fence_release,
>>>>>    };
>>>>>      /**
>>>>> @@ -546,7 +514,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>>>>        r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
>>>>>        if (r) {
>>>>>            mutex_unlock(&userq_mgr->userq_mutex);
>>>>> -        kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
>>>>> +        kfree(userq_fence);
>>>>>            goto put_gobj_write;
>>>>>        }
>>>>>    @@ -871,6 +839,7 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>>        for (i = 0, cnt = 0; i < num_fences; i++) {
>>>>>            struct amdgpu_userq_fence_driver *fence_drv;
>>>>>            struct amdgpu_userq_fence *userq_fence;
>>>>> +        unsigned long flags;
>>>>>            u32 index;
>>>>>              userq_fence = to_amdgpu_userq_fence(fences[i]);
>>>>> @@ -886,7 +855,19 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>>                continue;
>>>>>            }
>>>>>    +        spin_lock_irqsave(userq_fence->base.lock, flags);
>>>>> +        if (dma_fence_is_signaled_locked(&userq_fence->base)) {
>>>>> +            /*
>>>>> +             * It is possible that fence is already signaled and the
>>>>> +             * fence_drv now NULL, just skip over such fences.
>>>>> +            */
>>>>> +            spin_unlock_irqrestore(userq_fence->base.lock, flags);
>>>>> +            continue;
>>>>> +        }
>>>>>            fence_drv = userq_fence->fence_drv;
>>>>> +        amdgpu_userq_fence_driver_get(fence_drv);
>>>>> +        spin_unlock_irqrestore(userq_fence->base.lock, flags);
>>>>> +
>>>>>            /*
>>>>>             * We need to make sure the user queue release their reference
>>>>>             * to the fence drivers at some point before queue destruction.
>>>>> @@ -895,10 +876,10 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>>             */
>>>>>            r = xa_alloc(&waitq->fence_drv_xa, &index, fence_drv,
>>>>>                     xa_limit_32b, GFP_KERNEL);
>>>>> -        if (r)
>>>>> +        if (r) {
>>>>> +            amdgpu_userq_fence_driver_put(fence_drv);
>>>>>                goto put_waitq;
>>>>> -
>>>>> -        amdgpu_userq_fence_driver_get(fence_drv);
>>>>> +        }
>>>>>              /* Store drm syncobj's gpu va address and value */
>>>>>            fence_info[cnt].va = fence_drv->va;
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> index d56246ad8c26..d355a0eecc07 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> @@ -58,9 +58,6 @@ struct amdgpu_userq_fence_driver {
>>>>>        char timeline_name[TASK_COMM_LEN];
>>>>>    };
>>>>>    -int amdgpu_userq_fence_slab_init(void);
>>>>> -void amdgpu_userq_fence_slab_fini(void);
>>>>> -
>>>>>    void amdgpu_userq_fence_driver_get(struct amdgpu_userq_fence_driver *fence_drv);
>>>>>    void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
>>>>>    int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,

next prev parent reply	other threads:[~2026-04-22 14:32 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 12:55 [PATCH 01/11] drm/amdgpu: fix AMDGPU_INFO_READ_MMR_REG Christian König
2026-04-21 12:55 ` [PATCH 02/11] drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset Christian König
2026-04-22  4:53   ` Khatri, Sunil
2026-04-22  7:13     ` Christian König
2026-04-22  7:19       ` Khatri, Sunil
2026-04-22  7:24         ` Christian König
2026-04-22  7:29           ` Khatri, Sunil
2026-04-27  8:45   ` Liang, Prike
2026-04-21 12:55 ` [PATCH 03/11] drm/amdgpu: nuke amdgpu_userq_fence_free Christian König
2026-04-22  8:29   ` Khatri, Sunil
2026-04-22  9:26     ` Christian König
2026-04-22  9:40       ` Khatri, Sunil
2026-04-22 10:12         ` Christian König
2026-04-22 14:32           ` Khatri, Sunil [this message]
2026-04-27  6:21   ` Liang, Prike
2026-04-21 12:55 ` [PATCH 04/11] drm/amdgpu: rework amdgpu_userq_signal_ioctl Christian König
2026-04-22 10:08   ` Khatri, Sunil
2026-04-22 10:14     ` Christian König
2026-04-22 15:14       ` Khatri, Sunil
2026-04-23  9:58   ` Liang, Prike
2026-04-23 10:47     ` Christian König
2026-04-23 10:54       ` Khatri, Sunil
2026-04-24  8:01       ` Liang, Prike
2026-04-24 13:02         ` Christian König
2026-04-21 12:55 ` [PATCH 05/11] drm/amdgpu: rework userq fence signal processing Christian König
2026-04-22 10:16   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 06/11] drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues Christian König
2026-04-22 10:20   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 07/11] drm/amdgpu: fix userq hang detection and reset Christian König
2026-04-22 10:35   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 08/11] drm/amdgpu: rework userq reset work handling Christian König
2026-04-23 10:43   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 09/11] drm/amdgpu: revert to old status lock handling v4 Christian König
2026-04-23 10:45   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 10/11] drm/amdgpu: restructure VM state machine v2 Christian König
2026-04-23 10:46   ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 11/11] drm/amdgpu: WIP sync amdgpu_ttm_fill_mem only to kernel fences Christian König
2026-04-23 10:47   ` Khatri, Sunil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6aa57b50-52e0-4a53-bd68-c6fb8d484dce@amd.com \
    --to=sukhatri@amd.com \
    --cc=Prike.Liang@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox