From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
alexander.deucher@amd.com, Prike.Liang@amd.com,
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 03/11] drm/amdgpu: nuke amdgpu_userq_fence_free
Date: Wed, 22 Apr 2026 20:02:32 +0530 [thread overview]
Message-ID: <6aa57b50-52e0-4a53-bd68-c6fb8d484dce@amd.com> (raw)
In-Reply-To: <ffed97ca-b8f4-414c-964c-3153df1d87e7@amd.com>
On 22-04-2026 03:42 pm, Christian König wrote:
> On 4/22/26 11:40, Khatri, Sunil wrote:
>> On 22-04-2026 02:56 pm, Christian König wrote:
>>> On 4/22/26 10:29, Khatri, Sunil wrote:
>>>> On 21-04-2026 06:25 pm, Christian König wrote:
>>>>> As preparation for independent fences remove the function and do all of
>>>>> its cleanup directly after signaling.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +--
>>>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 79 +++++++------------
>>>>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 3 -
>>>>> 3 files changed, 31 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index c6546a858597..1b15b51dc3f4 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -3162,11 +3162,7 @@ static int __init amdgpu_init(void)
>>>>> r = amdgpu_sync_init();
>>>>> if (r)
>>>>> - goto error_sync;
>>>>> -
>>>>> - r = amdgpu_userq_fence_slab_init();
>>>>> - if (r)
>>>>> - goto error_fence;
>>>>> + return r;
>>>>> amdgpu_register_atpx_handler();
>>>>> amdgpu_acpi_detect();
>>>>> @@ -3182,12 +3178,6 @@ static int __init amdgpu_init(void)
>>>>> /* let modprobe override vga console setting */
>>>>> return pci_register_driver(&amdgpu_kms_pci_driver);
>>>>> -
>>>>> -error_fence:
>>>>> - amdgpu_sync_fini();
>>>>> -
>>>>> -error_sync:
>>>>> - return r;
>>>>> }
>>>>> static void __exit amdgpu_exit(void)
>>>>> @@ -3197,7 +3187,6 @@ static void __exit amdgpu_exit(void)
>>>>> amdgpu_unregister_atpx_handler();
>>>>> amdgpu_acpi_release();
>>>>> amdgpu_sync_fini();
>>>>> - amdgpu_userq_fence_slab_fini();
>>>>> mmu_notifier_synchronize();
>>>>> amdgpu_xcp_drv_release();
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> index a58342c2ac44..909bdccc2a92 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>>> @@ -32,29 +32,9 @@
>>>>> #include "amdgpu.h"
>>>>> #include "amdgpu_userq_fence.h"
>>>>> -static const struct dma_fence_ops amdgpu_userq_fence_ops;
>>>>> -static struct kmem_cache *amdgpu_userq_fence_slab;
>>>>> -
>>>>> #define AMDGPU_USERQ_MAX_HANDLES (1U << 16)
>>>>> -int amdgpu_userq_fence_slab_init(void)
>>>>> -{
>>>>> - amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
>>>>> - sizeof(struct amdgpu_userq_fence),
>>>>> - 0,
>>>>> - SLAB_HWCACHE_ALIGN,
>>>>> - NULL);
>>>> Is the benefit not enough to justify keeping a dedicated cache here? If dropping it is fine, then this LGTM.
>>> Using all those kmem_cache instances was a bad idea to begin with.
>>>
>>> See the idea of a kmem_cache is to reduce the number of CPU cache lines and memory you need for certain number of objects when the object size is not a power of two.
>>>
>>> So for example, two 96-byte objects take only 3 cache lines and 192 bytes instead of 256 bytes and 4 cache lines.
>>>
>>> But that difference is so marginal for most use cases that you really don't need it.
>> Thanks for the explanation. Also, if I am not wrong, in cases where a large number of such objects are frequently created and deleted, a kmem_cache helps with reuse and reduces internal fragmentation too.
> Nope, that's not correct. kmalloc() and co also use kmem_cache underneath; they just round up the size of the allocated object to the next power of two.
>
> As long as you don't pass special flags, kmem_cache object storage is also shared among users, e.g. when one driver creates a kmem_cache for a 96-byte object and another driver does the same, they will actually share their allocations with each other.
>
> Regards,
> Christian.
>
>> Regards
>>
>> Sunil Khatri
>>
>>> Regards,
>>> Christian.
>>>
>>>> Acked-by: Sunil Khatri <sunil.khatri@amd.com>
>>>>
>>>> Regards
>>>> Sunil
>>>>
>>>>
>>>>
>>>>> - if (!amdgpu_userq_fence_slab)
>>>>> - return -ENOMEM;
>>>>> -
>>>>> - return 0;
>>>>> -}
>>>>> -
>>>>> -void amdgpu_userq_fence_slab_fini(void)
>>>>> -{
>>>>> - rcu_barrier();
>>>>> - kmem_cache_destroy(amdgpu_userq_fence_slab);
>>>>> -}
>>>>> +static const struct dma_fence_ops amdgpu_userq_fence_ops;
>>>>> static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
>>>>> {
>>>>> @@ -146,12 +126,18 @@ amdgpu_userq_fence_driver_free(struct amdgpu_usermode_queue *userq)
>>>>> }
>>>>> static void
>>>>> -amdgpu_userq_fence_put_fence_drv_array(struct amdgpu_userq_fence *userq_fence)
>>>>> +amdgpu_userq_fence_put_fence_drv_refs(struct amdgpu_userq_fence *userq_fence)
>>>>> {
>>>>> unsigned long i;
>>>>> +
>>>>> for (i = 0; i < userq_fence->fence_drv_array_count; i++)
>>>>> amdgpu_userq_fence_driver_put(userq_fence->fence_drv_array[i]);
>>>>> userq_fence->fence_drv_array_count = 0;
>>>>> + kfree(userq_fence->fence_drv_array);
>>>>> + userq_fence->fence_drv_array = NULL;
>>>>> +
>>>>> + amdgpu_userq_fence_driver_put(userq_fence->fence_drv);
>>>>> + userq_fence->fence_drv = NULL;
Setting this to NULL leads to a NULL dereference, so either we add a
NULL check in amdgpu_userq_fence_driver_process or let's not set it to
NULL.
Regards
Sunil Khatri
>>>>> }
>>>>> void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_drv)
>>>>> @@ -181,10 +167,11 @@ void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_d
>>>>> fence = &userq_fence->base;
>>>>> list_del_init(&userq_fence->link);
>>>>> dma_fence_signal(fence);
>>>>> - /* Drop fence_drv_array outside fence_list_lock
>>>>> + /*
>>>>> + * Drop fence_drv_array outside fence_list_lock
>>>>> * to avoid the recursion lock.
>>>>> */
>>>>> - amdgpu_userq_fence_put_fence_drv_array(userq_fence);
>>>>> + amdgpu_userq_fence_put_fence_drv_refs(userq_fence);
>>>>> dma_fence_put(fence);
>>>>> }
>>>>> @@ -231,7 +218,7 @@ void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
>>>>> static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence **userq_fence)
>>>>> {
>>>>> - *userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
>>>>> + *userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
>>>>> return *userq_fence ? 0 : -ENOMEM;
>>>>> }
>>>>> @@ -299,7 +286,7 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
>>>>> spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
>>>>> if (signaled)
>>>>> - amdgpu_userq_fence_put_fence_drv_array(userq_fence);
>>>>> + amdgpu_userq_fence_put_fence_drv_refs(userq_fence);
>>>>> *f = fence;
>>>>> @@ -333,29 +320,10 @@ static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
>>>>> return false;
>>>>> }
>>>>> -static void amdgpu_userq_fence_free(struct rcu_head *rcu)
>>>>> -{
>>>>> - struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);
>>>>> - struct amdgpu_userq_fence *userq_fence = to_amdgpu_userq_fence(fence);
>>>>> - struct amdgpu_userq_fence_driver *fence_drv = userq_fence->fence_drv;
>>>>> -
>>>>> - /* Release the fence driver reference */
>>>>> - amdgpu_userq_fence_driver_put(fence_drv);
>>>>> -
>>>>> - kvfree(userq_fence->fence_drv_array);
>>>>> - kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
>>>>> -}
>>>>> -
>>>>> -static void amdgpu_userq_fence_release(struct dma_fence *f)
>>>>> -{
>>>>> - call_rcu(&f->rcu, amdgpu_userq_fence_free);
>>>>> -}
>>>>> -
>>>>> static const struct dma_fence_ops amdgpu_userq_fence_ops = {
>>>>> .get_driver_name = amdgpu_userq_fence_get_driver_name,
>>>>> .get_timeline_name = amdgpu_userq_fence_get_timeline_name,
>>>>> .signaled = amdgpu_userq_fence_signaled,
>>>>> - .release = amdgpu_userq_fence_release,
>>>>> };
>>>>> /**
>>>>> @@ -546,7 +514,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>>>> r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
>>>>> if (r) {
>>>>> mutex_unlock(&userq_mgr->userq_mutex);
>>>>> - kmem_cache_free(amdgpu_userq_fence_slab, userq_fence);
>>>>> + kfree(userq_fence);
>>>>> goto put_gobj_write;
>>>>> }
>>>>> @@ -871,6 +839,7 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>> for (i = 0, cnt = 0; i < num_fences; i++) {
>>>>> struct amdgpu_userq_fence_driver *fence_drv;
>>>>> struct amdgpu_userq_fence *userq_fence;
>>>>> + unsigned long flags;
>>>>> u32 index;
>>>>> userq_fence = to_amdgpu_userq_fence(fences[i]);
>>>>> @@ -886,7 +855,19 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>> continue;
>>>>> }
>>>>> + spin_lock_irqsave(userq_fence->base.lock, flags);
>>>>> + if (dma_fence_is_signaled_locked(&userq_fence->base)) {
>>>>> + /*
>>>>> + * It is possible that fence is already signaled and the
>>>>> + * fence_drv now NULL, just skip over such fences.
>>>>> + */
>>>>> + spin_unlock_irqrestore(userq_fence->base.lock, flags);
>>>>> + continue;
>>>>> + }
>>>>> fence_drv = userq_fence->fence_drv;
>>>>> + amdgpu_userq_fence_driver_get(fence_drv);
>>>>> + spin_unlock_irqrestore(userq_fence->base.lock, flags);
>>>>> +
>>>>> /*
>>>>> * We need to make sure the user queue release their reference
>>>>> * to the fence drivers at some point before queue destruction.
>>>>> @@ -895,10 +876,10 @@ amdgpu_userq_wait_return_fence_info(struct drm_file *filp,
>>>>> */
>>>>> r = xa_alloc(&waitq->fence_drv_xa, &index, fence_drv,
>>>>> xa_limit_32b, GFP_KERNEL);
>>>>> - if (r)
>>>>> + if (r) {
>>>>> + amdgpu_userq_fence_driver_put(fence_drv);
>>>>> goto put_waitq;
>>>>> -
>>>>> - amdgpu_userq_fence_driver_get(fence_drv);
>>>>> + }
>>>>> /* Store drm syncobj's gpu va address and value */
>>>>> fence_info[cnt].va = fence_drv->va;
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> index d56246ad8c26..d355a0eecc07 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>>> @@ -58,9 +58,6 @@ struct amdgpu_userq_fence_driver {
>>>>> char timeline_name[TASK_COMM_LEN];
>>>>> };
>>>>> -int amdgpu_userq_fence_slab_init(void);
>>>>> -void amdgpu_userq_fence_slab_fini(void);
>>>>> -
>>>>> void amdgpu_userq_fence_driver_get(struct amdgpu_userq_fence_driver *fence_drv);
>>>>> void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
>>>>> int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,