From: "Christian König" <christian.koenig@amd.com>
To: "Liang, Prike" <Prike.Liang@amd.com>,
"Khatri, Sunil" <Sunil.Khatri@amd.com>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 04/11] drm/amdgpu: rework amdgpu_userq_signal_ioctl
Date: Fri, 24 Apr 2026 15:02:21 +0200
Message-ID: <fead91ae-a962-4124-88fb-ac746cfc525c@amd.com>
In-Reply-To: <PH7PR12MB60002D0F42DB677AC0C2F40AFB2B2@PH7PR12MB6000.namprd12.prod.outlook.com>
Hi Prike,
On 4/24/26 10:01, Liang, Prike wrote:
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Thursday, April 23, 2026 6:48 PM
>> To: Liang, Prike <Prike.Liang@amd.com>; Khatri, Sunil <Sunil.Khatri@amd.com>
>> Cc: Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 04/11] drm/amdgpu: rework amdgpu_userq_signal_ioctl
>>
>> Hi guys,
>>
>> On 4/23/26 11:58, Liang, Prike wrote:
>> ...
>>>> -static int amdgpu_userq_fence_alloc(struct amdgpu_userq_fence
>>>> **userq_fence)
>>>> +static int amdgpu_userq_fence_alloc(struct amdgpu_usermode_queue *userq,
>>>> + struct amdgpu_userq_fence **pfence)
>>>> {
>>>> - *userq_fence = kmalloc(sizeof(**userq_fence), GFP_ATOMIC);
>>>> - return *userq_fence ? 0 : -ENOMEM;
>>>> + struct amdgpu_userq_fence_driver *fence_drv = userq->fence_drv;
>>>> + struct amdgpu_userq_fence *userq_fence;
>>>> + unsigned long count;
>>> We must initialize count; otherwise, it may contain a garbage value,
>>> which can cause amdgpu_userq_fence_alloc() to fail and, in turn, make userq
>>> fence emission fail.
>>
>> I've got the same comment from both Sunil and Prike, but as far as I can see
>> that is actually incorrect.
> This patch breaks the userq fence emit path, causing desktop boot to fail. Initializing count only works around the amdgpu_userq_fence_alloc() failure, and it doesn't address the root cause, which is that xa_find() cannot initialize count when fence_drv_xa itself hasn't been set up yet. Instead of just initializing count, we may need to check the return value of xa_find(), and if no wait fences are pending, skip retrieving the wait fence array entirely.
Yeah, Sunil and I figured out what was wrong here.
I was looking at the xas_find() function and thought that xa_find() would be just a wrapper around that.
But it doesn't work like that. So I not only need to initialize count, but also use the xas_find() function directly.
Thanks for pointing that out,
Christian.
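
To illustrate the point about count, here is a minimal userspace sketch. It is not the XArray API: toy_xa, toy_find_free() and toy_count_used() are hypothetical names invented for this example. It only models the convention discussed above, where the index argument is read as the search start and is written back only when something is found, so the caller has to initialize it and check the return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for an allocating array: NULL means "free slot". */
struct toy_xa {
	void *slot[TOY_SLOTS];
};

/*
 * Models an in/out index argument: the search starts at *indexp, and
 * *indexp is only written when a free slot is found.
 */
static bool toy_find_free(const struct toy_xa *xa, unsigned long *indexp,
			  unsigned long max)
{
	assert(xa && indexp);

	for (unsigned long i = *indexp; i <= max && i < TOY_SLOTS; i++) {
		if (!xa->slot[i]) {
			*indexp = i;
			return true;
		}
	}
	return false;
}

/* Number of used entries, assuming slots are filled from index 0 upwards. */
static unsigned long toy_count_used(const struct toy_xa *xa)
{
	/* Must be initialized: it is the start index of the search. */
	unsigned long count = 0;

	/* Nothing found means count was not updated, so handle it explicitly. */
	if (!toy_find_free(xa, &count, TOY_SLOTS - 1))
		return TOY_SLOTS;

	return count;
}
```

With an empty toy_xa this returns 0, and with the first two slots filled it returns 2. Without the initialization the search would start at whatever happened to be on the stack.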
>
>>>
>>>> + userq_fence = kmalloc(sizeof(*userq_fence), GFP_KERNEL);
>>>> + if (!userq_fence)
>>>> + return -ENOMEM;
>>>> +
>>>> + /*
>>>> + * Get the next unused entry, since we fill from the start this can be
>>>> + * used as size to allocate the array.
>>>> + */
>>>> + mutex_lock(&userq->fence_drv_lock);
>>>> + xa_find(&userq->fence_drv_xa, &count, ULONG_MAX, XA_FREE_MARK);
>>
>> The count should be initialized here. But it could be that this doesn't work.
>>
>> Did you guys get a KASAN warning or something like that?
> I didn't see the KASAN warning. However, the underlying problem is that when fence_drv_xa hasn't been set up, count remains uninitialized (garbage), which eventually causes kvmalloc_array() to fail when allocating fence_drv_array.
>
>>>> +
>>>> + userq_fence->fence_drv_array = kvmalloc_array(count, sizeof(fence_drv),
>>>> + GFP_KERNEL);
>>>> + if (!userq_fence->fence_drv_array) {
>>>> + mutex_unlock(&userq->fence_drv_lock);
>>>> + kfree(userq_fence);
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + userq_fence->fence_drv_array_count = count;
>>>> + xa_extract(&userq->fence_drv_xa, (void **)userq_fence->fence_drv_array,
>>>> + 0, ULONG_MAX, count, XA_PRESENT);
>>> We may need to assign userq_fence->fence_drv_array_count the exact number of
>>> entries copied by xa_extract().
>>
>> Interesting point. Why could that differ?
> Generally, xa_extract() should return the same number as count, but when there's a retry entry, the actual number of copied entries may differ from the wait fence array capacity indicated by count.
>
>> Thanks for the comments,
>> Christian.