From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
alexander.deucher@amd.com, Prike.Liang@amd.com,
amd-gfx@lists.freedesktop.org
Cc: christian.koenig@amd.com
Subject: Re: [PATCH 02/11] drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset
Date: Wed, 22 Apr 2026 10:23:11 +0530
Message-ID: <e0761bb4-6cb5-40ec-b5f4-f57c6ef636e2@amd.com>
In-Reply-To: <20260421125513.4545-2-christian.koenig@amd.com>
On 21-04-2026 06:25 pm, Christian König wrote:
> The purpose of a GPU reset is to make sure that fence can be signaled
> again and the signal and resume workers can make progress again.
>
> So waiting for the resume worker or any fence in the GPU reset path is
> just utterly nonsense.
>
> Signed-off-by: Christian König<christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++++------------
> 1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 8f48520cb822..b632bc3c952b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -1496,23 +1496,21 @@ void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
> {
> const struct amdgpu_userq_funcs *userq_funcs;
> struct amdgpu_usermode_queue *queue;
> - struct amdgpu_userq_mgr *uqm;
> unsigned long queue_id;
>
> + /* TODO: We probably need a new lock for the queue state */
> xa_for_each(&adev->userq_doorbell_xa, queue_id, queue) {
> - uqm = queue->userq_mgr;
> - cancel_delayed_work_sync(&uqm->resume_work);
> - if (queue->state == AMDGPU_USERQ_STATE_MAPPED) {
> - amdgpu_userq_wait_for_last_fence(queue);
> - userq_funcs = adev->userq_funcs[queue->queue_type];
> - userq_funcs->unmap(queue);
> - /* just mark all queues as hung at this point.
> - * if unmap succeeds, we could map again
> - * in amdgpu_userq_post_reset() if vram is not lost
> - */
> - queue->state = AMDGPU_USERQ_STATE_HUNG;
> - amdgpu_userq_fence_driver_force_completion(queue);
> - }
> + if (queue->state != AMDGPU_USERQ_STATE_MAPPED)
> + continue;
If the queue is in the preempted state when we enter this function, we
should still force completion for the work on those queues; otherwise
the waiters will keep waiting.
> +
> + userq_funcs = adev->userq_funcs[queue->queue_type];
> + userq_funcs->unmap(queue);
The GPU is already hung if we are here, and my observation is that we
are unable to unmap, because we have already tried to reset via the FW
and that failed too; at least that is what I have seen. Could we skip
the unmap?
> + /* just mark all queues as hung at this point.
> + * if unmap succeeds, we could map again
> + * in amdgpu_userq_post_reset() if vram is not lost
> + */
> + queue->state = AMDGPU_USERQ_STATE_HUNG;
> + amdgpu_userq_fence_driver_force_completion(queue);
We should call force completion here irrespective of the queue state.
The GPU, or at least the queue, is hung and the FW has failed to reset
it. We have to release the fences by force completion.
Regards
Sunil Khatri
> }
> }
>
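
To illustrate the two suggestions above (skip the unmap, and force
completion regardless of queue state), here is a rough user-space
sketch. It is not driver code: the enum, struct userq and
userq_pre_reset_sketch() are hypothetical stand-ins, and the boolean
only models what amdgpu_userq_fence_driver_force_completion() would do
for a real queue. Whether a preempted queue should also be marked hung
is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the queue states; not the kernel enum. */
enum userq_state {
	USERQ_STATE_UNMAPPED,
	USERQ_STATE_MAPPED,
	USERQ_STATE_PREEMPTED,
	USERQ_STATE_HUNG,
};

/* Hypothetical model of a user queue. */
struct userq {
	enum userq_state state;
	bool fences_completed;	/* models forced fence completion */
};

/*
 * Models the suggested pre-reset pass: nothing waits, no unmap is
 * attempted, and every queue gets its fences force-completed no
 * matter which state it is in.
 */
static void userq_pre_reset_sketch(struct userq *queues, size_t count)
{
	assert(queues != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct userq *q = &queues[i];

		/* As in the patch, only mapped queues are marked hung. */
		if (q->state == USERQ_STATE_MAPPED)
			q->state = USERQ_STATE_HUNG;

		/* Release the waiters unconditionally. */
		q->fences_completed = true;
	}
}
```

With one mapped, one preempted and one unmapped queue, only the mapped
one changes state, but all three end up with their fences completed, so
no waiter is left behind.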