From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
alexander.deucher@amd.com, Prike.Liang@amd.com,
amd-gfx@lists.freedesktop.org
Cc: christian.koenig@amd.com
Subject: Re: [PATCH 06/11] drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues
Date: Wed, 22 Apr 2026 15:50:03 +0530 [thread overview]
Message-ID: <30fcb9fe-cef3-4320-b430-735071e808c8@amd.com> (raw)
In-Reply-To: <20260421125513.4545-6-christian.koenig@amd.com>
Now this is exactly how i nearly disabled the reset login in my
validation setup.
Looks clean and as per the expectations.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
On 21-04-2026 06:25 pm, Christian König wrote:
> Well the reset handling seems broken on multiple levels.
>
> As first step of fixing this remove most calls to the hang detection.
> That function should only be called after we run into a timeout! And *NOT*
> as random check spread over the code in multiple places.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 38 +++++++++--------------
> 1 file changed, 14 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 8ce001481d42..5ccd53ad8efd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -345,23 +345,18 @@ static int amdgpu_userq_preempt_helper(struct amdgpu_usermode_queue *queue)
> struct amdgpu_device *adev = uq_mgr->adev;
> const struct amdgpu_userq_funcs *userq_funcs =
> adev->userq_funcs[queue->queue_type];
> - bool found_hung_queue = false;
> - int r = 0;
> + int r;
>
> if (queue->state == AMDGPU_USERQ_STATE_MAPPED) {
> r = userq_funcs->preempt(queue);
> if (r) {
> queue->state = AMDGPU_USERQ_STATE_HUNG;
> - found_hung_queue = true;
> + return r;
> } else {
> queue->state = AMDGPU_USERQ_STATE_PREEMPTED;
> }
> }
> -
> - if (found_hung_queue)
> - amdgpu_userq_detect_and_reset_queues(uq_mgr);
> -
> - return r;
> + return 0;
> }
>
> static int amdgpu_userq_restore_helper(struct amdgpu_usermode_queue *queue)
> @@ -390,24 +385,21 @@ static int amdgpu_userq_unmap_helper(struct amdgpu_usermode_queue *queue)
> struct amdgpu_device *adev = uq_mgr->adev;
> const struct amdgpu_userq_funcs *userq_funcs =
> adev->userq_funcs[queue->queue_type];
> - bool found_hung_queue = false;
> - int r = 0;
> + int r;
>
> if ((queue->state == AMDGPU_USERQ_STATE_MAPPED) ||
> - (queue->state == AMDGPU_USERQ_STATE_PREEMPTED)) {
> + (queue->state == AMDGPU_USERQ_STATE_PREEMPTED)) {
> +
> r = userq_funcs->unmap(queue);
> if (r) {
> queue->state = AMDGPU_USERQ_STATE_HUNG;
> - found_hung_queue = true;
> + return r;
> } else {
> queue->state = AMDGPU_USERQ_STATE_UNMAPPED;
> }
> }
>
> - if (found_hung_queue)
> - amdgpu_userq_detect_and_reset_queues(uq_mgr);
> -
> - return r;
> + return 0;
> }
>
> static int amdgpu_userq_map_helper(struct amdgpu_usermode_queue *queue)
> @@ -416,19 +408,19 @@ static int amdgpu_userq_map_helper(struct amdgpu_usermode_queue *queue)
> struct amdgpu_device *adev = uq_mgr->adev;
> const struct amdgpu_userq_funcs *userq_funcs =
> adev->userq_funcs[queue->queue_type];
> - int r = 0;
> + int r;
>
> if (queue->state == AMDGPU_USERQ_STATE_UNMAPPED) {
> r = userq_funcs->map(queue);
> if (r) {
> queue->state = AMDGPU_USERQ_STATE_HUNG;
> - amdgpu_userq_detect_and_reset_queues(uq_mgr);
> + return r;
> } else {
> queue->state = AMDGPU_USERQ_STATE_MAPPED;
> }
> }
>
> - return r;
> + return 0;
> }
>
> static void amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue *queue)
> @@ -654,7 +646,6 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_que
> #if defined(CONFIG_DEBUG_FS)
> debugfs_remove_recursive(queue->debugfs_queue);
> #endif
> - amdgpu_userq_detect_and_reset_queues(uq_mgr);
> r = amdgpu_userq_unmap_helper(queue);
> /*TODO: It requires a reset for userq hw unmap error*/
> if (r) {
> @@ -1268,7 +1259,6 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
> unsigned long queue_id;
> int ret = 0, r;
>
> - amdgpu_userq_detect_and_reset_queues(uq_mgr);
> /* Try to unmap all the queues in this process ctx */
> xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
> r = amdgpu_userq_preempt_helper(queue);
> @@ -1276,9 +1266,11 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
> ret = r;
> }
>
> - if (ret)
> + if (ret) {
> drm_file_err(uq_mgr->file,
> "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
> + amdgpu_userq_detect_and_reset_queues(uq_mgr);
> + }
> return ret;
> }
>
> @@ -1378,7 +1370,6 @@ int amdgpu_userq_suspend(struct amdgpu_device *adev)
> uqm = queue->userq_mgr;
> cancel_delayed_work_sync(&uqm->resume_work);
> guard(mutex)(&uqm->userq_mutex);
> - amdgpu_userq_detect_and_reset_queues(uqm);
> if (adev->in_s0ix)
> r = amdgpu_userq_preempt_helper(queue);
> else
> @@ -1437,7 +1428,6 @@ int amdgpu_userq_stop_sched_for_enforce_isolation(struct amdgpu_device *adev,
> if (((queue->queue_type == AMDGPU_HW_IP_GFX) ||
> (queue->queue_type == AMDGPU_HW_IP_COMPUTE)) &&
> (queue->xcp_id == idx)) {
> - amdgpu_userq_detect_and_reset_queues(uqm);
> r = amdgpu_userq_preempt_helper(queue);
> if (r)
> ret = r;
next prev parent reply other threads:[~2026-04-22 10:20 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 12:55 [PATCH 01/11] drm/amdgpu: fix AMDGPU_INFO_READ_MMR_REG Christian König
2026-04-21 12:55 ` [PATCH 02/11] drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset Christian König
2026-04-22 4:53 ` Khatri, Sunil
2026-04-22 7:13 ` Christian König
2026-04-22 7:19 ` Khatri, Sunil
2026-04-22 7:24 ` Christian König
2026-04-22 7:29 ` Khatri, Sunil
2026-04-27 8:45 ` Liang, Prike
2026-04-21 12:55 ` [PATCH 03/11] drm/amdgpu: nuke amdgpu_userq_fence_free Christian König
2026-04-22 8:29 ` Khatri, Sunil
2026-04-22 9:26 ` Christian König
2026-04-22 9:40 ` Khatri, Sunil
2026-04-22 10:12 ` Christian König
2026-04-22 14:32 ` Khatri, Sunil
2026-04-27 6:21 ` Liang, Prike
2026-04-21 12:55 ` [PATCH 04/11] drm/amdgpu: rework amdgpu_userq_signal_ioctl Christian König
2026-04-22 10:08 ` Khatri, Sunil
2026-04-22 10:14 ` Christian König
2026-04-22 15:14 ` Khatri, Sunil
2026-04-23 9:58 ` Liang, Prike
2026-04-23 10:47 ` Christian König
2026-04-23 10:54 ` Khatri, Sunil
2026-04-24 8:01 ` Liang, Prike
2026-04-24 13:02 ` Christian König
2026-04-21 12:55 ` [PATCH 05/11] drm/amdgpu: rework userq fence signal processing Christian König
2026-04-22 10:16 ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 06/11] drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues Christian König
2026-04-22 10:20 ` Khatri, Sunil [this message]
2026-04-21 12:55 ` [PATCH 07/11] drm/amdgpu: fix userq hang detection and reset Christian König
2026-04-22 10:35 ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 08/11] drm/amdgpu: rework userq reset work handling Christian König
2026-04-23 10:43 ` Khatri, Sunil
2026-05-11 17:50 ` Christian König
2026-05-11 17:58 ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 09/11] drm/amdgpu: revert to old status lock handling v4 Christian König
2026-04-23 10:45 ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 10/11] drm/amdgpu: restructure VM state machine v2 Christian König
2026-04-23 10:46 ` Khatri, Sunil
2026-04-21 12:55 ` [PATCH 11/11] drm/amdgpu: WIP sync amdgpu_ttm_fill_mem only to kernel fences Christian König
2026-04-23 10:47 ` Khatri, Sunil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=30fcb9fe-cef3-4320-b430-735071e808c8@amd.com \
--to=sukhatri@amd.com \
--cc=Prike.Liang@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=ckoenig.leichtzumerken@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.