From: "Khatri, Sunil" <sukhatri@amd.com>
To: Prike Liang <Prike.Liang@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Alexander.Deucher@amd.com, Christian.Koenig@amd.com
Subject: Re: [PATCH 1/2] drm/amdgpu: clean up userq iVA mapping after removing userq from MES
Date: Fri, 15 May 2026 10:38:14 +0530 [thread overview]
Message-ID: <36b3d9e6-1600-400e-a509-ea6727569254@amd.com> (raw)
In-Reply-To: <20260514124250.3833711-1-Prike.Liang@amd.com>
On 14-05-2026 06:12 pm, Prike Liang wrote:
> User queue destroy removed the tracked queue VA mappings before removing
> the HW queue from MES. If the queue still had active waves, HW blocks such
> as TCP could continue accessing the queue backing memory after the VM
> mappings were removed, resulting in gfxhub page faults.
>
> So, move the queue VA cleanup after the HW queue unmap. Meanwhile, if
> MES fails to remove the queue, run reset recovery before freeing the
> queue resources.
>
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 43 +++++++++++++++++------
> 1 file changed, 32 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 83aee0810513..2e3edb6dd506 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -625,8 +625,7 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_que
> struct amdgpu_device *adev = uq_mgr->adev;
> struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
> struct amdgpu_vm *vm = &fpriv->vm;
> -
> - int r = 0;
> + int r = 0, tmp;
>
> trace_amdgpu_userq_destroy_start(queue);
>
> @@ -635,15 +634,6 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_que
> /* Cancel any pending hang detection work and cleanup */
> cancel_delayed_work_sync(&queue->hang_detect_work);
>
> - r = amdgpu_bo_reserve(vm->root.bo, false);
> - if (r) {
> - drm_file_err(uq_mgr->file, "Failed to reserve root bo during userqueue destroy\n");
> - trace_amdgpu_userq_destroy_end(queue, r);
> - return r;
> - }
> - amdgpu_userq_buffer_vas_list_cleanup(adev, queue);
> - amdgpu_bo_unreserve(vm->root.bo);
> -
> mutex_lock(&uq_mgr->userq_mutex);
> amdgpu_userq_wait_for_last_fence(queue);
>
> @@ -651,6 +641,37 @@ amdgpu_userq_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_que
> debugfs_remove_recursive(queue->debugfs_queue);
> #endif
> r = amdgpu_userq_unmap_helper(queue);
> +
> + if (r) {
> + drm_file_err(uq_mgr->file,
> + "Failed to unmap userqueue during destroy, ret=%d\n",
> + r);
> + amdgpu_userq_fence_driver_force_completion(queue);
We should be calling this before amdgpu_userq_unmap_helper(): we are in
tear down, so we want to finish all outstanding work and its
corresponding fences, and whether or not the unmap fails we should be
cleaning up the resources.
> + amdgpu_reset_domain_schedule(uq_mgr->adev->reset_domain,
> + &uq_mgr->reset_work);
A reset at this stage during tear down seems like a bad idea; in fact we
are waiting here for any reset thread that is already scheduled.

Regards,
Sunil Khatri
> + flush_work(&uq_mgr->reset_work);
> + }
> + mutex_unlock(&uq_mgr->userq_mutex);
> + /*
> + * Drop the queue VA mappings only after the HW queue is removed (or
> + * reset recovery has run). Removing the mappings first lets active TCP
> + * waves fault on queue backing memory while MES is still trying to
> + * process REMOVE_QUEUE.
> + */
> + tmp = amdgpu_bo_reserve(vm->root.bo, false);
> + if (tmp) {
> + drm_file_err(uq_mgr->file,
> + "Failed to reserve root bo during userqueue destroy\n");
> + if (!r)
> + r = tmp;
> + } else {
> + tmp = amdgpu_userq_buffer_vas_list_cleanup(adev, queue);
> + amdgpu_bo_unreserve(vm->root.bo);
> + if (tmp && !r)
> + r = tmp;
> + }
> +
> + mutex_lock(&uq_mgr->userq_mutex);
> atomic_dec(&uq_mgr->userq_count[queue->queue_type]);
> amdgpu_userq_cleanup(queue);
> mutex_unlock(&uq_mgr->userq_mutex);
Thread overview: 7+ messages
2026-05-14 12:42 [PATCH 1/2] drm/amdgpu: clean up userq iVA mapping after removing userq from MES Prike Liang
2026-05-14 12:42 ` [PATCH 2/2] drm/amdgpu: unmap userq for evicting user queue Prike Liang
2026-05-15 2:30 ` Liang, Prike
2026-05-15 4:52 ` Khatri, Sunil
2026-05-15 7:23 ` Liang, Prike
2026-05-15 5:08 ` Khatri, Sunil [this message]
2026-05-15 6:32 ` [PATCH 1/2] drm/amdgpu: clean up userq iVA mapping after removing userq from MES Liang, Prike