AMD-GFX Archive on lore.kernel.org
* [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup
@ 2026-03-31  7:49 Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31  7:49 UTC
  To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri

v2: add more fixes as suggested by Christian and update some logging.

v3: patch 4 removes the signalled check in
amdgpu_userq_wait_for_last_fence.

v4: replace dma_fence_wait_timeout with dma_fence_wait, since the wait
is infinite.
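
The v4 change matters because the two helpers report results
differently. As far as the kernel's dma-fence documentation describes
it, dma_fence_wait_timeout() returns the remaining time (> 0) on
success, 0 on timeout and a negative errno such as -ERESTARTSYS when
interrupted, while dma_fence_wait() returns 0 on success or a negative
errno. The userspace sketch below models only those two conventions.
Every mock_* name, the enum and the two failed_*() helpers are
hypothetical stand-ins for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's ERESTARTSYS value (512). */
#define MOCK_ERESTARTSYS 512

/* Outcomes used only to drive the mock below. */
enum mock_outcome { MOCK_SIGNALED, MOCK_TIMED_OUT, MOCK_INTERRUPTED };

/*
 * Models the timeout-style convention:
 * > 0 (time left) on success, 0 on timeout, negative errno on interrupt.
 */
static long mock_wait_timeout(enum mock_outcome outcome, long timeout)
{
	switch (outcome) {
	case MOCK_SIGNALED:
		return timeout > 1 ? timeout / 2 : 1; /* some time is left */
	case MOCK_TIMED_OUT:
		return 0;
	default:
		return -MOCK_ERESTARTSYS;
	}
}

/*
 * Models the plain-wait convention: 0 on success, negative errno on
 * interrupt. There is no timeout outcome because the wait is unbounded.
 */
static int mock_wait(bool interrupted)
{
	return interrupted ? -MOCK_ERESTARTSYS : 0;
}

/* "ret <= 0 means failure" only fits the timeout-style convention. */
static bool failed_timeout_style(long ret)
{
	return ret <= 0;
}

/* For the plain wait, only a negative value indicates a problem. */
static bool failed_plain_style(long ret)
{
	return ret < 0;
}
```

In this model a successful plain wait returns 0, so
failed_timeout_style(mock_wait(false)) is true even though nothing went
wrong. Checks written for the timeout variant therefore need a second
look when a call site switches helpers.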

Sunil Khatri (4):
  drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
  drm/amdgpu/userq: add the return code too in error condition
  drm/amdgpu/userq: call dma_resv_wait_timeout without test for
    signalled
  drm/amdgpu/userq: use dma_fence_wait_timeout without test for
    signalled

 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 63 +++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 +-
 2 files changed, 27 insertions(+), 40 deletions(-)

-- 
2.34.1



* [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
  2026-03-31  7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
@ 2026-03-31  7:49 ` Sunil Khatri
  2026-03-31 11:42   ` Christian König
  2026-03-31  7:49 ` [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition Sunil Khatri
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31  7:49 UTC
  To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri

In amdgpu_userq_evict we do not need to check the return values
and print errors, because every function it calls already prints
its own error.

a. amdgpu_userq_wait_for_signal: Could time out, and the function
   already prints an error message.
b. amdgpu_userq_evict_all: Unmaps all the queues, and already
   prints an error if an unmap fails.
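
The pattern described above (each step logs its own failure, the caller
stays void and does not log again) can be sketched in plain userspace C.
This is a minimal illustration under stated assumptions: log_err(),
step_wait(), step_unmap() and evict() are hypothetical names, the
simulated return codes are passed in by hand, and log_count exists only
so the behaviour can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Counts emitted error lines so the logging behaviour can be checked. */
static int log_count;

/* Hypothetical logger: prints the message together with the return code. */
static void log_err(const char *what, int ret)
{
	fprintf(stderr, "%s ret=%d\n", what, ret);
	log_count++;
}

/* Each step reports its own failure at the point where it happens. */
static int step_wait(int simulated_ret)
{
	if (simulated_ret)
		log_err("wait for signal failed", simulated_ret);
	return simulated_ret;
}

static int step_unmap(int simulated_ret)
{
	if (simulated_ret)
		log_err("couldn't unmap all the queues", simulated_ret);
	return simulated_ret;
}

/*
 * The caller neither checks nor re-logs: one failure produces exactly
 * one log line, and the unmap step still runs after a failed wait.
 */
static void evict(int wait_ret, int unmap_ret)
{
	step_wait(wait_ret);
	step_unmap(unmap_ret);
}
```

Calling evict(-5, 0) in this sketch produces one error line rather than
two, which is the duplication the patch removes from the caller.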

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index fdae8c411aaa..79ee2f6e09da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
 	}
 
 	if (ret)
-		drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
+		drm_file_err(uq_mgr->file,
+			     "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
 	return ret;
 }
 
@@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
 	xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
 		struct dma_fence *f = queue->last_fence;
 
-		if (!f || dma_fence_is_signaled(f))
+		if (!f)
 			continue;
 
-		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
+		ret = dma_fence_wait(f, false);
 		if (ret <= 0) {
-			drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
-				     f->context, f->seqno);
+			drm_file_err(uq_mgr->file,
+				     "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
+				     f->context, f->seqno, ret);
 
 			return -ETIMEDOUT;
 		}
@@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
 void
 amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
 {
-	struct amdgpu_device *adev = uq_mgr->adev;
-	int ret;
-
 	/* Wait for any pending userqueue fence work to finish */
-	ret = amdgpu_userq_wait_for_signal(uq_mgr);
-	if (ret)
-		dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
-
-	ret = amdgpu_userq_evict_all(uq_mgr);
-	if (ret)
-		dev_err(adev->dev, "Failed to evict userqueue\n");
-
+	amdgpu_userq_wait_for_signal(uq_mgr);
+	/* unmaps all the queues */
+	amdgpu_userq_evict_all(uq_mgr);
 }
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,
-- 
2.34.1



* [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition
  2026-03-31  7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
@ 2026-03-31  7:49 ` Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout " Sunil Khatri
  3 siblings, 0 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31  7:49 UTC
  To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri

In function amdgpu_userq_restore:
a. amdgpu_userq_vm_validate: add the return code to the error log.
b. amdgpu_userq_restore_all: it already prints an error log, so just
   update the error log in the function and remove it from the caller.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 79ee2f6e09da..c85a4f4eefcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1023,7 +1023,8 @@ amdgpu_userq_restore_all(struct amdgpu_userq_mgr *uq_mgr)
 	mutex_unlock(&uq_mgr->userq_mutex);
 
 	if (ret)
-		drm_file_err(uq_mgr->file, "Failed to map all the queues\n");
+		drm_file_err(uq_mgr->file,
+			     "Failed to map all the queues, restore failed ret=%d\n", ret);
 	return ret;
 }
 
@@ -1230,13 +1231,11 @@ static void amdgpu_userq_restore_worker(struct work_struct *work)
 
 	ret = amdgpu_userq_vm_validate(uq_mgr);
 	if (ret) {
-		drm_file_err(uq_mgr->file, "Failed to validate BOs to restore\n");
+		drm_file_err(uq_mgr->file, "Failed to validate BOs to restore ret=%d\n", ret);
 		goto put_fence;
 	}
 
-	ret = amdgpu_userq_restore_all(uq_mgr);
-	if (ret)
-		drm_file_err(uq_mgr->file, "Failed to restore all queues\n");
+	amdgpu_userq_restore_all(uq_mgr);
 
 put_fence:
 	dma_fence_put(ev_fence);
-- 
2.34.1



* [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
  2026-03-31  7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
  2026-03-31  7:49 ` [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition Sunil Khatri
@ 2026-03-31  7:49 ` Sunil Khatri
  2026-03-31 11:51   ` Christian König
  2026-03-31  7:49 ` [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout " Sunil Khatri
  3 siblings, 1 reply; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31  7:49 UTC
  To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri

In function amdgpu_userq_gem_va_unmap_validate, call
dma_resv_wait_timeout directly, without first testing whether the
reservation object is signalled.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++--
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index c85a4f4eefcf..0ef829065403 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
 	u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
 	struct amdgpu_bo_va *bo_va = mapping->bo_va;
 	struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
-	int ret = 0;
 
 	if (!ip_mask)
 		return 0;
@@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
 	 * unmap is only for one kind of userq VAs, so at this point suppose
 	 * the eviction fence is always unsignaled.
 	 */
-	if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
-		ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
-					    MAX_SCHEDULE_TIMEOUT);
-		if (ret <= 0)
-			return -EBUSY;
-	}
-
-	return 0;
+	return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
+				     true, MAX_SCHEDULE_TIMEOUT);
 }
 
 void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 937a6dd3a4b5..43a7cb2d5db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 	 */
 	if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
 		r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
-		if (unlikely(r == -EBUSY))
+		if (r <= 0 && r != -ERESTARTSYS)
 			dev_warn_once(adev->dev,
-				      "Attempt to unmap an active userq buffer\n");
+				      "Attempt to unmap an active userq buffer ret=%d\n", r);
 	}
 
 	list_del(&mapping->list);
-- 
2.34.1



* [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled
  2026-03-31  7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
                   ` (2 preceding siblings ...)
  2026-03-31  7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
@ 2026-03-31  7:49 ` Sunil Khatri
  3 siblings, 0 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31  7:49 UTC
  To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri

In function amdgpu_userq_wait_for_last_fence, use
dma_fence_wait_timeout directly instead of checking
for a signalled fence first.

Return the dma_fence_wait_timeout return value. Also update
the fence timeout log to differentiate where the fence timed out.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 0ef829065403..002162dcbd3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -433,14 +433,16 @@ static int amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue *queue)
 	struct dma_fence *f = queue->last_fence;
 	int ret = 0;
 
-	if (f && !dma_fence_is_signaled(f)) {
-		ret = dma_fence_wait_timeout(f, true, MAX_SCHEDULE_TIMEOUT);
-		if (ret <= 0) {
-			drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
-				     f->context, f->seqno);
-			queue->state = AMDGPU_USERQ_STATE_HUNG;
-			return -ETIME;
-		}
+	if (!f)
+		return 0;
+
+	ret = dma_fence_wait(f, true);
+	if (ret <= 0) {
+		drm_file_err(uq_mgr->file,
+			     "Timed out in wait_for_last_fence fence=%llu:%llu\n",
+			     f->context, f->seqno);
+		queue->state = AMDGPU_USERQ_STATE_HUNG;
+		return -ETIMEDOUT;
 	}
 
 	return ret;
-- 
2.34.1



* Re: [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
  2026-03-31  7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
@ 2026-03-31 11:42   ` Christian König
  2026-03-31 11:55     ` Khatri, Sunil
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-03-31 11:42 UTC
  To: Sunil Khatri, Alex Deucher; +Cc: amd-gfx



On 3/31/26 09:49, Sunil Khatri wrote:
> In function amdgpu_userq_evict we do not need to check
> for return values and print errors as we are already
> print error in all the functions of amdgpu_userq_evict.
> 
> a. amdgpu_userq_wait_for_signal: Could timeout and we print
>    error message in the function already
> b. amdgpu_userq_evict_all: We unmap all the queues here and
>    in case of unmap failure we already print unmap error.
> 
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index fdae8c411aaa..79ee2f6e09da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
>  	}
>  
>  	if (ret)
> -		drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
> +		drm_file_err(uq_mgr->file,
> +			     "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
>  	return ret;
>  }
>  
> @@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>  	xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
>  		struct dma_fence *f = queue->last_fence;
>  
> -		if (!f || dma_fence_is_signaled(f))
> +		if (!f)
>  			continue;
>  
> -		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
> +		ret = dma_fence_wait(f, false);

>  		if (ret <= 0) {
> -			drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
> -				     f->context, f->seqno);
> +			drm_file_err(uq_mgr->file,
> +				     "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
> +				     f->context, f->seqno, ret);
>  
>  			return -ETIMEDOUT;
>  		}

You can completely drop this. dma_fence_wait() will never return any error when used like this.

Apart from that the patch looks correct to me.

Regards,
Christian.


> @@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>  void
>  amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
>  {
> -	struct amdgpu_device *adev = uq_mgr->adev;
> -	int ret;
> -
>  	/* Wait for any pending userqueue fence work to finish */
> -	ret = amdgpu_userq_wait_for_signal(uq_mgr);
> -	if (ret)
> -		dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
> -
> -	ret = amdgpu_userq_evict_all(uq_mgr);
> -	if (ret)
> -		dev_err(adev->dev, "Failed to evict userqueue\n");
> -
> +	amdgpu_userq_wait_for_signal(uq_mgr);
> +	/* unmaps all the queues */
> +	amdgpu_userq_evict_all(uq_mgr);
>  }
>  
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,



* Re: [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
  2026-03-31  7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
@ 2026-03-31 11:51   ` Christian König
  2026-03-31 11:57     ` Khatri, Sunil
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-03-31 11:51 UTC
  To: Sunil Khatri, Alex Deucher; +Cc: amd-gfx

On 3/31/26 09:49, Sunil Khatri wrote:
> In function amdgpu_userq_gem_va_unmap_validate call
> dma_resv_wait_timeout directly.
> 
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++--
>  2 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index c85a4f4eefcf..0ef829065403 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>  	u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
>  	struct amdgpu_bo_va *bo_va = mapping->bo_va;
>  	struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
> -	int ret = 0;
>  
>  	if (!ip_mask)
>  		return 0;
> @@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>  	 * unmap is only for one kind of userq VAs, so at this point suppose
>  	 * the eviction fence is always unsignaled.
>  	 */
> -	if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
> -		ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
> -					    MAX_SCHEDULE_TIMEOUT);
> -		if (ret <= 0)
> -			return -EBUSY;
> -	}
> -
> -	return 0;
> +	return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
> +				     true, MAX_SCHEDULE_TIMEOUT);

That wait can never fail and so never returns an error.

Just return 0 here or, even better, drop the return value.

Regards,
Christian.
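
For context on the return value discussed here: dma_resv_wait_timeout()
is documented as returning a long (> 0 on success, 0 on timeout,
-ERESTARTSYS when interrupted), while the function in the patch returns
int. If the value were kept, one option would be to fold the three cases
into a plain errno before returning it. The sketch below is an
assumption for illustration only, not something the patch or this review
proposes: wait_result_to_errno() and the MOCK_* macros are hypothetical
names, and -ETIMEDOUT is an arbitrary choice for the timeout case.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-ins for kernel values (ERESTARTSYS is 512 and
 * MAX_SCHEDULE_TIMEOUT is LONG_MAX in the kernel). */
#define MOCK_ERESTARTSYS 512
#define MOCK_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Fold a timeout-style result into an int errno:
 *   > 0  -> 0 (success, remaining time is discarded)
 *   == 0 -> -ETIMEDOUT
 *   < 0  -> the error itself (small enough to fit in an int)
 * Testing the sign before narrowing avoids converting a large positive
 * long to int, whose result is implementation-defined in C.
 */
static int wait_result_to_errno(long ret)
{
	if (ret > 0)
		return 0;
	if (ret == 0)
		return -ETIMEDOUT;
	return (int)ret;
}
```

With this helper, a caller only has to distinguish 0 from a negative
errno and never sees a raw remaining-time value.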

>  }
>  
>  void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 937a6dd3a4b5..43a7cb2d5db9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>  	 */
>  	if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
>  		r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
> -		if (unlikely(r == -EBUSY))
> +		if (r <= 0 && r != -ERESTARTSYS)
>  			dev_warn_once(adev->dev,
> -				      "Attempt to unmap an active userq buffer\n");
> +				      "Attempt to unmap an active userq buffer ret=%d\n", r);
>  	}
>  
>  	list_del(&mapping->list);



* Re: [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
  2026-03-31 11:42   ` Christian König
@ 2026-03-31 11:55     ` Khatri, Sunil
  0 siblings, 0 replies; 9+ messages in thread
From: Khatri, Sunil @ 2026-03-31 11:55 UTC
  To: Christian König, Sunil Khatri, Alex Deucher; +Cc: amd-gfx


On 31-03-2026 05:12 pm, Christian König wrote:
>
> On 3/31/26 09:49, Sunil Khatri wrote:
>> In function amdgpu_userq_evict we do not need to check
>> for return values and print errors as we are already
>> print error in all the functions of amdgpu_userq_evict.
>>
>> a. amdgpu_userq_wait_for_signal: Could timeout and we print
>>     error message in the function already
>> b. amdgpu_userq_evict_all: We unmap all the queues here and
>>     in case of unmap failure we already print unmap error.
>>
>> Suggested-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
>>   1 file changed, 10 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index fdae8c411aaa..79ee2f6e09da 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
>>   	}
>>   
>>   	if (ret)
>> -		drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
>> +		drm_file_err(uq_mgr->file,
>> +			     "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
>>   	return ret;
>>   }
>>   
>> @@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>>   	xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
>>   		struct dma_fence *f = queue->last_fence;
>>   
>> -		if (!f || dma_fence_is_signaled(f))
>> +		if (!f)
>>   			continue;
>>   
>> -		ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>> +		ret = dma_fence_wait(f, false);
>>   		if (ret <= 0) {
>> -			drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
>> -				     f->context, f->seqno);
>> +			drm_file_err(uq_mgr->file,
>> +				     "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
>> +				     f->context, f->seqno, ret);
>>   
>>   			return -ETIMEDOUT;
>>   		}
> You can completely drop this. dma_fence_wait() will never return any error when used like this.
>
> Apart from that the patch looks correct to me.

Got it

Regards
Sunil

>
> Regards,
> Christian.
>
>
>> @@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>>   void
>>   amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
>>   {
>> -	struct amdgpu_device *adev = uq_mgr->adev;
>> -	int ret;
>> -
>>   	/* Wait for any pending userqueue fence work to finish */
>> -	ret = amdgpu_userq_wait_for_signal(uq_mgr);
>> -	if (ret)
>> -		dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
>> -
>> -	ret = amdgpu_userq_evict_all(uq_mgr);
>> -	if (ret)
>> -		dev_err(adev->dev, "Failed to evict userqueue\n");
>> -
>> +	amdgpu_userq_wait_for_signal(uq_mgr);
>> +	/* unmaps all the queues */
>> +	amdgpu_userq_evict_all(uq_mgr);
>>   }
>>   
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,


* Re: [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
  2026-03-31 11:51   ` Christian König
@ 2026-03-31 11:57     ` Khatri, Sunil
  0 siblings, 0 replies; 9+ messages in thread
From: Khatri, Sunil @ 2026-03-31 11:57 UTC
  To: Christian König, Sunil Khatri, Alex Deucher; +Cc: amd-gfx



On 31-03-2026 05:21 pm, Christian König wrote:
> On 3/31/26 09:49, Sunil Khatri wrote:
>> In function amdgpu_userq_gem_va_unmap_validate call
>> dma_resv_wait_timeout directly.
>>
>> Suggested-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++--
>>   2 files changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index c85a4f4eefcf..0ef829065403 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>>   	u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
>>   	struct amdgpu_bo_va *bo_va = mapping->bo_va;
>>   	struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
>> -	int ret = 0;
>>   
>>   	if (!ip_mask)
>>   		return 0;
>> @@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>>   	 * unmap is only for one kind of userq VAs, so at this point suppose
>>   	 * the eviction fence is always unsignaled.
>>   	 */
>> -	if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
>> -		ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
>> -					    MAX_SCHEDULE_TIMEOUT);
>> -		if (ret <= 0)
>> -			return -EBUSY;
>> -	}
>> -
>> -	return 0;
>> +	return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
>> +				     true, MAX_SCHEDULE_TIMEOUT);
> That wait can never fail and so never return an error.
>
> Just return 0 here or even better drop the return value.
Do we want to return the value and check for -ERESTARTSYS in the caller?


Regards
Sunil Khatri

>
> Regards,
> Christian.
>
>>   }
>>   
>>   void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 937a6dd3a4b5..43a7cb2d5db9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>   	 */
>>   	if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
>>   		r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
>> -		if (unlikely(r == -EBUSY))
>> +		if (r <= 0 && r != -ERESTARTSYS)
>>   			dev_warn_once(adev->dev,
>> -				      "Attempt to unmap an active userq buffer\n");
>> +				      "Attempt to unmap an active userq buffer ret=%d\n", r);
>>   	}
>>   
>>   	list_del(&mapping->list);


