* [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup
@ 2026-03-31 7:49 Sunil Khatri
2026-03-31 7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31 7:49 UTC (permalink / raw)
To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri
v2: Add more fixes as suggested by Christian and update some logging.
v3: In patch 4, remove the signalled check in
amdgpu_userq_wait_for_last_fence.
v4: Replace dma_fence_wait_timeout with dma_fence_wait, since the wait
is infinite.
Sunil Khatri (4):
drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
drm/amdgpu/userq: add the return code too in error condition
drm/amdgpu/userq: call dma_resv_wait_timeout without test for
signalled
drm/amdgpu/userq: use dma_fence_wait_timeout without test for
signalled
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 63 +++++++++--------------
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +-
2 files changed, 27 insertions(+), 40 deletions(-)
--
2.34.1
* [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
2026-03-31 7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
@ 2026-03-31 7:49 ` Sunil Khatri
2026-03-31 11:42 ` Christian König
2026-03-31 7:49 ` [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition Sunil Khatri
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31 7:49 UTC (permalink / raw)
To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri
In function amdgpu_userq_evict we do not need to check
return values and print errors, because every function
it calls already prints its own error.
a. amdgpu_userq_wait_for_signal: Could time out, and the
function already prints an error message.
b. amdgpu_userq_evict_all: Unmaps all the queues, and on
unmap failure it already prints an unmap error.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index fdae8c411aaa..79ee2f6e09da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
}
if (ret)
- drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
+ drm_file_err(uq_mgr->file,
+ "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
return ret;
}
@@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
struct dma_fence *f = queue->last_fence;
- if (!f || dma_fence_is_signaled(f))
+ if (!f)
continue;
- ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
+ ret = dma_fence_wait(f, false);
if (ret <= 0) {
- drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
- f->context, f->seqno);
+ drm_file_err(uq_mgr->file,
+ "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
+ f->context, f->seqno, ret);
return -ETIMEDOUT;
}
@@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
void
amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
{
- struct amdgpu_device *adev = uq_mgr->adev;
- int ret;
-
/* Wait for any pending userqueue fence work to finish */
- ret = amdgpu_userq_wait_for_signal(uq_mgr);
- if (ret)
- dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
-
- ret = amdgpu_userq_evict_all(uq_mgr);
- if (ret)
- dev_err(adev->dev, "Failed to evict userqueue\n");
-
+ amdgpu_userq_wait_for_signal(uq_mgr);
+ /* unmaps all the queues */
+ amdgpu_userq_evict_all(uq_mgr);
}
int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,
--
2.34.1
* [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition
2026-03-31 7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
2026-03-31 7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
@ 2026-03-31 7:49 ` Sunil Khatri
2026-03-31 7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
2026-03-31 7:49 ` [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout " Sunil Khatri
3 siblings, 0 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31 7:49 UTC (permalink / raw)
To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri
In function amdgpu_userq_restore
a. amdgpu_userq_vm_validate: add return code in error condition
b. amdgpu_userq_restore_all: It already prints the error log, just
update the error log in the function and remove it from the caller.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 79ee2f6e09da..c85a4f4eefcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1023,7 +1023,8 @@ amdgpu_userq_restore_all(struct amdgpu_userq_mgr *uq_mgr)
mutex_unlock(&uq_mgr->userq_mutex);
if (ret)
- drm_file_err(uq_mgr->file, "Failed to map all the queues\n");
+ drm_file_err(uq_mgr->file,
+ "Failed to map all the queues, restore failed ret=%d\n", ret);
return ret;
}
@@ -1230,13 +1231,11 @@ static void amdgpu_userq_restore_worker(struct work_struct *work)
ret = amdgpu_userq_vm_validate(uq_mgr);
if (ret) {
- drm_file_err(uq_mgr->file, "Failed to validate BOs to restore\n");
+ drm_file_err(uq_mgr->file, "Failed to validate BOs to restore ret=%d\n", ret);
goto put_fence;
}
- ret = amdgpu_userq_restore_all(uq_mgr);
- if (ret)
- drm_file_err(uq_mgr->file, "Failed to restore all queues\n");
+ amdgpu_userq_restore_all(uq_mgr);
put_fence:
dma_fence_put(ev_fence);
--
2.34.1
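The change in this patch follows a "log once, in the callee" pattern: the function that fails prints the message with its return code, and the caller does not print a second message. A minimal userspace sketch of that pattern, assuming nothing about the driver beyond what the diff shows; model_err(), model_restore_all(), model_restore_worker() and log_count are hypothetical names standing in for drm_file_err() and the amdgpu functions:

```c
#include <stdio.h>

/* Counts emitted messages, for demonstration only. */
static int log_count;

/* Hypothetical stand-in for drm_file_err(): prints and counts one message. */
static void model_err(const char *what, int ret)
{
	log_count++;
	fprintf(stderr, "%s ret=%d\n", what, ret);
}

/* Callee reports its own failure, including the return code. */
static int model_restore_all(int map_result)
{
	if (map_result)
		model_err("Failed to map all the queues, restore failed", map_result);
	return map_result;
}

/* Caller relies on the callee's message and does not log a second time. */
static void model_restore_worker(int map_result)
{
	model_restore_all(map_result);
}
```

With this shape a single failure produces a single log line, and the line carries the error code that the caller would otherwise have discarded.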
* [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
2026-03-31 7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
2026-03-31 7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
2026-03-31 7:49 ` [Patch v4 2/4] drm/amdgpu/userq: add the return code too in error condition Sunil Khatri
@ 2026-03-31 7:49 ` Sunil Khatri
2026-03-31 11:51 ` Christian König
2026-03-31 7:49 ` [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout " Sunil Khatri
3 siblings, 1 reply; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31 7:49 UTC (permalink / raw)
To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri
In function amdgpu_userq_gem_va_unmap_validate call
dma_resv_wait_timeout directly.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
2 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index c85a4f4eefcf..0ef829065403 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
struct amdgpu_bo_va *bo_va = mapping->bo_va;
struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
- int ret = 0;
if (!ip_mask)
return 0;
@@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
* unmap is only for one kind of userq VAs, so at this point suppose
* the eviction fence is always unsignaled.
*/
- if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
- ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
- MAX_SCHEDULE_TIMEOUT);
- if (ret <= 0)
- return -EBUSY;
- }
-
- return 0;
+ return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
+ true, MAX_SCHEDULE_TIMEOUT);
}
void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 937a6dd3a4b5..43a7cb2d5db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
*/
if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
- if (unlikely(r == -EBUSY))
+ if (r <= 0 && r != -ERESTARTSYS)
dev_warn_once(adev->dev,
- "Attempt to unmap an active userq buffer\n");
+ "Attempt to unmap an active userq buffer ret=%d\n", r);
}
list_del(&mapping->list);
--
2.34.1
* [Patch v4 4/4] drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled
2026-03-31 7:49 [Patch v4 0/4] Bunch of patches to fix fence timeout cleanup Sunil Khatri
` (2 preceding siblings ...)
2026-03-31 7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
@ 2026-03-31 7:49 ` Sunil Khatri
3 siblings, 0 replies; 9+ messages in thread
From: Sunil Khatri @ 2026-03-31 7:49 UTC (permalink / raw)
To: Alex Deucher, Christian König; +Cc: amd-gfx, Sunil Khatri
In function amdgpu_userq_wait_for_last_fence use
dma_fence_wait_timeout directly instead of checking
for a signalled fence first.
Return the dma_fence_wait_timeout return value. Also update
the fence timeout log to show where the fence timed out.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 0ef829065403..002162dcbd3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -433,14 +433,16 @@ static int amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue *queue)
struct dma_fence *f = queue->last_fence;
int ret = 0;
- if (f && !dma_fence_is_signaled(f)) {
- ret = dma_fence_wait_timeout(f, true, MAX_SCHEDULE_TIMEOUT);
- if (ret <= 0) {
- drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
- f->context, f->seqno);
- queue->state = AMDGPU_USERQ_STATE_HUNG;
- return -ETIME;
- }
+ if (!f)
+ return 0;
+
+ ret = dma_fence_wait(f, true);
+ if (ret <= 0) {
+ drm_file_err(uq_mgr->file,
+ "Timed out in wait_for_last_fence fence=%llu:%llu\n",
+ f->context, f->seqno);
+ queue->state = AMDGPU_USERQ_STATE_HUNG;
+ return -ETIMEDOUT;
}
return ret;
--
2.34.1
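The hunk above keeps the `ret <= 0` test after switching to dma_fence_wait(). Assuming dma_fence_wait() returns 0 on success and a negative errno when interrupted, the effect of that test on the queue state can be sketched in userspace as below. This is an illustration, not the driver code: struct model_queue, enum model_state, wait_posted() and wait_negative_only() are hypothetical names, and wait_result simply holds the value a dma_fence_wait()-style call would have returned.

```c
#include <errno.h>
#include <stdbool.h>

enum model_state { MODEL_MAPPED, MODEL_HUNG };

struct model_queue {
	bool has_fence;         /* models queue->last_fence != NULL */
	long wait_result;       /* value a dma_fence_wait()-style call would return */
	enum model_state state;
};

/* Mirrors the posted check: any result <= 0 is treated as a timeout. */
static int wait_posted(struct model_queue *q)
{
	if (!q->has_fence)
		return 0;
	if (q->wait_result <= 0) {
		q->state = MODEL_HUNG;
		return -ETIMEDOUT;
	}
	return (int)q->wait_result;
}

/* Alternative shape: only a negative result is reported as a failure. */
static int wait_negative_only(struct model_queue *q)
{
	if (!q->has_fence)
		return 0;
	if (q->wait_result < 0)
		return (int)q->wait_result;
	return 0;
}
```

In this model a wait result of 0 moves the queue to MODEL_HUNG under wait_posted() but leaves it untouched under wait_negative_only(). Whether an interrupted wait should also mark the queue as hung is a driver decision that the sketch does not take.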
* Re: [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
2026-03-31 7:49 ` [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict Sunil Khatri
@ 2026-03-31 11:42 ` Christian König
2026-03-31 11:55 ` Khatri, Sunil
0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-03-31 11:42 UTC (permalink / raw)
To: Sunil Khatri, Alex Deucher; +Cc: amd-gfx
On 3/31/26 09:49, Sunil Khatri wrote:
> In function amdgpu_userq_evict we do not need to check
> for return values and print errors as we are already
> print error in all the functions of amdgpu_userq_evict.
>
> a. amdgpu_userq_wait_for_signal: Could timeout and we print
> error message in the function already
> b. amdgpu_userq_evict_all: We unmap all the queues here and
> in case of unmap failure we already print unmap error.
>
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
> 1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index fdae8c411aaa..79ee2f6e09da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
> }
>
> if (ret)
> - drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
> + drm_file_err(uq_mgr->file,
> + "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
> return ret;
> }
>
> @@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
> xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
> struct dma_fence *f = queue->last_fence;
>
> - if (!f || dma_fence_is_signaled(f))
> + if (!f)
> continue;
>
> - ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
> + ret = dma_fence_wait(f, false);
> if (ret <= 0) {
> - drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
> - f->context, f->seqno);
> + drm_file_err(uq_mgr->file,
> + "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
> + f->context, f->seqno, ret);
>
> return -ETIMEDOUT;
> }
You can completely drop this. dma_fence_wait() will never return any error when used like this.
Apart from that the patch looks correct to me.
Regards,
Christian.
> @@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
> void
> amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
> {
> - struct amdgpu_device *adev = uq_mgr->adev;
> - int ret;
> -
> /* Wait for any pending userqueue fence work to finish */
> - ret = amdgpu_userq_wait_for_signal(uq_mgr);
> - if (ret)
> - dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
> -
> - ret = amdgpu_userq_evict_all(uq_mgr);
> - if (ret)
> - dev_err(adev->dev, "Failed to evict userqueue\n");
> -
> + amdgpu_userq_wait_for_signal(uq_mgr);
> + /* unmaps all the queues */
> + amdgpu_userq_evict_all(uq_mgr);
> }
>
> int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,
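The review comment in this message rests on how the two fence wait helpers report their results. A minimal userspace sketch of those conventions follows, assuming dma_fence_wait() is a thin wrapper that maps any non-negative dma_fence_wait_timeout() result to 0. struct model_fence, model_wait_timeout(), model_wait() and MODEL_ERESTARTSYS are hypothetical names used only for illustration; the model cannot block, so it is only meaningful for fences that are either signaled or interrupted.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in; ERESTARTSYS is kernel-internal (512) and absent from userspace errno.h. */
#define MODEL_ERESTARTSYS 512

/* Userspace model of a fence: only the facts the wait result depends on. */
struct model_fence {
	bool signaled;     /* the fence has completed */
	bool interrupted;  /* a signal arrived while sleeping */
};

/*
 * Models the dma_fence_wait_timeout() convention:
 *   > 0  signaled, value is the remaining timeout
 *   0    timed out
 *   < 0  interrupted (only possible when intr is true)
 */
static long model_wait_timeout(const struct model_fence *f, bool intr, long timeout)
{
	if (f->signaled)
		return timeout ? timeout : 1;
	if (intr && f->interrupted)
		return -MODEL_ERESTARTSYS;
	return 0;
}

/*
 * Models the dma_fence_wait() convention: infinite timeout,
 * 0 on success, negative errno if interrupted.
 */
static long model_wait(const struct model_fence *f, bool intr)
{
	long ret = model_wait_timeout(f, intr, LONG_MAX);

	return ret < 0 ? ret : 0;
}
```

Under this model an uninterruptible model_wait() has no error result at all, and a `ret <= 0` test written for the timeout variant would also match the success value 0 of the wrapper.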
* Re: [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
2026-03-31 7:49 ` [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled Sunil Khatri
@ 2026-03-31 11:51 ` Christian König
2026-03-31 11:57 ` Khatri, Sunil
0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-03-31 11:51 UTC (permalink / raw)
To: Sunil Khatri, Alex Deucher; +Cc: amd-gfx
On 3/31/26 09:49, Sunil Khatri wrote:
> In function amdgpu_userq_gem_va_unmap_validate call
> dma_resv_wait_timeout directly.
>
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
> 2 files changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index c85a4f4eefcf..0ef829065403 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
> u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
> struct amdgpu_bo_va *bo_va = mapping->bo_va;
> struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
> - int ret = 0;
>
> if (!ip_mask)
> return 0;
> @@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
> * unmap is only for one kind of userq VAs, so at this point suppose
> * the eviction fence is always unsignaled.
> */
> - if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
> - ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
> - MAX_SCHEDULE_TIMEOUT);
> - if (ret <= 0)
> - return -EBUSY;
> - }
> -
> - return 0;
> + return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
> + true, MAX_SCHEDULE_TIMEOUT);
That wait can never fail and so never returns an error.
Just return 0 here or, even better, drop the return value.
Regards,
Christian.
> }
>
> void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 937a6dd3a4b5..43a7cb2d5db9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
> */
> if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
> r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
> - if (unlikely(r == -EBUSY))
> + if (r <= 0 && r != -ERESTARTSYS)
> dev_warn_once(adev->dev,
> - "Attempt to unmap an active userq buffer\n");
> + "Attempt to unmap an active userq buffer ret=%d\n", r);
> }
>
> list_del(&mapping->list);
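One detail behind this exchange is that dma_resv_wait_timeout() returns a long while amdgpu_userq_gem_va_unmap_validate() returns an int. Assuming the documented convention (greater than zero on success, zero on timeout, negative errno if interrupted), a result can be collapsed into an int status without narrowing a large positive value. The sketch below is a userspace illustration only; wait_result_to_status() is a hypothetical helper that does not appear in the patch.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: collapse a dma_resv_wait_timeout()-style long
 * result into an int status.
 */
static int wait_result_to_status(long ret)
{
	if (ret > 0)
		return 0;           /* all fences signaled */
	if (ret == 0)
		return -ETIMEDOUT;  /* timeout expired; not expected with an infinite timeout */
	return (int)ret;        /* negative errno values fit in an int */
}
```

A caller using this shape only has to distinguish zero from a negative errno, so a check such as `r < 0` is enough, and a large remaining-timeout value never passes through an int conversion.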
* Re: [Patch v4 1/4] drm/amdgpu/userq: dont check return value in amdgpu_userq_evict
2026-03-31 11:42 ` Christian König
@ 2026-03-31 11:55 ` Khatri, Sunil
0 siblings, 0 replies; 9+ messages in thread
From: Khatri, Sunil @ 2026-03-31 11:55 UTC (permalink / raw)
To: Christian König, Sunil Khatri, Alex Deucher; +Cc: amd-gfx
On 31-03-2026 05:12 pm, Christian König wrote:
>
> On 3/31/26 09:49, Sunil Khatri wrote:
>> In function amdgpu_userq_evict we do not need to check
>> for return values and print errors as we are already
>> print error in all the functions of amdgpu_userq_evict.
>>
>> a. amdgpu_userq_wait_for_signal: Could timeout and we print
>> error message in the function already
>> b. amdgpu_userq_evict_all: We unmap all the queues here and
>> in case of unmap failure we already print unmap error.
>>
>> Suggested-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 26 +++++++++--------------
>> 1 file changed, 10 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index fdae8c411aaa..79ee2f6e09da 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -1258,7 +1258,8 @@ amdgpu_userq_evict_all(struct amdgpu_userq_mgr *uq_mgr)
>> }
>>
>> if (ret)
>> - drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
>> + drm_file_err(uq_mgr->file,
>> + "Couldn't unmap all the queues, eviction failed ret=%d\n", ret);
>> return ret;
>> }
>>
>> @@ -1289,13 +1290,14 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>> xa_for_each(&uq_mgr->userq_xa, queue_id, queue) {
>> struct dma_fence *f = queue->last_fence;
>>
>> - if (!f || dma_fence_is_signaled(f))
>> + if (!f)
>> continue;
>>
>> - ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>> + ret = dma_fence_wait(f, false);
>> if (ret <= 0) {
>> - drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
>> - f->context, f->seqno);
>> + drm_file_err(uq_mgr->file,
>> + "Timed out in wait_for_signal fence=%llu:%llu ret=%d\n",
>> + f->context, f->seqno, ret);
>>
>> return -ETIMEDOUT;
>> }
> You can completely drop this. dma_fence_wait() will never return any error when used like this.
>
> Apart from that the patch looks correct to me.
Got it
Regards
Sunil
>
> Regards,
> Christian.
>
>
>> @@ -1307,18 +1309,10 @@ amdgpu_userq_wait_for_signal(struct amdgpu_userq_mgr *uq_mgr)
>> void
>> amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr)
>> {
>> - struct amdgpu_device *adev = uq_mgr->adev;
>> - int ret;
>> -
>> /* Wait for any pending userqueue fence work to finish */
>> - ret = amdgpu_userq_wait_for_signal(uq_mgr);
>> - if (ret)
>> - dev_err(adev->dev, "Not evicting userqueue, timeout waiting for work\n");
>> -
>> - ret = amdgpu_userq_evict_all(uq_mgr);
>> - if (ret)
>> - dev_err(adev->dev, "Failed to evict userqueue\n");
>> -
>> + amdgpu_userq_wait_for_signal(uq_mgr);
>> + /* unmaps all the queues */
>> + amdgpu_userq_evict_all(uq_mgr);
>> }
>>
>> int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *file_priv,
* Re: [Patch v4 3/4] drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
2026-03-31 11:51 ` Christian König
@ 2026-03-31 11:57 ` Khatri, Sunil
0 siblings, 0 replies; 9+ messages in thread
From: Khatri, Sunil @ 2026-03-31 11:57 UTC (permalink / raw)
To: Christian König, Sunil Khatri, Alex Deucher; +Cc: amd-gfx
On 31-03-2026 05:21 pm, Christian König wrote:
> On 3/31/26 09:49, Sunil Khatri wrote:
>> In function amdgpu_userq_gem_va_unmap_validate call
>> dma_resv_wait_timeout directly.
>>
>> Suggested-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++---------
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
>> 2 files changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index c85a4f4eefcf..0ef829065403 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -1480,7 +1480,6 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>> u32 ip_mask = amdgpu_userq_get_supported_ip_mask(adev);
>> struct amdgpu_bo_va *bo_va = mapping->bo_va;
>> struct dma_resv *resv = bo_va->base.bo->tbo.base.resv;
>> - int ret = 0;
>>
>> if (!ip_mask)
>> return 0;
>> @@ -1494,14 +1493,8 @@ int amdgpu_userq_gem_va_unmap_validate(struct amdgpu_device *adev,
>> * unmap is only for one kind of userq VAs, so at this point suppose
>> * the eviction fence is always unsignaled.
>> */
>> - if (!dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) {
>> - ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, true,
>> - MAX_SCHEDULE_TIMEOUT);
>> - if (ret <= 0)
>> - return -EBUSY;
>> - }
>> -
>> - return 0;
>> + return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
>> + true, MAX_SCHEDULE_TIMEOUT);
> That wait can never fail and so never return an error.
>
> Just return 0 here or even better drop the return value.
Do we want to return it and check for -ERESTARTSYS in the caller?
Regards
Sunil Khatri
>
> Regards,
> Christian.
>
>> }
>>
>> void amdgpu_userq_pre_reset(struct amdgpu_device *adev)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 937a6dd3a4b5..43a7cb2d5db9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -2005,9 +2005,9 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>> */
>> if (unlikely(atomic_read(&bo_va->userq_va_mapped) > 0)) {
>> r = amdgpu_userq_gem_va_unmap_validate(adev, mapping, saddr);
>> - if (unlikely(r == -EBUSY))
>> + if (r <= 0 && r != -ERESTARTSYS)
>> dev_warn_once(adev->dev,
>> - "Attempt to unmap an active userq buffer\n");
>> + "Attempt to unmap an active userq buffer ret=%d\n", r);
>> }
>>
>> list_del(&mapping->list);