* [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences
@ 2025-04-30 2:40 Prike Liang
2025-04-30 2:40 ` [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence Prike Liang
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Prike Liang @ 2025-04-30 2:40 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig, Prike Liang
The driver doesn't want to implicitly sync on the DMA_RESV_USAGE_BOOKKEEP
usage fences, and the BOOKEEP fences should be synced explicitly. So, as
the VM implicit syncing only need to return and sync the dependent read
fences.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 5576ed0b508f..d6ae9974c952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -249,9 +249,8 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
if (resv == NULL)
return -EINVAL;
-
- /* TODO: Use DMA_RESV_USAGE_READ here */
- dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP, f) {
+ /* Implicitly sync only to KERNEL, WRITE and READ */
+ dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, f) {
dma_fence_chain_for_each(f, f) {
struct dma_fence *tmp = dma_fence_chain_contained(f);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence
2025-04-30 2:40 [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences Prike Liang
@ 2025-04-30 2:40 ` Prike Liang
2025-04-30 11:56 ` Christian König
2025-04-30 2:40 ` [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference Prike Liang
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Prike Liang @ 2025-04-30 2:40 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig, Prike Liang
Don't return and sync the user queue eviction fence;
otherwise, the eviction fence will be returned as a
dependent fence during VM update and refer to the fence
result in leakage.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 11 +++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 4 ++++
3 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index d86e611a9ff4..6c9b2b43a929 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -224,6 +224,17 @@ void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
dma_fence_put(stub);
}
+bool amdgpu_eviction_fence_valid(struct dma_fence *f)
+{
+
+ if(!f)
+ return false;
+ if (f->ops == &amdgpu_eviction_fence_ops)
+ return true;
+
+ return false;
+}
+
int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
{
/* This needs to be done one time per open */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
index fcd867b7147d..d4e1975cac71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -42,6 +42,7 @@ struct amdgpu_eviction_fence_mgr {
};
/* Eviction fence helper functions */
+bool amdgpu_eviction_fence_valid(struct dma_fence *f);
struct amdgpu_eviction_fence *
amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index d6ae9974c952..8ac685eb1be1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -185,6 +185,10 @@ static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
{
void *fence_owner = amdgpu_sync_get_owner(f);
+ /* don't sync the kgd userq eviction fence*/
+ if(amdgpu_eviction_fence_valid(f))
+ return false;
+
/* Always sync to moves, no matter what */
if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
return true;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference
2025-04-30 2:40 [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences Prike Liang
2025-04-30 2:40 ` [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence Prike Liang
@ 2025-04-30 2:40 ` Prike Liang
2025-04-30 11:58 ` Christian König
2025-04-30 2:40 ` [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching Prike Liang
2025-04-30 2:40 ` [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it Prike Liang
3 siblings, 1 reply; 17+ messages in thread
From: Prike Liang @ 2025-04-30 2:40 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander.Deucher, Christian.Koenig, Prike Liang,
Christian König
The dma_resv_add_fence() already refers to the added fence.
So when attaching the evciton fence to the gem bo, it needn't
refer to it anymore.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 6c9b2b43a929..7a5f02ef45a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -189,7 +189,6 @@ void amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr)
int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
struct amdgpu_bo *bo)
{
- struct dma_fence *ef;
struct amdgpu_eviction_fence *ev_fence;
struct dma_resv *resv = bo->tbo.base.resv;
int ret;
@@ -205,10 +204,8 @@ int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
spin_lock(&evf_mgr->ev_fence_lock);
ev_fence = evf_mgr->ev_fence;
- if (ev_fence) {
- ef = dma_fence_get(&ev_fence->base);
- dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
- }
+ if (ev_fence)
+ dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
spin_unlock(&evf_mgr->ev_fence_lock);
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-04-30 2:40 [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences Prike Liang
2025-04-30 2:40 ` [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence Prike Liang
2025-04-30 2:40 ` [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference Prike Liang
@ 2025-04-30 2:40 ` Prike Liang
2025-04-30 12:01 ` Christian König
2025-04-30 2:40 ` [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it Prike Liang
3 siblings, 1 reply; 17+ messages in thread
From: Prike Liang @ 2025-04-30 2:40 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig, Prike Liang
Before the user queue BOs resume workqueue is scheduled;
there's no valid eviction fence to attach the gem obj.
For this case, it doesn't need to attach/detach the eviction
fence. Also, it needs to unlock the bo first before returning
from the eviction fence attached error.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
---
.../drm/amd/amdgpu/amdgpu_eviction_fence.c | 22 +++++++++++++++----
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 ++-
2 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 7a5f02ef45a7..242bfb91c4f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -196,16 +196,22 @@ int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
if (!resv)
return 0;
+ /* Validate the eviction fence first */
+ spin_lock(&evf_mgr->ev_fence_lock);
+ ev_fence = evf_mgr->ev_fence;
+ if (!ev_fence ||
+ dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
+ spin_unlock(&evf_mgr->ev_fence_lock);
+ return 0;
+ }
+
ret = dma_resv_reserve_fences(resv, 1);
if (ret) {
DRM_DEBUG_DRIVER("Failed to resv fence space\n");
return ret;
}
- spin_lock(&evf_mgr->ev_fence_lock);
- ev_fence = evf_mgr->ev_fence;
- if (ev_fence)
- dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
+ dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
spin_unlock(&evf_mgr->ev_fence_lock);
return 0;
@@ -216,6 +222,14 @@ void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
{
struct dma_fence *stub = dma_fence_get_stub();
+ spin_lock(&evf_mgr->ev_fence_lock);
+ if (!evf_mgr->ev_fence ||
+ dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
+ spin_unlock(&evf_mgr->ev_fence_lock);
+ return;
+ }
+ spin_unlock(&evf_mgr->ev_fence_lock);
+
dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
stub, DMA_RESV_USAGE_BOOKKEEP);
dma_fence_put(stub);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index f03fc3cf4d50..86779dc817b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -294,10 +294,11 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
else
++bo_va->ref_count;
- /* attach gfx eviction fence */
+ /* attach gfx the validated eviction fence */
r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
if (r) {
DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
+ amdgpu_bo_unreserve(abo);
return r;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it
2025-04-30 2:40 [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences Prike Liang
` (2 preceding siblings ...)
2025-04-30 2:40 ` [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching Prike Liang
@ 2025-04-30 2:40 ` Prike Liang
2025-04-30 12:03 ` Christian König
3 siblings, 1 reply; 17+ messages in thread
From: Prike Liang @ 2025-04-30 2:40 UTC (permalink / raw)
To: amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig, Prike Liang
Lock the eviction fence before trying to signal it.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 242bfb91c4f7..fed065892568 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -108,7 +108,9 @@ amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
struct amdgpu_eviction_fence *ev_fence;
mutex_lock(&uq_mgr->userq_mutex);
+ spin_lock(&evf_mgr->ev_fence_lock);
ev_fence = evf_mgr->ev_fence;
+ spin_unlock(&evf_mgr->ev_fence_lock);
if (!ev_fence)
goto unlock;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence
2025-04-30 2:40 ` [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence Prike Liang
@ 2025-04-30 11:56 ` Christian König
2025-05-06 2:09 ` Liang, Prike
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2025-04-30 11:56 UTC (permalink / raw)
To: Prike Liang, amd-gfx; +Cc: Alexander.Deucher
On 4/30/25 04:40, Prike Liang wrote:
> Don't return and sync the user queue eviction fence;
> otherwise, the eviction fence will be returned as a
> dependent fence during VM update and refer to the fence
> result in leakage.
Please drop that patch, it shouldn't be needed any more after the changes in patch #1.
Regards,
Christian.
>
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 11 +++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 4 ++++
> 3 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index d86e611a9ff4..6c9b2b43a929 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -224,6 +224,17 @@ void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> dma_fence_put(stub);
> }
>
> +bool amdgpu_eviction_fence_valid(struct dma_fence *f)
> +{
> +
> + if(!f)
> + return false;
> + if (f->ops == &amdgpu_eviction_fence_ops)
> + return true;
> +
> + return false;
> +}
> +
> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr *evf_mgr)
> {
> /* This needs to be done one time per open */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> index fcd867b7147d..d4e1975cac71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> @@ -42,6 +42,7 @@ struct amdgpu_eviction_fence_mgr {
> };
>
> /* Eviction fence helper functions */
> +bool amdgpu_eviction_fence_valid(struct dma_fence *f);
> struct amdgpu_eviction_fence *
> amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr *evf_mgr);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index d6ae9974c952..8ac685eb1be1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -185,6 +185,10 @@ static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> {
> void *fence_owner = amdgpu_sync_get_owner(f);
>
> + /* don't sync the kgd userq eviction fence*/
> + if(amdgpu_eviction_fence_valid(f))
> + return false;
> +
> /* Always sync to moves, no matter what */
> if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> return true;
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference
2025-04-30 2:40 ` [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference Prike Liang
@ 2025-04-30 11:58 ` Christian König
2025-05-06 2:19 ` Liang, Prike
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2025-04-30 11:58 UTC (permalink / raw)
To: Prike Liang, amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig
On 4/30/25 04:40, Prike Liang wrote:
> The dma_resv_add_fence() already refers to the added fence.
> So when attaching the evciton fence to the gem bo, it needn't
> refer to it anymore.
>
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
This is a bug fix and as such should always come as first patch in a series.
Please make sure to commit this one to amd-staging-drm-next ASAP.
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 6c9b2b43a929..7a5f02ef45a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -189,7 +189,6 @@ void amdgpu_eviction_fence_destroy(struct amdgpu_eviction_fence_mgr *evf_mgr)
> int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> struct amdgpu_bo *bo)
> {
> - struct dma_fence *ef;
> struct amdgpu_eviction_fence *ev_fence;
> struct dma_resv *resv = bo->tbo.base.resv;
> int ret;
> @@ -205,10 +204,8 @@ int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
>
> spin_lock(&evf_mgr->ev_fence_lock);
> ev_fence = evf_mgr->ev_fence;
> - if (ev_fence) {
> - ef = dma_fence_get(&ev_fence->base);
> - dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
> - }
> + if (ev_fence)
> + dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
> spin_unlock(&evf_mgr->ev_fence_lock);
>
> return 0;
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-04-30 2:40 ` [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching Prike Liang
@ 2025-04-30 12:01 ` Christian König
2025-05-06 8:22 ` Liang, Prike
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2025-04-30 12:01 UTC (permalink / raw)
To: Prike Liang, amd-gfx; +Cc: Alexander.Deucher, Christian.Koenig
On 4/30/25 04:40, Prike Liang wrote:
> Before the user queue BOs resume workqueue is scheduled;
> there's no valid eviction fence to attach the gem obj.
> For this case, it doesn't need to attach/detach the eviction
> fence. Also, it needs to unlock the bo first before returning
> from the eviction fence attached error.
>
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> ---
> .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 22 +++++++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 ++-
> 2 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 7a5f02ef45a7..242bfb91c4f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -196,16 +196,22 @@ int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> if (!resv)
> return 0;
>
> + /* Validate the eviction fence first */
> + spin_lock(&evf_mgr->ev_fence_lock);
> + ev_fence = evf_mgr->ev_fence;
> + if (!ev_fence ||
> + dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
> + spin_unlock(&evf_mgr->ev_fence_lock);
> + return 0;
> + }
> +
> ret = dma_resv_reserve_fences(resv, 1);
> if (ret) {
> DRM_DEBUG_DRIVER("Failed to resv fence space\n");
> return ret;
> }
>
> - spin_lock(&evf_mgr->ev_fence_lock);
> - ev_fence = evf_mgr->ev_fence;
> - if (ev_fence)
> - dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
> + dma_resv_add_fence(resv, &ev_fence->base, DMA_RESV_USAGE_BOOKKEEP);
Once more: Absolutely clear NAK to that! You are breaking the logic here.
> spin_unlock(&evf_mgr->ev_fence_lock);
>
> return 0;
> @@ -216,6 +222,14 @@ void amdgpu_eviction_fence_detach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> {
> struct dma_fence *stub = dma_fence_get_stub();
>
> + spin_lock(&evf_mgr->ev_fence_lock);
> + if (!evf_mgr->ev_fence ||
> + dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
> + spin_unlock(&evf_mgr->ev_fence_lock);
> + return;
> + }
> + spin_unlock(&evf_mgr->ev_fence_lock);
> +
That is unnecessary as far as I can see.
> dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
> stub, DMA_RESV_USAGE_BOOKKEEP);
> dma_fence_put(stub);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index f03fc3cf4d50..86779dc817b9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -294,10 +294,11 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
> else
> ++bo_va->ref_count;
>
> - /* attach gfx eviction fence */
> + /* attach gfx the validated eviction fence */
> r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
> if (r) {
> DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> + amdgpu_bo_unreserve(abo);
Adding this here looks like the only valid fix in the patch.
Regards,
Christian.
> return r;
> }
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it
2025-04-30 2:40 ` [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it Prike Liang
@ 2025-04-30 12:03 ` Christian König
0 siblings, 0 replies; 17+ messages in thread
From: Christian König @ 2025-04-30 12:03 UTC (permalink / raw)
To: Prike Liang, amd-gfx; +Cc: Alexander.Deucher
On 4/30/25 04:40, Prike Liang wrote:
> Lock the eviction fence before trying to signal it.
>
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 242bfb91c4f7..fed065892568 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -108,7 +108,9 @@ amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
> struct amdgpu_eviction_fence *ev_fence;
>
> mutex_lock(&uq_mgr->userq_mutex);
> + spin_lock(&evf_mgr->ev_fence_lock);
> ev_fence = evf_mgr->ev_fence;
> + spin_unlock(&evf_mgr->ev_fence_lock);
That's a good catch, but won't work like this.
You need to grab a reference to the fence while holding the lock, e.g. something like dma_fence_get(evf_mgr->ev_fence);
Regards,
Christian.
> if (!ev_fence)
> goto unlock;
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence
2025-04-30 11:56 ` Christian König
@ 2025-05-06 2:09 ` Liang, Prike
2025-05-06 8:23 ` Christian König
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Prike @ 2025-05-06 2:09 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
[Public]
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Wednesday, April 30, 2025 7:57 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction
> fence
>
> On 4/30/25 04:40, Prike Liang wrote:
> > Don't return and sync the user queue eviction fence; otherwise, the
> > eviction fence will be returned as a dependent fence during VM update
> > and refer to the fence result in leakage.
>
> Please drop that patch, it shouldn't be needed any more after the changes in
> patch #1.
Yes, may I get patch#1( drm/amdgpu: promote the implicit sync to the dependent read fences) reviewed?
> Regards,
> Christian.
>
> >
> > Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 11 +++++++++++
> > drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
> > drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 4 ++++
> > 3 files changed, 16 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > index d86e611a9ff4..6c9b2b43a929 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > @@ -224,6 +224,17 @@ void amdgpu_eviction_fence_detach(struct
> amdgpu_eviction_fence_mgr *evf_mgr,
> > dma_fence_put(stub);
> > }
> >
> > +bool amdgpu_eviction_fence_valid(struct dma_fence *f) {
> > +
> > + if(!f)
> > + return false;
> > + if (f->ops == &amdgpu_eviction_fence_ops)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
> > *evf_mgr) {
> > /* This needs to be done one time per open */ diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> > index fcd867b7147d..d4e1975cac71 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> > @@ -42,6 +42,7 @@ struct amdgpu_eviction_fence_mgr { };
> >
> > /* Eviction fence helper functions */
> > +bool amdgpu_eviction_fence_valid(struct dma_fence *f);
> > struct amdgpu_eviction_fence *
> > amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr
> > *evf_mgr);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > index d6ae9974c952..8ac685eb1be1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > @@ -185,6 +185,10 @@ static bool amdgpu_sync_test_fence(struct
> > amdgpu_device *adev, {
> > void *fence_owner = amdgpu_sync_get_owner(f);
> >
> > + /* don't sync the kgd userq eviction fence*/
> > + if(amdgpu_eviction_fence_valid(f))
> > + return false;
> > +
> > /* Always sync to moves, no matter what */
> > if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > return true;
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference
2025-04-30 11:58 ` Christian König
@ 2025-05-06 2:19 ` Liang, Prike
0 siblings, 0 replies; 17+ messages in thread
From: Liang, Prike @ 2025-05-06 2:19 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
[Public]
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Wednesday, April 30, 2025 7:58 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference
>
> On 4/30/25 04:40, Prike Liang wrote:
> > The dma_resv_add_fence() already refers to the added fence.
> > So when attaching the evciton fence to the gem bo, it needn't refer to
> > it anymore.
> >
> > Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
>
> This is a bug fix and as such should always come as first patch in a series.
>
> Please make sure to commit this one to amd-staging-drm-next ASAP.
Thank you for the reminder. The eviction fence release total fixes need to include the patch#1, but this fix can be pushed separately. I have pushed it now.
> Regards,
> Christian.
>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > index 6c9b2b43a929..7a5f02ef45a7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > @@ -189,7 +189,6 @@ void amdgpu_eviction_fence_destroy(struct
> > amdgpu_eviction_fence_mgr *evf_mgr) int
> amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence_mgr *evf_mgr,
> > struct amdgpu_bo *bo)
> > {
> > - struct dma_fence *ef;
> > struct amdgpu_eviction_fence *ev_fence;
> > struct dma_resv *resv = bo->tbo.base.resv;
> > int ret;
> > @@ -205,10 +204,8 @@ int amdgpu_eviction_fence_attach(struct
> > amdgpu_eviction_fence_mgr *evf_mgr,
> >
> > spin_lock(&evf_mgr->ev_fence_lock);
> > ev_fence = evf_mgr->ev_fence;
> > - if (ev_fence) {
> > - ef = dma_fence_get(&ev_fence->base);
> > - dma_resv_add_fence(resv, ef, DMA_RESV_USAGE_BOOKKEEP);
> > - }
> > + if (ev_fence)
> > + dma_resv_add_fence(resv, &ev_fence->base,
> DMA_RESV_USAGE_BOOKKEEP);
> > spin_unlock(&evf_mgr->ev_fence_lock);
> >
> > return 0;
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-04-30 12:01 ` Christian König
@ 2025-05-06 8:22 ` Liang, Prike
2025-05-06 8:38 ` Christian König
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Prike @ 2025-05-06 8:22 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
[Public]
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Wednesday, April 30, 2025 8:02 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before
> attaching/detaching
>
>
>
> On 4/30/25 04:40, Prike Liang wrote:
> > Before the user queue BOs resume workqueue is scheduled; there's no
> > valid eviction fence to attach the gem obj.
> > For this case, it doesn't need to attach/detach the eviction fence.
> > Also, it needs to unlock the bo first before returning from the
> > eviction fence attached error.
> >
> > Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> > ---
> > .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 22 +++++++++++++++----
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 ++-
> > 2 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > index 7a5f02ef45a7..242bfb91c4f7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> > @@ -196,16 +196,22 @@ int amdgpu_eviction_fence_attach(struct
> amdgpu_eviction_fence_mgr *evf_mgr,
> > if (!resv)
> > return 0;
> >
> > + /* Validate the eviction fence first */
> > + spin_lock(&evf_mgr->ev_fence_lock);
> > + ev_fence = evf_mgr->ev_fence;
> > + if (!ev_fence ||
> > + dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
> > + spin_unlock(&evf_mgr->ev_fence_lock);
> > + return 0;
> > + }
> > +
> > ret = dma_resv_reserve_fences(resv, 1);
> > if (ret) {
> > DRM_DEBUG_DRIVER("Failed to resv fence space\n");
> > return ret;
> > }
> >
> > - spin_lock(&evf_mgr->ev_fence_lock);
> > - ev_fence = evf_mgr->ev_fence;
> > - if (ev_fence)
> > - dma_resv_add_fence(resv, &ev_fence->base,
> DMA_RESV_USAGE_BOOKKEEP);
> > + dma_resv_add_fence(resv, &ev_fence->base,
> DMA_RESV_USAGE_BOOKKEEP);
>
> Once more: Absolutely clear NAK to that! You are breaking the logic here.
>
>
> > spin_unlock(&evf_mgr->ev_fence_lock);
> >
> > return 0;
> > @@ -216,6 +222,14 @@ void amdgpu_eviction_fence_detach(struct
> > amdgpu_eviction_fence_mgr *evf_mgr, {
> > struct dma_fence *stub = dma_fence_get_stub();
> >
> > + spin_lock(&evf_mgr->ev_fence_lock);
> > + if (!evf_mgr->ev_fence ||
> > + dma_fence_is_signaled(&evf_mgr->ev_fence->base)) {
> > + spin_unlock(&evf_mgr->ev_fence_lock);
> > + return;
> > + }
> > + spin_unlock(&evf_mgr->ev_fence_lock);
> > +
> That is unnecessary as far as I can see.
>
> > dma_resv_replace_fences(bo->tbo.base.resv, evf_mgr->ev_fence_ctx,
> > stub, DMA_RESV_USAGE_BOOKKEEP);
> > dma_fence_put(stub);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > index f03fc3cf4d50..86779dc817b9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > @@ -294,10 +294,11 @@ static int amdgpu_gem_object_open(struct
> drm_gem_object *obj,
> > else
> > ++bo_va->ref_count;
> >
> > - /* attach gfx eviction fence */
> > + /* attach gfx the validated eviction fence */
> > r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
> > if (r) {
> > DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> > + amdgpu_bo_unreserve(abo);
> Adding this here looks like the only valid fix in the patch.
As the eviction fence will be invalidated until the user queue is created from the user space, here it requires validating the eviction fence before trying to attach and detach it to the reservation.
I will try to draft a patch for validating the eviction fence at attach/detach separately with this attach error handler change.
Thanks,
Prike
>
> Regards,
> Christian.
>
> > return r;
> > }
> >
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence
2025-05-06 2:09 ` Liang, Prike
@ 2025-05-06 8:23 ` Christian König
2025-05-06 8:59 ` Liang, Prike
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2025-05-06 8:23 UTC (permalink / raw)
To: Liang, Prike, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
On 5/6/25 04:09, Liang, Prike wrote:
> [Public]
>
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Wednesday, April 30, 2025 7:57 PM
>> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
>> Subject: Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction
>> fence
>>
>> On 4/30/25 04:40, Prike Liang wrote:
>>> Don't return and sync the user queue eviction fence; otherwise, the
>>> eviction fence will be returned as a dependent fence during VM update
>>> and refer to the fence result in leakage.
>>
>> Please drop that patch, it shouldn't be needed any more after the changes in
>> patch #1.
>
> Yes, may I get patch#1( drm/amdgpu: promote the implicit sync to the dependent read fences) reviewed?
Sorry, I thought I've already done that.
Feel free to add Reviewed-by: Christian König <christian.koenig@amd.com> to that patch.
Regards,
Christian.
>
>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 11 +++++++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 4 ++++
>>> 3 files changed, 16 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>> index d86e611a9ff4..6c9b2b43a929 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>>> @@ -224,6 +224,17 @@ void amdgpu_eviction_fence_detach(struct
>> amdgpu_eviction_fence_mgr *evf_mgr,
>>> dma_fence_put(stub);
>>> }
>>>
>>> +bool amdgpu_eviction_fence_valid(struct dma_fence *f) {
>>> +
>>> + if(!f)
>>> + return false;
>>> + if (f->ops == &amdgpu_eviction_fence_ops)
>>> + return true;
>>> +
>>> + return false;
>>> +}
>>> +
>>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
>>> *evf_mgr) {
>>> /* This needs to be done one time per open */ diff --git
>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>> index fcd867b7147d..d4e1975cac71 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>> @@ -42,6 +42,7 @@ struct amdgpu_eviction_fence_mgr { };
>>>
>>> /* Eviction fence helper functions */
>>> +bool amdgpu_eviction_fence_valid(struct dma_fence *f);
>>> struct amdgpu_eviction_fence *
>>> amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr
>>> *evf_mgr);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> index d6ae9974c952..8ac685eb1be1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> @@ -185,6 +185,10 @@ static bool amdgpu_sync_test_fence(struct
>>> amdgpu_device *adev, {
>>> void *fence_owner = amdgpu_sync_get_owner(f);
>>>
>>> + /* don't sync the kgd userq eviction fence*/
>>> + if(amdgpu_eviction_fence_valid(f))
>>> + return false;
>>> +
>>> /* Always sync to moves, no matter what */
>>> if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
>>> return true;
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-05-06 8:22 ` Liang, Prike
@ 2025-05-06 8:38 ` Christian König
2025-05-08 7:08 ` Liang, Prike
0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2025-05-06 8:38 UTC (permalink / raw)
To: Liang, Prike, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
On 5/6/25 10:22, Liang, Prike wrote:
>>> - /* attach gfx eviction fence */
>>> + /* attach gfx the validated eviction fence */
>>> r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
>>> if (r) {
>>> DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
>>> + amdgpu_bo_unreserve(abo);
>> Adding this here looks like the only valid fix in the patch.
> As the eviction fence will be invalidated until the user queue is created from the user space, here it requires validating the eviction fence before trying to attach and detach it to the reservation.
> I will try to draft a patch for validating the eviction fence at attach/detach separately with this attach error handler change.
No, that is clearly incorrect.
See the eviction fence works like this:
Validating thread
* Create new eviction fence
* Publish eviction fence
* Lock all BOs
* Replace eviction fence
Attaching:
* Lock BO
* Attach current eviction fence
* Unlock BO
Detaching:
* Lock BO
* Unconditionally detach all possible eviction fences, no matter if new or old.
* Unlock BO
This order is necessary or otherwise you break the logic here.
Any additional check will completely mess that up because it makes the operation racy.
Regards,
Christian.
>
> Thanks,
> Prike
>
>>
>> Regards,
>> Christian.
>>
>>> return r;
>>> }
>>>
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence
2025-05-06 8:23 ` Christian König
@ 2025-05-06 8:59 ` Liang, Prike
0 siblings, 0 replies; 17+ messages in thread
From: Liang, Prike @ 2025-05-06 8:59 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
[Public]
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Tuesday, May 6, 2025 4:23 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction
> fence
>
> On 5/6/25 04:09, Liang, Prike wrote:
> > [Public]
> >
> >> From: Koenig, Christian <Christian.Koenig@amd.com>
> >> Sent: Wednesday, April 30, 2025 7:57 PM
> >> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> >> Subject: Re: [PATCH v3 2/5] drm/amdgpu: don't sync the user queue
> >> eviction fence
> >>
> >> On 4/30/25 04:40, Prike Liang wrote:
> >>> Don't return and sync the user queue eviction fence; otherwise, the
> >>> eviction fence will be returned as a dependent fence during VM
> >>> update and refer to the fence result in leakage.
> >>
> >> Please drop that patch, it shouldn't be needed any more after the
> >> changes in patch #1.
> >
> > Yes, may I get patch#1( drm/amdgpu: promote the implicit sync to the
> dependent read fences) reviewed?
>
> Sorry, I thought I've already done that.
>
> Feel free to add Reviewed-by: Christian König <christian.koenig@amd.com> to
> that patch.
>
> Regards,
> Christian.
Thanks, I have pushed the eviction fence release fixes.
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 11 +++++++++++
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h | 1 +
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 4 ++++
> >>> 3 files changed, 16 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> index d86e611a9ff4..6c9b2b43a929 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> @@ -224,6 +224,17 @@ void amdgpu_eviction_fence_detach(struct
> >> amdgpu_eviction_fence_mgr *evf_mgr,
> >>> dma_fence_put(stub);
> >>> }
> >>>
> >>> +bool amdgpu_eviction_fence_valid(struct dma_fence *f) {
> >>> +
> >>> + if(!f)
> >>> + return false;
> >>> + if (f->ops == &amdgpu_eviction_fence_ops)
> >>> + return true;
> >>> +
> >>> + return false;
> >>> +}
> >>> +
> >>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
> >>> *evf_mgr) {
> >>> /* This needs to be done one time per open */ diff --git
> >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> index fcd867b7147d..d4e1975cac71 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> @@ -42,6 +42,7 @@ struct amdgpu_eviction_fence_mgr { };
> >>>
> >>> /* Eviction fence helper functions */
> >>> +bool amdgpu_eviction_fence_valid(struct dma_fence *f);
> >>> struct amdgpu_eviction_fence *
> >>> amdgpu_eviction_fence_create(struct amdgpu_eviction_fence_mgr
> >>> *evf_mgr);
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>> index d6ae9974c952..8ac685eb1be1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >>> @@ -185,6 +185,10 @@ static bool amdgpu_sync_test_fence(struct
> >>> amdgpu_device *adev, {
> >>> void *fence_owner = amdgpu_sync_get_owner(f);
> >>>
> >>> + /* don't sync the kgd userq eviction fence*/
> >>> + if(amdgpu_eviction_fence_valid(f))
> >>> + return false;
> >>> +
> >>> /* Always sync to moves, no matter what */
> >>> if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> >>> return true;
> >
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-05-06 8:38 ` Christian König
@ 2025-05-08 7:08 ` Liang, Prike
2025-05-08 9:40 ` Christian König
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Prike @ 2025-05-08 7:08 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
[Public]
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Tuesday, May 6, 2025 4:39 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before
> attaching/detaching
>
> On 5/6/25 10:22, Liang, Prike wrote:
> >>> - /* attach gfx eviction fence */
> >>> + /* attach gfx the validated eviction fence */
> >>> r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
> >>> if (r) {
> >>> DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
> >>> + amdgpu_bo_unreserve(abo);
> >> Adding this here looks like the only valid fix in the patch.
> > As the eviction fence will be invalidated until the user queue is created from the
> user space, here it requires validating the eviction fence before trying to attach
> and detach it to the reservation.
> > I will try to draft a patch for validating the eviction fence at attach/detach
> separately with this attach error handler change.
>
>
> No, that is clearly incorrect.
>
> See the eviction fence works like this:
>
> Validating thread
> * Create new eviction fence
> * Publish eviction fence
> * Lock all BOs
> * Replace eviction fence
>
> Attaching:
> * Lock BO
> * Attach current eviction fence
> * Unlock BO
>
> Detaching:
> * Lock BO
> * Unconditionally detach all possible eviction fences, no matter if new or old.
> * Unlock BO
>
> This order is necessary or otherwise you break the logic here.
>
> Any additional check will completely mess that up because it makes the operation
> racy.
As the user queue eviction fence doesn't create until user queue submission, the eviction fence will be NULL without userq submission. So do we still try to attach/detach the null eviction fence for the kernel queue case?
It's ok without validating the eviction fence or userqueue work before attach/detach the eviction fence, but it will cost cycles for walking over the reservation fences array in the dma_resv_reserve_fences() and dma_resv_replace_fences().
> Regards,
> Christian.
>
> >
> > Thanks,
> > Prike
> >
> >>
> >> Regards,
> >> Christian.
> >>
> >>> return r;
> >>> }
> >>>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching
2025-05-08 7:08 ` Liang, Prike
@ 2025-05-08 9:40 ` Christian König
0 siblings, 0 replies; 17+ messages in thread
From: Christian König @ 2025-05-08 9:40 UTC (permalink / raw)
To: Liang, Prike, amd-gfx@lists.freedesktop.org; +Cc: Deucher, Alexander
On 5/8/25 09:08, Liang, Prike wrote:
> [Public]
>
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Tuesday, May 6, 2025 4:39 PM
>> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
>> Subject: Re: [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before
>> attaching/detaching
>>
>> On 5/6/25 10:22, Liang, Prike wrote:
>>>>> - /* attach gfx eviction fence */
>>>>> + /* attach gfx the validated eviction fence */
>>>>> r = amdgpu_eviction_fence_attach(&fpriv->evf_mgr, abo);
>>>>> if (r) {
>>>>> DRM_DEBUG_DRIVER("Failed to attach eviction fence to BO\n");
>>>>> + amdgpu_bo_unreserve(abo);
>>>> Adding this here looks like the only valid fix in the patch.
>>> As the eviction fence will be invalidated until the user queue is created from the
>> user space, here it requires validating the eviction fence before trying to attach
>> and detach it to the reservation.
>>> I will try to draft a patch for validating the eviction fence at attach/detach
>> separately with this attach error handler change.
>>
>>
>> No, that is clearly incorrect.
>>
>> See the eviction fence works like this:
>>
>> Validating thread
>> * Create new eviction fence
>> * Publish eviction fence
>> * Lock all BOs
>> * Replace eviction fence
>>
>> Attaching:
>> * Lock BO
>> * Attach current eviction fence
>> * Unlock BO
>>
>> Detaching:
>> * Lock BO
>> * Unconditionally detach all possible eviction fences, no matter if new or old.
>> * Unlock BO
>>
>> This order is necessary or otherwise you break the logic here.
>>
>> Any additional check will completely mess that up because it makes the operation
>> racy.
> As the user queue eviction fence doesn't create until user queue submission, the eviction fence will be NULL without userq submission. So do we still try to attach/detach the null eviction fence for the kernel queue case?
Yes, the problem is that we can't check the eviction fence before we have taken the reservation lock.
Otherwise it can always be that there is an eviction fence created between the check and attaching it.
I also suggested before that the eviction fence is never NULL, we just start with a dummy stub fence (see function dma_fence_get_stub()). This way we can avoid all the NULL checks.
> It's ok without validating the eviction fence or userqueue work before attach/detach the eviction fence, but it will cost cycles for walking over the reservation fences array in the dma_resv_reserve_fences() and dma_resv_replace_fences().
That's completely irrelevant. Important is that we have the right sequence to not create a race condition.
Regards,
Christian.
>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Prike
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> return r;
>>>>> }
>>>>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-05-08 9:40 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-30 2:40 [PATCH v3 1/5] drm/amdgpu: promote the implicit sync to the dependent read fences Prike Liang
2025-04-30 2:40 ` [PATCH v3 2/5] drm/amdgpu: don't sync the user queue eviction fence Prike Liang
2025-04-30 11:56 ` Christian König
2025-05-06 2:09 ` Liang, Prike
2025-05-06 8:23 ` Christian König
2025-05-06 8:59 ` Liang, Prike
2025-04-30 2:40 ` [PATCH v3 3/5] drm/amdgpu: fix the eviction fence dereference Prike Liang
2025-04-30 11:58 ` Christian König
2025-05-06 2:19 ` Liang, Prike
2025-04-30 2:40 ` [PATCH v3 4/5] drm/amdgpu: validate the eviction fence before attaching/detaching Prike Liang
2025-04-30 12:01 ` Christian König
2025-05-06 8:22 ` Liang, Prike
2025-05-06 8:38 ` Christian König
2025-05-08 7:08 ` Liang, Prike
2025-05-08 9:40 ` Christian König
2025-04-30 2:40 ` [PATCH v3 5/5] drm/amdgpu: lock the eviction fence before signaling it Prike Liang
2025-04-30 12:03 ` Christian König
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox