* [PATCH 0/5] drm/exec: drm_exec polishing
@ 2026-03-31 9:20 Thomas Hellström
2026-03-31 9:20 ` [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse] Thomas Hellström
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
During the work towards enabling exhaustive eviction using full
ww locking in TTM, Christian indicated that the path for the
drm_exec moving forward was to be a full drm_exec helper with
things like userptr validation rather than a WW transaction
abstraction. The idea was then briefly discussed to craft a
WW transaction helper and then subclass that with drm_exec
with the idea that the WW transaction helper could be used in
TTM for eviction and for other uses that didn't mandate a full
exec sequence.
Regardless of whether that actually happens, this series
aims to clean up abuses of drm_exec internals in drivers
so that future development of drm_exec isn't blocked by
such driver usage. The exception is patch 3, which is only
a small readability cleanup.
Thomas Hellström (5):
drm/exec: Remove the index parameter from
drm_exec_for_each_locked_obj[_reverse]
drm/msm: Remove abuse of drm_exec internals
drm/exec: Make the drm_exec_until_all_locked() macro more readable
drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct
drm_exec::ticket
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 15 ++---
.../drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +-
drivers/gpu/drm/drm_exec.c | 6 +-
drivers/gpu/drm/drm_gpuvm.c | 3 +-
drivers/gpu/drm/msm/msm_gem.h | 1 +
drivers/gpu/drm/msm/msm_gem_submit.c | 4 +-
drivers/gpu/drm/xe/xe_validation.c | 4 +-
drivers/gpu/drm/xe/xe_validation.h | 2 +-
drivers/gpu/drm/xe/xe_vm.c | 3 +-
include/drm/drm_exec.h | 55 +++++++++++++------
11 files changed, 58 insertions(+), 42 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse]
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
@ 2026-03-31 9:20 ` Thomas Hellström
2026-03-31 9:29 ` Christian König
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
` (3 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Nobody makes any use of it. Possible internal future users can
instead use the _index variable. External users shouldn't use
it since the array it's pointing into is internal drm_exec state.
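For illustration, the new macro shape can be sketched in plain user-space C. All names below (obj_list, list_obj, for_each_obj, sum_objs) are hypothetical stand-ins for this sketch, not the drm_exec API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy container standing in for struct drm_exec (names hypothetical). */
struct obj_list {
	int *objects;
	unsigned long num_objects;
};

/* Range-checked accessor, like drm_exec_obj(): NULL once past the end. */
static int *list_obj(struct obj_list *list, unsigned long index)
{
	return index < list->num_objects ? &list->objects[index] : NULL;
}

/*
 * C99 lets the macro declare its own loop-scoped index, so callers no
 * longer pass one in; this is the shape the patch gives the iterators.
 */
#define for_each_obj(list, obj) \
	for (unsigned long _index = 0; ((obj) = list_obj(list, _index)); ++_index)

static int sum_objs(struct obj_list *list)
{
	int *obj, sum = 0;

	for_each_obj(list, obj)
		sum += *obj;
	return sum;
}
```

The caller now only supplies the object cursor, and the index no longer leaks into the caller's scope.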
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +--
drivers/gpu/drm/drm_exec.c | 6 ++----
drivers/gpu/drm/drm_gpuvm.c | 3 +--
drivers/gpu/drm/xe/xe_vm.c | 3 +--
include/drm/drm_exec.h | 14 ++++++--------
6 files changed, 14 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c048217615c1..c4ee19603460 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -850,7 +850,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
struct amdgpu_vm *vm = &fpriv->vm;
struct amdgpu_bo_list_entry *e;
struct drm_gem_object *obj;
- unsigned long index;
unsigned int i;
int r;
@@ -962,7 +961,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
goto out_free_user_pages;
}
- drm_exec_for_each_locked_object(&p->exec, index, obj) {
+ drm_exec_for_each_locked_object(&p->exec, obj) {
r = amdgpu_cs_bo_validate(p, gem_to_amdgpu_bo(obj));
if (unlikely(r))
goto out_free_user_pages;
@@ -1201,7 +1200,6 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
struct drm_gpu_scheduler *sched;
struct drm_gem_object *obj;
struct dma_fence *fence;
- unsigned long index;
unsigned int i;
int r;
@@ -1212,7 +1210,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
return r;
}
- drm_exec_for_each_locked_object(&p->exec, index, obj) {
+ drm_exec_for_each_locked_object(&p->exec, obj) {
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
struct dma_resv *resv = bo->tbo.base.resv;
@@ -1280,7 +1278,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_job *leader = p->gang_leader;
struct amdgpu_bo_list_entry *e;
struct drm_gem_object *gobj;
- unsigned long index;
unsigned int i;
uint64_t seq;
int r;
@@ -1330,7 +1327,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
}
p->fence = dma_fence_get(&leader->base.s_fence->finished);
- drm_exec_for_each_locked_object(&p->exec, index, gobj) {
+ drm_exec_for_each_locked_object(&p->exec, gobj) {
ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 4c5e38dea4c2..f6b7522c3c82 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -121,7 +121,6 @@ int amdgpu_evf_mgr_rearm(struct amdgpu_eviction_fence_mgr *evf_mgr,
{
struct amdgpu_eviction_fence *ev_fence;
struct drm_gem_object *obj;
- unsigned long index;
/* Create and initialize a new eviction fence */
ev_fence = kzalloc_obj(*ev_fence);
@@ -140,7 +139,7 @@ int amdgpu_evf_mgr_rearm(struct amdgpu_eviction_fence_mgr *evf_mgr,
evf_mgr->ev_fence = &ev_fence->base;
/* And add it to all existing BOs */
- drm_exec_for_each_locked_object(exec, index, obj) {
+ drm_exec_for_each_locked_object(exec, obj) {
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
amdgpu_evf_mgr_attach_fence(evf_mgr, bo);
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
index 8d0601400182..746210f3f6c2 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -24,7 +24,6 @@
*
* struct drm_gem_object *obj;
* struct drm_exec exec;
- * unsigned long index;
* int ret;
*
* drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
@@ -40,7 +39,7 @@
* goto error;
* }
*
- * drm_exec_for_each_locked_object(&exec, index, obj) {
+ * drm_exec_for_each_locked_object(&exec, obj) {
* dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
* ...
* }
@@ -56,9 +55,8 @@
static void drm_exec_unlock_all(struct drm_exec *exec)
{
struct drm_gem_object *obj;
- unsigned long index;
- drm_exec_for_each_locked_object_reverse(exec, index, obj) {
+ drm_exec_for_each_locked_object_reverse(exec, obj) {
dma_resv_unlock(obj->resv);
drm_gem_object_put(obj);
}
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 44acfe4120d2..2e44671e05b1 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -1550,9 +1550,8 @@ drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
enum dma_resv_usage extobj_usage)
{
struct drm_gem_object *obj;
- unsigned long index;
- drm_exec_for_each_locked_object(exec, index, obj) {
+ drm_exec_for_each_locked_object(exec, obj) {
dma_resv_assert_held(obj->resv);
dma_resv_add_fence(obj->resv, fence,
drm_gpuvm_is_extobj(gpuvm, obj) ?
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 56e2db50bb36..30efd6721da1 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -373,7 +373,6 @@ int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
unsigned int num_fences)
{
struct drm_gem_object *obj;
- unsigned long index;
int ret;
do {
@@ -386,7 +385,7 @@ int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
return ret;
} while (!list_empty(&vm->gpuvm.evict.list));
- drm_exec_for_each_locked_object(exec, index, obj) {
+ drm_exec_for_each_locked_object(exec, obj) {
ret = dma_resv_reserve_fences(obj->resv, num_fences);
if (ret)
return ret;
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index aa786b828a0a..25db52dd2af0 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -68,28 +68,26 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
/**
* drm_exec_for_each_locked_object - iterate over all the locked objects
* @exec: drm_exec object
- * @index: unsigned long index for the iteration
* @obj: the current GEM object
*
* Iterate over all the locked GEM objects inside the drm_exec object.
*/
-#define drm_exec_for_each_locked_object(exec, index, obj) \
- for ((index) = 0; ((obj) = drm_exec_obj(exec, index)); ++(index))
+#define drm_exec_for_each_locked_object(exec, obj) \
+ for (unsigned long _index = 0; ((obj) = drm_exec_obj(exec, _index)); ++_index)
/**
* drm_exec_for_each_locked_object_reverse - iterate over all the locked
* objects in reverse locking order
* @exec: drm_exec object
- * @index: unsigned long index for the iteration
* @obj: the current GEM object
*
* Iterate over all the locked GEM objects inside the drm_exec object in
- * reverse locking order. Note that @index may go below zero and wrap,
+ * reverse locking order. Note that the internal index may wrap around,
* but that will be caught by drm_exec_obj(), returning a NULL object.
*/
-#define drm_exec_for_each_locked_object_reverse(exec, index, obj) \
- for ((index) = (exec)->num_objects - 1; \
- ((obj) = drm_exec_obj(exec, index)); --(index))
+#define drm_exec_for_each_locked_object_reverse(exec, obj) \
+ for (unsigned long _index = (exec)->num_objects - 1; \
+ ((obj) = drm_exec_obj(exec, _index)); --_index)
/**
* drm_exec_until_all_locked - loop until all GEM objects are locked
--
2.53.0
* [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
2026-03-31 9:20 ` [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse] Thomas Hellström
@ 2026-03-31 9:20 ` Thomas Hellström
2026-03-31 9:30 ` Christian König
` (2 more replies)
2026-03-31 9:20 ` [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable Thomas Hellström
` (2 subsequent siblings)
4 siblings, 3 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
The code was reading drm_exec internal state to determine whether
the drm_exec structure had been initialized, and therefore needed
cleaning up. This relied on undocumented behaviour.
Instead add a bool to struct msm_gem_submit to indicate whether
drm_exec cleaning up is needed.
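The pattern can be sketched in user-space C (all names here are hypothetical stand-ins, not the msm code): track initialization with an explicit flag set next to the init call, instead of probing the embedded object's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for struct msm_gem_submit with an embedded resource. */
struct submit {
	char *buf;	/* plays the role of struct drm_exec */
	bool has_buf;	/* set together with initialization */
};

static void submit_setup(struct submit *s, size_t n)
{
	s->buf = malloc(n);
	s->has_buf = true;
}

/* Safe to call whether or not submit_setup() ever ran. */
static void submit_cleanup(struct submit *s)
{
	if (s->has_buf) {	/* not: if (s->buf), which peeks at internals */
		free(s->buf);
		s->buf = NULL;
		s->has_buf = false;
	}
}
```

The cleanup test then depends only on state the struct's owner controls.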
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/msm/msm_gem.h | 1 +
drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index cb32093fda47..762e546d25ef 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -452,6 +452,7 @@ struct msm_gem_submit {
bool bos_pinned : 1;
bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
+ bool has_exec : 1; /* @exec is initialized. */
struct msm_ringbuffer *ring;
unsigned int nr_cmds;
unsigned int nr_bos;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 75d9f3574370..26ea8a28be47 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct msm_gem_submit *submit)
int ret = 0;
drm_exec_init(&submit->exec, flags, submit->nr_bos);
+ submit->has_exec = true;
drm_exec_until_all_locked (&submit->exec) {
ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
@@ -304,6 +305,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
return submit_lock_objects_vmbind(submit);
drm_exec_init(&submit->exec, flags, submit->nr_bos);
+ submit->has_exec = true;
drm_exec_until_all_locked (&submit->exec) {
ret = drm_exec_lock_obj(&submit->exec,
@@ -523,7 +525,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, bool error)
if (error)
submit_unpin_objects(submit);
- if (submit->exec.objects)
+ if (submit->has_exec)
drm_exec_fini(&submit->exec);
/* if job wasn't enqueued to scheduler, early retirement: */
--
2.53.0
* [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
2026-03-31 9:20 ` [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse] Thomas Hellström
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
@ 2026-03-31 9:20 ` Thomas Hellström
2026-03-31 9:39 ` Christian König
2026-03-31 9:20 ` [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer Thomas Hellström
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Use __UNIQUE_ID as done elsewhere in the kernel rather than a
hand-rolled __PASTE to craft a unique id.
Also use __maybe_unused rather than (void) to signify that a
variable, although written to, may not actually be used.
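A minimal user-space analog of the helper-plus-unique-label structure (names hypothetical; the kernel's __UNIQUE_ID also mixes in __COUNTER__, simplified here to line-based pasting; relies on the GNU C statement-expression and labels-as-values extensions):

```c
/* Simplified reimplementations of the kernel paste helpers. */
#define ___PASTE(a, b)		a##b
#define __PASTE(a, b)		___PASTE(a, b)
#define __UNIQUE_ID(prefix)	__PASTE(__PASTE(__, prefix), __LINE__)

static int passes_left;

/* Plays the role of drm_exec_cleanup(): true while another pass is needed. */
static int need_another_pass(void)
{
	return passes_left-- > 0;
}

/*
 * Same shape as the patched macro: an internal helper takes the label
 * name, so the public macro can generate it with __UNIQUE_ID.
 */
#define __retry_loop(_label) \
_label: \
	for (__attribute__((unused)) void *__retry_ptr; ({ \
		__retry_ptr = &&_label; \
		need_another_pass(); \
	});)

#define retry_loop() __retry_loop(__UNIQUE_ID(retry))

static int count_passes(int n)
{
	int runs = 0;

	passes_left = n;
	retry_loop() {
		runs++;
	}
	return runs;
}
```

Splitting the label generation into the caller-facing macro keeps the loop body readable while the label stays unique per expansion.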
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
include/drm/drm_exec.h | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 25db52dd2af0..fc95a979e253 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -89,6 +89,19 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
for (unsigned long _index = (exec)->num_objects - 1; \
((obj) = drm_exec_obj(exec, _index)); --_index)
+/*
+ * Helper to drm_exec_until_all_locked(). Don't use directly.
+ *
+ * Since labels can't be defined local to the loop's body we use a jump pointer
+ * to make sure that the retry is only used from within the loop's body.
+ */
+#define __drm_exec_until_all_locked(exec, _label) \
+_label: \
+ for (void * __maybe_unused __drm_exec_retry_ptr; ({ \
+ __drm_exec_retry_ptr = &&_label; \
+ drm_exec_cleanup(exec); \
+ });)
+
/**
* drm_exec_until_all_locked - loop until all GEM objects are locked
* @exec: drm_exec object
@@ -96,17 +109,9 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
* Core functionality of the drm_exec object. Loops until all GEM objects are
* locked and no more contention exists. At the beginning of the loop it is
* guaranteed that no GEM object is locked.
- *
- * Since labels can't be defined local to the loops body we use a jump pointer
- * to make sure that the retry is only used from within the loops body.
*/
#define drm_exec_until_all_locked(exec) \
-__PASTE(__drm_exec_, __LINE__): \
- for (void *__drm_exec_retry_ptr; ({ \
- __drm_exec_retry_ptr = &&__PASTE(__drm_exec_, __LINE__);\
- (void)__drm_exec_retry_ptr; \
- drm_exec_cleanup(exec); \
- });)
+ __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
/**
* drm_exec_retry_on_contention - restart the loop to grap all locks
--
2.53.0
* [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
` (2 preceding siblings ...)
2026-03-31 9:20 ` [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable Thomas Hellström
@ 2026-03-31 9:20 ` Thomas Hellström
2026-03-31 9:44 ` Christian König
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
The xe driver was using the drm_exec retry pointer directly to
restart the locking loop after out-of-memory errors. This relies
on undocumented behaviour.
Instead add a drm_exec_retry() macro that can be used in this
situation, and that also asserts that the struct drm_exec is
in a state compatible with retrying: either newly initialized
or in a contended state with all locks dropped.
Use that macro in xe.
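The assert-then-jump shape can be sketched in user-space C (hypothetical names throughout, and it needs the GNU C labels-as-values extension, as the kernel macro does):

```c
#include <assert.h>

static int contended;	/* stand-in for drm_exec_is_contended() state */

/*
 * Check the object is in a retry-compatible state, then jump back
 * through the loop's label pointer, mirroring drm_exec_retry().
 */
#define loop_retry(ptr) \
	do { \
		assert(contended); \
		goto *(ptr); \
	} while (0)

static int retry_once(void)
{
	int tries = 0;

again:
	tries++;
	if (tries < 2) {
		contended = 1;
		loop_retry(&&again);
	}
	return tries;
}
```

Wrapping the raw jump gives one place to enforce the state invariant instead of trusting every caller.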
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/xe/xe_validation.h | 2 +-
include/drm/drm_exec.h | 13 +++++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
index a30e732c4d51..4cd955ce6cd2 100644
--- a/drivers/gpu/drm/xe/xe_validation.h
+++ b/drivers/gpu/drm/xe/xe_validation.h
@@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
#define xe_validation_retry_on_oom(_ctx, _ret) \
do { \
if (xe_validation_should_retry(_ctx, _ret)) \
- goto *__drm_exec_retry_ptr; \
+ drm_exec_retry((_ctx)->exec); \
} while (0)
/**
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index fc95a979e253..5ed5be1f8244 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
return !!exec->contended;
}
+/**
+ * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
+ * @exec: drm_exec object
+ *
+ * Unconditionally retry the loop to lock all objects. For consistency,
+ * the exec object needs to be newly initialized or contended.
+ */
+#define drm_exec_retry(_exec) \
+ do { \
+ WARN_ON(!drm_exec_is_contended(_exec)); \
+ goto *__drm_exec_retry_ptr; \
+ } while (0)
+
void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
void drm_exec_fini(struct drm_exec *exec);
bool drm_exec_cleanup(struct drm_exec *exec);
--
2.53.0
* [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
` (3 preceding siblings ...)
2026-03-31 9:20 ` [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer Thomas Hellström
@ 2026-03-31 9:20 ` Thomas Hellström
2026-03-31 9:46 ` Christian König
` (3 more replies)
4 siblings, 4 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 9:20 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Felix Kuehling, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Drivers were accessing this drm_exec member directly.
Provide an accessor, drm_exec_ticket(), to avoid that.
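The accessor pattern in miniature (hypothetical names, not the drm_exec API): callers get a stable entry point while the member itself stays an implementation detail.

```c
#include <assert.h>

struct ticket {
	int seq;
};

struct exec {
	struct ticket ticket;	/* internal state, like drm_exec::ticket */
};

/* Inline accessor: callers use this instead of touching ->ticket. */
static inline struct ticket *exec_ticket(struct exec *e)
{
	return &e->ticket;
}
```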
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
drivers/gpu/drm/xe/xe_validation.c | 4 ++--
include/drm/drm_exec.h | 5 +++++
4 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 29b400cdd6d5..8a4fb9a62485 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2998,7 +2998,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
/* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
* validations above would invalidate DMABuf imports again.
*/
- ret = process_validate_vms(process_info, &exec.ticket);
+ ret = process_validate_vms(process_info, drm_exec_ticket(&exec));
if (ret) {
pr_debug("Validating VMs failed, ret: %d\n", ret);
goto validate_map_fail;
@@ -3039,7 +3039,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
goto validate_map_fail;
}
- ret = amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket);
+ ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(&exec));
if (ret) {
dev_dbg(adev->dev,
"Memory eviction: handle moved failed, pid %8d. Try again.\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c4ee19603460..c725a7976c63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1157,7 +1157,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
return r;
}
- r = amdgpu_vm_handle_moved(adev, vm, &p->exec.ticket);
+ r = amdgpu_vm_handle_moved(adev, vm, drm_exec_ticket(&p->exec));
if (r)
return r;
@@ -1358,7 +1358,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
cs->out.handle = seq;
leader->uf_sequence = seq;
- amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->exec.ticket);
+ amdgpu_vm_bo_trace_cs(&fpriv->vm, drm_exec_ticket(&p->exec));
for (i = 0; i < p->gang_size; ++i) {
amdgpu_job_free_resources(p->jobs[i]);
trace_amdgpu_cs_ioctl(p->jobs[i]);
@@ -1793,7 +1793,7 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
*map = mapping;
/* Double check that the BO is reserved by this CS */
- if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket)
+ if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != drm_exec_ticket(&parser->exec))
return -EINVAL;
/* Make sure VRAM is allocated contigiously */
diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
index a611438eaafe..8dff4d0ec895 100644
--- a/drivers/gpu/drm/xe/xe_validation.c
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -156,7 +156,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
/*
- * This abuses both drm_exec and ww_mutex internals and should be
+ * This abuses ww_mutex internals and should be
* replaced by checking for -EDEADLK when we can make TTM
* stop converting -EDEADLK to -ENOMEM.
* An alternative is to not have exhaustive eviction with
@@ -164,7 +164,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
*/
static bool xe_validation_contention_injected(struct drm_exec *exec)
{
- return !!exec->ticket.contending_lock;
+ return !!drm_exec_ticket(exec)->contending_lock;
}
#else
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 5ed5be1f8244..50d056a87de0 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -151,6 +151,11 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
goto *__drm_exec_retry_ptr; \
} while (0)
+static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
+{
+ return &exec->ticket;
+}
+
void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
void drm_exec_fini(struct drm_exec *exec);
bool drm_exec_cleanup(struct drm_exec *exec);
--
2.53.0
* Re: [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse]
2026-03-31 9:20 ` [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse] Thomas Hellström
@ 2026-03-31 9:29 ` Christian König
0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2026-03-31 9:29 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> Nobody makes any use of it. Possible internal future users can
> instead use the _index variable. External users shouldn't use
> it since the array it's pointing into is internal drm_exec state.
Yeah that was on my TODO list as well, just one more comment below.
>
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c | 3 +--
> drivers/gpu/drm/drm_exec.c | 6 ++----
> drivers/gpu/drm/drm_gpuvm.c | 3 +--
> drivers/gpu/drm/xe/xe_vm.c | 3 +--
> include/drm/drm_exec.h | 14 ++++++--------
> 6 files changed, 14 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index c048217615c1..c4ee19603460 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -850,7 +850,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> struct amdgpu_vm *vm = &fpriv->vm;
> struct amdgpu_bo_list_entry *e;
> struct drm_gem_object *obj;
> - unsigned long index;
> unsigned int i;
> int r;
>
> @@ -962,7 +961,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> goto out_free_user_pages;
> }
>
> - drm_exec_for_each_locked_object(&p->exec, index, obj) {
> + drm_exec_for_each_locked_object(&p->exec, obj) {
> r = amdgpu_cs_bo_validate(p, gem_to_amdgpu_bo(obj));
> if (unlikely(r))
> goto out_free_user_pages;
> @@ -1201,7 +1200,6 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> struct drm_gpu_scheduler *sched;
> struct drm_gem_object *obj;
> struct dma_fence *fence;
> - unsigned long index;
> unsigned int i;
> int r;
>
> @@ -1212,7 +1210,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> return r;
> }
>
> - drm_exec_for_each_locked_object(&p->exec, index, obj) {
> + drm_exec_for_each_locked_object(&p->exec, obj) {
> struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>
> struct dma_resv *resv = bo->tbo.base.resv;
> @@ -1280,7 +1278,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> struct amdgpu_job *leader = p->gang_leader;
> struct amdgpu_bo_list_entry *e;
> struct drm_gem_object *gobj;
> - unsigned long index;
> unsigned int i;
> uint64_t seq;
> int r;
> @@ -1330,7 +1327,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> }
>
> p->fence = dma_fence_get(&leader->base.s_fence->finished);
> - drm_exec_for_each_locked_object(&p->exec, index, gobj) {
> + drm_exec_for_each_locked_object(&p->exec, gobj) {
>
> ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> index 4c5e38dea4c2..f6b7522c3c82 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> @@ -121,7 +121,6 @@ int amdgpu_evf_mgr_rearm(struct amdgpu_eviction_fence_mgr *evf_mgr,
> {
> struct amdgpu_eviction_fence *ev_fence;
> struct drm_gem_object *obj;
> - unsigned long index;
>
> /* Create and initialize a new eviction fence */
> ev_fence = kzalloc_obj(*ev_fence);
> @@ -140,7 +139,7 @@ int amdgpu_evf_mgr_rearm(struct amdgpu_eviction_fence_mgr *evf_mgr,
> evf_mgr->ev_fence = &ev_fence->base;
>
> /* And add it to all existing BOs */
> - drm_exec_for_each_locked_object(exec, index, obj) {
> + drm_exec_for_each_locked_object(exec, obj) {
> struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>
> amdgpu_evf_mgr_attach_fence(evf_mgr, bo);
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> index 8d0601400182..746210f3f6c2 100644
> --- a/drivers/gpu/drm/drm_exec.c
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -24,7 +24,6 @@
> *
> * struct drm_gem_object *obj;
> * struct drm_exec exec;
> - * unsigned long index;
> * int ret;
> *
> * drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> @@ -40,7 +39,7 @@
> * goto error;
> * }
> *
> - * drm_exec_for_each_locked_object(&exec, index, obj) {
> + * drm_exec_for_each_locked_object(&exec, obj) {
> * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> * ...
> * }
> @@ -56,9 +55,8 @@
> static void drm_exec_unlock_all(struct drm_exec *exec)
> {
> struct drm_gem_object *obj;
> - unsigned long index;
>
> - drm_exec_for_each_locked_object_reverse(exec, index, obj) {
> + drm_exec_for_each_locked_object_reverse(exec, obj) {
> dma_resv_unlock(obj->resv);
> drm_gem_object_put(obj);
> }
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 44acfe4120d2..2e44671e05b1 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -1550,9 +1550,8 @@ drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> enum dma_resv_usage extobj_usage)
> {
> struct drm_gem_object *obj;
> - unsigned long index;
>
> - drm_exec_for_each_locked_object(exec, index, obj) {
> + drm_exec_for_each_locked_object(exec, obj) {
> dma_resv_assert_held(obj->resv);
> dma_resv_add_fence(obj->resv, fence,
> drm_gpuvm_is_extobj(gpuvm, obj) ?
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 56e2db50bb36..30efd6721da1 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -373,7 +373,6 @@ int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
> unsigned int num_fences)
> {
> struct drm_gem_object *obj;
> - unsigned long index;
> int ret;
>
> do {
> @@ -386,7 +385,7 @@ int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
> return ret;
> } while (!list_empty(&vm->gpuvm.evict.list));
>
> - drm_exec_for_each_locked_object(exec, index, obj) {
> + drm_exec_for_each_locked_object(exec, obj) {
> ret = dma_resv_reserve_fences(obj->resv, num_fences);
> if (ret)
> return ret;
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> index aa786b828a0a..25db52dd2af0 100644
> --- a/include/drm/drm_exec.h
> +++ b/include/drm/drm_exec.h
> @@ -68,28 +68,26 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> /**
> * drm_exec_for_each_locked_object - iterate over all the locked objects
> * @exec: drm_exec object
> - * @index: unsigned long index for the iteration
> * @obj: the current GEM object
> *
> * Iterate over all the locked GEM objects inside the drm_exec object.
> */
> -#define drm_exec_for_each_locked_object(exec, index, obj) \
> - for ((index) = 0; ((obj) = drm_exec_obj(exec, index)); ++(index))
> +#define drm_exec_for_each_locked_object(exec, obj) \
> + for (unsigned long _index = 0; ((obj) = drm_exec_obj(exec, _index)); ++_index)
I'm not sure if _index is unique enough here, would use something like __PASTE(_drm_exec_index, __LINE__) instead.
Apart from that looks good to me.
Regards,
Christian.
>
> /**
> * drm_exec_for_each_locked_object_reverse - iterate over all the locked
> * objects in reverse locking order
> * @exec: drm_exec object
> - * @index: unsigned long index for the iteration
> * @obj: the current GEM object
> *
> * Iterate over all the locked GEM objects inside the drm_exec object in
> - * reverse locking order. Note that @index may go below zero and wrap,
> + * reverse locking order. Note that the internal index may wrap around,
> * but that will be caught by drm_exec_obj(), returning a NULL object.
> */
> -#define drm_exec_for_each_locked_object_reverse(exec, index, obj) \
> - for ((index) = (exec)->num_objects - 1; \
> - ((obj) = drm_exec_obj(exec, index)); --(index))
> +#define drm_exec_for_each_locked_object_reverse(exec, obj) \
> + for (unsigned long _index = (exec)->num_objects - 1; \
> + ((obj) = drm_exec_obj(exec, _index)); --_index)
>
> /**
> * drm_exec_until_all_locked - loop until all GEM objects are locked
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
@ 2026-03-31 9:30 ` Christian König
2026-03-31 9:36 ` Christian König
2026-03-31 19:08 ` Rob Clark
2 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2026-03-31 9:30 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> The code was reading drm_exec internal state to determine whether
> the drm_exec structure had been initialized or not, and therefore
> needed cleaning up, relying on undocumented behaviour.
>
> Instead add a bool to struct msm_gem_submit to indicate whether
> drm_exec cleaning up is needed.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/msm/msm_gem.h | 1 +
> drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index cb32093fda47..762e546d25ef 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -452,6 +452,7 @@ struct msm_gem_submit {
> bool bos_pinned : 1;
> bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
> bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
> + bool has_exec : 1; /* @exec is initialized. */
> struct msm_ringbuffer *ring;
> unsigned int nr_cmds;
> unsigned int nr_bos;
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 75d9f3574370..26ea8a28be47 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct msm_gem_submit *submit)
> int ret = 0;
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
> @@ -304,6 +305,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> return submit_lock_objects_vmbind(submit);
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_exec_lock_obj(&submit->exec,
> @@ -523,7 +525,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, bool error)
> if (error)
> submit_unpin_objects(submit);
>
> - if (submit->exec.objects)
> + if (submit->has_exec)
> drm_exec_fini(&submit->exec);
>
> /* if job wasn't enqueued to scheduler, early retirement: */
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
2026-03-31 9:30 ` Christian König
@ 2026-03-31 9:36 ` Christian König
2026-03-31 19:08 ` Rob Clark
2 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2026-03-31 9:36 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> The code was reading drm_exec internal state to determine whether
> the drm_exec structure had been initialized or not, and therefore
> needed cleaning up, relying on undocumented behaviour.
>
> Instead add a bool to struct msm_gem_submit to indicate whether
> drm_exec cleaning up is needed.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/msm/msm_gem.h | 1 +
> drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index cb32093fda47..762e546d25ef 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -452,6 +452,7 @@ struct msm_gem_submit {
> bool bos_pinned : 1;
> bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
> bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
> + bool has_exec : 1; /* @exec is initialized. */
> struct msm_ringbuffer *ring;
> unsigned int nr_cmds;
> unsigned int nr_bos;
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 75d9f3574370..26ea8a28be47 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct msm_gem_submit *submit)
> int ret = 0;
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
> @@ -304,6 +305,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> return submit_lock_objects_vmbind(submit);
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_exec_lock_obj(&submit->exec,
> @@ -523,7 +525,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, bool error)
> if (error)
> submit_unpin_objects(submit);
>
> - if (submit->exec.objects)
> + if (submit->has_exec)
> drm_exec_fini(&submit->exec);
>
> /* if job wasn't enqueued to scheduler, early retirement: */
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable
2026-03-31 9:20 ` [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable Thomas Hellström
@ 2026-03-31 9:39 ` Christian König
2026-03-31 11:03 ` Thomas Hellström
0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2026-03-31 9:39 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> Use __UNIQUE_ID as done elsewhere in the kernel rather than a
> hand-rolled __PASTE to craft a unique id.
>
> Also use __maybe_unused rather than (void) to signify that a
> variable, although written to, may not actually be used.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> include/drm/drm_exec.h | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> index 25db52dd2af0..fc95a979e253 100644
> --- a/include/drm/drm_exec.h
> +++ b/include/drm/drm_exec.h
> @@ -89,6 +89,19 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> for (unsigned long _index = (exec)->num_objects - 1; \
> ((obj) = drm_exec_obj(exec, _index)); --_index)
>
> +/*
> + * Helper to drm_exec_until_all_locked(). Don't use directly.
> + *
> + * Since labels can't be defined local to the loop's body we use a jump pointer
> + * to make sure that the retry is only used from within the loop's body.
> + */
> +#define __drm_exec_until_all_locked(exec, _label) \
> +_label: \
> + for (void * __maybe_unused __drm_exec_retry_ptr; ({ \
> + __drm_exec_retry_ptr = &&_label; \
I think when using __maybe_unused we could also move assigning the variable to the declaration and drop the extra ({}).
Apart from that looks good to me.
Regards,
Christian.
> + drm_exec_cleanup(exec); \
> + });)
> +
> /**
> * drm_exec_until_all_locked - loop until all GEM objects are locked
> * @exec: drm_exec object
> @@ -96,17 +109,9 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> * Core functionality of the drm_exec object. Loops until all GEM objects are
> * locked and no more contention exists. At the beginning of the loop it is
> * guaranteed that no GEM object is locked.
> - *
> - * Since labels can't be defined local to the loops body we use a jump pointer
> - * to make sure that the retry is only used from within the loops body.
> */
> #define drm_exec_until_all_locked(exec) \
> -__PASTE(__drm_exec_, __LINE__): \
> - for (void *__drm_exec_retry_ptr; ({ \
> - __drm_exec_retry_ptr = &&__PASTE(__drm_exec_, __LINE__);\
> - (void)__drm_exec_retry_ptr; \
> - drm_exec_cleanup(exec); \
> - });)
> + __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
>
> /**
> * drm_exec_retry_on_contention - restart the loop to grap all locks
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
2026-03-31 9:20 ` [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer Thomas Hellström
@ 2026-03-31 9:44 ` Christian König
2026-03-31 10:13 ` Thomas Hellström
0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2026-03-31 9:44 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> The xe driver was using the drm_exec retry pointer directly to
> restart the locking loop after out-of-memory errors. This is
> relying on undocumented behaviour.
>
> Instead add a drm_exec_retry() macro that can be used in this
> situation, and that also asserts that the struct drm_exec is
> in a state that is compatible with retrying:
> Either newly initialized or in a contended state with all locks
> dropped.
>
> Use that macro in xe.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/xe/xe_validation.h | 2 +-
> include/drm/drm_exec.h | 13 +++++++++++++
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> index a30e732c4d51..4cd955ce6cd2 100644
> --- a/drivers/gpu/drm/xe/xe_validation.h
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> #define xe_validation_retry_on_oom(_ctx, _ret) \
> do { \
> if (xe_validation_should_retry(_ctx, _ret)) \
> - goto *__drm_exec_retry_ptr; \
> + drm_exec_retry((_ctx)->exec); \
Oh, that goto is extremely questionable to begin with.
> } while (0)
>
> /**
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> index fc95a979e253..5ed5be1f8244 100644
> --- a/include/drm/drm_exec.h
> +++ b/include/drm/drm_exec.h
> @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> return !!exec->contended;
> }
>
> +/**
> + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
> + * @exec: drm_exec object
> + *
> + * Unconditionally retry the loop to lock all objects. For consistency,
> + * the exec object needs to be newly initialized or contended.
> + */
> +#define drm_exec_retry(_exec) \
> + do { \
> + WARN_ON(!drm_exec_is_contended(_exec)); \
This warning would trigger!
See the code in xe_bo_notifier_prepare_pinned() for example:
drm_exec_retry_on_contention(&exec);
ret = PTR_ERR(backup);
xe_validation_retry_on_oom(&ctx, &ret);
Without contention we would just skip the loop and never lock anything.
What XE does here just doesn't work as far as I can see.
Regards,
Christian.
> + goto *__drm_exec_retry_ptr; \
> + } while (0)
> +
> void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> void drm_exec_fini(struct drm_exec *exec);
> bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
@ 2026-03-31 9:46 ` Christian König
2026-03-31 10:18 ` Thomas Hellström
2026-03-31 21:46 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2026-03-31 9:46 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 11:20, Thomas Hellström wrote:
> Drivers were accessing this drm_exec member directly.
I don't see a problem with that as long as we have documented that this is allowed.
Regards,
Christian.
> Provide an accessor, drm_exec_ticket() to avoid that.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
> drivers/gpu/drm/xe/xe_validation.c | 4 ++--
> include/drm/drm_exec.h | 5 +++++
> 4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 29b400cdd6d5..8a4fb9a62485 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2998,7 +2998,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
> * validations above would invalidate DMABuf imports again.
> */
> - ret = process_validate_vms(process_info, &exec.ticket);
> + ret = process_validate_vms(process_info, drm_exec_ticket(exec));
> if (ret) {
> pr_debug("Validating VMs failed, ret: %d\n", ret);
> goto validate_map_fail;
> @@ -3039,7 +3039,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> goto validate_map_fail;
> }
>
> - ret = amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket);
> + ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
> if (ret) {
> dev_dbg(adev->dev,
> "Memory eviction: handle moved failed, pid %8d. Try again.\n",
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index c4ee19603460..c725a7976c63 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1157,7 +1157,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
> return r;
> }
>
> - r = amdgpu_vm_handle_moved(adev, vm, &p->exec.ticket);
> + r = amdgpu_vm_handle_moved(adev, vm, drm_exec_ticket(&p->exec));
> if (r)
> return r;
>
> @@ -1358,7 +1358,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> cs->out.handle = seq;
> leader->uf_sequence = seq;
>
> - amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->exec.ticket);
> + amdgpu_vm_bo_trace_cs(&fpriv->vm, drm_exec_ticket(&p->exec));
> for (i = 0; i < p->gang_size; ++i) {
> amdgpu_job_free_resources(p->jobs[i]);
> trace_amdgpu_cs_ioctl(p->jobs[i]);
> @@ -1793,7 +1793,7 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
> *map = mapping;
>
> /* Double check that the BO is reserved by this CS */
> - if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket)
> + if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != drm_exec_ticket(&parser->exec))
> return -EINVAL;
>
> /* Make sure VRAM is allocated contigiously */
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> index a611438eaafe..8dff4d0ec895 100644
> --- a/drivers/gpu/drm/xe/xe_validation.c
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -156,7 +156,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
>
> #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> /*
> - * This abuses both drm_exec and ww_mutex internals and should be
> + * This abuses ww_mutex internals and should be
> * replaced by checking for -EDEADLK when we can make TTM
> * stop converting -EDEADLK to -ENOMEM.
> * An alternative is to not have exhaustive eviction with
> @@ -164,7 +164,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
> */
> static bool xe_validation_contention_injected(struct drm_exec *exec)
> {
> - return !!exec->ticket.contending_lock;
> + return !!drm_exec_ticket(exec)->contending_lock;
> }
>
> #else
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> index 5ed5be1f8244..50d056a87de0 100644
> --- a/include/drm/drm_exec.h
> +++ b/include/drm/drm_exec.h
> @@ -151,6 +151,11 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> goto *__drm_exec_retry_ptr; \
> } while (0)
>
> +static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
> +{
> + return &exec->ticket;
> +}
> +
> void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> void drm_exec_fini(struct drm_exec *exec);
> bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
2026-03-31 9:44 ` Christian König
@ 2026-03-31 10:13 ` Thomas Hellström
2026-03-31 11:09 ` Thomas Hellström
2026-03-31 11:59 ` Christian König
0 siblings, 2 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 10:13 UTC (permalink / raw)
To: Christian König, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
> On 3/31/26 11:20, Thomas Hellström wrote:
> > The xe driver was using the drm_exec retry pointer directly to
> > restart the locking loop after out-of-memory errors. This is
> > relying on undocumented behaviour.
> >
> > Instead add a drm_exec_retry() macro that can be used in this
> > situation, and that also asserts that the struct drm_exec is
> > in a state that is compatible with retrying:
> > Either newly initialized or in a contended state with all locks
> > dropped.
> >
> > Use that macro in xe.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_validation.h | 2 +-
> > include/drm/drm_exec.h | 13 +++++++++++++
> > 2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > index a30e732c4d51..4cd955ce6cd2 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.h
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> > #define xe_validation_retry_on_oom(_ctx, _ret) \
> > do { \
> > if (xe_validation_should_retry(_ctx, _ret)) \
> > - goto *__drm_exec_retry_ptr; \
> > + drm_exec_retry((_ctx)->exec); \
>
> Oh, that goto is extremely questionable to begin with.
>
> > } while (0)
> >
> > /**
> > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > index fc95a979e253..5ed5be1f8244 100644
> > --- a/include/drm/drm_exec.h
> > +++ b/include/drm/drm_exec.h
> > @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> > return !!exec->contended;
> > }
> >
> > +/**
> > + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
> > + * @exec: drm_exec object
> > + *
> > + * Unconditionally retry the loop to lock all objects. For consistency,
> > + * the exec object needs to be newly initialized or contended.
> > + */
> > +#define drm_exec_retry(_exec) \
> > + do { \
> > + WARN_ON(!drm_exec_is_contended(_exec)); \
>
> This warning would trigger!
>
> See the code in xe_bo_notifier_prepare_pinned() for example:
>
> drm_exec_retry_on_contention(&exec);
> ret = PTR_ERR(backup);
> xe_validation_retry_on_oom(&ctx, &ret);
>
> Without contention we would just skip the loop and never lock
> anything.
>
> What XE does here just doesn't work as far as I can see.
So if xe_validation_retry_on_oom() actually retries, it internally
calls drm_exec_fini() and drm_exec_init() first, which means that the
warning doesn't trigger, due to the dummy value of contended.
So the warning does its job, and xe is safe.
Thanks,
Thomas
>
> Regards,
> Christian.
>
> > + goto *__drm_exec_retry_ptr; \
> > + } while (0)
> > +
> > void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> > void drm_exec_fini(struct drm_exec *exec);
> > bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:46 ` Christian König
@ 2026-03-31 10:18 ` Thomas Hellström
0 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 10:18 UTC (permalink / raw)
To: Christian König, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Hi,
On Tue, 2026-03-31 at 11:46 +0200, Christian König wrote:
> On 3/31/26 11:20, Thomas Hellström wrote:
> > Drivers were accessing this drm_exec member directly.
>
> I don't see a problem with that as long as we have documented that
> this is allowed.
It's more of a forward-looking change, for the case I mentioned in the
cover letter. If drm_exec becomes a subclass of a drm_transaction or
whatever, then this would likely become &exec->txn.ticket;
We could ofc postpone that to any such refactor, but since the patch is
up for review...
Thanks,
Thomas
>
> Regards,
> Christian.
>
> > Provide an accessor, drm_exec_ticket() to avoid that.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
> > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
> > drivers/gpu/drm/xe/xe_validation.c | 4 ++--
> > include/drm/drm_exec.h | 5 +++++
> > 4 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index 29b400cdd6d5..8a4fb9a62485 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -2998,7 +2998,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> > /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
> > * validations above would invalidate DMABuf imports again.
> > */
> > - ret = process_validate_vms(process_info, &exec.ticket);
> > + ret = process_validate_vms(process_info, drm_exec_ticket(exec));
> > if (ret) {
> > pr_debug("Validating VMs failed, ret: %d\n", ret);
> > goto validate_map_fail;
> > @@ -3039,7 +3039,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> > goto validate_map_fail;
> > }
> >
> > - ret = amdgpu_vm_handle_moved(adev, peer_vm, &exec.ticket);
> > + ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
> > if (ret) {
> > dev_dbg(adev->dev,
> > "Memory eviction: handle moved failed, pid %8d. Try again.\n",
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index c4ee19603460..c725a7976c63 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1157,7 +1157,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
> > return r;
> > }
> >
> > - r = amdgpu_vm_handle_moved(adev, vm, &p->exec.ticket);
> > + r = amdgpu_vm_handle_moved(adev, vm, drm_exec_ticket(&p->exec));
> > if (r)
> > return r;
> >
> > @@ -1358,7 +1358,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > cs->out.handle = seq;
> > leader->uf_sequence = seq;
> >
> > - amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->exec.ticket);
> > + amdgpu_vm_bo_trace_cs(&fpriv->vm, drm_exec_ticket(&p->exec));
> > for (i = 0; i < p->gang_size; ++i) {
> > amdgpu_job_free_resources(p->jobs[i]);
> > trace_amdgpu_cs_ioctl(p->jobs[i]);
> > @@ -1793,7 +1793,7 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
> > *map = mapping;
> >
> > /* Double check that the BO is reserved by this CS */
> > - if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket)
> > + if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != drm_exec_ticket(&parser->exec))
> > return -EINVAL;
> >
> > /* Make sure VRAM is allocated contigiously */
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> > index a611438eaafe..8dff4d0ec895 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.c
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -156,7 +156,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
> >
> > #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> > /*
> > - * This abuses both drm_exec and ww_mutex internals and should be
> > + * This abuses ww_mutex internals and should be
> > * replaced by checking for -EDEADLK when we can make TTM
> > * stop converting -EDEADLK to -ENOMEM.
> > * An alternative is to not have exhaustive eviction with
> > @@ -164,7 +164,7 @@ int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_d
> > */
> > static bool xe_validation_contention_injected(struct drm_exec *exec)
> > {
> > - return !!exec->ticket.contending_lock;
> > + return !!drm_exec_ticket(exec)->contending_lock;
> > }
> >
> > #else
> > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > index 5ed5be1f8244..50d056a87de0 100644
> > --- a/include/drm/drm_exec.h
> > +++ b/include/drm/drm_exec.h
> > @@ -151,6 +151,11 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> > goto *__drm_exec_retry_ptr; \
> > } while (0)
> >
> > +static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
> > +{
> > + return &exec->ticket;
> > +}
> > +
> > void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> > void drm_exec_fini(struct drm_exec *exec);
> > bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable
2026-03-31 9:39 ` Christian König
@ 2026-03-31 11:03 ` Thomas Hellström
0 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 11:03 UTC (permalink / raw)
To: Christian König, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, 2026-03-31 at 11:39 +0200, Christian König wrote:
>
>
> On 3/31/26 11:20, Thomas Hellström wrote:
> > Use __UNIQUE_ID as done elsewhere in the kernel rather than a
> > hand-rolled __PASTE to craft a unique id.
> >
> > Also use __maybe_unused rather than (void) to signify that a
> > variable, although written to, may not actually be used.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > include/drm/drm_exec.h | 23 ++++++++++++++---------
> > 1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > index 25db52dd2af0..fc95a979e253 100644
> > --- a/include/drm/drm_exec.h
> > +++ b/include/drm/drm_exec.h
> > @@ -89,6 +89,19 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> > for (unsigned long _index = (exec)->num_objects - 1; \
> > ((obj) = drm_exec_obj(exec, _index)); --_index)
> >
> > +/*
> > + * Helper to drm_exec_until_all_locked(). Don't use directly.
> > + *
> > + * Since labels can't be defined local to the loop's body we use a jump pointer
> > + * to make sure that the retry is only used from within the loop's body.
> > + */
> > +#define __drm_exec_until_all_locked(exec, _label) \
> > +_label: \
> > + for (void * __maybe_unused __drm_exec_retry_ptr; ({ \
> > + __drm_exec_retry_ptr = &&_label; \
>
> I think when using __maybe_unused we could also move assigning the
> variable to the declaration and drop the extra ({}).
Sure. Looks even better.
Thanks,
Thomas
>
> Apart from that looks good to me.
>
> Regards,
> Christian.
>
> > + drm_exec_cleanup(exec); \
> > + });)
> > +
> > /**
> > * drm_exec_until_all_locked - loop until all GEM objects are locked
> > * @exec: drm_exec object
> > @@ -96,17 +109,9 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> > * Core functionality of the drm_exec object. Loops until all GEM objects are
> > * locked and no more contention exists. At the beginning of the loop it is
> > * guaranteed that no GEM object is locked.
> > - *
> > - * Since labels can't be defined local to the loops body we use a jump pointer
> > - * to make sure that the retry is only used from within the loops body.
> > */
> > #define drm_exec_until_all_locked(exec) \
> > -__PASTE(__drm_exec_, __LINE__): \
> > - for (void *__drm_exec_retry_ptr; ({ \
> > - __drm_exec_retry_ptr = &&__PASTE(__drm_exec_, __LINE__);\
> > - (void)__drm_exec_retry_ptr; \
> > - drm_exec_cleanup(exec); \
> > - });)
> > + __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
> >
> > /**
> > * drm_exec_retry_on_contention - restart the loop to grap all locks
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
2026-03-31 10:13 ` Thomas Hellström
@ 2026-03-31 11:09 ` Thomas Hellström
2026-03-31 11:59 ` Christian König
1 sibling, 0 replies; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 11:09 UTC (permalink / raw)
To: Christian König, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, 2026-03-31 at 12:13 +0200, Thomas Hellström wrote:
> On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
> > On 3/31/26 11:20, Thomas Hellström wrote:
> > > The xe driver was using the drm_exec retry pointer directly to
> > > restart the locking loop after out-of-memory errors. This is
> > > relying on undocumented behaviour.
> > >
> > > Instead add a drm_exec_retry() macro that can be used in this
> > > situation, and that also asserts that the struct drm_exec is
> > > in a state that is compatible with retrying:
> > > Either newly initialized or in a contended state with all locks
> > > dropped.
> > >
> > > Use that macro in xe.
> > >
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_validation.h | 2 +-
> > > include/drm/drm_exec.h | 13 +++++++++++++
> > > 2 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > > index a30e732c4d51..4cd955ce6cd2 100644
> > > --- a/drivers/gpu/drm/xe/xe_validation.h
> > > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > > @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> > > #define xe_validation_retry_on_oom(_ctx, _ret)			\
> > > 	do {							\
> > > 		if (xe_validation_should_retry(_ctx, _ret))	\
> > > -			goto *__drm_exec_retry_ptr;		\
> > > +			drm_exec_retry((_ctx)->exec);		\
> >
> > Oh, that goto is extremely questionable to begin with.
> >
> > > } while (0)
> > >
> > > /**
> > > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > > index fc95a979e253..5ed5be1f8244 100644
> > > --- a/include/drm/drm_exec.h
> > > +++ b/include/drm/drm_exec.h
> > > @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> > > 	return !!exec->contended;
> > > }
> > >
> > > +/**
> > > + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
> > > + * @exec: drm_exec object
> > > + *
> > > + * Unconditionally retry the loop to lock all objects. For consistency,
> > > + * the exec object needs to be newly initialized or contended.
> > > + */
> > > +#define drm_exec_retry(_exec)					\
> > > +	do {							\
> > > +		WARN_ON(!drm_exec_is_contended(_exec));		\
> >
> > This warning would trigger!
> >
> > See the code in xe_bo_notifier_prepare_pinned() for example:
> >
> > drm_exec_retry_on_contention(&exec);
> > ret = PTR_ERR(backup);
> > xe_validation_retry_on_oom(&ctx, &ret);
> >
> > Without contention we would just skip the loop and never lock
> > anything.
> >
> > What XE does here just doesn't work as far as I can see.
>
> So if xe_validation_retry_on_oom() actually retries, it internally
> calls drm_exec_fini() and drm_exec_init() first, which means that
> the warning doesn't trigger, thanks to the dummy value of contended.
>
> So the warning does its job, and xe is safe.
So the xe stuff is actually basically an outer loop around
drm_exec_until_all_locked().
We could of course code that explicitly, implementing an
xe_validation_until_all_valid() with a separate goto pointer, but I'm
not sure that is cleaner, really. They'd point to the same address
anyway.
In the end, the WARN_ON in drm_exec_retry() would ensure drm_exec is
not in an awkward state anyway.
Thanks,
Thomas
>
> Thanks,
> Thomas
>
>
>
> >
> > Regards,
> > Christian.
> >
> > > + goto *__drm_exec_retry_ptr; \
> > > + } while (0)
> > > +
> > > void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned
> > > nr);
> > > void drm_exec_fini(struct drm_exec *exec);
> > > bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer
2026-03-31 10:13 ` Thomas Hellström
2026-03-31 11:09 ` Thomas Hellström
@ 2026-03-31 11:59 ` Christian König
1 sibling, 0 replies; 23+ messages in thread
From: Christian König @ 2026-03-31 11:59 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Felix Kuehling, Alex Deucher, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On 3/31/26 12:13, Thomas Hellström wrote:
> On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
>> On 3/31/26 11:20, Thomas Hellström wrote:
>>> The xe driver was using the drm_exec retry pointer directly to
>>> restart the locking loop after out-of-memory errors. This is
>>> relying on undocumented behaviour.
>>>
>>> Instead add a drm_exec_retry() macro that can be used in this
>>> situation, and that also asserts that the struct drm_exec is
>>> in a state that is compatible with retrying:
>>> Either newly initialized or in a contended state with all locks
>>> dropped.
>>>
>>> Use that macro in xe.
>>>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_validation.h | 2 +-
>>> include/drm/drm_exec.h | 13 +++++++++++++
>>> 2 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
>>> index a30e732c4d51..4cd955ce6cd2 100644
>>> --- a/drivers/gpu/drm/xe/xe_validation.h
>>> +++ b/drivers/gpu/drm/xe/xe_validation.h
>>> @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
>>> #define xe_validation_retry_on_oom(_ctx, _ret)			\
>>> 	do {							\
>>> 		if (xe_validation_should_retry(_ctx, _ret))	\
>>> -			goto *__drm_exec_retry_ptr;		\
>>> +			drm_exec_retry((_ctx)->exec);		\
>>
>> Oh, that goto is extremely questionable to begin with.
>>
>>> } while (0)
>>>
>>> /**
>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>> index fc95a979e253..5ed5be1f8244 100644
>>> --- a/include/drm/drm_exec.h
>>> +++ b/include/drm/drm_exec.h
>>> @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
>>> 	return !!exec->contended;
>>> }
>>>
>>> +/**
>>> + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
>>> + * @exec: drm_exec object
>>> + *
>>> + * Unconditionally retry the loop to lock all objects. For consistency,
>>> + * the exec object needs to be newly initialized or contended.
>>> + */
>>> +#define drm_exec_retry(_exec)					\
>>> +	do {							\
>>> +		WARN_ON(!drm_exec_is_contended(_exec));		\
>>
>> This warning would trigger!
>>
>> See the code in xe_bo_notifier_prepare_pinned() for example:
>>
>> drm_exec_retry_on_contention(&exec);
>> ret = PTR_ERR(backup);
>> xe_validation_retry_on_oom(&ctx, &ret);
>>
>> Without contention we would just skip the loop and never lock
>> anything.
>>
>> What XE does here just doesn't work as far as I can see.
>
> So if xe_validation_retry_on_oom() actually retries, it internally
> calls drm_exec_fini() and drm_exec_init() first, which means that
> the warning doesn't trigger, thanks to the dummy value of contended.
Ah! Yeah that information was missing.
I'm really wondering if the calls to drm_exec_fini()/drm_exec_init() should be part of the drm_exec_retry() handling.
Otherwise that is kind of easy to mess up.
Regards,
Christian.
>
> So the warning does its job, and xe is safe.
>
> Thanks,
> Thomas
>
>
>
>>
>> Regards,
>> Christian.
>>
>>> + goto *__drm_exec_retry_ptr; \
>>> + } while (0)
>>> +
>>> void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
>>> void drm_exec_fini(struct drm_exec *exec);
>>> bool drm_exec_cleanup(struct drm_exec *exec);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
2026-03-31 9:30 ` Christian König
2026-03-31 9:36 ` Christian König
@ 2026-03-31 19:08 ` Rob Clark
2026-03-31 19:52 ` Thomas Hellström
2 siblings, 1 reply; 23+ messages in thread
From: Rob Clark @ 2026-03-31 19:08 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Felix Kuehling, Alex Deucher, Christian König,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Danilo Krummrich, Matthew Brost, Alice Ryhl,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, Mar 31, 2026 at 2:21 AM Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> The code was reading drm_exec internal state to determine whether
> the drm_exec structure had been initialized or not, and therefore
> needed cleaning up, relying on undocumented behaviour.
>
> Instead add a bool to struct msm_gem_submit to indicate whether
> drm_exec cleaning up is needed.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
This is pretty stand-alone, so I can pick it up for v7.1. Or ack for
landing it via drm-misc with the rest of the series if that is easier
for you. It shouldn't conflict with anything in flight.
BR,
-R
> ---
> drivers/gpu/drm/msm/msm_gem.h | 1 +
> drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index cb32093fda47..762e546d25ef 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -452,6 +452,7 @@ struct msm_gem_submit {
> bool bos_pinned : 1;
> bool fault_dumped:1;/* Limit devcoredump dumping to one per submit */
> bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
> + bool has_exec : 1; /* @exec is initialized. */
> struct msm_ringbuffer *ring;
> unsigned int nr_cmds;
> unsigned int nr_bos;
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 75d9f3574370..26ea8a28be47 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct msm_gem_submit *submit)
> int ret = 0;
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
> @@ -304,6 +305,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> return submit_lock_objects_vmbind(submit);
>
> drm_exec_init(&submit->exec, flags, submit->nr_bos);
> + submit->has_exec = true;
>
> drm_exec_until_all_locked (&submit->exec) {
> ret = drm_exec_lock_obj(&submit->exec,
> @@ -523,7 +525,7 @@ static void submit_cleanup(struct msm_gem_submit *submit, bool error)
> if (error)
> submit_unpin_objects(submit);
>
> - if (submit->exec.objects)
> + if (submit->has_exec)
> drm_exec_fini(&submit->exec);
>
> /* if job wasn't enqueued to scheduler, early retirement: */
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 19:08 ` Rob Clark
@ 2026-03-31 19:52 ` Thomas Hellström
2026-03-31 20:39 ` Rob Clark
0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström @ 2026-03-31 19:52 UTC (permalink / raw)
To: rob.clark
Cc: intel-xe, Felix Kuehling, Alex Deucher, Christian König,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Danilo Krummrich, Matthew Brost, Alice Ryhl,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, 2026-03-31 at 12:08 -0700, Rob Clark wrote:
> On Tue, Mar 31, 2026 at 2:21 AM Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
> >
> > The code was reading drm_exec internal state to determine whether
> > the drm_exec structure had been initialized or not, and therefore
> > needed cleaning up, relying on undocumented behaviour.
> >
> > Instead add a bool to struct msm_gem_submit to indicate whether
> > drm_exec cleaning up is needed.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
> Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
>
> This is pretty stand-alone, so I can pick it up for v7.1. Or ack for
> landing it via drm-misc with the rest of the series if that is easier
> for you. It shouldn't conflict with anything in flight.
Thanks Rob. Please pick it up and I'll exclude it from the next
iteration of the series.
Thanks,
Thomas
>
> BR,
> -R
>
> > ---
> > drivers/gpu/drm/msm/msm_gem.h | 1 +
> > drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
> > 2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem.h
> > b/drivers/gpu/drm/msm/msm_gem.h
> > index cb32093fda47..762e546d25ef 100644
> > --- a/drivers/gpu/drm/msm/msm_gem.h
> > +++ b/drivers/gpu/drm/msm/msm_gem.h
> > @@ -452,6 +452,7 @@ struct msm_gem_submit {
> > bool bos_pinned : 1;
> > bool fault_dumped:1;/* Limit devcoredump dumping to one per
> > submit */
> > bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
> > + bool has_exec : 1; /* @exec is initialized. */
> > struct msm_ringbuffer *ring;
> > unsigned int nr_cmds;
> > unsigned int nr_bos;
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c
> > b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 75d9f3574370..26ea8a28be47 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct
> > msm_gem_submit *submit)
> > int ret = 0;
> >
> > drm_exec_init(&submit->exec, flags, submit->nr_bos);
> > + submit->has_exec = true;
> >
> > drm_exec_until_all_locked (&submit->exec) {
> > ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
> > @@ -304,6 +305,7 @@ static int submit_lock_objects(struct
> > msm_gem_submit *submit)
> > return submit_lock_objects_vmbind(submit);
> >
> > drm_exec_init(&submit->exec, flags, submit->nr_bos);
> > + submit->has_exec = true;
> >
> > drm_exec_until_all_locked (&submit->exec) {
> > ret = drm_exec_lock_obj(&submit->exec,
> > @@ -523,7 +525,7 @@ static void submit_cleanup(struct
> > msm_gem_submit *submit, bool error)
> > if (error)
> > submit_unpin_objects(submit);
> >
> > - if (submit->exec.objects)
> > + if (submit->has_exec)
> > drm_exec_fini(&submit->exec);
> >
> > /* if job wasn't enqueued to scheduler, early retirement:
> > */
> > --
> > 2.53.0
> >
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals
2026-03-31 19:52 ` Thomas Hellström
@ 2026-03-31 20:39 ` Rob Clark
0 siblings, 0 replies; 23+ messages in thread
From: Rob Clark @ 2026-03-31 20:39 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Felix Kuehling, Alex Deucher, Christian König,
David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Danilo Krummrich, Matthew Brost, Alice Ryhl,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
On Tue, Mar 31, 2026 at 12:52 PM Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> On Tue, 2026-03-31 at 12:08 -0700, Rob Clark wrote:
> > On Tue, Mar 31, 2026 at 2:21 AM Thomas Hellström
> > <thomas.hellstrom@linux.intel.com> wrote:
> > >
> > > The code was reading drm_exec internal state to determine whether
> > > the drm_exec structure had been initialized or not, and therefore
> > > needed cleaning up, relying on undocumented behaviour.
> > >
> > > Instead add a bool to struct msm_gem_submit to indicate whether
> > > drm_exec cleaning up is needed.
> > >
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >
> > Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
> >
> > This is pretty stand-alone, so I can pick it up for v7.1. Or ack for
> > landing it via drm-misc with the rest of the series if that is easier
> > for you. It shouldn't conflict with anything in flight.
>
> Thanks Rob. Please pick it up and I'll exclude it from the next
> iteration of the series.
Will do, I have it queued up:
https://gitlab.freedesktop.org/drm/msm/-/merge_requests/227
BR,
-R
> Thanks,
> Thomas
>
> >
> > BR,
> > -R
> >
> > > ---
> > > drivers/gpu/drm/msm/msm_gem.h | 1 +
> > > drivers/gpu/drm/msm/msm_gem_submit.c | 4 +++-
> > > 2 files changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gem.h
> > > b/drivers/gpu/drm/msm/msm_gem.h
> > > index cb32093fda47..762e546d25ef 100644
> > > --- a/drivers/gpu/drm/msm/msm_gem.h
> > > +++ b/drivers/gpu/drm/msm/msm_gem.h
> > > @@ -452,6 +452,7 @@ struct msm_gem_submit {
> > > bool bos_pinned : 1;
> > > bool fault_dumped:1;/* Limit devcoredump dumping to one per
> > > submit */
> > > bool in_rb : 1; /* "sudo" mode, copy cmds into RB */
> > > + bool has_exec : 1; /* @exec is initialized. */
> > > struct msm_ringbuffer *ring;
> > > unsigned int nr_cmds;
> > > unsigned int nr_bos;
> > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > index 75d9f3574370..26ea8a28be47 100644
> > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > @@ -278,6 +278,7 @@ static int submit_lock_objects_vmbind(struct
> > > msm_gem_submit *submit)
> > > int ret = 0;
> > >
> > > drm_exec_init(&submit->exec, flags, submit->nr_bos);
> > > + submit->has_exec = true;
> > >
> > > drm_exec_until_all_locked (&submit->exec) {
> > > ret = drm_gpuvm_prepare_vm(submit->vm, exec, 1);
> > > @@ -304,6 +305,7 @@ static int submit_lock_objects(struct
> > > msm_gem_submit *submit)
> > > return submit_lock_objects_vmbind(submit);
> > >
> > > drm_exec_init(&submit->exec, flags, submit->nr_bos);
> > > + submit->has_exec = true;
> > >
> > > drm_exec_until_all_locked (&submit->exec) {
> > > ret = drm_exec_lock_obj(&submit->exec,
> > > @@ -523,7 +525,7 @@ static void submit_cleanup(struct
> > > msm_gem_submit *submit, bool error)
> > > if (error)
> > > submit_unpin_objects(submit);
> > >
> > > - if (submit->exec.objects)
> > > + if (submit->has_exec)
> > > drm_exec_fini(&submit->exec);
> > >
> > > /* if job wasn't enqueued to scheduler, early retirement:
> > > */
> > > --
> > > 2.53.0
> > >
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
2026-03-31 9:46 ` Christian König
@ 2026-03-31 21:46 ` kernel test robot
2026-03-31 22:07 ` kernel test robot
2026-04-01 0:38 ` kernel test robot
3 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-31 21:46 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, Felix Kuehling,
Alex Deucher, Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on next-20260331]
[cannot apply to drm-xe/drm-xe-next linus/master v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-exec-Remove-the-index-parameter-from-drm_exec_for_each_locked_obj-_reverse/20260331-220349
base: https://gitlab.freedesktop.org/drm/misc/kernel.git drm-misc-next
patch link: https://lore.kernel.org/r/20260331092023.81616-6-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260331/202603312339.70s7djVd-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260331/202603312339.70s7djVd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603312339.70s7djVd-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c: In function 'amdgpu_amdkfd_gpuvm_restore_process_bos':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3001:66: error: incompatible type for argument 1 of 'drm_exec_ticket'
3001 | ret = process_validate_vms(process_info, drm_exec_ticket(exec));
| ^~~~
| |
| struct drm_exec
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:30:
include/drm/drm_exec.h:154:71: note: expected 'struct drm_exec *' but argument is of type 'struct drm_exec'
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ~~~~~~~~~~~~~~~~~^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3042:77: error: incompatible type for argument 1 of 'drm_exec_ticket'
3042 | ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
| ^~~~
| |
| struct drm_exec
include/drm/drm_exec.h:154:71: note: expected 'struct drm_exec *' but argument is of type 'struct drm_exec'
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ~~~~~~~~~~~~~~~~~^~~~
vim +/drm_exec_ticket +3001 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
2897
2898 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
2899 * KFD process identified by process_info
2900 *
2901 * @process_info: amdkfd_process_info of the KFD process
2902 *
2903 * After memory eviction, restore thread calls this function. The function
2904 * should be called when the Process is still valid. BO restore involves -
2905 *
2906 * 1. Release old eviction fence and create new one
2907 * 2. Get two copies of PD BO list from all the VMs. Keep one copy as pd_list.
2908 * 3 Use the second PD list and kfd_bo_list to create a list (ctx.list) of
2909 * BOs that need to be reserved.
2910 * 4. Reserve all the BOs
2911 * 5. Validate of PD and PT BOs.
2912 * 6. Validate all KFD BOs using kfd_bo_list and Map them and add new fence
2913 * 7. Add fence to all PD and PT BOs.
2914 * 8. Unreserve all BOs
2915 */
2916 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu **ef)
2917 {
2918 struct amdkfd_process_info *process_info = info;
2919 struct amdgpu_vm *peer_vm;
2920 struct kgd_mem *mem;
2921 struct list_head duplicate_save;
2922 struct amdgpu_sync sync_obj;
2923 unsigned long failed_size = 0;
2924 unsigned long total_size = 0;
2925 struct drm_exec exec;
2926 int ret;
2927
2928 INIT_LIST_HEAD(&duplicate_save);
2929
2930 mutex_lock(&process_info->lock);
2931
2932 drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
2933 drm_exec_until_all_locked(&exec) {
2934 list_for_each_entry(peer_vm, &process_info->vm_list_head,
2935 vm_list_node) {
2936 ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
2937 drm_exec_retry_on_contention(&exec);
2938 if (unlikely(ret)) {
2939 pr_err("Locking VM PD failed, ret: %d\n", ret);
2940 goto ttm_reserve_fail;
2941 }
2942 }
2943
2944 /* Reserve all BOs and page tables/directory. Add all BOs from
2945 * kfd_bo_list to ctx.list
2946 */
2947 list_for_each_entry(mem, &process_info->kfd_bo_list,
2948 validate_list) {
2949 struct drm_gem_object *gobj;
2950
2951 gobj = &mem->bo->tbo.base;
2952 ret = drm_exec_prepare_obj(&exec, gobj, 1);
2953 drm_exec_retry_on_contention(&exec);
2954 if (unlikely(ret)) {
2955 pr_err("drm_exec_prepare_obj failed, ret: %d\n", ret);
2956 goto ttm_reserve_fail;
2957 }
2958 }
2959 }
2960
2961 amdgpu_sync_create(&sync_obj);
2962
2963 /* Validate BOs managed by KFD */
2964 list_for_each_entry(mem, &process_info->kfd_bo_list,
2965 validate_list) {
2966
2967 struct amdgpu_bo *bo = mem->bo;
2968 uint32_t domain = mem->domain;
2969 struct dma_resv_iter cursor;
2970 struct dma_fence *fence;
2971
2972 total_size += amdgpu_bo_size(bo);
2973
2974 ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
2975 if (ret) {
2976 pr_debug("Memory eviction: Validate BOs failed\n");
2977 failed_size += amdgpu_bo_size(bo);
2978 ret = amdgpu_amdkfd_bo_validate(bo,
2979 AMDGPU_GEM_DOMAIN_GTT, false);
2980 if (ret) {
2981 pr_debug("Memory eviction: Try again\n");
2982 goto validate_map_fail;
2983 }
2984 }
2985 dma_resv_for_each_fence(&cursor, bo->tbo.base.resv,
2986 DMA_RESV_USAGE_KERNEL, fence) {
2987 ret = amdgpu_sync_fence(&sync_obj, fence, GFP_KERNEL);
2988 if (ret) {
2989 pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
2990 goto validate_map_fail;
2991 }
2992 }
2993 }
2994
2995 if (failed_size)
2996 pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
2997
2998 /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
2999 * validations above would invalidate DMABuf imports again.
3000 */
> 3001 ret = process_validate_vms(process_info, drm_exec_ticket(exec));
3002 if (ret) {
3003 pr_debug("Validating VMs failed, ret: %d\n", ret);
3004 goto validate_map_fail;
3005 }
3006
3007 /* Update mappings managed by KFD. */
3008 list_for_each_entry(mem, &process_info->kfd_bo_list,
3009 validate_list) {
3010 struct kfd_mem_attachment *attachment;
3011
3012 list_for_each_entry(attachment, &mem->attachments, list) {
3013 if (!attachment->is_mapped)
3014 continue;
3015
3016 kfd_mem_dmaunmap_attachment(mem, attachment);
3017 ret = update_gpuvm_pte(mem, attachment, &sync_obj);
3018 if (ret) {
3019 pr_debug("Memory eviction: update PTE failed. Try again\n");
3020 goto validate_map_fail;
3021 }
3022 }
3023 }
3024
3025 /* Update mappings not managed by KFD */
3026 list_for_each_entry(peer_vm, &process_info->vm_list_head,
3027 vm_list_node) {
3028 struct amdgpu_device *adev = amdgpu_ttm_adev(
3029 peer_vm->root.bo->tbo.bdev);
3030
3031 struct amdgpu_fpriv *fpriv =
3032 container_of(peer_vm, struct amdgpu_fpriv, vm);
3033
3034 ret = amdgpu_vm_bo_update(adev, fpriv->prt_va, false);
3035 if (ret) {
3036 dev_dbg(adev->dev,
3037 "Memory eviction: handle PRT moved failed, pid %8d. Try again.\n",
3038 pid_nr(process_info->pid));
3039 goto validate_map_fail;
3040 }
3041
3042 ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
3043 if (ret) {
3044 dev_dbg(adev->dev,
3045 "Memory eviction: handle moved failed, pid %8d. Try again.\n",
3046 pid_nr(process_info->pid));
3047 goto validate_map_fail;
3048 }
3049 }
3050
3051 /* Update page directories */
3052 ret = process_update_pds(process_info, &sync_obj);
3053 if (ret) {
3054 pr_debug("Memory eviction: update PDs failed. Try again\n");
3055 goto validate_map_fail;
3056 }
3057
3058 /* Sync with fences on all the page tables. They implicitly depend on any
3059 * move fences from amdgpu_vm_handle_moved above.
3060 */
3061 ret = process_sync_pds_resv(process_info, &sync_obj);
3062 if (ret) {
3063 pr_debug("Memory eviction: Failed to sync to PD BO moving fence. Try again\n");
3064 goto validate_map_fail;
3065 }
3066
3067 /* Wait for validate and PT updates to finish */
3068 amdgpu_sync_wait(&sync_obj, false);
3069
3070 /* The old eviction fence may be unsignaled if restore happens
3071 * after a GPU reset or suspend/resume. Keep the old fence in that
3072 * case. Otherwise release the old eviction fence and create new
3073 * one, because fence only goes from unsignaled to signaled once
3074 * and cannot be reused. Use context and mm from the old fence.
3075 *
3076 * If an old eviction fence signals after this check, that's OK.
3077 * Anyone signaling an eviction fence must stop the queues first
3078 * and schedule another restore worker.
3079 */
3080 if (dma_fence_is_signaled(&process_info->eviction_fence->base)) {
3081 struct amdgpu_amdkfd_fence *new_fence =
3082 amdgpu_amdkfd_fence_create(
3083 process_info->eviction_fence->base.context,
3084 process_info->eviction_fence->mm,
3085 NULL, process_info->context_id);
3086
3087 if (!new_fence) {
3088 pr_err("Failed to create eviction fence\n");
3089 ret = -ENOMEM;
3090 goto validate_map_fail;
3091 }
3092 dma_fence_put(&process_info->eviction_fence->base);
3093 process_info->eviction_fence = new_fence;
3094 replace_eviction_fence(ef, dma_fence_get(&new_fence->base));
3095 } else {
3096 WARN_ONCE(*ef != &process_info->eviction_fence->base,
3097 "KFD eviction fence doesn't match KGD process_info");
3098 }
3099
3100 /* Attach new eviction fence to all BOs except pinned ones */
3101 list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list) {
3102 if (mem->bo->tbo.pin_count)
3103 continue;
3104
3105 dma_resv_add_fence(mem->bo->tbo.base.resv,
3106 &process_info->eviction_fence->base,
3107 DMA_RESV_USAGE_BOOKKEEP);
3108 }
3109 /* Attach eviction fence to PD / PT BOs and DMABuf imports */
3110 list_for_each_entry(peer_vm, &process_info->vm_list_head,
3111 vm_list_node) {
3112 struct amdgpu_bo *bo = peer_vm->root.bo;
3113
3114 dma_resv_add_fence(bo->tbo.base.resv,
3115 &process_info->eviction_fence->base,
3116 DMA_RESV_USAGE_BOOKKEEP);
3117 }
3118
3119 validate_map_fail:
3120 amdgpu_sync_free(&sync_obj);
3121 ttm_reserve_fail:
3122 drm_exec_fini(&exec);
3123 mutex_unlock(&process_info->lock);
3124 return ret;
3125 }
3126
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
2026-03-31 9:46 ` Christian König
2026-03-31 21:46 ` kernel test robot
@ 2026-03-31 22:07 ` kernel test robot
2026-04-01 0:38 ` kernel test robot
3 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-31 22:07 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, Felix Kuehling,
Alex Deucher, Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on next-20260330]
[cannot apply to drm-xe/drm-xe-next linus/master v7.0-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-exec-Remove-the-index-parameter-from-drm_exec_for_each_locked_obj-_reverse/20260331-220349
base: https://gitlab.freedesktop.org/drm/misc/kernel.git drm-misc-next
patch link: https://lore.kernel.org/r/20260331092023.81616-6-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
config: riscv-randconfig-r073-20260401 (https://download.01.org/0day-ci/archive/20260401/202604010642.6F4lO2Gd-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 2cd67b8b69f78e3f95918204320c3075a74ba16c)
smatch: v0.5.0-9004-gb810ac53
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260401/202604010642.6F4lO2Gd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604010642.6F4lO2Gd-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3001:59: error: passing 'struct drm_exec' to parameter of incompatible type 'struct drm_exec *'; take the address with &
3001 | ret = process_validate_vms(process_info, drm_exec_ticket(exec));
| ^~~~
| &
include/drm/drm_exec.h:154:71: note: passing argument to parameter 'exec' here
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ^
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3042:63: error: passing 'struct drm_exec' to parameter of incompatible type 'struct drm_exec *'; take the address with &
3042 | ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
| ^~~~
| &
include/drm/drm_exec.h:154:71: note: passing argument to parameter 'exec' here
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ^
2 errors generated.
vim +3001 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
2897
2898 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
2899 * KFD process identified by process_info
2900 *
2901 * @process_info: amdkfd_process_info of the KFD process
2902 *
2903 * After memory eviction, restore thread calls this function. The function
2904 * should be called when the Process is still valid. BO restore involves -
2905 *
2906 * 1. Release old eviction fence and create new one
2907 * 2. Get two copies of PD BO list from all the VMs. Keep one copy as pd_list.
2908 * 3 Use the second PD list and kfd_bo_list to create a list (ctx.list) of
2909 * BOs that need to be reserved.
2910 * 4. Reserve all the BOs
2911 * 5. Validate of PD and PT BOs.
2912 * 6. Validate all KFD BOs using kfd_bo_list and Map them and add new fence
2913 * 7. Add fence to all PD and PT BOs.
2914 * 8. Unreserve all BOs
2915 */
2916 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu **ef)
2917 {
2918 struct amdkfd_process_info *process_info = info;
2919 struct amdgpu_vm *peer_vm;
2920 struct kgd_mem *mem;
2921 struct list_head duplicate_save;
2922 struct amdgpu_sync sync_obj;
2923 unsigned long failed_size = 0;
2924 unsigned long total_size = 0;
2925 struct drm_exec exec;
2926 int ret;
2927
2928 INIT_LIST_HEAD(&duplicate_save);
2929
2930 mutex_lock(&process_info->lock);
2931
2932 drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
2933 drm_exec_until_all_locked(&exec) {
2934 list_for_each_entry(peer_vm, &process_info->vm_list_head,
2935 vm_list_node) {
2936 ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
2937 drm_exec_retry_on_contention(&exec);
2938 if (unlikely(ret)) {
2939 pr_err("Locking VM PD failed, ret: %d\n", ret);
2940 goto ttm_reserve_fail;
2941 }
2942 }
2943
2944 /* Reserve all BOs and page tables/directory. Add all BOs from
2945 * kfd_bo_list to ctx.list
2946 */
2947 list_for_each_entry(mem, &process_info->kfd_bo_list,
2948 validate_list) {
2949 struct drm_gem_object *gobj;
2950
2951 gobj = &mem->bo->tbo.base;
2952 ret = drm_exec_prepare_obj(&exec, gobj, 1);
2953 drm_exec_retry_on_contention(&exec);
2954 if (unlikely(ret)) {
2955 pr_err("drm_exec_prepare_obj failed, ret: %d\n", ret);
2956 goto ttm_reserve_fail;
2957 }
2958 }
2959 }
2960
2961 amdgpu_sync_create(&sync_obj);
2962
2963 /* Validate BOs managed by KFD */
2964 list_for_each_entry(mem, &process_info->kfd_bo_list,
2965 validate_list) {
2966
2967 struct amdgpu_bo *bo = mem->bo;
2968 uint32_t domain = mem->domain;
2969 struct dma_resv_iter cursor;
2970 struct dma_fence *fence;
2971
2972 total_size += amdgpu_bo_size(bo);
2973
2974 ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
2975 if (ret) {
2976 pr_debug("Memory eviction: Validate BOs failed\n");
2977 failed_size += amdgpu_bo_size(bo);
2978 ret = amdgpu_amdkfd_bo_validate(bo,
2979 AMDGPU_GEM_DOMAIN_GTT, false);
2980 if (ret) {
2981 pr_debug("Memory eviction: Try again\n");
2982 goto validate_map_fail;
2983 }
2984 }
2985 dma_resv_for_each_fence(&cursor, bo->tbo.base.resv,
2986 DMA_RESV_USAGE_KERNEL, fence) {
2987 ret = amdgpu_sync_fence(&sync_obj, fence, GFP_KERNEL);
2988 if (ret) {
2989 pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
2990 goto validate_map_fail;
2991 }
2992 }
2993 }
2994
2995 if (failed_size)
2996 pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
2997
2998 /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
2999 * validations above would invalidate DMABuf imports again.
3000 */
> 3001 ret = process_validate_vms(process_info, drm_exec_ticket(exec));
3002 if (ret) {
3003 pr_debug("Validating VMs failed, ret: %d\n", ret);
3004 goto validate_map_fail;
3005 }
3006
3007 /* Update mappings managed by KFD. */
3008 list_for_each_entry(mem, &process_info->kfd_bo_list,
3009 validate_list) {
3010 struct kfd_mem_attachment *attachment;
3011
3012 list_for_each_entry(attachment, &mem->attachments, list) {
3013 if (!attachment->is_mapped)
3014 continue;
3015
3016 kfd_mem_dmaunmap_attachment(mem, attachment);
3017 ret = update_gpuvm_pte(mem, attachment, &sync_obj);
3018 if (ret) {
3019 pr_debug("Memory eviction: update PTE failed. Try again\n");
3020 goto validate_map_fail;
3021 }
3022 }
3023 }
3024
3025 /* Update mappings not managed by KFD */
3026 list_for_each_entry(peer_vm, &process_info->vm_list_head,
3027 vm_list_node) {
3028 struct amdgpu_device *adev = amdgpu_ttm_adev(
3029 peer_vm->root.bo->tbo.bdev);
3030
3031 struct amdgpu_fpriv *fpriv =
3032 container_of(peer_vm, struct amdgpu_fpriv, vm);
3033
3034 ret = amdgpu_vm_bo_update(adev, fpriv->prt_va, false);
3035 if (ret) {
3036 dev_dbg(adev->dev,
3037 "Memory eviction: handle PRT moved failed, pid %8d. Try again.\n",
3038 pid_nr(process_info->pid));
3039 goto validate_map_fail;
3040 }
3041
3042 ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
3043 if (ret) {
3044 dev_dbg(adev->dev,
3045 "Memory eviction: handle moved failed, pid %8d. Try again.\n",
3046 pid_nr(process_info->pid));
3047 goto validate_map_fail;
3048 }
3049 }
3050
3051 /* Update page directories */
3052 ret = process_update_pds(process_info, &sync_obj);
3053 if (ret) {
3054 pr_debug("Memory eviction: update PDs failed. Try again\n");
3055 goto validate_map_fail;
3056 }
3057
3058 /* Sync with fences on all the page tables. They implicitly depend on any
3059 * move fences from amdgpu_vm_handle_moved above.
3060 */
3061 ret = process_sync_pds_resv(process_info, &sync_obj);
3062 if (ret) {
3063 pr_debug("Memory eviction: Failed to sync to PD BO moving fence. Try again\n");
3064 goto validate_map_fail;
3065 }
3066
3067 /* Wait for validate and PT updates to finish */
3068 amdgpu_sync_wait(&sync_obj, false);
3069
3070 /* The old eviction fence may be unsignaled if restore happens
3071 * after a GPU reset or suspend/resume. Keep the old fence in that
3072 * case. Otherwise release the old eviction fence and create new
3073 * one, because fence only goes from unsignaled to signaled once
3074 * and cannot be reused. Use context and mm from the old fence.
3075 *
3076 * If an old eviction fence signals after this check, that's OK.
3077 * Anyone signaling an eviction fence must stop the queues first
3078 * and schedule another restore worker.
3079 */
3080 if (dma_fence_is_signaled(&process_info->eviction_fence->base)) {
3081 struct amdgpu_amdkfd_fence *new_fence =
3082 amdgpu_amdkfd_fence_create(
3083 process_info->eviction_fence->base.context,
3084 process_info->eviction_fence->mm,
3085 NULL, process_info->context_id);
3086
3087 if (!new_fence) {
3088 pr_err("Failed to create eviction fence\n");
3089 ret = -ENOMEM;
3090 goto validate_map_fail;
3091 }
3092 dma_fence_put(&process_info->eviction_fence->base);
3093 process_info->eviction_fence = new_fence;
3094 replace_eviction_fence(ef, dma_fence_get(&new_fence->base));
3095 } else {
3096 WARN_ONCE(*ef != &process_info->eviction_fence->base,
3097 "KFD eviction fence doesn't match KGD process_info");
3098 }
3099
3100 /* Attach new eviction fence to all BOs except pinned ones */
3101 list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list) {
3102 if (mem->bo->tbo.pin_count)
3103 continue;
3104
3105 dma_resv_add_fence(mem->bo->tbo.base.resv,
3106 &process_info->eviction_fence->base,
3107 DMA_RESV_USAGE_BOOKKEEP);
3108 }
3109 /* Attach eviction fence to PD / PT BOs and DMABuf imports */
3110 list_for_each_entry(peer_vm, &process_info->vm_list_head,
3111 vm_list_node) {
3112 struct amdgpu_bo *bo = peer_vm->root.bo;
3113
3114 dma_resv_add_fence(bo->tbo.base.resv,
3115 &process_info->eviction_fence->base,
3116 DMA_RESV_USAGE_BOOKKEEP);
3117 }
3118
3119 validate_map_fail:
3120 amdgpu_sync_free(&sync_obj);
3121 ttm_reserve_fail:
3122 drm_exec_fini(&exec);
3123 mutex_unlock(&process_info->lock);
3124 return ret;
3125 }
3126
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
` (2 preceding siblings ...)
2026-03-31 22:07 ` kernel test robot
@ 2026-04-01 0:38 ` kernel test robot
3 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-04-01 0:38 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, Felix Kuehling,
Alex Deucher, Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Danilo Krummrich, Matthew Brost, Alice Ryhl, Rob Clark,
Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang, Sean Paul,
Marijn Suijten, amd-gfx, dri-devel, linux-arm-msm, freedreno
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on next-20260330]
[cannot apply to drm-xe/drm-xe-next linus/master v7.0-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-exec-Remove-the-index-parameter-from-drm_exec_for_each_locked_obj-_reverse/20260331-220349
base: https://gitlab.freedesktop.org/drm/misc/kernel.git drm-misc-next
patch link: https://lore.kernel.org/r/20260331092023.81616-6-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260401/202604010859.7LmkFoJx-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260401/202604010859.7LmkFoJx-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604010859.7LmkFoJx-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c: In function 'amdgpu_amdkfd_gpuvm_restore_process_bos':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3001:66: error: incompatible type for argument 1 of 'drm_exec_ticket'
3001 | ret = process_validate_vms(process_info, drm_exec_ticket(exec));
| ^~~~
| |
| struct drm_exec
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:30:
include/drm/drm_exec.h:154:71: note: expected 'struct drm_exec *' but argument is of type 'struct drm_exec'
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ~~~~~~~~~~~~~~~~~^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:3042:77: error: incompatible type for argument 1 of 'drm_exec_ticket'
3042 | ret = amdgpu_vm_handle_moved(adev, peer_vm, drm_exec_ticket(exec));
| ^~~~
| |
| struct drm_exec
include/drm/drm_exec.h:154:71: note: expected 'struct drm_exec *' but argument is of type 'struct drm_exec'
154 | static inline struct ww_acquire_ctx *drm_exec_ticket(struct drm_exec *exec)
| ~~~~~~~~~~~~~~~~~^~~~
vim +/drm_exec_ticket +3001 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
end of thread, other threads:[~2026-04-01 18:20 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
2026-03-31 9:20 [PATCH 0/5] drm/exec: drm_exec polishing Thomas Hellström
2026-03-31 9:20 ` [PATCH 1/5] drm/exec: Remove the index parameter from drm_exec_for_each_locked_obj[_reverse] Thomas Hellström
2026-03-31 9:29 ` Christian König
2026-03-31 9:20 ` [PATCH 2/5] drm/msm: Remove abuse of drm_exec internals Thomas Hellström
2026-03-31 9:30 ` Christian König
2026-03-31 9:36 ` Christian König
2026-03-31 19:08 ` Rob Clark
2026-03-31 19:52 ` Thomas Hellström
2026-03-31 20:39 ` Rob Clark
2026-03-31 9:20 ` [PATCH 3/5] drm/exec: Make the drm_exec_until_all_locked() macro more readable Thomas Hellström
2026-03-31 9:39 ` Christian König
2026-03-31 11:03 ` Thomas Hellström
2026-03-31 9:20 ` [PATCH 4/5] drm/exec, drm/xe: Avoid abusing the drm_exec retry pointer Thomas Hellström
2026-03-31 9:44 ` Christian König
2026-03-31 10:13 ` Thomas Hellström
2026-03-31 11:09 ` Thomas Hellström
2026-03-31 11:59 ` Christian König
2026-03-31 9:20 ` [PATCH 5/5] drm/exec, drm/xe, drm/amdgpu: Add an accessor for struct drm_exec::ticket Thomas Hellström
2026-03-31 9:46 ` Christian König
2026-03-31 10:18 ` Thomas Hellström
2026-03-31 21:46 ` kernel test robot
2026-03-31 22:07 ` kernel test robot
2026-04-01 0:38 ` kernel test robot