intel-xe.lists.freedesktop.org archive mirror
* [PATCH 00/15] Driver-managed exhaustive eviction
@ 2025-08-13 10:51 Thomas Hellström
  2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
                   ` (18 more replies)
  0 siblings, 19 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Exhaustive eviction means that every client should in theory be able to
allocate all graphics memory (minus pinned memory). This is done by
evicting other clients' memory.

Currently when TTM wants to evict a buffer object it will typically
trylock that buffer object. It may also optionally try a sleeping lock,
but if deadlock resolution kicks in while doing so (the locking
returns -EDEADLK), that is converted to an -ENOMEM and returned to the
caller. If multiple clients simultaneously try to evict each
other's buffer objects, there is a chance that they all end up
returning -ENOMEM.
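
Schematically, the conversion looks like this (a sketch of the failure
mode only, not actual TTM code; busy_bo and ticket are illustrative
names):

	ret = dma_resv_lock_interruptible(busy_bo->base.resv, ticket);
	if (ret == -EDEADLK)
		ret = -ENOMEM;	/* deadlock back-off surfaces as OOM */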

The key to resolving this is that on memory contention, lower-
priority clients back off, releasing their buffer object locks and
thereby allowing their memory to be evicted. Eventually their priority
will elevate and they will succeed. TTM has long been intending to
implement this using full drm_exec locking during eviction. Once
that is implemented, clients wanting to validate memory must pass
the drm_exec context used to lock their buffer objects to TTM
validation. Most of this series is making sure that is done, both
for exec-type validation and buffer object creation. The big benefit
of this approach is that it can distinguish between memory types and
avoid lock-release rollbacks until really necessary. One
drawback is that it can't handle system memory contention resolved
by a shrinker.
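
For reference, the shape of such a drm_exec transaction, using the
drm_exec API this series plumbs through. Treating validation contention
inside the loop is a sketch of the intended future TTM behavior, not
current TTM code:

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
	drm_exec_until_all_locked(&exec) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		/* On -EDEADLK: unlock all held objects and restart. */
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;
		ret = xe_bo_validate(bo, vm, false, &exec);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;
	}
	drm_exec_fini(&exec);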

However, since TTM has yet to implement drm_exec validation, this
series, while preparing for the TTM implementation, takes a different
approach: an outer rw semaphore on top of the drm_exec retry loop.
When a client wants to allocate graphics memory, the lock is taken in
non-exclusive mode. If an OOM is hit, the locks are released and the
outer lock is retaken in exclusive mode. That ensures that on memory
contention, the client will, when the exclusive lock is held, be
the only client trying to allocate memory. It requires, however,
that all clients adhere to the same scheme.
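
In pseudo-C, the scheme looks as follows (a hedged sketch; val_lock and
run_drm_exec_transaction() are illustrative names, the real wrapper is
introduced in patch 5):

	bool exclusive = false;
	int ret;

	down_read(&xe->val_lock);	/* shared mode: allocate concurrently */
retry:
	ret = run_drm_exec_transaction(xe);	/* the loop sketched above */
	if (ret == -ENOMEM && !exclusive) {
		up_read(&xe->val_lock);
		down_write(&xe->val_lock);	/* sole allocator on contention */
		exclusive = true;
		goto retry;
	}
	if (exclusive)
		up_write(&xe->val_lock);
	else
		up_read(&xe->val_lock);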

The idea is that when TTM implements drm_exec eviction, the driver-
managed scheme could be retired.

Patches 1 to 3 fix problems hit while testing.
Patch 4 identifies the code paths where we need a drm_exec transaction.
Patch 5 introduces the wrapper with the rw-semaphore.

The rest of the patches ensure that we wrap graphics memory
allocation in the combined rw-semaphore / drm-exec loop.

As a follow up, additional patches around suspend / resume will
be posted.

Thomas Hellström (15):
  drm/xe/vm: Don't pin the vm_resv during validation
  drm/xe/tests/xe_dma_buf: Set the drm_gem_object::dma_buf member
  drm/xe/vm: Clear the scratch_pt pointer on error
  drm/xe: Pass down drm_exec context to validation
  drm/xe: Introduce an xe_validation wrapper around drm_exec
  drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  drm/xe: Convert SVM validation for exhaustive eviction
  drm/xe: Convert existing drm_exec transactions for exhaustive eviction
  drm/xe: Convert the CPU fault handler for exhaustive eviction
  drm/xe/display: Convert __xe_pin_fb_vma()
  drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  drm/xe: Rename ___xe_bo_create_locked()
  drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  drm/xe: Convert pinned suspend eviction for exhaustive eviction

 drivers/gpu/drm/xe/Makefile                   |   1 +
 .../compat-i915-headers/gem/i915_gem_stolen.h |  24 +-
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +-
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |  62 +-
 drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
 drivers/gpu/drm/xe/display/xe_plane_initial.c |   4 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |  36 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  24 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |  66 +-
 drivers/gpu/drm/xe/xe_bo.c                    | 589 ++++++++++++------
 drivers/gpu/drm/xe/xe_bo.h                    |  56 +-
 drivers/gpu/drm/xe/xe_device.c                |   2 +
 drivers/gpu/drm/xe/xe_device_types.h          |   3 +
 drivers/gpu/drm/xe/xe_dma_buf.c               |  70 ++-
 drivers/gpu/drm/xe/xe_eu_stall.c              |   6 +-
 drivers/gpu/drm/xe/xe_exec.c                  |  26 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
 drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
 drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c          |  24 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  22 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 +-
 drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
 drivers/gpu/drm/xe/xe_migrate.c               |  20 +-
 drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
 drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
 drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +-
 drivers/gpu/drm/xe/xe_svm.c                   |  65 +-
 drivers/gpu/drm/xe/xe_validation.c            | 248 ++++++++
 drivers/gpu/drm/xe/xe_validation.h            | 176 ++++++
 drivers/gpu/drm/xe/xe_vm.c                    | 287 +++++----
 drivers/gpu/drm/xe/xe_vm.h                    |  52 +-
 drivers/gpu/drm/xe/xe_vm_types.h              |  32 +-
 37 files changed, 1413 insertions(+), 655 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_validation.c
 create mode 100644 drivers/gpu/drm/xe/xe_validation.h

-- 
2.50.1



* [PATCH 01/15] drm/xe/vm: Don't pin the vm_resv during validation
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 14:28   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_gem_object::dma_buf member Thomas Hellström
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

The pinning has the odd side-effect that unlocking *any* resv
during validation triggers an "unlocking pinned lock" warning.
Remove the lockdep pinning of the vm resv.

Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: 9d5558649f68 ("drm/xe: Rework eviction rejection of bound external bos")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c |  5 ++---
 drivers/gpu/drm/xe/xe_vm.h | 15 ++-------------
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 6fea39842e1e..11eaf3b06766 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2468,7 +2468,6 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 		.no_wait_gpu = false,
 		.gfp_retry_mayfail = true,
 	};
-	struct pin_cookie cookie;
 	int ret;
 
 	if (vm) {
@@ -2479,10 +2478,10 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 		ctx.resv = xe_vm_resv(vm);
 	}
 
-	cookie = xe_vm_set_validating(vm, allow_res_evict);
+	xe_vm_set_validating(vm, allow_res_evict);
 	trace_xe_bo_validate(bo);
 	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
-	xe_vm_clear_validating(vm, allow_res_evict, cookie);
+	xe_vm_clear_validating(vm, allow_res_evict);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 2f213737c7e5..2ecb417c19a2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -315,22 +315,14 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
  * Register this task as currently making bos resident for the vm. Intended
  * to avoid eviction by the same task of shared bos bound to the vm.
  * Call with the vm's resv lock held.
- *
- * Return: A pin cookie that should be used for xe_vm_clear_validating().
  */
-static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
-						     bool allow_res_evict)
+static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
 {
-	struct pin_cookie cookie = {};
-
 	if (vm && !allow_res_evict) {
 		xe_vm_assert_held(vm);
-		cookie = lockdep_pin_lock(&xe_vm_resv(vm)->lock.base);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
 		WRITE_ONCE(vm->validating, current);
 	}
-
-	return cookie;
 }
 
 /**
@@ -338,17 +330,14 @@ static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
  * @vm: Pointer to the vm or NULL
  * @allow_res_evict: Eviction from @vm was allowed. Must be set to the same
  * value as for xe_vm_set_validating().
- * @cookie: Cookie obtained from xe_vm_set_validating().
  *
  * Unregister this task as currently making bos resident for the vm. Intended
  * to avoid eviction by the same task of shared bos bound to the vm.
  * Call with the vm's resv lock held.
  */
-static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict,
-					  struct pin_cookie cookie)
+static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict)
 {
 	if (vm && !allow_res_evict) {
-		lockdep_unpin_lock(&xe_vm_resv(vm)->lock.base, cookie);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
 		WRITE_ONCE(vm->validating, NULL);
 	}
-- 
2.50.1



* [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_gem_object::dma_buf member
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
  2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  2:52   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

This member is set when exporting using prime. However,
xe_gem_prime_export() alone doesn't set it, since that is done
later in the prime export flow.
For the test, set it manually, and remove the hack that temporarily
set it only where it was really needed.
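
For context, the exporter can only notify importers of a move through
this pointer. dma_buf_move_notify() is the generic dma-buf API; the
surrounding condition is illustrative:

	/* In the exporter's move path: with bo->ttm.base.dma_buf == NULL,
	 * importers are never told that the backing store moved.
	 */
	if (bo->ttm.base.dma_buf)
		dma_buf_move_notify(bo->ttm.base.dma_buf);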

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index c53f67ce4b0a..cde9530bef8c 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -57,16 +57,12 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 		return;
 
 	/*
-	 * Evict exporter. Note that the gem object dma_buf member isn't
-	 * set from xe_gem_prime_export(), and it's needed for the move_notify()
-	 * functionality, so hack that up here. Evicting the exported bo will
+	 * Evict exporter. Evicting the exported bo will
 	 * evict also the imported bo through the move_notify() functionality if
 	 * importer is on a different device. If they're on the same device,
 	 * the exporter and the importer should be the same bo.
 	 */
-	swap(exported->ttm.base.dma_buf, dmabuf);
 	ret = xe_bo_evict(exported);
-	swap(exported->ttm.base.dma_buf, dmabuf);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
@@ -139,6 +135,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 			   PTR_ERR(dmabuf));
 		goto out;
 	}
+	bo->ttm.base.dma_buf = dmabuf;
 
 	import = xe_gem_prime_import(&xe->drm, dmabuf);
 	if (!IS_ERR(import)) {
@@ -186,6 +183,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 		KUNIT_FAIL(test, "dynamic p2p attachment failed with err=%ld\n",
 			   PTR_ERR(import));
 	}
+	bo->ttm.base.dma_buf = NULL;
 	dma_buf_put(dmabuf);
 out:
 	drm_gem_object_put(&bo->ttm.base);
-- 
2.50.1



* [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
  2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
  2025-08-13 10:51 ` [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_gem_object::dma_buf member Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 14:45   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 04/15] drm/xe: Pass down drm_exec context to validation Thomas Hellström
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Brian Welty, Rodrigo Vivi, Lucas De Marchi,
	stable, Matthew Brost, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

Avoid triggering a dereference of an error pointer on cleanup in
xe_vm_free_scratch() by clearing any scratch_pt error pointer.
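
A sketch of the hazard, assuming the cleanup loop in
xe_vm_free_scratch() has this shape (illustrative, not the exact code):

	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++)
		/* An ERR_PTR() is non-NULL, so without the fix it would
		 * be passed on and dereferenced here.
		 */
		if (vm->scratch_pt[id][i])
			xe_pt_destroy(vm->scratch_pt[id][i], vm->flags, NULL);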

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 06951c2ee72d ("drm/xe: Use NULL PTEs as scratch PTEs")
Cc: Brian Welty <brian.welty@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_vm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d40d2d43c041..12e661960244 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1635,8 +1635,12 @@ static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
 
 	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
 		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
-		if (IS_ERR(vm->scratch_pt[id][i]))
-			return PTR_ERR(vm->scratch_pt[id][i]);
+		if (IS_ERR(vm->scratch_pt[id][i])) {
+			int err = PTR_ERR(vm->scratch_pt[id][i]);
+
+			vm->scratch_pt[id][i] = NULL;
+			return err;
+		}
 
 		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
 	}
-- 
2.50.1



* [PATCH 04/15] drm/xe: Pass down drm_exec context to validation
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (2 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 16:42   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

We want all validation (potential backing store allocation) to be part
of a drm_exec transaction. Therefore add a drm_exec pointer argument
to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
will deal with making all (or nearly all) calls to these functions
part of a drm_exec transaction. In the meantime, define special values
of the drm_exec pointer:

XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
has not been done yet.
XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
the drm_exec context to be passed down to map_attachment, where
validation takes place.
XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
eviction isn't crucial and the ROI of converting those is very
small.

For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
a lockdep check that a drm_exec transaction can indeed start at the
location where the macro is expanded. This is to encourage
developers to take this into consideration early in the code
development process.
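
These special values are encoded as error pointers, matching the
xe_validation_assert_exec() implementation below, which switches on
PTR_ERR(exec). A sketch of the encoding (the exact values are
illustrative):

	#define XE_VALIDATION_UNIMPLEMENTED ERR_PTR(__XE_VAL_UNIMPLEMENTED)
	#define XE_VALIDATION_UNSUPPORTED   ERR_PTR(__XE_VAL_UNSUPPORTED)
	#define XE_VALIDATION_OPT_OUT       ERR_PTR(__XE_VAL_OPT_OUT)

	/* All three satisfy IS_ERR() and can therefore never be mistaken
	 * for, or dereferenced as, a live drm_exec context.
	 */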

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   1 +
 .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
 drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
 drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
 drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
 drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
 drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
 drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c          |   4 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
 drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
 drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
 drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
 drivers/gpu/drm/xe/xe_vm.c                    |  26 +++-
 drivers/gpu/drm/xe/xe_vm.h                    |  33 ++++-
 drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
 20 files changed, 401 insertions(+), 105 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_validation.c
 create mode 100644 drivers/gpu/drm/xe/xe_validation.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 8e0c3412a757..8ee7d275128d 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -127,6 +127,7 @@ xe-y += xe_bb.o \
 	xe_tuning.o \
 	xe_uc.o \
 	xe_uc_fw.o \
+	xe_validation.o \
 	xe_vm.o \
 	xe_vram.o \
 	xe_vram_freq.o \
diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
index 41d39d67817a..1ce1e9da975b 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
@@ -8,6 +8,7 @@
 
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_res_cursor.h"
+#include "xe_validation.h"
 
 struct xe_bo;
 
@@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 						       u32 size, u32 align,
 						       u32 start, u32 end)
 {
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
 	int err;
 	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
@@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 
 	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
 				       NULL, size, start, end,
-				       ttm_bo_type_kernel, flags, 0);
+				       ttm_bo_type_kernel, flags, 0, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		bo = NULL;
 		return err;
 	}
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	xe_bo_unlock_vm_held(bo);
 
 	if (err) {
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index f1f8b5ab53ef..4b0748e6fdd6 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	int ret;
 
 	if (!vma)
@@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 		goto err;
 
 	if (IS_DGFX(xe))
-		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
+		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
 	else
-		ret = xe_bo_validate(bo, NULL, true);
+		ret = xe_bo_validate(bo, NULL, true, exec);
 	if (!ret)
 		ttm_bo_pin(&bo->ttm);
 	ttm_bo_unreserve(&bo->ttm);
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index bb469096d072..06ceba6c3c25 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -23,7 +23,7 @@
 
 static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 			    bool clear, u64 get_val, u64 assign_val,
-			    struct kunit *test)
+			    struct kunit *test, struct drm_exec *exec)
 {
 	struct dma_fence *fence;
 	struct ttm_tt *ttm;
@@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 	u32 offset;
 
 	/* Move bo to VRAM if not already there. */
-	ret = xe_bo_validate(bo, NULL, false);
+	ret = xe_bo_validate(bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate bo.\n");
 		return ret;
@@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 	}
 
 	/* Evict to system. CCS data should be copied. */
-	ret = xe_bo_evict(bo);
+	ret = xe_bo_evict(bo, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to evict bo.\n");
 		return ret;
@@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 
 	/* TODO: Sanity check */
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
+	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 
 	if (IS_DGFX(xe))
 		kunit_info(test, "Testing vram id %u\n", tile->id);
@@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 
 	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
 	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
-			       test);
+			       test, exec);
 	if (ret)
 		goto out_unlock;
 
 	kunit_info(test, "Verifying that CCS data survives migration.\n");
 	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
-			       0xdeadbeefdeadbeefULL, test);
+			       0xdeadbeefdeadbeefULL, test, exec);
 	if (ret)
 		goto out_unlock;
 
 	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
-	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
+	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
 
 out_unlock:
 	xe_bo_unlock(bo);
@@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 	struct xe_bo *bo, *external;
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
 	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
+	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 	struct xe_gt *__gt;
 	int err, i, id;
 
@@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		}
 
 		xe_bo_lock(external, false);
-		err = xe_bo_pin_external(external);
+		err = xe_bo_pin_external(external, exec);
 		xe_bo_unlock(external);
 		if (err) {
 			KUNIT_FAIL(test, "external bo pin err=%pe\n",
@@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		if (i) {
 			down_read(&vm->lock);
 			xe_vm_lock(vm, false);
-			err = xe_bo_validate(bo, bo->vm, false);
+			err = xe_bo_validate(bo, bo->vm, false, exec);
 			xe_vm_unlock(vm);
 			up_read(&vm->lock);
 			if (err) {
@@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 				goto cleanup_all;
 			}
 			xe_bo_lock(external, false);
-			err = xe_bo_validate(external, NULL, false);
+			err = xe_bo_validate(external, NULL, false, exec);
 			xe_bo_unlock(external);
 			if (err) {
 				KUNIT_FAIL(test, "external bo valid err=%pe\n",
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index cde9530bef8c..965dd3280468 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
 }
 
 static void check_residency(struct kunit *test, struct xe_bo *exported,
-			    struct xe_bo *imported, struct dma_buf *dmabuf)
+			    struct xe_bo *imported, struct dma_buf *dmabuf,
+			    struct drm_exec *exec)
 {
 	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
 	u32 mem_type;
@@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 	 * importer is on a different device. If they're on the same device,
 	 * the exporter and the importer should be the same bo.
 	 */
-	ret = xe_bo_evict(exported);
+	ret = xe_bo_evict(exported, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
@@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 	}
 
 	/* Re-validate the importer. This should move also exporter in. */
-	ret = xe_bo_validate(imported, NULL, false);
+	ret = xe_bo_validate(imported, NULL, false, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
@@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 			KUNIT_FAIL(test,
 				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
 		} else {
+			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 			int err;
 
 			/* Is everything where we expect it to be? */
 			xe_bo_lock(import_bo, false);
-			err = xe_bo_validate(import_bo, NULL, false);
+			err = xe_bo_validate(import_bo, NULL, false, exec);
 
 			/* Pinning in VRAM is not allowed. */
 			if (!is_dynamic(params) &&
@@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 						  err == -ERESTARTSYS);
 
 			if (!err)
-				check_residency(test, bo, import_bo, dmabuf);
+				check_residency(test, bo, import_bo, dmabuf, exec);
 			xe_bo_unlock(import_bo);
 		}
 		drm_gem_object_put(import);
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index edd1e701aa1c..dfb445d09759 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
 		} } while (0)
 
 static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
-		      struct kunit *test, u32 region)
+		      struct kunit *test, u32 region, struct drm_exec *exec)
 {
 	struct xe_device *xe = tile_to_xe(m->tile);
 	u64 retval, expected = 0;
@@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 						   ttm_bo_type_kernel,
 						   region |
 						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						   XE_BO_FLAG_PINNED);
+						   XE_BO_FLAG_PINNED,
+						   exec);
 	if (IS_ERR(remote)) {
 		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
 			   str, remote);
 		return;
 	}
 
-	err = xe_bo_validate(remote, NULL, false);
+	err = xe_bo_validate(remote, NULL, false, exec);
 	if (err) {
 		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
 			   str, err);
@@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 }
 
 static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
-			     struct kunit *test)
+			     struct drm_exec *exec, struct kunit *test)
 {
-	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
+	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
 }
 
 static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
-			   struct kunit *test)
+			   struct drm_exec *exec, struct kunit *test)
 {
 	u32 region;
 
@@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
 		region = XE_BO_FLAG_VRAM1;
 	else
 		region = XE_BO_FLAG_VRAM0;
-	test_copy(m, bo, test, region);
+	test_copy(m, bo, test, region, exec);
 }
 
-static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
+static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
+				   struct drm_exec *exec)
 {
 	struct xe_tile *tile = m->tile;
 	struct xe_device *xe = tile_to_xe(tile);
@@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	check(retval, expected, "Command clear small last value", test);
 
 	kunit_info(test, "Copying small buffer object to system\n");
-	test_copy_sysmem(m, tiny, test);
+	test_copy_sysmem(m, tiny, exec, test);
 	if (xe->info.tile_count > 1) {
 		kunit_info(test, "Copying small buffer object to other vram\n");
-		test_copy_vram(m, tiny, test);
+		test_copy_vram(m, tiny, exec, test);
 	}
 
 	/* Clear a big bo */
@@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	check(retval, expected, "Command clear big last value", test);
 
 	kunit_info(test, "Copying big buffer object to system\n");
-	test_copy_sysmem(m, big, test);
+	test_copy_sysmem(m, big, exec, test);
 	if (xe->info.tile_count > 1) {
 		kunit_info(test, "Copying big buffer object to other vram\n");
-		test_copy_vram(m, big, test);
+		test_copy_vram(m, big, exec, test);
 	}
 
 out:
@@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
 
 	for_each_tile(tile, xe, id) {
 		struct xe_migrate *m = tile->migrate;
+		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 
 		kunit_info(test, "Testing tile id %d.\n", id);
 		xe_vm_lock(m->q->vm, false);
-		xe_migrate_sanity_test(m, test);
+		xe_migrate_sanity_test(m, test, exec);
 		xe_vm_unlock(m->q->vm);
 	}
 
@@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
 
 static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
-			 struct kunit *test)
+			 struct drm_exec *exec, struct kunit *test)
 {
 	struct dma_fence *fence;
 	u64 expected, retval;
@@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 	dma_fence_put(fence);
 
 	kunit_info(test, "Evict vram buffer object\n");
-	ret = xe_bo_evict(vram_bo);
+	ret = xe_bo_evict(vram_bo, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to evict bo.\n");
 		return;
@@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 	dma_fence_put(fence);
 
 	kunit_info(test, "Restore vram buffer object\n");
-	ret = xe_bo_validate(vram_bo, NULL, false);
+	ret = xe_bo_validate(vram_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
 		return;
@@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 {
 	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
+	struct drm_exec *exec;
 	long ret;
 
 	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
@@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 		return;
 	}
 
+	exec = XE_VALIDATION_OPT_OUT;
 	xe_bo_lock(sys_bo, false);
-	ret = xe_bo_validate(sys_bo, NULL, false);
+	ret = xe_bo_validate(sys_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
 		goto free_sysbo;
@@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	xe_bo_lock(ccs_bo, false);
-	ret = xe_bo_validate(ccs_bo, NULL, false);
+	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
 		goto free_ccsbo;
@@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	xe_bo_lock(vram_bo, false);
-	ret = xe_bo_validate(vram_bo, NULL, false);
+	ret = xe_bo_validate(vram_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
 		goto free_vrambo;
@@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	test_clear(xe, tile, sys_bo, vram_bo, test);
-	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
+	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
 	xe_bo_unlock(vram_bo);
 
 	xe_bo_lock(vram_bo, false);
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 11eaf3b06766..e71addf51ed0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *backup;
 	int ret = 0;
 
@@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
 					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
 					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					XE_BO_FLAG_PINNED);
+					XE_BO_FLAG_PINNED, exec);
 	if (IS_ERR(backup)) {
 		ret = PTR_ERR(backup);
 		goto out_unlock_bo;
@@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
 int xe_bo_evict_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *backup = bo->backup_obj;
 	bool backup_created = false;
 	bool unmap = false;
@@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 						NULL, xe_bo_size(bo),
 						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
 						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						XE_BO_FLAG_PINNED);
+						XE_BO_FLAG_PINNED, exec);
 		if (IS_ERR(backup)) {
 			ret = PTR_ERR(backup);
 			goto out_unlock_bo;
@@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	struct xe_device *xe = to_xe_device(ddev);
 	struct xe_bo *bo = ttm_to_xe_bo(tbo);
 	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
+	struct drm_exec *exec;
 	vm_fault_t ret;
 	int idx;
 
 	if (needs_rpm)
 		xe_pm_runtime_get(xe);
 
+	exec = XE_VALIDATION_UNIMPLEMENTED;
 	ret = ttm_bo_vm_reserve(tbo, vmf);
 	if (ret)
 		goto out;
@@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	if (drm_dev_enter(ddev, &idx)) {
 		trace_xe_bo_cpu_fault(bo);
 
+		xe_validation_assert_exec(xe, exec, &tbo->base);
 		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 					       TTM_BO_VM_NUM_PREFAULT);
 		drm_dev_exit(idx);
@@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
 	kfree(bo);
 }
 
+/**
+ * ___xe_bo_create_locked() - Initialize or create an xe_bo.
+ * @xe: The xe device.
+ * @bo: An already allocated buffer object or NULL
+ * if the function should allocate a new one.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @resv: Pointer to a locked shared reservation object to use for this bo,
+ * or NULL for the xe_bo to use its own.
+ * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
+ * @size: The storage size to use for the bo.
+ * @cpu_caching: The cpu caching used for system memory backing store.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Initialize or create an xe buffer object. On failure, any allocated buffer
+ * object passed in @bo will have been unreferenced.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				     struct xe_tile *tile, struct dma_resv *resv,
 				     struct ttm_lru_bulk_move *bulk, size_t size,
 				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags)
+				     u32 flags, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 		ctx.resv = resv;
 	}
 
+	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
 	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
 		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
 		if (WARN_ON(err)) {
@@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
 		      struct xe_tile *tile, struct xe_vm *vm,
 		      size_t size, u64 start, u64 end,
 		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
-		      u64 alignment)
+		      u64 alignment, struct drm_exec *exec)
 {
 	struct xe_bo *bo = NULL;
 	int err;
@@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
 				    vm && !xe_vm_in_fault_mode(vm) &&
 				    flags & XE_BO_FLAG_USER ?
 				    &vm->lru_bulk_move : NULL, size,
-				    cpu_caching, type, flags);
+				    cpu_caching, type, flags, exec);
 	if (IS_ERR(bo))
 		return bo;
 
@@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
 
 			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
 				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
-							   start + xe_bo_size(bo), U64_MAX);
+							   start + xe_bo_size(bo), U64_MAX,
+							   exec);
 			} else {
-				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
+				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
 			}
 			if (err)
 				goto err_unlock_put_bo;
@@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_bo_create_locked_range() - Create a BO with range- and alignment options
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @start: Start of fixed VRAM range or 0.
+ * @end: End of fixed VRAM range or ~0ULL.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Create an Xe BO with range- and alignment options. If @start and @end indicate
+ * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
+ * only. The @alignment parameter can be used for GGTT alignment.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
 			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags, u64 alignment)
+			  enum ttm_bo_type type, u32 flags, u64 alignment,
+			  struct drm_exec *exec)
 {
 	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
-				     flags, alignment);
+				     flags, alignment, exec);
 }
 
+/**
+ * xe_bo_create_locked() - Create a BO
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Create a locked xe BO with no range- nor alignment restrictions.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
-				  enum ttm_bo_type type, u32 flags)
+				  enum ttm_bo_type type, u32 flags,
+				  struct drm_exec *exec)
 {
 	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
-				     flags, 0);
+				     flags, 0, exec);
 }
 
 struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
@@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
 				u16 cpu_caching,
 				u32 flags)
 {
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
 						 cpu_caching, ttm_bo_type_device,
-						 flags | XE_BO_FLAG_USER, 0);
+						 flags | XE_BO_FLAG_USER, 0, exec);
 	if (!IS_ERR(bo))
 		xe_bo_unlock_vm_held(bo);
 
@@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
 			   struct xe_vm *vm, size_t size,
 			   enum ttm_bo_type type, u32 flags)
 {
-	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
 
 	if (!IS_ERR(bo))
 		xe_bo_unlock_vm_held(bo);
@@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 	int err;
 	u64 start = offset == ~0ull ? 0 : offset;
 	u64 end = offset == ~0ull ? offset : start + size;
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
 
 	if (flags & XE_BO_FLAG_STOLEN &&
 	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
@@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 
 	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
 				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
-				       alignment);
+				       alignment, exec);
 	if (IS_ERR(bo))
 		return bo;
 
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	if (err)
 		goto err_put;
 
@@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
 /**
  * xe_bo_pin_external - pin an external BO
  * @bo: buffer object to be pinned
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
  * BO. Unique call compared to xe_bo_pin as this function has its own set of
@@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
  *
  * Returns 0 for success, negative error code otherwise.
  */
-int xe_bo_pin_external(struct xe_bo *bo)
+int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	int err;
@@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	xe_assert(xe, xe_bo_is_user(bo));
 
 	if (!xe_bo_is_pinned(bo)) {
-		err = xe_bo_validate(bo, NULL, false);
+		err = xe_bo_validate(bo, NULL, false, exec);
 		if (err)
 			return err;
 
@@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	return 0;
 }
 
-int xe_bo_pin(struct xe_bo *bo)
+/**
+ * xe_bo_pin() - Pin a kernel bo after potentially migrating it
+ * @bo: The kernel bo to pin.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Attempts to migrate a bo to @bo->placement. If that succeeds,
+ * pins the bo.
+ *
+ * Return: %0 on success, negative error code on migration failure.
+ */
+int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);
@@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
 	/* We only expect at most 1 pin */
 	xe_assert(xe, !xe_bo_is_pinned(bo));
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, NULL, false, exec);
 	if (err)
 		return err;
 
@@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
  *      NULL. Used together with @allow_res_evict.
  * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
  *                   reservation object.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Make sure the bo is in allowed placement, migrating it if necessary. If
  * needed, other bos will be evicted. If bos selected for eviction share
@@ -2461,7 +2541,8 @@ void xe_bo_unpin(struct xe_bo *bo)
  * Return: 0 on success, negative error code on failure. May return
  * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
  */
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
+		   struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 
 	xe_vm_set_validating(vm, allow_res_evict);
 	trace_xe_bo_validate(bo);
+	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
 	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
 	xe_vm_clear_validating(vm, allow_res_evict);
 
@@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
  * xe_bo_migrate - Migrate an object to the desired region id
  * @bo: The buffer object to migrate.
  * @mem_type: The TTM region type to migrate to.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Attempt to migrate the buffer object to the desired memory region. The
  * buffer object may not be pinned, and must be locked.
@@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
  * Return: 0 on success. Negative error code on failure. In particular may
  * return -EINTR or -ERESTARTSYS if signal pending.
  */
-int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
+int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
 	struct ttm_operation_ctx ctx = {
@@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
 		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
 	}
 
+	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
 	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
 }
 
 /**
  * xe_bo_evict - Evict an object to evict placement
  * @bo: The buffer object to migrate.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * On successful completion, the object memory will be moved to evict
  * placement. This function blocks until the object has been fully moved.
  *
  * Return: 0 on success. Negative error code on failure.
  */
-int xe_bo_evict(struct xe_bo *bo)
+int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = false,
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 8cce413b5235..b1b6cb622d71 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -10,6 +10,7 @@
 
 #include "xe_bo_types.h"
 #include "xe_macros.h"
+#include "xe_validation.h"
 #include "xe_vm_types.h"
 #include "xe_vm.h"
 #include "xe_vram_types.h"
@@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				     struct xe_tile *tile, struct dma_resv *resv,
 				     struct ttm_lru_bulk_move *bulk, size_t size,
 				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags);
+				     u32 flags, struct drm_exec *exec);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
 			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags, u64 alignment);
+			  enum ttm_bo_type type, u32 flags, u64 alignment,
+			  struct drm_exec *exec);
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
-				  enum ttm_bo_type type, u32 flags);
+				  enum ttm_bo_type type, u32 flags,
+				  struct drm_exec *exec);
 struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
 			   struct xe_vm *vm, size_t size,
 			   enum ttm_bo_type type, u32 flags);
@@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
 	}
 }
 
-int xe_bo_pin_external(struct xe_bo *bo);
-int xe_bo_pin(struct xe_bo *bo);
+int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
+int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
 void xe_bo_unpin_external(struct xe_bo *bo);
 void xe_bo_unpin(struct xe_bo *bo);
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
+		   struct drm_exec *exec);
 
 static inline bool xe_bo_is_pinned(struct xe_bo *bo)
 {
@@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
 
 bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
 
-int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
-int xe_bo_evict(struct xe_bo *bo);
+int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
+int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
 
 int xe_bo_evict_pinned(struct xe_bo *bo);
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 346f857f3837..78a827d4e726 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 	struct drm_gem_object *obj = attach->dmabuf->priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	struct xe_device *xe = xe_bo_device(bo);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 	int ret;
 
 	/*
@@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return -EINVAL;
 	}
 
-	ret = xe_bo_migrate(bo, XE_PL_TT);
+	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			drm_dbg(&xe->drm,
@@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return ret;
 	}
 
-	ret = xe_bo_pin_external(bo);
+	ret = xe_bo_pin_external(bo, exec);
 	xe_assert(xe, !ret);
 
 	return 0;
@@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 	struct dma_buf *dma_buf = attach->dmabuf;
 	struct drm_gem_object *obj = dma_buf->priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 	struct sg_table *sgt;
 	int r = 0;
 
@@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 
 	if (!xe_bo_is_pinned(bo)) {
 		if (!attach->peer2peer)
-			r = xe_bo_migrate(bo, XE_PL_TT);
+			r = xe_bo_migrate(bo, XE_PL_TT, exec);
 		else
-			r = xe_bo_validate(bo, NULL, false);
+			r = xe_bo_validate(bo, NULL, false, exec);
 		if (r)
 			return ERR_PTR(r);
 	}
@@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	bool reads =  (direction == DMA_BIDIRECTIONAL ||
 		       direction == DMA_FROM_DEVICE);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 
 	if (!reads)
 		return 0;
 
 	/* Can we do interruptible lock here? */
 	xe_bo_lock(bo, false);
-	(void)xe_bo_migrate(bo, XE_PL_TT);
+	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
 	xe_bo_unlock(bo);
 
 	return 0;
@@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 {
 	struct dma_resv *resv = dma_buf->resv;
 	struct xe_device *xe = to_xe_device(dev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
 	int ret;
 
 	dma_resv_lock(resv, NULL);
 	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
 				    0, /* Will require 1way or 2way for vm_bind */
-				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
+				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
 	if (IS_ERR(bo)) {
 		ret = PTR_ERR(bo);
 		goto error;
@@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
 {
 	struct drm_gem_object *obj = attach->importer_priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 
-	XE_WARN_ON(xe_bo_evict(bo));
+	XE_WARN_ON(xe_bo_evict(bo, exec));
 }
 
 static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 44364c042ad7..0bcb4fb9a10e 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -97,9 +97,13 @@
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
 	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
+	int ret;
 
 	/* The fence slot added here is intended for the exec sched job. */
-	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
+	xe_vm_set_validation_exec(vm, &vm_exec->exec);
+	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
+	xe_vm_set_validation_exec(vm, NULL);
+	return ret;
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index e03222f5ac5a..a47c0131956b 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
 }
 
 static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-				  u64 start, u64 end)
+				  u64 start, u64 end, struct drm_exec *exec)
 {
 	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
 	u8 tile_id = ggtt->tile->id;
@@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 		return 0;
 	}
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, NULL, false, exec);
 	if (err)
 		return err;
 
@@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
  * @bo: the &xe_bo to be inserted
  * @start: address where it will be inserted
  * @end: end of the range where it will be inserted
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Return: 0 on success or a negative error code on failure.
  */
 int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-			 u64 start, u64 end)
+			 u64 start, u64 end, struct drm_exec *exec)
 {
-	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
+	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
 }
 
 /**
  * xe_ggtt_insert_bo - Insert BO into GGTT
  * @ggtt: the &xe_ggtt where bo will be inserted
  * @bo: the &xe_bo to be inserted
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Return: 0 on success or a negative error code on failure.
  */
-int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
+int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
+		      struct drm_exec *exec)
 {
-	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
+	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index fbe1e397d05d..75fc7a1efea7 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -10,6 +10,7 @@
 
 struct drm_printer;
 struct xe_tile;
+struct drm_exec;
 
 struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
 int xe_ggtt_init_early(struct xe_ggtt *ggtt);
@@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
 void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
 		    struct xe_bo *bo, u16 pat_index);
 void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
-int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
+int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
 int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-			 u64 start, u64 end);
+			 u64 start, u64 end, struct drm_exec *exec);
 void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
 u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index ab43dec52776..2c7f10cc423f 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 		}
 
 		/* Migrate to VRAM, move should invalidate the VMA first */
-		err = xe_bo_migrate(bo, vram->placement);
+		err = xe_bo_migrate(bo, vram->placement, exec);
 		if (err)
 			return err;
 	} else if (bo) {
 		/* Create backing store if needed */
-		err = xe_bo_validate(bo, vm, true);
+		err = xe_bo_validate(bo, vm, true, exec);
 		if (err)
 			return err;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index c8f0320d032f..906011671b60 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
 static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo;
@@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
 				 XE_BO_FLAG_NEEDS_2M |
 				 XE_BO_FLAG_PINNED |
-				 XE_BO_FLAG_PINNED_LATE_RESTORE);
+				 XE_BO_FLAG_PINNED_LATE_RESTORE,
+				 exec);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	xe_bo_unlock(bo);
 	if (unlikely(err)) {
 		xe_bo_put(bo);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index e35c6d4def20..39e3aa6df25a 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	struct device *dev = xe->drm.dev;
 	struct drm_buddy_block *block;
 	struct list_head *blocks;
+	struct drm_exec *exec;
 	struct xe_bo *bo;
 	ktime_t time_end = 0;
 	int err, idx;
@@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 		return -ENODEV;
 
 	xe_pm_runtime_get(xe);
+	exec = XE_VALIDATION_UNIMPLEMENTED;
 
  retry:
 	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
 				 ttm_bo_type_device,
 				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
-				 XE_BO_FLAG_CPU_ADDR_MIRROR);
+				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		if (xe_vm_validate_should_retry(NULL, err, &time_end))
diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
new file mode 100644
index 000000000000..cc0684d24e02
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+#include "xe_bo.h"
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+
+#include "xe_assert.h"
+#include "xe_validation.h"
+
+#ifdef CONFIG_DRM_XE_DEBUG
+/**
+ * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
+ * for validation.
+ * @xe: Pointer to the xe device.
+ * @exec: The drm_exec pointer to check.
+ * @obj: Pointer to the object subject to validation.
+ *
+ * NULL exec pointers are not allowed.
+ * For XE_VALIDATION_UNIMPLEMENTED, no checking.
+ * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
+ * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
+ * validation is a dma-buf, for which support for ww locking is
+ * not in place in the dma-buf layer.
+ */
+void xe_validation_assert_exec(const struct xe_device *xe,
+			       const struct drm_exec *exec,
+			       const struct drm_gem_object *obj)
+{
+	xe_assert(xe, exec);
+	if (IS_ERR(exec)) {
+		switch (PTR_ERR(exec)) {
+		case __XE_VAL_UNIMPLEMENTED:
+			break;
+		case __XE_VAL_UNSUPPORTED:
+			xe_assert(xe, !!obj->dma_buf);
+			break;
+#if IS_ENABLED(CONFIG_KUNIT)
+		case __XE_VAL_OPT_OUT:
+			xe_assert(xe, current->kunit_test);
+			break;
+#endif
+		default:
+			xe_assert(xe, false);
+		}
+	}
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
new file mode 100644
index 000000000000..db50feacad7a
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_validation.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+#ifndef _XE_VALIDATION_H_
+#define _XE_VALIDATION_H_
+
+#include <linux/dma-resv.h>
+#include <linux/types.h>
+
+struct drm_exec;
+struct drm_gem_object;
+struct xe_device;
+
+#ifdef CONFIG_PROVE_LOCKING
+/**
+ * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
+ * be initialized at this point.
+ */
+static inline void xe_validation_lockdep(void)
+{
+	struct ww_acquire_ctx ticket;
+
+	ww_acquire_init(&ticket, &reservation_ww_class);
+	ww_acquire_fini(&ticket);
+}
+#else
+static inline void xe_validation_lockdep(void)
+{
+}
+#endif
+
+/*
+ * Various values of the drm_exec pointer where we've not (yet)
+ * implemented full ww locking.
+ *
+ * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
+ * A lockdep check is made to assure that a drm_exec locking
+ * transaction can actually take place where the macro is
+ * used. If this asserts, the exec pointer needs to be assigned
+ * higher up in the callchain and passed down.
+ *
+ * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
+ * the dma-buf layer doesn't support WW locking.
+ *
+ * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
+ * exhaustive eviction isn't necessary.
+ */
+#define __XE_VAL_UNIMPLEMENTED -EINVAL
+#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
+				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
+
+#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
+#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
+
+#define __XE_VAL_OPT_OUT -ENOMEM
+#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
+			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
+#ifdef CONFIG_DRM_XE_DEBUG
+void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
+			       const struct drm_gem_object *obj);
+#else
+#define xe_validation_assert_exec(_xe, _exec, _obj)	\
+	do {						\
+		(void)_xe; (void)_exec; (void)_obj;	\
+	} while (0)
+#endif
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 12e661960244..600aaadb4bee 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
 			       &vm->rebind_list);
 
-	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
+	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
 	if (ret)
 		return ret;
 
@@ -451,6 +451,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 	if (err)
 		return err;
 
+	xe_vm_set_validation_exec(vm, exec);
 	if (xe_vm_is_idle(vm)) {
 		vm->preempt.rebind_deactivated = true;
 		*done = true;
@@ -516,6 +517,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 		err = xe_preempt_work_begin(&exec, vm, &done);
 		drm_exec_retry_on_contention(&exec);
 		if (err || done) {
+			xe_vm_set_validation_exec(vm, NULL);
 			drm_exec_fini(&exec);
 			if (err && xe_vm_validate_should_retry(&exec, err, &end))
 				err = -EAGAIN;
@@ -565,6 +567,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	up_read(&vm->userptr.notifier_lock);
 
 out_unlock:
+	xe_vm_set_validation_exec(vm, NULL);
 	drm_exec_fini(&exec);
 out_unlock_outer:
 	if (err == -EAGAIN) {
@@ -1375,6 +1378,8 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
 	if (!err && bo && !bo->vm)
 		err = drm_exec_lock_obj(exec, &bo->ttm.base);
+	if (!err)
+		xe_vm_set_validation_exec(vm, exec);
 
 	return err;
 }
@@ -2889,7 +2894,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
 			err = drm_exec_lock_obj(exec, &bo->ttm.base);
 		if (!err && validate)
 			err = xe_bo_validate(bo, vm,
-					     !xe_vm_in_preempt_fence_mode(vm));
+					     !xe_vm_in_preempt_fence_mode(vm), exec);
 	}
 
 	return err;
@@ -3012,7 +3017,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
 					    false);
 		if (!err && !xe_vma_has_no_bo(vma))
 			err = xe_bo_migrate(xe_vma_bo(vma),
-					    region_to_mem_type[region]);
+					    region_to_mem_type[region],
+					    exec);
 		break;
 	}
 	default:
@@ -3052,6 +3058,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
 	if (err)
 		return err;
 
+	xe_vm_set_validation_exec(vm, exec);
 	list_for_each_entry(op, &vops->list, link) {
 		err = op_lock_and_prep(exec, vm, op);
 		if (err)
@@ -3850,10 +3857,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
  */
 int xe_vm_lock(struct xe_vm *vm, bool intr)
 {
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	int ret;
+
 	if (intr)
-		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
+		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
+	else
+		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
+
+	if (!ret)
+		xe_vm_set_validation_exec(vm, exec);
 
-	return dma_resv_lock(xe_vm_resv(vm), NULL);
+	return ret;
 }
 
 /**
@@ -3864,6 +3879,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
  */
 void xe_vm_unlock(struct xe_vm *vm)
 {
+	xe_vm_set_validation_exec(vm, NULL);
 	dma_resv_unlock(xe_vm_resv(vm));
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 2ecb417c19a2..4ba26eed7e96 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
 	if (vm && !allow_res_evict) {
 		xe_vm_assert_held(vm);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
-		WRITE_ONCE(vm->validating, current);
+		WRITE_ONCE(vm->validation.validating, current);
 	}
 }
 
@@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
 {
 	if (vm && !allow_res_evict) {
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
-		WRITE_ONCE(vm->validating, NULL);
+		WRITE_ONCE(vm->validation.validating, NULL);
 	}
 }
 
@@ -357,13 +357,40 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
 static inline bool xe_vm_is_validating(struct xe_vm *vm)
 {
 	/* Pairs with WRITE_ONCE in xe_vm_set_validating() */
-	if (READ_ONCE(vm->validating) == current) {
+	if (READ_ONCE(vm->validation.validating) == current) {
 		xe_vm_assert_held(vm);
 		return true;
 	}
 	return false;
 }
 
+/**
+ * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
+ * @vm: The vm we want to register a drm_exec object with.
+ * @exec: The exec object we want to register.
+ *
+ * Set the drm_exec object used to lock the vm's resv.
+ */
+static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
+{
+	xe_vm_assert_held(vm);
+	vm->validation._exec = exec;
+}
+
+/**
+ * xe_vm_validation_exec() - Accessor to read the drm_exec object
+ * @vm: The vm whose drm_exec object we want to read.
+ *
+ * Return: The drm_exec object used to lock the vm's resv. The value
+ * is a valid pointer, %NULL, or one of the special values defined in
+ * xe_validation.h.
+ */
+static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
+{
+	xe_vm_assert_held(vm);
+	return vm->validation._exec;
+}
+
 /**
  * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
  * a valid GPU mapping
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 8a07feef503b..2f88808e36bb 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -312,19 +312,35 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;
 
+	/**
+	 * @validation: Validation data only valid with the vm resv held.
+	 * Note: This is really task state of the task holding the vm resv,
+	 * and moving forward we should come up with a better way of
+	 * passing this down the call-chain.
+	 */
+	struct {
+		/**
+		 * @validation.validating: The task that is currently making
+		 * bos resident for this vm.
+		 * Protected by the VM's resv for writing. Opportunistic reading can be done
+		 * using READ_ONCE. Note: This is a workaround for the
+		 * TTM eviction_valuable() callback not being passed a struct
+		 * ttm_operation_context(). Future work might want to address this.
+		 */
+		struct task_struct *validating;
+		/**
+		 * @validation._exec: The drm_exec context used when locking the vm resv.
+		 * Protected by the vm's resv.
+		 */
+		struct drm_exec *_exec;
+	} validation;
+
 	/**
 	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
 	 * protected by the vm resv.
 	 */
 	u64 tlb_flush_seqno;
-	/**
-	 * @validating: The task that is currently making bos resident for this vm.
-	 * Protected by the VM's resv for writing. Opportunistic reading can be done
-	 * using READ_ONCE. Note: This is a workaround for the
-	 * TTM eviction_valuable() callback not being passed a struct
-	 * ttm_operation_context(). Future work might want to address this.
-	 */
-	struct task_struct *validating;
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (3 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 04/15] drm/xe: Pass down drm_exec context to validation Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 17:25   ` Matthew Brost
                     ` (2 more replies)
  2025-08-13 10:51 ` [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
                   ` (13 subsequent siblings)
  18 siblings, 3 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Introduce a validation wrapper xe_validation_guard() as a helper
intended to be used around drm_exec transactions that perform
validations. Once TTM can handle exhaustive eviction we could
remove this wrapper or make it mostly a NO-OP unless other
functionality is added to it.

Currently the wrapper takes a read lock upon entry and if the
transaction hits an OOM, all locks are released and the
transaction is retried with a write-lock. If all other
validations participate in this scheme, the transaction with
the write lock will be the only transaction validating and
should have access to all available non-pinned memory.

There is currently a problem in that TTM converts -EDEADLK to
-ENOMEM, and with ww_mutex slowpath error injections, we can hit
-ENOMEMs without actually having run out of memory. We abuse
ww_mutex internals to detect such situations until TTM is fixed
to not convert the error code. In the meantime, injecting
ww_mutex slowpath -EDEADLKs is a good way to test
the implementation in the absence of real OOMs.

Just introduce the wrapper in this commit. It will be hooked up
to the driver in following commits.
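
As a usage sketch (not part of this patch; it mirrors the
conversions done in the following commits and assumes an xe device
and a bo pointer in scope):

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int err = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
		/* Lock objects as usual within the drm_exec loop. */
		err = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		/* On -ENOMEM, restart once holding the rwsem in write mode. */
		err = xe_bo_validate(bo, NULL, false, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &err);
	}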

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_validation.c | 199 +++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
 2 files changed, 306 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
index cc0684d24e02..cd1424f04237 100644
--- a/drivers/gpu/drm/xe/xe_validation.c
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -5,6 +5,7 @@
 #include "xe_bo.h"
 #include <drm/drm_exec.h>
 #include <drm/drm_gem.h>
+#include <drm/drm_gpuvm.h>
 
 #include "xe_assert.h"
 #include "xe_validation.h"
@@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct xe_device *xe,
 	}
 }
 #endif
+
+static int xe_validation_lock(struct xe_validation_ctx *ctx)
+{
+	struct xe_validation_device *val = ctx->val;
+	int ret = 0;
+
+	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
+		if (ctx->request_exclusive)
+			ret = down_write_killable(&val->lock);
+		else
+			ret = down_read_interruptible(&val->lock);
+	} else {
+		if (ctx->request_exclusive)
+			down_write(&val->lock);
+		else
+			down_read(&val->lock);
+	}
+
+	if (!ret) {
+		ctx->lock_held = true;
+		ctx->lock_held_exclusive = ctx->request_exclusive;
+	}
+
+	return ret;
+}
+
+static void xe_validation_unlock(struct xe_validation_ctx *ctx)
+{
+	if (!ctx->lock_held)
+		return;
+
+	if (ctx->lock_held_exclusive)
+		up_write(&ctx->val->lock);
+	else
+		up_read(&ctx->val->lock);
+
+	ctx->lock_held = false;
+}
+
+/**
+ * xe_validation_ctx_init() - Initialize an xe_validation_ctx
+ * @ctx: The xe_validation_ctx to initialize.
+ * @val: The xe_validation_device representing the validation domain.
+ * @exec: The struct drm_exec to use for the transaction.
+ * @flags: The flags to use for drm_exec initialization.
+ * @nr: The number of anticipated buffer object locks. Forwarded to
+ * drm_exec initialization.
+ * @exclusive: Whether to use exclusive locking already on first validation.
+ *
+ * Initialize and lock an xe_validation transaction using the validation domain
+ * represented by @val. Also initialize the drm_exec object forwarding
+ * @flags and @nr to the drm_exec initialization. The @exclusive parameter should
+ * typically be set to false to avoid locking out other validators from the
+ * domain until an OOM is hit. For testing- or final attempt purposes it can,
+ * however, be set to true.
+ *
+ * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
+ * signal pending.
+ */
+int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
+			   struct drm_exec *exec, u32 flags, unsigned int nr,
+			   bool exclusive)
+{
+	int ret;
+
+	ctx->exec = exec;
+	ctx->val = val;
+	ctx->lock_held = false;
+	ctx->lock_held_exclusive = false;
+	ctx->request_exclusive = exclusive;
+	ctx->flags = flags;
+	ctx->nr = nr;
+
+	ret = xe_validation_lock(ctx);
+	if (ret)
+		return ret;
+
+	drm_exec_init(exec, flags, nr);
+
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+/*
+ * This abuses both drm_exec and ww_mutex internals and should be
+ * replaced by checking for -EDEADLK when we can make TTM
+ * stop converting -EDEADLK to -ENOMEM.
+ * An alternative is to not have exhaustive eviction with
+ * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
+ */
+static bool xe_validation_contention_injected(struct drm_exec *exec)
+{
+	return !!exec->ticket.contending_lock;
+}
+
+#else
+
+static bool xe_validation_contention_injected(struct drm_exec *exec)
+{
+	return false;
+}
+
+#endif
+
+static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
+{
+	if (ret == -ENOMEM &&
+	    ((ctx->request_exclusive &&
+	      xe_validation_contention_injected(ctx->exec)) ||
+	     !ctx->request_exclusive)) {
+		ctx->request_exclusive = true;
+		return true;
+	}
+
+	return false;
+}
+
+/**
+ * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
+ * transaction.
+ * @ctx: An uninitialized xe_validation_ctx.
+ * @vm_exec: An initialized struct vm_exec.
+ * @val: The validation domain.
+ *
+ * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
+ * transaction and therefore doesn't lend itself very well to using
+ * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
+ * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
+ *
+ * Return: %0 on success, negative error code on failure.
+ */
+int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
+			    struct drm_gpuvm_exec *vm_exec,
+			    struct xe_validation_device *val)
+{
+	int ret;
+
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->exec = &vm_exec->exec;
+	ctx->flags = vm_exec->flags;
+	ctx->val = val;
+retry:
+	ret = xe_validation_lock(ctx);
+	if (ret)
+		return ret;
+
+	ret = drm_gpuvm_exec_lock(vm_exec);
+	if (ret) {
+		xe_validation_unlock(ctx);
+		if (__xe_validation_should_retry(ctx, ret))
+			goto retry;
+	}
+
+	return ret;
+}
+
+/**
+ * xe_validation_ctx_fini() - Finalize a validation transaction
+ * @ctx: The validation transaction to finalize.
+ *
+ * Finalize a validation transaction and its related drm_exec transaction.
+ */
+void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
+{
+	drm_exec_fini(ctx->exec);
+	xe_validation_unlock(ctx);
+}
+
+/**
+ * xe_validation_should_retry() - Determine if a validation transaction should retry
+ * @ctx: The validation transaction.
+ * @ret: Pointer to a return value variable.
+ *
+ * Determines whether a validation transaction should retry based on the
+ * internal transaction state and the return value pointed to by @ret.
+ * If a validation should be retried, the transaction is prepared for that,
+ * the validation lock might be re-locked in exclusive mode, and *@ret
+ * is set to %0. If the re-locking fails, typically due to interruptible
+ * locking with a signal pending, *@ret is instead set to %-EINTR and the
+ * function returns %false.
+ *
+ * Return: %true if validation should be retried, %false otherwise.
+ */
+bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
+{
+	if (__xe_validation_should_retry(ctx, *ret)) {
+		drm_exec_fini(ctx->exec);
+		*ret = 0;
+		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
+			xe_validation_unlock(ctx);
+			*ret = xe_validation_lock(ctx);
+		}
+		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
+		return !*ret;
+	}
+
+	return false;
+}
diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
index db50feacad7a..a708c260cf18 100644
--- a/drivers/gpu/drm/xe/xe_validation.h
+++ b/drivers/gpu/drm/xe/xe_validation.h
@@ -7,9 +7,11 @@
 
 #include <linux/dma-resv.h>
 #include <linux/types.h>
+#include <linux/rwsem.h>
 
 struct drm_exec;
 struct drm_gem_object;
+struct drm_gpuvm_exec;
 struct xe_device;
 
 #ifdef CONFIG_PROVE_LOCKING
@@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
 	} while (0)
 #endif
 
+/**
+ * struct xe_validation_device - The domain for exhaustive eviction
+ * @lock: The lock used to exclude other processes from allocating graphics memory
+ *
+ * The struct xe_validation_device represents the domain for which we want to use
+ * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
+ * but when graphics memory allocation fails, it is retried with the write mode held.
+ */
+struct xe_validation_device {
+	struct rw_semaphore lock;
+};
+
+/**
+ * struct xe_validation_ctx - A struct drm_exec subclass with support for
+ * exhaustive eviction
+ * @exec: The drm_exec object base class. Note that we use a pointer instead of
+ * embedding to avoid diamond inheritance.
+ * @val: The exhaustive eviction domain.
+ * @lock_held: Whether the domain lock is currently held.
+ * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
+ * @request_exclusive: Whether to lock exclusively (write mode) the next time
+ * the domain lock is locked.
+ * @flags: The drm_exec flags used for drm_exec (re-)initialization.
+ * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
+ */
+struct xe_validation_ctx {
+	struct drm_exec *exec;
+	struct xe_validation_device *val;
+	bool lock_held;
+	bool lock_held_exclusive;
+	bool request_exclusive;
+	u32 flags;
+	unsigned int nr;
+};
+
+int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
+			   struct drm_exec *exec, u32 flags, unsigned int nr,
+			   bool exclusive);
+
+int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
+			    struct xe_validation_device *val);
+
+void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
+
+bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
+
+/**
+ * xe_validation_retry_on_oom() - Retry on OOM in an xe_validation transaction
+ * @_ctx: Pointer to the xe_validation_ctx
+ * @_ret: The current error value possibly holding -ENOMEM
+ *
+ * Use this in a way similar to drm_exec_retry_on_contention().
+ * If @_ret contains -ENOMEM the transaction is restarted once in a way that
+ * blocks other transactions and allows exhaustive eviction. If the transaction
+ * was already restarted once, just return the -ENOMEM. May also set
+ * *@_ret to %-EINTR if not retrying and waits are interruptible.
+ * May only be used within a drm_exec_until_all_locked() loop.
+ */
+#define xe_validation_retry_on_oom(_ctx, _ret)				\
+	do {								\
+		if (xe_validation_should_retry(_ctx, _ret))		\
+			goto *__drm_exec_retry_ptr;			\
+	} while (0)
+
+/**
+ * xe_validation_device_init - Initialize a struct xe_validation_device
+ * @val: The xe_validation_device to init.
+ */
+static inline void
+xe_validation_device_init(struct xe_validation_device *val)
+{
+	init_rwsem(&val->lock);
+}
+
+/*
+ * Make guard() and scoped_guard() work with xe_validation_ctx
+ * so that we can exit transactions without caring about the
+ * cleanup.
+ */
+DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
+	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
+	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
+	       _ret ? NULL : _ctx; }),
+	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
+	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
+static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
+{ return *_T; }
+#define class_xe_validation_is_conditional false
+
+/**
+ * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
+ * @_ctx: The xe_validation_ctx.
+ * @_val: The xe_validation_device.
+ * @_exec: The struct drm_exec object
+ * @_flags: Flags for the drm_exec transaction. See the struct drm_exec documentation!
+ * @_ret: Return in/out parameter. May be set by this macro. Typically 0 when called.
+ * @_excl: Whether to start in exclusive mode already in the first iteration.
+ *
+ * This macro will initiate a drm_exec transaction with additional support for
+ * exhaustive eviction.
+ */
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
+	drm_exec_until_all_locked(_exec)
+
 #endif
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (4 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  2:23   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 07/15] drm/xe: Convert SVM validation " Thomas Hellström
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Use the xe_validation_guard() to convert xe_bo_create_user()
for exhaustive eviction.
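
With the new signature, callers already inside a validation
transaction pass their drm_exec pointer, while callers without one
pass NULL and the function opens its own transaction internally. A
sketch of the two modes (size and flags here are placeholders):

	/* Standalone: the call initiates its own transaction. */
	bo = xe_bo_create_user(xe, NULL, SZ_64K,
			       DRM_XE_GEM_CPU_CACHING_WC,
			       XE_BO_FLAG_SYSTEM, NULL);

	/* With a vm: pass the drm_exec used to lock the vm's resv. */
	bo = xe_bo_create_user(xe, vm, SZ_64K,
			       DRM_XE_GEM_CPU_CACHING_WC,
			       XE_BO_FLAG_SYSTEM, &exec);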

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/tests/xe_bo.c      |  16 ++--
 drivers/gpu/drm/xe/tests/xe_dma_buf.c |   4 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c |  12 +--
 drivers/gpu/drm/xe/xe_bo.c            | 116 +++++++++++++++++---------
 drivers/gpu/drm/xe/xe_bo.h            |   9 +-
 drivers/gpu/drm/xe/xe_device.c        |   2 +
 drivers/gpu/drm/xe/xe_device_types.h  |   3 +
 drivers/gpu/drm/xe/xe_vm.c            |  14 ++++
 drivers/gpu/drm/xe/xe_vm.h            |   2 +
 9 files changed, 116 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 06ceba6c3c25..42f914692a02 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -139,8 +139,8 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 	else
 		kunit_info(test, "Testing system memory\n");
 
-	bo = xe_bo_create_user(xe, NULL, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
-			       bo_flags);
+	bo = xe_bo_create_user(xe, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
+			       bo_flags, exec);
 	if (IS_ERR(bo)) {
 		KUNIT_FAIL(test, "Failed to create bo.\n");
 		return;
@@ -220,18 +220,18 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 
 	for (i = 0; i < 2; ++i) {
 		xe_vm_lock(vm, false);
-		bo = xe_bo_create_user(xe, NULL, vm, 0x10000,
+		bo = xe_bo_create_user(xe, vm, 0x10000,
 				       DRM_XE_GEM_CPU_CACHING_WC,
-				       bo_flags);
+				       bo_flags, exec);
 		xe_vm_unlock(vm);
 		if (IS_ERR(bo)) {
 			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
 			break;
 		}
 
-		external = xe_bo_create_user(xe, NULL, NULL, 0x10000,
+		external = xe_bo_create_user(xe, NULL, 0x10000,
 					     DRM_XE_GEM_CPU_CACHING_WC,
-					     bo_flags);
+					     bo_flags, NULL);
 		if (IS_ERR(external)) {
 			KUNIT_FAIL(test, "external bo create err=%pe\n", external);
 			goto cleanup_bo;
@@ -497,9 +497,9 @@ static int shrink_test_run_device(struct xe_device *xe)
 		INIT_LIST_HEAD(&link->link);
 
 		/* We can create bos using WC caching here. But it is slower. */
-		bo = xe_bo_create_user(xe, NULL, NULL, XE_BO_SHRINK_SIZE,
+		bo = xe_bo_create_user(xe, NULL, XE_BO_SHRINK_SIZE,
 				       DRM_XE_GEM_CPU_CACHING_WB,
-				       XE_BO_FLAG_SYSTEM);
+				       XE_BO_FLAG_SYSTEM, NULL);
 		if (IS_ERR(bo)) {
 			if (bo != ERR_PTR(-ENOMEM) && bo != ERR_PTR(-ENOSPC) &&
 			    bo != ERR_PTR(-EINTR) && bo != ERR_PTR(-ERESTARTSYS))
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index 965dd3280468..8126b35f4aeb 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -122,8 +122,8 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 		size = SZ_64K;
 
 	kunit_info(test, "running %s\n", __func__);
-	bo = xe_bo_create_user(xe, NULL, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
-			       params->mem_mask);
+	bo = xe_bo_create_user(xe, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
+			       params->mem_mask, NULL);
 	if (IS_ERR(bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
 			   PTR_ERR(bo));
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index dfb445d09759..afa794e56065 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -642,11 +642,11 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	struct drm_exec *exec;
 	long ret;
 
-	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	sys_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
 				   XE_BO_FLAG_SYSTEM |
 				   XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED);
+				   XE_BO_FLAG_PINNED, NULL);
 
 	if (IS_ERR(sys_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -669,10 +669,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 	xe_bo_unlock(sys_bo);
 
-	ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	ccs_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
 				   bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED);
+				   XE_BO_FLAG_PINNED, NULL);
 
 	if (IS_ERR(ccs_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -694,10 +694,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 	xe_bo_unlock(ccs_bo);
 
-	vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	vram_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				    DRM_XE_GEM_CPU_CACHING_WC,
 				    bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				    XE_BO_FLAG_PINNED);
+				    XE_BO_FLAG_PINNED, NULL);
 	if (IS_ERR(vram_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
 			   PTR_ERR(vram_bo));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index e71addf51ed0..5e40b6cb8d2a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2185,30 +2185,66 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				     flags, 0, exec);
 }
 
-struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm, size_t size,
-				u16 cpu_caching,
-				u32 flags)
-{
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
-						 cpu_caching, ttm_bo_type_device,
-						 flags | XE_BO_FLAG_USER, 0, exec);
-	if (!IS_ERR(bo))
-		xe_bo_unlock_vm_held(bo);
+static struct xe_bo *xe_bo_create_novm(struct xe_device *xe, struct xe_tile *tile,
+				       size_t size, u16 cpu_caching,
+				       enum ttm_bo_type type, u32 flags,
+				       u64 alignment, bool intr)
+{
+	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	struct xe_bo *bo;
+	int ret = 0;
 
-	return bo;
+	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
+		bo = __xe_bo_create_locked(xe, tile, NULL, size, 0, ~0ULL,
+					   cpu_caching, type, flags, alignment, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+		} else {
+			xe_bo_unlock(bo);
+		}
+	}
+
+	return ret ? ERR_PTR(ret) : bo;
 }
 
-struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
-			   struct xe_vm *vm, size_t size,
-			   enum ttm_bo_type type, u32 flags)
+/**
+ * xe_bo_create_user() - Create a user BO
+ * @xe: The xe device.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @cpu_caching: The caching mode to be used for system backing store.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction, or NULL
+ * if such a transaction should be initiated by the call.
+ *
+ * Create a bo on behalf of user-space.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
+struct xe_bo *xe_bo_create_user(struct xe_device *xe,
+				struct xe_vm *vm, size_t size,
+				u16 cpu_caching,
+				u32 flags, struct drm_exec *exec)
 {
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
+	struct xe_bo *bo;
+
+	flags |= XE_BO_FLAG_USER;
 
-	if (!IS_ERR(bo))
-		xe_bo_unlock_vm_held(bo);
+	if (vm || exec) {
+		xe_assert(xe, exec);
+		bo = __xe_bo_create_locked(xe, NULL, vm, size, 0, ~0ULL,
+					   cpu_caching, ttm_bo_type_device,
+					   flags, 0, exec);
+		if (!IS_ERR(bo))
+			xe_bo_unlock_vm_held(bo);
+	} else {
+		bo = xe_bo_create_novm(xe, NULL, size, cpu_caching,
+				       ttm_bo_type_device, flags, 0, true);
+	}
 
 	return bo;
 }
@@ -2757,8 +2793,9 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct xe_device *xe = to_xe_device(dev);
 	struct xe_file *xef = to_xe_file(file);
 	struct drm_xe_gem_create *args = data;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm = NULL;
-	ktime_t end = 0;
 	struct xe_bo *bo;
 	unsigned int bo_flags;
 	u32 handle;
@@ -2832,25 +2869,26 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 			return -ENOENT;
 	}
 
-retry:
-	if (vm) {
-		err = xe_vm_lock(vm, true);
-		if (err)
-			goto out_vm;
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec,
+			    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
+		if (vm) {
+			err = xe_vm_drm_exec_lock(vm, &exec);
+			drm_exec_retry_on_contention(&exec);
+			if (err)
+				break;
+		}
+		bo = xe_bo_create_user(xe, vm, args->size, args->cpu_caching,
+				       bo_flags, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &err);
+			break;
+		}
 	}
-
-	bo = xe_bo_create_user(xe, NULL, vm, args->size, args->cpu_caching,
-			       bo_flags);
-
-	if (vm)
-		xe_vm_unlock(vm);
-
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		if (xe_vm_validate_should_retry(NULL, err, &end))
-			goto retry;
+	if (err)
 		goto out_vm;
-	}
 
 	if (args->extensions) {
 		err = gem_create_user_extensions(xe, bo, args->extensions, 0);
@@ -3223,11 +3261,11 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
 	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
 			   page_size);
 
-	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
+	bo = xe_bo_create_user(xe, NULL, args->size,
 			       DRM_XE_GEM_CPU_CACHING_WC,
 			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
 			       XE_BO_FLAG_SCANOUT |
-			       XE_BO_FLAG_NEEDS_CPU_ACCESS);
+			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index b1b6cb622d71..c6bb90ca5c2e 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -104,13 +104,8 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
 				  enum ttm_bo_type type, u32 flags,
 				  struct drm_exec *exec);
-struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
-			   struct xe_vm *vm, size_t size,
-			   enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm, size_t size,
-				u16 cpu_caching,
-				u32 flags);
+struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t size,
+				u16 cpu_caching, u32 flags, struct drm_exec *exec);
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags);
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 3e0402dff423..6b152aa89dbb 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -452,6 +452,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		goto err;
 
+	xe_validation_device_init(&xe->val);
+
 	init_waitqueue_head(&xe->ufence_wq);
 
 	init_rwsem(&xe->usm.lock);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 01e8fa0d2f9f..a4eb32bac151 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -26,6 +26,7 @@
 #include "xe_sriov_vf_ccs_types.h"
 #include "xe_step_types.h"
 #include "xe_survivability_mode_types.h"
+#include "xe_validation.h"
 
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
 #define TEST_VM_OPS_ERROR
@@ -575,6 +576,8 @@ struct xe_device {
 	 */
 	atomic64_t global_total_pages;
 #endif
+	/** @val: The domain for exhaustive eviction, which is currently per device. */
+	struct xe_validation_device val;
 
 	/* private: */
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 600aaadb4bee..1c2d9d9065c6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -47,6 +47,20 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 	return vm->gpuvm.r_obj;
 }
 
+/**
+ * xe_vm_drm_exec_lock() - Lock the vm's resv with a drm_exec transaction
+ * @vm: The vm whose resv is to be locked.
+ * @exec: The drm_exec transaction.
+ *
+ * Helper to lock the vm's resv as part of a drm_exec transaction.
+ *
+ * Return: %0 on success. See drm_exec_lock_obj() for error codes.
+ */
+int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec)
+{
+	return drm_exec_lock_obj(exec, xe_vm_obj(vm));
+}
+
 /**
  * xe_vma_userptr_check_repin() - Advisory check for repin needed
  * @uvma: The userptr vma
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 4ba26eed7e96..3b6e7234dac4 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -292,6 +292,8 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked);
  */
 #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
 
+int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec);
+
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
 #define vm_dbg drm_dbg
 #else
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 07/15] drm/xe: Convert SVM validation for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (5 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 15:32   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 08/15] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert SVM validation to support exhaustive eviction,
using xe_validation_guard().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 63 ++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 39e3aa6df25a..ba85665d85d4 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -699,51 +699,48 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	struct xe_device *xe = vr->xe;
 	struct device *dev = xe->drm.dev;
 	struct drm_buddy_block *block;
+	struct xe_validation_ctx vctx;
 	struct list_head *blocks;
-	struct drm_exec *exec;
+	struct drm_exec exec;
 	struct xe_bo *bo;
-	ktime_t time_end = 0;
-	int err, idx;
+	int err = 0, idx;
 
 	if (!drm_dev_enter(&xe->drm, &idx))
 		return -ENODEV;
 
 	xe_pm_runtime_get(xe);
-	exec = XE_VALIDATION_UNIMPLEMENTED;
-
- retry:
-	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
-				 ttm_bo_type_device,
-				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
-				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		if (xe_vm_validate_should_retry(NULL, err, &time_end))
-			goto retry;
-		goto out_pm_put;
-	}
 
-	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
-				&dpagemap_devmem_ops, dpagemap, end - start);
-
-	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
-	list_for_each_entry(block, blocks, link)
-		block->private = vr;
+	xe_validation_guard(&vctx, &xe->val, &exec, 0, err, false) {
+		bo = xe_bo_create_locked(xe, NULL, NULL, end - start,
+					 ttm_bo_type_device,
+					 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
+					 XE_BO_FLAG_CPU_ADDR_MIRROR, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&vctx, &err);
+			break;
+		}
 
-	xe_bo_get(bo);
+		drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
+					&dpagemap_devmem_ops, dpagemap, end - start);
 
-	/* Ensure the device has a pm ref while there are device pages active. */
-	xe_pm_runtime_get_noresume(xe);
-	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
-					    start, end, timeslice_ms,
-					    xe_svm_devm_owner(xe));
-	if (err)
-		xe_svm_devmem_release(&bo->devmem_allocation);
+		blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
+		list_for_each_entry(block, blocks, link)
+			block->private = vr;
 
-	xe_bo_unlock(bo);
-	xe_bo_put(bo);
+		xe_bo_get(bo);
 
-out_pm_put:
+		/* Ensure the device has a pm ref while there are device pages active. */
+		xe_pm_runtime_get_noresume(xe);
+		err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
+						    start, end, timeslice_ms,
+						    xe_svm_devm_owner(xe));
+		if (err)
+			xe_svm_devmem_release(&bo->devmem_allocation);
+		xe_bo_unlock(bo);
+		xe_bo_put(bo);
+	}
 	xe_pm_runtime_put(xe);
 	drm_dev_exit(idx);
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/15] drm/xe: Convert existing drm_exec transactions for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (6 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 07/15] drm/xe: Convert SVM validation " Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  2:48   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 09/15] drm/xe: Convert the CPU fault handler " Thomas Hellström
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert existing drm_exec transactions, like GT pagefault validation,
the non-LR exec() IOCTL and the rebind worker, to support
exhaustive eviction using xe_validation_guard().
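
The conversion is mostly mechanical: a drm_exec_init() /
drm_exec_fini() pair becomes xe_validation_ctx_init() /
xe_validation_ctx_fini(), and xe_validation_retry_on_oom() replaces
the xe_vm_validate_should_retry() retry dance. Schematically, with
the per-callsite locking and validation as a placeholder:

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int err = 0;

	/* Cannot fail: with flags == 0 the rwsem waits are non-interruptible. */
	xe_validation_ctx_init(&ctx, &xe->val, &exec, 0, 0, false);
	drm_exec_until_all_locked(&exec) {
		err = lock_and_validate(&exec);	/* placeholder */
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &err);
	}
	xe_validation_ctx_fini(&ctx);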

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c         |  20 ++--
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  20 ++--
 drivers/gpu/drm/xe/xe_svm.c          |   4 -
 drivers/gpu/drm/xe/xe_vm.c           | 132 +++++++++++----------------
 drivers/gpu/drm/xe/xe_vm.h           |   2 -
 5 files changed, 70 insertions(+), 108 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 0bcb4fb9a10e..cdc3ff931a90 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -119,10 +119,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn};
 	struct drm_exec *exec = &vm_exec.exec;
 	u32 i, num_syncs, num_ufence = 0;
+	struct xe_validation_ctx ctx;
 	struct xe_sched_job *job;
 	struct xe_vm *vm;
 	bool write_locked, skip_retry = false;
-	ktime_t end = 0;
 	int err = 0;
 	struct xe_hw_engine_group *group;
 	enum xe_hw_engine_group_execution_mode mode, previous_mode;
@@ -241,17 +241,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto err_unlock_list;
 	}
 
-	vm_exec.vm = &vm->gpuvm;
-	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
-	if (xe_vm_in_lr_mode(vm)) {
-		drm_exec_init(exec, vm_exec.flags, 0);
-	} else {
-		err = drm_gpuvm_exec_lock(&vm_exec);
-		if (err) {
-			if (xe_vm_validate_should_retry(exec, err, &end))
-				err = -EAGAIN;
+	if (!xe_vm_in_lr_mode(vm)) {
+		vm_exec.vm = &vm->gpuvm;
+		vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
+		err = xe_validation_exec_lock(&ctx, &vm_exec, &xe->val);
+		if (err)
 			goto err_unlock_list;
-		}
 	}
 
 	if (xe_vm_is_closed_or_banned(q->vm)) {
@@ -345,7 +340,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		xe_sched_job_put(job);
 err_exec:
-	drm_exec_fini(exec);
+	if (!xe_vm_in_lr_mode(vm))
+		xe_validation_ctx_fini(&ctx);
 err_unlock_list:
 	up_read(&vm->lock);
 	if (err == -EAGAIN && !skip_retry)
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 2c7f10cc423f..67dc503d6e04 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -112,9 +112,9 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct dma_fence *fence;
-	ktime_t end = 0;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
@@ -139,12 +139,11 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 	}
 
 	/* Lock VM and BOs dma-resv */
-	drm_exec_init(&exec, 0, 0);
+	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, 0, 0, false);
 	drm_exec_until_all_locked(&exec) {
 		err = xe_pf_begin(&exec, vma, atomic, tile->mem.vram);
 		drm_exec_retry_on_contention(&exec);
-		if (xe_vm_validate_should_retry(&exec, err, &end))
-			err = -EAGAIN;
+		xe_validation_retry_on_oom(&ctx, &err);
 		if (err)
 			goto unlock_dma_resv;
 
@@ -153,8 +152,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 		fence = xe_vma_rebind(vm, vma, BIT(tile->id));
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
-			if (xe_vm_validate_should_retry(&exec, err, &end))
-				err = -EAGAIN;
+			xe_validation_retry_on_oom(&ctx, &err);
 			goto unlock_dma_resv;
 		}
 	}
@@ -163,7 +161,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 	dma_fence_put(fence);
 
 unlock_dma_resv:
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 	if (err == -EAGAIN)
 		goto retry_userptr;
 
@@ -545,6 +543,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct xe_vm *vm;
 	struct xe_vma *vma;
@@ -574,15 +573,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 		goto unlock_vm;
 
 	/* Lock VM and BOs dma-resv */
-	drm_exec_init(&exec, 0, 0);
+	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, 0, 0, false);
 	drm_exec_until_all_locked(&exec) {
 		ret = xe_pf_begin(&exec, vma, true, tile->mem.vram);
 		drm_exec_retry_on_contention(&exec);
-		if (ret)
-			break;
+		xe_validation_retry_on_oom(&ctx, &ret);
 	}
 
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 unlock_vm:
 	up_read(&vm->lock);
 	xe_vm_put(vm);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index ba85665d85d4..93d10f0b81cb 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -821,7 +821,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	struct dma_fence *fence;
 	struct xe_tile *tile = gt_to_tile(gt);
 	int migrate_try_count = ctx.devmem_only ? 3 : 1;
-	ktime_t end = 0;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
@@ -891,7 +890,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 
 	range_debug(range, "PAGE FAULT - BIND");
 
-retry_bind:
 	xe_vm_lock(vm, false);
 	fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
 	if (IS_ERR(fence)) {
@@ -902,8 +900,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 			range_debug(range, "PAGE FAULT - RETRY BIND");
 			goto retry;
 		}
-		if (xe_vm_validate_should_retry(NULL, err, &end))
-			goto retry_bind;
 		goto err_out;
 	}
 	xe_vm_unlock(vm);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 1c2d9d9065c6..989d84c2e82f 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -241,6 +241,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 		.num_fences = 1,
 	};
 	struct drm_exec *exec = &vm_exec.exec;
+	struct xe_validation_ctx ctx;
 	struct dma_fence *pfence;
 	int err;
 	bool wait;
@@ -248,7 +249,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
 
 	down_write(&vm->lock);
-	err = drm_gpuvm_exec_lock(&vm_exec);
+	err = xe_validation_exec_lock(&ctx, &vm_exec, &vm->xe->val);
 	if (err)
 		goto out_up_write;
 
@@ -280,7 +281,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	up_read(&vm->userptr.notifier_lock);
 
 out_fini:
-	drm_exec_fini(exec);
+	xe_validation_ctx_fini(&ctx);
 out_up_write:
 	up_write(&vm->lock);
 
@@ -363,39 +364,6 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
 	/* TODO: Inform user the VM is banned */
 }
 
-/**
- * xe_vm_validate_should_retry() - Whether to retry after a validate error.
- * @exec: The drm_exec object used for locking before validation.
- * @err: The error returned from ttm_bo_validate().
- * @end: A ktime_t cookie that should be set to 0 before first use and
- * that should be reused on subsequent calls.
- *
- * With multiple active VMs, under memory pressure, it is possible that
- * ttm_bo_validate() run into -EDEADLK and in such case returns -ENOMEM.
- * Until ttm properly handles locking in such scenarios, best thing the
- * driver can do is retry with a timeout. Check if that is necessary, and
- * if so unlock the drm_exec's objects while keeping the ticket to prepare
- * for a rerun.
- *
- * Return: true if a retry after drm_exec_init() is recommended;
- * false otherwise.
- */
-bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end)
-{
-	ktime_t cur;
-
-	if (err != -ENOMEM)
-		return false;
-
-	cur = ktime_get();
-	*end = *end ? : ktime_add_ms(cur, XE_VM_REBIND_RETRY_TIMEOUT_MS);
-	if (!ktime_before(cur, *end))
-		return false;
-
-	msleep(20);
-	return true;
-}
-
 static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 {
 	struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
@@ -497,10 +465,10 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 static void preempt_rebind_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
-	ktime_t end = 0;
 	int err = 0;
 	long wait;
 	int __maybe_unused tries = 0;
@@ -523,19 +491,20 @@ static void preempt_rebind_work_func(struct work_struct *w)
 			goto out_unlock_outer;
 	}
 
-	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+	err = xe_validation_ctx_init(&ctx, &vm->xe->val,
+				     &exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false);
+	if (err)
+		goto out_unlock_outer;
 
 	drm_exec_until_all_locked(&exec) {
 		bool done = false;
 
 		err = xe_preempt_work_begin(&exec, vm, &done);
 		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &err);
 		if (err || done) {
 			xe_vm_set_validation_exec(vm, NULL);
-			drm_exec_fini(&exec);
-			if (err && xe_vm_validate_should_retry(&exec, err, &end))
-				err = -EAGAIN;
-
+			xe_validation_ctx_fini(&ctx);
 			goto out_unlock_outer;
 		}
 	}
@@ -582,7 +551,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 
 out_unlock:
 	xe_vm_set_validation_exec(vm, NULL);
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 out_unlock_outer:
 	if (err == -EAGAIN) {
 		trace_xe_vm_rebind_worker_retry(vm);
@@ -1400,20 +1369,19 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 
 static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 {
+	struct xe_device *xe = xe_vma_vm(vma)->xe;
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
-	int err;
+	int err = 0;
 
-	drm_exec_init(&exec, 0, 0);
-	drm_exec_until_all_locked(&exec) {
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
 		err = xe_vm_lock_vma(&exec, vma);
 		drm_exec_retry_on_contention(&exec);
 		if (XE_WARN_ON(err))
 			break;
+		xe_vma_destroy(vma, NULL);
 	}
-
-	xe_vma_destroy(vma, NULL);
-
-	drm_exec_fini(&exec);
+	xe_assert(xe, !err);
 }
 
 struct xe_vma *
@@ -2490,6 +2458,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 			      u16 pat_index, unsigned int flags)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct xe_vma *vma;
 	int err = 0;
@@ -2497,9 +2466,9 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	lockdep_assert_held_write(&vm->lock);
 
 	if (bo) {
-		drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
-		drm_exec_until_all_locked(&exec) {
-			err = 0;
+		err = 0;
+		xe_validation_guard(&ctx, &vm->xe->val, &exec,
+				    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
 			if (!bo->vm) {
 				err = drm_exec_lock_obj(&exec, xe_vm_obj(vm));
 				drm_exec_retry_on_contention(&exec);
@@ -2508,27 +2477,34 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
 				drm_exec_retry_on_contention(&exec);
 			}
-			if (err) {
-				drm_exec_fini(&exec);
+			if (err)
 				return ERR_PTR(err);
+
+			vma = xe_vma_create(vm, bo, op->gem.offset,
+					    op->va.addr, op->va.addr +
+					    op->va.range - 1, pat_index, flags);
+			if (IS_ERR(vma))
+				return vma;
+
+			if (!bo->vm) {
+				err = add_preempt_fences(vm, bo);
+				goto out_err;
 			}
 		}
+		if (err)
+			return ERR_PTR(err);
+	} else {
+		vma = xe_vma_create(vm, NULL, op->gem.offset,
+				    op->va.addr, op->va.addr +
+				    op->va.range - 1, pat_index, flags);
+		if (IS_ERR(vma))
+			return vma;
+
+		if (xe_vma_is_userptr(vma))
+			err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
 	}
-	vma = xe_vma_create(vm, bo, op->gem.offset,
-			    op->va.addr, op->va.addr +
-			    op->va.range - 1, pat_index, flags);
-	if (IS_ERR(vma))
-		goto err_unlock;
-
-	if (xe_vma_is_userptr(vma))
-		err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
-	else if (!xe_vma_has_no_bo(vma) && !bo->vm)
-		err = add_preempt_fences(vm, bo);
-
-err_unlock:
-	if (bo)
-		drm_exec_fini(&exec);
 
+out_err:
 	if (err) {
 		prep_vma_destroy(vm, vma, false);
 		xe_vma_destroy_unlocked(vma);
@@ -3296,34 +3272,32 @@ static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops,
 static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 						   struct xe_vma_ops *vops)
 {
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct dma_fence *fence;
-	int err;
+	int err = 0;
 
 	lockdep_assert_held_write(&vm->lock);
 
-	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
-		      DRM_EXEC_IGNORE_DUPLICATES, 0);
-	drm_exec_until_all_locked(&exec) {
+	xe_validation_guard(&ctx, &vm->xe->val, &exec,
+			    DRM_EXEC_INTERRUPTIBLE_WAIT |
+			    DRM_EXEC_IGNORE_DUPLICATES, err, true) {
 		err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm, vops);
 		drm_exec_retry_on_contention(&exec);
-		if (err) {
-			fence = ERR_PTR(err);
-			goto unlock;
-		}
+		xe_validation_retry_on_oom(&ctx, &err);
+		if (err)
+			return ERR_PTR(err);
 
 		fence = ops_execute(vm, vops);
 		if (IS_ERR(fence)) {
 			if (PTR_ERR(fence) == -ENODATA)
 				vm_bind_ioctl_ops_fini(vm, vops, NULL);
-			goto unlock;
+			return fence;
 		}
 
 		vm_bind_ioctl_ops_fini(vm, vops, fence);
 	}
 
-unlock:
-	drm_exec_fini(&exec);
 	return fence;
 }
 ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 3b6e7234dac4..418940222690 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -262,8 +262,6 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);
 
 int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);
 
-bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
-
 int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
 
 int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
-- 
2.50.1



* [PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (7 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 08/15] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 22:06   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

The CPU fault handler may populate bos and migrate them, and in
doing so might interfere with other tasks validating.

Convert it for exhaustive eviction. Doing this properly, without
potentially introducing stalls while the mmap lock is held,
requires TTM work. In the meantime, let's live with those stalls,
which would typically happen only under memory pressure.
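
In outline, the converted fault path becomes the following. This is a
condensed sketch of the diff below, with the drm_dev_enter() section
and error paths simplified; the function name is hypothetical and the
xe_validation_* calls are the ones introduced earlier in this series:

static vm_fault_t fault_path_sketch(struct vm_fault *vmf,
				    struct xe_device *xe,
				    struct ttm_buffer_object *tbo)
{
	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	vm_fault_t ret;

	/*
	 * Join the driver-wide validation scheme. On failure, returning
	 * VM_FAULT_NOPAGE makes the core retry the fault later.
	 */
	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
		return VM_FAULT_NOPAGE;

	ret = ttm_bo_vm_reserve(tbo, vmf);
	if (!ret) {
		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
					       TTM_BO_VM_NUM_PREFAULT);
		/* The VM_FAULT_RETRY resv handling differs; simplified here. */
		if (ret != VM_FAULT_RETRY)
			dma_resv_unlock(tbo->base.resv);
	}
	xe_validation_ctx_fini(&ctx);
	return ret;
}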

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 5e40b6cb8d2a..dd1e0e9957e0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	struct xe_device *xe = to_xe_device(ddev);
 	struct xe_bo *bo = ttm_to_xe_bo(tbo);
 	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
-	struct drm_exec *exec;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	vm_fault_t ret;
 	int idx;
 
 	if (needs_rpm)
 		xe_pm_runtime_get(xe);
 
-	exec = XE_VALIDATION_UNIMPLEMENTED;
+	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
+				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
+		return VM_FAULT_NOPAGE;
+
 	ret = ttm_bo_vm_reserve(tbo, vmf);
 	if (ret)
 		goto out;
@@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	if (drm_dev_enter(ddev, &idx)) {
 		trace_xe_bo_cpu_fault(bo);
 
-		xe_validation_assert_exec(xe, exec, &tbo->base);
+		xe_validation_assert_exec(xe, &exec, &tbo->base);
 		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 					       TTM_BO_VM_NUM_PREFAULT);
 		drm_dev_exit(idx);
@@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 
 	dma_resv_unlock(tbo->base.resv);
 out:
+	xe_validation_ctx_fini(&ctx);
 	if (needs_rpm)
 		xe_pm_runtime_put(xe);
 
-- 
2.50.1



* [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma()
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (8 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 09/15] drm/xe: Convert the CPU fault handler " Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  2:35   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert __xe_pin_fb_vma() for exhaustive eviction
using xe_validation_guard().
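
Schematically, the pin path becomes the following. This is a sketch of
the usage pattern only, mirroring the diff below; the fb/vma setup and
the error label are omitted, and "bo" and "xe" stand in for the locals
of __xe_pin_fb_vma():

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int ret = 0;

	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);	/* ww-mutex backoff */
		if (ret)
			break;

		ret = xe_bo_validate(bo, NULL, true, &exec);
		drm_exec_retry_on_contention(&exec);
		/* Replays the transaction if validation hit an OOM. */
		xe_validation_retry_on_oom(&ctx, &ret);
		if (!ret)
			ttm_bo_pin(&bo->ttm);
	}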

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c | 27 +++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 4b0748e6fdd6..43c45344ea26 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -281,7 +281,8 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
 	struct xe_bo *bo = gem_to_xe_bo(obj);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	int ret;
 
 	if (!vma)
@@ -309,17 +310,21 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	 * Pin the framebuffer, we can't use xe_bo_(un)pin functions as the
 	 * assumptions are incorrect for framebuffers
 	 */
-	ret = ttm_bo_reserve(&bo->ttm, false, false, NULL);
-	if (ret)
-		goto err;
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			goto err;
 
-	if (IS_DGFX(xe))
-		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
-	else
-		ret = xe_bo_validate(bo, NULL, true, exec);
-	if (!ret)
-		ttm_bo_pin(&bo->ttm);
-	ttm_bo_unreserve(&bo->ttm);
+		if (IS_DGFX(xe))
+			ret = xe_bo_migrate(bo, XE_PL_VRAM0, &exec);
+		else
+			ret = xe_bo_validate(bo, NULL, true, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &ret);
+		if (!ret)
+			ttm_bo_pin(&bo->ttm);
+	}
 	if (ret)
 		goto err;
 
-- 
2.50.1



* [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (9 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 21:37   ` Matthew Brost
  2025-08-14 20:37   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert dma-buf migration to XE_PL_TT and dma-buf import to
support exhaustive eviction, using xe_validation_guard().
It seems unlikely that the import would result in an -ENOMEM,
but convert import anyway for completeness.

The dma-buf map_attachment() functionality unfortunately doesn't
support passing a drm_exec, which means that foreign devices
validating a dma-buf we exported will not participate in the
exhaustive eviction scheme unless they are xeKMD devices.
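
Since drm_exec locks gem objects rather than raw reservation objects,
the import path wraps the imported resv in a dummy gem object before
entering the guard. In outline (a sketch of the approach taken in the
diff below, with the bo creation elided):

	struct drm_gem_object *dummy_obj;

	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
	if (!dummy_obj)
		return ERR_PTR(-ENOMEM);
	/* Point the dummy at the dma-buf's resv so drm_exec can lock it. */
	dummy_obj->resv = dma_buf->resv;

	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
		ret = drm_exec_lock_obj(&exec, dummy_obj);
		drm_exec_retry_on_contention(&exec);
		/* ... create the imported bo under the now-locked resv ... */
	}
	drm_gem_object_put(dummy_obj);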

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_dma_buf.c | 59 +++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 78a827d4e726..56df1d84df21 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -163,16 +163,27 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	bool reads =  (direction == DMA_BIDIRECTIONAL ||
 		       direction == DMA_FROM_DEVICE);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	int ret = 0;
 
 	if (!reads)
 		return 0;
 
 	/* Can we do interruptible lock here? */
-	xe_bo_lock(bo, false);
-	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
-	xe_bo_unlock(bo);
-
+	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			goto out;
+
+		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &ret);
+	}
+out:
+	/* If we failed, cpu-access takes place in current placement. */
+	(void)ret;
 	return 0;
 }
 
@@ -211,24 +222,38 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 {
 	struct dma_resv *resv = dma_buf->resv;
 	struct xe_device *xe = to_xe_device(dev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_gem_object *dummy_obj;
+	struct drm_exec exec;
 	struct xe_bo *bo;
-	int ret;
-
-	dma_resv_lock(resv, NULL);
-	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-				    0, /* Will require 1way or 2way for vm_bind */
-				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
-	if (IS_ERR(bo)) {
-		ret = PTR_ERR(bo);
-		goto error;
+	int ret = 0;
+
+	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
+	if (!dummy_obj)
+		return ERR_PTR(-ENOMEM);
+
+	dummy_obj->resv = resv;
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
+		ret = drm_exec_lock_obj(&exec, dummy_obj);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			goto error;
+
+		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+					    0, /* Will require 1way or 2way for vm_bind */
+					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+			goto error;
+		}
 	}
-	dma_resv_unlock(resv);
+	drm_gem_object_put(dummy_obj);
 
 	return &bo->ttm.base;
 
 error:
-	dma_resv_unlock(resv);
 	return ERR_PTR(ret);
 }
 
-- 
2.50.1



* [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked()
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (10 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 21:33   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Don't start external function names with underscores.
Rename to xe_bo_init_locked().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c      | 39 ++++++++++++++++-----------------
 drivers/gpu/drm/xe/xe_bo.h      | 10 ++++-----
 drivers/gpu/drm/xe/xe_dma_buf.c |  6 ++---
 3 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index dd1e0e9957e0..23b28eeef59f 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1161,10 +1161,10 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
 		goto out_unlock_bo;
 
-	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					XE_BO_FLAG_PINNED, exec);
+	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+				   XE_BO_FLAG_PINNED, exec);
 	if (IS_ERR(backup)) {
 		ret = PTR_ERR(backup);
 		goto out_unlock_bo;
@@ -1240,11 +1240,10 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 		goto out_unlock_bo;
 
 	if (!backup) {
-		backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv,
-						NULL, xe_bo_size(bo),
-						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						XE_BO_FLAG_PINNED, exec);
+		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+					   XE_BO_FLAG_PINNED, exec);
 		if (IS_ERR(backup)) {
 			ret = PTR_ERR(backup);
 			goto out_unlock_bo;
@@ -1861,7 +1860,7 @@ void xe_bo_free(struct xe_bo *bo)
 }
 
 /**
- * ___xe_bo_create_locked() - Initialize or create an xe_bo.
+ * xe_bo_init_locked() - Initialize or create an xe_bo.
  * @xe: The xe device.
  * @bo: An already allocated buffer object or NULL
  * if the function should allocate a new one.
@@ -1881,11 +1880,11 @@ void xe_bo_free(struct xe_bo *bo)
  *
  * Return: The buffer object on success. Negative error pointer on failure.
  */
-struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
-				     struct xe_tile *tile, struct dma_resv *resv,
-				     struct ttm_lru_bulk_move *bulk, size_t size,
-				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags, struct drm_exec *exec)
+struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
+				struct xe_tile *tile, struct dma_resv *resv,
+				struct ttm_lru_bulk_move *bulk, size_t size,
+				u16 cpu_caching, enum ttm_bo_type type,
+				u32 flags, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -2077,11 +2076,11 @@ __xe_bo_create_locked(struct xe_device *xe,
 		}
 	}
 
-	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
-				    vm && !xe_vm_in_fault_mode(vm) &&
-				    flags & XE_BO_FLAG_USER ?
-				    &vm->lru_bulk_move : NULL, size,
-				    cpu_caching, type, flags, exec);
+	bo = xe_bo_init_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
+			       vm && !xe_vm_in_fault_mode(vm) &&
+			       flags & XE_BO_FLAG_USER ?
+			       &vm->lru_bulk_move : NULL, size,
+			       cpu_caching, type, flags, exec);
 	if (IS_ERR(bo))
 		return bo;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c6bb90ca5c2e..a625806deeb6 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -89,11 +89,11 @@ struct sg_table;
 struct xe_bo *xe_bo_alloc(void);
 void xe_bo_free(struct xe_bo *bo);
 
-struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
-				     struct xe_tile *tile, struct dma_resv *resv,
-				     struct ttm_lru_bulk_move *bulk, size_t size,
-				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags, struct drm_exec *exec);
+struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
+				struct xe_tile *tile, struct dma_resv *resv,
+				struct ttm_lru_bulk_move *bulk, size_t size,
+				u16 cpu_caching, enum ttm_bo_type type,
+				u32 flags, struct drm_exec *exec);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 56df1d84df21..ca6e397828ad 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -239,9 +239,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 		if (ret)
 			goto error;
 
-		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-					    0, /* Will require 1way or 2way for vm_bind */
-					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
+		bo = xe_bo_init_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+				       0, /* Will require 1way or 2way for vm_bind */
+				       ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
 		drm_exec_retry_on_contention(&exec);
 		if (IS_ERR(bo)) {
 			ret = PTR_ERR(bo);
-- 
2.50.1



* [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (11 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  3:58   ` Matthew Brost
                     ` (2 more replies)
  2025-08-13 10:51 ` [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
                   ` (5 subsequent siblings)
  18 siblings, 3 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Most users of xe_bo_create_pin_map_at() and
xe_bo_create_pin_map_at_aligned() don't use the vm parameter,
which simplifies conversion. Introduce an
xe_bo_create_pin_map_at_novm() function and make the _aligned()
version static. Use xe_validation_guard() for the conversion.
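
For callers that previously open-coded create + pin + unlock with a
NULL vm, a single call now suffices. For example, from the xe_eu_stall
conversion below (the trailing bool arguments are @vmap and @intr per
the new kernel-doc):

	/* Pinned, vmapped system-memory bo with a 64-byte aligned GGTT
	 * binding; no VM association, non-interruptible waits.
	 */
	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull,
					  ttm_bo_type_kernel,
					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT,
					  true, SZ_64, false);
	if (IS_ERR(bo))
		return PTR_ERR(bo);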

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
 drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
 drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
 drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
 drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
 6 files changed, 101 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
index 1ce1e9da975b..ab48635ddffa 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
@@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 						       u32 size, u32 align,
 						       u32 start, u32 end)
 {
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
-	int err;
 	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
 
 	if (start < SZ_4K)
@@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 		start = ALIGN(start, align);
 	}
 
-	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
-				       NULL, size, start, end,
-				       ttm_bo_type_kernel, flags, 0, exec);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		bo = NULL;
-		return err;
-	}
-	err = xe_bo_pin(bo, exec);
-	xe_bo_unlock_vm_held(bo);
-
-	if (err) {
-		xe_bo_put(fb->bo);
-		bo = NULL;
-	}
+	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
+					  size, start, ttm_bo_type_kernel, flags,
+					  false, 0, true);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
 
 	fb->bo = bo;
 
-	return err;
+	return 0;
 }
 
 static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 43c45344ea26..d46ff7ebb0a1 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 				 XE_PAGE_SIZE);
 
 	if (IS_DGFX(xe))
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size, ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_VRAM0 |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size, ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_VRAM0 |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   true,
+						   alignment, false);
 	else
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size,  ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_STOLEN |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size,  ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_STOLEN |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   true,
+						   alignment, false);
 	if (IS_ERR(dpt))
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size,  ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_SYSTEM |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size,  ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_SYSTEM |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   true,
+						   alignment, false);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
 
diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
index 826ac3d578b7..79d00127caf4 100644
--- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
+++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
@@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
 			page_size);
 	size -= base;
 
-	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
-				     ttm_bo_type_kernel, flags);
+	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
+					  ttm_bo_type_kernel, flags, true, 0, false);
 	if (IS_ERR(bo)) {
 		drm_dbg(&xe->drm,
 			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 23b28eeef59f..c9928d4ee5a0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
 	return bo;
 }
 
-struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
-				      struct xe_vm *vm,
-				      size_t size, u64 offset,
-				      enum ttm_bo_type type, u32 flags)
-{
-	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
-					       type, flags, 0);
-}
-
-struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
-					      struct xe_tile *tile,
-					      struct xe_vm *vm,
-					      size_t size, u64 offset,
-					      enum ttm_bo_type type, u32 flags,
-					      u64 alignment)
+static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
+						     struct xe_tile *tile,
+						     struct xe_vm *vm,
+						     size_t size, u64 offset,
+						     enum ttm_bo_type type, u32 flags,
+						     bool vmap, u64 alignment,
+						     struct drm_exec *exec)
 {
 	struct xe_bo *bo;
 	int err;
 	u64 start = offset == ~0ull ? 0 : offset;
 	u64 end = offset == ~0ull ? offset : start + size;
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
 
-	if (flags & XE_BO_FLAG_STOLEN &&
+	if (flags & XE_BO_FLAG_STOLEN && vmap &&
 	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
 		flags |= XE_BO_FLAG_GGTT;
 
@@ -2289,9 +2280,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 	if (err)
 		goto err_put;
 
-	err = xe_bo_vmap(bo);
-	if (err)
-		goto err_unpin;
+	if (vmap) {
+		err = xe_bo_vmap(bo);
+		if (err)
+			goto err_unpin;
+	}
 
 	xe_bo_unlock_vm_held(bo);
 
@@ -2305,11 +2298,59 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @size: The storage size to use for the bo.
+ * @offset: Optional VRAM offset or %~0ull for don't care.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @vmap: Whether to vmap the buffer object for CPU access.
+ * @alignment: GGTT alignment.
+ * @intr: Whether to execute any waits for backing store interruptibly.
+ *
+ * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
+ * options. The bo will be external and not associated with a VM.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
+ * to true on entry.
+ */
+struct xe_bo *
+xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
+			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
+			     bool vmap, u64 alignment, bool intr)
+{
+	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	struct xe_bo *bo;
+	int ret = 0;
+
+	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
+		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
+						     type, flags, vmap,
+						     alignment, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+		}
+	}
+
+	return ret ? ERR_PTR(ret) : bo;
+}
+
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags)
 {
-	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
+
+	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
+					       true, 0, exec);
 }
 
 static void __xe_bo_unpin_map_no_vm(void *arg)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index a625806deeb6..d06266af9662 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
-				      struct xe_vm *vm, size_t size, u64 offset,
-				      enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
-					      struct xe_tile *tile,
-					      struct xe_vm *vm,
-					      size_t size, u64 offset,
-					      enum ttm_bo_type type, u32 flags,
-					      u64 alignment);
+struct xe_bo *
+xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
+			     size_t size, u64 offset, enum ttm_bo_type type,
+			     u32 flags, bool vmap, u64 alignment, bool intr);
 struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 					   size_t size, u32 flags);
 struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
index fdd514fec5ef..afabfc125488 100644
--- a/drivers/gpu/drm/xe/xe_eu_stall.c
+++ b/drivers/gpu/drm/xe/xe_eu_stall.c
@@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
 
 	size = stream->per_xecore_buf_size * last_xecore;
 
-	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
-					     size, ~0ull, ttm_bo_type_kernel,
-					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
+	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
+					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, true,
+					  SZ_64, false);
 	if (IS_ERR(bo)) {
 		kfree(stream->xecore_buf);
 		return PTR_ERR(bo);
-- 
2.50.1



* [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (12 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-14  4:18   ` Matthew Brost
  2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Introduce an xe_bo_create_pin_map_novm() function that does not
take the drm_exec parameter, to simplify the conversion of many
callsites.
For the rest, ensure that the same drm_exec context that was used
for locking the vm is passed down to validation.

Use xe_validation_guard() where appropriate.
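
For callsites that do hold a vm, the vm's resv is locked through the
same drm_exec transaction that is then handed to bo creation.
Schematically (a sketch condensed from the xe_pxp_submit conversion
below; setup of "vm", "tile", "size" and "flags" and the cleanup path
are omitted):

	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
		err = xe_vm_drm_exec_lock(vm, &exec);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		bo = xe_bo_create_pin_map(xe, tile, vm, size,
					  ttm_bo_type_kernel, flags, &exec);
		drm_exec_retry_on_contention(&exec);
		if (IS_ERR(bo)) {
			err = PTR_ERR(bo);
			xe_validation_retry_on_oom(&ctx, &err);
			break;
		}
	}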

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |  39 +++---
 drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
 drivers/gpu/drm/xe/xe_bo.c                    |  53 +++++++-
 drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
 drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  24 ++--
 drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
 drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
 drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
 drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
 drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
 drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
 drivers/gpu/drm/xe/xe_vm.c                    | 119 ++++++++++--------
 19 files changed, 252 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index d96ba2b51065..8ea9a472113c 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -42,11 +42,11 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 	obj = ERR_PTR(-ENODEV);
 
 	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
-					   NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_STOLEN |
-					   XE_BO_FLAG_GGTT);
+		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
+						size,
+						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						XE_BO_FLAG_STOLEN |
+						XE_BO_FLAG_GGTT, false);
 		if (!IS_ERR(obj))
 			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
 		else
@@ -54,10 +54,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 	}
 
 	if (IS_ERR(obj)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-					   XE_BO_FLAG_GGTT);
+		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
+						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+						XE_BO_FLAG_GGTT, false);
 	}
 
 	if (IS_ERR(obj)) {
diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
index 9f941fc2e36b..58581d7aaae6 100644
--- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
+++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
@@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *d
 		return false;
 
 	/* Set scanout flag for WC mapping */
-	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
-				   NULL, PAGE_ALIGN(size),
-				   ttm_bo_type_kernel,
-				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
+	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
+					PAGE_ALIGN(size),
+					ttm_bo_type_kernel,
+					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(obj)) {
 		kfree(vma);
 		return false;
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index d46ff7ebb0a1..d8e15ebb740c 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -102,32 +102,23 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 				 XE_PAGE_SIZE);
 
 	if (IS_DGFX(xe))
-		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
-						   dpt_size, ~0ull,
-						   ttm_bo_type_kernel,
-						   XE_BO_FLAG_VRAM0 |
-						   XE_BO_FLAG_GGTT |
-						   XE_BO_FLAG_PAGETABLE,
-						   true,
-						   alignment, false);
+		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_VRAM0 |
+						XE_BO_FLAG_GGTT |
+						XE_BO_FLAG_PAGETABLE, true);
 	else
-		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
-						   dpt_size,  ~0ull,
-						   ttm_bo_type_kernel,
-						   XE_BO_FLAG_STOLEN |
-						   XE_BO_FLAG_GGTT |
-						   XE_BO_FLAG_PAGETABLE,
-						   true,
-						   alignment, false);
+		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_STOLEN |
+						XE_BO_FLAG_GGTT |
+						XE_BO_FLAG_PAGETABLE, true);
 	if (IS_ERR(dpt))
-		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
-						   dpt_size,  ~0ull,
-						   ttm_bo_type_kernel,
-						   XE_BO_FLAG_SYSTEM |
-						   XE_BO_FLAG_GGTT |
-						   XE_BO_FLAG_PAGETABLE,
-						   true,
-						   alignment, false);
+		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_SYSTEM |
+						XE_BO_FLAG_GGTT |
+						XE_BO_FLAG_PAGETABLE, true);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
 
diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
index 30f1073141fc..4ae847b628e2 100644
--- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
+++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
@@ -72,10 +72,10 @@ static int intel_hdcp_gsc_initialize_message(struct xe_device *xe,
 	int ret = 0;
 
 	/* allocate object of two page for HDCP command memory and store it */
-	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT, false);
 
 	if (IS_ERR(bo)) {
 		drm_err(&xe->drm, "Failed to allocate bo for HDCP streaming command!\n");
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index afa794e56065..5904d658d1f2 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 
 	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
 				   ttm_bo_type_kernel,
-				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				   exec);
 	if (IS_ERR(big)) {
 		KUNIT_FAIL(test, "Failed to allocate bo: %li\n", PTR_ERR(big));
 		goto vunmap;
@@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 
 	pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
 				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				  exec);
 	if (IS_ERR(pt)) {
 		KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
 			   PTR_ERR(pt));
@@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
 				    2 * SZ_4K,
 				    ttm_bo_type_kernel,
-				    XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				    exec);
 	if (IS_ERR(tiny)) {
 		KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n",
 			   PTR_ERR(tiny));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index c9928d4ee5a0..82bf158426ad 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2343,16 +2343,60 @@ xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
 	return ret ? ERR_PTR(ret) : bo;
 }
 
+/**
+ * xe_bo_create_pin_map() - Create pinned and mapped bo
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The vm to associate the buffer object with. The vm's resv must be locked
+ * with the transaction represented by @exec.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction, and
+ * previously used for locking @vm's resv.
+ *
+ * Create a pinned and mapped bo. The bo will be associated with @vm's
+ * resv if @vm is non-NULL.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if the @exec
+ * transaction uses interruptible waits.
+ */
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
-				   enum ttm_bo_type type, u32 flags)
+				   enum ttm_bo_type type, u32 flags,
+				   struct drm_exec *exec)
 {
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-
+	xe_assert(xe, exec);
 	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
 					       true, 0, exec);
 }
 
+/**
+ * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @intr: Whether to execute any waits for backing store interruptibly.
+ *
+ * Create a pinned and mapped bo. The bo will be external and not associated
+ * with a VM.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
+ * to true on entry.
+ */
+struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
+					size_t size, enum ttm_bo_type type, u32 flags,
+					bool intr)
+{
+	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, true, 0, intr);
+}
+
 static void __xe_bo_unpin_map_no_vm(void *arg)
 {
 	xe_bo_unpin_map_no_vm(arg);
@@ -2365,8 +2409,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
 	int ret;
 
 	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
-
-	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
+	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);
 	if (IS_ERR(bo))
 		return bo;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index d06266af9662..802e3c7d7872 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
 				u16 cpu_caching, u32 flags, struct drm_exec *exec);
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
-				   enum ttm_bo_type type, u32 flags);
+				   enum ttm_bo_type type, u32 flags,
+				   struct drm_exec *exec);
+struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
+					size_t size, enum ttm_bo_type type, u32 flags,
+					bool intr);
 struct xe_bo *
 xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
 			     size_t size, u64 offset, enum ttm_bo_type type,
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index f5ae28af60d4..83d61bf8ec62 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -136,10 +136,10 @@ static int query_compatibility_version(struct xe_gsc *gsc)
 	u64 ggtt_offset;
 	int err;
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(bo)) {
 		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
 		return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index 906011671b60..d0a87d7b028b 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -1452,7 +1452,6 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
 static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo;
@@ -1479,24 +1478,17 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 		return 0;
 
 	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
-	bo = xe_bo_create_locked(xe, tile, NULL,
-				 ALIGN(size, PAGE_SIZE),
-				 ttm_bo_type_kernel,
-				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				 XE_BO_FLAG_NEEDS_2M |
-				 XE_BO_FLAG_PINNED |
-				 XE_BO_FLAG_PINNED_LATE_RESTORE,
-				 exec);
+	bo = xe_bo_create_pin_map_at_novm(xe, tile,
+					  ALIGN(size, PAGE_SIZE),
+					  0,
+					  ttm_bo_type_kernel,
+					  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+					  XE_BO_FLAG_NEEDS_2M |
+					  XE_BO_FLAG_PINNED_LATE_RESTORE,
+					  false, 0, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
-	err = xe_bo_pin(bo, exec);
-	xe_bo_unlock(bo);
-	if (unlikely(err)) {
-		xe_bo_put(bo);
-		return err;
-	}
-
 	config->lmem_obj = bo;
 
 	if (xe_device_has_lmtt(xe)) {
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
index c712111aa30d..44cc612b0a75 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
@@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct xe_gt *gt, unsigned int vfid,
 	xe_gt_assert(gt, size % sizeof(u32) == 0);
 	xe_gt_assert(gt, size == ndwords * sizeof(u32));
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL,
-				  ALIGN(size, PAGE_SIZE),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT |
-				  XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(xe, tile,
+				       ALIGN(size, PAGE_SIZE),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT |
+				       XE_BO_FLAG_GGTT_INVALIDATE, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
@@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct xe_gt *gt, unsigned int vfid,
 	xe_gt_assert(gt, size % sizeof(u32) == 0);
 	xe_gt_assert(gt, size == ndwords * sizeof(u32));
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL,
-				  ALIGN(size, PAGE_SIZE),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT |
-				  XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(xe, tile,
+				       ALIGN(size, PAGE_SIZE),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT |
+				       XE_BO_FLAG_GGTT_INVALIDATE, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
index 92e1f9f41b8c..2b99c1ebdd58 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
@@ -94,16 +94,17 @@ static int allocate_engine_activity_buffers(struct xe_guc *guc,
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo, *metadata_bo;
 
-	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
-					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
-					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
+						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
+						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
+						false);
 
 	if (IS_ERR(metadata_bo))
 		return PTR_ERR(metadata_bo);
 
-	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
-				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
+				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
 
 	if (IS_ERR(bo)) {
 		xe_bo_unpin_map_no_vm(metadata_bo);
diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
index a78c9d474a6e..4ad468574174 100644
--- a/drivers/gpu/drm/xe/xe_lmtt.c
+++ b/drivers/gpu/drm/xe/xe_lmtt.c
@@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level
 		goto out;
 	}
 
-	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
-				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
-					     lmtt->ops->lmtt_pte_num(level)),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
-				  XE_BO_FLAG_NEEDS_64K);
+	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
+				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
+						  lmtt->ops->lmtt_pte_num(level)),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
+				       XE_BO_FLAG_NEEDS_64K, false);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_free_pt;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 8f6c3ba47882..6d52e0eb97f5 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 	if (vm && vm->xef) /* userspace */
 		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
 
-	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
-				       ttm_bo_type_kernel,
-				       bo_flags);
+	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
+					    bo_size,
+					    ttm_bo_type_kernel,
+					    bo_flags, false);
 	if (IS_ERR(lrc->bo))
 		return PTR_ERR(lrc->bo);
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index ddfad7506a82..fe0d15ab340e 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -35,6 +35,7 @@
 #include "xe_sched_job.h"
 #include "xe_sync.h"
 #include "xe_trace_bo.h"
+#include "xe_validation.h"
 #include "xe_vm.h"
 #include "xe_vram.h"
 
@@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm,
 }
 
 static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
-				 struct xe_vm *vm)
+				 struct xe_vm *vm, struct drm_exec *exec)
 {
 	struct xe_device *xe = tile_to_xe(tile);
 	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
@@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 				  num_entries * XE_PAGE_SIZE,
 				  ttm_bo_type_kernel,
 				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				  XE_BO_FLAG_PAGETABLE);
+				  XE_BO_FLAG_PAGETABLE, exec);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
@@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
 	struct xe_tile *tile = m->tile;
 	struct xe_gt *primary_gt = tile->primary_gt;
 	struct xe_device *xe = tile_to_xe(tile);
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	int err;
 
@@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
 	if (IS_ERR(vm))
 		return PTR_ERR(vm);
 
-	xe_vm_lock(vm, false);
-	err = xe_migrate_prepare_vm(tile, m, vm);
-	xe_vm_unlock(vm);
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &err);
+	}
 	if (err)
-		goto err_out;
+		return err;
 
 	if (xe->info.has_usm) {
 		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index a188bad172ad..a4894eb0d7f3 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
 {
 	struct xe_bo *bo;
 
-	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
-				  size, ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
+				       size, ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index f3a39e734a90..33ad40418ceb 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
  * @vm: The vm to create for.
  * @tile: The tile to create for.
  * @level: The page-table level.
+ * @exec: The drm_exec object used to lock the vm.
  *
  * Allocate and initialize a single struct xe_pt metadata structure. Also
  * create the corresponding page-table bo, but don't initialize it. If the
@@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
  * error.
  */
 struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
-			   unsigned int level)
+			   unsigned int level, struct drm_exec *exec)
 {
 	struct xe_pt *pt;
 	struct xe_bo *bo;
@@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
 		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
 
 	pt->level = level;
+
+	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
 	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
 				  ttm_bo_type_kernel,
-				  bo_flags);
+				  bo_flags, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto err_kfree;
@@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 	if (covers || !*child) {
 		u64 flags = 0;
 
-		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
+		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
+					xe_vm_validation_exec(vm));
 		if (IS_ERR(xe_child))
 			return PTR_ERR(xe_child);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 5ecf003d513c..4daeebaab5a1 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -10,6 +10,7 @@
 #include "xe_pt_types.h"
 
 struct dma_fence;
+struct drm_exec;
 struct xe_bo;
 struct xe_device;
 struct xe_exec_queue;
@@ -29,7 +30,7 @@ struct xe_vma_ops;
 unsigned int xe_pt_shift(unsigned int level);
 
 struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
-			   unsigned int level);
+			   unsigned int level, struct drm_exec *exec);
 
 void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
 			  struct xe_pt *pt);
diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c b/drivers/gpu/drm/xe/xe_pxp_submit.c
index ca95f2a4d4ef..54bd6b64dc6d 100644
--- a/drivers/gpu/drm/xe/xe_pxp_submit.c
+++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
@@ -54,8 +54,9 @@ static int allocate_vcs_execution_resources(struct xe_pxp *pxp)
 	 * Each termination is 16 DWORDS, so 4K is enough to contain a
 	 * termination for each sessions.
 	 */
-	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
+				       false);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_queue;
@@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
 {
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_device *xe = tile_to_xe(tile);
+	struct xe_validation_ctx ctx;
 	struct xe_hw_engine *hwe;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	struct xe_bo *bo;
 	struct xe_exec_queue *q;
@@ -106,15 +109,26 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
 		return PTR_ERR(vm);
 
 	/* We allocate a single object for the batch and the in/out memory */
-	xe_vm_lock(vm, false);
-	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
-	xe_vm_unlock(vm);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		goto vm_out;
+
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (err)
+			break;
+
+		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
+					  ttm_bo_type_kernel,
+					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
+					  XE_BO_FLAG_NEEDS_UC, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &err);
+			break;
+		}
 	}
+	if (err)
+		goto vm_out;
 
 	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
 	if (IS_ERR(fence)) {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 989d84c2e82f..b3ee65126841 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1606,6 +1606,7 @@ static void vm_destroy_work_func(struct work_struct *w);
  * @xe: xe device.
  * @tile: tile to set up for.
  * @vm: vm to set up for.
+ * @exec: The struct drm_exec object used to lock the vm resv.
  *
  * Sets up a pagetable tree with one page-table per level and a single
  * leaf PTE. All pagetable entries point to the single page-table or,
@@ -1615,20 +1616,19 @@ static void vm_destroy_work_func(struct work_struct *w);
  * Return: 0 on success, negative error code on error.
  */
 static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm)
+				struct xe_vm *vm, struct drm_exec *exec)
 {
 	u8 id = tile->id;
 	int i;
 
 	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
-		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
+		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i, exec);
 		if (IS_ERR(vm->scratch_pt[id][i])) {
 			int err = PTR_ERR(vm->scratch_pt[id][i]);
 
 			vm->scratch_pt[id][i] = NULL;
 			return err;
 		}
-
 		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
 	}
 
@@ -1656,9 +1656,26 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
 	}
 }
 
+static void xe_vm_pt_destroy(struct xe_vm *vm)
+{
+	struct xe_tile *tile;
+	u8 id;
+
+	xe_vm_assert_held(vm);
+
+	for_each_tile(tile, vm->xe, id) {
+		if (vm->pt_root[id]) {
+			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
+			vm->pt_root[id] = NULL;
+		}
+	}
+}
+
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 {
 	struct drm_gem_object *vm_resv_obj;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	int err, number_tiles = 0;
 	struct xe_tile *tile;
@@ -1745,49 +1762,64 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 
 	drm_gem_object_put(vm_resv_obj);
 
-	err = xe_vm_lock(vm, true);
-	if (err)
-		goto err_close;
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec, DRM_EXEC_INTERRUPTIBLE_WAIT,
+			    err, true) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
 
-	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
-		vm->flags |= XE_VM_FLAG_64K;
+		if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
+			vm->flags |= XE_VM_FLAG_64K;
 
-	for_each_tile(tile, xe, id) {
-		if (flags & XE_VM_FLAG_MIGRATION &&
-		    tile->id != XE_VM_FLAG_TILE_ID(flags))
-			continue;
+		for_each_tile(tile, xe, id) {
+			if (flags & XE_VM_FLAG_MIGRATION &&
+			    tile->id != XE_VM_FLAG_TILE_ID(flags))
+				continue;
 
-		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
-		if (IS_ERR(vm->pt_root[id])) {
-			err = PTR_ERR(vm->pt_root[id]);
-			vm->pt_root[id] = NULL;
-			goto err_unlock_close;
+			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
+						       &exec);
+			if (IS_ERR(vm->pt_root[id])) {
+				err = PTR_ERR(vm->pt_root[id]);
+				vm->pt_root[id] = NULL;
+				xe_vm_pt_destroy(vm);
+				drm_exec_retry_on_contention(&exec);
+				xe_validation_retry_on_oom(&ctx, &err);
+				goto err_close;
+			}
 		}
-	}
 
-	if (xe_vm_has_scratch(vm)) {
+		if (xe_vm_has_scratch(vm)) {
+			for_each_tile(tile, xe, id) {
+				if (!vm->pt_root[id])
+					continue;
+
+				err = xe_vm_create_scratch(xe, tile, vm, &exec);
+				if (err) {
+					xe_vm_free_scratch(vm);
+					xe_vm_pt_destroy(vm);
+					drm_exec_retry_on_contention(&exec);
+					xe_validation_retry_on_oom(&ctx, &err);
+					goto err_close;
+				}
+			}
+			vm->batch_invalidate_tlb = true;
+		}
+
+		if (vm->flags & XE_VM_FLAG_LR_MODE) {
+			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
+			vm->batch_invalidate_tlb = false;
+		}
+
+		/* Fill pt_root after allocating scratch tables */
 		for_each_tile(tile, xe, id) {
 			if (!vm->pt_root[id])
 				continue;
 
-			err = xe_vm_create_scratch(xe, tile, vm);
-			if (err)
-				goto err_unlock_close;
+			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
 		}
-		vm->batch_invalidate_tlb = true;
-	}
-
-	if (vm->flags & XE_VM_FLAG_LR_MODE)
-		vm->batch_invalidate_tlb = false;
-
-	/* Fill pt_root after allocating scratch tables */
-	for_each_tile(tile, xe, id) {
-		if (!vm->pt_root[id])
-			continue;
-
-		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
 	}
-	xe_vm_unlock(vm);
+	if (err)
+		goto err_close;
 
 	/* Kernel migration VM shouldn't have a circular loop.. */
 	if (!(flags & XE_VM_FLAG_MIGRATION)) {
@@ -1820,7 +1852,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 				      &xe->usm.next_asid, GFP_KERNEL);
 		up_write(&xe->usm.lock);
 		if (err < 0)
-			goto err_unlock_close;
+			goto err_close;
 
 		vm->usm.asid = asid;
 	}
@@ -1829,8 +1861,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 
 	return vm;
 
-err_unlock_close:
-	xe_vm_unlock(vm);
 err_close:
 	xe_vm_close_and_put(vm);
 	return ERR_PTR(err);
@@ -1959,13 +1989,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	 * destroy the pagetables immediately.
 	 */
 	xe_vm_free_scratch(vm);
-
-	for_each_tile(tile, xe, id) {
-		if (vm->pt_root[id]) {
-			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
-			vm->pt_root[id] = NULL;
-		}
-	}
+	xe_vm_pt_destroy(vm);
 	xe_vm_unlock(vm);
 
 	/*
@@ -3845,7 +3869,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
  */
 int xe_vm_lock(struct xe_vm *vm, bool intr)
 {
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	int ret;
 
 	if (intr)
@@ -3853,9 +3876,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
 	else
 		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
 
-	if (!ret)
-		xe_vm_set_validation_exec(vm, exec);
-
 	return ret;
 }
 
@@ -3867,7 +3887,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
  */
 void xe_vm_unlock(struct xe_vm *vm)
 {
-	xe_vm_set_validation_exec(vm, NULL);
 	dma_resv_unlock(xe_vm_resv(vm));
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (13 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
@ 2025-08-13 10:51 ` Thomas Hellström
  2025-08-13 12:13   ` Matthew Auld
  2025-08-14 20:30   ` Matthew Brost
  2025-08-13 11:54 ` ✗ CI.checkpatch: warning for Driver-managed " Patchwork
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 10:51 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Pinned suspend eviction and preparation for eviction validate
system memory for the backup buffers used during eviction. Do that
under a validation-exclusive lock to avoid interfering with other
processes validating system graphics memory.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++------------------
 1 file changed, 108 insertions(+), 97 deletions(-)
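
For readers following the series: both converted paths below take the same
schematic shape. This is a sketch only (asserts, the actual copy and most
error handling elided; the diff below is the real code):

	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
		/* Last argument true: validation lock in exclusive mode. */
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);

		backup = xe_bo_init_locked(..., &exec);
		if (IS_ERR(backup)) {
			/* Lost a lock race: unwind and rerun the transaction. */
			drm_exec_retry_on_contention(&exec);
			ret = PTR_ERR(backup);
			/* OOM: let the validation layer decide whether to retry. */
			xe_validation_retry_on_oom(&ctx, &ret);
			break;
		}
		/* ... migrate or memcpy the bo contents into the backup ... */
	}
	return ret;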

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 82bf158426ad..efb9c88b6aa7 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_bo *backup;
 	int ret = 0;
 
-	xe_bo_lock(bo, false);
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		xe_assert(xe, !ret);
+		xe_assert(xe, !bo->backup_obj);
 
-	xe_assert(xe, !bo->backup_obj);
+		/*
+		 * Since this is called from the PM notifier we might have raced with
+		 * someone unpinning this after we dropped the pinned list lock and
+	 * before grabbing the bo lock above.
+		 */
+		if (!xe_bo_is_pinned(bo))
+			break;
 
-	/*
-	 * Since this is called from the PM notifier we might have raced with
-	 * someone unpinning this after we dropped the pinned list lock and
-	 * grabbing the above bo lock.
-	 */
-	if (!xe_bo_is_pinned(bo))
-		goto out_unlock_bo;
+		if (!xe_bo_is_vram(bo))
+			break;
 
-	if (!xe_bo_is_vram(bo))
-		goto out_unlock_bo;
+		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
+			break;
 
-	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
-		goto out_unlock_bo;
+		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+					   XE_BO_FLAG_PINNED, &exec);
+		if (IS_ERR(backup)) {
+			drm_exec_retry_on_contention(&exec);
+			ret = PTR_ERR(backup);
+			xe_validation_retry_on_oom(&ctx, &ret);
+			break;
+		}
 
-	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED, exec);
-	if (IS_ERR(backup)) {
-		ret = PTR_ERR(backup);
-		goto out_unlock_bo;
+		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
+		ttm_bo_pin(&backup->ttm);
+		bo->backup_obj = backup;
 	}
 
-	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
-	ttm_bo_pin(&backup->ttm);
-	bo->backup_obj = backup;
-
-out_unlock_bo:
-	xe_bo_unlock(bo);
 	return ret;
 }
 
@@ -1215,99 +1219,106 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
 int xe_bo_evict_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_bo *backup = bo->backup_obj;
 	bool backup_created = false;
 	bool unmap = false;
 	int ret = 0;
 
-	xe_bo_lock(bo, false);
+	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		xe_assert(xe, !ret);
 
-	if (WARN_ON(!bo->ttm.resource)) {
-		ret = -EINVAL;
-		goto out_unlock_bo;
-	}
+		if (WARN_ON(!bo->ttm.resource)) {
+			ret = -EINVAL;
+			break;
+		}
 
-	if (WARN_ON(!xe_bo_is_pinned(bo))) {
-		ret = -EINVAL;
-		goto out_unlock_bo;
-	}
+		if (WARN_ON(!xe_bo_is_pinned(bo))) {
+			ret = -EINVAL;
+			break;
+		}
 
-	if (!xe_bo_is_vram(bo))
-		goto out_unlock_bo;
+		if (!xe_bo_is_vram(bo))
+			break;
 
-	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
-		goto out_unlock_bo;
+		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
+			break;
 
-	if (!backup) {
-		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					   XE_BO_FLAG_PINNED, exec);
-		if (IS_ERR(backup)) {
-			ret = PTR_ERR(backup);
-			goto out_unlock_bo;
+		if (!backup) {
+			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
+						   xe_bo_size(bo),
+						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+						   XE_BO_FLAG_PINNED, &exec);
+			if (IS_ERR(backup)) {
+				drm_exec_retry_on_contention(&exec);
+				ret = PTR_ERR(backup);
+				xe_validation_retry_on_oom(&ctx, &ret);
+				break;
+			}
+			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
+			backup_created = true;
 		}
-		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
-		backup_created = true;
-	}
 
-	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
-		struct xe_migrate *migrate;
-		struct dma_fence *fence;
-
-		if (bo->tile)
-			migrate = bo->tile->migrate;
-		else
-			migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
+		if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
+			struct xe_migrate *migrate;
+			struct dma_fence *fence;
 
-		ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
-		if (ret)
-			goto out_backup;
+			if (bo->tile)
+				migrate = bo->tile->migrate;
+			else
+				migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
 
-		ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
-		if (ret)
-			goto out_backup;
+			ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
+			if (ret)
+				goto out_backup;
 
-		fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
-					backup->ttm.resource, false);
-		if (IS_ERR(fence)) {
-			ret = PTR_ERR(fence);
-			goto out_backup;
-		}
+			ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
+			if (ret)
+				goto out_backup;
 
-		dma_resv_add_fence(bo->ttm.base.resv, fence,
-				   DMA_RESV_USAGE_KERNEL);
-		dma_resv_add_fence(backup->ttm.base.resv, fence,
-				   DMA_RESV_USAGE_KERNEL);
-		dma_fence_put(fence);
-	} else {
-		ret = xe_bo_vmap(backup);
-		if (ret)
-			goto out_backup;
+			fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
+						backup->ttm.resource, false);
+			if (IS_ERR(fence)) {
+				ret = PTR_ERR(fence);
+				goto out_backup;
+			}
 
-		if (iosys_map_is_null(&bo->vmap)) {
-			ret = xe_bo_vmap(bo);
+			dma_resv_add_fence(bo->ttm.base.resv, fence,
+					   DMA_RESV_USAGE_KERNEL);
+			dma_resv_add_fence(backup->ttm.base.resv, fence,
+					   DMA_RESV_USAGE_KERNEL);
+			dma_fence_put(fence);
+		} else {
+			ret = xe_bo_vmap(backup);
 			if (ret)
 				goto out_backup;
-			unmap = true;
-		}
 
-		xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
-				   xe_bo_size(bo));
-	}
+			if (iosys_map_is_null(&bo->vmap)) {
+				ret = xe_bo_vmap(bo);
+				if (ret)
+					goto out_vunmap;
+				unmap = true;
+			}
 
-	if (!bo->backup_obj)
-		bo->backup_obj = backup;
+			xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
+					   xe_bo_size(bo));
+		}
 
+		if (!bo->backup_obj)
+			bo->backup_obj = backup;
+out_vunmap:
+		xe_bo_vunmap(backup);
 out_backup:
-	xe_bo_vunmap(backup);
-	if (ret && backup_created)
-		xe_bo_put(backup);
-out_unlock_bo:
-	if (unmap)
-		xe_bo_vunmap(bo);
-	xe_bo_unlock(bo);
+		if (ret && backup_created)
+			xe_bo_put(backup);
+		if (unmap)
+			xe_bo_vunmap(bo);
+	}
+
 	return ret;
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* ✗ CI.checkpatch: warning for Driver-managed exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (14 preceding siblings ...)
  2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
@ 2025-08-13 11:54 ` Patchwork
  2025-08-13 11:55 ` ✓ CI.KUnit: success " Patchwork
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 66+ messages in thread
From: Patchwork @ 2025-08-13 11:54 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Driver-managed exhaustive eviction
URL   : https://patchwork.freedesktop.org/series/152882/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
6f9293a391ff3c575bc021f454be5d0a0c076f57
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 4425bbc1f8949d4655e6680ecd3e4dda32c7e69d
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date:   Wed Aug 13 12:51:21 2025 +0200

    drm/xe: Convert pinned suspend eviction for exhaustive eviction
    
    Pinned suspend eviction and preparation for eviction validate
    system memory for the backup buffers used during eviction. Do that
    under a validation-exclusive lock to avoid interfering with other
    processes validating system graphics memory.
    
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
+ /mt/dim checkpatch 546fc742f08b8dbd3fa1486933c9b15085e11d13 drm-intel
cf30641243aa drm/xe/vm: Don't use a pin the vm_resv during validation
5aa304916620 drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member
2888a1ff8929 drm/xe/vm: Clear the scratch_pt pointer on error
26a7db17e2ec drm/xe: Pass down drm_exec context to validation
-:1117: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#1117: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 1244 lines checked
fab6c1544ca5 drm/xe: Introduce an xe_validation wrapper around drm_exec
-:327: WARNING:MACRO_WITH_FLOW_CONTROL: Macros with flow control statements should be avoided
#327: FILE: drivers/gpu/drm/xe/xe_validation.h:129:
+#define xe_validation_retry_on_oom(_ctx, _ret)				\
+	do {								\
+		if (xe_validation_should_retry(_ctx, _ret))		\
+			goto *__drm_exec_retry_ptr;			\
+	} while (0)

-:349: WARNING:TABSTOP: Statements should start on a tabstop
#349: FILE: drivers/gpu/drm/xe/xe_validation.h:151:
+	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,

-:349: ERROR:SPACING: space required after that ';' (ctx:VxO)
#349: FILE: drivers/gpu/drm/xe/xe_validation.h:151:
+	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
 	                                                ^

-:349: ERROR:TRAILING_STATEMENTS: trailing statements should be on next line
#349: FILE: drivers/gpu/drm/xe/xe_validation.h:151:
+	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,

-:370: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#370: FILE: drivers/gpu/drm/xe/xe_validation.h:172:
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
+	drm_exec_until_all_locked(_exec)

BUT SEE:

   do {} while (0) advice is over-stated in a few situations:

   The more obvious case is macros, like MODULE_PARM_DESC, invoked at
   file-scope, where C disallows code (it must be in functions).  See
   $exceptions if you have one to add by name.

   More troublesome is declarative macros used at top of new scope,
   like DECLARE_PER_CPU.  These might just compile with a do-while-0
   wrapper, but would be incorrect.  Most of these are handled by
   detecting struct,union,etc declaration primitives in $exceptions.

   Theres also macros called inside an if (block), which "return" an
   expression.  These cannot do-while, and need a ({}) wrapper.

   Enjoy this qualification while we work to improve our heuristics.

-:370: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_exec' - possible side-effects?
#370: FILE: drivers/gpu/drm/xe/xe_validation.h:172:
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
+	drm_exec_until_all_locked(_exec)

total: 3 errors, 2 warnings, 1 checks, 328 lines checked
b47d58d9d4a1 drm/xe: Convert xe_bo_create_user() for exhaustive eviction
16e2235dff4f drm/xe: Convert SVM validation for exhaustive eviction
0687b1d5f43c drm/xe: Convert existing drm_exec transactions for exhaustive eviction
1fe2c75f4aa2 drm/xe: Convert the CPU fault handler for exhaustive eviction
70dc7173e90e drm/xe/display: Convert __xe_pin_fb_vma()
500867c30229 drm/xe: Convert xe_dma_buf.c for exhaustive eviction
11a5fd486fb7 drm/xe: Rename ___xe_bo_create_locked()
e553928c86d4 drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
12b3df6adabb drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
-:51: WARNING:LONG_LINE: line length of 102 exceeds 100 columns
#51: FILE: drivers/gpu/drm/xe/display/intel_fbdev_fb.c:59:
+						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |

total: 0 errors, 1 warnings, 0 checks, 733 lines checked
4425bbc1f894 drm/xe: Convert pinned suspend eviction for exhaustive eviction
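
A note on the MACRO_WITH_FLOW_CONTROL and COMPLEX_MACRO findings above:
these are inherent to the construct rather than oversights.
xe_validation_retry_on_oom() jumps back to the retry label that
drm_exec_until_all_locked() publishes, much as
drm_exec_retry_on_contention() already does in drm_exec.h. Schematically
(not the real macro bodies), a guarded transaction expands to roughly:

	retry:						/* drm_exec_until_all_locked() */
	while (drm_exec_cleanup(&exec)) {
		/* ... lock objects, validate ... */
		if (drm_exec_is_contended(&exec))
			goto retry;		/* drm_exec_retry_on_contention() */
		if (xe_validation_should_retry(&ctx, &ret))
			goto retry;		/* xe_validation_retry_on_oom() */
	}

hence checkpatch sees flow control inside a macro.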



^ permalink raw reply	[flat|nested] 66+ messages in thread

* ✓ CI.KUnit: success for Driver-managed exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (15 preceding siblings ...)
  2025-08-13 11:54 ` ✗ CI.checkpatch: warning for Driver-managed " Patchwork
@ 2025-08-13 11:55 ` Patchwork
  2025-08-13 13:20 ` ✗ Xe.CI.BAT: failure " Patchwork
  2025-08-13 14:25 ` ✗ Xe.CI.Full: " Patchwork
  18 siblings, 0 replies; 66+ messages in thread
From: Patchwork @ 2025-08-13 11:55 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Driver-managed exhaustive eviction
URL   : https://patchwork.freedesktop.org/series/152882/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[11:54:19] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[11:54:23] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[11:54:52] Starting KUnit Kernel (1/1)...
[11:54:52] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:54:52] ================== guc_buf (11 subtests) ===================
[11:54:52] [PASSED] test_smallest
[11:54:52] [PASSED] test_largest
[11:54:52] [PASSED] test_granular
[11:54:52] [PASSED] test_unique
[11:54:52] [PASSED] test_overlap
[11:54:52] [PASSED] test_reusable
[11:54:52] [PASSED] test_too_big
[11:54:52] [PASSED] test_flush
[11:54:52] [PASSED] test_lookup
[11:54:52] [PASSED] test_data
[11:54:52] [PASSED] test_class
[11:54:52] ===================== [PASSED] guc_buf =====================
[11:54:52] =================== guc_dbm (7 subtests) ===================
[11:54:52] [PASSED] test_empty
[11:54:52] [PASSED] test_default
[11:54:52] ======================== test_size  ========================
[11:54:52] [PASSED] 4
[11:54:52] [PASSED] 8
[11:54:52] [PASSED] 32
[11:54:52] [PASSED] 256
[11:54:52] ==================== [PASSED] test_size ====================
[11:54:52] ======================= test_reuse  ========================
[11:54:52] [PASSED] 4
[11:54:52] [PASSED] 8
[11:54:52] [PASSED] 32
[11:54:52] [PASSED] 256
[11:54:52] =================== [PASSED] test_reuse ====================
[11:54:52] =================== test_range_overlap  ====================
[11:54:52] [PASSED] 4
[11:54:52] [PASSED] 8
[11:54:52] [PASSED] 32
[11:54:52] [PASSED] 256
[11:54:52] =============== [PASSED] test_range_overlap ================
[11:54:52] =================== test_range_compact  ====================
[11:54:52] [PASSED] 4
[11:54:52] [PASSED] 8
[11:54:52] [PASSED] 32
[11:54:52] [PASSED] 256
[11:54:52] =============== [PASSED] test_range_compact ================
[11:54:52] ==================== test_range_spare  =====================
[11:54:52] [PASSED] 4
[11:54:52] [PASSED] 8
[11:54:52] [PASSED] 32
[11:54:52] [PASSED] 256
[11:54:52] ================ [PASSED] test_range_spare =================
[11:54:52] ===================== [PASSED] guc_dbm =====================
[11:54:52] =================== guc_idm (6 subtests) ===================
[11:54:52] [PASSED] bad_init
[11:54:52] [PASSED] no_init
[11:54:52] [PASSED] init_fini
[11:54:52] [PASSED] check_used
[11:54:52] [PASSED] check_quota
[11:54:52] [PASSED] check_all
[11:54:52] ===================== [PASSED] guc_idm =====================
[11:54:52] ================== no_relay (3 subtests) ===================
[11:54:52] [PASSED] xe_drops_guc2pf_if_not_ready
[11:54:52] [PASSED] xe_drops_guc2vf_if_not_ready
[11:54:52] [PASSED] xe_rejects_send_if_not_ready
[11:54:52] ==================== [PASSED] no_relay =====================
[11:54:52] ================== pf_relay (14 subtests) ==================
[11:54:52] [PASSED] pf_rejects_guc2pf_too_short
[11:54:52] [PASSED] pf_rejects_guc2pf_too_long
[11:54:52] [PASSED] pf_rejects_guc2pf_no_payload
[11:54:52] [PASSED] pf_fails_no_payload
[11:54:52] [PASSED] pf_fails_bad_origin
[11:54:52] [PASSED] pf_fails_bad_type
[11:54:52] [PASSED] pf_txn_reports_error
[11:54:52] [PASSED] pf_txn_sends_pf2guc
[11:54:52] [PASSED] pf_sends_pf2guc
[11:54:52] [SKIPPED] pf_loopback_nop
[11:54:52] [SKIPPED] pf_loopback_echo
[11:54:52] [SKIPPED] pf_loopback_fail
[11:54:52] [SKIPPED] pf_loopback_busy
[11:54:52] [SKIPPED] pf_loopback_retry
[11:54:52] ==================== [PASSED] pf_relay =====================
[11:54:52] ================== vf_relay (3 subtests) ===================
[11:54:52] [PASSED] vf_rejects_guc2vf_too_short
[11:54:52] [PASSED] vf_rejects_guc2vf_too_long
[11:54:52] [PASSED] vf_rejects_guc2vf_no_payload
[11:54:52] ==================== [PASSED] vf_relay =====================
[11:54:52] ===================== lmtt (1 subtest) =====================
[11:54:52] ======================== test_ops  =========================
[11:54:52] [PASSED] 2-level
[11:54:52] [PASSED] multi-level
[11:54:52] ==================== [PASSED] test_ops =====================
[11:54:52] ====================== [PASSED] lmtt =======================
[11:54:52] ================= pf_service (11 subtests) =================
[11:54:52] [PASSED] pf_negotiate_any
[11:54:52] [PASSED] pf_negotiate_base_match
[11:54:52] [PASSED] pf_negotiate_base_newer
[11:54:52] [PASSED] pf_negotiate_base_next
[11:54:52] [SKIPPED] pf_negotiate_base_older
[11:54:52] [PASSED] pf_negotiate_base_prev
[11:54:52] [PASSED] pf_negotiate_latest_match
[11:54:52] [PASSED] pf_negotiate_latest_newer
[11:54:52] [PASSED] pf_negotiate_latest_next
[11:54:52] [SKIPPED] pf_negotiate_latest_older
[11:54:52] [SKIPPED] pf_negotiate_latest_prev
[11:54:52] =================== [PASSED] pf_service ====================
[11:54:52] =================== xe_mocs (2 subtests) ===================
[11:54:52] ================ xe_live_mocs_kernel_kunit  ================
[11:54:52] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[11:54:52] ================ xe_live_mocs_reset_kunit  =================
[11:54:52] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[11:54:52] ==================== [SKIPPED] xe_mocs =====================
[11:54:52] ================= xe_migrate (2 subtests) ==================
[11:54:52] ================= xe_migrate_sanity_kunit  =================
[11:54:52] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[11:54:52] ================== xe_validate_ccs_kunit  ==================
[11:54:52] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[11:54:52] =================== [SKIPPED] xe_migrate ===================
[11:54:52] ================== xe_dma_buf (1 subtest) ==================
[11:54:52] ==================== xe_dma_buf_kunit  =====================
[11:54:52] ================ [SKIPPED] xe_dma_buf_kunit ================
[11:54:52] =================== [SKIPPED] xe_dma_buf ===================
[11:54:52] ================= xe_bo_shrink (1 subtest) =================
[11:54:52] =================== xe_bo_shrink_kunit  ====================
[11:54:52] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[11:54:52] ================== [SKIPPED] xe_bo_shrink ==================
[11:54:52] ==================== xe_bo (2 subtests) ====================
[11:54:52] ================== xe_ccs_migrate_kunit  ===================
[11:54:52] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[11:54:52] ==================== xe_bo_evict_kunit  ====================
[11:54:52] =============== [SKIPPED] xe_bo_evict_kunit ================
[11:54:52] ===================== [SKIPPED] xe_bo ======================
[11:54:52] ==================== args (11 subtests) ====================
[11:54:52] [PASSED] count_args_test
[11:54:52] [PASSED] call_args_example
[11:54:52] [PASSED] call_args_test
[11:54:52] [PASSED] drop_first_arg_example
[11:54:52] [PASSED] drop_first_arg_test
[11:54:52] [PASSED] first_arg_example
[11:54:52] [PASSED] first_arg_test
[11:54:52] [PASSED] last_arg_example
[11:54:52] [PASSED] last_arg_test
[11:54:52] [PASSED] pick_arg_example
[11:54:52] [PASSED] sep_comma_example
[11:54:52] ====================== [PASSED] args =======================
[11:54:52] =================== xe_pci (3 subtests) ====================
[11:54:52] ==================== check_graphics_ip  ====================
[11:54:52] [PASSED] 12.70 Xe_LPG
[11:54:52] [PASSED] 12.71 Xe_LPG
[11:54:52] [PASSED] 12.74 Xe_LPG+
[11:54:52] [PASSED] 20.01 Xe2_HPG
[11:54:52] [PASSED] 20.02 Xe2_HPG
[11:54:52] [PASSED] 20.04 Xe2_LPG
[11:54:52] [PASSED] 30.00 Xe3_LPG
[11:54:52] [PASSED] 30.01 Xe3_LPG
[11:54:52] [PASSED] 30.03 Xe3_LPG
[11:54:52] ================ [PASSED] check_graphics_ip ================
[11:54:52] ===================== check_media_ip  ======================
[11:54:52] [PASSED] 13.00 Xe_LPM+
[11:54:52] [PASSED] 13.01 Xe2_HPM
[11:54:52] [PASSED] 20.00 Xe2_LPM
[11:54:52] [PASSED] 30.00 Xe3_LPM
[11:54:52] [PASSED] 30.02 Xe3_LPM
[11:54:52] ================= [PASSED] check_media_ip ==================
[11:54:52] ================= check_platform_gt_count  =================
[11:54:52] [PASSED] 0x9A60 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A68 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A70 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A40 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A49 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A59 (TIGERLAKE)
[11:54:52] [PASSED] 0x9A78 (TIGERLAKE)
[11:54:52] [PASSED] 0x9AC0 (TIGERLAKE)
[11:54:52] [PASSED] 0x9AC9 (TIGERLAKE)
[11:54:52] [PASSED] 0x9AD9 (TIGERLAKE)
[11:54:52] [PASSED] 0x9AF8 (TIGERLAKE)
[11:54:52] [PASSED] 0x4C80 (ROCKETLAKE)
[11:54:52] [PASSED] 0x4C8A (ROCKETLAKE)
[11:54:52] [PASSED] 0x4C8B (ROCKETLAKE)
[11:54:52] [PASSED] 0x4C8C (ROCKETLAKE)
[11:54:52] [PASSED] 0x4C90 (ROCKETLAKE)
[11:54:52] [PASSED] 0x4C9A (ROCKETLAKE)
[11:54:52] [PASSED] 0x4680 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4682 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4688 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x468A (ALDERLAKE_S)
[11:54:52] [PASSED] 0x468B (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4690 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4692 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4693 (ALDERLAKE_S)
[11:54:52] [PASSED] 0x46A0 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46A1 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46A2 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46A3 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46A6 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46A8 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46AA (ALDERLAKE_P)
[11:54:52] [PASSED] 0x462A (ALDERLAKE_P)
[11:54:52] [PASSED] 0x4626 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x4628 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46B0 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46B1 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46B2 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46B3 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46C0 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46C1 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46C2 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46C3 (ALDERLAKE_P)
[11:54:52] [PASSED] 0x46D0 (ALDERLAKE_N)
[11:54:52] [PASSED] 0x46D1 (ALDERLAKE_N)
[11:54:52] [PASSED] 0x46D2 (ALDERLAKE_N)
[11:54:52] [PASSED] 0x46D3 (ALDERLAKE_N)
[11:54:52] [PASSED] 0x46D4 (ALDERLAKE_N)
[11:54:52] [PASSED] 0xA721 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7A1 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7A9 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7AC (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7AD (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA720 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7A0 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7A8 (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7AA (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA7AB (ALDERLAKE_P)
[11:54:52] [PASSED] 0xA780 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA781 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA782 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA783 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA788 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA789 (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA78A (ALDERLAKE_S)
[11:54:52] [PASSED] 0xA78B (ALDERLAKE_S)
[11:54:52] [PASSED] 0x4905 (DG1)
[11:54:52] [PASSED] 0x4906 (DG1)
[11:54:52] [PASSED] 0x4907 (DG1)
[11:54:52] [PASSED] 0x4908 (DG1)
[11:54:52] [PASSED] 0x4909 (DG1)
[11:54:52] [PASSED] 0x56C0 (DG2)
[11:54:52] [PASSED] 0x56C2 (DG2)
[11:54:52] [PASSED] 0x56C1 (DG2)
[11:54:52] [PASSED] 0x7D51 (METEORLAKE)
[11:54:52] [PASSED] 0x7DD1 (METEORLAKE)
[11:54:52] [PASSED] 0x7D41 (METEORLAKE)
[11:54:52] [PASSED] 0x7D67 (METEORLAKE)
[11:54:52] [PASSED] 0xB640 (METEORLAKE)
[11:54:52] [PASSED] 0x56A0 (DG2)
[11:54:52] [PASSED] 0x56A1 (DG2)
[11:54:52] [PASSED] 0x56A2 (DG2)
[11:54:52] [PASSED] 0x56BE (DG2)
[11:54:52] [PASSED] 0x56BF (DG2)
[11:54:52] [PASSED] 0x5690 (DG2)
[11:54:52] [PASSED] 0x5691 (DG2)
[11:54:52] [PASSED] 0x5692 (DG2)
[11:54:52] [PASSED] 0x56A5 (DG2)
[11:54:52] [PASSED] 0x56A6 (DG2)
[11:54:52] [PASSED] 0x56B0 (DG2)
[11:54:52] [PASSED] 0x56B1 (DG2)
[11:54:52] [PASSED] 0x56BA (DG2)
[11:54:52] [PASSED] 0x56BB (DG2)
[11:54:52] [PASSED] 0x56BC (DG2)
[11:54:52] [PASSED] 0x56BD (DG2)
[11:54:52] [PASSED] 0x5693 (DG2)
[11:54:52] [PASSED] 0x5694 (DG2)
[11:54:52] [PASSED] 0x5695 (DG2)
[11:54:52] [PASSED] 0x56A3 (DG2)
[11:54:52] [PASSED] 0x56A4 (DG2)
[11:54:52] [PASSED] 0x56B2 (DG2)
[11:54:52] [PASSED] 0x56B3 (DG2)
[11:54:52] [PASSED] 0x5696 (DG2)
[11:54:52] [PASSED] 0x5697 (DG2)
[11:54:52] [PASSED] 0xB69 (PVC)
[11:54:52] [PASSED] 0xB6E (PVC)
[11:54:52] [PASSED] 0xBD4 (PVC)
[11:54:52] [PASSED] 0xBD5 (PVC)
[11:54:52] [PASSED] 0xBD6 (PVC)
[11:54:52] [PASSED] 0xBD7 (PVC)
[11:54:52] [PASSED] 0xBD8 (PVC)
[11:54:52] [PASSED] 0xBD9 (PVC)
[11:54:52] [PASSED] 0xBDA (PVC)
[11:54:52] [PASSED] 0xBDB (PVC)
[11:54:52] [PASSED] 0xBE0 (PVC)
[11:54:52] [PASSED] 0xBE1 (PVC)
[11:54:52] [PASSED] 0xBE5 (PVC)
[11:54:52] [PASSED] 0x7D40 (METEORLAKE)
[11:54:52] [PASSED] 0x7D45 (METEORLAKE)
[11:54:52] [PASSED] 0x7D55 (METEORLAKE)
[11:54:52] [PASSED] 0x7D60 (METEORLAKE)
[11:54:52] [PASSED] 0x7DD5 (METEORLAKE)
[11:54:52] [PASSED] 0x6420 (LUNARLAKE)
[11:54:52] [PASSED] 0x64A0 (LUNARLAKE)
[11:54:52] [PASSED] 0x64B0 (LUNARLAKE)
[11:54:52] [PASSED] 0xE202 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE209 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE20B (BATTLEMAGE)
[11:54:52] [PASSED] 0xE20C (BATTLEMAGE)
[11:54:52] [PASSED] 0xE20D (BATTLEMAGE)
[11:54:52] [PASSED] 0xE210 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE211 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE212 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE216 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE220 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE221 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE222 (BATTLEMAGE)
[11:54:52] [PASSED] 0xE223 (BATTLEMAGE)
[11:54:52] [PASSED] 0xB080 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB081 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB082 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB083 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB084 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB085 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB086 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB087 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB08F (PANTHERLAKE)
[11:54:52] [PASSED] 0xB090 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB0A0 (PANTHERLAKE)
[11:54:52] [PASSED] 0xB0B0 (PANTHERLAKE)
[11:54:52] [PASSED] 0xFD80 (PANTHERLAKE)
[11:54:52] [PASSED] 0xFD81 (PANTHERLAKE)
[11:54:52] ============= [PASSED] check_platform_gt_count =============
[11:54:52] ===================== [PASSED] xe_pci ======================
[11:54:52] =================== xe_rtp (2 subtests) ====================
[11:54:52] =============== xe_rtp_process_to_sr_tests  ================
[11:54:52] [PASSED] coalesce-same-reg
[11:54:52] [PASSED] no-match-no-add
[11:54:52] [PASSED] match-or
[11:54:52] [PASSED] match-or-xfail
[11:54:52] [PASSED] no-match-no-add-multiple-rules
[11:54:52] [PASSED] two-regs-two-entries
[11:54:52] [PASSED] clr-one-set-other
[11:54:52] [PASSED] set-field
[11:54:52] [PASSED] conflict-duplicate
[11:54:52] [PASSED] conflict-not-disjoint
[11:54:52] [PASSED] conflict-reg-type
[11:54:52] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[11:54:52] ================== xe_rtp_process_tests  ===================
[11:54:52] [PASSED] active1
[11:54:52] [PASSED] active2
[11:54:52] [PASSED] active-inactive
[11:54:52] [PASSED] inactive-active
[11:54:52] [PASSED] inactive-1st_or_active-inactive
[11:54:52] [PASSED] inactive-2nd_or_active-inactive
[11:54:52] [PASSED] inactive-last_or_active-inactive
[11:54:52] [PASSED] inactive-no_or_active-inactive
[11:54:52] ============== [PASSED] xe_rtp_process_tests ===============
[11:54:52] ===================== [PASSED] xe_rtp ======================
[11:54:52] ==================== xe_wa (1 subtest) =====================
[11:54:52] ======================== xe_wa_gt  =========================
[11:54:52] [PASSED] TIGERLAKE (B0)
[11:54:52] [PASSED] DG1 (A0)
[11:54:52] [PASSED] DG1 (B0)
[11:54:52] [PASSED] ALDERLAKE_S (A0)
[11:54:52] [PASSED] ALDERLAKE_S (B0)
[11:54:52] [PASSED] ALDERLAKE_S (C0)
[11:54:52] [PASSED] ALDERLAKE_S (D0)
[11:54:52] [PASSED] ALDERLAKE_P (A0)
[11:54:52] [PASSED] ALDERLAKE_P (B0)
[11:54:52] [PASSED] ALDERLAKE_P (C0)
[11:54:52] [PASSED] ALDERLAKE_S_RPLS (D0)
[11:54:52] [PASSED] ALDERLAKE_P_RPLU (E0)
[11:54:52] [PASSED] DG2_G10 (C0)
[11:54:52] [PASSED] DG2_G11 (B1)
[11:54:52] [PASSED] DG2_G12 (A1)
[11:54:52] [PASSED] METEORLAKE (g:A0, m:A0)
[11:54:52] [PASSED] METEORLAKE (g:A0, m:A0)
[11:54:52] [PASSED] METEORLAKE (g:A0, m:A0)
[11:54:52] [PASSED] LUNARLAKE (g:A0, m:A0)
[11:54:52] [PASSED] LUNARLAKE (g:B0, m:A0)
stty: 'standard input': Inappropriate ioctl for device
[11:54:52] [PASSED] BATTLEMAGE (g:A0, m:A1)
[11:54:52] ==================== [PASSED] xe_wa_gt =====================
[11:54:52] ====================== [PASSED] xe_wa ======================
[11:54:52] ============================================================
[11:54:52] Testing complete. Ran 297 tests: passed: 281, skipped: 16
[11:54:52] Elapsed time: 33.288s total, 4.222s configuring, 28.699s building, 0.319s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[11:54:52] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[11:54:54] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[11:55:17] Starting KUnit Kernel (1/1)...
[11:55:17] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:55:17] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[11:55:17] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[11:55:17] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[11:55:17] =========== drm_validate_clone_mode (2 subtests) ===========
[11:55:17] ============== drm_test_check_in_clone_mode  ===============
[11:55:17] [PASSED] in_clone_mode
[11:55:17] [PASSED] not_in_clone_mode
[11:55:17] ========== [PASSED] drm_test_check_in_clone_mode ===========
[11:55:17] =============== drm_test_check_valid_clones  ===============
[11:55:17] [PASSED] not_in_clone_mode
[11:55:17] [PASSED] valid_clone
[11:55:17] [PASSED] invalid_clone
[11:55:17] =========== [PASSED] drm_test_check_valid_clones ===========
[11:55:17] ============= [PASSED] drm_validate_clone_mode =============
[11:55:17] ============= drm_validate_modeset (1 subtest) =============
[11:55:17] [PASSED] drm_test_check_connector_changed_modeset
[11:55:17] ============== [PASSED] drm_validate_modeset ===============
[11:55:17] ====== drm_test_bridge_get_current_state (2 subtests) ======
[11:55:17] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[11:55:17] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[11:55:17] ======== [PASSED] drm_test_bridge_get_current_state ========
[11:55:17] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[11:55:17] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[11:55:17] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[11:55:17] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[11:55:17] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[11:55:17] ============== drm_bridge_alloc (2 subtests) ===============
[11:55:17] [PASSED] drm_test_drm_bridge_alloc_basic
[11:55:17] [PASSED] drm_test_drm_bridge_alloc_get_put
[11:55:17] ================ [PASSED] drm_bridge_alloc =================
[11:55:17] ================== drm_buddy (7 subtests) ==================
[11:55:17] [PASSED] drm_test_buddy_alloc_limit
[11:55:17] [PASSED] drm_test_buddy_alloc_optimistic
[11:55:17] [PASSED] drm_test_buddy_alloc_pessimistic
[11:55:17] [PASSED] drm_test_buddy_alloc_pathological
[11:55:17] [PASSED] drm_test_buddy_alloc_contiguous
[11:55:17] [PASSED] drm_test_buddy_alloc_clear
[11:55:17] [PASSED] drm_test_buddy_alloc_range_bias
[11:55:17] ==================== [PASSED] drm_buddy ====================
[11:55:17] ============= drm_cmdline_parser (40 subtests) =============
[11:55:17] [PASSED] drm_test_cmdline_force_d_only
[11:55:17] [PASSED] drm_test_cmdline_force_D_only_dvi
[11:55:17] [PASSED] drm_test_cmdline_force_D_only_hdmi
[11:55:17] [PASSED] drm_test_cmdline_force_D_only_not_digital
[11:55:17] [PASSED] drm_test_cmdline_force_e_only
[11:55:17] [PASSED] drm_test_cmdline_res
[11:55:17] [PASSED] drm_test_cmdline_res_vesa
[11:55:17] [PASSED] drm_test_cmdline_res_vesa_rblank
[11:55:17] [PASSED] drm_test_cmdline_res_rblank
[11:55:17] [PASSED] drm_test_cmdline_res_bpp
[11:55:17] [PASSED] drm_test_cmdline_res_refresh
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[11:55:17] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[11:55:17] [PASSED] drm_test_cmdline_res_margins_force_on
[11:55:17] [PASSED] drm_test_cmdline_res_vesa_margins
[11:55:17] [PASSED] drm_test_cmdline_name
[11:55:17] [PASSED] drm_test_cmdline_name_bpp
[11:55:17] [PASSED] drm_test_cmdline_name_option
[11:55:17] [PASSED] drm_test_cmdline_name_bpp_option
[11:55:17] [PASSED] drm_test_cmdline_rotate_0
[11:55:17] [PASSED] drm_test_cmdline_rotate_90
[11:55:17] [PASSED] drm_test_cmdline_rotate_180
[11:55:17] [PASSED] drm_test_cmdline_rotate_270
[11:55:17] [PASSED] drm_test_cmdline_hmirror
[11:55:17] [PASSED] drm_test_cmdline_vmirror
[11:55:17] [PASSED] drm_test_cmdline_margin_options
[11:55:17] [PASSED] drm_test_cmdline_multiple_options
[11:55:17] [PASSED] drm_test_cmdline_bpp_extra_and_option
[11:55:17] [PASSED] drm_test_cmdline_extra_and_option
[11:55:17] [PASSED] drm_test_cmdline_freestanding_options
[11:55:17] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[11:55:17] [PASSED] drm_test_cmdline_panel_orientation
[11:55:17] ================ drm_test_cmdline_invalid  =================
[11:55:17] [PASSED] margin_only
[11:55:17] [PASSED] interlace_only
[11:55:17] [PASSED] res_missing_x
[11:55:17] [PASSED] res_missing_y
[11:55:17] [PASSED] res_bad_y
[11:55:17] [PASSED] res_missing_y_bpp
[11:55:17] [PASSED] res_bad_bpp
[11:55:17] [PASSED] res_bad_refresh
[11:55:17] [PASSED] res_bpp_refresh_force_on_off
[11:55:17] [PASSED] res_invalid_mode
[11:55:17] [PASSED] res_bpp_wrong_place_mode
[11:55:17] [PASSED] name_bpp_refresh
[11:55:17] [PASSED] name_refresh
[11:55:17] [PASSED] name_refresh_wrong_mode
[11:55:17] [PASSED] name_refresh_invalid_mode
[11:55:17] [PASSED] rotate_multiple
[11:55:17] [PASSED] rotate_invalid_val
[11:55:17] [PASSED] rotate_truncated
[11:55:17] [PASSED] invalid_option
[11:55:17] [PASSED] invalid_tv_option
[11:55:17] [PASSED] truncated_tv_option
[11:55:17] ============ [PASSED] drm_test_cmdline_invalid =============
[11:55:17] =============== drm_test_cmdline_tv_options  ===============
[11:55:17] [PASSED] NTSC
[11:55:17] [PASSED] NTSC_443
[11:55:17] [PASSED] NTSC_J
[11:55:17] [PASSED] PAL
[11:55:17] [PASSED] PAL_M
[11:55:17] [PASSED] PAL_N
[11:55:17] [PASSED] SECAM
[11:55:17] [PASSED] MONO_525
[11:55:17] [PASSED] MONO_625
[11:55:17] =========== [PASSED] drm_test_cmdline_tv_options ===========
[11:55:17] =============== [PASSED] drm_cmdline_parser ================
[11:55:17] ========== drmm_connector_hdmi_init (20 subtests) ==========
[11:55:17] [PASSED] drm_test_connector_hdmi_init_valid
[11:55:17] [PASSED] drm_test_connector_hdmi_init_bpc_8
[11:55:17] [PASSED] drm_test_connector_hdmi_init_bpc_10
[11:55:17] [PASSED] drm_test_connector_hdmi_init_bpc_12
[11:55:17] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[11:55:17] [PASSED] drm_test_connector_hdmi_init_bpc_null
[11:55:17] [PASSED] drm_test_connector_hdmi_init_formats_empty
[11:55:17] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[11:55:17] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[11:55:17] [PASSED] supported_formats=0x9 yuv420_allowed=1
[11:55:17] [PASSED] supported_formats=0x9 yuv420_allowed=0
[11:55:17] [PASSED] supported_formats=0x3 yuv420_allowed=1
[11:55:17] [PASSED] supported_formats=0x3 yuv420_allowed=0
[11:55:17] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[11:55:17] [PASSED] drm_test_connector_hdmi_init_null_ddc
[11:55:17] [PASSED] drm_test_connector_hdmi_init_null_product
[11:55:17] [PASSED] drm_test_connector_hdmi_init_null_vendor
[11:55:17] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[11:55:17] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[11:55:17] [PASSED] drm_test_connector_hdmi_init_product_valid
[11:55:17] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[11:55:17] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[11:55:17] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[11:55:17] ========= drm_test_connector_hdmi_init_type_valid  =========
[11:55:17] [PASSED] HDMI-A
[11:55:17] [PASSED] HDMI-B
[11:55:17] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[11:55:17] ======== drm_test_connector_hdmi_init_type_invalid  ========
[11:55:17] [PASSED] Unknown
[11:55:17] [PASSED] VGA
[11:55:17] [PASSED] DVI-I
[11:55:17] [PASSED] DVI-D
[11:55:17] [PASSED] DVI-A
[11:55:17] [PASSED] Composite
[11:55:17] [PASSED] SVIDEO
[11:55:17] [PASSED] LVDS
[11:55:17] [PASSED] Component
[11:55:17] [PASSED] DIN
[11:55:17] [PASSED] DP
[11:55:17] [PASSED] TV
[11:55:17] [PASSED] eDP
[11:55:17] [PASSED] Virtual
[11:55:17] [PASSED] DSI
[11:55:17] [PASSED] DPI
[11:55:17] [PASSED] Writeback
[11:55:17] [PASSED] SPI
[11:55:17] [PASSED] USB
[11:55:17] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[11:55:17] ============ [PASSED] drmm_connector_hdmi_init =============
[11:55:17] ============= drmm_connector_init (3 subtests) =============
[11:55:17] [PASSED] drm_test_drmm_connector_init
[11:55:17] [PASSED] drm_test_drmm_connector_init_null_ddc
[11:55:17] ========= drm_test_drmm_connector_init_type_valid  =========
[11:55:17] [PASSED] Unknown
[11:55:17] [PASSED] VGA
[11:55:17] [PASSED] DVI-I
[11:55:17] [PASSED] DVI-D
[11:55:17] [PASSED] DVI-A
[11:55:17] [PASSED] Composite
[11:55:17] [PASSED] SVIDEO
[11:55:17] [PASSED] LVDS
[11:55:17] [PASSED] Component
[11:55:17] [PASSED] DIN
[11:55:17] [PASSED] DP
[11:55:17] [PASSED] HDMI-A
[11:55:17] [PASSED] HDMI-B
[11:55:17] [PASSED] TV
[11:55:17] [PASSED] eDP
[11:55:17] [PASSED] Virtual
[11:55:17] [PASSED] DSI
[11:55:17] [PASSED] DPI
[11:55:17] [PASSED] Writeback
[11:55:17] [PASSED] SPI
[11:55:17] [PASSED] USB
[11:55:17] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[11:55:17] =============== [PASSED] drmm_connector_init ===============
[11:55:17] ========= drm_connector_dynamic_init (6 subtests) ==========
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_init
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_init_properties
[11:55:17] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[11:55:17] [PASSED] Unknown
[11:55:17] [PASSED] VGA
[11:55:17] [PASSED] DVI-I
[11:55:17] [PASSED] DVI-D
[11:55:17] [PASSED] DVI-A
[11:55:17] [PASSED] Composite
[11:55:17] [PASSED] SVIDEO
[11:55:17] [PASSED] LVDS
[11:55:17] [PASSED] Component
[11:55:17] [PASSED] DIN
[11:55:17] [PASSED] DP
[11:55:17] [PASSED] HDMI-A
[11:55:17] [PASSED] HDMI-B
[11:55:17] [PASSED] TV
[11:55:17] [PASSED] eDP
[11:55:17] [PASSED] Virtual
[11:55:17] [PASSED] DSI
[11:55:17] [PASSED] DPI
[11:55:17] [PASSED] Writeback
[11:55:17] [PASSED] SPI
[11:55:17] [PASSED] USB
[11:55:17] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[11:55:17] ======== drm_test_drm_connector_dynamic_init_name  =========
[11:55:17] [PASSED] Unknown
[11:55:17] [PASSED] VGA
[11:55:17] [PASSED] DVI-I
[11:55:17] [PASSED] DVI-D
[11:55:17] [PASSED] DVI-A
[11:55:17] [PASSED] Composite
[11:55:17] [PASSED] SVIDEO
[11:55:17] [PASSED] LVDS
[11:55:17] [PASSED] Component
[11:55:17] [PASSED] DIN
[11:55:17] [PASSED] DP
[11:55:17] [PASSED] HDMI-A
[11:55:17] [PASSED] HDMI-B
[11:55:17] [PASSED] TV
[11:55:17] [PASSED] eDP
[11:55:17] [PASSED] Virtual
[11:55:17] [PASSED] DSI
[11:55:17] [PASSED] DPI
[11:55:17] [PASSED] Writeback
[11:55:17] [PASSED] SPI
[11:55:17] [PASSED] USB
[11:55:17] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[11:55:17] =========== [PASSED] drm_connector_dynamic_init ============
[11:55:17] ==== drm_connector_dynamic_register_early (4 subtests) =====
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[11:55:17] ====== [PASSED] drm_connector_dynamic_register_early =======
[11:55:17] ======= drm_connector_dynamic_register (7 subtests) ========
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[11:55:17] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[11:55:17] ========= [PASSED] drm_connector_dynamic_register ==========
[11:55:17] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[11:55:17] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[11:55:17] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[11:55:17] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[11:55:17] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[11:55:17] ========== drm_test_get_tv_mode_from_name_valid  ===========
[11:55:17] [PASSED] NTSC
[11:55:17] [PASSED] NTSC-443
[11:55:17] [PASSED] NTSC-J
[11:55:17] [PASSED] PAL
[11:55:17] [PASSED] PAL-M
[11:55:17] [PASSED] PAL-N
[11:55:17] [PASSED] SECAM
[11:55:17] [PASSED] Mono
[11:55:17] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[11:55:17] [PASSED] drm_test_get_tv_mode_from_name_truncated
[11:55:17] ============ [PASSED] drm_get_tv_mode_from_name ============
[11:55:17] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[11:55:17] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[11:55:17] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[11:55:17] [PASSED] VIC 96
[11:55:17] [PASSED] VIC 97
[11:55:17] [PASSED] VIC 101
[11:55:17] [PASSED] VIC 102
[11:55:17] [PASSED] VIC 106
[11:55:17] [PASSED] VIC 107
[11:55:17] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[11:55:17] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[11:55:17] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[11:55:17] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[11:55:17] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[11:55:17] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[11:55:17] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[11:55:17] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[11:55:17] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[11:55:17] [PASSED] Automatic
[11:55:17] [PASSED] Full
[11:55:17] [PASSED] Limited 16:235
[11:55:17] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[11:55:17] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[11:55:17] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[11:55:17] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[11:55:17] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[11:55:17] [PASSED] RGB
[11:55:17] [PASSED] YUV 4:2:0
[11:55:17] [PASSED] YUV 4:2:2
[11:55:17] [PASSED] YUV 4:4:4
[11:55:17] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[11:55:17] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[11:55:17] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[11:55:17] ============= drm_damage_helper (21 subtests) ==============
[11:55:17] [PASSED] drm_test_damage_iter_no_damage
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_src_moved
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_not_visible
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[11:55:17] [PASSED] drm_test_damage_iter_no_damage_no_fb
[11:55:17] [PASSED] drm_test_damage_iter_simple_damage
[11:55:17] [PASSED] drm_test_damage_iter_single_damage
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_outside_src
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_src_moved
[11:55:17] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[11:55:17] [PASSED] drm_test_damage_iter_damage
[11:55:17] [PASSED] drm_test_damage_iter_damage_one_intersect
[11:55:17] [PASSED] drm_test_damage_iter_damage_one_outside
[11:55:17] [PASSED] drm_test_damage_iter_damage_src_moved
[11:55:17] [PASSED] drm_test_damage_iter_damage_not_visible
[11:55:17] ================ [PASSED] drm_damage_helper ================
[11:55:17] ============== drm_dp_mst_helper (3 subtests) ==============
[11:55:17] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[11:55:17] [PASSED] Clock 154000 BPP 30 DSC disabled
[11:55:17] [PASSED] Clock 234000 BPP 30 DSC disabled
[11:55:17] [PASSED] Clock 297000 BPP 24 DSC disabled
[11:55:17] [PASSED] Clock 332880 BPP 24 DSC enabled
[11:55:17] [PASSED] Clock 324540 BPP 24 DSC enabled
[11:55:17] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[11:55:17] ============== drm_test_dp_mst_calc_pbn_div  ===============
[11:55:17] [PASSED] Link rate 2000000 lane count 4
[11:55:17] [PASSED] Link rate 2000000 lane count 2
[11:55:17] [PASSED] Link rate 2000000 lane count 1
[11:55:17] [PASSED] Link rate 1350000 lane count 4
[11:55:17] [PASSED] Link rate 1350000 lane count 2
[11:55:17] [PASSED] Link rate 1350000 lane count 1
[11:55:17] [PASSED] Link rate 1000000 lane count 4
[11:55:17] [PASSED] Link rate 1000000 lane count 2
[11:55:17] [PASSED] Link rate 1000000 lane count 1
[11:55:17] [PASSED] Link rate 810000 lane count 4
[11:55:17] [PASSED] Link rate 810000 lane count 2
[11:55:17] [PASSED] Link rate 810000 lane count 1
[11:55:17] [PASSED] Link rate 540000 lane count 4
[11:55:17] [PASSED] Link rate 540000 lane count 2
[11:55:17] [PASSED] Link rate 540000 lane count 1
[11:55:17] [PASSED] Link rate 270000 lane count 4
[11:55:17] [PASSED] Link rate 270000 lane count 2
[11:55:17] [PASSED] Link rate 270000 lane count 1
[11:55:17] [PASSED] Link rate 162000 lane count 4
[11:55:17] [PASSED] Link rate 162000 lane count 2
[11:55:17] [PASSED] Link rate 162000 lane count 1
[11:55:17] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[11:55:17] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[11:55:17] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[11:55:17] [PASSED] DP_POWER_UP_PHY with port number
[11:55:17] [PASSED] DP_POWER_DOWN_PHY with port number
[11:55:17] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[11:55:17] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[11:55:17] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[11:55:17] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[11:55:17] [PASSED] DP_QUERY_PAYLOAD with port number
[11:55:17] [PASSED] DP_QUERY_PAYLOAD with VCPI
[11:55:17] [PASSED] DP_REMOTE_DPCD_READ with port number
[11:55:17] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[11:55:17] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[11:55:17] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[11:55:17] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[11:55:17] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[11:55:17] [PASSED] DP_REMOTE_I2C_READ with port number
[11:55:17] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[11:55:17] [PASSED] DP_REMOTE_I2C_READ with transactions array
[11:55:17] [PASSED] DP_REMOTE_I2C_WRITE with port number
[11:55:17] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[11:55:17] [PASSED] DP_REMOTE_I2C_WRITE with data array
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[11:55:17] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[11:55:17] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[11:55:17] ================ [PASSED] drm_dp_mst_helper ================
[11:55:17] ================== drm_exec (7 subtests) ===================
[11:55:17] [PASSED] sanitycheck
[11:55:17] [PASSED] test_lock
[11:55:17] [PASSED] test_lock_unlock
[11:55:17] [PASSED] test_duplicates
[11:55:17] [PASSED] test_prepare
[11:55:17] [PASSED] test_prepare_array
[11:55:17] [PASSED] test_multiple_loops
[11:55:17] ==================== [PASSED] drm_exec =====================
[11:55:17] =========== drm_format_helper_test (17 subtests) ===========
[11:55:17] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[11:55:17] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[11:55:17] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[11:55:17] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[11:55:17] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[11:55:17] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[11:55:17] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[11:55:17] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[11:55:17] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[11:55:17] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[11:55:17] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[11:55:17] ============== drm_test_fb_xrgb8888_to_mono  ===============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[11:55:17] ==================== drm_test_fb_swab  =====================
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ================ [PASSED] drm_test_fb_swab =================
[11:55:17] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[11:55:17] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[11:55:17] [PASSED] single_pixel_source_buffer
[11:55:17] [PASSED] single_pixel_clip_rectangle
[11:55:17] [PASSED] well_known_colors
[11:55:17] [PASSED] destination_pitch
[11:55:17] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[11:55:17] ================= drm_test_fb_clip_offset  =================
[11:55:17] [PASSED] pass through
[11:55:17] [PASSED] horizontal offset
[11:55:17] [PASSED] vertical offset
[11:55:17] [PASSED] horizontal and vertical offset
[11:55:17] [PASSED] horizontal offset (custom pitch)
[11:55:17] [PASSED] vertical offset (custom pitch)
[11:55:17] [PASSED] horizontal and vertical offset (custom pitch)
[11:55:17] ============= [PASSED] drm_test_fb_clip_offset =============
[11:55:17] =================== drm_test_fb_memcpy  ====================
[11:55:17] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[11:55:17] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[11:55:17] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[11:55:17] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[11:55:17] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[11:55:17] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[11:55:17] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[11:55:17] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[11:55:17] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[11:55:17] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[11:55:17] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[11:55:17] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[11:55:17] =============== [PASSED] drm_test_fb_memcpy ================
[11:55:17] ============= [PASSED] drm_format_helper_test ==============
[11:55:17] ================= drm_format (18 subtests) =================
[11:55:17] [PASSED] drm_test_format_block_width_invalid
[11:55:17] [PASSED] drm_test_format_block_width_one_plane
[11:55:17] [PASSED] drm_test_format_block_width_two_plane
[11:55:17] [PASSED] drm_test_format_block_width_three_plane
[11:55:17] [PASSED] drm_test_format_block_width_tiled
[11:55:17] [PASSED] drm_test_format_block_height_invalid
[11:55:17] [PASSED] drm_test_format_block_height_one_plane
[11:55:17] [PASSED] drm_test_format_block_height_two_plane
[11:55:17] [PASSED] drm_test_format_block_height_three_plane
[11:55:17] [PASSED] drm_test_format_block_height_tiled
[11:55:17] [PASSED] drm_test_format_min_pitch_invalid
[11:55:17] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[11:55:17] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[11:55:17] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[11:55:17] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[11:55:17] [PASSED] drm_test_format_min_pitch_two_plane
[11:55:17] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[11:55:17] [PASSED] drm_test_format_min_pitch_tiled
[11:55:17] =================== [PASSED] drm_format ====================
[11:55:17] ============== drm_framebuffer (10 subtests) ===============
[11:55:17] ========== drm_test_framebuffer_check_src_coords  ==========
[11:55:17] [PASSED] Success: source fits into fb
[11:55:17] [PASSED] Fail: overflowing fb with x-axis coordinate
[11:55:17] [PASSED] Fail: overflowing fb with y-axis coordinate
[11:55:17] [PASSED] Fail: overflowing fb with source width
[11:55:17] [PASSED] Fail: overflowing fb with source height
[11:55:17] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[11:55:17] [PASSED] drm_test_framebuffer_cleanup
[11:55:17] =============== drm_test_framebuffer_create  ===============
[11:55:17] [PASSED] ABGR8888 normal sizes
[11:55:17] [PASSED] ABGR8888 max sizes
[11:55:17] [PASSED] ABGR8888 pitch greater than min required
[11:55:17] [PASSED] ABGR8888 pitch less than min required
[11:55:17] [PASSED] ABGR8888 Invalid width
[11:55:17] [PASSED] ABGR8888 Invalid buffer handle
[11:55:17] [PASSED] No pixel format
[11:55:17] [PASSED] ABGR8888 Width 0
[11:55:17] [PASSED] ABGR8888 Height 0
[11:55:17] [PASSED] ABGR8888 Out of bound height * pitch combination
[11:55:17] [PASSED] ABGR8888 Large buffer offset
[11:55:17] [PASSED] ABGR8888 Buffer offset for inexistent plane
[11:55:17] [PASSED] ABGR8888 Invalid flag
[11:55:17] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[11:55:17] [PASSED] ABGR8888 Valid buffer modifier
[11:55:17] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[11:55:17] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] NV12 Normal sizes
[11:55:17] [PASSED] NV12 Max sizes
[11:55:17] [PASSED] NV12 Invalid pitch
[11:55:17] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[11:55:17] [PASSED] NV12 different  modifier per-plane
[11:55:17] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[11:55:17] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] NV12 Modifier for inexistent plane
[11:55:17] [PASSED] NV12 Handle for inexistent plane
[11:55:17] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[11:55:17] [PASSED] YVU420 Normal sizes
[11:55:17] [PASSED] YVU420 Max sizes
[11:55:17] [PASSED] YVU420 Invalid pitch
[11:55:17] [PASSED] YVU420 Different pitches
[11:55:17] [PASSED] YVU420 Different buffer offsets/pitches
[11:55:17] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[11:55:17] [PASSED] YVU420 Valid modifier
[11:55:17] [PASSED] YVU420 Different modifiers per plane
[11:55:17] [PASSED] YVU420 Modifier for inexistent plane
[11:55:17] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[11:55:17] [PASSED] X0L2 Normal sizes
[11:55:17] [PASSED] X0L2 Max sizes
[11:55:17] [PASSED] X0L2 Invalid pitch
[11:55:17] [PASSED] X0L2 Pitch greater than minimum required
[11:55:17] [PASSED] X0L2 Handle for inexistent plane
[11:55:17] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[11:55:17] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[11:55:17] [PASSED] X0L2 Valid modifier
[11:55:17] [PASSED] X0L2 Modifier for inexistent plane
[11:55:17] =========== [PASSED] drm_test_framebuffer_create ===========
[11:55:17] [PASSED] drm_test_framebuffer_free
[11:55:17] [PASSED] drm_test_framebuffer_init
[11:55:17] [PASSED] drm_test_framebuffer_init_bad_format
[11:55:17] [PASSED] drm_test_framebuffer_init_dev_mismatch
[11:55:17] [PASSED] drm_test_framebuffer_lookup
[11:55:17] [PASSED] drm_test_framebuffer_lookup_inexistent
[11:55:17] [PASSED] drm_test_framebuffer_modifiers_not_supported
[11:55:17] ================= [PASSED] drm_framebuffer =================
[11:55:17] ================ drm_gem_shmem (8 subtests) ================
[11:55:17] [PASSED] drm_gem_shmem_test_obj_create
[11:55:17] [PASSED] drm_gem_shmem_test_obj_create_private
[11:55:17] [PASSED] drm_gem_shmem_test_pin_pages
[11:55:17] [PASSED] drm_gem_shmem_test_vmap
[11:55:17] [PASSED] drm_gem_shmem_test_get_pages_sgt
[11:55:17] [PASSED] drm_gem_shmem_test_get_sg_table
[11:55:17] [PASSED] drm_gem_shmem_test_madvise
[11:55:17] [PASSED] drm_gem_shmem_test_purge
[11:55:17] ================== [PASSED] drm_gem_shmem ==================
[11:55:17] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[11:55:17] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[11:55:17] [PASSED] Automatic
[11:55:17] [PASSED] Full
[11:55:17] [PASSED] Limited 16:235
[11:55:17] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[11:55:17] [PASSED] drm_test_check_disable_connector
[11:55:17] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[11:55:17] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[11:55:17] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[11:55:17] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[11:55:17] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[11:55:17] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[11:55:17] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[11:55:17] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[11:55:17] [PASSED] drm_test_check_output_bpc_dvi
[11:55:17] [PASSED] drm_test_check_output_bpc_format_vic_1
[11:55:17] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[11:55:17] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[11:55:17] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[11:55:17] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[11:55:17] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[11:55:17] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[11:55:17] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[11:55:17] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[11:55:17] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[11:55:17] [PASSED] drm_test_check_broadcast_rgb_value
[11:55:17] [PASSED] drm_test_check_bpc_8_value
[11:55:17] [PASSED] drm_test_check_bpc_10_value
[11:55:17] [PASSED] drm_test_check_bpc_12_value
[11:55:17] [PASSED] drm_test_check_format_value
[11:55:17] [PASSED] drm_test_check_tmds_char_value
[11:55:17] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[11:55:17] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[11:55:17] [PASSED] drm_test_check_mode_valid
[11:55:17] [PASSED] drm_test_check_mode_valid_reject
[11:55:17] [PASSED] drm_test_check_mode_valid_reject_rate
[11:55:17] [PASSED] drm_test_check_mode_valid_reject_max_clock
[11:55:17] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[11:55:17] ================= drm_managed (2 subtests) =================
[11:55:17] [PASSED] drm_test_managed_release_action
[11:55:17] [PASSED] drm_test_managed_run_action
[11:55:17] =================== [PASSED] drm_managed ===================
[11:55:17] =================== drm_mm (6 subtests) ====================
[11:55:17] [PASSED] drm_test_mm_init
[11:55:17] [PASSED] drm_test_mm_debug
[11:55:17] [PASSED] drm_test_mm_align32
[11:55:17] [PASSED] drm_test_mm_align64
[11:55:17] [PASSED] drm_test_mm_lowest
[11:55:17] [PASSED] drm_test_mm_highest
[11:55:17] ===================== [PASSED] drm_mm ======================
[11:55:17] ============= drm_modes_analog_tv (5 subtests) =============
[11:55:17] [PASSED] drm_test_modes_analog_tv_mono_576i
[11:55:17] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[11:55:17] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[11:55:17] [PASSED] drm_test_modes_analog_tv_pal_576i
[11:55:17] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[11:55:17] =============== [PASSED] drm_modes_analog_tv ===============
[11:55:17] ============== drm_plane_helper (2 subtests) ===============
[11:55:17] =============== drm_test_check_plane_state  ================
[11:55:17] [PASSED] clipping_simple
[11:55:17] [PASSED] clipping_rotate_reflect
[11:55:17] [PASSED] positioning_simple
[11:55:17] [PASSED] upscaling
[11:55:17] [PASSED] downscaling
[11:55:17] [PASSED] rounding1
[11:55:17] [PASSED] rounding2
[11:55:17] [PASSED] rounding3
[11:55:17] [PASSED] rounding4
[11:55:17] =========== [PASSED] drm_test_check_plane_state ============
[11:55:17] =========== drm_test_check_invalid_plane_state  ============
[11:55:17] [PASSED] positioning_invalid
[11:55:17] [PASSED] upscaling_invalid
[11:55:17] [PASSED] downscaling_invalid
[11:55:17] ======= [PASSED] drm_test_check_invalid_plane_state ========
[11:55:17] ================ [PASSED] drm_plane_helper =================
[11:55:17] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[11:55:17] ====== drm_test_connector_helper_tv_get_modes_check  =======
[11:55:17] [PASSED] None
[11:55:17] [PASSED] PAL
[11:55:17] [PASSED] NTSC
[11:55:17] [PASSED] Both, NTSC Default
[11:55:17] [PASSED] Both, PAL Default
[11:55:17] [PASSED] Both, NTSC Default, with PAL on command-line
[11:55:17] [PASSED] Both, PAL Default, with NTSC on command-line
[11:55:17] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[11:55:17] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[11:55:17] ================== drm_rect (9 subtests) ===================
[11:55:17] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[11:55:17] [PASSED] drm_test_rect_clip_scaled_not_clipped
[11:55:17] [PASSED] drm_test_rect_clip_scaled_clipped
[11:55:17] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[11:55:17] ================= drm_test_rect_intersect  =================
[11:55:17] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[11:55:17] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[11:55:17] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[11:55:17] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[11:55:17] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[11:55:17] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[11:55:17] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[11:55:17] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[11:55:17] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[11:55:17] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[11:55:17] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[11:55:17] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[11:55:17] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[11:55:17] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[11:55:17] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[11:55:17] ============= [PASSED] drm_test_rect_intersect =============
[11:55:17] ================ drm_test_rect_calc_hscale  ================
[11:55:17] [PASSED] normal use
[11:55:17] [PASSED] out of max range
[11:55:17] [PASSED] out of min range
[11:55:17] [PASSED] zero dst
[11:55:17] [PASSED] negative src
[11:55:17] [PASSED] negative dst
[11:55:17] ============ [PASSED] drm_test_rect_calc_hscale ============
[11:55:17] ================ drm_test_rect_calc_vscale  ================
[11:55:17] [PASSED] normal use
[11:55:17] [PASSED] out of max range
[11:55:17] [PASSED] out of min range
[11:55:17] [PASSED] zero dst
[11:55:17] [PASSED] negative src
[11:55:17] [PASSED] negative dst
[11:55:17] ============ [PASSED] drm_test_rect_calc_vscale ============
[11:55:17] ================== drm_test_rect_rotate  ===================
[11:55:17] [PASSED] reflect-x
[11:55:17] [PASSED] reflect-y
[11:55:17] [PASSED] rotate-0
[11:55:17] [PASSED] rotate-90
[11:55:17] [PASSED] rotate-180
[11:55:17] [PASSED] rotate-270
[11:55:17] ============== [PASSED] drm_test_rect_rotate ===============
[11:55:17] ================ drm_test_rect_rotate_inv  =================
[11:55:17] [PASSED] reflect-x
[11:55:17] [PASSED] reflect-y
[11:55:17] [PASSED] rotate-0
[11:55:17] [PASSED] rotate-90
[11:55:17] [PASSED] rotate-180
[11:55:17] [PASSED] rotate-270
[11:55:17] ============ [PASSED] drm_test_rect_rotate_inv =============
[11:55:17] ==================== [PASSED] drm_rect =====================
[11:55:17] ============ drm_sysfb_modeset_test (1 subtest) ============
[11:55:17] ============ drm_test_sysfb_build_fourcc_list  =============
[11:55:17] [PASSED] no native formats
[11:55:17] [PASSED] XRGB8888 as native format
[11:55:17] [PASSED] remove duplicates
[11:55:17] [PASSED] convert alpha formats
[11:55:17] [PASSED] random formats
[11:55:17] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[11:55:17] ============= [PASSED] drm_sysfb_modeset_test ==============
[11:55:17] ============================================================
[11:55:17] Testing complete. Ran 616 tests: passed: 616
[11:55:17] Elapsed time: 25.094s total, 1.736s configuring, 23.137s building, 0.193s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[11:55:17] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[11:55:19] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[11:55:27] Starting KUnit Kernel (1/1)...
[11:55:27] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:55:27] ================= ttm_device (5 subtests) ==================
[11:55:27] [PASSED] ttm_device_init_basic
[11:55:27] [PASSED] ttm_device_init_multiple
[11:55:27] [PASSED] ttm_device_fini_basic
[11:55:27] [PASSED] ttm_device_init_no_vma_man
[11:55:27] ================== ttm_device_init_pools  ==================
[11:55:27] [PASSED] No DMA allocations, no DMA32 required
[11:55:27] [PASSED] DMA allocations, DMA32 required
[11:55:27] [PASSED] No DMA allocations, DMA32 required
[11:55:27] [PASSED] DMA allocations, no DMA32 required
[11:55:27] ============== [PASSED] ttm_device_init_pools ==============
[11:55:27] =================== [PASSED] ttm_device ====================
[11:55:27] ================== ttm_pool (8 subtests) ===================
[11:55:27] ================== ttm_pool_alloc_basic  ===================
[11:55:27] [PASSED] One page
[11:55:27] [PASSED] More than one page
[11:55:27] [PASSED] Above the allocation limit
[11:55:27] [PASSED] One page, with coherent DMA mappings enabled
[11:55:27] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[11:55:27] ============== [PASSED] ttm_pool_alloc_basic ===============
[11:55:27] ============== ttm_pool_alloc_basic_dma_addr  ==============
[11:55:27] [PASSED] One page
[11:55:27] [PASSED] More than one page
[11:55:27] [PASSED] Above the allocation limit
[11:55:27] [PASSED] One page, with coherent DMA mappings enabled
[11:55:27] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[11:55:27] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[11:55:27] [PASSED] ttm_pool_alloc_order_caching_match
[11:55:27] [PASSED] ttm_pool_alloc_caching_mismatch
[11:55:27] [PASSED] ttm_pool_alloc_order_mismatch
[11:55:27] [PASSED] ttm_pool_free_dma_alloc
[11:55:27] [PASSED] ttm_pool_free_no_dma_alloc
[11:55:27] [PASSED] ttm_pool_fini_basic
[11:55:27] ==================== [PASSED] ttm_pool =====================
[11:55:27] ================ ttm_resource (8 subtests) =================
[11:55:27] ================= ttm_resource_init_basic  =================
[11:55:27] [PASSED] Init resource in TTM_PL_SYSTEM
[11:55:27] [PASSED] Init resource in TTM_PL_VRAM
[11:55:27] [PASSED] Init resource in a private placement
[11:55:27] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[11:55:27] ============= [PASSED] ttm_resource_init_basic =============
[11:55:27] [PASSED] ttm_resource_init_pinned
[11:55:27] [PASSED] ttm_resource_fini_basic
[11:55:27] [PASSED] ttm_resource_manager_init_basic
[11:55:27] [PASSED] ttm_resource_manager_usage_basic
[11:55:27] [PASSED] ttm_resource_manager_set_used_basic
[11:55:27] [PASSED] ttm_sys_man_alloc_basic
[11:55:27] [PASSED] ttm_sys_man_free_basic
[11:55:27] ================== [PASSED] ttm_resource ===================
[11:55:27] =================== ttm_tt (15 subtests) ===================
[11:55:27] ==================== ttm_tt_init_basic  ====================
[11:55:27] [PASSED] Page-aligned size
[11:55:27] [PASSED] Extra pages requested
[11:55:27] ================ [PASSED] ttm_tt_init_basic ================
[11:55:27] [PASSED] ttm_tt_init_misaligned
[11:55:27] [PASSED] ttm_tt_fini_basic
[11:55:27] [PASSED] ttm_tt_fini_sg
[11:55:27] [PASSED] ttm_tt_fini_shmem
[11:55:27] [PASSED] ttm_tt_create_basic
[11:55:27] [PASSED] ttm_tt_create_invalid_bo_type
[11:55:27] [PASSED] ttm_tt_create_ttm_exists
[11:55:27] [PASSED] ttm_tt_create_failed
[11:55:27] [PASSED] ttm_tt_destroy_basic
[11:55:27] [PASSED] ttm_tt_populate_null_ttm
[11:55:27] [PASSED] ttm_tt_populate_populated_ttm
[11:55:27] [PASSED] ttm_tt_unpopulate_basic
[11:55:27] [PASSED] ttm_tt_unpopulate_empty_ttm
[11:55:27] [PASSED] ttm_tt_swapin_basic
[11:55:27] ===================== [PASSED] ttm_tt ======================
[11:55:27] =================== ttm_bo (14 subtests) ===================
[11:55:27] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[11:55:27] [PASSED] Cannot be interrupted and sleeps
[11:55:27] [PASSED] Cannot be interrupted, locks straight away
[11:55:27] [PASSED] Can be interrupted, sleeps
[11:55:27] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[11:55:27] [PASSED] ttm_bo_reserve_locked_no_sleep
[11:55:27] [PASSED] ttm_bo_reserve_no_wait_ticket
[11:55:27] [PASSED] ttm_bo_reserve_double_resv
[11:55:27] [PASSED] ttm_bo_reserve_interrupted
[11:55:27] [PASSED] ttm_bo_reserve_deadlock
[11:55:27] [PASSED] ttm_bo_unreserve_basic
[11:55:27] [PASSED] ttm_bo_unreserve_pinned
[11:55:27] [PASSED] ttm_bo_unreserve_bulk
[11:55:27] [PASSED] ttm_bo_put_basic
[11:55:27] [PASSED] ttm_bo_put_shared_resv
[11:55:27] [PASSED] ttm_bo_pin_basic
[11:55:27] [PASSED] ttm_bo_pin_unpin_resource
[11:55:27] [PASSED] ttm_bo_multiple_pin_one_unpin
[11:55:27] ===================== [PASSED] ttm_bo ======================
[11:55:27] ============== ttm_bo_validate (21 subtests) ===============
[11:55:27] ============== ttm_bo_init_reserved_sys_man  ===============
[11:55:27] [PASSED] Buffer object for userspace
[11:55:27] [PASSED] Kernel buffer object
[11:55:27] [PASSED] Shared buffer object
[11:55:27] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[11:55:27] ============== ttm_bo_init_reserved_mock_man  ==============
[11:55:27] [PASSED] Buffer object for userspace
[11:55:27] [PASSED] Kernel buffer object
[11:55:27] [PASSED] Shared buffer object
[11:55:27] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[11:55:27] [PASSED] ttm_bo_init_reserved_resv
[11:55:27] ================== ttm_bo_validate_basic  ==================
[11:55:27] [PASSED] Buffer object for userspace
[11:55:27] [PASSED] Kernel buffer object
[11:55:27] [PASSED] Shared buffer object
[11:55:27] ============== [PASSED] ttm_bo_validate_basic ==============
[11:55:27] [PASSED] ttm_bo_validate_invalid_placement
[11:55:27] ============= ttm_bo_validate_same_placement  ==============
[11:55:27] [PASSED] System manager
[11:55:27] [PASSED] VRAM manager
[11:55:27] ========= [PASSED] ttm_bo_validate_same_placement ==========
[11:55:27] [PASSED] ttm_bo_validate_failed_alloc
[11:55:27] [PASSED] ttm_bo_validate_pinned
[11:55:27] [PASSED] ttm_bo_validate_busy_placement
[11:55:27] ================ ttm_bo_validate_multihop  =================
[11:55:27] [PASSED] Buffer object for userspace
[11:55:27] [PASSED] Kernel buffer object
[11:55:27] [PASSED] Shared buffer object
[11:55:27] ============ [PASSED] ttm_bo_validate_multihop =============
[11:55:27] ========== ttm_bo_validate_no_placement_signaled  ==========
[11:55:27] [PASSED] Buffer object in system domain, no page vector
[11:55:27] [PASSED] Buffer object in system domain with an existing page vector
[11:55:27] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[11:55:27] ======== ttm_bo_validate_no_placement_not_signaled  ========
[11:55:27] [PASSED] Buffer object for userspace
[11:55:27] [PASSED] Kernel buffer object
[11:55:27] [PASSED] Shared buffer object
[11:55:27] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[11:55:27] [PASSED] ttm_bo_validate_move_fence_signaled
[11:55:27] ========= ttm_bo_validate_move_fence_not_signaled  =========
[11:55:27] [PASSED] Waits for GPU
[11:55:27] [PASSED] Tries to lock straight away
[11:55:27] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[11:55:27] [PASSED] ttm_bo_validate_happy_evict
[11:55:27] [PASSED] ttm_bo_validate_all_pinned_evict
[11:55:27] [PASSED] ttm_bo_validate_allowed_only_evict
[11:55:27] [PASSED] ttm_bo_validate_deleted_evict
[11:55:27] [PASSED] ttm_bo_validate_busy_domain_evict
[11:55:27] [PASSED] ttm_bo_validate_evict_gutting
[11:55:27] [PASSED] ttm_bo_validate_recrusive_evict
[11:55:27] ================= [PASSED] ttm_bo_validate =================
[11:55:27] ============================================================
[11:55:27] Testing complete. Ran 101 tests: passed: 101
[11:55:27] Elapsed time: 9.690s total, 1.701s configuring, 7.773s building, 0.187s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
@ 2025-08-13 12:13   ` Matthew Auld
  2025-08-13 12:30     ` Thomas Hellström
  2025-08-14 20:30   ` Matthew Brost
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Auld @ 2025-08-13 12:13 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Matthew Brost, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst

Hi,

On 13/08/2025 11:51, Thomas Hellström wrote:
> Pinned suspend eviction and preparation for eviction validate
> system memory for eviction buffers. Do that under a
> validation exclusive lock to avoid interfering with other
> processes validating system graphics memory.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++------------------
>   1 file changed, 108 insertions(+), 97 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 82bf158426ad..efb9c88b6aa7 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>   int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>   {
>   	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>   	struct xe_bo *backup;
>   	int ret = 0;
>   
> -	xe_bo_lock(bo, false);
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {

Ah, this reminded me of 
https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288

Could this help with that? If you could maybe keep the exclusive mode 
turned on for the entire prepare/unprepare stage, to ensure other 
execs/validates back off until we are done?
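Something like the below is what I had in mind (just a rough sketch on
top of your series; the lock helpers and the pinned-bo iterator are
made-up names, not anything in the patches):

static int xe_bo_notifier_prepare_all_pinned(struct xe_device *xe)
{
	struct xe_bo *bo;
	int ret = 0;

	/*
	 * Hold the exclusive validation lock across the whole stage so
	 * other execs/validates back off until we are done, instead of
	 * re-negotiating it for every single pinned bo.
	 */
	xe_validation_lock_exclusive(&xe->val);		/* hypothetical */
	for_each_pinned_bo(xe, bo) {			/* hypothetical */
		ret = xe_bo_notifier_prepare_pinned(bo);
		if (ret)
			break;
	}
	xe_validation_unlock_exclusive(&xe->val);	/* hypothetical */

	return ret;
}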

> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
> +		xe_assert(xe, !bo->backup_obj);
>   
> -	xe_assert(xe, !bo->backup_obj);
> +		/*
> +		 * Since this is called from the PM notifier we might have raced with
> +		 * someone unpinning this after we dropped the pinned list lock and
> +		 * grabbing the above bo lock.
> +		 */
> +		if (!xe_bo_is_pinned(bo))
> +			break;
>   
> -	/*
> -	 * Since this is called from the PM notifier we might have raced with
> -	 * someone unpinning this after we dropped the pinned list lock and
> -	 * grabbing the above bo lock.
> -	 */
> -	if (!xe_bo_is_pinned(bo))
> -		goto out_unlock_bo;
> +		if (!xe_bo_is_vram(bo))
> +			break;
>   
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
>   
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> +		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> +					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +					   XE_BO_FLAG_PINNED, &exec);
> +		if (IS_ERR(backup)) {
> +			drm_exec_retry_on_contention(&exec);
> +			ret = PTR_ERR(backup);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			break;
> +		}
>   
> -	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				   XE_BO_FLAG_PINNED, exec);
> -	if (IS_ERR(backup)) {
> -		ret = PTR_ERR(backup);
> -		goto out_unlock_bo;
> +		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +		ttm_bo_pin(&backup->ttm);
> +		bo->backup_obj = backup;
>   	}
>   
> -	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -	ttm_bo_pin(&backup->ttm);
> -	bo->backup_obj = backup;
> -
> -out_unlock_bo:
> -	xe_bo_unlock(bo);
>   	return ret;
>   }
>   
> @@ -1215,99 +1219,106 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
>   int xe_bo_evict_pinned(struct xe_bo *bo)
>   {
>   	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>   	struct xe_bo *backup = bo->backup_obj;
>   	bool backup_created = false;
>   	bool unmap = false;
>   	int ret = 0;
>   
> -	xe_bo_lock(bo, false);
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
>   
> -	if (WARN_ON(!bo->ttm.resource)) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> +		if (WARN_ON(!bo->ttm.resource)) {
> +			ret = -EINVAL;
> +			break;
> +		}
>   
> -	if (WARN_ON(!xe_bo_is_pinned(bo))) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> +		if (WARN_ON(!xe_bo_is_pinned(bo))) {
> +			ret = -EINVAL;
> +			break;
> +		}
>   
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> +		if (!xe_bo_is_vram(bo))
> +			break;
>   
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
>   
> -	if (!backup) {
> -		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					   XE_BO_FLAG_PINNED, exec);
> -		if (IS_ERR(backup)) {
> -			ret = PTR_ERR(backup);
> -			goto out_unlock_bo;
> +		if (!backup) {
> +			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
> +						   xe_bo_size(bo),
> +						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +						   XE_BO_FLAG_PINNED, &exec);
> +			if (IS_ERR(backup)) {
> +				drm_exec_retry_on_contention(&exec);
> +				ret = PTR_ERR(backup);
> +				xe_validation_retry_on_oom(&ctx, &ret);
> +				break;
> +			}
> +			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +			backup_created = true;
>   		}
> -		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -		backup_created = true;
> -	}
>   
> -	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> -		struct xe_migrate *migrate;
> -		struct dma_fence *fence;
> -
> -		if (bo->tile)
> -			migrate = bo->tile->migrate;
> -		else
> -			migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> +		if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> +			struct xe_migrate *migrate;
> +			struct dma_fence *fence;
>   
> -		ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> -		if (ret)
> -			goto out_backup;
> +			if (bo->tile)
> +				migrate = bo->tile->migrate;
> +			else
> +				migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
>   
> -		ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> -		if (ret)
> -			goto out_backup;
> +			ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> +			if (ret)
> +				goto out_backup;
>   
> -		fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> -					backup->ttm.resource, false);
> -		if (IS_ERR(fence)) {
> -			ret = PTR_ERR(fence);
> -			goto out_backup;
> -		}
> +			ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> +			if (ret)
> +				goto out_backup;
>   
> -		dma_resv_add_fence(bo->ttm.base.resv, fence,
> -				   DMA_RESV_USAGE_KERNEL);
> -		dma_resv_add_fence(backup->ttm.base.resv, fence,
> -				   DMA_RESV_USAGE_KERNEL);
> -		dma_fence_put(fence);
> -	} else {
> -		ret = xe_bo_vmap(backup);
> -		if (ret)
> -			goto out_backup;
> +			fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> +						backup->ttm.resource, false);
> +			if (IS_ERR(fence)) {
> +				ret = PTR_ERR(fence);
> +				goto out_backup;
> +			}
>   
> -		if (iosys_map_is_null(&bo->vmap)) {
> -			ret = xe_bo_vmap(bo);
> +			dma_resv_add_fence(bo->ttm.base.resv, fence,
> +					   DMA_RESV_USAGE_KERNEL);
> +			dma_resv_add_fence(backup->ttm.base.resv, fence,
> +					   DMA_RESV_USAGE_KERNEL);
> +			dma_fence_put(fence);
> +		} else {
> +			ret = xe_bo_vmap(backup);
>   			if (ret)
>   				goto out_backup;
> -			unmap = true;
> -		}
>   
> -		xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> -				   xe_bo_size(bo));
> -	}
> +			if (iosys_map_is_null(&bo->vmap)) {
> +				ret = xe_bo_vmap(bo);
> +				if (ret)
> +					goto out_vunmap;
> +				unmap = true;
> +			}
>   
> -	if (!bo->backup_obj)
> -		bo->backup_obj = backup;
> +			xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> +					   xe_bo_size(bo));
> +		}
>   
> +		if (!bo->backup_obj)
> +			bo->backup_obj = backup;
> +out_vunmap:
> +		xe_bo_vunmap(backup);
>   out_backup:
> -	xe_bo_vunmap(backup);
> -	if (ret && backup_created)
> -		xe_bo_put(backup);
> -out_unlock_bo:
> -	if (unmap)
> -		xe_bo_vunmap(bo);
> -	xe_bo_unlock(bo);
> +		if (ret && backup_created)
> +			xe_bo_put(backup);
> +		if (unmap)
> +			xe_bo_vunmap(bo);
> +	}
> +
>   	return ret;
>   }
>   


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-13 12:13   ` Matthew Auld
@ 2025-08-13 12:30     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 12:30 UTC (permalink / raw)
  To: Matthew Auld, intel-xe
  Cc: Matthew Brost, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst

On Wed, 2025-08-13 at 13:13 +0100, Matthew Auld wrote:
> Hi,
> 
> On 13/08/2025 11:51, Thomas Hellström wrote:
> > Pinned suspend eviction and preparation for eviction validate
> > system memory for eviction buffers. Do that under a
> > validation exclusive lock to avoid interfering with other
> > processes validating system graphics memory.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++-------------
> > -----
> >   1 file changed, 108 insertions(+), 97 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 82bf158426ad..efb9c88b6aa7 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx
> > *ctx, struct ttm_buffer_object *bo,
> >   int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> >   {
> >   	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >   	struct xe_bo *backup;
> >   	int ret = 0;
> >   
> > -	xe_bo_lock(bo, false);
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> 
> Ah, this reminded me of 
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
> 
> Could this help with that? If you could maybe keep the exclusive mode 
> turned on for the entire prepare/unprepare stage, to ensure other 
> execs/validates back off until we are done?


I have a fix in the pipe for that, but I reverted it just before
sending, since the whole suspend / resume stuff needs some interface
additions and I'm still seeing some false lockdep errors.

Basically xe_validation_guard is set to block on an interruptible
struct completion if we're suspending. ATM we can only do this for
freezable tasks that allow interruptible waits; otherwise those tasks
would block on the completion until freezing time and then never
freeze.

For freezable tasks, the wait is instead interrupted when freezing,
-ERESTARTSYS is returned, and the task is frozen in the signal
delivery code. This part I have verified to work.
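
The guard entry then looks roughly like this (sketch only; the struct,
field and helper names are made up and not what I have queued):

static int xe_validation_begin(struct xe_validation_device *val)
{
	int ret;

	/*
	 * The completion is completed whenever we're not suspending.
	 * Waiting interruptibly lets the freezer kick us out with
	 * -ERESTARTSYS, so the task freezes in the signal delivery
	 * code and the call is restarted after thaw.
	 */
	ret = wait_for_completion_interruptible(&val->not_suspending);
	if (ret)
		return ret;

	/* Outer rwsem in shared mode; exclusive mode is taken on OOM. */
	return down_read_interruptible(&val->lock);
}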

This should take care of most validations during suspend, but not the
rebind worker. I figure we can handle that as well by using a
freezable workqueue: if the worker receives an -EINTR or -ERESTARTSYS
it doesn't error out but simply requeues itself, since newly added
work items aren't run until the wqs are thawed. This part I haven't
verified, though.
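
The worker side would be something along these lines (untested sketch,
hypothetical names for the workqueue and the rebind helper):

static void xe_vm_rebind_worker(struct work_struct *w)
{
	struct xe_vm *vm = container_of(w, struct xe_vm, rebind_work);
	int ret;

	ret = xe_vm_rebind_all(vm);	/* may be hit by the freezer */
	if (ret == -EINTR || ret == -ERESTARTSYS) {
		/*
		 * Freezing in progress. The workqueue is freezable, so
		 * the requeued item doesn't run until after thaw.
		 */
		queue_work(vm->xe->rebind_wq, w);
		return;
	}

	/* Normal completion / error handling goes here. */
}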

Finally this assumes that all validations under uninterruptible context
are too few to cause any problems during suspend.

/Thomas


^ permalink raw reply	[flat|nested] 66+ messages in thread

* ✗ Xe.CI.BAT: failure for Driver-managed exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (16 preceding siblings ...)
  2025-08-13 11:55 ` ✓ CI.KUnit: success " Patchwork
@ 2025-08-13 13:20 ` Patchwork
  2025-08-13 14:25 ` ✗ Xe.CI.Full: " Patchwork
  18 siblings, 0 replies; 66+ messages in thread
From: Patchwork @ 2025-08-13 13:20 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 2506 bytes --]

== Series Details ==

Series: Driver-managed exhaustive eviction
URL   : https://patchwork.freedesktop.org/series/152882/
State : failure

== Summary ==

CI Bug Log - changes from xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13_BAT -> xe-pw-152882v1_BAT
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-152882v1_BAT absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-152882v1_BAT, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 9)
------------------------------

  Missing    (2): bat-adlp-vm bat-ptl-vm 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-152882v1_BAT:

### IGT changes ###

#### Possible regressions ####

  * igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-all:
    - bat-bmg-2:          [PASS][1] -> [FAIL][2] +1 other test fail
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/bat-bmg-2/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-all.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/bat-bmg-2/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-all.html
    - bat-atsm-2:         [PASS][3] -> [FAIL][4] +1 other test fail
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/bat-atsm-2/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-all.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/bat-atsm-2/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-all.html

  * igt@xe_sriov_flr@flr-vf1-clear:
    - bat-bmg-1:          [PASS][5] -> [FAIL][6] +4 other tests fail
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/bat-bmg-1/igt@xe_sriov_flr@flr-vf1-clear.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/bat-bmg-1/igt@xe_sriov_flr@flr-vf1-clear.html

  


Build changes
-------------

  * Linux: xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13 -> xe-pw-152882v1

  IGT_8493: 8493
  xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13: 546fc742f08b8dbd3fa1486933c9b15085e11d13
  xe-pw-152882v1: 152882v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/index.html

[-- Attachment #2: Type: text/html, Size: 3140 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* ✗ Xe.CI.Full: failure for Driver-managed exhaustive eviction
  2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
                   ` (17 preceding siblings ...)
  2025-08-13 13:20 ` ✗ Xe.CI.BAT: failure " Patchwork
@ 2025-08-13 14:25 ` Patchwork
  18 siblings, 0 replies; 66+ messages in thread
From: Patchwork @ 2025-08-13 14:25 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 73228 bytes --]

== Series Details ==

Series: Driver-managed exhaustive eviction
URL   : https://patchwork.freedesktop.org/series/152882/
State : failure

== Summary ==

CI Bug Log - changes from xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13_FULL -> xe-pw-152882v1_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-152882v1_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-152882v1_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-152882v1_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_async_flips@async-flip-with-page-flip-events-tiled-atomic@pipe-d-hdmi-a-3-x:
    - shard-bmg:          [PASS][1] -> [DMESG-WARN][2] +6 other tests dmesg-warn
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_async_flips@async-flip-with-page-flip-events-tiled-atomic@pipe-d-hdmi-a-3-x.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_async_flips@async-flip-with-page-flip-events-tiled-atomic@pipe-d-hdmi-a-3-x.html

  * igt@kms_async_flips@crc-atomic@pipe-c-dp-2:
    - shard-bmg:          [PASS][3] -> [DMESG-FAIL][4] +3 other tests dmesg-fail
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_async_flips@crc-atomic@pipe-c-dp-2.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-3/igt@kms_async_flips@crc-atomic@pipe-c-dp-2.html

  * igt@kms_fbcon_fbt@fbc:
    - shard-adlp:         [PASS][5] -> [FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-2/igt@kms_fbcon_fbt@fbc.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-1/igt@kms_fbcon_fbt@fbc.html
    - shard-dg2-set2:     [PASS][7] -> [FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-433/igt@kms_fbcon_fbt@fbc.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-464/igt@kms_fbcon_fbt@fbc.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-blt:
    - shard-dg2-set2:     [PASS][9] -> [SKIP][10] +67 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-433/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-blt.html
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-464/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-1p-rte:
    - shard-adlp:         [PASS][11] -> [SKIP][12] +11 other tests skip
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-9/igt@kms_frontbuffer_tracking@fbc-1p-rte.html
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-8/igt@kms_frontbuffer_tracking@fbc-1p-rte.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][13] +2 other tests skip
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_frontbuffer_tracking@fbc-1p-rte.html

  * igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-7:
    - shard-bmg:          [PASS][14] -> [FAIL][15] +33 other tests fail
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-2/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-7.html
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-5/igt@sriov_basic@enable-vfs-autoprobe-off@numvfs-7.html

  * igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind-imm:
    - shard-lnl:          NOTRUN -> [FAIL][16]
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind-imm.html

  * igt@xe_sriov_flr@flr-each-isolation:
    - shard-bmg:          NOTRUN -> [FAIL][17]
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_sriov_flr@flr-each-isolation.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@kms_async_flips@async-flip-hang@pipe-d-dp-2}:
    - shard-bmg:          [PASS][18] -> [DMESG-WARN][19] +9 other tests dmesg-warn
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-2/igt@kms_async_flips@async-flip-hang@pipe-d-dp-2.html
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-5/igt@kms_async_flips@async-flip-hang@pipe-d-dp-2.html

  * {igt@xe_compute_preempt@compute-preempt-many-vram}:
    - shard-bmg:          [PASS][20] -> [INCOMPLETE][21] +1 other test incomplete
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@xe_compute_preempt@compute-preempt-many-vram.html
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-1/igt@xe_compute_preempt@compute-preempt-many-vram.html

  * {igt@xe_pmu@engine-activity-suspend}:
    - shard-dg2-set2:     [PASS][22] -> [INCOMPLETE][23]
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-434/igt@xe_pmu@engine-activity-suspend.html
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-434/igt@xe_pmu@engine-activity-suspend.html

  
Known issues
------------

  Here are the changes found in xe-pw-152882v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_async_flips@crc-atomic@pipe-d-hdmi-a-3:
    - shard-bmg:          [PASS][24] -> [FAIL][25] ([Intel XE#4677]) +1 other test fail
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_async_flips@crc-atomic@pipe-d-hdmi-a-3.html
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-3/igt@kms_async_flips@crc-atomic@pipe-d-hdmi-a-3.html

  * igt@kms_async_flips@crc@pipe-d-dp-2:
    - shard-bmg:          [PASS][26] -> [FAIL][27] ([Intel XE#3884]) +1 other test fail
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-5/igt@kms_async_flips@crc@pipe-d-dp-2.html
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-8/igt@kms_async_flips@crc@pipe-d-dp-2.html

  * igt@kms_async_flips@crc@pipe-d-hdmi-a-3:
    - shard-bmg:          [PASS][28] -> [DMESG-FAIL][29] ([Intel XE#4626]) +8 other tests dmesg-fail
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-5/igt@kms_async_flips@crc@pipe-d-hdmi-a-3.html
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-8/igt@kms_async_flips@crc@pipe-d-hdmi-a-3.html

  * igt@kms_atomic_transition@plane-all-modeset-transition:
    - shard-lnl:          NOTRUN -> [SKIP][30] ([Intel XE#3279])
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_atomic_transition@plane-all-modeset-transition.html

  * igt@kms_big_fb@4-tiled-addfb-size-offset-overflow:
    - shard-adlp:         NOTRUN -> [SKIP][31] ([Intel XE#607])
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_big_fb@4-tiled-addfb-size-offset-overflow.html

  * igt@kms_big_fb@y-tiled-16bpp-rotate-0:
    - shard-adlp:         [PASS][32] -> [DMESG-FAIL][33] ([Intel XE#4543]) +7 other tests dmesg-fail
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-3/igt@kms_big_fb@y-tiled-16bpp-rotate-0.html
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-3/igt@kms_big_fb@y-tiled-16bpp-rotate-0.html

  * igt@kms_big_fb@y-tiled-16bpp-rotate-180:
    - shard-lnl:          NOTRUN -> [SKIP][34] ([Intel XE#1124]) +2 other tests skip
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_big_fb@y-tiled-16bpp-rotate-180.html

  * igt@kms_big_fb@y-tiled-16bpp-rotate-270:
    - shard-adlp:         NOTRUN -> [SKIP][35] ([Intel XE#316]) +1 other test skip
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_big_fb@y-tiled-16bpp-rotate-270.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-dg2-set2:     NOTRUN -> [SKIP][36] ([Intel XE#1124])
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-8bpp-rotate-90:
    - shard-bmg:          NOTRUN -> [SKIP][37] ([Intel XE#1124]) +1 other test skip
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_big_fb@yf-tiled-8bpp-rotate-90.html

  * igt@kms_big_fb@yf-tiled-addfb-size-offset-overflow:
    - shard-dg2-set2:     NOTRUN -> [SKIP][38] ([Intel XE#607])
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_big_fb@yf-tiled-addfb-size-offset-overflow.html

  * igt@kms_big_fb@yf-tiled-addfb-size-overflow:
    - shard-lnl:          NOTRUN -> [SKIP][39] ([Intel XE#1428])
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_big_fb@yf-tiled-addfb-size-overflow.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip:
    - shard-adlp:         NOTRUN -> [SKIP][40] ([Intel XE#1124]) +1 other test skip
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip.html

  * igt@kms_bw@linear-tiling-2-displays-1920x1080p:
    - shard-dg2-set2:     NOTRUN -> [SKIP][41] ([Intel XE#367])
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_bw@linear-tiling-2-displays-1920x1080p.html

  * igt@kms_bw@linear-tiling-2-displays-2160x1440p:
    - shard-lnl:          NOTRUN -> [SKIP][42] ([Intel XE#367]) +1 other test skip
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_bw@linear-tiling-2-displays-2160x1440p.html

  * igt@kms_ccs@bad-aux-stride-4-tiled-mtl-rc-ccs@pipe-a-dp-2:
    - shard-dg2-set2:     NOTRUN -> [SKIP][43] ([Intel XE#787]) +146 other tests skip
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_ccs@bad-aux-stride-4-tiled-mtl-rc-ccs@pipe-a-dp-2.html

  * igt@kms_ccs@crc-primary-basic-4-tiled-lnl-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][44] ([Intel XE#2907])
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_ccs@crc-primary-basic-4-tiled-lnl-ccs.html

  * igt@kms_ccs@crc-primary-basic-yf-tiled-ccs@pipe-d-dp-2:
    - shard-dg2-set2:     NOTRUN -> [SKIP][45] ([Intel XE#455] / [Intel XE#787]) +24 other tests skip
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_ccs@crc-primary-basic-yf-tiled-ccs@pipe-d-dp-2.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:
    - shard-bmg:          [PASS][46] -> [INCOMPLETE][47] ([Intel XE#3862]) +1 other test incomplete
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-1/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-5/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs:
    - shard-dg2-set2:     [PASS][48] -> [INCOMPLETE][49] ([Intel XE#3862]) +1 other test incomplete
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-464/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs.html
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-433/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-lnl-ccs@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [SKIP][50] ([Intel XE#2652] / [Intel XE#787]) +3 other tests skip
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_ccs@crc-primary-suspend-4-tiled-lnl-ccs@pipe-a-dp-2.html

  * igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][51] ([Intel XE#2887]) +3 other tests skip
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc:
    - shard-lnl:          NOTRUN -> [SKIP][52] ([Intel XE#2887]) +5 other tests skip
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html

  * igt@kms_ccs@random-ccs-data-y-tiled-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][53] ([Intel XE#455] / [Intel XE#787]) +5 other tests skip
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_ccs@random-ccs-data-y-tiled-ccs.html

  * igt@kms_ccs@random-ccs-data-y-tiled-ccs@pipe-b-hdmi-a-1:
    - shard-adlp:         NOTRUN -> [SKIP][54] ([Intel XE#787]) +8 other tests skip
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_ccs@random-ccs-data-y-tiled-ccs@pipe-b-hdmi-a-1.html

  * igt@kms_cdclk@mode-transition@pipe-a-dp-2:
    - shard-dg2-set2:     NOTRUN -> [SKIP][55] ([Intel XE#4417]) +3 other tests skip
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_cdclk@mode-transition@pipe-a-dp-2.html

  * igt@kms_chamelium_color@ctm-max:
    - shard-lnl:          NOTRUN -> [SKIP][56] ([Intel XE#306])
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_chamelium_color@ctm-max.html

  * igt@kms_chamelium_edid@dp-mode-timings:
    - shard-adlp:         NOTRUN -> [SKIP][57] ([Intel XE#373])
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_chamelium_edid@dp-mode-timings.html

  * igt@kms_chamelium_frames@hdmi-aspect-ratio:
    - shard-dg2-set2:     NOTRUN -> [SKIP][58] ([Intel XE#373]) +3 other tests skip
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_chamelium_frames@hdmi-aspect-ratio.html

  * igt@kms_chamelium_hpd@dp-hpd-enable-disable-mode:
    - shard-bmg:          NOTRUN -> [SKIP][59] ([Intel XE#2252])
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_chamelium_hpd@dp-hpd-enable-disable-mode.html

  * igt@kms_chamelium_hpd@vga-hpd-with-enabled-mode:
    - shard-lnl:          NOTRUN -> [SKIP][60] ([Intel XE#373]) +2 other tests skip
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_chamelium_hpd@vga-hpd-with-enabled-mode.html

  * igt@kms_concurrent@multi-plane-atomic-lowres:
    - shard-bmg:          NOTRUN -> [ABORT][61] ([Intel XE#5826])
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_concurrent@multi-plane-atomic-lowres.html

  * igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [ABORT][62] ([Intel XE#5898])
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-dp-2.html

  * igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-hdmi-a-3:
    - shard-bmg:          NOTRUN -> [DMESG-WARN][63] ([Intel XE#5826])
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-hdmi-a-3.html

  * igt@kms_content_protection@content-type-change:
    - shard-lnl:          NOTRUN -> [SKIP][64] ([Intel XE#3278])
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_content_protection@content-type-change.html

  * igt@kms_content_protection@dp-mst-lic-type-0:
    - shard-lnl:          NOTRUN -> [SKIP][65] ([Intel XE#307])
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_content_protection@dp-mst-lic-type-0.html

  * igt@kms_content_protection@dp-mst-lic-type-1:
    - shard-dg2-set2:     NOTRUN -> [SKIP][66] ([Intel XE#307])
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_content_protection@dp-mst-lic-type-1.html

  * igt@kms_content_protection@lic-type-0@pipe-a-dp-4:
    - shard-dg2-set2:     NOTRUN -> [FAIL][67] ([Intel XE#3304])
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-434/igt@kms_content_protection@lic-type-0@pipe-a-dp-4.html

  * igt@kms_content_protection@uevent@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [FAIL][68] ([Intel XE#1188])
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_content_protection@uevent@pipe-a-dp-2.html

  * igt@kms_cursor_crc@cursor-onscreen-256x85:
    - shard-bmg:          NOTRUN -> [SKIP][69] ([Intel XE#2320]) +1 other test skip
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_cursor_crc@cursor-onscreen-256x85.html

  * igt@kms_cursor_crc@cursor-onscreen-32x32:
    - shard-adlp:         NOTRUN -> [SKIP][70] ([Intel XE#455])
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_cursor_crc@cursor-onscreen-32x32.html

  * igt@kms_cursor_crc@cursor-random-32x32:
    - shard-lnl:          NOTRUN -> [SKIP][71] ([Intel XE#1424])
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_cursor_crc@cursor-random-32x32.html

  * igt@kms_cursor_crc@cursor-sliding-512x512:
    - shard-lnl:          NOTRUN -> [SKIP][72] ([Intel XE#2321])
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_cursor_crc@cursor-sliding-512x512.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-atomic:
    - shard-adlp:         NOTRUN -> [SKIP][73] ([Intel XE#309])
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_cursor_legacy@cursora-vs-flipb-atomic.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:
    - shard-bmg:          [PASS][74] -> [SKIP][75] ([Intel XE#2291]) +1 other test skip
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
    - shard-lnl:          NOTRUN -> [SKIP][76] ([Intel XE#309]) +1 other test skip
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html

  * igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle:
    - shard-dg2-set2:     NOTRUN -> [SKIP][77] ([Intel XE#323])
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle.html

  * igt@kms_dirtyfb@drrs-dirtyfb-ioctl:
    - shard-lnl:          NOTRUN -> [SKIP][78] ([Intel XE#1508])
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_dirtyfb@drrs-dirtyfb-ioctl.html

  * igt@kms_dp_link_training@non-uhbr-sst:
    - shard-bmg:          [PASS][79] -> [SKIP][80] ([Intel XE#4354])
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_dp_link_training@non-uhbr-sst.html
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_dp_link_training@non-uhbr-sst.html

  * igt@kms_dsc@dsc-with-bpc:
    - shard-dg2-set2:     NOTRUN -> [SKIP][81] ([Intel XE#455]) +3 other tests skip
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_dsc@dsc-with-bpc.html
    - shard-lnl:          NOTRUN -> [SKIP][82] ([Intel XE#2244])
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@kms_dsc@dsc-with-bpc.html

  * igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible:
    - shard-bmg:          [PASS][83] -> [SKIP][84] ([Intel XE#2316]) +2 other tests skip
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-8/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible.html
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible.html

  * igt@kms_flip@2x-flip-vs-rmfb-interruptible:
    - shard-lnl:          NOTRUN -> [SKIP][85] ([Intel XE#1421]) +4 other tests skip
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip@2x-flip-vs-rmfb-interruptible.html

  * igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1:
    - shard-adlp:         [PASS][86] -> [DMESG-WARN][87] ([Intel XE#4543]) +13 other tests dmesg-warn
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-1/igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1.html
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-6/igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-lnl:          [PASS][88] -> [FAIL][89] ([Intel XE#301]) +1 other test fail
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-7/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_flip@flip-vs-expired-vblank@c-edp1:
    - shard-lnl:          [PASS][90] -> [FAIL][91] ([Intel XE#301] / [Intel XE#3149])
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html

  * igt@kms_flip@flip-vs-rmfb:
    - shard-adlp:         [PASS][92] -> [DMESG-WARN][93] ([Intel XE#5208])
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-3/igt@kms_flip@flip-vs-rmfb.html
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-3/igt@kms_flip@flip-vs-rmfb.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-bmg:          [PASS][94] -> [INCOMPLETE][95] ([Intel XE#2049] / [Intel XE#2597]) +1 other test incomplete
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-8/igt@kms_flip@flip-vs-suspend.html
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-1/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-downscaling:
    - shard-bmg:          NOTRUN -> [SKIP][96] ([Intel XE#2293] / [Intel XE#2380])
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-downscaling@pipe-a-valid-mode:
    - shard-bmg:          NOTRUN -> [SKIP][97] ([Intel XE#2293])
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-downscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-downscaling:
    - shard-lnl:          NOTRUN -> [SKIP][98] ([Intel XE#1397] / [Intel XE#1745])
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-downscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][99] ([Intel XE#1397])
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-downscaling:
    - shard-lnl:          NOTRUN -> [SKIP][100] ([Intel XE#1401] / [Intel XE#1745])
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-downscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][101] ([Intel XE#1401])
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-downscaling@pipe-a-default-mode.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt:
    - shard-bmg:          NOTRUN -> [SKIP][102] ([Intel XE#2311]) +4 other tests skip
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@drrs-rgb565-draw-render:
    - shard-adlp:         NOTRUN -> [SKIP][103] ([Intel XE#651]) +1 other test skip
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_frontbuffer_tracking@drrs-rgb565-draw-render.html

  * igt@kms_frontbuffer_tracking@drrs-slowdraw:
    - shard-lnl:          NOTRUN -> [SKIP][104] ([Intel XE#651]) +5 other tests skip
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_frontbuffer_tracking@drrs-slowdraw.html

  * igt@kms_frontbuffer_tracking@drrs-suspend:
    - shard-dg2-set2:     NOTRUN -> [SKIP][105] ([Intel XE#651]) +4 other tests skip
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_frontbuffer_tracking@drrs-suspend.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-fullscreen:
    - shard-adlp:         NOTRUN -> [SKIP][106] ([Intel XE#656]) +7 other tests skip
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-fullscreen.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-move:
    - shard-bmg:          NOTRUN -> [SKIP][107] ([Intel XE#5390]) +1 other test skip
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-move.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-pri-indfb-draw-render:
    - shard-lnl:          NOTRUN -> [SKIP][108] ([Intel XE#656]) +12 other tests skip
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-pri-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-plflip-blt:
    - shard-adlp:         NOTRUN -> [SKIP][109] ([Intel XE#653]) +3 other tests skip
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-4/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-onoff:
    - shard-dg2-set2:     NOTRUN -> [SKIP][110] ([Intel XE#653]) +6 other tests skip
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@psr-modesetfrombusy:
    - shard-bmg:          NOTRUN -> [SKIP][111] ([Intel XE#2313]) +6 other tests skip
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_frontbuffer_tracking@psr-modesetfrombusy.html

  * igt@kms_getfb@getfb-handle-closed:
    - shard-adlp:         [PASS][112] -> [DMESG-WARN][113] ([Intel XE#2953] / [Intel XE#4173])
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-1/igt@kms_getfb@getfb-handle-closed.html
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-6/igt@kms_getfb@getfb-handle-closed.html

  * igt@kms_hdr@brightness-with-hdr:
    - shard-lnl:          NOTRUN -> [SKIP][114] ([Intel XE#3374] / [Intel XE#3544])
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_hdr@brightness-with-hdr.html

  * igt@kms_hdr@invalid-metadata-sizes:
    - shard-bmg:          [PASS][115] -> [SKIP][116] ([Intel XE#1503])
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-8/igt@kms_hdr@invalid-metadata-sizes.html
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_hdr@invalid-metadata-sizes.html

  * igt@kms_joiner@basic-force-big-joiner:
    - shard-bmg:          [PASS][117] -> [SKIP][118] ([Intel XE#3012])
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-4/igt@kms_joiner@basic-force-big-joiner.html
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_joiner@basic-force-big-joiner.html

  * igt@kms_plane_multiple@tiling-x@pipe-b-edp-1:
    - shard-lnl:          NOTRUN -> [FAIL][119] ([Intel XE#4658]) +3 other tests fail
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_plane_multiple@tiling-x@pipe-b-edp-1.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-5:
    - shard-lnl:          NOTRUN -> [SKIP][120] ([Intel XE#2763]) +3 other tests skip
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_plane_scaling@planes-downscale-factor-0-5.html

  * igt@kms_pm_backlight@brightness-with-dpms:
    - shard-bmg:          NOTRUN -> [SKIP][121] ([Intel XE#2938])
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_pm_backlight@brightness-with-dpms.html

  * igt@kms_pm_rpm@modeset-non-lpsp:
    - shard-lnl:          NOTRUN -> [SKIP][122] ([Intel XE#1439] / [Intel XE#3141])
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@kms_pm_rpm@modeset-non-lpsp.html

  * igt@kms_pm_rpm@modeset-non-lpsp-stress:
    - shard-adlp:         NOTRUN -> [SKIP][123] ([Intel XE#836])
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_pm_rpm@modeset-non-lpsp-stress.html

  * igt@kms_psr2_sf@fbc-pr-cursor-plane-move-continuous-sf:
    - shard-bmg:          NOTRUN -> [SKIP][124] ([Intel XE#1489] / [Intel XE#5899]) +1 other test skip
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_psr2_sf@fbc-pr-cursor-plane-move-continuous-sf.html

  * igt@kms_psr2_sf@fbc-psr2-overlay-primary-update-sf-dmg-area:
    - shard-dg2-set2:     NOTRUN -> [SKIP][125] ([Intel XE#1489] / [Intel XE#5899]) +1 other test skip
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_psr2_sf@fbc-psr2-overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf:
    - shard-adlp:         NOTRUN -> [SKIP][126] ([Intel XE#1489] / [Intel XE#5899])
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf.html

  * igt@kms_psr@fbc-pr-cursor-blt:
    - shard-bmg:          NOTRUN -> [SKIP][127] ([Intel XE#2234] / [Intel XE#2850] / [Intel XE#5899]) +2 other tests skip
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_psr@fbc-pr-cursor-blt.html

  * igt@kms_psr@pr-dpms:
    - shard-lnl:          NOTRUN -> [SKIP][128] ([Intel XE#1406] / [Intel XE#5899]) +1 other test skip
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_psr@pr-dpms.html

  * igt@kms_psr@pr-sprite-render:
    - shard-adlp:         NOTRUN -> [SKIP][129] ([Intel XE#2850] / [Intel XE#5899] / [Intel XE#929]) +1 other test skip
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_psr@pr-sprite-render.html

  * igt@kms_psr@psr2-primary-render:
    - shard-dg2-set2:     NOTRUN -> [SKIP][130] ([Intel XE#2850] / [Intel XE#5899] / [Intel XE#929]) +2 other tests skip
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_psr@psr2-primary-render.html

  * igt@kms_rotation_crc@multiplane-rotation-cropping-top:
    - shard-adlp:         NOTRUN -> [FAIL][131] ([Intel XE#1874])
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@kms_rotation_crc@multiplane-rotation-cropping-top.html

  * igt@kms_rotation_crc@primary-x-tiled-reflect-x-0:
    - shard-lnl:          NOTRUN -> [FAIL][132] ([Intel XE#4689])
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_rotation_crc@primary-x-tiled-reflect-x-0.html

  * igt@kms_rotation_crc@primary-y-tiled-reflect-x-90:
    - shard-dg2-set2:     NOTRUN -> [SKIP][133] ([Intel XE#3414])
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_rotation_crc@primary-y-tiled-reflect-x-90.html

  * igt@kms_setmode@clone-exclusive-crtc:
    - shard-lnl:          NOTRUN -> [SKIP][134] ([Intel XE#1435])
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@kms_setmode@clone-exclusive-crtc.html

  * igt@kms_vrr@flip-dpms:
    - shard-bmg:          NOTRUN -> [SKIP][135] ([Intel XE#1499])
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@kms_vrr@flip-dpms.html

  * igt@xe_ccs@block-multicopy-inplace:
    - shard-adlp:         NOTRUN -> [SKIP][136] ([Intel XE#455] / [Intel XE#488] / [Intel XE#5607])
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_ccs@block-multicopy-inplace.html

  * igt@xe_create@multigpu-create-massive-size:
    - shard-dg2-set2:     NOTRUN -> [SKIP][137] ([Intel XE#944])
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_create@multigpu-create-massive-size.html

  * igt@xe_eudebug@basic-connect:
    - shard-lnl:          NOTRUN -> [SKIP][138] ([Intel XE#4837]) +3 other tests skip
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_eudebug@basic-connect.html

  * igt@xe_eudebug@basic-vm-bind-discovery:
    - shard-dg2-set2:     NOTRUN -> [SKIP][139] ([Intel XE#4837]) +2 other tests skip
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_eudebug@basic-vm-bind-discovery.html

  * igt@xe_eudebug_online@interrupt-all-set-breakpoint-faultable:
    - shard-adlp:         NOTRUN -> [SKIP][140] ([Intel XE#4837] / [Intel XE#5565])
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_eudebug_online@interrupt-all-set-breakpoint-faultable.html

  * igt@xe_eudebug_online@single-step:
    - shard-bmg:          NOTRUN -> [SKIP][141] ([Intel XE#4837]) +1 other test skip
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_eudebug_online@single-step.html

  * igt@xe_evict@evict-beng-large-multi-vm:
    - shard-lnl:          NOTRUN -> [SKIP][142] ([Intel XE#688])
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_evict@evict-beng-large-multi-vm.html

  * igt@xe_evict@evict-beng-small-external-cm:
    - shard-adlp:         NOTRUN -> [SKIP][143] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688]) +1 other test skip
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-4/igt@xe_evict@evict-beng-small-external-cm.html

  * igt@xe_evict@evict-large-external:
    - shard-adlp:         NOTRUN -> [SKIP][144] ([Intel XE#261] / [Intel XE#5564])
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-4/igt@xe_evict@evict-large-external.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-rebind:
    - shard-bmg:          NOTRUN -> [SKIP][145] ([Intel XE#2322]) +1 other test skip
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-rebind.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-userptr-invalidate-race:
    - shard-dg2-set2:     [PASS][146] -> [SKIP][147] ([Intel XE#1392]) +5 other tests skip
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-433/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-userptr-invalidate-race.html
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-userptr-invalidate-race.html

  * igt@xe_exec_basic@multigpu-no-exec-basic-defer-mmap:
    - shard-lnl:          NOTRUN -> [SKIP][148] ([Intel XE#1392]) +4 other tests skip
   [148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_exec_basic@multigpu-no-exec-basic-defer-mmap.html

  * igt@xe_exec_basic@multigpu-once-basic-defer-mmap:
    - shard-adlp:         NOTRUN -> [SKIP][149] ([Intel XE#1392] / [Intel XE#5575]) +1 other test skip
   [149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_exec_basic@multigpu-once-basic-defer-mmap.html

  * igt@xe_exec_fault_mode@invalid-va:
    - shard-dg2-set2:     NOTRUN -> [SKIP][150] ([Intel XE#288]) +4 other tests skip
   [150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_exec_fault_mode@invalid-va.html

  * igt@xe_exec_fault_mode@twice-userptr-rebind-prefetch:
    - shard-adlp:         NOTRUN -> [SKIP][151] ([Intel XE#288] / [Intel XE#5561]) +5 other tests skip
   [151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_exec_fault_mode@twice-userptr-rebind-prefetch.html

  * igt@xe_exec_reset@parallel-gt-reset:
    - shard-adlp:         [PASS][152] -> [DMESG-WARN][153] ([Intel XE#3876])
   [152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-2/igt@xe_exec_reset@parallel-gt-reset.html
   [153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-1/igt@xe_exec_reset@parallel-gt-reset.html

  * igt@xe_exec_system_allocator@many-large-mmap-free-race-nomemset:
    - shard-dg2-set2:     NOTRUN -> [SKIP][154] ([Intel XE#4915]) +50 other tests skip
   [154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_exec_system_allocator@many-large-mmap-free-race-nomemset.html

  * igt@xe_exec_system_allocator@once-malloc-bo-unmap:
    - shard-adlp:         NOTRUN -> [SKIP][155] ([Intel XE#4915]) +36 other tests skip
   [155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_exec_system_allocator@once-malloc-bo-unmap.html

  * igt@xe_exec_system_allocator@threads-many-execqueues-mmap-new-huge:
    - shard-bmg:          NOTRUN -> [SKIP][156] ([Intel XE#4943]) +5 other tests skip
   [156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_exec_system_allocator@threads-many-execqueues-mmap-new-huge.html

  * igt@xe_exec_system_allocator@threads-shared-vm-many-stride-mmap-free-huge-nomemset:
    - shard-lnl:          NOTRUN -> [SKIP][157] ([Intel XE#4943]) +9 other tests skip
   [157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_exec_system_allocator@threads-shared-vm-many-stride-mmap-free-huge-nomemset.html

  * igt@xe_exec_threads@threads-bal-shared-vm-userptr:
    - shard-adlp:         [PASS][158] -> [DMESG-FAIL][159] ([Intel XE#3876])
   [158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-2/igt@xe_exec_threads@threads-bal-shared-vm-userptr.html
   [159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-1/igt@xe_exec_threads@threads-bal-shared-vm-userptr.html

  * igt@xe_media_fill@media-fill:
    - shard-bmg:          NOTRUN -> [SKIP][160] ([Intel XE#2459] / [Intel XE#2596])
   [160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_media_fill@media-fill.html

  * igt@xe_mmap@vram:
    - shard-adlp:         NOTRUN -> [SKIP][161] ([Intel XE#1008] / [Intel XE#5591])
   [161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_mmap@vram.html

  * igt@xe_module_load@load:
    - shard-lnl:          ([PASS][162], [PASS][163], [PASS][164], [PASS][165], [PASS][166], [PASS][167], [PASS][168], [PASS][169], [PASS][170], [PASS][171], [PASS][172], [PASS][173], [PASS][174], [PASS][175], [PASS][176], [PASS][177], [PASS][178], [PASS][179], [PASS][180], [PASS][181], [PASS][182], [PASS][183], [PASS][184], [PASS][185], [PASS][186]) -> ([PASS][187], [PASS][188], [PASS][189], [PASS][190], [PASS][191], [PASS][192], [PASS][193], [PASS][194], [PASS][195], [PASS][196], [PASS][197], [PASS][198], [PASS][199], [PASS][200], [PASS][201], [PASS][202], [PASS][203], [PASS][204], [PASS][205], [PASS][206], [PASS][207], [PASS][208], [PASS][209], [PASS][210], [SKIP][211], [PASS][212]) ([Intel XE#378])
   [162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-2/igt@xe_module_load@load.html
   [163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-1/igt@xe_module_load@load.html
   [164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@xe_module_load@load.html
   [165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-1/igt@xe_module_load@load.html
   [166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-7/igt@xe_module_load@load.html
   [167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-7/igt@xe_module_load@load.html
   [168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-2/igt@xe_module_load@load.html
   [169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-1/igt@xe_module_load@load.html
   [170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-8/igt@xe_module_load@load.html
   [171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@xe_module_load@load.html
   [172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@xe_module_load@load.html
   [173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@xe_module_load@load.html
   [174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-5/igt@xe_module_load@load.html
   [175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-5/igt@xe_module_load@load.html
   [176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-1/igt@xe_module_load@load.html
   [177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-4/igt@xe_module_load@load.html
   [178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-2/igt@xe_module_load@load.html
   [179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-4/igt@xe_module_load@load.html
   [180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-4/igt@xe_module_load@load.html
   [181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-5/igt@xe_module_load@load.html
   [182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-7/igt@xe_module_load@load.html
   [183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-8/igt@xe_module_load@load.html
   [184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-8/igt@xe_module_load@load.html
   [185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-8/igt@xe_module_load@load.html
   [186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-2/igt@xe_module_load@load.html
   [187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-5/igt@xe_module_load@load.html
   [188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_module_load@load.html
   [189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_module_load@load.html
   [190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_module_load@load.html
   [191]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_module_load@load.html
   [192]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_module_load@load.html
   [193]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_module_load@load.html
   [194]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_module_load@load.html
   [195]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-3/igt@xe_module_load@load.html
   [196]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-3/igt@xe_module_load@load.html
   [197]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_module_load@load.html
   [198]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_module_load@load.html
   [199]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-3/igt@xe_module_load@load.html
   [200]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-3/igt@xe_module_load@load.html
   [201]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_module_load@load.html
   [202]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_module_load@load.html
   [203]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_module_load@load.html
   [204]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_module_load@load.html
   [205]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-5/igt@xe_module_load@load.html
   [206]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-7/igt@xe_module_load@load.html
   [207]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_module_load@load.html
   [208]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-7/igt@xe_module_load@load.html
   [209]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-7/igt@xe_module_load@load.html
   [210]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-5/igt@xe_module_load@load.html
   [211]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_module_load@load.html
   [212]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@xe_module_load@load.html

  * igt@xe_oa@mmio-triggered-reports-read:
    - shard-dg2-set2:     NOTRUN -> [SKIP][213] ([Intel XE#5103])
   [213]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@xe_oa@mmio-triggered-reports-read.html

  * igt@xe_oa@polling:
    - shard-adlp:         NOTRUN -> [SKIP][214] ([Intel XE#3573])
   [214]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_oa@polling.html

  * igt@xe_peer2peer@read:
    - shard-lnl:          NOTRUN -> [SKIP][215] ([Intel XE#1061])
   [215]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_peer2peer@read.html

  * igt@xe_peer2peer@read@read-gpua-vram01-gpub-system-p2p:
    - shard-dg2-set2:     NOTRUN -> [FAIL][216] ([Intel XE#1173]) +1 other test fail
   [216]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@xe_peer2peer@read@read-gpua-vram01-gpub-system-p2p.html

  * igt@xe_pm@d3cold-mmap-system:
    - shard-lnl:          NOTRUN -> [SKIP][217] ([Intel XE#2284] / [Intel XE#366])
   [217]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_pm@d3cold-mmap-system.html

  * igt@xe_pm@s2idle-d3cold-basic-exec:
    - shard-bmg:          NOTRUN -> [SKIP][218] ([Intel XE#2284])
   [218]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-4/igt@xe_pm@s2idle-d3cold-basic-exec.html

  * igt@xe_pm@s3-vm-bind-unbind-all:
    - shard-lnl:          NOTRUN -> [SKIP][219] ([Intel XE#584])
   [219]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@xe_pm@s3-vm-bind-unbind-all.html

  * igt@xe_query@multigpu-query-cs-cycles:
    - shard-lnl:          NOTRUN -> [SKIP][220] ([Intel XE#944]) +1 other test skip
   [220]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-1/igt@xe_query@multigpu-query-cs-cycles.html

  * igt@xe_query@multigpu-query-pxp-status:
    - shard-adlp:         NOTRUN -> [SKIP][221] ([Intel XE#944])
   [221]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-9/igt@xe_query@multigpu-query-pxp-status.html

  
#### Possible fixes ####

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip:
    - shard-adlp:         [DMESG-FAIL][222] ([Intel XE#4543]) -> [PASS][223]
   [222]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-3/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip.html
   [223]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-3/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip.html

  * igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p:
    - shard-bmg:          [SKIP][224] ([Intel XE#2314] / [Intel XE#2894]) -> [PASS][225]
   [224]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html
   [225]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs:
    - shard-dg2-set2:     [INCOMPLETE][226] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4345] / [Intel XE#4522]) -> [PASS][227]
   [226]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
   [227]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-432/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html

  * igt@kms_concurrent@multi-plane-atomic-lowres:
    - shard-dg2-set2:     [ABORT][228] ([Intel XE#5826]) -> [PASS][229]
   [228]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-436/igt@kms_concurrent@multi-plane-atomic-lowres.html
   [229]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_concurrent@multi-plane-atomic-lowres.html
    - shard-lnl:          [ABORT][230] ([Intel XE#5826]) -> [PASS][231] +1 other test pass
   [230]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-5/igt@kms_concurrent@multi-plane-atomic-lowres.html
   [231]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-4/igt@kms_concurrent@multi-plane-atomic-lowres.html

  * igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-hdmi-a-6:
    - shard-dg2-set2:     [ABORT][232] ([Intel XE#5826] / [Intel XE#5898]) -> [PASS][233]
   [232]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-436/igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-hdmi-a-6.html
   [233]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-435/igt@kms_concurrent@multi-plane-atomic-lowres@pipe-a-hdmi-a-6.html

  * igt@kms_cursor_crc@cursor-random-256x85:
    - shard-adlp:         [ABORT][234] ([Intel XE#5826]) -> [PASS][235] +1 other test pass
   [234]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-4/igt@kms_cursor_crc@cursor-random-256x85.html
   [235]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-4/igt@kms_cursor_crc@cursor-random-256x85.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
    - shard-bmg:          [SKIP][236] ([Intel XE#2291]) -> [PASS][237] +5 other tests pass
   [236]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html
   [237]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html

  * igt@kms_display_modes@extended-mode-basic:
    - shard-bmg:          [SKIP][238] ([Intel XE#4302]) -> [PASS][239]
   [238]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_display_modes@extended-mode-basic.html
   [239]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_display_modes@extended-mode-basic.html

  * igt@kms_flip@2x-plain-flip-fb-recreate:
    - shard-bmg:          [SKIP][240] ([Intel XE#2316]) -> [PASS][241] +7 other tests pass
   [240]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_flip@2x-plain-flip-fb-recreate.html
   [241]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_flip@2x-plain-flip-fb-recreate.html

  * igt@kms_flip@flip-vs-expired-vblank@b-edp1:
    - shard-lnl:          [FAIL][242] ([Intel XE#301]) -> [PASS][243]
   [242]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank@b-edp1.html
   [243]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip@flip-vs-expired-vblank@b-edp1.html

  * igt@kms_flip@plain-flip-interruptible@b-hdmi-a1:
    - shard-adlp:         [DMESG-WARN][244] ([Intel XE#4543]) -> [PASS][245] +7 other tests pass
   [244]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-3/igt@kms_flip@plain-flip-interruptible@b-hdmi-a1.html
   [245]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-3/igt@kms_flip@plain-flip-interruptible@b-hdmi-a1.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-hdmi-a-1:
    - shard-adlp:         [DMESG-WARN][246] ([Intel XE#2953] / [Intel XE#4173]) -> [PASS][247] +4 other tests pass
   [246]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-8/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-hdmi-a-1.html
   [247]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-2/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-hdmi-a-1.html

  * igt@xe_exec_basic@multigpu-no-exec-null-defer-bind:
    - shard-dg2-set2:     [SKIP][248] ([Intel XE#1392]) -> [PASS][249] +5 other tests pass
   [248]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-dg2-432/igt@xe_exec_basic@multigpu-no-exec-null-defer-bind.html
   [249]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-dg2-464/igt@xe_exec_basic@multigpu-no-exec-null-defer-bind.html

  
#### Warnings ####

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-adlp:         [DMESG-FAIL][250] ([Intel XE#4543]) -> [FAIL][251] ([Intel XE#1874])
   [250]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-9/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html
   [251]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-8/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_chamelium_edid@hdmi-edid-change-during-hibernate:
    - shard-lnl:          [ABORT][252] -> [SKIP][253] ([Intel XE#373])
   [252]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-1/igt@kms_chamelium_edid@hdmi-edid-change-during-hibernate.html
   [253]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_chamelium_edid@hdmi-edid-change-during-hibernate.html

  * igt@kms_content_protection@srm:
    - shard-bmg:          [FAIL][254] ([Intel XE#1178]) -> [SKIP][255] ([Intel XE#2341])
   [254]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-4/igt@kms_content_protection@srm.html
   [255]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_content_protection@srm.html

  * igt@kms_content_protection@uevent:
    - shard-bmg:          [SKIP][256] ([Intel XE#2341]) -> [FAIL][257] ([Intel XE#1188])
   [256]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_content_protection@uevent.html
   [257]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_content_protection@uevent.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-adlp:         [ABORT][258] ([Intel XE#4847]) -> [ABORT][259] ([Intel XE#4847] / [Intel XE#5545])
   [258]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-adlp-2/igt@kms_fbcon_fbt@fbc-suspend.html
   [259]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-adlp-1/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-lnl:          [FAIL][260] ([Intel XE#301]) -> [FAIL][261] ([Intel XE#301] / [Intel XE#3149])
   [260]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank.html
   [261]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-2/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-draw-render:
    - shard-bmg:          [SKIP][262] ([Intel XE#2312]) -> [SKIP][263] ([Intel XE#2311]) +15 other tests skip
   [262]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-draw-render.html
   [263]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-shrfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][264] ([Intel XE#2311]) -> [SKIP][265] ([Intel XE#2312]) +9 other tests skip
   [264]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-shrfb-draw-mmap-wc.html
   [265]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-shrfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-pgflip-blt:
    - shard-bmg:          [SKIP][266] ([Intel XE#2312]) -> [SKIP][267] ([Intel XE#5390]) +4 other tests skip
   [266]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-pgflip-blt.html
   [267]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff:
    - shard-bmg:          [SKIP][268] ([Intel XE#5390]) -> [SKIP][269] ([Intel XE#2312]) +7 other tests skip
   [268]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff.html
   [269]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt:
    - shard-bmg:          [SKIP][270] ([Intel XE#2312]) -> [SKIP][271] ([Intel XE#2313]) +15 other tests skip
   [270]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt.html
   [271]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][272] ([Intel XE#2313]) -> [SKIP][273] ([Intel XE#2312]) +10 other tests skip
   [272]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-4/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html
   [273]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_plane_multiple@2x-tiling-y:
    - shard-bmg:          [SKIP][274] ([Intel XE#5021]) -> [SKIP][275] ([Intel XE#4596])
   [274]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-7/igt@kms_plane_multiple@2x-tiling-y.html
   [275]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-6/igt@kms_plane_multiple@2x-tiling-y.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][276] ([Intel XE#2509]) -> [SKIP][277] ([Intel XE#2426])
   [276]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [277]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv:
    - shard-lnl:          [ABORT][278] ([Intel XE#4917] / [Intel XE#5466]) -> [ABORT][279] ([Intel XE#5466])
   [278]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13/shard-lnl-2/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html
   [279]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/shard-lnl-8/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#1008]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1008
  [Intel XE#1061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1061
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1173
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1188]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1188
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1397]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1397
  [Intel XE#1401]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1401
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
  [Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
  [Intel XE#1428]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1428
  [Intel XE#1435]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1435
  [Intel XE#1439]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1439
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
  [Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
  [Intel XE#1508]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1508
  [Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
  [Intel XE#1745]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1745
  [Intel XE#1874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1874
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2314
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2459]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2459
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2596
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
  [Intel XE#2705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2705
  [Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2894]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2894
  [Intel XE#2907]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2907
  [Intel XE#2938]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2938
  [Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#3012]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3012
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#307]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/307
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
  [Intel XE#3141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3141
  [Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
  [Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
  [Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
  [Intel XE#3278]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3278
  [Intel XE#3279]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3279
  [Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
  [Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
  [Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
  [Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
  [Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
  [Intel XE#366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/366
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#378]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/378
  [Intel XE#3862]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3862
  [Intel XE#3876]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3876
  [Intel XE#3884]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3884
  [Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
  [Intel XE#4212]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4212
  [Intel XE#4302]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4302
  [Intel XE#4345]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4345
  [Intel XE#4354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4354
  [Intel XE#4417]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4417
  [Intel XE#4522]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4522
  [Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
  [Intel XE#4596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4596
  [Intel XE#4626]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4626
  [Intel XE#4658]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4658
  [Intel XE#4677]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4677
  [Intel XE#4689]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4689
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#4847]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4847
  [Intel XE#488]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/488
  [Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
  [Intel XE#4917]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4917
  [Intel XE#4943]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4943
  [Intel XE#5021]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5021
  [Intel XE#5103]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5103
  [Intel XE#5208]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5208
  [Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
  [Intel XE#5466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5466
  [Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
  [Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
  [Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
  [Intel XE#5565]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5565
  [Intel XE#5575]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5575
  [Intel XE#5591]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5591
  [Intel XE#5607]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5607
  [Intel XE#5826]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5826
  [Intel XE#584]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/584
  [Intel XE#5898]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5898
  [Intel XE#5899]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5899
  [Intel XE#607]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/607
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
  [Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
  [Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944


Build changes
-------------

  * Linux: xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13 -> xe-pw-152882v1

  IGT_8493: 8493
  xe-3539-546fc742f08b8dbd3fa1486933c9b15085e11d13: 546fc742f08b8dbd3fa1486933c9b15085e11d13
  xe-pw-152882v1: 152882v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v1/index.html


* Re: [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation
  2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation Thomas Hellström
@ 2025-08-13 14:28   ` Matthew Brost
  2025-08-13 14:33     ` Thomas Hellström
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 14:28 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:07PM +0200, Thomas Hellström wrote:
> The pinning has the odd side-effect that unlocking *any* resv
> during validation triggers an "unlocking pinned lock" warning.
> 

So this is a cross-process thing then - right? e.g., Process A pins a
dma-resv lock, Process B unlocks a dma-resv lock, and boom, lockdep
warning? Just want to make sure I am understanding the problem
correctly.

Matt 

> Cc: Matthew Brost <matthew.brost@intel.com>
> Fixes: 9d5558649f68 ("drm/xe: Rework eviction rejection of bound external bos")
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c |  5 ++---
>  drivers/gpu/drm/xe/xe_vm.h | 15 ++-------------
>  2 files changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 6fea39842e1e..11eaf3b06766 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2468,7 +2468,6 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
>  		.no_wait_gpu = false,
>  		.gfp_retry_mayfail = true,
>  	};
> -	struct pin_cookie cookie;
>  	int ret;
>  
>  	if (vm) {
> @@ -2479,10 +2478,10 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
>  		ctx.resv = xe_vm_resv(vm);
>  	}
>  
> -	cookie = xe_vm_set_validating(vm, allow_res_evict);
> +	xe_vm_set_validating(vm, allow_res_evict);
>  	trace_xe_bo_validate(bo);
>  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> -	xe_vm_clear_validating(vm, allow_res_evict, cookie);
> +	xe_vm_clear_validating(vm, allow_res_evict);
>  
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 2f213737c7e5..2ecb417c19a2 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -315,22 +315,14 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
>   * Register this task as currently making bos resident for the vm. Intended
>   * to avoid eviction by the same task of shared bos bound to the vm.
>   * Call with the vm's resv lock held.
> - *
> - * Return: A pin cookie that should be used for xe_vm_clear_validating().
>   */
> -static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
> -						     bool allow_res_evict)
> +static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
>  {
> -	struct pin_cookie cookie = {};
> -
>  	if (vm && !allow_res_evict) {
>  		xe_vm_assert_held(vm);
> -		cookie = lockdep_pin_lock(&xe_vm_resv(vm)->lock.base);
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
>  		WRITE_ONCE(vm->validating, current);
>  	}
> -
> -	return cookie;
>  }
>  
>  /**
> @@ -338,17 +330,14 @@ static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
>   * @vm: Pointer to the vm or NULL
>   * @allow_res_evict: Eviction from @vm was allowed. Must be set to the same
>   * value as for xe_vm_set_validation().
> - * @cookie: Cookie obtained from xe_vm_set_validating().
>   *
>   * Register this task as currently making bos resident for the vm. Intended
>   * to avoid eviction by the same task of shared bos bound to the vm.
>   * Call with the vm's resv lock held.
>   */
> -static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict,
> -					  struct pin_cookie cookie)
> +static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict)
>  {
>  	if (vm && !allow_res_evict) {
> -		lockdep_unpin_lock(&xe_vm_resv(vm)->lock.base, cookie);
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
>  		WRITE_ONCE(vm->validating, NULL);
>  	}
> -- 
> 2.50.1
> 


* Re: [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation
  2025-08-13 14:28   ` Matthew Brost
@ 2025-08-13 14:33     ` Thomas Hellström
  2025-08-13 15:17       ` Matthew Brost
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-13 14:33 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 07:28 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:07PM +0200, Thomas Hellström wrote:
> > The pinning has the odd side-effect that unlocking *any* resv
> > during validation triggers an "unlocking pinned lock" warning.
> > 
> 
> So this is a cross-process thing then - right? e.g., Process A pins a
> dma-resv lock, Process B unlocks a dma-resv lock, and boom, lockdep
> warning? Just want to make sure I am understanding the problem
> correctly.

No, my understanding is that this is a single process thing.
We lock the vm, pin it, lock a bo for eviction, perhaps using the same
ww_mutex_ctx, unlock the evicted bo => bang.
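
Concretely, the failing sequence looks roughly like this (a sketch
only; "ctx" is the shared ww_acquire_ctx, and "vm" and "evict_bo"
stand in for the real objects):

	struct pin_cookie cookie;

	/* Validation takes and pins the vm's resv: */
	dma_resv_lock(xe_vm_resv(vm), ctx);
	cookie = lockdep_pin_lock(&xe_vm_resv(vm)->lock.base);

	/* Eviction inside validation locks some other bo's resv: */
	dma_resv_lock(evict_bo->ttm.base.resv, ctx);

	/* Unlocking that bo trips the "unlocking pinned lock" warning: */
	dma_resv_unlock(evict_bo->ttm.base.resv);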

It might be that the locks need to be taken with the same ww_mutex
context for this to trigger, but I'm not sure.

/Thomas


> 
> Matt 
> 
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Fixes: 9d5558649f68 ("drm/xe: Rework eviction rejection of bound
> > external bos")
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c |  5 ++---
> >  drivers/gpu/drm/xe/xe_vm.h | 15 ++-------------
> >  2 files changed, 4 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 6fea39842e1e..11eaf3b06766 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2468,7 +2468,6 @@ int xe_bo_validate(struct xe_bo *bo, struct
> > xe_vm *vm, bool allow_res_evict)
> >  		.no_wait_gpu = false,
> >  		.gfp_retry_mayfail = true,
> >  	};
> > -	struct pin_cookie cookie;
> >  	int ret;
> >  
> >  	if (vm) {
> > @@ -2479,10 +2478,10 @@ int xe_bo_validate(struct xe_bo *bo, struct
> > xe_vm *vm, bool allow_res_evict)
> >  		ctx.resv = xe_vm_resv(vm);
> >  	}
> >  
> > -	cookie = xe_vm_set_validating(vm, allow_res_evict);
> > +	xe_vm_set_validating(vm, allow_res_evict);
> >  	trace_xe_bo_validate(bo);
> >  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> > -	xe_vm_clear_validating(vm, allow_res_evict, cookie);
> > +	xe_vm_clear_validating(vm, allow_res_evict);
> >  
> >  	return ret;
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > b/drivers/gpu/drm/xe/xe_vm.h
> > index 2f213737c7e5..2ecb417c19a2 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -315,22 +315,14 @@ void xe_vm_snapshot_free(struct
> > xe_vm_snapshot *snap);
> >   * Register this task as currently making bos resident for the vm.
> > Intended
> >   * to avoid eviction by the same task of shared bos bound to the
> > vm.
> >   * Call with the vm's resv lock held.
> > - *
> > - * Return: A pin cookie that should be used for
> > xe_vm_clear_validating().
> >   */
> > -static inline struct pin_cookie xe_vm_set_validating(struct xe_vm
> > *vm,
> > -						     bool
> > allow_res_evict)
> > +static inline void xe_vm_set_validating(struct xe_vm *vm, bool
> > allow_res_evict)
> >  {
> > -	struct pin_cookie cookie = {};
> > -
> >  	if (vm && !allow_res_evict) {
> >  		xe_vm_assert_held(vm);
> > -		cookie = lockdep_pin_lock(&xe_vm_resv(vm)-
> > >lock.base);
> >  		/* Pairs with READ_ONCE in xe_vm_is_validating()
> > */
> >  		WRITE_ONCE(vm->validating, current);
> >  	}
> > -
> > -	return cookie;
> >  }
> >  
> >  /**
> > @@ -338,17 +330,14 @@ static inline struct pin_cookie
> > xe_vm_set_validating(struct xe_vm *vm,
> >   * @vm: Pointer to the vm or NULL
> >   * @allow_res_evict: Eviction from @vm was allowed. Must be set to
> > the same
> >   * value as for xe_vm_set_validation().
> > - * @cookie: Cookie obtained from xe_vm_set_validating().
> >   *
> >   * Register this task as currently making bos resident for the vm.
> > Intended
> >   * to avoid eviction by the same task of shared bos bound to the
> > vm.
> >   * Call with the vm's resv lock held.
> >   */
> > -static inline void xe_vm_clear_validating(struct xe_vm *vm, bool
> > allow_res_evict,
> > -					  struct pin_cookie
> > cookie)
> > +static inline void xe_vm_clear_validating(struct xe_vm *vm, bool
> > allow_res_evict)
> >  {
> >  	if (vm && !allow_res_evict) {
> > -		lockdep_unpin_lock(&xe_vm_resv(vm)->lock.base,
> > cookie);
> >  		/* Pairs with READ_ONCE in xe_vm_is_validating()
> > */
> >  		WRITE_ONCE(vm->validating, NULL);
> >  	}
> > -- 
> > 2.50.1
> > 



* Re: [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error
  2025-08-13 10:51 ` [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
@ 2025-08-13 14:45   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 14:45 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Brian Welty, Rodrigo Vivi, Lucas De Marchi, stable,
	Joonas Lahtinen, Jani Nikula, Maarten Lankhorst, Matthew Auld

On Wed, Aug 13, 2025 at 12:51:09PM +0200, Thomas Hellström wrote:
> Avoid triggering a dereference of an error pointer on cleanup in
> xe_vm_free_scratch() by clearing any scratch_pt error pointer.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Fixes: 06951c2ee72d ("drm/xe: Use NULL PTEs as scratch PTEs")
> Cc: Brian Welty <brian.welty@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_vm.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d40d2d43c041..12e661960244 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1635,8 +1635,12 @@ static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
>  
>  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
>  		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> -		if (IS_ERR(vm->scratch_pt[id][i]))
> -			return PTR_ERR(vm->scratch_pt[id][i]);
> +		if (IS_ERR(vm->scratch_pt[id][i])) {
> +			int err = PTR_ERR(vm->scratch_pt[id][i]);
> +
> +			vm->scratch_pt[id][i] = NULL;
> +			return err;
> +		}
>  
>  		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
>  	}
> -- 
> 2.50.1
> 


* Re: [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation
  2025-08-13 14:33     ` Thomas Hellström
@ 2025-08-13 15:17       ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 15:17 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 04:33:01PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 07:28 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:07PM +0200, Thomas Hellström wrote:
> > > The pinning has the odd side-effect that unlocking *any* resv
> > > during validation triggers an "unlocking pinned lock" warning.
> > > 
> > 
> > So this is a cross-process thing then - right? e.g., Process A pins a
> > dma-resv lock, Process B unlocks a dma-resv lock, and boom, lockdep
> > warning? Just want to make sure I am understanding the problem
> > correctly.
> 
> No, my understanding is that this is a single process thing.
> We lock the vm, pin it, lock a bo for eviction, perhaps using the same
> ww_mutex_ctx, unlock the evicted bo => bang.
> 

Ah, ok. Got it - it makes sense how this can occur within a single
process if eviction is triggered.

> It might be that the locks need to be taken with the same ww_mutex
> context for this to trigger, but I'm not sure.
>

I'd guess lockdep_pin_lock() operates on a lockdep class, and all
dma-resv locks use the same class, which creates this issue.
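
For what it's worth, all dma-resv locks are indeed initialized with
the shared reservation ww class (abbreviated from current
drivers/dma-buf/dma-resv.c), which would line up with that guess:

	void dma_resv_init(struct dma_resv *obj)
	{
		ww_mutex_init(&obj->lock, &reservation_ww_class);
		...
	}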

At any rate, this extra assert isn't really needed, so I don't see an
issue with removing it.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> /Thomas
> 
> 
> > 
> > Matt 
> > 
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Fixes: 9d5558649f68 ("drm/xe: Rework eviction rejection of bound
> > > external bos")
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c |  5 ++---
> > >  drivers/gpu/drm/xe/xe_vm.h | 15 ++-------------
> > >  2 files changed, 4 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > b/drivers/gpu/drm/xe/xe_bo.c
> > > index 6fea39842e1e..11eaf3b06766 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -2468,7 +2468,6 @@ int xe_bo_validate(struct xe_bo *bo, struct
> > > xe_vm *vm, bool allow_res_evict)
> > >  		.no_wait_gpu = false,
> > >  		.gfp_retry_mayfail = true,
> > >  	};
> > > -	struct pin_cookie cookie;
> > >  	int ret;
> > >  
> > >  	if (vm) {
> > > @@ -2479,10 +2478,10 @@ int xe_bo_validate(struct xe_bo *bo, struct
> > > xe_vm *vm, bool allow_res_evict)
> > >  		ctx.resv = xe_vm_resv(vm);
> > >  	}
> > >  
> > > -	cookie = xe_vm_set_validating(vm, allow_res_evict);
> > > +	xe_vm_set_validating(vm, allow_res_evict);
> > >  	trace_xe_bo_validate(bo);
> > >  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> > > -	xe_vm_clear_validating(vm, allow_res_evict, cookie);
> > > +	xe_vm_clear_validating(vm, allow_res_evict);
> > >  
> > >  	return ret;
> > >  }
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.h
> > > b/drivers/gpu/drm/xe/xe_vm.h
> > > index 2f213737c7e5..2ecb417c19a2 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > @@ -315,22 +315,14 @@ void xe_vm_snapshot_free(struct
> > > xe_vm_snapshot *snap);
> > >   * Register this task as currently making bos resident for the vm.
> > > Intended
> > >   * to avoid eviction by the same task of shared bos bound to the
> > > vm.
> > >   * Call with the vm's resv lock held.
> > > - *
> > > - * Return: A pin cookie that should be used for
> > > xe_vm_clear_validating().
> > >   */
> > > -static inline struct pin_cookie xe_vm_set_validating(struct xe_vm
> > > *vm,
> > > -						     bool
> > > allow_res_evict)
> > > +static inline void xe_vm_set_validating(struct xe_vm *vm, bool
> > > allow_res_evict)
> > >  {
> > > -	struct pin_cookie cookie = {};
> > > -
> > >  	if (vm && !allow_res_evict) {
> > >  		xe_vm_assert_held(vm);
> > > -		cookie = lockdep_pin_lock(&xe_vm_resv(vm)-
> > > >lock.base);
> > >  		/* Pairs with READ_ONCE in xe_vm_is_validating()
> > > */
> > >  		WRITE_ONCE(vm->validating, current);
> > >  	}
> > > -
> > > -	return cookie;
> > >  }
> > >  
> > >  /**
> > > @@ -338,17 +330,14 @@ static inline struct pin_cookie
> > > xe_vm_set_validating(struct xe_vm *vm,
> > >   * @vm: Pointer to the vm or NULL
> > >   * @allow_res_evict: Eviction from @vm was allowed. Must be set to
> > > the same
> > >   * value as for xe_vm_set_validation().
> > > - * @cookie: Cookie obtained from xe_vm_set_validating().
> > >   *
> > >   * Register this task as currently making bos resident for the vm.
> > > Intended
> > >   * to avoid eviction by the same task of shared bos bound to the
> > > vm.
> > >   * Call with the vm's resv lock held.
> > >   */
> > > -static inline void xe_vm_clear_validating(struct xe_vm *vm, bool
> > > allow_res_evict,
> > > -					  struct pin_cookie
> > > cookie)
> > > +static inline void xe_vm_clear_validating(struct xe_vm *vm, bool
> > > allow_res_evict)
> > >  {
> > >  	if (vm && !allow_res_evict) {
> > > -		lockdep_unpin_lock(&xe_vm_resv(vm)->lock.base,
> > > cookie);
> > >  		/* Pairs with READ_ONCE in xe_vm_is_validating()
> > > */
> > >  		WRITE_ONCE(vm->validating, NULL);
> > >  	}
> > > -- 
> > > 2.50.1
> > > 
> 


* Re: [PATCH 07/15] drm/xe: Convert SVM validation for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 07/15] drm/xe: Convert SVM validation " Thomas Hellström
@ 2025-08-13 15:32   ` Matthew Brost
  2025-08-14 12:24     ` Thomas Hellström
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 15:32 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:13PM +0200, Thomas Hellström wrote:
> Convert SVM validation to support exhaustive eviction,
> using xe_validation_guard().
> 

Do we not need a validation guard + xe_vm_set_validation_exec around
xe_vm_range_rebind, given that on the first fault of a range we can
allocate PTs?
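
Something like this is what I have in mind - a sketch only, assuming
the guard and retry helpers behave as in this patch, that
xe_vm_set_validation_exec() from patch 4 takes (vm, exec), and with
vma / range / tile_mask coming from the fault path:

	struct xe_validation_ctx vctx;
	struct drm_exec exec;
	struct dma_fence *fence;
	int err = 0;

	xe_validation_guard(&vctx, &xe->val, &exec, 0, err, false) {
		err = drm_exec_lock_obj(&exec, xe_vm_obj(vm));
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		xe_vm_set_validation_exec(vm, &exec);
		fence = xe_vm_range_rebind(vm, vma, range, tile_mask);
		xe_vm_set_validation_exec(vm, NULL);
		if (IS_ERR(fence)) {
			err = PTR_ERR(fence);
			xe_validation_retry_on_oom(&vctx, &err);
		}
	}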

Matt

> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c | 63 ++++++++++++++++++-------------------
>  1 file changed, 30 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index 39e3aa6df25a..ba85665d85d4 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -699,51 +699,48 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  	struct xe_device *xe = vr->xe;
>  	struct device *dev = xe->drm.dev;
>  	struct drm_buddy_block *block;
> +	struct xe_validation_ctx vctx;
>  	struct list_head *blocks;
> -	struct drm_exec *exec;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	ktime_t time_end = 0;
> -	int err, idx;
> +	int err = 0, idx;
>  
>  	if (!drm_dev_enter(&xe->drm, &idx))
>  		return -ENODEV;
>  
>  	xe_pm_runtime_get(xe);
> -	exec = XE_VALIDATION_UNIMPLEMENTED;
> -
> - retry:
> -	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
> -				 ttm_bo_type_device,
> -				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> -				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> -			goto retry;
> -		goto out_pm_put;
> -	}
>  
> -	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> -				&dpagemap_devmem_ops, dpagemap, end - start);
> -
> -	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> -	list_for_each_entry(block, blocks, link)
> -		block->private = vr;
> +	xe_validation_guard(&vctx, &xe->val, &exec, 0, err, false) {
> +		bo = xe_bo_create_locked(xe, NULL, NULL, end - start,
> +					 ttm_bo_type_device,
> +					 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> +					 XE_BO_FLAG_CPU_ADDR_MIRROR, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&vctx, &err);
> +			break;
> +		}
>  
> -	xe_bo_get(bo);
> +		drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> +					&dpagemap_devmem_ops, dpagemap, end - start);
>  
> -	/* Ensure the device has a pm ref while there are device pages active. */
> -	xe_pm_runtime_get_noresume(xe);
> -	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> -					    start, end, timeslice_ms,
> -					    xe_svm_devm_owner(xe));
> -	if (err)
> -		xe_svm_devmem_release(&bo->devmem_allocation);
> +		blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> +		list_for_each_entry(block, blocks, link)
> +			block->private = vr;
>  
> -	xe_bo_unlock(bo);
> -	xe_bo_put(bo);
> +		xe_bo_get(bo);
>  
> -out_pm_put:
> +		/* Ensure the device has a pm ref while there are device pages active. */
> +		xe_pm_runtime_get_noresume(xe);
> +		err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> +						    start, end, timeslice_ms,
> +						    xe_svm_devm_owner(xe));
> +		if (err)
> +			xe_svm_devmem_release(&bo->devmem_allocation);
> +		xe_bo_unlock(bo);
> +		xe_bo_put(bo);
> +	}
>  	xe_pm_runtime_put(xe);
>  	drm_dev_exit(idx);
>  
> -- 
> 2.50.1
> 


* Re: [PATCH 04/15] drm/xe: Pass down drm_exec context to validation
  2025-08-13 10:51 ` [PATCH 04/15] drm/xe: Pass down drm_exec context to validation Thomas Hellström
@ 2025-08-13 16:42   ` Matthew Brost
  2025-08-14  7:49     ` Thomas Hellström
  2025-08-22  7:40     ` Thomas Hellström
  0 siblings, 2 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 16:42 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:10PM +0200, Thomas Hellström wrote:
> We want all validation (potential backing store allocation) to be part
> of a drm_exec transaction. Therefore add a drm_exec pointer argument
> to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
> will deal with making all (or nearly all) calls to these functions
> part of a drm_exec transaction. In the meantime, define special values
> of the drm_exec pointer:
>

Would the eventual idea be to pass the exec further down to TTM?
 
> XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
> has not been done yet.
> XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
> the drm_exec context to be passed down to map_attachment where
> validation takes place.

What is the expected long-term implication of paths that are
UNIMPLEMENTED and UNSUPPORTED?
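
As I read it, an unconverted path today just declares the sentinel and
threads it through - e.g. the stolen-memory hunk below boils down to
this pattern (a sketch, with error handling trimmed):

	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;

	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
				       NULL, size, start, end,
				       ttm_bo_type_kernel, flags, 0, exec);
	if (!IS_ERR(bo))
		err = xe_bo_pin(bo, exec);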

> XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
> eviction isn't crucial and the ROI of converting those is very
> small.
> 
> For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
> a lockdep check that a drm_exec transaction can indeed start at the
> location where the macro is expanded. This is to encourage
> developers to take this into consideration early in the code
> development process.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile                   |   1 +
>  .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
>  drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
>  drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
>  drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
>  drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
>  drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
>  drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
>  drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
>  drivers/gpu/drm/xe/xe_gt_pagefault.c          |   4 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
>  drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
>  drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
>  drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
>  drivers/gpu/drm/xe/xe_vm.c                    |  26 +++-
>  drivers/gpu/drm/xe/xe_vm.h                    |  33 ++++-
>  drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
>  20 files changed, 401 insertions(+), 105 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_validation.c
>  create mode 100644 drivers/gpu/drm/xe/xe_validation.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 8e0c3412a757..8ee7d275128d 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -127,6 +127,7 @@ xe-y += xe_bb.o \
>  	xe_tuning.o \
>  	xe_uc.o \
>  	xe_uc_fw.o \
> +	xe_validation.o \
>  	xe_vm.o \
>  	xe_vram.o \
>  	xe_vram_freq.o \
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 41d39d67817a..1ce1e9da975b 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -8,6 +8,7 @@
>  
>  #include "xe_ttm_stolen_mgr.h"
>  #include "xe_res_cursor.h"
> +#include "xe_validation.h"
>  
>  struct xe_bo;
>  
> @@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
>  	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> @@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  
>  	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
>  				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0);
> +				       ttm_bo_type_kernel, flags, 0, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		bo = NULL;
>  		return err;
>  	}
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	xe_bo_unlock_vm_held(bo);
>  
>  	if (err) {
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index f1f8b5ab53ef..4b0748e6fdd6 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
>  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	int ret;
>  
>  	if (!vma)
> @@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  		goto err;
>  
>  	if (IS_DGFX(xe))
> -		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
> +		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
>  	else
> -		ret = xe_bo_validate(bo, NULL, true);
> +		ret = xe_bo_validate(bo, NULL, true, exec);
>  	if (!ret)
>  		ttm_bo_pin(&bo->ttm);
>  	ttm_bo_unreserve(&bo->ttm);
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index bb469096d072..06ceba6c3c25 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -23,7 +23,7 @@
>  
>  static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  			    bool clear, u64 get_val, u64 assign_val,
> -			    struct kunit *test)
> +			    struct kunit *test, struct drm_exec *exec)
>  {
>  	struct dma_fence *fence;
>  	struct ttm_tt *ttm;
> @@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  	u32 offset;
>  
>  	/* Move bo to VRAM if not already there. */
> -	ret = xe_bo_validate(bo, NULL, false);
> +	ret = xe_bo_validate(bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate bo.\n");
>  		return ret;
> @@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  	}
>  
>  	/* Evict to system. CCS data should be copied. */
> -	ret = xe_bo_evict(bo);
> +	ret = xe_bo_evict(bo, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to evict bo.\n");
>  		return ret;
> @@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
>  
>  	/* TODO: Sanity check */
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  
>  	if (IS_DGFX(xe))
>  		kunit_info(test, "Testing vram id %u\n", tile->id);
> @@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
>  
>  	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
>  	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
> -			       test);
> +			       test, exec);
>  	if (ret)
>  		goto out_unlock;
>  
>  	kunit_info(test, "Verifying that CCS data survives migration.\n");
>  	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
> -			       0xdeadbeefdeadbeefULL, test);
> +			       0xdeadbeefdeadbeefULL, test, exec);
>  	if (ret)
>  		goto out_unlock;
>  
>  	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
> -	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
> +	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
>  
>  out_unlock:
>  	xe_bo_unlock(bo);
> @@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  	struct xe_bo *bo, *external;
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
>  	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
> +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  	struct xe_gt *__gt;
>  	int err, i, id;
>  
> @@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  		}
>  
>  		xe_bo_lock(external, false);
> -		err = xe_bo_pin_external(external);
> +		err = xe_bo_pin_external(external, exec);
>  		xe_bo_unlock(external);
>  		if (err) {
>  			KUNIT_FAIL(test, "external bo pin err=%pe\n",
> @@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  		if (i) {
>  			down_read(&vm->lock);
>  			xe_vm_lock(vm, false);
> -			err = xe_bo_validate(bo, bo->vm, false);
> +			err = xe_bo_validate(bo, bo->vm, false, exec);
>  			xe_vm_unlock(vm);
>  			up_read(&vm->lock);
>  			if (err) {
> @@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  				goto cleanup_all;
>  			}
>  			xe_bo_lock(external, false);
> -			err = xe_bo_validate(external, NULL, false);
> +			err = xe_bo_validate(external, NULL, false, exec);
>  			xe_bo_unlock(external);
>  			if (err) {
>  				KUNIT_FAIL(test, "external bo valid err=%pe\n",
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index cde9530bef8c..965dd3280468 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
>  }
>  
>  static void check_residency(struct kunit *test, struct xe_bo *exported,
> -			    struct xe_bo *imported, struct dma_buf *dmabuf)
> +			    struct xe_bo *imported, struct dma_buf *dmabuf,
> +			    struct drm_exec *exec)
>  {
>  	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
>  	u32 mem_type;
> @@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
>  	 * importer is on a different device. If they're on the same device,
>  	 * the exporter and the importer should be the same bo.
>  	 */
> -	ret = xe_bo_evict(exported);
> +	ret = xe_bo_evict(exported, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
> @@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
>  	}
>  
>  	/* Re-validate the importer. This should move also exporter in. */
> -	ret = xe_bo_validate(imported, NULL, false);
> +	ret = xe_bo_validate(imported, NULL, false, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
> @@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  			KUNIT_FAIL(test,
>  				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
>  		} else {
> +			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  			int err;
>  
>  			/* Is everything where we expect it to be? */
>  			xe_bo_lock(import_bo, false);
> -			err = xe_bo_validate(import_bo, NULL, false);
> +			err = xe_bo_validate(import_bo, NULL, false, exec);
>  
>  			/* Pinning in VRAM is not allowed. */
>  			if (!is_dynamic(params) &&
> @@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  						  err == -ERESTARTSYS);
>  
>  			if (!err)
> -				check_residency(test, bo, import_bo, dmabuf);
> +				check_residency(test, bo, import_bo, dmabuf, exec);
>  			xe_bo_unlock(import_bo);
>  		}
>  		drm_gem_object_put(import);
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index edd1e701aa1c..dfb445d09759 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
>  		} } while (0)
>  
>  static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> -		      struct kunit *test, u32 region)
> +		      struct kunit *test, u32 region, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = tile_to_xe(m->tile);
>  	u64 retval, expected = 0;
> @@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
>  						   ttm_bo_type_kernel,
>  						   region |
>  						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -						   XE_BO_FLAG_PINNED);
> +						   XE_BO_FLAG_PINNED,
> +						   exec);
>  	if (IS_ERR(remote)) {
>  		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
>  			   str, remote);
>  		return;
>  	}
>  
> -	err = xe_bo_validate(remote, NULL, false);
> +	err = xe_bo_validate(remote, NULL, false, exec);
>  	if (err) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
>  			   str, err);
> @@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
>  }
>  
>  static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
> -			     struct kunit *test)
> +			     struct drm_exec *exec, struct kunit *test)
>  {
> -	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
> +	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
>  }
>  
>  static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> -			   struct kunit *test)
> +			   struct drm_exec *exec, struct kunit *test)
>  {
>  	u32 region;
>  
> @@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
>  		region = XE_BO_FLAG_VRAM1;
>  	else
>  		region = XE_BO_FLAG_VRAM0;
> -	test_copy(m, bo, test, region);
> +	test_copy(m, bo, test, region, exec);
>  }
>  
> -static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> +static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> +				   struct drm_exec *exec)
>  {
>  	struct xe_tile *tile = m->tile;
>  	struct xe_device *xe = tile_to_xe(tile);
> @@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	check(retval, expected, "Command clear small last value", test);
>  
>  	kunit_info(test, "Copying small buffer object to system\n");
> -	test_copy_sysmem(m, tiny, test);
> +	test_copy_sysmem(m, tiny, exec, test);
>  	if (xe->info.tile_count > 1) {
>  		kunit_info(test, "Copying small buffer object to other vram\n");
> -		test_copy_vram(m, tiny, test);
> +		test_copy_vram(m, tiny, exec, test);
>  	}
>  
>  	/* Clear a big bo */
> @@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	check(retval, expected, "Command clear big last value", test);
>  
>  	kunit_info(test, "Copying big buffer object to system\n");
> -	test_copy_sysmem(m, big, test);
> +	test_copy_sysmem(m, big, exec, test);
>  	if (xe->info.tile_count > 1) {
>  		kunit_info(test, "Copying big buffer object to other vram\n");
> -		test_copy_vram(m, big, test);
> +		test_copy_vram(m, big, exec, test);
>  	}
>  
>  out:
> @@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
>  
>  	for_each_tile(tile, xe, id) {
>  		struct xe_migrate *m = tile->migrate;
> +		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  
>  		kunit_info(test, "Testing tile id %d.\n", id);
>  		xe_vm_lock(m->q->vm, false);
> -		xe_migrate_sanity_test(m, test);
> +		xe_migrate_sanity_test(m, test, exec);
>  		xe_vm_unlock(m->q->vm);
>  	}
>  
> @@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
>  
>  static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
> -			 struct kunit *test)
> +			 struct drm_exec *exec, struct kunit *test)
>  {
>  	struct dma_fence *fence;
>  	u64 expected, retval;
> @@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  	dma_fence_put(fence);
>  
>  	kunit_info(test, "Evict vram buffer object\n");
> -	ret = xe_bo_evict(vram_bo);
> +	ret = xe_bo_evict(vram_bo, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to evict bo.\n");
>  		return;
> @@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  	dma_fence_put(fence);
>  
>  	kunit_info(test, "Restore vram buffer object\n");
> -	ret = xe_bo_validate(vram_bo, NULL, false);
> +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
>  		return;
> @@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  {
>  	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> +	struct drm_exec *exec;
>  	long ret;
>  
>  	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> @@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  		return;
>  	}
>  
> +	exec = XE_VALIDATION_OPT_OUT;
>  	xe_bo_lock(sys_bo, false);
> -	ret = xe_bo_validate(sys_bo, NULL, false);
> +	ret = xe_bo_validate(sys_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
>  		goto free_sysbo;
> @@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	xe_bo_lock(ccs_bo, false);
> -	ret = xe_bo_validate(ccs_bo, NULL, false);
> +	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
>  		goto free_ccsbo;
> @@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	xe_bo_lock(vram_bo, false);
> -	ret = xe_bo_validate(vram_bo, NULL, false);
> +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
>  		goto free_vrambo;
> @@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	test_clear(xe, tile, sys_bo, vram_bo, test);
> -	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
> +	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
>  	xe_bo_unlock(vram_bo);
>  
>  	xe_bo_lock(vram_bo, false);
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 11eaf3b06766..e71addf51ed0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *backup;
>  	int ret = 0;
>  
> @@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
>  					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
>  					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					XE_BO_FLAG_PINNED);
> +					XE_BO_FLAG_PINNED, exec);
>  	if (IS_ERR(backup)) {
>  		ret = PTR_ERR(backup);
>  		goto out_unlock_bo;
> @@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
>  int xe_bo_evict_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *backup = bo->backup_obj;
>  	bool backup_created = false;
>  	bool unmap = false;
> @@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
>  						NULL, xe_bo_size(bo),
>  						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
>  						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -						XE_BO_FLAG_PINNED);
> +						XE_BO_FLAG_PINNED, exec);
>  		if (IS_ERR(backup)) {
>  			ret = PTR_ERR(backup);
>  			goto out_unlock_bo;
> @@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	struct xe_device *xe = to_xe_device(ddev);
>  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
>  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> +	struct drm_exec *exec;
>  	vm_fault_t ret;
>  	int idx;
>  
>  	if (needs_rpm)
>  		xe_pm_runtime_get(xe);
>  
> +	exec = XE_VALIDATION_UNIMPLEMENTED;
>  	ret = ttm_bo_vm_reserve(tbo, vmf);
>  	if (ret)
>  		goto out;
> @@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	if (drm_dev_enter(ddev, &idx)) {
>  		trace_xe_bo_cpu_fault(bo);
>  
> +		xe_validation_assert_exec(xe, exec, &tbo->base);
>  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
>  					       TTM_BO_VM_NUM_PREFAULT);
>  		drm_dev_exit(idx);
> @@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
>  	kfree(bo);
>  }
>  
> +/**
> + * ___xe_bo_create_locked() - Initialize or create an xe_bo.
> + * @xe: The xe device.
> + * @bo: An already allocated buffer object or NULL
> + * if the function should allocate a new one.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @resv: Pointer to a locked shared reservation object to use fo this bo,
> + * or NULL for the xe_bo to use its own.
> + * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
> + * @size: The storage size to use for the bo.
> + * @cpu_caching: The cpu caching used for system memory backing store.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Initialize or create an xe buffer object. On failure, any allocated buffer
> + * object passed in @bo will have been unreferenced.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				     struct xe_tile *tile, struct dma_resv *resv,
>  				     struct ttm_lru_bulk_move *bulk, size_t size,
>  				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags)
> +				     u32 flags, struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  		ctx.resv = resv;
>  	}
>  
> +	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
>  	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
>  		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
>  		if (WARN_ON(err)) {
> @@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
>  		      struct xe_tile *tile, struct xe_vm *vm,
>  		      size_t size, u64 start, u64 end,
>  		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
> -		      u64 alignment)
> +		      u64 alignment, struct drm_exec *exec)
>  {
>  	struct xe_bo *bo = NULL;
>  	int err;
> @@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
>  				    vm && !xe_vm_in_fault_mode(vm) &&
>  				    flags & XE_BO_FLAG_USER ?
>  				    &vm->lru_bulk_move : NULL, size,
> -				    cpu_caching, type, flags);
> +				    cpu_caching, type, flags, exec);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> @@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
>  
>  			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
>  				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
> -							   start + xe_bo_size(bo), U64_MAX);
> +							   start + xe_bo_size(bo), U64_MAX,
> +							   exec);
>  			} else {
> -				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
> +				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
>  			}
>  			if (err)
>  				goto err_unlock_put_bo;
> @@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_locked_range() - Create a BO with range- and alignment options
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The local vm or NULL for external objects.
> + * @size: The storage size to use for the bo.
> + * @start: Start of fixed VRAM range or 0.
> + * @end: End of fixed VRAM range or ~0ULL.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Create an Xe BO with range- and alignment options. If @start and @end indicate
> + * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
> + * only. The @alignment parameter can be used for GGTT alignment.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_tile *tile, struct xe_vm *vm,
>  			  size_t size, u64 start, u64 end,
> -			  enum ttm_bo_type type, u32 flags, u64 alignment)
> +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> +			  struct drm_exec *exec)
>  {
>  	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
> -				     flags, alignment);
> +				     flags, alignment, exec);
>  }
>  
> +/**
> + * xe_bo_create_locked() - Create a BO
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The local vm or NULL for external objects.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Create a locked xe BO with no range- nor alignment restrictions.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
> -				  enum ttm_bo_type type, u32 flags)
> +				  enum ttm_bo_type type, u32 flags,
> +				  struct drm_exec *exec)
>  {
>  	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
> -				     flags, 0);
> +				     flags, 0, exec);
>  }
>  
>  struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> @@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>  				u16 cpu_caching,
>  				u32 flags)
>  {
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>  						 cpu_caching, ttm_bo_type_device,
> -						 flags | XE_BO_FLAG_USER, 0);
> +						 flags | XE_BO_FLAG_USER, 0, exec);
>  	if (!IS_ERR(bo))
>  		xe_bo_unlock_vm_held(bo);
>  
> @@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>  			   struct xe_vm *vm, size_t size,
>  			   enum ttm_bo_type type, u32 flags)
>  {
> -	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
>  
>  	if (!IS_ERR(bo))
>  		xe_bo_unlock_vm_held(bo);
> @@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
>  	u64 end = offset == ~0ull ? offset : start + size;
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  
>  	if (flags & XE_BO_FLAG_STOLEN &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> @@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  
>  	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
>  				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
> -				       alignment);
> +				       alignment, exec);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	if (err)
>  		goto err_put;
>  
> @@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
>  /**
>   * xe_bo_pin_external - pin an external BO
>   * @bo: buffer object to be pinned
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
>   * BO. Unique call compared to xe_bo_pin as this function has its own set of
> @@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
>   *
>   * Returns 0 for success, negative error code otherwise.
>   */
> -int xe_bo_pin_external(struct xe_bo *bo)
> +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = xe_bo_device(bo);
>  	int err;
> @@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  	xe_assert(xe, xe_bo_is_user(bo));
>  
>  	if (!xe_bo_is_pinned(bo)) {
> -		err = xe_bo_validate(bo, NULL, false);
> +		err = xe_bo_validate(bo, NULL, false, exec);
>  		if (err)
>  			return err;
>  
> @@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  	return 0;
>  }
>  
> -int xe_bo_pin(struct xe_bo *bo)
> +/**
> + * xe_bo_pin() - Pin a kernel bo after potentially migrating it
> + * @bo: The kernel bo to pin.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Attempts to migrate a bo to @bo->placement. If that succeeds,
> + * pins the bo.
> + *
> + * Return: %0 on success, negative error code on migration failure.
> + */
> +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct ttm_place *place = &bo->placements[0];
>  	struct xe_device *xe = xe_bo_device(bo);
> @@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
>  	/* We only expect at most 1 pin */
>  	xe_assert(xe, !xe_bo_is_pinned(bo));
>  
> -	err = xe_bo_validate(bo, NULL, false);
> +	err = xe_bo_validate(bo, NULL, false, exec);
>  	if (err)
>  		return err;
>  
> @@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
>   *      NULL. Used together with @allow_res_evict.
>   * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
>   *                   reservation object.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Make sure the bo is in allowed placement, migrating it if necessary. If
>   * needed, other bos will be evicted. If bos selected for eviction shares
> @@ -2461,7 +2541,8 @@ void xe_bo_unpin(struct xe_bo *bo)
>   * Return: 0 on success, negative error code on failure. May return
>   * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
>   */
> -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> +		   struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
>  
>  	xe_vm_set_validating(vm, allow_res_evict);
>  	trace_xe_bo_validate(bo);
> +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
>  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
>  	xe_vm_clear_validating(vm, allow_res_evict);
>  
> @@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
>   * xe_bo_migrate - Migrate an object to the desired region id
>   * @bo: The buffer object to migrate.
>   * @mem_type: The TTM region type to migrate to.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Attempt to migrate the buffer object to the desired memory region. The
>   * buffer object may not be pinned, and must be locked.
> @@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
>   * Return: 0 on success. Negative error code on failure. In particular may
>   * return -EINTR or -ERESTARTSYS if signal pending.
>   */
> -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>  	struct ttm_operation_ctx ctx = {
> @@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
>  		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
>  	}
>  
> +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
>  	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
>  }
>  
>  /**
>   * xe_bo_evict - Evict an object to evict placement
>   * @bo: The buffer object to migrate.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * On successful completion, the object memory will be moved to evict
>   * placement. This function blocks until the object has been fully moved.
>   *
>   * Return: 0 on success. Negative error code on failure.
>   */
> -int xe_bo_evict(struct xe_bo *bo)
> +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = false,
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 8cce413b5235..b1b6cb622d71 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -10,6 +10,7 @@
>  
>  #include "xe_bo_types.h"
>  #include "xe_macros.h"
> +#include "xe_validation.h"
>  #include "xe_vm_types.h"
>  #include "xe_vm.h"
>  #include "xe_vram_types.h"
> @@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				     struct xe_tile *tile, struct dma_resv *resv,
>  				     struct ttm_lru_bulk_move *bulk, size_t size,
>  				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags);
> +				     u32 flags, struct drm_exec *exec);
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_tile *tile, struct xe_vm *vm,
>  			  size_t size, u64 start, u64 end,
> -			  enum ttm_bo_type type, u32 flags, u64 alignment);
> +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> +			  struct drm_exec *exec);
>  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
> -				  enum ttm_bo_type type, u32 flags);
> +				  enum ttm_bo_type type, u32 flags,
> +				  struct drm_exec *exec);
>  struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>  			   struct xe_vm *vm, size_t size,
>  			   enum ttm_bo_type type, u32 flags);
> @@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
>  	}
>  }
>  
> -int xe_bo_pin_external(struct xe_bo *bo);
> -int xe_bo_pin(struct xe_bo *bo);
> +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
> +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
>  void xe_bo_unpin_external(struct xe_bo *bo);
>  void xe_bo_unpin(struct xe_bo *bo);
> -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
> +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> +		   struct drm_exec *exec);
>  
>  static inline bool xe_bo_is_pinned(struct xe_bo *bo)
>  {
> @@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
>  
>  bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
>  
> -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
> -int xe_bo_evict(struct xe_bo *bo);
> +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
> +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
>  
>  int xe_bo_evict_pinned(struct xe_bo *bo);
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 346f857f3837..78a827d4e726 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  	struct drm_gem_object *obj = attach->dmabuf->priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	struct xe_device *xe = xe_bo_device(bo);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  	int ret;
>  
>  	/*
> @@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  		return -EINVAL;
>  	}
>  
> -	ret = xe_bo_migrate(bo, XE_PL_TT);
> +	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			drm_dbg(&xe->drm,
> @@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  		return ret;
>  	}
>  
> -	ret = xe_bo_pin_external(bo);
> +	ret = xe_bo_pin_external(bo, exec);
>  	xe_assert(xe, !ret);
>  
>  	return 0;
> @@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
>  	struct dma_buf *dma_buf = attach->dmabuf;
>  	struct drm_gem_object *obj = dma_buf->priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  	struct sg_table *sgt;
>  	int r = 0;
>  
> @@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
>  
>  	if (!xe_bo_is_pinned(bo)) {
>  		if (!attach->peer2peer)
> -			r = xe_bo_migrate(bo, XE_PL_TT);
> +			r = xe_bo_migrate(bo, XE_PL_TT, exec);
>  		else
> -			r = xe_bo_validate(bo, NULL, false);
> +			r = xe_bo_validate(bo, NULL, false, exec);
>  		if (r)
>  			return ERR_PTR(r);
>  	}
> @@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
>  		       direction == DMA_FROM_DEVICE);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  
>  	if (!reads)
>  		return 0;
>  
>  	/* Can we do interruptible lock here? */
>  	xe_bo_lock(bo, false);
> -	(void)xe_bo_migrate(bo, XE_PL_TT);
> +	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
>  	xe_bo_unlock(bo);
>  
>  	return 0;
> @@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  {
>  	struct dma_resv *resv = dma_buf->resv;
>  	struct xe_device *xe = to_xe_device(dev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
>  	int ret;
>  
>  	dma_resv_lock(resv, NULL);
>  	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>  				    0, /* Will require 1way or 2way for vm_bind */
> -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
> +				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
>  	if (IS_ERR(bo)) {
>  		ret = PTR_ERR(bo);
>  		goto error;
> @@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
>  {
>  	struct drm_gem_object *obj = attach->importer_priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  
> -	XE_WARN_ON(xe_bo_evict(bo));
> +	XE_WARN_ON(xe_bo_evict(bo, exec));
>  }
>  
>  static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 44364c042ad7..0bcb4fb9a10e 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -97,9 +97,13 @@
>  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
>  {
>  	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
> +	int ret;
>  
>  	/* The fence slot added here is intended for the exec sched job. */
> -	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> +	xe_vm_set_validation_exec(vm, &vm_exec->exec);
> +	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> +	xe_vm_set_validation_exec(vm, NULL);
> +	return ret;
>  }
>  
>  int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index e03222f5ac5a..a47c0131956b 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
>  }
>  
>  static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -				  u64 start, u64 end)
> +				  u64 start, u64 end, struct drm_exec *exec)
>  {
>  	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
>  	u8 tile_id = ggtt->tile->id;
> @@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
>  		return 0;
>  	}
>  
> -	err = xe_bo_validate(bo, NULL, false);
> +	err = xe_bo_validate(bo, NULL, false, exec);
>  	if (err)
>  		return err;
>  
> @@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
>   * @bo: the &xe_bo to be inserted
>   * @start: address where it will be inserted
>   * @end: end of the range where it will be inserted
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Return: 0 on success or a negative error code on failure.
>   */
>  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -			 u64 start, u64 end)
> +			 u64 start, u64 end, struct drm_exec *exec)
>  {
> -	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
> +	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
>  }
>  
>  /**
>   * xe_ggtt_insert_bo - Insert BO into GGTT
>   * @ggtt: the &xe_ggtt where bo will be inserted
>   * @bo: the &xe_bo to be inserted
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Return: 0 on success or a negative error code on failure.
>   */
> -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
> +		      struct drm_exec *exec)
>  {
> -	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> +	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> index fbe1e397d05d..75fc7a1efea7 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> @@ -10,6 +10,7 @@
>  
>  struct drm_printer;
>  struct xe_tile;
> +struct drm_exec;
>  
>  struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
>  int xe_ggtt_init_early(struct xe_ggtt *ggtt);
> @@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
>  void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
>  		    struct xe_bo *bo, u16 pat_index);
>  void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
> -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
>  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -			 u64 start, u64 end);
> +			 u64 start, u64 end, struct drm_exec *exec);
>  void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
>  u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index ab43dec52776..2c7f10cc423f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  		}
>  
>  		/* Migrate to VRAM, move should invalidate the VMA first */
> -		err = xe_bo_migrate(bo, vram->placement);
> +		err = xe_bo_migrate(bo, vram->placement, exec);
>  		if (err)
>  			return err;
>  	} else if (bo) {
>  		/* Create backing store if needed */
> -		err = xe_bo_validate(bo, vm, true);
> +		err = xe_bo_validate(bo, vm, true, exec);
>  		if (err)
>  			return err;
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index c8f0320d032f..906011671b60 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
>  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo;
> @@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
>  				 XE_BO_FLAG_NEEDS_2M |
>  				 XE_BO_FLAG_PINNED |
> -				 XE_BO_FLAG_PINNED_LATE_RESTORE);
> +				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> +				 exec);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	xe_bo_unlock(bo);
>  	if (unlikely(err)) {
>  		xe_bo_put(bo);
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index e35c6d4def20..39e3aa6df25a 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  	struct device *dev = xe->drm.dev;
>  	struct drm_buddy_block *block;
>  	struct list_head *blocks;
> +	struct drm_exec *exec;
>  	struct xe_bo *bo;
>  	ktime_t time_end = 0;
>  	int err, idx;
> @@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  		return -ENODEV;
>  
>  	xe_pm_runtime_get(xe);
> +	exec = XE_VALIDATION_UNIMPLEMENTED;
>  
>   retry:
>  	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
>  				 ttm_bo_type_device,
>  				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> -				 XE_BO_FLAG_CPU_ADDR_MIRROR);
> +				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> new file mode 100644
> index 000000000000..cc0684d24e02
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +#include "xe_bo.h"
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +
> +#include "xe_assert.h"
> +#include "xe_validation.h"
> +
> +#ifdef CONFIG_DRM_XE_DEBUG
> +/**
> + * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
> + * for validation.
> + * @xe: Pointer to the xe device.
> + * @exec: The drm_exec pointer to check.
> + * @obj: Pointer to the object subject to validation.
> + *
> + * NULL exec pointers are not allowed.
> + * For XE_VALIDATION_UNIMPLEMENTED, no checking.
> + * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
> + * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
> + * validation is a dma-buf, for which support for ww locking is
> + * not in place in the dma-buf layer.
> + */
> +void xe_validation_assert_exec(const struct xe_device *xe,
> +			       const struct drm_exec *exec,
> +			       const struct drm_gem_object *obj)
> +{
> +	xe_assert(xe, exec);
> +	if (IS_ERR(exec)) {
> +		switch (PTR_ERR(exec)) {
> +		case __XE_VAL_UNIMPLEMENTED:
> +			break;
> +		case __XE_VAL_UNSUPPORTED:
> +			xe_assert(xe, !!obj->dma_buf);
> +			break;
> +#if IS_ENABLED(CONFIG_KUNIT)
> +		case __XE_VAL_OPT_OUT:
> +			xe_assert(xe, current->kunit_test);
> +			break;
> +#endif
> +		default:
> +			xe_assert(xe, false);
> +		}
> +	}
> +}
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> new file mode 100644
> index 000000000000..db50feacad7a
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +#ifndef _XE_VALIDATION_H_
> +#define _XE_VALIDATION_H_
> +
> +#include <linux/dma-resv.h>
> +#include <linux/types.h>
> +
> +struct drm_exec;
> +struct drm_gem_object;
> +struct xe_device;
> +
> +#ifdef CONFIG_PROVE_LOCKING
> +/**
> + * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
> + * be initialized at this point.
> + */
> +static inline void xe_validation_lockdep(void)
> +{
> +	struct ww_acquire_ctx ticket;
> +
> +	ww_acquire_init(&ticket, &reservation_ww_class);
> +	ww_acquire_fini(&ticket);
> +}
> +#else
> +static inline void xe_validation_lockdep(void)
> +{
> +}
> +#endif
> +
> +/*
> + * Various values of the drm_exec pointer where we've not (yet)
> + * implemented full ww locking.
> + *
> + * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
> + * A lockdep check is made to ensure that a drm_exec locking
> + * transaction can actually take place where the macro is
> + * used. If this asserts, the exec pointer needs to be assigned
> + * higher up in the callchain and passed down.
> + *
> + * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
> + * the dma-buf layer doesn't support WW locking.
> + *
> + * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
> + * exhaustive eviction isn't necessary.
> + */
> +#define __XE_VAL_UNIMPLEMENTED -EINVAL
> +#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
> +				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
> +
> +#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
> +#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
> +
> +#define __XE_VAL_OPT_OUT -ENOMEM
> +#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
> +			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
> +#ifdef CONFIG_DRM_XE_DEBUG
> +void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
> +			       const struct drm_gem_object *obj);
> +#else
> +#define xe_validation_assert_exec(_xe, _exec, _obj)	\
> +	do {						\
> +		(void)_xe; (void)_exec; (void)_obj;	\
> +	} while (0)
> +#endif
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 12e661960244..600aaadb4bee 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
>  		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
>  			       &vm->rebind_list);
>  
> -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
>  	if (ret)
>  		return ret;
>  
> @@ -451,6 +451,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
>  	if (err)
>  		return err;
>  
> +	xe_vm_set_validation_exec(vm, exec);
>  	if (xe_vm_is_idle(vm)) {
>  		vm->preempt.rebind_deactivated = true;
>  		*done = true;
> @@ -516,6 +517,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  		err = xe_preempt_work_begin(&exec, vm, &done);
>  		drm_exec_retry_on_contention(&exec);
>  		if (err || done) {
> +			xe_vm_set_validation_exec(vm, NULL);
>  			drm_exec_fini(&exec);
>  			if (err && xe_vm_validate_should_retry(&exec, err, &end))
>  				err = -EAGAIN;
> @@ -565,6 +567,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  	up_read(&vm->userptr.notifier_lock);
>  
>  out_unlock:
> +	xe_vm_set_validation_exec(vm, NULL);
>  	drm_exec_fini(&exec);
>  out_unlock_outer:
>  	if (err == -EAGAIN) {
> @@ -1375,6 +1378,8 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
>  	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
>  	if (!err && bo && !bo->vm)
>  		err = drm_exec_lock_obj(exec, &bo->ttm.base);
> +	if (!err)
> +		xe_vm_set_validation_exec(vm, exec);

Do you have an imbalance here? I see this function called in xe_pf_begin
and xe_vma_destroy_unlocked, but I don't see a matching
xe_vm_set_validation_exec(vm, NULL) call anywhere.
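
If it is, the clear would presumably belong next to the drm_exec_fini()
in those callers; roughly (untested sketch against this series):

	drm_exec_until_all_locked(&exec) {
		err = xe_vm_lock_vma(&exec, vma);
		drm_exec_retry_on_contention(&exec);
	}
	if (!err)
		xe_vm_set_validation_exec(vm, NULL);
	drm_exec_fini(&exec);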

>  
>  	return err;
>  }
> @@ -2889,7 +2894,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
>  			err = drm_exec_lock_obj(exec, &bo->ttm.base);
>  		if (!err && validate)
>  			err = xe_bo_validate(bo, vm,
> -					     !xe_vm_in_preempt_fence_mode(vm));
> +					     !xe_vm_in_preempt_fence_mode(vm), exec);
>  	}
>  
>  	return err;
> @@ -3012,7 +3017,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
>  					    false);
>  		if (!err && !xe_vma_has_no_bo(vma))
>  			err = xe_bo_migrate(xe_vma_bo(vma),
> -					    region_to_mem_type[region]);
> +					    region_to_mem_type[region],
> +					    exec);
>  		break;
>  	}
>  	default:
> @@ -3052,6 +3058,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
>  	if (err)
>  		return err;
>  
> +	xe_vm_set_validation_exec(vm, exec);
>  	list_for_each_entry(op, &vops->list, link) {
>  		err = op_lock_and_prep(exec, vm, op);
>  		if (err)
> @@ -3850,10 +3857,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
>   */
>  int xe_vm_lock(struct xe_vm *vm, bool intr)
>  {
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	int ret;
> +
>  	if (intr)
> -		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> +		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> +	else
> +		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> +
> +	if (!ret)
> +		xe_vm_set_validation_exec(vm, exec);
>  
> -	return dma_resv_lock(xe_vm_resv(vm), NULL);
> +	return ret;
>  }
>  
>  /**
> @@ -3864,6 +3879,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>   */
>  void xe_vm_unlock(struct xe_vm *vm)
>  {
> +	xe_vm_set_validation_exec(vm, NULL);
>  	dma_resv_unlock(xe_vm_resv(vm));
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 2ecb417c19a2..4ba26eed7e96 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
>  	if (vm && !allow_res_evict) {
>  		xe_vm_assert_held(vm);
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> -		WRITE_ONCE(vm->validating, current);
> +		WRITE_ONCE(vm->validation.validating, current);
>  	}
>  }
>  
> @@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
>  {
>  	if (vm && !allow_res_evict) {
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> -		WRITE_ONCE(vm->validating, NULL);
> +		WRITE_ONCE(vm->validation.validating, NULL);
>  	}
>  }
>  
> @@ -357,13 +357,40 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
>  static inline bool xe_vm_is_validating(struct xe_vm *vm)
>  {
>  	/* Pairs with WRITE_ONCE in xe_vm_is_validating() */
> -	if (READ_ONCE(vm->validating) == current) {
> +	if (READ_ONCE(vm->validation.validating) == current) {
>  		xe_vm_assert_held(vm);
>  		return true;
>  	}
>  	return false;
>  }
>  
> +/**
> + * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
> + * @vm: The vm we want to register a drm_exec object with.
> + * @exec: The exec object we want to register.
> + *
> + * Set the drm_exec object used to lock the vm's resv.
> + */
> +static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
> +{
> +	xe_vm_assert_held(vm);
> +	vm->validation._exec = exec;
> +}
> +
> +/**
> + * xe_vm_validation_exec() - Accessor to read the drm_exec object
> + * @vm: The vm to read the registered drm_exec object from.
> + *
> + * Return: The drm_exec object used to lock the vm's resv. The value
> + * is a valid pointer, %NULL, or one of the special values defined in
> + * xe_validation.h.
> + */
> +static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
> +{
> +	xe_vm_assert_held(vm);
> +	return vm->validation._exec;
> +}
> +
>  /**
>   * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
>   * a valid GPU mapping
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 8a07feef503b..2f88808e36bb 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -312,19 +312,35 @@ struct xe_vm {
>  		bool capture_once;
>  	} error_capture;
>  
> +	/**
> +	 * @validation: Validation data only valid with the vm resv held.
> +	 * Note: This is really task state of the task holding the vm resv,
> +	 * and moving forward we should come up with a better way of
> +	 * passing this down the call-chain.

I've already mentioned this: attaching the _exec to xe_vma_ops might be
a good option, since xe_vma_ops only exists for the duration of the bind
(i.e., it is a stack variable), so you'd only need to set it (i.e., no
clear required).
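
Roughly, as an untested sketch (the field name here is made up):

	struct xe_vma_ops {
		...
		/*
		 * drm_exec context for this bind. Lives only as long as
		 * the on-stack xe_vma_ops, so it never needs clearing.
		 */
		struct drm_exec *exec;
	};

with vm_bind_ioctl_ops_lock_and_prep() setting vops->exec = exec instead
of xe_vm_set_validation_exec(vm, exec).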

I think patch largely makes sense.

Matt 

> +	 */
> +	struct {
> +		/**
> +		 * @validation.validating: The task that is currently making
> +		 * bos resident for this vm.
> +		 * Protected by the VM's resv for writing. Opportunistic reading can be done
> +		 * using READ_ONCE. Note: This is a workaround for the
> +		 * TTM eviction_valuable() callback not being passed a struct
> +		 * ttm_operation_context(). Future work might want to address this.
> +		 */
> +		struct task_struct *validating;
> +		/**
> +		 * @validation._exec: The drm_exec context used when locking the vm resv.
> +		 * Protected by the vm's resv.
> +		 */
> +		struct drm_exec *_exec;
> +	} validation;
> +
>  	/**
>  	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
>  	 * protected by the vm resv.
>  	 */
>  	u64 tlb_flush_seqno;
> -	/**
> -	 * @validating: The task that is currently making bos resident for this vm.
> -	 * Protected by the VM's resv for writing. Opportunistic reading can be done
> -	 * using READ_ONCE. Note: This is a workaround for the
> -	 * TTM eviction_valuable() callback not being passed a struct
> -	 * ttm_operation_context(). Future work might want to address this.
> -	 */
> -	struct task_struct *validating;
>  	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
>  	bool batch_invalidate_tlb;
>  	/** @xef: XE file handle for tracking this VM's drm client */
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
@ 2025-08-13 17:25   ` Matthew Brost
  2025-08-15 15:04     ` Thomas Hellström
  2025-08-14  2:33   ` Matthew Brost
  2025-08-17 14:05   ` [05/15] " Simon Richter
  2 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 17:25 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> Introduce a validation wrapper xe_validation_guard() as a helper
> intended to be used around drm_exec transactions that perform
> validations. Once TTM can handle exhaustive eviction we could
> remove this wrapper or make it mostly a NO-OP unless other
> functionality is added to it.
> 
> Currently the wrapper takes a read lock upon entry and if the
> transaction hits an OOM, all locks are released and the
> transaction is retried with a write-lock. If all other
> validations participate in this scheme, the transaction with
> the write lock will be the only transaction validating and
> should have access to all available non-pinned memory.
> 
> There is currently a problem in that TTM converts -EDEADLOCKS to
> -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> -ENOMEMs without actually having run out of memory. We abuse
> ww_mutex internals to detect such situations until TTM is fixed
> to not convert the error code. In the meantime, injecting
> ww_mutex slowpath -EDEADLOCKs is a good way to test
> the implementation in the absence of real OOMs.
> 
> Just introduce the wrapper in this commit. It will be hooked up
> to the driver in following commits.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_validation.c | 199 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
>  2 files changed, 306 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> index cc0684d24e02..cd1424f04237 100644
> --- a/drivers/gpu/drm/xe/xe_validation.c
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -5,6 +5,7 @@
>  #include "xe_bo.h"
>  #include <drm/drm_exec.h>
>  #include <drm/drm_gem.h>
> +#include <drm/drm_gpuvm.h>
>  
>  #include "xe_assert.h"
>  #include "xe_validation.h"
> @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct xe_device *xe,
>  	}
>  }
>  #endif
> +
> +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> +{
> +	struct xe_validation_device *val = ctx->val;
> +	int ret = 0;
> +
> +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> +		if (ctx->request_exclusive)
> +			ret = down_write_killable(&val->lock);
> +		else
> +			ret = down_read_interruptible(&val->lock);
> +	} else {
> +		if (ctx->request_exclusive)
> +			down_write(&val->lock);
> +		else
> +			down_read(&val->lock);
> +	}
> +
> +	if (!ret) {
> +		ctx->lock_held = true;
> +		ctx->lock_held_exclusive = ctx->request_exclusive;
> +	}
> +
> +	return ret;
> +}
> +
> +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> +{
> +	if (!ctx->lock_held)
> +		return;
> +
> +	if (ctx->lock_held_exclusive)
> +		up_write(&ctx->val->lock);
> +	else
> +		up_read(&ctx->val->lock);
> +
> +	ctx->lock_held = false;
> +}
> +
> +/**
> + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> + * @ctx: The xe_validation_ctx to initialize.
> + * @val: The xe_validation_device representing the validation domain.
> + * @exec: The struct drm_exec to use for the transaction.
> + * @flags: The flags to use for drm_exec initialization.
> + * @nr: The number of anticipated buffer object locks. Forwarded to
> + * drm_exec initialization.
> + * @exclusive: Whether to use exclusive locking already on first validation.
> + *
> + * Initialize and lock an xe_validation transaction using the validation domain
> + * represented by @val. Also initialize the drm_exec object forwarding
> + * @flags and @nr to the drm_exec initialization. The @exclusive parameter should
> + * typically be set to false to avoid locking out other validators from the
> + * domain until an OOM is hit. For testing- or final-attempt purposes it can,
> + * however, be set to true.
> + *
> + * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
> + * signal pending.
> + */
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> +			   bool exclusive)
> +{
> +	int ret;
> +
> +	ctx->exec = exec;
> +	ctx->val = val;
> +	ctx->lock_held = false;
> +	ctx->lock_held_exclusive = false;
> +	ctx->request_exclusive = exclusive;
> +	ctx->flags = flags;
> +	ctx->nr = nr;
> +
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	drm_exec_init(exec, flags, nr);
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> +/*
> + * This abuses both drm_exec and ww_mutex internals and should be
> + * replaced by checking for -EDEADLK when we can make TTM
> + * stop converting -EDEADLK to -ENOMEM.
> + * An alternative is to not have exhaustive eviction with
> + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> + */

I vote to keep this the way you have it and live with the abuse until
TTM is updated.

> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return !!exec->ticket.contending_lock;
> +}
> +
> +#else
> +
> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return false;
> +}
> +
> +#endif
> +
> +static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
> +{
> +	if (ret == -ENOMEM &&
> +	    ((ctx->request_exclusive &&
> +	      xe_validation_contention_injected(ctx->exec)) ||
> +	     !ctx->request_exclusive)) {
> +		ctx->request_exclusive = true;

Can the locking across multiple GPUs fail when request_exclusive is held
and the GPUs are sharing dma-buffers? I suppose we'd need true WW
locking throughout the stack (TTM) and a ticketed retry to guarantee
forward progress.
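
E.g., a hypothetical scenario: client A holds device 1's exclusive
rwsem and validates a shared dma-buf, while client B does the same
holding device 2's exclusive rwsem. Neither rwsem excludes the other,
so the "only transaction allocating" assumption no longer holds for
the shared bo; only ww-ticketed locking across the shared resvs would
give a cross-device forward-progress guarantee.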

> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
> + * transaction.
> + * @ctx: An uninitialized xe_validation_ctx.
> + * @vm_exec: An initialized struct drm_gpuvm_exec.
> + * @val: The validation domain.
> + *
> + * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
> + * transaction and therefore doesn't lend itself very well to be using
> + * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
> + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
> + *
> + * Return: %0 on success, negative error code on failure.
> + */
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> +			    struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val)
> +{
> +	int ret;
> +
> +	memset(ctx, 0, sizeof(*ctx));
> +	ctx->exec = &vm_exec->exec;
> +	ctx->flags = vm_exec->flags;
> +	ctx->val = val;
> +retry:
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	ret = drm_gpuvm_exec_lock(vm_exec);
> +	if (ret) {
> +		xe_validation_unlock(ctx);
> +		if (__xe_validation_should_retry(ctx, ret))
> +			goto retry;
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * xe_validation_ctx_fini() - Finalize a validation transaction
> + * @ctx: The Validation transaction to finalize.
> + *
> + * Finalize a validation transaction and its related drm_exec transaction.
> + */
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> +{
> +	drm_exec_fini(ctx->exec);
> +	xe_validation_unlock(ctx);
> +}
> +
> +/**
> + * xe_validation_should_retry() - Determine if a validation transaction should retry
> + * @ctx: The validation transaction.
> + * @ret: Pointer to a return value variable.
> + *
> + * Determines whether a validation transaction should retry based on the
> + * internal transaction state and the return value pointed to by @ret.
> + * If a validation should be retried, the transaction is prepared for that,
> + * and the validation lock might be re-locked in exclusive mode, and *@ret
> + * is set to %0. If the re-locking fails, typically due to interruptible
> + * locking with signal pending, *@ret is instead set to -EINTR and the
> + * function returns %false.
> + *
> + * Return: %true if validation should be retried, %false otherwise.
> + */
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> +{
> +	if (__xe_validation_should_retry(ctx, *ret)) {
> +		drm_exec_fini(ctx->exec);
> +		*ret = 0;
> +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> +			xe_validation_unlock(ctx);
> +			*ret = xe_validation_lock(ctx);
> +		}
> +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> +		return !*ret;
> +	}
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> index db50feacad7a..a708c260cf18 100644
> --- a/drivers/gpu/drm/xe/xe_validation.h
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -7,9 +7,11 @@
>  
>  #include <linux/dma-resv.h>
>  #include <linux/types.h>
> +#include <linux/rwsem.h>
>  
>  struct drm_exec;
>  struct drm_gem_object;
> +struct drm_gpuvm_exec;
>  struct xe_device;
>  
>  #ifdef CONFIG_PROVE_LOCKING
> @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
>  	} while (0)
>  #endif
>  
> +/**
> + * struct xe_validation_device - The domain for exhaustive eviction
> + * @lock: The lock used to exclude other processes from allocating graphics memory
> + *
> + * The struct xe_validation_device represents the domain for which we want to use
> + * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
> + * but when graphics memory allocation fails, it is retried with the write mode held.
> + */
> +struct xe_validation_device {
> +	struct rw_semaphore lock;
> +};
> +
> +/**
> + * struct xe_validation_ctx - A struct drm_exec subclass with support for
> + * exhaustive eviction
> + * @exec: The drm_exec object base class. Note that we use a pointer instead of
> + * embedding to avoid diamond inheritance.
> + * @val: The exhaustive eviction domain.
> + * @lock_held: Whether the domain lock is currently held.
> + * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
> + * @request_exclusive: Whether to lock exclusively (write mode) the next time
> + * the domain lock is locked.
> + * @flags: The drm_exec flags used for drm_exec (re-)initialization.
> + * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
> + */
> +struct xe_validation_ctx {
> +	struct drm_exec *exec;
> +	struct xe_validation_device *val;
> +	bool lock_held;
> +	bool lock_held_exclusive;
> +	bool request_exclusive;
> +	u32 flags;
> +	unsigned int nr;
> +};
> +
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> +			   bool exclusive);
> +
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val);
> +
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> +
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> +
> +/**
> + * xe_validation_retry_on_oom() - Retry on oom in an xe_validation transaction
> + * @_ctx: Pointer to the xe_validation_ctx
> + * @_ret: The current error value possibly holding -ENOMEM
> + *
> + * Use this in way similar to drm_exec_retry_on_contention().
> + * If @_ret contains -ENOMEM the transaction is restarted once in a way that
> + * blocks other transactions and allows exhaustive eviction. If the transaction
> + * was already restarted once, just return the -ENOMEM. May also set
> + * *@_ret to -EINTR if not retrying and waits are interruptible.
> + * May only be used within a drm_exec_until_all_locked() loop.
> + */
> +#define xe_validation_retry_on_oom(_ctx, _ret)				\
> +	do {								\
> +		if (xe_validation_should_retry(_ctx, _ret))		\
> +			goto *__drm_exec_retry_ptr;			\
> +	} while (0)
> +
> +/**
> + * xe_validation_device_init() - Initialize a struct xe_validation_device
> + * @val: The xe_validation_device to init.
> + */
> +static inline void
> +xe_validation_device_init(struct xe_validation_device *val)
> +{
> +	init_rwsem(&val->lock);
> +}
> +
> +/*
> + * Make guard() and scoped_guard() work with xe_validation_ctx
> + * so that we can exit transactions without caring about the
> + * cleanup.
> + */
> +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,

I think this should be 'if (_T)', right?

> +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> +	       _ret ? NULL : _ctx; }),

Or here '_ret ? ERR_PTR(_ret) : _ctx;'

One or the other, if I correctly understand how DEFINE_CLASS works;
mixing them means a failed init hands NULL to a destructor that only
skips ERR_PTR values.
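
I.e., either of these untested sketches would be self-consistent:

	/* Constructor returns NULL on failure, destructor checks for it: */
	DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
		     if (_T) xe_validation_ctx_fini(_T);,
		     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
		       _ret ? NULL : _ctx; }), ...);

	/* Or the constructor returns an ERR_PTR() and the destructor keeps IS_ERR(): */
	DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
		     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
		     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
		       _ret ? ERR_PTR(_ret) : _ctx; }), ...);

As posted, IS_ERR(NULL) is false, so the error path would call
xe_validation_ctx_fini() on a NULL context.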

Matt

> +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> +{return *_T; }
> +#define class_xe_validation_is_conditional false
> +
> +/**
> + * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
> + * @_ctx: The xe_validation_ctx.
> + * @_val: The xe_validation_device.
> + * @_exec: The struct drm_exec object
> + * @_flags: Flags for the drm_exec transaction. See the struct drm_exec documentation!
> + * @_ret: Return in / out parameter. May be set by this macro. Typically 0 when called.
> + * @_excl: Whether to start in exclusive mode already in the first iteration.
> + *
> + * This macro will initiate a drm_exec transaction with additional support for
> + * exhaustive eviction.
> + */
> +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
> +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
> +	drm_exec_until_all_locked(_exec)
> +
>  #endif
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked()
  2025-08-13 10:51 ` [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
@ 2025-08-13 21:33   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 21:33 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:18PM +0200, Thomas Hellström wrote:
> Don't start external function names with underscores.
> Rename to xe_bo_init_locked().
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_bo.c      | 39 ++++++++++++++++-----------------
>  drivers/gpu/drm/xe/xe_bo.h      | 10 ++++-----
>  drivers/gpu/drm/xe/xe_dma_buf.c |  6 ++---
>  3 files changed, 27 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index dd1e0e9957e0..23b28eeef59f 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1161,10 +1161,10 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
>  		goto out_unlock_bo;
>  
> -	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					XE_BO_FLAG_PINNED, exec);
> +	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> +				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +				   XE_BO_FLAG_PINNED, exec);
>  	if (IS_ERR(backup)) {
>  		ret = PTR_ERR(backup);
>  		goto out_unlock_bo;
> @@ -1240,11 +1240,10 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
>  		goto out_unlock_bo;
>  
>  	if (!backup) {
> -		backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv,
> -						NULL, xe_bo_size(bo),
> -						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -						XE_BO_FLAG_PINNED, exec);
> +		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> +					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +					   XE_BO_FLAG_PINNED, exec);
>  		if (IS_ERR(backup)) {
>  			ret = PTR_ERR(backup);
>  			goto out_unlock_bo;
> @@ -1861,7 +1860,7 @@ void xe_bo_free(struct xe_bo *bo)
>  }
>  
>  /**
> - * ___xe_bo_create_locked() - Initialize or create an xe_bo.
> + * xe_bo_init_locked() - Initialize or create an xe_bo.
>   * @xe: The xe device.
>   * @bo: An already allocated buffer object or NULL
>   * if the function should allocate a new one.
> @@ -1881,11 +1880,11 @@ void xe_bo_free(struct xe_bo *bo)
>   *
>   * Return: The buffer object on success. Negative error pointer on failure.
>   */
> -struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> -				     struct xe_tile *tile, struct dma_resv *resv,
> -				     struct ttm_lru_bulk_move *bulk, size_t size,
> -				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags, struct drm_exec *exec)
> +struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
> +				struct xe_tile *tile, struct dma_resv *resv,
> +				struct ttm_lru_bulk_move *bulk, size_t size,
> +				u16 cpu_caching, enum ttm_bo_type type,
> +				u32 flags, struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -2077,11 +2076,11 @@ __xe_bo_create_locked(struct xe_device *xe,
>  		}
>  	}
>  
> -	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
> -				    vm && !xe_vm_in_fault_mode(vm) &&
> -				    flags & XE_BO_FLAG_USER ?
> -				    &vm->lru_bulk_move : NULL, size,
> -				    cpu_caching, type, flags, exec);
> +	bo = xe_bo_init_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
> +			       vm && !xe_vm_in_fault_mode(vm) &&
> +			       flags & XE_BO_FLAG_USER ?
> +			       &vm->lru_bulk_move : NULL, size,
> +			       cpu_caching, type, flags, exec);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index c6bb90ca5c2e..a625806deeb6 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -89,11 +89,11 @@ struct sg_table;
>  struct xe_bo *xe_bo_alloc(void);
>  void xe_bo_free(struct xe_bo *bo);
>  
> -struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> -				     struct xe_tile *tile, struct dma_resv *resv,
> -				     struct ttm_lru_bulk_move *bulk, size_t size,
> -				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags, struct drm_exec *exec);
> +struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
> +				struct xe_tile *tile, struct dma_resv *resv,
> +				struct ttm_lru_bulk_move *bulk, size_t size,
> +				u16 cpu_caching, enum ttm_bo_type type,
> +				u32 flags, struct drm_exec *exec);
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_tile *tile, struct xe_vm *vm,
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 56df1d84df21..ca6e397828ad 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -239,9 +239,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  		if (ret)
>  			goto error;
>  
> -		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> -					    0, /* Will require 1way or 2way for vm_bind */
> -					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
> +		bo = xe_bo_init_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> +				       0, /* Will require 1way or 2way for vm_bind */
> +				       ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
>  		drm_exec_retry_on_contention(&exec);
>  		if (IS_ERR(bo)) {
>  			ret = PTR_ERR(bo);
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
@ 2025-08-13 21:37   ` Matthew Brost
  2025-08-15 15:05     ` Thomas Hellström
  2025-08-14 20:37   ` Matthew Brost
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 21:37 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:17PM +0200, Thomas Hellström wrote:
> Convert dma-buf migration to XE_PL_TT and dma-buf import to
> support exhaustive eviction, using xe_validation_guard().
> It seems unlikely that the import would result in an -ENOMEM,
> but convert import anyway for completeness.
> 
> The dma-buf map_attachment() functionality unfortunately doesn't
> support passing a drm_exec, which means that foreign devices
> validating a dma-buf that we exported will not, unless they are
> xeKMD devices, participate in the exhaustive eviction scheme.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_dma_buf.c | 59 +++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 78a827d4e726..56df1d84df21 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -163,16 +163,27 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
>  		       direction == DMA_FROM_DEVICE);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	int ret = 0;
>  
>  	if (!reads)
>  		return 0;
>  
>  	/* Can we do interruptible lock here? */
> -	xe_bo_lock(bo, false);
> -	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> -	xe_bo_unlock(bo);
> -
> +	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			goto out;
> +
> +		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &ret);
> +	}
> +out:
> +	/* If we failed, cpu-access takes place in current placement. */
> +	(void)ret;

Do you need the above line of code? I don't see this often in kernel code.

Nit aside, patch LGTM.

Matt
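
For reference, a minimal sketch of how the function reads with the cast
simply dropped; the trailing comment alone then documents that a failed
migration is intentionally ignored. This mirrors the hunk above and is
an illustration only, not part of the series:

static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
				       enum dma_data_direction direction)
{
	struct drm_gem_object *obj = dma_buf->priv;
	struct xe_bo *bo = gem_to_xe_bo(obj);
	bool reads = (direction == DMA_BIDIRECTIONAL ||
		      direction == DMA_FROM_DEVICE);
	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int ret = 0;

	if (!reads)
		return 0;

	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			goto out;

		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &ret);
	}
out:
	/* If we failed, cpu-access takes place in the current placement. */
	return 0;
}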

>  	return 0;
>  }
>  
> @@ -211,24 +222,38 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  {
>  	struct dma_resv *resv = dma_buf->resv;
>  	struct xe_device *xe = to_xe_device(dev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_gem_object *dummy_obj;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	int ret;
> -
> -	dma_resv_lock(resv, NULL);
> -	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> -				    0, /* Will require 1way or 2way for vm_bind */
> -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> -	if (IS_ERR(bo)) {
> -		ret = PTR_ERR(bo);
> -		goto error;
> +	int ret = 0;
> +
> +	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> +	if (!dummy_obj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dummy_obj->resv = resv;
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
> +		ret = drm_exec_lock_obj(&exec, dummy_obj);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			goto error;
> +
> +		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> +					    0, /* Will require 1way or 2way for vm_bind */
> +					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			goto error;
> +		}
>  	}
> -	dma_resv_unlock(resv);
> +	drm_gem_object_put(dummy_obj);
>  
>  	return &bo->ttm.base;
>  
>  error:
> -	dma_resv_unlock(resv);
>  	return ERR_PTR(ret);
>  }
>  
> -- 
> 2.50.1
> 


* Re: [PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 09/15] drm/xe: Convert the CPU fault handler " Thomas Hellström
@ 2025-08-13 22:06   ` Matthew Brost
  2025-08-15 15:16     ` Thomas Hellström
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-13 22:06 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> The CPU fault handler may populate bos and migrate, and in doing
> so might interfere with other tasks validating.
> 
> Convert it for exhaustive eviction. Doing this properly without
> potentially introducing stalls while the mmap lock is held requires
> TTM work. In the meantime, let's live with the stalls, which would
> typically happen under memory pressure.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 5e40b6cb8d2a..dd1e0e9957e0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	struct xe_device *xe = to_xe_device(ddev);
>  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
>  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> -	struct drm_exec *exec;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	vm_fault_t ret;
>  	int idx;
>  
>  	if (needs_rpm)
>  		xe_pm_runtime_get(xe);
>  
> -	exec = XE_VALIDATION_UNIMPLEMENTED;
> +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> +				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> +		return VM_FAULT_NOPAGE;

Any particular reason to not use xe_validation_guard here?

Matt
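
For context: xe_validation_guard() expands to a scoped
drm_exec_until_all_locked() loop (see patch 5), while this path takes
its buffer lock through ttm_bo_vm_reserve() and drops it manually, so
the bare init/fini pairing arguably fits the flow better. Condensed,
the shape the patch uses is:

	struct xe_validation_ctx ctx;
	struct drm_exec exec;

	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
		return VM_FAULT_NOPAGE;

	/* ... locking via ttm_bo_vm_reserve(), fault handling ... */

	xe_validation_ctx_fini(&ctx);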

> +
>  	ret = ttm_bo_vm_reserve(tbo, vmf);
>  	if (ret)
>  		goto out;
> @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	if (drm_dev_enter(ddev, &idx)) {
>  		trace_xe_bo_cpu_fault(bo);
>  
> -		xe_validation_assert_exec(xe, exec, &tbo->base);
> +		xe_validation_assert_exec(xe, &exec, &tbo->base);
>  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
>  					       TTM_BO_VM_NUM_PREFAULT);
>  		drm_dev_exit(idx);
> @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  
>  	dma_resv_unlock(tbo->base.resv);
>  out:
> +	xe_validation_ctx_fini(&ctx);
>  	if (needs_rpm)
>  		xe_pm_runtime_put(xe);
>  
> -- 
> 2.50.1
> 


* Re: [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
@ 2025-08-14  2:23   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  2:23 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:12PM +0200, Thomas Hellström wrote:
> Use the xe_validation_guard() to convert xe_bo_create_user()
> for exhaustive eviction.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
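
For readers skimming the hunks below: after this change, callers that
already hold a validation transaction (vm-private bos) pass their
drm_exec, while standalone callers pass NULL and the helper initiates
its own guarded transaction. A usage sketch of the new signature:

	/* vm-private bo, created under the caller's transaction: */
	bo = xe_bo_create_user(xe, vm, size, DRM_XE_GEM_CPU_CACHING_WB,
			       flags, &exec);

	/* standalone bo; the helper runs its own validation guard: */
	bo = xe_bo_create_user(xe, NULL, size, DRM_XE_GEM_CPU_CACHING_WB,
			       flags, NULL);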

> ---
>  drivers/gpu/drm/xe/tests/xe_bo.c      |  16 ++--
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c |   4 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c |  12 +--
>  drivers/gpu/drm/xe/xe_bo.c            | 116 +++++++++++++++++---------
>  drivers/gpu/drm/xe/xe_bo.h            |   9 +-
>  drivers/gpu/drm/xe/xe_device.c        |   2 +
>  drivers/gpu/drm/xe/xe_device_types.h  |   3 +
>  drivers/gpu/drm/xe/xe_vm.c            |  14 ++++
>  drivers/gpu/drm/xe/xe_vm.h            |   2 +
>  9 files changed, 116 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index 06ceba6c3c25..42f914692a02 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -139,8 +139,8 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
>  	else
>  		kunit_info(test, "Testing system memory\n");
>  
> -	bo = xe_bo_create_user(xe, NULL, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
> -			       bo_flags);
> +	bo = xe_bo_create_user(xe, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
> +			       bo_flags, exec);
>  	if (IS_ERR(bo)) {
>  		KUNIT_FAIL(test, "Failed to create bo.\n");
>  		return;
> @@ -220,18 +220,18 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  
>  	for (i = 0; i < 2; ++i) {
>  		xe_vm_lock(vm, false);
> -		bo = xe_bo_create_user(xe, NULL, vm, 0x10000,
> +		bo = xe_bo_create_user(xe, vm, 0x10000,
>  				       DRM_XE_GEM_CPU_CACHING_WC,
> -				       bo_flags);
> +				       bo_flags, exec);
>  		xe_vm_unlock(vm);
>  		if (IS_ERR(bo)) {
>  			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
>  			break;
>  		}
>  
> -		external = xe_bo_create_user(xe, NULL, NULL, 0x10000,
> +		external = xe_bo_create_user(xe, NULL, 0x10000,
>  					     DRM_XE_GEM_CPU_CACHING_WC,
> -					     bo_flags);
> +					     bo_flags, NULL);
>  		if (IS_ERR(external)) {
>  			KUNIT_FAIL(test, "external bo create err=%pe\n", external);
>  			goto cleanup_bo;
> @@ -497,9 +497,9 @@ static int shrink_test_run_device(struct xe_device *xe)
>  		INIT_LIST_HEAD(&link->link);
>  
>  		/* We can create bos using WC caching here. But it is slower. */
> -		bo = xe_bo_create_user(xe, NULL, NULL, XE_BO_SHRINK_SIZE,
> +		bo = xe_bo_create_user(xe, NULL, XE_BO_SHRINK_SIZE,
>  				       DRM_XE_GEM_CPU_CACHING_WB,
> -				       XE_BO_FLAG_SYSTEM);
> +				       XE_BO_FLAG_SYSTEM, NULL);
>  		if (IS_ERR(bo)) {
>  			if (bo != ERR_PTR(-ENOMEM) && bo != ERR_PTR(-ENOSPC) &&
>  			    bo != ERR_PTR(-EINTR) && bo != ERR_PTR(-ERESTARTSYS))
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index 965dd3280468..8126b35f4aeb 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -122,8 +122,8 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  		size = SZ_64K;
>  
>  	kunit_info(test, "running %s\n", __func__);
> -	bo = xe_bo_create_user(xe, NULL, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
> -			       params->mem_mask);
> +	bo = xe_bo_create_user(xe, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
> +			       params->mem_mask, NULL);
>  	if (IS_ERR(bo)) {
>  		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
>  			   PTR_ERR(bo));
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index dfb445d09759..afa794e56065 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -642,11 +642,11 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	struct drm_exec *exec;
>  	long ret;
>  
> -	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> +	sys_bo = xe_bo_create_user(xe, NULL, SZ_4M,
>  				   DRM_XE_GEM_CPU_CACHING_WC,
>  				   XE_BO_FLAG_SYSTEM |
>  				   XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				   XE_BO_FLAG_PINNED);
> +				   XE_BO_FLAG_PINNED, NULL);
>  
>  	if (IS_ERR(sys_bo)) {
>  		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
> @@ -669,10 +669,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  	xe_bo_unlock(sys_bo);
>  
> -	ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> +	ccs_bo = xe_bo_create_user(xe, NULL, SZ_4M,
>  				   DRM_XE_GEM_CPU_CACHING_WC,
>  				   bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				   XE_BO_FLAG_PINNED);
> +				   XE_BO_FLAG_PINNED, NULL);
>  
>  	if (IS_ERR(ccs_bo)) {
>  		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
> @@ -694,10 +694,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  	xe_bo_unlock(ccs_bo);
>  
> -	vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> +	vram_bo = xe_bo_create_user(xe, NULL, SZ_4M,
>  				    DRM_XE_GEM_CPU_CACHING_WC,
>  				    bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				    XE_BO_FLAG_PINNED);
> +				    XE_BO_FLAG_PINNED, NULL);
>  	if (IS_ERR(vram_bo)) {
>  		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
>  			   PTR_ERR(vram_bo));
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index e71addf51ed0..5e40b6cb8d2a 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2185,30 +2185,66 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				     flags, 0, exec);
>  }
>  
> -struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> -				struct xe_vm *vm, size_t size,
> -				u16 cpu_caching,
> -				u32 flags)
> -{
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> -	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> -						 cpu_caching, ttm_bo_type_device,
> -						 flags | XE_BO_FLAG_USER, 0, exec);
> -	if (!IS_ERR(bo))
> -		xe_bo_unlock_vm_held(bo);
> +static struct xe_bo *xe_bo_create_novm(struct xe_device *xe, struct xe_tile *tile,
> +				       size_t size, u16 cpu_caching,
> +				       enum ttm_bo_type type, u32 flags,
> +				       u64 alignment, bool intr)
> +{
> +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *bo;
> +	int ret = 0;
>  
> -	return bo;
> +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
> +		bo = __xe_bo_create_locked(xe, tile, NULL, size, 0, ~0ULL,
> +					   cpu_caching, type, flags, alignment, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +		} else {
> +			xe_bo_unlock(bo);
> +		}
> +	}
> +
> +	return ret ? ERR_PTR(ret) : bo;
>  }
>  
> -struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> -			   struct xe_vm *vm, size_t size,
> -			   enum ttm_bo_type type, u32 flags)
> +/**
> + * xe_bo_create_user() - Create a user BO
> + * @xe: The xe device.
> + * @vm: The local vm or NULL for external objects.
> + * @size: The storage size to use for the bo.
> + * @cpu_caching: The caching mode to be used for system backing store.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction, or NULL
> + * if such a transaction should be initiated by the call.
> + *
> + * Create a bo on behalf of user-space.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
> +struct xe_bo *xe_bo_create_user(struct xe_device *xe,
> +				struct xe_vm *vm, size_t size,
> +				u16 cpu_caching,
> +				u32 flags, struct drm_exec *exec)
>  {
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> -	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
> +	struct xe_bo *bo;
> +
> +	flags |= XE_BO_FLAG_USER;
>  
> -	if (!IS_ERR(bo))
> -		xe_bo_unlock_vm_held(bo);
> +	if (vm || exec) {
> +		xe_assert(xe, exec);
> +		bo = __xe_bo_create_locked(xe, NULL, vm, size, 0, ~0ULL,
> +					   cpu_caching, ttm_bo_type_device,
> +					   flags, 0, exec);
> +		if (!IS_ERR(bo))
> +			xe_bo_unlock_vm_held(bo);
> +	} else {
> +		bo = xe_bo_create_novm(xe, NULL, size, cpu_caching,
> +				       ttm_bo_type_device, flags, 0, true);
> +	}
>  
>  	return bo;
>  }
> @@ -2757,8 +2793,9 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  	struct xe_device *xe = to_xe_device(dev);
>  	struct xe_file *xef = to_xe_file(file);
>  	struct drm_xe_gem_create *args = data;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_vm *vm = NULL;
> -	ktime_t end = 0;
>  	struct xe_bo *bo;
>  	unsigned int bo_flags;
>  	u32 handle;
> @@ -2832,25 +2869,26 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  			return -ENOENT;
>  	}
>  
> -retry:
> -	if (vm) {
> -		err = xe_vm_lock(vm, true);
> -		if (err)
> -			goto out_vm;
> +	err = 0;
> +	xe_validation_guard(&ctx, &xe->val, &exec,
> +			    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
> +		if (vm) {
> +			err = xe_vm_drm_exec_lock(vm, &exec);
> +			drm_exec_retry_on_contention(&exec);
> +			if (err)
> +				break;
> +		}
> +		bo = xe_bo_create_user(xe, vm, args->size, args->cpu_caching,
> +				       bo_flags, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			break;
> +		}
>  	}
> -
> -	bo = xe_bo_create_user(xe, NULL, vm, args->size, args->cpu_caching,
> -			       bo_flags);
> -
> -	if (vm)
> -		xe_vm_unlock(vm);
> -
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		if (xe_vm_validate_should_retry(NULL, err, &end))
> -			goto retry;
> +	if (err)
>  		goto out_vm;
> -	}
>  
>  	if (args->extensions) {
>  		err = gem_create_user_extensions(xe, bo, args->extensions, 0);
> @@ -3223,11 +3261,11 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>  	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>  			   page_size);
>  
> -	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> +	bo = xe_bo_create_user(xe, NULL, args->size,
>  			       DRM_XE_GEM_CPU_CACHING_WC,
>  			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>  			       XE_BO_FLAG_SCANOUT |
> -			       XE_BO_FLAG_NEEDS_CPU_ACCESS);
> +			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index b1b6cb622d71..c6bb90ca5c2e 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -104,13 +104,8 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
>  				  enum ttm_bo_type type, u32 flags,
>  				  struct drm_exec *exec);
> -struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> -			   struct xe_vm *vm, size_t size,
> -			   enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> -				struct xe_vm *vm, size_t size,
> -				u16 cpu_caching,
> -				u32 flags);
> +struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t size,
> +				u16 cpu_caching, u32 flags, struct drm_exec *exec);
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags);
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 3e0402dff423..6b152aa89dbb 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -452,6 +452,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>  	if (err)
>  		goto err;
>  
> +	xe_validation_device_init(&xe->val);
> +
>  	init_waitqueue_head(&xe->ufence_wq);
>  
>  	init_rwsem(&xe->usm.lock);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 01e8fa0d2f9f..a4eb32bac151 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -26,6 +26,7 @@
>  #include "xe_sriov_vf_ccs_types.h"
>  #include "xe_step_types.h"
>  #include "xe_survivability_mode_types.h"
> +#include "xe_validation.h"
>  
>  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>  #define TEST_VM_OPS_ERROR
> @@ -575,6 +576,8 @@ struct xe_device {
>  	 */
>  	atomic64_t global_total_pages;
>  #endif
> +	/** @val: The domain for exhaustive eviction, which is currently per device. */
> +	struct xe_validation_device val;
>  
>  	/* private: */
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 600aaadb4bee..1c2d9d9065c6 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -47,6 +47,20 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
>  	return vm->gpuvm.r_obj;
>  }
>  
> +/**
> + * xe_vm_drm_exec_lock() - Lock the vm's resv with a drm_exec transaction
> + * @vm: The vm whose resv is to be locked.
> + * @exec: The drm_exec transaction.
> + *
> + * Helper to lock the vm's resv as part of a drm_exec transaction.
> + *
> + * Return: %0 on success. See drm_exec_lock_obj() for error codes.
> + */
> +int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec)
> +{
> +	return drm_exec_lock_obj(exec, xe_vm_obj(vm));
> +}
> +
>  /**
>   * xe_vma_userptr_check_repin() - Advisory check for repin needed
>   * @uvma: The userptr vma
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 4ba26eed7e96..3b6e7234dac4 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -292,6 +292,8 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked);
>   */
>  #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
>  
> +int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec);
> +
>  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
>  #define vm_dbg drm_dbg
>  #else
> -- 
> 2.50.1
> 


* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
  2025-08-13 17:25   ` Matthew Brost
@ 2025-08-14  2:33   ` Matthew Brost
  2025-08-14  4:23     ` Matthew Brost
  2025-08-15 15:23     ` Thomas Hellström
  2025-08-17 14:05   ` [05/15] " Simon Richter
  2 siblings, 2 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  2:33 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> Introduce a validation wrapper xe_validation_guard() as a helper
> intended to be used around drm_exec transactions that perform
> validations. Once TTM can handle exhaustive eviction we could
> remove this wrapper or make it mostly a NO-OP unless other
> functionality is added to it.
> 
> Currently the wrapper takes a read lock upon entry and if the
> transaction hits an OOM, all locks are released and the
> transaction is retried with a write-lock. If all other
> validations participate in this scheme, the transaction with
> the write lock will be the only transaction validating and
> should have access to all available non-pinned memory.
> 
> There is currently a problem in that TTM converts -EDEADLK to
> -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> -ENOMEM without actually having run out of memory. We abuse
> ww_mutex internals to detect such situations until TTM is fixed
> to not convert the error code. In the meantime, injecting
> ww_mutex slowpath -EDEADLKs is a good way to test
> the implementation in the absence of real OOMs.
> 
> Just introduce the wrapper in this commit. It will be hooked up
> to the driver in following commits.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_validation.c | 199 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
>  2 files changed, 306 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> index cc0684d24e02..cd1424f04237 100644
> --- a/drivers/gpu/drm/xe/xe_validation.c
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -5,6 +5,7 @@
>  #include "xe_bo.h"
>  #include <drm/drm_exec.h>
>  #include <drm/drm_gem.h>
> +#include <drm/drm_gpuvm.h>
>  
>  #include "xe_assert.h"
>  #include "xe_validation.h"
> @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct xe_device *xe,
>  	}
>  }
>  #endif
> +
> +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> +{
> +	struct xe_validation_device *val = ctx->val;
> +	int ret = 0;
> +
> +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> +		if (ctx->request_exclusive)
> +			ret = down_write_killable(&val->lock);
> +		else
> +			ret = down_read_interruptible(&val->lock);
> +	} else {
> +		if (ctx->request_exclusive)
> +			down_write(&val->lock);
> +		else
> +			down_read(&val->lock);
> +	}
> +
> +	if (!ret) {
> +		ctx->lock_held = true;
> +		ctx->lock_held_exclusive = ctx->request_exclusive;
> +	}
> +
> +	return ret;
> +}
> +
> +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> +{
> +	if (!ctx->lock_held)
> +		return;
> +
> +	if (ctx->lock_held_exclusive)
> +		up_write(&ctx->val->lock);
> +	else
> +		up_read(&ctx->val->lock);
> +
> +	ctx->lock_held = false;
> +}
> +
> +/**
> + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> + * @ctx: The xe_validation_ctx to initialize.
> + * @val: The xe_validation_device representing the validation domain.
> + * @exec: The struct drm_exec to use for the transaction.
> + * @flags: The flags to use for drm_exec initialization.
> + * @nr: The number of anticipated buffer object locks. Forwarded to
> + * drm_exec initialization.
> + * @exclusive: Whether to use exclusive locking already on first validation.

The last two parameters of this function are always passed as 0 and
false in this series. Is it worth keeping them? I don’t see a case where
nr would ever be non-zero. exclusive is defensible, but it’s still
unused. Maybe drop both and reserve a bit in flags for a driver-defined
“exclusive.” That would make the call sites more readable—long argument
lists make it easy to forget what each parameter means or to transpose
them.
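
A sketch of that direction, with the flag name and bit value purely
hypothetical, just to make the suggested shape concrete:

	/* Driver-defined bit, kept clear of the DRM_EXEC_* flag space: */
	#define XE_VAL_FLAG_EXCLUSIVE	BIT(31)

	int xe_validation_ctx_init(struct xe_validation_ctx *ctx,
				   struct xe_validation_device *val,
				   struct drm_exec *exec, u32 flags);

	/* Call site, no trailing 0/false arguments to transpose: */
	err = xe_validation_ctx_init(&ctx, &xe->val, &exec,
				     DRM_EXEC_INTERRUPTIBLE_WAIT |
				     XE_VAL_FLAG_EXCLUSIVE);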

> + *
> + * Initialize and lock an xe_validation transaction using the validation domain
> + * represented by @val. Also initialize the drm_exec object forwarding
> + * @flags and @nr to the drm_exec initialization. The @exclusive parameter should
> + * typically be set to false to avoid locking out other validators from the
> + * domain until an OOM is hit. For testing- or final attempt purposes it can,
> + * however, be set to true.
> + *
> + * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
> + * signal pending.
> + */
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> +			   bool exclusive)
> +{
> +	int ret;
> +
> +	ctx->exec = exec;
> +	ctx->val = val;
> +	ctx->lock_held = false;
> +	ctx->lock_held_exclusive = false;
> +	ctx->request_exclusive = exclusive;
> +	ctx->flags = flags;
> +	ctx->nr = nr;
> +
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	drm_exec_init(exec, flags, nr);
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> +/*
> + * This abuses both drm_exec and ww_mutex internals and should be
> + * replaced by checking for -EDEADLK when we can make TTM
> + * stop converting -EDEADLK to -ENOMEM.
> + * An alternative is to not have exhaustive eviction with
> + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> + */
> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return !!exec->ticket.contending_lock;
> +}
> +
> +#else
> +
> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return false;
> +}
> +
> +#endif
> +
> +static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
> +{
> +	if (ret == -ENOMEM &&
> +	    ((ctx->request_exclusive &&
> +	      xe_validation_contention_injected(ctx->exec)) ||
> +	     !ctx->request_exclusive)) {
> +		ctx->request_exclusive = true;
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
> + * transaction.
> + * @ctx: An uninitialized xe_validation_ctx.
> + * @vm_exec: An initialized struct vm_exec.
> + * @val: The validation domain.
> + *
> + * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
> + * transaction and therefore doesn't lend itself very well to using
> + * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
> + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
> + *
> + * Return: %0 on success, negative error code on failure.
> + */
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> +			    struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val)
> +{
> +	int ret;
> +
> +	memset(ctx, 0, sizeof(*ctx));
> +	ctx->exec = &vm_exec->exec;
> +	ctx->flags = vm_exec->flags;
> +	ctx->val = val;
> +retry:
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	ret = drm_gpuvm_exec_lock(vm_exec);
> +	if (ret) {
> +		xe_validation_unlock(ctx);
> +		if (__xe_validation_should_retry(ctx, ret))
> +			goto retry;
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * xe_validation_ctx_fini() - Finalize a validation transaction
> + * @ctx: The Validation transaction to finalize.
> + *
> + * Finalize a validation transaction and its related drm_exec transaction.
> + */
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> +{
> +	drm_exec_fini(ctx->exec);
> +	xe_validation_unlock(ctx);
> +}
> +
> +/**
> + * xe_validation_should_retry() - Determine if a validation transaction should retry
> + * @ctx: The validation transaction.
> + * @ret: Pointer to a return value variable.
> + *
> + * Determines whether a validation transaction should retry based on the
> + * internal transaction state and the return value pointed to by @ret.
> + * If a validation should be retried, the transaction is prepared for that,
> + * and the validation lock might be re-taken in exclusive mode, and *@ret
> + * is set to %0. If the re-locking fails, typically due to interruptible
> + * locking with a signal pending, *@ret is instead set to -EINTR and the
> + * function returns %false.
> + *
> + * Return: %true if validation should be retried, %false otherwise.
> + */
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> +{
> +	if (__xe_validation_should_retry(ctx, *ret)) {
> +		drm_exec_fini(ctx->exec);
> +		*ret = 0;
> +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> +			xe_validation_unlock(ctx);
> +			*ret = xe_validation_lock(ctx);
> +		}
> +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> +		return !*ret;
> +	}
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> index db50feacad7a..a708c260cf18 100644
> --- a/drivers/gpu/drm/xe/xe_validation.h
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -7,9 +7,11 @@
>  
>  #include <linux/dma-resv.h>
>  #include <linux/types.h>
> +#include <linux/rwsem.h>
>  
>  struct drm_exec;
>  struct drm_gem_object;
> +struct drm_gpuvm_exec;
>  struct xe_device;
>  
>  #ifdef CONFIG_PROVE_LOCKING
> @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
>  	} while (0)
>  #endif
>  
> +/**
> + * struct xe_validation_device - The domain for exhaustive eviction
> + * @lock: The lock used to exclude other processes from allocating graphics memory
> + *
> + * The struct xe_validation_device represents the domain for which we want to use
> + * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
> + * but when graphics memory allocation fails, it is retried with the write mode held.
> + */
> +struct xe_validation_device {
> +	struct rw_semaphore lock;
> +};
> +
> +/**
> + * struct xe_validation_ctx - A struct drm_exec subclass with support for
> + * exhaustive eviction
> + * @exec: The drm_exec object base class. Note that we use a pointer instead of
> + * embedding to avoid diamond inheritance.
> + * @val: The exhaustive eviction domain.
> + * @lock_held: Whether the domain lock is currently held.
> + * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
> + * @request_exclusive: Whether to lock exclusively (write mode) the next time
> + * the domain lock is locked.
> + * @flags: The drm_exec flags used for drm_exec (re-)initialization.
> + * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
> + */
> +struct xe_validation_ctx {
> +	struct drm_exec *exec;
> +	struct xe_validation_device *val;
> +	bool lock_held;
> +	bool lock_held_exclusive;
> +	bool request_exclusive;
> +	u32 flags;
> +	unsigned int nr;
> +};
> +
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> +			   bool exclusive);
> +
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val);
> +
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> +
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> +
> +/**
> + * xe_validation_retry_on_oom() - Retry on oom in an xe_validaton transaction
> + * @_ctx: Pointer to the xe_validation_ctx
> + * @_ret: The current error value possibly holding -ENOMEM
> + *
> + * Use this in a way similar to drm_exec_retry_on_contention().
> + * If @_ret contains -ENOMEM, the transaction is restarted once in a way that
> + * blocks other transactions and allows exhaustive eviction. If the transaction
> + * was already restarted once, just return the -ENOMEM. May also set
> + * @_ret to -EINTR if not retrying and waits are interruptible.
> + * May only be used within a drm_exec_until_all_locked() loop.
> + */
> +#define xe_validation_retry_on_oom(_ctx, _ret)				\
> +	do {								\
> +		if (xe_validation_should_retry(_ctx, _ret))		\
> +			goto *__drm_exec_retry_ptr;			\
> +	} while (0)
> +
> +/**
> + * xe_validation_device_init - Initialize a struct xe_validation_device
> + * @val: The xe_validation_device to init.
> + */
> +static inline void
> +xe_validation_device_init(struct xe_validation_device *val)
> +{
> +	init_rwsem(&val->lock);
> +}
> +
> +/*
> + * Make guard() and scoped_guard() work with xe_validation_ctx
> + * so that we can exit transactions without caring about the
> + * cleanup.
> + */
> +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> +	       _ret ? NULL : _ctx; }),
> +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> +{return *_T; }
> +#define class_xe_validation_is_conditional false
> +
> +/**
> + * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
> + * @_ctx: The xe_validation_ctx.
> + * @_val: The xe_validation_device.
> + * @_exec: The struct drm_exec object
> + * @_flags: Flags for the drm_exec transaction. See the struct drm_exec documentation!
> + * @_ret: Return in / out parameter. May be set by this macro. Typically 0 when called.
> + * @_excl: Whether to start in exclusive mode already in the first iteration.
> + *

Same comment as above on function xe_validation_ctx_init wrt to
arguments.

Matt

> + * This macro will initiate a drm_exec transaction with additional support for
> + * exhaustive eviction.
> + */
> +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
> +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
> +	drm_exec_until_all_locked(_exec)
> +
>  #endif
> -- 
> 2.50.1
> 


* Re: [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma()
  2025-08-13 10:51 ` [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
@ 2025-08-14  2:35   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  2:35 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:16PM +0200, Thomas Hellström wrote:
> Convert __xe_pin_fb_vma() for exhaustive eviction
> using xe_validation_guard().
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I'll leave this to someone who works in display, but the patch looks correct
to me.

Matt

> ---
>  drivers/gpu/drm/xe/display/xe_fb_pin.c | 27 +++++++++++++++-----------
>  1 file changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 4b0748e6fdd6..43c45344ea26 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -281,7 +281,8 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
>  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	int ret;
>  
>  	if (!vma)
> @@ -309,17 +310,21 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	 * Pin the framebuffer, we can't use xe_bo_(un)pin functions as the
>  	 * assumptions are incorrect for framebuffers
>  	 */
> -	ret = ttm_bo_reserve(&bo->ttm, false, false, NULL);
> -	if (ret)
> -		goto err;
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			goto err;
>  
> -	if (IS_DGFX(xe))
> -		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
> -	else
> -		ret = xe_bo_validate(bo, NULL, true, exec);
> -	if (!ret)
> -		ttm_bo_pin(&bo->ttm);
> -	ttm_bo_unreserve(&bo->ttm);
> +		if (IS_DGFX(xe))
> +			ret = xe_bo_migrate(bo, XE_PL_VRAM0, &exec);
> +		else
> +			ret = xe_bo_validate(bo, NULL, true, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &ret);
> +		if (!ret)
> +			ttm_bo_pin(&bo->ttm);
> +	}
>  	if (ret)
>  		goto err;
>  
> -- 
> 2.50.1
> 


* Re: [PATCH 08/15] drm/xe: Convert existing drm_exec transactions for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 08/15] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
@ 2025-08-14  2:48   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  2:48 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:14PM +0200, Thomas Hellström wrote:
> Convert existing drm_exec transactions, like the GT pagefault validation,
> the non-LR exec() IOCTL and the rebind worker, to support
> exhaustive eviction using xe_validation_guard().
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
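
The recurring shape of these conversions, condensed from the
handle_vma_pagefault() hunk below for reference:

	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, 0, 0, false);
	drm_exec_until_all_locked(&exec) {
		err = xe_pf_begin(&exec, vma, atomic, tile->mem.vram);
		drm_exec_retry_on_contention(&exec);	/* ww contention: relock */
		xe_validation_retry_on_oom(&ctx, &err);	/* OOM: retry exclusive */
		if (err)
			goto unlock_dma_resv;
		/* ... rebind ... */
	}
unlock_dma_resv:
	xe_validation_ctx_fini(&ctx);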

> ---
>  drivers/gpu/drm/xe/xe_exec.c         |  20 ++--
>  drivers/gpu/drm/xe/xe_gt_pagefault.c |  20 ++--
>  drivers/gpu/drm/xe/xe_svm.c          |   4 -
>  drivers/gpu/drm/xe/xe_vm.c           | 132 +++++++++++----------------
>  drivers/gpu/drm/xe/xe_vm.h           |   2 -
>  5 files changed, 70 insertions(+), 108 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 0bcb4fb9a10e..cdc3ff931a90 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -119,10 +119,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn};
>  	struct drm_exec *exec = &vm_exec.exec;
>  	u32 i, num_syncs, num_ufence = 0;
> +	struct xe_validation_ctx ctx;
>  	struct xe_sched_job *job;
>  	struct xe_vm *vm;
>  	bool write_locked, skip_retry = false;
> -	ktime_t end = 0;
>  	int err = 0;
>  	struct xe_hw_engine_group *group;
>  	enum xe_hw_engine_group_execution_mode mode, previous_mode;
> @@ -241,17 +241,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		goto err_unlock_list;
>  	}
>  
> -	vm_exec.vm = &vm->gpuvm;
> -	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
> -	if (xe_vm_in_lr_mode(vm)) {
> -		drm_exec_init(exec, vm_exec.flags, 0);
> -	} else {
> -		err = drm_gpuvm_exec_lock(&vm_exec);
> -		if (err) {
> -			if (xe_vm_validate_should_retry(exec, err, &end))
> -				err = -EAGAIN;
> +	if (!xe_vm_in_lr_mode(vm)) {
> +		vm_exec.vm = &vm->gpuvm;
> +		vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
> +		err = xe_validation_exec_lock(&ctx, &vm_exec, &xe->val);
> +		if (err)
>  			goto err_unlock_list;
> -		}
>  	}
>  
>  	if (xe_vm_is_closed_or_banned(q->vm)) {
> @@ -345,7 +340,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  	if (err)
>  		xe_sched_job_put(job);
>  err_exec:
> -	drm_exec_fini(exec);
> +	if (!xe_vm_in_lr_mode(vm))
> +		xe_validation_ctx_fini(&ctx);
>  err_unlock_list:
>  	up_read(&vm->lock);
>  	if (err == -EAGAIN && !skip_retry)
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index 2c7f10cc423f..67dc503d6e04 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -112,9 +112,9 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  {
>  	struct xe_vm *vm = xe_vma_vm(vma);
>  	struct xe_tile *tile = gt_to_tile(gt);
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
>  	struct dma_fence *fence;
> -	ktime_t end = 0;
>  	int err;
>  
>  	lockdep_assert_held_write(&vm->lock);
> @@ -139,12 +139,11 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  	}
>  
>  	/* Lock VM and BOs dma-resv */
> -	drm_exec_init(&exec, 0, 0);
> +	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, 0, 0, false);
>  	drm_exec_until_all_locked(&exec) {
>  		err = xe_pf_begin(&exec, vma, atomic, tile->mem.vram);
>  		drm_exec_retry_on_contention(&exec);
> -		if (xe_vm_validate_should_retry(&exec, err, &end))
> -			err = -EAGAIN;
> +		xe_validation_retry_on_oom(&ctx, &err);
>  		if (err)
>  			goto unlock_dma_resv;
>  
> @@ -153,8 +152,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  		fence = xe_vma_rebind(vm, vma, BIT(tile->id));
>  		if (IS_ERR(fence)) {
>  			err = PTR_ERR(fence);
> -			if (xe_vm_validate_should_retry(&exec, err, &end))
> -				err = -EAGAIN;
> +			xe_validation_retry_on_oom(&ctx, &err);
>  			goto unlock_dma_resv;
>  		}
>  	}
> @@ -163,7 +161,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  	dma_fence_put(fence);
>  
>  unlock_dma_resv:
> -	drm_exec_fini(&exec);
> +	xe_validation_ctx_fini(&ctx);
>  	if (err == -EAGAIN)
>  		goto retry_userptr;
>  
> @@ -545,6 +543,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>  {
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_tile *tile = gt_to_tile(gt);
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	struct xe_vma *vma;
> @@ -574,15 +573,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
>  		goto unlock_vm;
>  
>  	/* Lock VM and BOs dma-resv */
> -	drm_exec_init(&exec, 0, 0);
> +	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, 0, 0, false);
>  	drm_exec_until_all_locked(&exec) {
>  		ret = xe_pf_begin(&exec, vma, true, tile->mem.vram);
>  		drm_exec_retry_on_contention(&exec);
> -		if (ret)
> -			break;
> +		xe_validation_retry_on_oom(&ctx, &ret);
>  	}
>  
> -	drm_exec_fini(&exec);
> +	xe_validation_ctx_fini(&ctx);
>  unlock_vm:
>  	up_read(&vm->lock);
>  	xe_vm_put(vm);
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index ba85665d85d4..93d10f0b81cb 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -821,7 +821,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  	struct dma_fence *fence;
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	int migrate_try_count = ctx.devmem_only ? 3 : 1;
> -	ktime_t end = 0;
>  	int err;
>  
>  	lockdep_assert_held_write(&vm->lock);
> @@ -891,7 +890,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	range_debug(range, "PAGE FAULT - BIND");
>  
> -retry_bind:
>  	xe_vm_lock(vm, false);
>  	fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
>  	if (IS_ERR(fence)) {
> @@ -902,8 +900,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  			range_debug(range, "PAGE FAULT - RETRY BIND");
>  			goto retry;
>  		}
> -		if (xe_vm_validate_should_retry(NULL, err, &end))
> -			goto retry_bind;
>  		goto err_out;
>  	}
>  	xe_vm_unlock(vm);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 1c2d9d9065c6..989d84c2e82f 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -241,6 +241,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
>  		.num_fences = 1,
>  	};
>  	struct drm_exec *exec = &vm_exec.exec;
> +	struct xe_validation_ctx ctx;
>  	struct dma_fence *pfence;
>  	int err;
>  	bool wait;
> @@ -248,7 +249,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
>  	xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
>  
>  	down_write(&vm->lock);
> -	err = drm_gpuvm_exec_lock(&vm_exec);
> +	err = xe_validation_exec_lock(&ctx, &vm_exec, &vm->xe->val);
>  	if (err)
>  		goto out_up_write;
>  
> @@ -280,7 +281,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
>  	up_read(&vm->userptr.notifier_lock);
>  
>  out_fini:
> -	drm_exec_fini(exec);
> +	xe_validation_ctx_fini(&ctx);
>  out_up_write:
>  	up_write(&vm->lock);
>  
> @@ -363,39 +364,6 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
>  	/* TODO: Inform user the VM is banned */
>  }
>  
> -/**
> - * xe_vm_validate_should_retry() - Whether to retry after a validate error.
> - * @exec: The drm_exec object used for locking before validation.
> - * @err: The error returned from ttm_bo_validate().
> - * @end: A ktime_t cookie that should be set to 0 before first use and
> - * that should be reused on subsequent calls.
> - *
> - * With multiple active VMs, under memory pressure, it is possible that
> - * ttm_bo_validate() run into -EDEADLK and in such case returns -ENOMEM.
> - * Until ttm properly handles locking in such scenarios, best thing the
> - * driver can do is retry with a timeout. Check if that is necessary, and
> - * if so unlock the drm_exec's objects while keeping the ticket to prepare
> - * for a rerun.
> - *
> - * Return: true if a retry after drm_exec_init() is recommended;
> - * false otherwise.
> - */
> -bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end)
> -{
> -	ktime_t cur;
> -
> -	if (err != -ENOMEM)
> -		return false;
> -
> -	cur = ktime_get();
> -	*end = *end ? : ktime_add_ms(cur, XE_VM_REBIND_RETRY_TIMEOUT_MS);
> -	if (!ktime_before(cur, *end))
> -		return false;
> -
> -	msleep(20);
> -	return true;
> -}
> -
>  static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
>  {
>  	struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
> @@ -497,10 +465,10 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
>  static void preempt_rebind_work_func(struct work_struct *w)
>  {
>  	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
>  	unsigned int fence_count = 0;
>  	LIST_HEAD(preempt_fences);
> -	ktime_t end = 0;
>  	int err = 0;
>  	long wait;
>  	int __maybe_unused tries = 0;
> @@ -523,19 +491,20 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  			goto out_unlock_outer;
>  	}
>  
> -	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> +	err = xe_validation_ctx_init(&ctx, &vm->xe->val,
> +				     &exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false);
> +	if (err)
> +		goto out_unlock_outer;
>  
>  	drm_exec_until_all_locked(&exec) {
>  		bool done = false;
>  
>  		err = xe_preempt_work_begin(&exec, vm, &done);
>  		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &err);
>  		if (err || done) {
>  			xe_vm_set_validation_exec(vm, NULL);
> -			drm_exec_fini(&exec);
> -			if (err && xe_vm_validate_should_retry(&exec, err, &end))
> -				err = -EAGAIN;
> -
> +			xe_validation_ctx_fini(&ctx);
>  			goto out_unlock_outer;
>  		}
>  	}
> @@ -582,7 +551,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  
>  out_unlock:
>  	xe_vm_set_validation_exec(vm, NULL);
> -	drm_exec_fini(&exec);
> +	xe_validation_ctx_fini(&ctx);
>  out_unlock_outer:
>  	if (err == -EAGAIN) {
>  		trace_xe_vm_rebind_worker_retry(vm);
> @@ -1400,20 +1369,19 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
>  
>  static void xe_vma_destroy_unlocked(struct xe_vma *vma)
>  {
> +	struct xe_device *xe = xe_vma_vm(vma)->xe;
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
> -	int err;
> +	int err = 0;
>  
> -	drm_exec_init(&exec, 0, 0);
> -	drm_exec_until_all_locked(&exec) {
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
>  		err = xe_vm_lock_vma(&exec, vma);
>  		drm_exec_retry_on_contention(&exec);
>  		if (XE_WARN_ON(err))
>  			break;
> +		xe_vma_destroy(vma, NULL);
>  	}
> -
> -	xe_vma_destroy(vma, NULL);
> -
> -	drm_exec_fini(&exec);
> +	xe_assert(xe, !err);
>  }
>  
>  struct xe_vma *
> @@ -2490,6 +2458,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  			      u16 pat_index, unsigned int flags)
>  {
>  	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
>  	struct xe_vma *vma;
>  	int err = 0;
> @@ -2497,9 +2466,9 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  	lockdep_assert_held_write(&vm->lock);
>  
>  	if (bo) {
> -		drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> -		drm_exec_until_all_locked(&exec) {
> -			err = 0;
> +		err = 0;
> +		xe_validation_guard(&ctx, &vm->xe->val, &exec,
> +				    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
>  			if (!bo->vm) {
>  				err = drm_exec_lock_obj(&exec, xe_vm_obj(vm));
>  				drm_exec_retry_on_contention(&exec);
> @@ -2508,27 +2477,34 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>  				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
>  				drm_exec_retry_on_contention(&exec);
>  			}
> -			if (err) {
> -				drm_exec_fini(&exec);
> +			if (err)
>  				return ERR_PTR(err);
> +
> +			vma = xe_vma_create(vm, bo, op->gem.offset,
> +					    op->va.addr, op->va.addr +
> +					    op->va.range - 1, pat_index, flags);
> +			if (IS_ERR(vma))
> +				return vma;
> +
> +			if (!bo->vm) {
> +				err = add_preempt_fences(vm, bo);
> +				goto out_err;
>  			}
>  		}
> +		if (err)
> +			return ERR_PTR(err);
> +	} else {
> +		vma = xe_vma_create(vm, NULL, op->gem.offset,
> +				    op->va.addr, op->va.addr +
> +				    op->va.range - 1, pat_index, flags);
> +		if (IS_ERR(vma))
> +			return vma;
> +
> +		if (xe_vma_is_userptr(vma))
> +			err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
>  	}
> -	vma = xe_vma_create(vm, bo, op->gem.offset,
> -			    op->va.addr, op->va.addr +
> -			    op->va.range - 1, pat_index, flags);
> -	if (IS_ERR(vma))
> -		goto err_unlock;
> -
> -	if (xe_vma_is_userptr(vma))
> -		err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
> -	else if (!xe_vma_has_no_bo(vma) && !bo->vm)
> -		err = add_preempt_fences(vm, bo);
> -
> -err_unlock:
> -	if (bo)
> -		drm_exec_fini(&exec);
>  
> +out_err:
>  	if (err) {
>  		prep_vma_destroy(vm, vma, false);
>  		xe_vma_destroy_unlocked(vma);
> @@ -3296,34 +3272,32 @@ static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops,
>  static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>  						   struct xe_vma_ops *vops)
>  {
> +	struct xe_validation_ctx ctx;
>  	struct drm_exec exec;
>  	struct dma_fence *fence;
> -	int err;
> +	int err = 0;
>  
>  	lockdep_assert_held_write(&vm->lock);
>  
> -	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> -		      DRM_EXEC_IGNORE_DUPLICATES, 0);
> -	drm_exec_until_all_locked(&exec) {
> +	xe_validation_guard(&ctx, &vm->xe->val, &exec,
> +			    DRM_EXEC_INTERRUPTIBLE_WAIT |
> +			    DRM_EXEC_IGNORE_DUPLICATES, err, true) {
>  		err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm, vops);
>  		drm_exec_retry_on_contention(&exec);
> -		if (err) {
> -			fence = ERR_PTR(err);
> -			goto unlock;
> -		}
> +		xe_validation_retry_on_oom(&ctx, &err);
> +		if (err)
> +			return ERR_PTR(err);
>  
>  		fence = ops_execute(vm, vops);
>  		if (IS_ERR(fence)) {
>  			if (PTR_ERR(fence) == -ENODATA)
>  				vm_bind_ioctl_ops_fini(vm, vops, NULL);
> -			goto unlock;
> +			return fence;
>  		}
>  
>  		vm_bind_ioctl_ops_fini(vm, vops, fence);
>  	}
>  
> -unlock:
> -	drm_exec_fini(&exec);
>  	return fence;
>  }
>  ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO);
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 3b6e7234dac4..418940222690 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -262,8 +262,6 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);
>  
>  int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);
>  
> -bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
> -
>  int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
>  
>  int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
> -- 
> 2.50.1
> 


* Re: [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member
  2025-08-13 10:51 ` [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
@ 2025-08-14  2:52   ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  2:52 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:08PM +0200, Thomas Hellström wrote:
> This member is set when exporting using prime. However,
> xe_gem_prime_export() alone doesn't set it, since that is done
> later in the prime export flow.
> For the test, set it manually, and remove the hack that set it
> temporarily when it was really needed.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index c53f67ce4b0a..cde9530bef8c 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -57,16 +57,12 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
>  		return;
>  
>  	/*
> -	 * Evict exporter. Note that the gem object dma_buf member isn't
> -	 * set from xe_gem_prime_export(), and it's needed for the move_notify()
> -	 * functionality, so hack that up here. Evicting the exported bo will
> +	 * Evict exporter. Evicting the exported bo will
>  	 * evict also the imported bo through the move_notify() functionality if
>  	 * importer is on a different device. If they're on the same device,
>  	 * the exporter and the importer should be the same bo.
>  	 */
> -	swap(exported->ttm.base.dma_buf, dmabuf);
>  	ret = xe_bo_evict(exported);
> -	swap(exported->ttm.base.dma_buf, dmabuf);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
> @@ -139,6 +135,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  			   PTR_ERR(dmabuf));
>  		goto out;
>  	}
> +	bo->ttm.base.dma_buf = dmabuf;
>  
>  	import = xe_gem_prime_import(&xe->drm, dmabuf);
>  	if (!IS_ERR(import)) {
> @@ -186,6 +183,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  		KUNIT_FAIL(test, "dynamic p2p attachment failed with err=%ld\n",
>  			   PTR_ERR(import));
>  	}
> +	bo->ttm.base.dma_buf = NULL;
>  	dma_buf_put(dmabuf);
>  out:
>  	drm_gem_object_put(&bo->ttm.base);
> -- 
> 2.50.1
> 


* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
@ 2025-08-14  3:58   ` Matthew Brost
  2025-08-15 15:25     ` Thomas Hellström
  2025-08-14  4:05   ` Matthew Brost
  2025-08-14 18:48   ` Matthew Brost
  2 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  3:58 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> Most users of xe_bo_create_pin_map_at() and
> xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> and that simplifies conversion. Introduce an
> xe_bo_create_pin_map_at_novm() function and make the _aligned()
> version static. Use xe_validation_guard() for conversion.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
>  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
>  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
>  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
>  6 files changed, 101 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 1ce1e9da975b..ab48635ddffa 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
> -	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
>  
>  	if (start < SZ_4K)
> @@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  		start = ALIGN(start, align);
>  	}
>  
> -	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> -				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		bo = NULL;
> -		return err;
> -	}
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock_vm_held(bo);
> -
> -	if (err) {
> -		xe_bo_put(fb->bo);
> -		bo = NULL;
> -	}
> +	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
> +					  size, start, ttm_bo_type_kernel, flags,
> +					  false, 0, true);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
>  
>  	fb->bo = bo;
>  
> -	return err;
> +	return 0;
>  }
>  
>  static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 43c45344ea26..d46ff7ebb0a1 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  				 XE_PAGE_SIZE);
>  
>  	if (IS_DGFX(xe))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size, ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_VRAM0 |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size, ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_VRAM0 |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	else
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_STOLEN |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_STOLEN |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_SYSTEM |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_SYSTEM |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> index 826ac3d578b7..79d00127caf4 100644
> --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
>  			page_size);
>  	size -= base;
>  
> -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
> -				     ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
> +					  ttm_bo_type_kernel, flags, true, 0, false);
>  	if (IS_ERR(bo)) {
>  		drm_dbg(&xe->drm,
>  			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 23b28eeef59f..c9928d4ee5a0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
>  	return bo;
>  }
>  
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm,
> -				      size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags)
> -{
> -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
> -					       type, flags, 0);
> -}
> -
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment)
> +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> +						     struct xe_tile *tile,
> +						     struct xe_vm *vm,
> +						     size_t size, u64 offset,
> +						     enum ttm_bo_type type, u32 flags,
> +						     bool vmap, u64 alignment,
> +						     struct drm_exec *exec)
>  {
>  	struct xe_bo *bo;
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
>  	u64 end = offset == ~0ull ? offset : start + size;
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  
> -	if (flags & XE_BO_FLAG_STOLEN &&
> +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
>  		flags |= XE_BO_FLAG_GGTT;
>  
> @@ -2289,9 +2280,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	if (err)
>  		goto err_put;
>  
> -	err = xe_bo_vmap(bo);
> -	if (err)
> -		goto err_unpin;
> +	if (vmap) {
> +		err = xe_bo_vmap(bo);
> +		if (err)
> +			goto err_unpin;
> +	}
>  
>  	xe_bo_unlock_vm_held(bo);
>  
> @@ -2305,11 +2298,59 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @offset: Optional VRAM offset or %~0ull for don't care.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @vmap: Whether to create a buffer object map.

Can we stick vmap into XE_BO_FLAG_?

Also, why do we need this argument now when it was omitted previously?
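
If we did fold it into the flag word, I'd imagine something like the
below (untested sketch; the XE_BO_FLAG_VMAP name and bit are made up
and would need a free bit in xe_bo.h):

	/* Hypothetical flag requesting a kernel vmap of the bo */
	#define XE_BO_FLAG_VMAP		BIT(22)

	/* ... and in xe_bo_create_pin_map_at_aligned(), instead of
	 * the bool parameter:
	 */
	if (flags & XE_BO_FLAG_VMAP) {
		err = xe_bo_vmap(bo);
		if (err)
			goto err_unpin;
	}

That would keep the already long create_pin_map* argument lists a bit
shorter.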

Matt

> + * @alignment: GGTT alignment.
> + * @intr: Whether to execute any waits for backing store interruptibly.
> + *
> + * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
> + * options. The bo will be external and not associated with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
> +			     bool vmap, u64 alignment, bool intr)
> +{
> +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *bo;
> +	int ret = 0;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
> +		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
> +						     type, flags, vmap,
> +						     alignment, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +		}
> +	}
> +
> +	return ret ? ERR_PTR(ret) : bo;
> +}
> +
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags)
>  {
> -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +
> +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> +					       true, 0, exec);
>  }
>  
>  static void __xe_bo_unpin_map_no_vm(void *arg)
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index a625806deeb6..d06266af9662 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm, size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment);
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type,
> +			     u32 flags, bool vmap, u64 alignment, bool intr);
>  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  					   size_t size, u32 flags);
>  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index fdd514fec5ef..afabfc125488 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
>  
>  	size = stream->per_xecore_buf_size * last_xecore;
>  
> -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> -					     size, ~0ull, ttm_bo_type_kernel,
> -					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
> +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, true,
> +					  SZ_64, false);
>  	if (IS_ERR(bo)) {
>  		kfree(stream->xecore_buf);
>  		return PTR_ERR(bo);
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
  2025-08-14  3:58   ` Matthew Brost
@ 2025-08-14  4:05   ` Matthew Brost
  2025-08-15 15:27     ` Thomas Hellström
  2025-08-14 18:48   ` Matthew Brost
  2 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  4:05 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> Most users of xe_bo_create_pin_map_at() and
> xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> and that simplifies conversion. Introduce an
> xe_bo_create_pin_map_at_novm() function and make the _aligned()
> version static. Use xe_validation_guard() for conversion.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
>  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
>  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
>  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
>  6 files changed, 101 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 1ce1e9da975b..ab48635ddffa 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
> -	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
>  
>  	if (start < SZ_4K)
> @@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  		start = ALIGN(start, align);
>  	}
>  
> -	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> -				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		bo = NULL;
> -		return err;
> -	}
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock_vm_held(bo);
> -
> -	if (err) {
> -		xe_bo_put(fb->bo);
> -		bo = NULL;
> -	}
> +	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
> +					  size, start, ttm_bo_type_kernel, flags,
> +					  false, 0, true);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
>  
>  	fb->bo = bo;
>  
> -	return err;
> +	return 0;
>  }
>  
>  static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 43c45344ea26..d46ff7ebb0a1 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  				 XE_PAGE_SIZE);
>  
>  	if (IS_DGFX(xe))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size, ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_VRAM0 |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size, ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_VRAM0 |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,

The flags and vmap arguments are swapped.
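
i.e. with the new prototype taking (..., type, flags, vmap, alignment,
intr), I'd expect this call to read:

		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
						   dpt_size, ~0ull,
						   ttm_bo_type_kernel,
						   XE_BO_FLAG_VRAM0 |
						   XE_BO_FLAG_GGTT |
						   XE_BO_FLAG_PAGETABLE,
						   true,
						   alignment, false);

(and the same reordering for the stolen and system fallbacks below).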

> +						   alignment, false);
>  	else
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_STOLEN |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_STOLEN |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,

The flags and vmap arguments are swapped.

Matt

> +						   alignment, false);
>  	if (IS_ERR(dpt))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_SYSTEM |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_SYSTEM |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> index 826ac3d578b7..79d00127caf4 100644
> --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
>  			page_size);
>  	size -= base;
>  
> -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
> -				     ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
> +					  ttm_bo_type_kernel, flags, true, 0, false);
>  	if (IS_ERR(bo)) {
>  		drm_dbg(&xe->drm,
>  			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 23b28eeef59f..c9928d4ee5a0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
>  	return bo;
>  }
>  
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm,
> -				      size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags)
> -{
> -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
> -					       type, flags, 0);
> -}
> -
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment)
> +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> +						     struct xe_tile *tile,
> +						     struct xe_vm *vm,
> +						     size_t size, u64 offset,
> +						     enum ttm_bo_type type, u32 flags,
> +						     bool vmap, u64 alignment,
> +						     struct drm_exec *exec)
>  {
>  	struct xe_bo *bo;
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
>  	u64 end = offset == ~0ull ? offset : start + size;
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  
> -	if (flags & XE_BO_FLAG_STOLEN &&
> +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
>  		flags |= XE_BO_FLAG_GGTT;
>  
> @@ -2289,9 +2280,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	if (err)
>  		goto err_put;
>  
> -	err = xe_bo_vmap(bo);
> -	if (err)
> -		goto err_unpin;
> +	if (vmap) {
> +		err = xe_bo_vmap(bo);
> +		if (err)
> +			goto err_unpin;
> +	}
>  
>  	xe_bo_unlock_vm_held(bo);
>  
> @@ -2305,11 +2298,59 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @offset: Optional VRAM offset or %~0ull for don't care.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @vmap: Whether to create a buffer object map.
> + * @alignment: GGTT alignment.
> + * @intr: Whether to execute any waits for backing store interruptibly.
> + *
> + * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
> + * options. The bo will be external and not associated with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
> +			     bool vmap, u64 alignment, bool intr)
> +{
> +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *bo;
> +	int ret = 0;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
> +		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
> +						     type, flags, vmap,
> +						     alignment, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +		}
> +	}
> +
> +	return ret ? ERR_PTR(ret) : bo;
> +}
> +
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags)
>  {
> -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +
> +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> +					       true, 0, exec);
>  }
>  
>  static void __xe_bo_unpin_map_no_vm(void *arg)
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index a625806deeb6..d06266af9662 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm, size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment);
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type,
> +			     u32 flags, bool vmap, u64 alignment, bool intr);
>  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  					   size_t size, u32 flags);
>  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index fdd514fec5ef..afabfc125488 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
>  
>  	size = stream->per_xecore_buf_size * last_xecore;
>  
> -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> -					     size, ~0ull, ttm_bo_type_kernel,
> -					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
> +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, true,
> +					  SZ_64, false);
>  	if (IS_ERR(bo)) {
>  		kfree(stream->xecore_buf);
>  		return PTR_ERR(bo);
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
@ 2025-08-14  4:18   ` Matthew Brost
  2025-08-14 13:14     ` Thomas Hellström
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  4:18 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:20PM +0200, Thomas Hellström wrote:
> Introduce an xe_bo_create_pin_map_novm() function that does not
> take the drm_exec parameter to simplify the conversion of many
> callsites.
> For the rest, ensure that the same drm_exec context that was used
> for locking the vm is passed down to validation.
> 
> Use xe_validation_guard() where appropriate.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
>  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        |  39 +++---
>  drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
>  drivers/gpu/drm/xe/xe_bo.c                    |  53 +++++++-
>  drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
>  drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  24 ++--
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
>  drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
>  drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
>  drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
>  drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
>  drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
>  drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
>  drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
>  drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
>  drivers/gpu/drm/xe/xe_vm.c                    | 119 ++++++++++--------
>  19 files changed, 252 insertions(+), 171 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> index d96ba2b51065..8ea9a472113c 100644
> --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> @@ -42,11 +42,11 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
>  	obj = ERR_PTR(-ENODEV);
>  
>  	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
> -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> -					   NULL, size,
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> -					   XE_BO_FLAG_STOLEN |
> -					   XE_BO_FLAG_GGTT);
> +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> +						size,
> +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						XE_BO_FLAG_STOLEN |
> +						XE_BO_FLAG_GGTT, false);
>  		if (!IS_ERR(obj))
>  			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
>  		else
> @@ -54,10 +54,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
>  	}
>  
>  	if (IS_ERR(obj)) {
> -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> -					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -					   XE_BO_FLAG_GGTT);
> +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
> +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> +						XE_BO_FLAG_GGTT, false);
>  	}
>  
>  	if (IS_ERR(obj)) {
> diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> index 9f941fc2e36b..58581d7aaae6 100644
> --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> @@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *d
>  		return false;
>  
>  	/* Set scanout flag for WC mapping */
> -	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> -				   NULL, PAGE_ALIGN(size),
> -				   ttm_bo_type_kernel,
> -				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
> +	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> +					PAGE_ALIGN(size),
> +					ttm_bo_type_kernel,
> +					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> +					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
>  	if (IS_ERR(obj)) {
>  		kfree(vma);
>  		return false;
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index d46ff7ebb0a1..d8e15ebb740c 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -102,32 +102,23 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  				 XE_PAGE_SIZE);
>  
>  	if (IS_DGFX(xe))
> -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> -						   dpt_size, ~0ull,
> -						   ttm_bo_type_kernel,
> -						   true,
> -						   XE_BO_FLAG_VRAM0 |
> -						   XE_BO_FLAG_GGTT |
> -						   XE_BO_FLAG_PAGETABLE,
> -						   alignment, false);
> +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> +						ttm_bo_type_kernel,
> +						XE_BO_FLAG_VRAM0 |
> +						XE_BO_FLAG_GGTT |
> +						XE_BO_FLAG_PAGETABLE, true);
>  	else
> -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> -						   dpt_size,  ~0ull,
> -						   ttm_bo_type_kernel,
> -						   true,
> -						   XE_BO_FLAG_STOLEN |
> -						   XE_BO_FLAG_GGTT |
> -						   XE_BO_FLAG_PAGETABLE,
> -						   alignment, false);
> +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> +						ttm_bo_type_kernel,
> +						XE_BO_FLAG_STOLEN |
> +						XE_BO_FLAG_GGTT |
> +						XE_BO_FLAG_PAGETABLE, true);
>  	if (IS_ERR(dpt))
> -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> -						   dpt_size,  ~0ull,
> -						   ttm_bo_type_kernel,
> -						   true,
> -						   XE_BO_FLAG_SYSTEM |
> -						   XE_BO_FLAG_GGTT |
> -						   XE_BO_FLAG_PAGETABLE,
> -						   alignment, false);
> +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> +						ttm_bo_type_kernel,
> +						XE_BO_FLAG_SYSTEM |
> +						XE_BO_FLAG_GGTT |
> +						XE_BO_FLAG_PAGETABLE, true);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> index 30f1073141fc..4ae847b628e2 100644
> --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> @@ -72,10 +72,10 @@ static int intel_hdcp_gsc_initialize_message(struct xe_device *xe,
>  	int ret = 0;
>  
>  	/* allocate object of two page for HDCP command memory and store it */
> -	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT, false);
>  
>  	if (IS_ERR(bo)) {
>  		drm_err(&xe->drm, "Failed to allocate bo for HDCP streaming command!\n");
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index afa794e56065..5904d658d1f2 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  
>  	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
>  				   ttm_bo_type_kernel,
> -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				   exec);
>  	if (IS_ERR(big)) {
>  		KUNIT_FAIL(test, "Failed to allocate bo: %li\n", PTR_ERR(big));
>  		goto vunmap;
> @@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  
>  	pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
>  				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				  exec);
>  	if (IS_ERR(pt)) {
>  		KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
>  			   PTR_ERR(pt));
> @@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
>  				    2 * SZ_4K,
>  				    ttm_bo_type_kernel,
> -				    XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				    exec);
>  	if (IS_ERR(tiny)) {
>  		KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n",
>  			   PTR_ERR(tiny));
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index c9928d4ee5a0..82bf158426ad 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2343,16 +2343,60 @@ xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
>  	return ret ? ERR_PTR(ret) : bo;
>  }
>  
> +/**
> + * xe_bo_create_pin_map() - Create pinned and mapped bo
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The vm to associate the buffer object with. The vm's resv must be locked
> + * with the transaction represented by @exec.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction, and
> + * previously used for locking @vm's resv.
> + *
> + * Create a pinned and mapped bo attached to @vm, or an external bo if
> + * @vm is NULL.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if waits in the
> + * @exec transaction are interruptible.
> + */
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
> -				   enum ttm_bo_type type, u32 flags)
> +				   enum ttm_bo_type type, u32 flags,
> +				   struct drm_exec *exec)
>  {
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> -
> +	xe_assert(xe, exec);
>  	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
>  					       true, 0, exec);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @intr: Whether to execute any waits for backing store interruptibly.
> + *
> + * Create a pinned and mapped bo. The bo will be external and not associated
> + * with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> +					size_t size, enum ttm_bo_type type, u32 flags,
> +					bool intr)
> +{
> +	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, true, 0, intr);
> +}
> +
>  static void __xe_bo_unpin_map_no_vm(void *arg)
>  {
>  	xe_bo_unpin_map_no_vm(arg);
> @@ -2365,8 +2409,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
>  	int ret;
>  
>  	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
> -
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index d06266af9662..802e3c7d7872 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  				u16 cpu_caching, u32 flags, struct drm_exec *exec);
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
> -				   enum ttm_bo_type type, u32 flags);
> +				   enum ttm_bo_type type, u32 flags,
> +				   struct drm_exec *exec);
> +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> +					size_t size, enum ttm_bo_type type, u32 flags,
> +					bool intr);
>  struct xe_bo *
>  xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
>  			     size_t size, u64 offset, enum ttm_bo_type type,
> diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
> index f5ae28af60d4..83d61bf8ec62 100644
> --- a/drivers/gpu/drm/xe/xe_gsc.c
> +++ b/drivers/gpu/drm/xe/xe_gsc.c
> @@ -136,10 +136,10 @@ static int query_compatibility_version(struct xe_gsc *gsc)
>  	u64 ggtt_offset;
>  	int err;
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT, false);
>  	if (IS_ERR(bo)) {
>  		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
>  		return PTR_ERR(bo);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index 906011671b60..d0a87d7b028b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -1452,7 +1452,6 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
>  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo;
> @@ -1479,24 +1478,17 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  		return 0;
>  
>  	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
> -	bo = xe_bo_create_locked(xe, tile, NULL,
> -				 ALIGN(size, PAGE_SIZE),
> -				 ttm_bo_type_kernel,
> -				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				 XE_BO_FLAG_NEEDS_2M |
> -				 XE_BO_FLAG_PINNED |
> -				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> -				 exec);
> +	bo = xe_bo_create_pin_map_at_novm(xe, tile,
> +					  ALIGN(size, PAGE_SIZE),
> +					  0,
> +					  ttm_bo_type_kernel,
> +					  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> +					  XE_BO_FLAG_NEEDS_2M |
> +					  XE_BO_FLAG_PINNED_LATE_RESTORE,
> +					  false, 0, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock(bo);
> -	if (unlikely(err)) {
> -		xe_bo_put(bo);
> -		return err;
> -	}
> -
>  	config->lmem_obj = bo;
>  
>  	if (xe_device_has_lmtt(xe)) {
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> index c712111aa30d..44cc612b0a75 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> @@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct xe_gt *gt, unsigned int vfid,
>  	xe_gt_assert(gt, size % sizeof(u32) == 0);
>  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> -				  ALIGN(size, PAGE_SIZE),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT |
> -				  XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(xe, tile,
> +				       ALIGN(size, PAGE_SIZE),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT |
> +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> @@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct xe_gt *gt, unsigned int vfid,
>  	xe_gt_assert(gt, size % sizeof(u32) == 0);
>  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> -				  ALIGN(size, PAGE_SIZE),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT |
> -				  XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(xe, tile,
> +				       ALIGN(size, PAGE_SIZE),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT |
> +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> index 92e1f9f41b8c..2b99c1ebdd58 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> @@ -94,16 +94,17 @@ static int allocate_engine_activity_buffers(struct xe_guc *guc,
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo, *metadata_bo;
>  
> -	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> -					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> +	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
> +						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> +						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
> +						false);
>  
>  	if (IS_ERR(metadata_bo))
>  		return PTR_ERR(metadata_bo);
>  
> -	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
> -				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
> +				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> +				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
>  
>  	if (IS_ERR(bo)) {
>  		xe_bo_unpin_map_no_vm(metadata_bo);
> diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
> index a78c9d474a6e..4ad468574174 100644
> --- a/drivers/gpu/drm/xe/xe_lmtt.c
> +++ b/drivers/gpu/drm/xe/xe_lmtt.c
> @@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level
>  		goto out;
>  	}
>  
> -	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
> -				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> -					     lmtt->ops->lmtt_pte_num(level)),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> -				  XE_BO_FLAG_NEEDS_64K);
> +	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
> +				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> +						  lmtt->ops->lmtt_pte_num(level)),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> +				       XE_BO_FLAG_NEEDS_64K, false);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto out_free_pt;
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index 8f6c3ba47882..6d52e0eb97f5 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>  	if (vm && vm->xef) /* userspace */
>  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
>  
> -	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
> -				       ttm_bo_type_kernel,
> -				       bo_flags);
> +	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
> +					    bo_size,
> +					    ttm_bo_type_kernel,
> +					    bo_flags, false);
>  	if (IS_ERR(lrc->bo))
>  		return PTR_ERR(lrc->bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index ddfad7506a82..fe0d15ab340e 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -35,6 +35,7 @@
>  #include "xe_sched_job.h"
>  #include "xe_sync.h"
>  #include "xe_trace_bo.h"
> +#include "xe_validation.h"
>  #include "xe_vm.h"
>  #include "xe_vram.h"
>  
> @@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm,
>  }
>  
>  static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> -				 struct xe_vm *vm)
> +				 struct xe_vm *vm, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = tile_to_xe(tile);
>  	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> @@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  				  num_entries * XE_PAGE_SIZE,
>  				  ttm_bo_type_kernel,
>  				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				  XE_BO_FLAG_PAGETABLE);
> +				  XE_BO_FLAG_PAGETABLE, exec);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> @@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
>  	struct xe_tile *tile = m->tile;
>  	struct xe_gt *primary_gt = tile->primary_gt;
>  	struct xe_device *xe = tile_to_xe(tile);
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	int err;
>  
> @@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
>  	if (IS_ERR(vm))
>  		return PTR_ERR(vm);
>  
> -	xe_vm_lock(vm, false);
> -	err = xe_migrate_prepare_vm(tile, m, vm);
> -	xe_vm_unlock(vm);
> +	err = 0;
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &err);
> +	}
>  	if (err)
> -		goto err_out;
> +		return err;
>  
>  	if (xe->info.has_usm) {
>  		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> index a188bad172ad..a4894eb0d7f3 100644
> --- a/drivers/gpu/drm/xe/xe_oa.c
> +++ b/drivers/gpu/drm/xe/xe_oa.c
> @@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
>  {
>  	struct xe_bo *bo;
>  
> -	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
> -				  size, ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
> +				       size, ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index f3a39e734a90..33ad40418ceb 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
>   * @vm: The vm to create for.
>   * @tile: The tile to create for.
>   * @level: The page-table level.
> + * @exec: The drm_exec object used to lock the vm.
>   *
>   * Allocate and initialize a single struct xe_pt metadata structure. Also
>   * create the corresponding page-table bo, but don't initialize it. If the
> @@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
>   * error.
>   */
>  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> -			   unsigned int level)
> +			   unsigned int level, struct drm_exec *exec)
>  {
>  	struct xe_pt *pt;
>  	struct xe_bo *bo;
> @@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
>  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
>  
>  	pt->level = level;
> +
> +	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
>  	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
>  				  ttm_bo_type_kernel,
> -				  bo_flags);
> +				  bo_flags, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto err_kfree;
> @@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	if (covers || !*child) {
>  		u64 flags = 0;
>  
> -		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
> +		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
> +					xe_vm_validation_exec(vm));
>  		if (IS_ERR(xe_child))
>  			return PTR_ERR(xe_child);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 5ecf003d513c..4daeebaab5a1 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -10,6 +10,7 @@
>  #include "xe_pt_types.h"
>  
>  struct dma_fence;
> +struct drm_exec;
>  struct xe_bo;
>  struct xe_device;
>  struct xe_exec_queue;
> @@ -29,7 +30,7 @@ struct xe_vma_ops;
>  unsigned int xe_pt_shift(unsigned int level);
>  
>  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> -			   unsigned int level);
> +			   unsigned int level, struct drm_exec *exec);
>  
>  void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
>  			  struct xe_pt *pt);
> diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c b/drivers/gpu/drm/xe/xe_pxp_submit.c
> index ca95f2a4d4ef..54bd6b64dc6d 100644
> --- a/drivers/gpu/drm/xe/xe_pxp_submit.c
> +++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
> @@ -54,8 +54,9 @@ static int allocate_vcs_execution_resources(struct xe_pxp *pxp)
>  	 * Each termination is 16 DWORDS, so 4K is enough to contain a
>  	 * termination for each sessions.
>  	 */
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
> +				       false);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto out_queue;
> @@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
>  {
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_device *xe = tile_to_xe(tile);
> +	struct xe_validation_ctx ctx;
>  	struct xe_hw_engine *hwe;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	struct xe_bo *bo;
>  	struct xe_exec_queue *q;
> @@ -106,15 +109,26 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
>  		return PTR_ERR(vm);
>  
>  	/* We allocate a single object for the batch and the in/out memory */
> -	xe_vm_lock(vm, false);
> -	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
> -	xe_vm_unlock(vm);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		goto vm_out;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (err)
> +			break;
> +
> +		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> +					  ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
> +					  XE_BO_FLAG_NEEDS_UC, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			break;
> +		}
>  	}
> +	if (err)
> +		goto vm_out;
>  
>  	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
>  	if (IS_ERR(fence)) {
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 989d84c2e82f..b3ee65126841 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1606,6 +1606,7 @@ static void vm_destroy_work_func(struct work_struct *w);
>   * @xe: xe device.
>   * @tile: tile to set up for.
>   * @vm: vm to set up for.
> + * @exec: The struct drm_exec object used to lock the vm resv.
>   *
>   * Sets up a pagetable tree with one page-table per level and a single
>   * leaf PTE. All pagetable entries point to the single page-table or,
> @@ -1615,20 +1616,19 @@ static void vm_destroy_work_func(struct work_struct *w);
>   * Return: 0 on success, negative error code on error.
>   */
>  static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
> -				struct xe_vm *vm)
> +				struct xe_vm *vm, struct drm_exec *exec)
>  {
>  	u8 id = tile->id;
>  	int i;
>  
>  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
> -		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> +		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i, exec);
>  		if (IS_ERR(vm->scratch_pt[id][i])) {
>  			int err = PTR_ERR(vm->scratch_pt[id][i]);
>  
>  			vm->scratch_pt[id][i] = NULL;
>  			return err;
>  		}
> -
>  		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
>  	}
>  
> @@ -1656,9 +1656,26 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
>  	}
>  }
>  
> +static void xe_vm_pt_destroy(struct xe_vm *vm)
> +{
> +	struct xe_tile *tile;
> +	u8 id;
> +
> +	xe_vm_assert_held(vm);
> +
> +	for_each_tile(tile, vm->xe, id) {
> +		if (vm->pt_root[id]) {
> +			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> +			vm->pt_root[id] = NULL;
> +		}
> +	}
> +}
> +
>  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  {
>  	struct drm_gem_object *vm_resv_obj;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	int err, number_tiles = 0;
>  	struct xe_tile *tile;
> @@ -1745,49 +1762,64 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  
>  	drm_gem_object_put(vm_resv_obj);
>  
> -	err = xe_vm_lock(vm, true);
> -	if (err)
> -		goto err_close;
> +	err = 0;
> +	xe_validation_guard(&ctx, &xe->val, &exec, DRM_EXEC_INTERRUPTIBLE_WAIT,
> +			    err, true) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
>  
> -	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> -		vm->flags |= XE_VM_FLAG_64K;
> +		if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> +			vm->flags |= XE_VM_FLAG_64K;
>  
> -	for_each_tile(tile, xe, id) {
> -		if (flags & XE_VM_FLAG_MIGRATION &&
> -		    tile->id != XE_VM_FLAG_TILE_ID(flags))
> -			continue;
> +		for_each_tile(tile, xe, id) {
> +			if (flags & XE_VM_FLAG_MIGRATION &&
> +			    tile->id != XE_VM_FLAG_TILE_ID(flags))
> +				continue;
>  
> -		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
> -		if (IS_ERR(vm->pt_root[id])) {
> -			err = PTR_ERR(vm->pt_root[id]);
> -			vm->pt_root[id] = NULL;
> -			goto err_unlock_close;
> +			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
> +						       &exec);
> +			if (IS_ERR(vm->pt_root[id])) {
> +				err = PTR_ERR(vm->pt_root[id]);
> +				vm->pt_root[id] = NULL;
> +				xe_vm_pt_destroy(vm);
> +				drm_exec_retry_on_contention(&exec);
> +				xe_validation_retry_on_oom(&ctx, &err);
> +				goto err_close;
> +			}
>  		}
> -	}
>  
> -	if (xe_vm_has_scratch(vm)) {
> +		if (xe_vm_has_scratch(vm)) {
> +			for_each_tile(tile, xe, id) {
> +				if (!vm->pt_root[id])
> +					continue;
> +
> +				err = xe_vm_create_scratch(xe, tile, vm, &exec);
> +				if (err) {
> +					xe_vm_free_scratch(vm);
> +					xe_vm_pt_destroy(vm);
> +					drm_exec_retry_on_contention(&exec);
> +					xe_validation_retry_on_oom(&ctx, &err);
> +					goto err_close;
> +				}
> +			}
> +			vm->batch_invalidate_tlb = true;
> +		}
> +
> +		if (vm->flags & XE_VM_FLAG_LR_MODE) {
> +			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
> +			vm->batch_invalidate_tlb = false;
> +		}
> +
> +		/* Fill pt_root after allocating scratch tables */
>  		for_each_tile(tile, xe, id) {
>  			if (!vm->pt_root[id])
>  				continue;
>  
> -			err = xe_vm_create_scratch(xe, tile, vm);
> -			if (err)
> -				goto err_unlock_close;
> +			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
>  		}
> -		vm->batch_invalidate_tlb = true;
> -	}
> -
> -	if (vm->flags & XE_VM_FLAG_LR_MODE)
> -		vm->batch_invalidate_tlb = false;
> -
> -	/* Fill pt_root after allocating scratch tables */
> -	for_each_tile(tile, xe, id) {
> -		if (!vm->pt_root[id])
> -			continue;
> -
> -		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
>  	}
> -	xe_vm_unlock(vm);
> +	if (err)
> +		goto err_close;
>  
>  	/* Kernel migration VM shouldn't have a circular loop.. */
>  	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> @@ -1820,7 +1852,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  				      &xe->usm.next_asid, GFP_KERNEL);
>  		up_write(&xe->usm.lock);
>  		if (err < 0)
> -			goto err_unlock_close;
> +			goto err_close;
>  
>  		vm->usm.asid = asid;
>  	}
> @@ -1829,8 +1861,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  
>  	return vm;
>  
> -err_unlock_close:
> -	xe_vm_unlock(vm);
>  err_close:
>  	xe_vm_close_and_put(vm);
>  	return ERR_PTR(err);
> @@ -1959,13 +1989,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  	 * destroy the pagetables immediately.
>  	 */
>  	xe_vm_free_scratch(vm);
> -
> -	for_each_tile(tile, xe, id) {
> -		if (vm->pt_root[id]) {
> -			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> -			vm->pt_root[id] = NULL;
> -		}
> -	}
> +	xe_vm_pt_destroy(vm);
>  	xe_vm_unlock(vm);
>  
>  	/*
> @@ -3845,7 +3869,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
>   */
>  int xe_vm_lock(struct xe_vm *vm, bool intr)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;

You add this earlier in the series and then delete it here, which seems
odd. If the intent is to apply a workaround (WA) earlier and finalize it
here, that makes sense.

However, deleting it here appears to leave a problematic path in
xe_svm.c around xe_vm_range_rebind, where vm->validating._exec may
remain unset/stale. I’m surprised CI didn’t catch this—perhaps due to an
unbalanced xe_vm_set_validation_exec leaving stale state—or I’m missing
something.
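
One cheap way to surface an unbalanced setter would be an assert on the
lock path, something like (illustrative only; the accessor and field
naming follow the earlier patches and my comment above):

	int xe_vm_lock(struct xe_vm *vm, bool intr)
	{
		int ret;

		if (intr)
			ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
		else
			ret = dma_resv_lock(xe_vm_resv(vm), NULL);

		/* A leftover exec pointer here would mean a set without
		 * a matching clear.
		 */
		if (!ret)
			xe_assert(vm->xe, !xe_vm_validation_exec(vm));

		return ret;
	}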

Matt

>  	int ret;
>  
>  	if (intr)
> @@ -3853,9 +3876,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>  	else
>  		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
>  
> -	if (!ret)
> -		xe_vm_set_validation_exec(vm, exec);
> -
>  	return ret;
>  }
>  
> @@ -3867,7 +3887,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>   */
>  void xe_vm_unlock(struct xe_vm *vm)
>  {
> -	xe_vm_set_validation_exec(vm, NULL);
>  	dma_resv_unlock(xe_vm_resv(vm));
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-14  2:33   ` Matthew Brost
@ 2025-08-14  4:23     ` Matthew Brost
  2025-08-15 15:23     ` Thomas Hellström
  1 sibling, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14  4:23 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 07:33:58PM -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> > Introduce a validation wrapper xe_validation_guard() as a helper
> > intended to be used around drm_exec transactions that perform
> > validations. Once TTM can handle exhaustive eviction we could
> > remove this wrapper or make it mostly a NO-OP unless other
> > functionality is added to it.
> > 
> > Currently the wrapper takes a read lock upon entry and if the
> > transaction hits an OOM, all locks are released and the
> > transaction is retried with a write-lock. If all other
> > validations participate in this scheme, the transaction with
> > the write lock will be the only transaction validating and
> > should have access to all available non-pinned memory.
> > 
> > There is currently a problem in that TTM converts -EDEADLK to
> > -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> > -ENOMEMs without having actually run out of memory. We abuse
> > ww_mutex internals to detect such situations until TTM is fixed
> > to not convert the error code. In the meantime, injecting
> > ww_mutex slowpath -EDEADLOCKs is a good way to test
> > the implementation in the absence of real OOMs.
> > 
> > Just introduce the wrapper in this commit. It will be hooked up
> > to the driver in following commits.
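
FWIW, for anyone following along, usage ends up looking roughly like
the sketch below (a device-wide domain at xe->val is assumed, and the
bo and error handling are made up):

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int err = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    DRM_EXEC_INTERRUPTIBLE_WAIT, err, false) {
		err = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		err = xe_bo_validate(bo, NULL, false, &exec);
		drm_exec_retry_on_contention(&exec);
		/* On -ENOMEM, retry once holding the exclusive lock. */
		xe_validation_retry_on_oom(&ctx, &err);
	}
	return err;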
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_validation.c | 199 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
> >  2 files changed, 306 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> > index cc0684d24e02..cd1424f04237 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.c
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -5,6 +5,7 @@
> >  #include "xe_bo.h"
> >  #include <drm/drm_exec.h>
> >  #include <drm/drm_gem.h>
> > +#include <drm/drm_gpuvm.h>
> >  
> >  #include "xe_assert.h"
> >  #include "xe_validation.h"
> > @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct xe_device *xe,
> >  	}
> >  }
> >  #endif
> > +
> > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > +{
> > +	struct xe_validation_device *val = ctx->val;
> > +	int ret = 0;
> > +
> > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > +		if (ctx->request_exclusive)
> > +			ret = down_write_killable(&val->lock);
> > +		else
> > +			ret = down_read_interruptible(&val->lock);
> > +	} else {
> > +		if (ctx->request_exclusive)
> > +			down_write(&val->lock);
> > +		else
> > +			down_read(&val->lock);
> > +	}
> > +
> > +	if (!ret) {
> > +		ctx->lock_held = true;
> > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> > +{
> > +	if (!ctx->lock_held)
> > +		return;
> > +
> > +	if (ctx->lock_held_exclusive)
> > +		up_write(&ctx->val->lock);
> > +	else
> > +		up_read(&ctx->val->lock);
> > +
> > +	ctx->lock_held = false;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> > + * @ctx: The xe_validation_ctx to initialize.
> > + * @val: The xe_validation_device representing the validation domain.
> > + * @exec: The struct drm_exec to use for the transaction.
> > + * @flags: The flags to use for drm_exec initialization.
> > + * @nr: The number of anticipated buffer object locks. Forwarded to
> > + * drm_exec initialization.
> > + * @exclusive: Whether to use exclusive locking already on first validation.
> 
> The last two parameters of this function are always passed as 0 and
> false in this series. Is it worth keeping them? I don’t see a case where

Self correction, I see the shrinker uses exclusive. Same suggestion
though, wrt extending the flags field here for exclusive.

Matt

> nr would ever be non-zero. exclusive is defensible, but it’s still
> unused. Maybe drop both and reserve a bit in flags for a driver-defined
> “exclusive”. That would make the call sites more readable; long argument
> lists make it easy to forget what each parameter means or to transpose
> them.
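
Concretely, the flags variant could look like the sketch below.
XE_VAL_FLAG_EXCLUSIVE is a made-up bit, and the wrapper only exists to
show the call-site shape:

	/* Driver-defined bit, well clear of the DRM_EXEC_* flags. */
	#define XE_VAL_FLAG_EXCLUSIVE	BIT(31)

	static int xe_validation_ctx_init_flags(struct xe_validation_ctx *ctx,
						struct xe_validation_device *val,
						struct drm_exec *exec, u32 flags)
	{
		return xe_validation_ctx_init(ctx, val, exec,
					      flags & ~XE_VAL_FLAG_EXCLUSIVE, 0,
					      flags & XE_VAL_FLAG_EXCLUSIVE);
	}

Call sites would then pass DRM_EXEC_INTERRUPTIBLE_WAIT |
XE_VAL_FLAG_EXCLUSIVE instead of a trailing 0, true.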
> 
> > + *
> > + * Initialize and lock an xe_validation transaction using the validation domain
> > + * represented by @val. Also initialize the drm_exec object forwarding
> > + * @flags and @nr to the drm_exec initialization. The @exclusive parameter should
> > + * typically be set to false to avoid locking out other validators from the
> > + * domain until an OOM is hit. For testing- or final attempt purposes it can,
> > + * however, be set to true.
> > + *
> > + * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
> > + * signal pending.
> > + */
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive)
> > +{
> > +	int ret;
> > +
> > +	ctx->exec = exec;
> > +	ctx->val = val;
> > +	ctx->lock_held = false;
> > +	ctx->lock_held_exclusive = false;
> > +	ctx->request_exclusive = exclusive;
> > +	ctx->flags = flags;
> > +	ctx->nr = nr;
> > +
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	drm_exec_init(exec, flags, nr);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> > +/*
> > + * This abuses both drm_exec and ww_mutex internals and should be
> > + * replaced by checking for -EDEADLK when we can make TTM
> > + * stop converting -EDEADLK to -ENOMEM.
> > + * An alternative is to not have exhaustive eviction with
> > + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> > + */
> > +static bool xe_validation_contention_injected(struct drm_exec *exec)
> > +{
> > +	return !!exec->ticket.contending_lock;
> > +}
> > +
> > +#else
> > +
> > +static bool xe_validation_contention_injected(struct drm_exec *exec)
> > +{
> > +	return false;
> > +}
> > +
> > +#endif
> > +
> > +static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
> > +{
> > +	if (ret == -ENOMEM &&
> > +	    ((ctx->request_exclusive &&
> > +	      xe_validation_contention_injected(ctx->exec)) ||
> > +	     !ctx->request_exclusive)) {
> > +		ctx->request_exclusive = true;
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/**
> > + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
> > + * transaction.
> > + * @ctx: An uninitialized xe_validation_ctx.
> > + * @vm_exec: An initialized struct vm_exec.
> > + * @val: The validation domain.
> > + *
> > + * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
> > + * transaction and therefore doesn't lend itself very well to be using
> > + * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
> > + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
> > + *
> > + * Return: %0 on success, negative error code on failure.
> > + */
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> > +			    struct drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val)
> > +{
> > +	int ret;
> > +
> > +	memset(ctx, 0, sizeof(*ctx));
> > +	ctx->exec = &vm_exec->exec;
> > +	ctx->flags = vm_exec->flags;
> > +	ctx->val = val;
> > +retry:
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = drm_gpuvm_exec_lock(vm_exec);
> > +	if (ret) {
> > +		xe_validation_unlock(ctx);
> > +		if (__xe_validation_should_retry(ctx, ret))
> > +			goto retry;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_fini() - Finalize a validation transaction
> > + * @ctx: The Validation transaction to finalize.
> > + *
> > + * Finalize a validation transaction and its related drm_exec transaction.
> > + */
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > +{
> > +	drm_exec_fini(ctx->exec);
> > +	xe_validation_unlock(ctx);
> > +}
> > +
> > +/**
> > + * xe_validation_should_retry() - Determine if a validation transaction should retry
> > + * @ctx: The validation transaction.
> > + * @ret: Pointer to a return value variable.
> > + *
> > + * Determines whether a validation transaction should retry based on the
> > + * internal transaction state and the return value pointed to by @ret.
> > + * If a validation should be retried, the transaction is prepared for that,
> > + * and the validation lock might be re-locked in exclusive mode, and *@ret
> > + * is set to %0. If the re-locking fails, typically due to interruptible
> > + * locking with a signal pending, *@ret is instead set to -EINTR and the
> > + * function returns %false.
> > + *
> > + * Return: %true if validation should be retried, %false otherwise.
> > + */
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> > +{
> > +	if (__xe_validation_should_retry(ctx, *ret)) {
> > +		drm_exec_fini(ctx->exec);
> > +		*ret = 0;
> > +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> > +			xe_validation_unlock(ctx);
> > +			*ret = xe_validation_lock(ctx);
> > +		}
> > +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> > +		return !*ret;
> > +	}
> > +
> > +	return false;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > index db50feacad7a..a708c260cf18 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.h
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -7,9 +7,11 @@
> >  
> >  #include <linux/dma-resv.h>
> >  #include <linux/types.h>
> > +#include <linux/rwsem.h>
> >  
> >  struct drm_exec;
> >  struct drm_gem_object;
> > +struct drm_gpuvm_exec;
> >  struct xe_device;
> >  
> >  #ifdef CONFIG_PROVE_LOCKING
> > @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
> >  	} while (0)
> >  #endif
> >  
> > +/**
> > + * struct xe_validation_device - The domain for exhaustive eviction
> > + * @lock: The lock used to exclude other processes from allocating graphics memory
> > + *
> > + * The struct xe_validation_device represents the domain for which we want to use
> > + * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
> > + * but when graphics memory allocation fails, it is retried with the write mode held.
> > + */
> > +struct xe_validation_device {
> > +	struct rw_semaphore lock;
> > +};
> > +
> > +/**
> > + * struct xe_validation_ctx - A struct drm_exec subclass with support for
> > + * exhaustive eviction
> > + * @exec: The drm_exec object base class. Note that we use a pointer instead of
> > + * embedding to avoid diamond inheritance.
> > + * @val: The exhaustive eviction domain.
> > + * @lock_held: Whether the domain lock is currently held.
> > + * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
> > + * @request_exclusive: Whether to lock exclusively (write mode) the next time
> > + * the domain lock is locked.
> > + * @flags: The drm_exec flags used for drm_exec (re-)initialization.
> > + * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
> > + */
> > +struct xe_validation_ctx {
> > +	struct drm_exec *exec;
> > +	struct xe_validation_device *val;
> > +	bool lock_held;
> > +	bool lock_held_exclusive;
> > +	bool request_exclusive;
> > +	u32 flags;
> > +	unsigned int nr;
> > +};
> > +
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive);
> > +
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val);
> > +
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> > +
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> > +
> > +/**
> > + * xe_validation_retry_on_oom() - Retry on oom in an xe_validation transaction
> > + * @_ctx: Pointer to the xe_validation_ctx
> > + * @_ret: The current error value possibly holding -ENOMEM
> > + *
> > + * Use this in a way similar to drm_exec_retry_on_contention().
> > + * If @_ret contains -ENOMEM the transaction is restarted once in a way that
> > + * blocks other transactions and allows exhaustive eviction. If the transaction
> > + * was already restarted once, just return the -ENOMEM. May also set
> > + * @_ret to -EINTR if not retrying and waits are interruptible.
> > + * May only be used within a drm_exec_until_all_locked() loop.
> > + */
> > +#define xe_validation_retry_on_oom(_ctx, _ret)				\
> > +	do {								\
> > +		if (xe_validation_should_retry(_ctx, _ret))		\
> > +			goto *__drm_exec_retry_ptr;			\
> > +	} while (0)
> > +
> > +/**
> > + * xe_validation_device_init - Initialize a struct xe_validation_device
> > + * @val: The xe_validation_device to init.
> > + */
> > +static inline void
> > +xe_validation_device_init(struct xe_validation_device *val)
> > +{
> > +	init_rwsem(&val->lock);
> > +}
> > +
> > +/*
> > + * Make guard() and scoped_guard() work with xe_validation_ctx
> > + * so that we can exit transactions without caring about the
> > + * cleanup.
> > + */
> > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> > +	       _ret ? NULL : _ctx; }),
> > +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> > +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> > +{return *_T; }
> > +#define class_xe_validation_is_conditional false
> > +
> > +/**
> > + * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
> > + * @_ctx: The xe_validation_ctx.
> > + * @_val: The xe_validation_device.
> > + * @_exec: The struct drm_exec object
> > + * @_flags: Flags for the drm_exec transaction. See the struct drm_exec documentation!
> > + * @_ret: Return in / out parameter. May be set by this macro. Typically 0 when called.
> > + * @_excl: Whether to start in exclusive mode already in the first iteration.
> > + *
> 
> Same comment as above on function xe_validation_ctx_init wrt to
> arguments.
> 
> Matt
> 
> > + * This macro will initiate a drm_exec transaction with additional support for
> > + * exhaustive eviction.
> > + */
> > +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
> > +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
> > +	drm_exec_until_all_locked(_exec)
> > +
> >  #endif
> > -- 
> > 2.50.1
> > 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/15] drm/xe: Pass down drm_exec context to validation
  2025-08-13 16:42   ` Matthew Brost
@ 2025-08-14  7:49     ` Thomas Hellström
  2025-08-14 19:09       ` Matthew Brost
  2025-08-22  7:40     ` Thomas Hellström
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-14  7:49 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 09:42 -0700, Matthew Brost wrote:

> On Wed, Aug 13, 2025 at 12:51:10PM +0200, Thomas Hellström wrote:
> 
> > We want all validation (potential backing store allocation) to be part
> > of a drm_exec transaction. Therefore add a drm_exec pointer argument
> > to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
> > will deal with making all (or nearly all) calls to these functions
> > part of a drm_exec transaction. In the meantime, define special values
> > of the drm_exec pointer:
> > 
> 
> 
> Would the eventual idea be to pass the exec further down to TTM?


Yes. The original series did this, and required multiple changes both to drm_exec and to TTM. Christian had some other ideas, although the final goal was the same. So it's a task for us and AMD to agree on something here. The TTM object refcount removal series from Christian is a step on the way there.
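
To illustrate the direction (a sketch only, not an agreed-upon
interface), the eventual TTM entry point would take the caller's exec
context so that eviction can use sleeping locks and proper -EDEADLK
rollback:

	/*
	 * Hypothetical exec-aware validate. Until TTM grows something
	 * like this, eviction inside TTM still trylocks and converts
	 * -EDEADLK to -ENOMEM.
	 */
	static inline int ttm_bo_validate_exec(struct ttm_buffer_object *bo,
					       struct ttm_placement *placement,
					       struct ttm_operation_ctx *ctx,
					       struct drm_exec *exec)
	{
		(void)exec; /* Not consumed by TTM yet. */
		return ttm_bo_validate(bo, placement, ctx);
	}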


>  
> 
> > XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
> > has not been done yet.
> > XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
> > the drm_exec context to be passed down to map_attachment where
> > validation takes place.
> 
> 
> What is the expected long-term implication of paths that are
> UNIMPLEMENTED and UNSUPPORTED?


IMO UNIMPLEMENTED should not be allowed moving forward other than for debugging. UNSUPPORTED requires a new dma-buf mapping interface with an exec argument. I don't think all peers will support that, though, and those won't participate fully in the scheme.
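
What's missing on the dma-buf side is something along these lines
(hypothetical helper, illustration only):

	/*
	 * A new dma-buf mapping interface would have to carry the
	 * drm_exec context through to the exporter. Peers that don't
	 * implement it keep today's behaviour and don't participate
	 * fully in the scheme.
	 */
	static inline struct sg_table *
	xe_dma_buf_map_attachment_exec(struct dma_buf_attachment *attach,
				       enum dma_data_direction dir,
				       struct drm_exec *exec)
	{
		(void)exec; /* No way to hand this to the exporter today. */
		return dma_buf_map_attachment(attach, dir);
	}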



> 
> > XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
> > eviction isn't crucial and the ROI of converting those is very
> > small.
> > 
> > For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
> > a lockdep check that a drm_exec transaction can indeed start at the
> > location where the macro is expanded. This is to encourage
> > developers to take this into consideration early in the code
> > development process.
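
For reference, the special values boil down to tagged pointers plus the
lockdep check, roughly like the sketch below. The real definitions are
in xe_validation.h in this patch; the exact error codes here are
illustrative:

	static inline void xe_validation_lockdep(void)
	{
		struct ww_acquire_ctx ticket;

		/* Assert that a drm_exec transaction could start here. */
		ww_acquire_init(&ticket, &reservation_ww_class);
		ww_acquire_fini(&ticket);
	}

	#define XE_VALIDATION_UNIMPLEMENTED \
		(xe_validation_lockdep(), (struct drm_exec *)ERR_PTR(-EINVAL))
	#define XE_VALIDATION_UNSUPPORTED \
		((struct drm_exec *)ERR_PTR(-EOPNOTSUPP))
	#define XE_VALIDATION_OPT_OUT \
		(xe_validation_lockdep(), (struct drm_exec *)ERR_PTR(-ENOMEM))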
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile                   |   1 +
> >  .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
> >  drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
> >  drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
> >  drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
> >  drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
> >  drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
> >  drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
> >  drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
> >  drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
> >  drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
> >  drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
> >  drivers/gpu/drm/xe/xe_gt_pagefault.c          |   4 +-
> >  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
> >  drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
> >  drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
> >  drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
> >  drivers/gpu/drm/xe/xe_vm.c                    |  26 +++-
> >  drivers/gpu/drm/xe/xe_vm.h                    |  33 ++++-
> >  drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
> >  20 files changed, 401 insertions(+), 105 deletions(-)
> >  create mode 100644 drivers/gpu/drm/xe/xe_validation.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_validation.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 8e0c3412a757..8ee7d275128d 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -127,6 +127,7 @@ xe-y += xe_bb.o \
> >  	xe_tuning.o \
> >  	xe_uc.o \
> >  	xe_uc_fw.o \
> > +	xe_validation.o \
> >  	xe_vm.o \
> >  	xe_vram.o \
> >  	xe_vram_freq.o \
> > diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > index 41d39d67817a..1ce1e9da975b 100644
> > --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > @@ -8,6 +8,7 @@
> >  
> >  #include "xe_ttm_stolen_mgr.h"
> >  #include "xe_res_cursor.h"
> > +#include "xe_validation.h"
> >  
> >  struct xe_bo;
> >  
> > @@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  						       u32 size, u32 align,
> >  						       u32 start, u32 end)
> >  {
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo;
> >  	int err;
> >  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> > @@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  
> >  	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> >  				       NULL, size, start, end,
> > -				       ttm_bo_type_kernel, flags, 0);
> > +				       ttm_bo_type_kernel, flags, 0, exec);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		bo = NULL;
> >  		return err;
> >  	}
> > -	err = xe_bo_pin(bo);
> > +	err = xe_bo_pin(bo, exec);
> >  	xe_bo_unlock_vm_held(bo);
> >  
> >  	if (err) {
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index f1f8b5ab53ef..4b0748e6fdd6 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> >  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	int ret;
> >  
> >  	if (!vma)
> > @@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> >  		goto err;
> >  
> >  	if (IS_DGFX(xe))
> > -		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
> > +		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
> >  	else
> > -		ret = xe_bo_validate(bo, NULL, true);
> > +		ret = xe_bo_validate(bo, NULL, true, exec);
> >  	if (!ret)
> >  		ttm_bo_pin(&bo->ttm);
> >  	ttm_bo_unreserve(&bo->ttm);
> > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> > index bb469096d072..06ceba6c3c25 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> > @@ -23,7 +23,7 @@
> >  
> >  static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> >  			    bool clear, u64 get_val, u64 assign_val,
> > -			    struct kunit *test)
> > +			    struct kunit *test, struct drm_exec *exec)
> >  {
> >  	struct dma_fence *fence;
> >  	struct ttm_tt *ttm;
> > @@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> >  	u32 offset;
> >  
> >  	/* Move bo to VRAM if not already there. */
> > -	ret = xe_bo_validate(bo, NULL, false);
> > +	ret = xe_bo_validate(bo, NULL, false, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to validate bo.\n");
> >  		return ret;
> > @@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> >  	}
> >  
> >  	/* Evict to system. CCS data should be copied. */
> > -	ret = xe_bo_evict(bo);
> > +	ret = xe_bo_evict(bo, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to evict bo.\n");
> >  		return ret;
> > @@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
> >  
> >  	/* TODO: Sanity check */
> >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> > +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> >  
> >  	if (IS_DGFX(xe))
> >  		kunit_info(test, "Testing vram id %u\n", tile->id);
> > @@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
> >  
> >  	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
> >  	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
> > -			       test);
> > +			       test, exec);
> >  	if (ret)
> >  		goto out_unlock;
> >  
> >  	kunit_info(test, "Verifying that CCS data survives migration.\n");
> >  	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
> > -			       0xdeadbeefdeadbeefULL, test);
> > +			       0xdeadbeefdeadbeefULL, test, exec);
> >  	if (ret)
> >  		goto out_unlock;
> >  
> >  	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
> > -	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
> > +	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
> >  
> >  out_unlock:
> >  	xe_bo_unlock(bo);
> > @@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> >  	struct xe_bo *bo, *external;
> >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> >  	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
> > +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> >  	struct xe_gt *__gt;
> >  	int err, i, id;
> >  
> > @@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> >  		}
> >  
> >  		xe_bo_lock(external, false);
> > -		err = xe_bo_pin_external(external);
> > +		err = xe_bo_pin_external(external, exec);
> >  		xe_bo_unlock(external);
> >  		if (err) {
> >  			KUNIT_FAIL(test, "external bo pin err=%pe\n",
> > @@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> >  		if (i) {
> >  			down_read(&vm->lock);
> >  			xe_vm_lock(vm, false);
> > -			err = xe_bo_validate(bo, bo->vm, false);
> > +			err = xe_bo_validate(bo, bo->vm, false, exec);
> >  			xe_vm_unlock(vm);
> >  			up_read(&vm->lock);
> >  			if (err) {
> > @@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> >  				goto cleanup_all;
> >  			}
> >  			xe_bo_lock(external, false);
> > -			err = xe_bo_validate(external, NULL, false);
> > +			err = xe_bo_validate(external, NULL, false, exec);
> >  			xe_bo_unlock(external);
> >  			if (err) {
> >  				KUNIT_FAIL(test, "external bo valid err=%pe\n",
> > diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > index cde9530bef8c..965dd3280468 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > @@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
> >  }
> >  
> >  static void check_residency(struct kunit *test, struct xe_bo *exported,
> > -			    struct xe_bo *imported, struct dma_buf *dmabuf)
> > +			    struct xe_bo *imported, struct dma_buf *dmabuf,
> > +			    struct drm_exec *exec)
> >  {
> >  	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
> >  	u32 mem_type;
> > @@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
> >  	 * importer is on a different device. If they're on the same device,
> >  	 * the exporter and the importer should be the same bo.
> >  	 */
> > -	ret = xe_bo_evict(exported);
> > +	ret = xe_bo_evict(exported, exec);
> >  	if (ret) {
> >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> >  			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
> > @@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
> >  	}
> >  
> >  	/* Re-validate the importer. This should move also exporter in. */
> > -	ret = xe_bo_validate(imported, NULL, false);
> > +	ret = xe_bo_validate(imported, NULL, false, exec);
> >  	if (ret) {
> >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> >  			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
> > @@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
> >  			KUNIT_FAIL(test,
> >  				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
> >  		} else {
> > +			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> >  			int err;
> >  
> >  			/* Is everything where we expect it to be? */
> >  			xe_bo_lock(import_bo, false);
> > -			err = xe_bo_validate(import_bo, NULL, false);
> > +			err = xe_bo_validate(import_bo, NULL, false, exec);
> >  
> >  			/* Pinning in VRAM is not allowed. */
> >  			if (!is_dynamic(params) &&
> > @@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
> >  						  err == -ERESTARTSYS);
> >  
> >  			if (!err)
> > -				check_residency(test, bo, import_bo, dmabuf);
> > +				check_residency(test, bo, import_bo, dmabuf, exec);
> >  			xe_bo_unlock(import_bo);
> >  		}
> >  		drm_gem_object_put(import);
> > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > index edd1e701aa1c..dfb445d09759 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > @@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
> >  		} } while (0)
> >  
> >  static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> > -		      struct kunit *test, u32 region)
> > +		      struct kunit *test, u32 region, struct drm_exec *exec)
> >  {
> >  	struct xe_device *xe = tile_to_xe(m->tile);
> >  	u64 retval, expected = 0;
> > @@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> >  						   ttm_bo_type_kernel,
> >  						   region |
> >  						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -						   XE_BO_FLAG_PINNED);
> > +						   XE_BO_FLAG_PINNED,
> > +						   exec);
> >  	if (IS_ERR(remote)) {
> >  		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
> >  			   str, remote);
> >  		return;
> >  	}
> >  
> > -	err = xe_bo_validate(remote, NULL, false);
> > +	err = xe_bo_validate(remote, NULL, false, exec);
> >  	if (err) {
> >  		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
> >  			   str, err);
> > @@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> >  }
> >  
> >  static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
> > -			     struct kunit *test)
> > +			     struct drm_exec *exec, struct kunit *test)
> >  {
> > -	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
> > +	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
> >  }
> >  
> >  static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> > -			   struct kunit *test)
> > +			   struct drm_exec *exec, struct kunit *test)
> >  {
> >  	u32 region;
> >  
> > @@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> >  		region = XE_BO_FLAG_VRAM1;
> >  	else
> >  		region = XE_BO_FLAG_VRAM0;
> > -	test_copy(m, bo, test, region);
> > +	test_copy(m, bo, test, region, exec);
> >  }
> >  
> > -static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> > +static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> > +				   struct drm_exec *exec)
> >  {
> >  	struct xe_tile *tile = m->tile;
> >  	struct xe_device *xe = tile_to_xe(tile);
> > @@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> >  	check(retval, expected, "Command clear small last value", test);
> >  
> >  	kunit_info(test, "Copying small buffer object to system\n");
> > -	test_copy_sysmem(m, tiny, test);
> > +	test_copy_sysmem(m, tiny, exec, test);
> >  	if (xe->info.tile_count > 1) {
> >  		kunit_info(test, "Copying small buffer object to other vram\n");
> > -		test_copy_vram(m, tiny, test);
> > +		test_copy_vram(m, tiny, exec, test);
> >  	}
> >  
> >  	/* Clear a big bo */
> > @@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> >  	check(retval, expected, "Command clear big last value", test);
> >  
> >  	kunit_info(test, "Copying big buffer object to system\n");
> > -	test_copy_sysmem(m, big, test);
> > +	test_copy_sysmem(m, big, exec, test);
> >  	if (xe->info.tile_count > 1) {
> >  		kunit_info(test, "Copying big buffer object to other vram\n");
> > -		test_copy_vram(m, big, test);
> > +		test_copy_vram(m, big, exec, test);
> >  	}
> >  
> >  out:
> > @@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
> >  
> >  	for_each_tile(tile, xe, id) {
> >  		struct xe_migrate *m = tile->migrate;
> > +		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> >  
> >  		kunit_info(test, "Testing tile id %d.\n", id);
> >  		xe_vm_lock(m->q->vm, false);
> > -		xe_migrate_sanity_test(m, test);
> > +		xe_migrate_sanity_test(m, test, exec);
> >  		xe_vm_unlock(m->q->vm);
> >  	}
> >  
> > @@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
> >  
> >  static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> >  			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
> > -			 struct kunit *test)
> > +			 struct drm_exec *exec, struct kunit *test)
> >  {
> >  	struct dma_fence *fence;
> >  	u64 expected, retval;
> > @@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> >  	dma_fence_put(fence);
> >  
> >  	kunit_info(test, "Evict vram buffer object\n");
> > -	ret = xe_bo_evict(vram_bo);
> > +	ret = xe_bo_evict(vram_bo, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to evict bo.\n");
> >  		return;
> > @@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> >  	dma_fence_put(fence);
> >  
> >  	kunit_info(test, "Restore vram buffer object\n");
> > -	ret = xe_bo_validate(vram_bo, NULL, false);
> > +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
> >  		return;
> > @@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> >  {
> >  	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
> >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> > +	struct drm_exec *exec;
> >  	long ret;
> >  
> >  	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> > @@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> >  		return;
> >  	}
> >  
> > +	exec = XE_VALIDATION_OPT_OUT;
> >  	xe_bo_lock(sys_bo, false);
> > -	ret = xe_bo_validate(sys_bo, NULL, false);
> > +	ret = xe_bo_validate(sys_bo, NULL, false, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
> >  		goto free_sysbo;
> > @@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> >  	}
> >  
> >  	xe_bo_lock(ccs_bo, false);
> > -	ret = xe_bo_validate(ccs_bo, NULL, false);
> > +	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
> >  		goto free_ccsbo;
> > @@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> >  	}
> >  
> >  	xe_bo_lock(vram_bo, false);
> > -	ret = xe_bo_validate(vram_bo, NULL, false);
> > +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
> >  	if (ret) {
> >  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
> >  		goto free_vrambo;
> > @@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> >  	}
> >  
> >  	test_clear(xe, tile, sys_bo, vram_bo, test);
> > -	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
> > +	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
> >  	xe_bo_unlock(vram_bo);
> >  
> >  	xe_bo_lock(vram_bo, false);
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 11eaf3b06766..e71addf51ed0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
> >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *backup;
> >  	int ret = 0;
> >  
> > @@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> >  	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> >  					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> >  					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -					XE_BO_FLAG_PINNED);
> > +					XE_BO_FLAG_PINNED, exec);
> >  	if (IS_ERR(backup)) {
> >  		ret = PTR_ERR(backup);
> >  		goto out_unlock_bo;
> > @@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
> >  int xe_bo_evict_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *backup = bo->backup_obj;
> >  	bool backup_created = false;
> >  	bool unmap = false;
> > @@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
> >  						NULL, xe_bo_size(bo),
> >  						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> >  						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -						XE_BO_FLAG_PINNED);
> > +						XE_BO_FLAG_PINNED, exec);
> >  		if (IS_ERR(backup)) {
> >  			ret = PTR_ERR(backup);
> >  			goto out_unlock_bo;
> > @@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> >  	struct xe_device *xe = to_xe_device(ddev);
> >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > +	struct drm_exec *exec;
> >  	vm_fault_t ret;
> >  	int idx;
> >  
> >  	if (needs_rpm)
> >  		xe_pm_runtime_get(xe);
> >  
> > +	exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> >  	if (ret)
> >  		goto out;
> > @@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> >  	if (drm_dev_enter(ddev, &idx)) {
> >  		trace_xe_bo_cpu_fault(bo);
> >  
> > +		xe_validation_assert_exec(xe, exec, &tbo->base);
> >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> >  					       TTM_BO_VM_NUM_PREFAULT);
> >  		drm_dev_exit(idx);
> > @@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
> >  	kfree(bo);
> >  }
> >  
> > +/**
> > + * ___xe_bo_create_locked() - Initialize or create an xe_bo.
> > + * @xe: The xe device.
> > + * @bo: An already allocated buffer object or NULL
> > + * if the function should allocate a new one.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @resv: Pointer to a locked shared reservation object to use for this bo,
> > + * or NULL for the xe_bo to use its own.
> > + * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
> > + * @size: The storage size to use for the bo.
> > + * @cpu_caching: The cpu caching used for system memory backing store.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > + *
> > + * Initialize or create an xe buffer object. On failure, any allocated buffer
> > + * object passed in @bo will have been unreferenced.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + */
> >  struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  				     struct xe_tile *tile, struct dma_resv *resv,
> >  				     struct ttm_lru_bulk_move *bulk, size_t size,
> >  				     u16 cpu_caching, enum ttm_bo_type type,
> > -				     u32 flags)
> > +				     u32 flags, struct drm_exec *exec)
> >  {
> >  	struct ttm_operation_ctx ctx = {
> >  		.interruptible = true,
> > @@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  		ctx.resv = resv;
> >  	}
> >  
> > +	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
> >  	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
> >  		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
> >  		if (WARN_ON(err)) {
> > @@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
> >  		      struct xe_tile *tile, struct xe_vm *vm,
> >  		      size_t size, u64 start, u64 end,
> >  		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
> > -		      u64 alignment)
> > +		      u64 alignment, struct drm_exec *exec)
> >  {
> >  	struct xe_bo *bo = NULL;
> >  	int err;
> > @@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
> >  				    vm && !xe_vm_in_fault_mode(vm) &&
> >  				    flags & XE_BO_FLAG_USER ?
> >  				    &vm->lru_bulk_move : NULL, size,
> > -				    cpu_caching, type, flags);
> > +				    cpu_caching, type, flags, exec);
> >  	if (IS_ERR(bo))
> >  		return bo;
> >  
> > @@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
> >  
> >  			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
> >  				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
> > -							   start + xe_bo_size(bo), U64_MAX);
> > +							   start + xe_bo_size(bo), U64_MAX,
> > +							   exec);
> >  			} else {
> > -				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
> > +				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
> >  			}
> >  			if (err)
> >  				goto err_unlock_put_bo;
> > @@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
> >  	return ERR_PTR(err);
> >  }
> >  
> > +/**
> > + * xe_bo_create_locked_range() - Create a BO with range- and alignment options
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @vm: The local vm or NULL for external objects.
> > + * @size: The storage size to use for the bo.
> > + * @start: Start of fixed VRAM range or 0.
> > + * @end: End of fixed VRAM range or ~0ULL.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > + *
> > + * Create an Xe BO with range- and alignment options. If @start and @end indicate
> > + * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
> > + * only. The @alignment parameter can be used for GGTT alignment.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + */
> >  struct xe_bo *
> >  xe_bo_create_locked_range(struct xe_device *xe,
> >  			  struct xe_tile *tile, struct xe_vm *vm,
> >  			  size_t size, u64 start, u64 end,
> > -			  enum ttm_bo_type type, u32 flags, u64 alignment)
> > +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> > +			  struct drm_exec *exec)
> >  {
> >  	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
> > -				     flags, alignment);
> > +				     flags, alignment, exec);
> >  }
> >  
> > +/**
> > + * xe_bo_create_locked() - Create a BO
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @vm: The local vm or NULL for external objects.
> > + * @size: The storage size to use for the bo.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > + *
> > + * Create a locked xe BO with no range- nor alignment restrictions.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + */
> >  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> >  				  struct xe_vm *vm, size_t size,
> > -				  enum ttm_bo_type type, u32 flags)
> > +				  enum ttm_bo_type type, u32 flags,
> > +				  struct drm_exec *exec)
> >  {
> >  	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
> > -				     flags, 0);
> > +				     flags, 0, exec);
> >  }
> >  
> >  struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> > @@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> >  				u16 cpu_caching,
> >  				u32 flags)
> >  {
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> >  						 cpu_caching, ttm_bo_type_device,
> > -						 flags | XE_BO_FLAG_USER, 0);
> > +						 flags | XE_BO_FLAG_USER, 0, exec);
> >  	if (!IS_ERR(bo))
> >  		xe_bo_unlock_vm_held(bo);
> >  
> > @@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> >  			   struct xe_vm *vm, size_t size,
> >  			   enum ttm_bo_type type, u32 flags)
> >  {
> > -	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
> >  
> >  	if (!IS_ERR(bo))
> >  		xe_bo_unlock_vm_held(bo);
> > @@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	int err;
> >  	u64 start = offset == ~0ull ? 0 : offset;
> >  	u64 end = offset == ~0ull ? offset : start + size;
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> >  
> >  	if (flags & XE_BO_FLAG_STOLEN &&
> >  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> > @@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  
> >  	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
> >  				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
> > -				       alignment);
> > +				       alignment, exec);
> >  	if (IS_ERR(bo))
> >  		return bo;
> >  
> > -	err = xe_bo_pin(bo);
> > +	err = xe_bo_pin(bo, exec);
> >  	if (err)
> >  		goto err_put;
> >  
> > @@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
> >  /**
> >   * xe_bo_pin_external - pin an external BO
> >   * @bo: buffer object to be pinned
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
> >   * BO. Unique call compared to xe_bo_pin as this function has it own set of
> > @@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
> >   *
> >   * Returns 0 for success, negative error code otherwise.
> >   */
> > -int xe_bo_pin_external(struct xe_bo *bo)
> > +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
> >  {
> >  	struct xe_device *xe = xe_bo_device(bo);
> >  	int err;
> > @@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
> >  	xe_assert(xe, xe_bo_is_user(bo));
> >  
> >  	if (!xe_bo_is_pinned(bo)) {
> > -		err = xe_bo_validate(bo, NULL, false);
> > +		err = xe_bo_validate(bo, NULL, false, exec);
> >  		if (err)
> >  			return err;
> >  
> > @@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
> >  	return 0;
> >  }
> >  
> > -int xe_bo_pin(struct xe_bo *bo)
> > +/**
> > + * xe_bo_pin() - Pin a kernel bo after potentially migrating it
> > + * @bo: The kernel bo to pin.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > + *
> > + * Attempts to migrate a bo to @bo->placement. If that succeeds,
> > + * pins the bo.
> > + *
> > + * Return: %0 on success, negative error code on migration failure.
> > + */
> > +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
> >  {
> >  	struct ttm_place *place = &bo->placements[0];
> >  	struct xe_device *xe = xe_bo_device(bo);
> > @@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
> >  	/* We only expect at most 1 pin */
> >  	xe_assert(xe, !xe_bo_is_pinned(bo));
> >  
> > -	err = xe_bo_validate(bo, NULL, false);
> > +	err = xe_bo_validate(bo, NULL, false, exec);
> >  	if (err)
> >  		return err;
> >  
> > @@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
> >   *      NULL. Used together with @allow_res_evict.
> >   * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
> >   *                   reservation object.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * Make sure the bo is in allowed placement, migrating it if necessary. If
> >   * needed, other bos will be evicted. If bos selected for eviction shares
> > @@ -2461,7 +2541,8 @@ void xe_bo_unpin(struct xe_bo *bo)
> >   * Return: 0 on success, negative error code on failure. May return
> >   * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
> >   */
> > -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> > +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> > +		   struct drm_exec *exec)
> >  {
> >  	struct ttm_operation_ctx ctx = {
> >  		.interruptible = true,
> > @@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> >  
> >  	xe_vm_set_validating(vm, allow_res_evict);
> >  	trace_xe_bo_validate(bo);
> > +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
> >  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> >  	xe_vm_clear_validating(vm, allow_res_evict);
> >  
> > @@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
> >   * xe_bo_migrate - Migrate an object to the desired region id
> >   * @bo: The buffer object to migrate.
> >   * @mem_type: The TTM region type to migrate to.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * Attempt to migrate the buffer object to the desired memory region. The
> >   * buffer object may not be pinned, and must be locked.
> > @@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
> >   * Return: 0 on success. Negative error code on failure. In particular may
> >   * return -EINTR or -ERESTARTSYS if signal pending.
> >   */
> > -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> > +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> >  	struct ttm_operation_ctx ctx = {
> > @@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> >  		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
> >  	}
> >  
> > +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
> >  	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
> >  }
> >  
> >  /**
> >   * xe_bo_evict - Evict an object to evict placement
> >   * @bo: The buffer object to migrate.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * On successful completion, the object memory will be moved to evict
> >   * placement. This function blocks until the object has been fully moved.
> >   *
> >   * Return: 0 on success. Negative error code on failure.
> >   */
> > -int xe_bo_evict(struct xe_bo *bo)
> > +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
> >  {
> >  	struct ttm_operation_ctx ctx = {
> >  		.interruptible = false,
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index 8cce413b5235..b1b6cb622d71 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -10,6 +10,7 @@
> >  
> >  #include "xe_bo_types.h"
> >  #include "xe_macros.h"
> > +#include "xe_validation.h"
> >  #include "xe_vm_types.h"
> >  #include "xe_vm.h"
> >  #include "xe_vram_types.h"
> > @@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> >  				     struct xe_tile *tile, struct dma_resv *resv,
> >  				     struct ttm_lru_bulk_move *bulk, size_t size,
> >  				     u16 cpu_caching, enum ttm_bo_type type,
> > -				     u32 flags);
> > +				     u32 flags, struct drm_exec *exec);
> >  struct xe_bo *
> >  xe_bo_create_locked_range(struct xe_device *xe,
> >  			  struct xe_tile *tile, struct xe_vm *vm,
> >  			  size_t size, u64 start, u64 end,
> > -			  enum ttm_bo_type type, u32 flags, u64 alignment);
> > +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> > +			  struct drm_exec *exec);
> >  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> >  				  struct xe_vm *vm, size_t size,
> > -				  enum ttm_bo_type type, u32 flags);
> > +				  enum ttm_bo_type type, u32 flags,
> > +				  struct drm_exec *exec);
> >  struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> >  			   struct xe_vm *vm, size_t size,
> >  			   enum ttm_bo_type type, u32 flags);
> > @@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
> >  	}
> >  }
> >  
> > -int xe_bo_pin_external(struct xe_bo *bo);
> > -int xe_bo_pin(struct xe_bo *bo);
> > +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
> > +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
> >  void xe_bo_unpin_external(struct xe_bo *bo);
> >  void xe_bo_unpin(struct xe_bo *bo);
> > -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
> > +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> > +		   struct drm_exec *exec);
> >  
> >  static inline bool xe_bo_is_pinned(struct xe_bo *bo)
> >  {
> > @@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
> >  
> >  bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
> >  
> > -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
> > -int xe_bo_evict(struct xe_bo *bo);
> > +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
> > +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
> >  
> >  int xe_bo_evict_pinned(struct xe_bo *bo);
> >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
> > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > index 346f857f3837..78a827d4e726 100644
> > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > @@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> >  	struct drm_gem_object *obj = attach->dmabuf->priv;
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> >  	struct xe_device *xe = xe_bo_device(bo);
> > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> >  	int ret;
> >  
> >  	/*
> > @@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> >  		return -EINVAL;
> >  	}
> >  
> > -	ret = xe_bo_migrate(bo, XE_PL_TT);
> > +	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
> >  	if (ret) {
> >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> >  			drm_dbg(&xe->drm,
> > @@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> >  		return ret;
> >  	}
> >  
> > -	ret = xe_bo_pin_external(bo);
> > +	ret = xe_bo_pin_external(bo, exec);
> >  	xe_assert(xe, !ret);
> >  
> >  	return 0;
> > @@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
> >  	struct dma_buf *dma_buf = attach->dmabuf;
> >  	struct drm_gem_object *obj = dma_buf->priv;
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> >  	struct sg_table *sgt;
> >  	int r = 0;
> >  
> > @@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
> >  
> >  	if (!xe_bo_is_pinned(bo)) {
> >  		if (!attach->peer2peer)
> > -			r = xe_bo_migrate(bo, XE_PL_TT);
> > +			r = xe_bo_migrate(bo, XE_PL_TT, exec);
> >  		else
> > -			r = xe_bo_validate(bo, NULL, false);
> > +			r = xe_bo_validate(bo, NULL, false, exec);
> >  		if (r)
> >  			return ERR_PTR(r);
> >  	}
> > @@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> >  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
> >  		       direction == DMA_FROM_DEVICE);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  
> >  	if (!reads)
> >  		return 0;
> >  
> >  	/* Can we do interruptible lock here? */
> >  	xe_bo_lock(bo, false);
> > -	(void)xe_bo_migrate(bo, XE_PL_TT);
> > +	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> >  	xe_bo_unlock(bo);
> >  
> >  	return 0;
> > @@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> >  {
> >  	struct dma_resv *resv = dma_buf->resv;
> >  	struct xe_device *xe = to_xe_device(dev);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo;
> >  	int ret;
> >  
> >  	dma_resv_lock(resv, NULL);
> >  	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> >  				    0, /* Will require 1way or 2way for vm_bind */
> > -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
> > +				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> >  	if (IS_ERR(bo)) {
> >  		ret = PTR_ERR(bo);
> >  		goto error;
> > @@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
> >  {
> >  	struct drm_gem_object *obj = attach->importer_priv;
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> >  
> > -	XE_WARN_ON(xe_bo_evict(bo));
> > +	XE_WARN_ON(xe_bo_evict(bo, exec));
> >  }
> >  
> >  static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index 44364c042ad7..0bcb4fb9a10e 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -97,9 +97,13 @@
> >  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
> >  {
> >  	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
> > +	int ret;
> >  
> >  	/* The fence slot added here is intended for the exec sched job. */
> > -	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> > +	xe_vm_set_validation_exec(vm, &vm_exec->exec);
> > +	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> > +	xe_vm_set_validation_exec(vm, NULL);
> > +	return ret;
> >  }
> >  
> >  int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index e03222f5ac5a..a47c0131956b 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
> >  }
> >  
> >  static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > -				  u64 start, u64 end)
> > +				  u64 start, u64 end, struct drm_exec *exec)
> >  {
> >  	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
> >  	u8 tile_id = ggtt->tile->id;
> > @@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> >  		return 0;
> >  	}
> >  
> > -	err = xe_bo_validate(bo, NULL, false);
> > +	err = xe_bo_validate(bo, NULL, false, exec);
> >  	if (err)
> >  		return err;
> >  
> > @@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> >   * @bo: the &xe_bo to be inserted
> >   * @start: address where it will be inserted
> >   * @end: end of the range where it will be inserted
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * Return: 0 on success or a negative error code on failure.
> >   */
> >  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > -			 u64 start, u64 end)
> > +			 u64 start, u64 end, struct drm_exec *exec)
> >  {
> > -	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
> > +	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
> >  }
> >  
> >  /**
> >   * xe_ggtt_insert_bo - Insert BO into GGTT
> >   * @ggtt: the &xe_ggtt where bo will be inserted
> >   * @bo: the &xe_bo to be inserted
> > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> >   *
> >   * Return: 0 on success or a negative error code on failure.
> >   */
> > -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > +		      struct drm_exec *exec)
> >  {
> > -	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> > +	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
> >  }
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> > index fbe1e397d05d..75fc7a1efea7 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.h
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> > @@ -10,6 +10,7 @@
> >  
> >  struct drm_printer;
> >  struct xe_tile;
> > +struct drm_exec;
> >  
> >  struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
> >  int xe_ggtt_init_early(struct xe_ggtt *ggtt);
> > @@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
> >  void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
> >  		    struct xe_bo *bo, u16 pat_index);
> >  void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
> > -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> > +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
> >  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > -			 u64 start, u64 end);
> > +			 u64 start, u64 end, struct drm_exec *exec);
> >  void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> >  u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index ab43dec52776..2c7f10cc423f 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
> >  		}
> >  
> >  		/* Migrate to VRAM, move should invalidate the VMA first */
> > -		err = xe_bo_migrate(bo, vram->placement);
> > +		err = xe_bo_migrate(bo, vram->placement, exec);
> >  		if (err)
> >  			return err;
> >  	} else if (bo) {
> >  		/* Create backing store if needed */
> > -		err = xe_bo_validate(bo, vm, true);
> > +		err = xe_bo_validate(bo, vm, true, exec);
> >  		if (err)
> >  			return err;
> >  	}
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > index c8f0320d032f..906011671b60 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > @@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
> >  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> >  {
> >  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_device *xe = gt_to_xe(gt);
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_bo *bo;
> > @@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> >  				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> >  				 XE_BO_FLAG_NEEDS_2M |
> >  				 XE_BO_FLAG_PINNED |
> > -				 XE_BO_FLAG_PINNED_LATE_RESTORE);
> > +				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> > +				 exec);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > -	err = xe_bo_pin(bo);
> > +	err = xe_bo_pin(bo, exec);
> >  	xe_bo_unlock(bo);
> >  	if (unlikely(err)) {
> >  		xe_bo_put(bo);
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index e35c6d4def20..39e3aa6df25a 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> >  	struct device *dev = xe->drm.dev;
> >  	struct drm_buddy_block *block;
> >  	struct list_head *blocks;
> > +	struct drm_exec *exec;
> >  	struct xe_bo *bo;
> >  	ktime_t time_end = 0;
> >  	int err, idx;
> > @@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> >  		return -ENODEV;
> >  
> >  	xe_pm_runtime_get(xe);
> > +	exec = XE_VALIDATION_UNIMPLEMENTED;
> >  
> >   retry:
> >  	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
> >  				 ttm_bo_type_device,
> >  				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> > -				 XE_BO_FLAG_CPU_ADDR_MIRROR);
> > +				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> > new file mode 100644
> > index 000000000000..cc0684d24e02
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -0,0 +1,49 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +#include "xe_bo.h"
> > +#include <drm/drm_exec.h>
> > +#include <drm/drm_gem.h>
> > +
> > +#include "xe_assert.h"
> > +#include "xe_validation.h"
> > +
> > +#ifdef CONFIG_DRM_XE_DEBUG
> > +/**
> > + * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
> > + * for validation.
> > + * @xe: Pointer to the xe device.
> > + * @exec: The drm_exec pointer to check.
> > + * @obj: Pointer to the object subject to validation.
> > + *
> > + * NULL exec pointers are not allowed.
> > + * For XE_VALIDATION_UNIMPLEMENTED, no checking.
> > + * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
> > + * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
> > + * validation is a dma-buf, for which support for ww locking is
> > + * not in place in the dma-buf layer.
> > + */
> > +void xe_validation_assert_exec(const struct xe_device *xe,
> > +			       const struct drm_exec *exec,
> > +			       const struct drm_gem_object *obj)
> > +{
> > +	xe_assert(xe, exec);
> > +	if (IS_ERR(exec)) {
> > +		switch (PTR_ERR(exec)) {
> > +		case __XE_VAL_UNIMPLEMENTED:
> > +			break;
> > +		case __XE_VAL_UNSUPPORTED:
> > +			xe_assert(xe, !!obj->dma_buf);
> > +			break;
> > +#if IS_ENABLED(CONFIG_KUNIT)
> > +		case __XE_VAL_OPT_OUT:
> > +			xe_assert(xe, current->kunit_test);
> > +			break;
> > +#endif
> > +		default:
> > +			xe_assert(xe, false);
> > +		}
> > +	}
> > +}
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > new file mode 100644
> > index 000000000000..db50feacad7a
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -0,0 +1,69 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +#ifndef _XE_VALIDATION_H_
> > +#define _XE_VALIDATION_H_
> > +
> > +#include <linux/dma-resv.h>
> > +#include <linux/types.h>
> > +
> > +struct drm_exec;
> > +struct drm_gem_object;
> > +struct xe_device;
> > +
> > +#ifdef CONFIG_PROVE_LOCKING
> > +/**
> > + * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
> > + * be initialized at this point.
> > + */
> > +static inline void xe_validation_lockdep(void)
> > +{
> > +	struct ww_acquire_ctx ticket;
> > +
> > +	ww_acquire_init(&ticket, &reservation_ww_class);
> > +	ww_acquire_fini(&ticket);
> > +}
> > +#else
> > +static inline void xe_validation_lockdep(void)
> > +{
> > +}
> > +#endif
> > +
> > +/*
> > + * Various values of the drm_exec pointer where we've not (yet)
> > + * implemented full ww locking.
> > + *
> > + * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
> > + * A lockdep check is made to assure that a drm_exec locking
> > + * transaction can actually take place where the macro is
> > + * used. If this asserts, the exec pointer needs to be assigned
> > + * higher up in the callchain and passed down.
> > + *
> > + * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
> > + * the dma-buf layer doesn't support WW locking.
> > + *
> > + * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
> > + * exhaustive eviction isn't necessary.
> > + */
> > +#define __XE_VAL_UNIMPLEMENTED -EINVAL
> > +#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
> > +				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
> > +
> > +#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
> > +#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
> > +
> > +#define __XE_VAL_OPT_OUT -ENOMEM
> > +#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
> > +			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
> > +#ifdef CONFIG_DRM_XE_DEBUG
> > +void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
> > +			       const struct drm_gem_object *obj);
> > +#else
> > +#define xe_validation_assert_exec(_xe, _exec, _obj)	\
> > +	do {						\
> > +		(void)_xe; (void)_exec; (void)_obj;	\
> > +	} while (0)
> > +#endif
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 12e661960244..600aaadb4bee 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> >  		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
> >  			       &vm->rebind_list);
> >  
> > -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> > +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -451,6 +451,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
> >  	if (err)
> >  		return err;
> >  
> > +	xe_vm_set_validation_exec(vm, exec);
> >  	if (xe_vm_is_idle(vm)) {
> >  		vm->preempt.rebind_deactivated = true;
> >  		*done = true;
> > @@ -516,6 +517,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> >  		err = xe_preempt_work_begin(&exec, vm, &done);
> >  		drm_exec_retry_on_contention(&exec);
> >  		if (err || done) {
> > +			xe_vm_set_validation_exec(vm, NULL);
> >  			drm_exec_fini(&exec);
> >  			if (err && xe_vm_validate_should_retry(&exec, err, &end))
> >  				err = -EAGAIN;
> > @@ -565,6 +567,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> >  	up_read(&vm->userptr.notifier_lock);
> >  
> >  out_unlock:
> > +	xe_vm_set_validation_exec(vm, NULL);
> >  	drm_exec_fini(&exec);
> >  out_unlock_outer:
> >  	if (err == -EAGAIN) {
> > @@ -1375,6 +1378,8 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
> >  	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> >  	if (!err && bo && !bo->vm)
> >  		err = drm_exec_lock_obj(exec, &bo->ttm.base);
> > +	if (!err)
> > +		xe_vm_set_validation_exec(vm, exec);
> 
> 
> Do you have an imbalance here? I see this function called in xe_pf_begin
> and xe_vma_destroy_unlocked, but I don't see
> xe_vm_set_validation_exec(vm, NULL) called.
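> 
> For illustration (untested sketch using the helpers this series
> introduces), the balanced pattern I'd expect in those callers is
> something like:
> 
> 	static int xe_pf_begin_balanced(struct drm_exec *exec, struct xe_vma *vma)
> 	{
> 		struct xe_vm *vm = xe_vma_vm(vma);
> 		int err;
> 
> 		/* Locks the vma and registers @exec with the vm. */
> 		err = xe_vm_lock_vma(exec, vma);
> 		if (err)
> 			return err;
> 
> 		/* ... validation / migration work ... */
> 
> 		/* The counterpart that seems to be missing today. */
> 		xe_vm_set_validation_exec(vm, NULL);
> 		return 0;
> 	}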
> 
> 
> >  
> >  	return err;
> >  }
> > @@ -2889,7 +2894,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
> >  			err = drm_exec_lock_obj(exec, &bo->ttm.base);
> >  		if (!err && validate)
> >  			err = xe_bo_validate(bo, vm,
> > -					     !xe_vm_in_preempt_fence_mode(vm));
> > +					     !xe_vm_in_preempt_fence_mode(vm), exec);
> >  	}
> >  
> >  	return err;
> > @@ -3012,7 +3017,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
> >  					    false);
> >  		if (!err && !xe_vma_has_no_bo(vma))
> >  			err = xe_bo_migrate(xe_vma_bo(vma),
> > -					    region_to_mem_type[region]);
> > +					    region_to_mem_type[region],
> > +					    exec);
> >  		break;
> >  	}
> >  	default:
> > @@ -3052,6 +3058,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> >  	if (err)
> >  		return err;
> >  
> > +	xe_vm_set_validation_exec(vm, exec);
> >  	list_for_each_entry(op, &vops->list, link) {
> >  		err = op_lock_and_prep(exec, vm, op);
> >  		if (err)
> > @@ -3850,10 +3857,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
> >   */
> >  int xe_vm_lock(struct xe_vm *vm, bool intr)
> >  {
> > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	int ret;
> > +
> >  	if (intr)
> > -		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> > +		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> > +	else
> > +		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> > +
> > +	if (!ret)
> > +		xe_vm_set_validation_exec(vm, exec);
> >  
> > -	return dma_resv_lock(xe_vm_resv(vm), NULL);
> > +	return ret;
> >  }
> >  
> >  /**
> > @@ -3864,6 +3879,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> >   */
> >  void xe_vm_unlock(struct xe_vm *vm)
> >  {
> > +	xe_vm_set_validation_exec(vm, NULL);
> >  	dma_resv_unlock(xe_vm_resv(vm));
> >  }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > index 2ecb417c19a2..4ba26eed7e96 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.h
> > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > @@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
> >  	if (vm && !allow_res_evict) {
> >  		xe_vm_assert_held(vm);
> >  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> > -		WRITE_ONCE(vm->validating, current);
> > +		WRITE_ONCE(vm->validation.validating, current);
> >  	}
> >  }
> >  
> > @@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
> >  {
> >  	if (vm && !allow_res_evict) {
> >  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> > -		WRITE_ONCE(vm->validating, NULL);
> > +		WRITE_ONCE(vm->validation.validating, NULL);
> >  	}
> >  }
> >  
> > @@ -357,13 +357,40 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
> >  static inline bool xe_vm_is_validating(struct xe_vm *vm)
> >  {
> >  	/* Pairs with WRITE_ONCE in xe_vm_is_validating() */
> > -	if (READ_ONCE(vm->validating) == current) {
> > +	if (READ_ONCE(vm->validation.validating) == current) {
> >  		xe_vm_assert_held(vm);
> >  		return true;
> >  	}
> >  	return false;
> >  }
> >  
> > +/**
> > + * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
> > + * @vm: The vm we want to register a drm_exec object with.
> > + * @exec: The exec object we want to register.
> > + *
> > + * Set the drm_exec object used to lock the vm's resv.
> > + */
> > +static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
> > +{
> > +	xe_vm_assert_held(vm);
> > +	vm->validation._exec = exec;
> > +}
> > +
> > +/**
> > + * xe_vm_validation_exec() - Accessor to read the drm_exec object
> > + * @vm: The vm we want to read the drm_exec object from.
> > + *
> > + * Return: The drm_exec object used to lock the vm's resv. The value
> > + * is a valid pointer, %NULL, or one of the special values defined in
> > + * xe_validation.h.
> > + */
> > +static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
> > +{
> > +	xe_vm_assert_held(vm);
> > +	return vm->validation._exec;
> > +}
> > +
> >  /**
> >   * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
> >   * a valid GPU mapping
> > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > index 8a07feef503b..2f88808e36bb 100644
> > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > @@ -312,19 +312,35 @@ struct xe_vm {
> >  		bool capture_once;
> >  	} error_capture;
> >  
> > +	/**
> > +	 * @validation: Validation data only valid with the vm resv held.
> > +	 * Note: This is really task state of the task holding the vm resv,
> > +	 * and moving forward we should come up with a better way of
> > +	 * passing this down the call-chain.
> 
> 
> I've already mentioned this, attaching the _exec xe_vma_ops might be
> good option as xe_vma_ops has lifetime of only existing for the bind
> (i.e., it is stack variable) so you'd only need to set it (i.e., no
> clear required).
> 
> I think patch largely makes sense.
> 
> Matt 
> 
> 
> > +	 */
> > +	struct {
> > +		/**
> > +		 * @validation.validating: The task that is currently making bos
> > +		 * resident for this vm.
> > +		 * Protected by the VM's resv for writing. Opportunistic reading can be done
> > +		 * using READ_ONCE. Note: This is a workaround for the
> > +		 * TTM eviction_valuable() callback not being passed a struct
> > +		 * ttm_operation_context(). Future work might want to address this.
> > +		 */
> > +		struct task_struct *validating;
> > +		/**
> > +		 * @validation._exec: The drm_exec context used when locking the vm resv.
> > +		 * Protected by the vm's resv.
> > +		 */
> > +		struct drm_exec *_exec;
> > +	} validation;
> > +
> >  	/**
> >  	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
> >  	 * protected by the vm resv.
> >  	 */
> >  	u64 tlb_flush_seqno;
> > -	/**
> > -	 * @validating: The task that is currently making bos resident for this vm.
> > -	 * Protected by the VM's resv for writing. Opportunistic reading can be done
> > -	 * using READ_ONCE. Note: This is a workaround for the
> > -	 * TTM eviction_valuable() callback not being passed a struct
> > -	 * ttm_operation_context(). Future work might want to address this.
> > -	 */
> > -	struct task_struct *validating;
> >  	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
> >  	bool batch_invalidate_tlb;
> >  	/** @xef: XE file handle for tracking this VM's drm client */
> > -- 
> > 2.50.1
> > 
>




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/15] drm/xe: Convert SVM validation for exhaustive eviction
  2025-08-13 15:32   ` Matthew Brost
@ 2025-08-14 12:24     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-14 12:24 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 08:32 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:13PM +0200, Thomas Hellström wrote:
> > Convert SVM validation to support exhaustive eviction,
> > using xe_validation_guard().
> > 
> 
> Do we not need a validation guard + xe_vm_set_validation_exec() around
> xe_vm_range_rebind(), given that on first fault of a range we can
> allocate PTs?

Yes, you're right. I see this comment in a later patch as well.
The reason the asserts didn't trigger here is that we were leaking an
xe_vm_set_validation_exec(). I'm fixing that up in v2 and adding more
asserts so that's much less likely to happen.
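
Loosely, what I have in mind for v2 around the range rebind is
something like this (untested sketch, following the guard pattern from
this patch; the rebind call itself may end up looking slightly
different):

	xe_validation_guard(&vctx, &xe->val, &exec, 0, err, false) {
		err = xe_vm_drm_exec_lock(vm, &exec);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		/* PT allocations on first fault now happen under the guard. */
		fence = xe_vm_range_rebind(vm, vma, range, tile_mask);
		if (IS_ERR(fence)) {
			err = PTR_ERR(fence);
			xe_validation_retry_on_oom(&vctx, &err);
		}
	}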

Ideally we'd want to pass the drm_exec (perhaps as part of a pt_details
struct) all the way down to pt allocation. But when I tried that during
development the headers became quite bloated.
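
As a rough idea (nothing implemented), that pt_details struct would be
little more than:

	/* Hypothetical: bundle pt-allocation state to keep headers lean. */
	struct xe_pt_details {
		struct drm_exec *exec;	/* validation transaction */
	};

with xe_pt_create() and friends taking a const pointer to it instead of
the bare exec.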

/Thomas


> 
> Matt
> 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_svm.c | 63 ++++++++++++++++++---------------
> > ----
> >  1 file changed, 30 insertions(+), 33 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index 39e3aa6df25a..ba85665d85d4 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -699,51 +699,48 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> >  	struct xe_device *xe = vr->xe;
> >  	struct device *dev = xe->drm.dev;
> >  	struct drm_buddy_block *block;
> > +	struct xe_validation_ctx vctx;
> >  	struct list_head *blocks;
> > -	struct drm_exec *exec;
> > +	struct drm_exec exec;
> >  	struct xe_bo *bo;
> > -	ktime_t time_end = 0;
> > -	int err, idx;
> > +	int err = 0, idx;
> >  
> >  	if (!drm_dev_enter(&xe->drm, &idx))
> >  		return -ENODEV;
> >  
> >  	xe_pm_runtime_get(xe);
> > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > -
> > - retry:
> > -	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
> > -				 ttm_bo_type_device,
> > -				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> > -				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> > -			goto retry;
> > -		goto out_pm_put;
> > -	}
> >  
> > -	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> > -				&dpagemap_devmem_ops, dpagemap, end - start);
> > -
> > -	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> > -	list_for_each_entry(block, blocks, link)
> > -		block->private = vr;
> > +	xe_validation_guard(&vctx, &xe->val, &exec, 0, err, false) {
> > +		bo = xe_bo_create_locked(xe, NULL, NULL, end - start,
> > +					 ttm_bo_type_device,
> > +					 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> > +					 XE_BO_FLAG_CPU_ADDR_MIRROR, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			err = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&vctx, &err);
> > +			break;
> > +		}
> >  
> > -	xe_bo_get(bo);
> > +		drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> > +					&dpagemap_devmem_ops, dpagemap, end - start);
> >  
> > -	/* Ensure the device has a pm ref while there are device pages active. */
> > -	xe_pm_runtime_get_noresume(xe);
> > -	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> > -					    start, end, timeslice_ms,
> > -					    xe_svm_devm_owner(xe));
> > -	if (err)
> > -		xe_svm_devmem_release(&bo->devmem_allocation);
> > +		blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> > +		list_for_each_entry(block, blocks, link)
> > +			block->private = vr;
> >  
> > -	xe_bo_unlock(bo);
> > -	xe_bo_put(bo);
> > +		xe_bo_get(bo);
> >  
> > -out_pm_put:
> > +		/* Ensure the device has a pm ref while there are device pages active. */
> > +		xe_pm_runtime_get_noresume(xe);
> > +		err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> > +						    start, end, timeslice_ms,
> > +						    xe_svm_devm_owner(xe));
> > +		if (err)
> > +			xe_svm_devmem_release(&bo->devmem_allocation);
> > +		xe_bo_unlock(bo);
> > +		xe_bo_put(bo);
> > +	}
> >  	xe_pm_runtime_put(xe);
> >  	drm_dev_exit(idx);
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-14  4:18   ` Matthew Brost
@ 2025-08-14 13:14     ` Thomas Hellström
  2025-08-14 18:39       ` Matthew Brost
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-14 13:14 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 21:18 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:20PM +0200, Thomas Hellström wrote:
> > Introduce an xe_bo_create_pin_map_novm() function that does not
> > take the drm_exec parameter to simplify the conversion of many
> > callsites.
> > For the rest, ensure that the same drm_exec context that was used
> > for locking the vm is passed down to validation.
> > 
> > Use xe_validation_guard() where appropriate.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
> >  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
> >  drivers/gpu/drm/xe/display/xe_fb_pin.c        |  39 +++---
> >  drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
> >  drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
> >  drivers/gpu/drm/xe/xe_bo.c                    |  53 +++++++-
> >  drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
> >  drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
> >  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  24 ++--
> >  drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
> >  drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
> >  drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
> >  drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
> >  drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
> >  drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
> >  drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
> >  drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
> >  drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
> >  drivers/gpu/drm/xe/xe_vm.c                    | 119 ++++++++++--------
> >  19 files changed, 252 insertions(+), 171 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > index d96ba2b51065..8ea9a472113c 100644
> > --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > @@ -42,11 +42,11 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> >  	obj = ERR_PTR(-ENODEV);
> >  
> >  	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
> > -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> > -					   NULL, size,
> > -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > -					   XE_BO_FLAG_STOLEN |
> > -					   XE_BO_FLAG_GGTT);
> > +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> > +						size,
> > +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > +						XE_BO_FLAG_STOLEN |
> > +						XE_BO_FLAG_GGTT, false);
> >  		if (!IS_ERR(obj))
> >  			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
> >  		else
> > @@ -54,10 +54,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> >  	}
> >  
> >  	if (IS_ERR(obj)) {
> > -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
> > -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > -					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > -					   XE_BO_FLAG_GGTT);
> > +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
> > +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > +						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > +						XE_BO_FLAG_GGTT, false);
> >  	}
> >  
> >  	if (IS_ERR(obj)) {
> > diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > index 9f941fc2e36b..58581d7aaae6 100644
> > --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > @@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *d
> >  		return false;
> >  
> >  	/* Set scanout flag for WC mapping */
> > -	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> > -				   NULL, PAGE_ALIGN(size),
> > -				   ttm_bo_type_kernel,
> > -				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > -				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
> > +	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> > +					PAGE_ALIGN(size),
> > +					ttm_bo_type_kernel,
> > +					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > +					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
> >  	if (IS_ERR(obj)) {
> >  		kfree(vma);
> >  		return false;
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index d46ff7ebb0a1..d8e15ebb740c 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -102,32 +102,23 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> >  				 XE_PAGE_SIZE);
> >  
> >  	if (IS_DGFX(xe))
> > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > -						   dpt_size, ~0ull,
> > -						   ttm_bo_type_kernel,
> > -						   true,
> > -						   XE_BO_FLAG_VRAM0 |
> > -						   XE_BO_FLAG_GGTT |
> > -						   XE_BO_FLAG_PAGETABLE,
> > -						   alignment, false);
> > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > +						ttm_bo_type_kernel,
> > +						XE_BO_FLAG_VRAM0 |
> > +						XE_BO_FLAG_GGTT |
> > +						XE_BO_FLAG_PAGETABLE, true);
> >  	else
> > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > -						   dpt_size,  ~0ull,
> > -						   ttm_bo_type_kernel,
> > -						   true,
> > -						   XE_BO_FLAG_STOLEN |
> > -						   XE_BO_FLAG_GGTT |
> > -						   XE_BO_FLAG_PAGETABLE,
> > -						   alignment, false);
> > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > +						ttm_bo_type_kernel,
> > +						XE_BO_FLAG_STOLEN |
> > +						XE_BO_FLAG_GGTT |
> > +						XE_BO_FLAG_PAGETABLE, true);
> >  	if (IS_ERR(dpt))
> > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > -						   dpt_size,  ~0ull,
> > -						   ttm_bo_type_kernel,
> > -						   true,
> > -						   XE_BO_FLAG_SYSTEM |
> > -						   XE_BO_FLAG_GGTT |
> > -						   XE_BO_FLAG_PAGETABLE,
> > -						   alignment, false);
> > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > +						ttm_bo_type_kernel,
> > +						XE_BO_FLAG_SYSTEM |
> > +						XE_BO_FLAG_GGTT |
> > +						XE_BO_FLAG_PAGETABLE, true);
> >  	if (IS_ERR(dpt))
> >  		return PTR_ERR(dpt);
> >  
> > diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > index 30f1073141fc..4ae847b628e2 100644
> > --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > @@ -72,10 +72,10 @@ static int intel_hdcp_gsc_initialize_message(struct xe_device *xe,
> >  	int ret = 0;
> >  
> >  	/* allocate object of two page for HDCP command memory and store it */
> > -	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT, false);
> >  
> >  	if (IS_ERR(bo)) {
> >  		drm_err(&xe->drm, "Failed to allocate bo for HDCP streaming command!\n");
> > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > index afa794e56065..5904d658d1f2 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > @@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> >  
> >  	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
> >  				   ttm_bo_type_kernel,
> > -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				   exec);
> >  	if (IS_ERR(big)) {
> >  		KUNIT_FAIL(test, "Failed to allocate bo: %li\n", PTR_ERR(big));
> >  		goto vunmap;
> > @@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> >  
> >  	pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
> >  				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				  exec);
> >  	if (IS_ERR(pt)) {
> >  		KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
> >  			   PTR_ERR(pt));
> > @@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> >  	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
> >  				    2 * SZ_4K,
> >  				    ttm_bo_type_kernel,
> > -				    XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				    exec);
> >  	if (IS_ERR(tiny)) {
> >  		KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n",
> >  			   PTR_ERR(tiny));
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index c9928d4ee5a0..82bf158426ad 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2343,16 +2343,60 @@ xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> >  	return ret ? ERR_PTR(ret) : bo;
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map() - Create pinned and mapped bo
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @vm: The vm to associate the buffer object with. The vm's resv must be locked
> > + * with the transaction represented by @exec.
> > + * @size: The storage size to use for the bo.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction, and
> > + * previously used for locking @vm's resv.
> > + *
> > + * Create a pinned and mapped bo associated with @vm.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + */
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> > -				   enum ttm_bo_type type, u32 flags)
> > +				   enum ttm_bo_type type, u32 flags,
> > +				   struct drm_exec *exec)
> >  {
> > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > -
> > +	xe_assert(xe, exec);
> >  	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> >  					       true, 0, exec);
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @size: The storage size to use for the bo.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @intr: Whether to execute any waits for backing store interruptibly.
> > + *
> > + * Create a pinned and mapped bo. The bo will be external and not associated
> > + * with a VM.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> > + * to true on entry.
> > + */
> > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> > +					size_t size, enum ttm_bo_type type, u32 flags,
> > +					bool intr)
> > +{
> > +	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, true, 0, intr);
> > +}
> > +
> >  static void __xe_bo_unpin_map_no_vm(void *arg)
> >  {
> >  	xe_bo_unpin_map_no_vm(arg);
> > @@ -2365,8 +2409,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
> >  	int ret;
> >  
> >  	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
> > -
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);
> >  	if (IS_ERR(bo))
> >  		return bo;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index d06266af9662..802e3c7d7872 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
> >  				u16 cpu_caching, u32 flags, struct drm_exec *exec);
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> > -				   enum ttm_bo_type type, u32 flags);
> > +				   enum ttm_bo_type type, u32 flags,
> > +				   struct drm_exec *exec);
> > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> > +					size_t size, enum ttm_bo_type type, u32 flags,
> > +					bool intr);
> >  struct xe_bo *
> >  xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> >  			     size_t size, u64 offset, enum ttm_bo_type type,
> > diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
> > index f5ae28af60d4..83d61bf8ec62 100644
> > --- a/drivers/gpu/drm/xe/xe_gsc.c
> > +++ b/drivers/gpu/drm/xe/xe_gsc.c
> > @@ -136,10 +136,10 @@ static int query_compatibility_version(struct xe_gsc *gsc)
> >  	u64 ggtt_offset;
> >  	int err;
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT, false);
> >  	if (IS_ERR(bo)) {
> >  		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
> >  		return PTR_ERR(bo);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > index 906011671b60..d0a87d7b028b 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > @@ -1452,7 +1452,6 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
> >  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> >  {
> >  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_device *xe = gt_to_xe(gt);
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_bo *bo;
> > @@ -1479,24 +1478,17 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> >  		return 0;
> >  
> >  	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
> > -	bo = xe_bo_create_locked(xe, tile, NULL,
> > -				 ALIGN(size, PAGE_SIZE),
> > -				 ttm_bo_type_kernel,
> > -				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > -				 XE_BO_FLAG_NEEDS_2M |
> > -				 XE_BO_FLAG_PINNED |
> > -				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> > -				 exec);
> > +	bo = xe_bo_create_pin_map_at_novm(xe, tile,
> > +					  ALIGN(size, PAGE_SIZE),
> > +					  0,
> > +					  ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > +					  XE_BO_FLAG_NEEDS_2M |
> > +					  XE_BO_FLAG_PINNED_LATE_RESTORE,
> > +					  false, 0, false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > -	err = xe_bo_pin(bo, exec);
> > -	xe_bo_unlock(bo);
> > -	if (unlikely(err)) {
> > -		xe_bo_put(bo);
> > -		return err;
> > -	}
> > -
> >  	config->lmem_obj = bo;
> >  
> >  	if (xe_device_has_lmtt(xe)) {
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > index c712111aa30d..44cc612b0a75 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > @@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct xe_gt *gt, unsigned int vfid,
> >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > -				  ALIGN(size, PAGE_SIZE),
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT |
> > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > +				       ALIGN(size, PAGE_SIZE),
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT |
> > +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > @@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct xe_gt *gt, unsigned int vfid,
> >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > -				  ALIGN(size, PAGE_SIZE),
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT |
> > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > +				       ALIGN(size, PAGE_SIZE),
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT |
> > +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > index 92e1f9f41b8c..2b99c1ebdd58 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > @@ -94,16 +94,17 @@ static int allocate_engine_activity_buffers(struct xe_guc *guc,
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_bo *bo, *metadata_bo;
> >  
> > -	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
> > -					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> > -					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> > +	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
> > +						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> > +						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
> > +						false);
> >  
> >  	if (IS_ERR(metadata_bo))
> >  		return PTR_ERR(metadata_bo);
> >  
> > -	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
> > -				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > -				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
> > +				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > +				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
> >  
> >  	if (IS_ERR(bo)) {
> >  		xe_bo_unpin_map_no_vm(metadata_bo);
> > diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
> > index a78c9d474a6e..4ad468574174 100644
> > --- a/drivers/gpu/drm/xe/xe_lmtt.c
> > +++ b/drivers/gpu/drm/xe/xe_lmtt.c
> > @@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level
> >  		goto out;
> >  	}
> >  
> > -	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
> > -				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> > -					     lmtt->ops->lmtt_pte_num(level)),
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > -				  XE_BO_FLAG_NEEDS_64K);
> > +	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
> > +				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> > +						  lmtt->ops->lmtt_pte_num(level)),
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > +				       XE_BO_FLAG_NEEDS_64K, false);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto out_free_pt;
> > diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> > index 8f6c3ba47882..6d52e0eb97f5 100644
> > --- a/drivers/gpu/drm/xe/xe_lrc.c
> > +++ b/drivers/gpu/drm/xe/xe_lrc.c
> > @@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
> >  	if (vm && vm->xef) /* userspace */
> >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> >  
> > -	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
> > -				       ttm_bo_type_kernel,
> > -				       bo_flags);
> > +	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
> > +					    bo_size,
> > +					    ttm_bo_type_kernel,
> > +					    bo_flags, false);
> >  	if (IS_ERR(lrc->bo))
> >  		return PTR_ERR(lrc->bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index ddfad7506a82..fe0d15ab340e 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -35,6 +35,7 @@
> >  #include "xe_sched_job.h"
> >  #include "xe_sync.h"
> >  #include "xe_trace_bo.h"
> > +#include "xe_validation.h"
> >  #include "xe_vm.h"
> >  #include "xe_vram.h"
> >  
> > @@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm,
> >  }
> >  
> >  static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > -				 struct xe_vm *vm)
> > +				 struct xe_vm *vm, struct drm_exec *exec)
> >  {
> >  	struct xe_device *xe = tile_to_xe(tile);
> >  	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> > @@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> >  				  num_entries * XE_PAGE_SIZE,
> >  				  ttm_bo_type_kernel,
> >  				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > -				  XE_BO_FLAG_PAGETABLE);
> > +				  XE_BO_FLAG_PAGETABLE, exec);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > @@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
> >  	struct xe_tile *tile = m->tile;
> >  	struct xe_gt *primary_gt = tile->primary_gt;
> >  	struct xe_device *xe = tile_to_xe(tile);
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	int err;
> >  
> > @@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
> >  	if (IS_ERR(vm))
> >  		return PTR_ERR(vm);
> >  
> > -	xe_vm_lock(vm, false);
> > -	err = xe_migrate_prepare_vm(tile, m, vm);
> > -	xe_vm_unlock(vm);
> > +	err = 0;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_validation_retry_on_oom(&ctx, &err);
> > +	}
> >  	if (err)
> > -		goto err_out;
> > +		return err;
> >  
> >  	if (xe->info.has_usm) {
> >  		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index a188bad172ad..a4894eb0d7f3 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
> >  {
> >  	struct xe_bo *bo;
> >  
> > -	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
> > -				  size, ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
> > +				       size, ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index f3a39e734a90..33ad40418ceb 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
> >   * @vm: The vm to create for.
> >   * @tile: The tile to create for.
> >   * @level: The page-table level.
> > + * @exec: The drm_exec object used to lock the vm.
> >   *
> >   * Allocate and initialize a single struct xe_pt metadata structure. Also
> >   * create the corresponding page-table bo, but don't initialize it. If the
> > @@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
> >   * error.
> >   */
> >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > -			   unsigned int level)
> > +			   unsigned int level, struct drm_exec *exec)
> >  {
> >  	struct xe_pt *pt;
> >  	struct xe_bo *bo;
> > @@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> >  
> >  	pt->level = level;
> > +
> > +	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
> >  	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
> >  				  ttm_bo_type_kernel,
> > -				  bo_flags);
> > +				  bo_flags, exec);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto err_kfree;
> > @@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
> >  	if (covers || !*child) {
> >  		u64 flags = 0;
> >  
> > -		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
> > +		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
> > +					xe_vm_validation_exec(vm));
> >  		if (IS_ERR(xe_child))
> >  			return PTR_ERR(xe_child);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> > index 5ecf003d513c..4daeebaab5a1 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.h
> > +++ b/drivers/gpu/drm/xe/xe_pt.h
> > @@ -10,6 +10,7 @@
> >  #include "xe_pt_types.h"
> >  
> >  struct dma_fence;
> > +struct drm_exec;
> >  struct xe_bo;
> >  struct xe_device;
> >  struct xe_exec_queue;
> > @@ -29,7 +30,7 @@ struct xe_vma_ops;
> >  unsigned int xe_pt_shift(unsigned int level);
> >  
> >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > -			   unsigned int level);
> > +			   unsigned int level, struct drm_exec *exec);
> >  
> >  void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
> >  			  struct xe_pt *pt);
> > diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > index ca95f2a4d4ef..54bd6b64dc6d 100644
> > --- a/drivers/gpu/drm/xe/xe_pxp_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > @@ -54,8 +54,9 @@ static int allocate_vcs_execution_resources(struct xe_pxp *pxp)
> >  	 * Each termination is 16 DWORDS, so 4K is enough to contain a
> >  	 * termination for each sessions.
> >  	 */
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
> > +				       false);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto out_queue;
> > @@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
> >  {
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_device *xe = tile_to_xe(tile);
> > +	struct xe_validation_ctx ctx;
> >  	struct xe_hw_engine *hwe;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	struct xe_bo *bo;
> >  	struct xe_exec_queue *q;
> > @@ -106,15 +109,26 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
> >  		return PTR_ERR(vm);
> >  
> >  	/* We allocate a single object for the batch and the in/out memory */
> > -	xe_vm_lock(vm, false);
> > -	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
> > -	xe_vm_unlock(vm);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		goto vm_out;
> > +
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (err)
> > +			break;
> > +
> > +		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> > +					  ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
> > +					  XE_BO_FLAG_NEEDS_UC, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			err = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &err);
> > +			break;
> > +		}
> >  	}
> > +	if (err)
> > +		goto vm_out;
> >  
> >  	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
> >  	if (IS_ERR(fence)) {
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 989d84c2e82f..b3ee65126841 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -1606,6 +1606,7 @@ static void vm_destroy_work_func(struct
> > work_struct *w);
> >   * @xe: xe device.
> >   * @tile: tile to set up for.
> >   * @vm: vm to set up for.
> > + * @exec: The struct drm_exec object used to lock the vm resv.
> >   *
> >   * Sets up a pagetable tree with one page-table per level and a
> > single
> >   * leaf PTE. All pagetable entries point to the single page-table
> > or,
> > @@ -1615,20 +1616,19 @@ static void vm_destroy_work_func(struct
> > work_struct *w);
> >   * Return: 0 on success, negative error code on error.
> >   */
> >  static int xe_vm_create_scratch(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				struct xe_vm *vm)
> > +				struct xe_vm *vm, struct drm_exec
> > *exec)
> >  {
> >  	u8 id = tile->id;
> >  	int i;
> >  
> >  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level;
> > i++) {
> > -		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> > +		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i,
> > exec);
> >  		if (IS_ERR(vm->scratch_pt[id][i])) {
> >  			int err = PTR_ERR(vm->scratch_pt[id][i]);
> >  
> >  			vm->scratch_pt[id][i] = NULL;
> >  			return err;
> >  		}
> > -
> >  		xe_pt_populate_empty(tile, vm, vm-
> > >scratch_pt[id][i]);
> >  	}
> >  
> > @@ -1656,9 +1656,26 @@ static void xe_vm_free_scratch(struct xe_vm
> > *vm)
> >  	}
> >  }
> >  
> > +static void xe_vm_pt_destroy(struct xe_vm *vm)
> > +{
> > +	struct xe_tile *tile;
> > +	u8 id;
> > +
> > +	xe_vm_assert_held(vm);
> > +
> > +	for_each_tile(tile, vm->xe, id) {
> > +		if (vm->pt_root[id]) {
> > +			xe_pt_destroy(vm->pt_root[id], vm->flags,
> > NULL);
> > +			vm->pt_root[id] = NULL;
> > +		}
> > +	}
> > +}
> > +
> >  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct
> > xe_file *xef)
> >  {
> >  	struct drm_gem_object *vm_resv_obj;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	int err, number_tiles = 0;
> >  	struct xe_tile *tile;
> > @@ -1745,49 +1762,64 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  
> >  	drm_gem_object_put(vm_resv_obj);
> >  
> > -	err = xe_vm_lock(vm, true);
> > -	if (err)
> > -		goto err_close;
> > +	err = 0;
> > +	xe_validation_guard(&ctx, &xe->val, &exec,
> > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > +			    err, true) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> >  
> > -	if (IS_DGFX(xe) && xe->info.vram_flags &
> > XE_VRAM_FLAGS_NEED64K)
> > -		vm->flags |= XE_VM_FLAG_64K;
> > +		if (IS_DGFX(xe) && xe->info.vram_flags &
> > XE_VRAM_FLAGS_NEED64K)
> > +			vm->flags |= XE_VM_FLAG_64K;
> >  
> > -	for_each_tile(tile, xe, id) {
> > -		if (flags & XE_VM_FLAG_MIGRATION &&
> > -		    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > -			continue;
> > +		for_each_tile(tile, xe, id) {
> > +			if (flags & XE_VM_FLAG_MIGRATION &&
> > +			    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > +				continue;
> >  
> > -		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
> > -		if (IS_ERR(vm->pt_root[id])) {
> > -			err = PTR_ERR(vm->pt_root[id]);
> > -			vm->pt_root[id] = NULL;
> > -			goto err_unlock_close;
> > +			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
> > +						       &exec);
> > +			if (IS_ERR(vm->pt_root[id])) {
> > +				err = PTR_ERR(vm->pt_root[id]);
> > +				vm->pt_root[id] = NULL;
> > +				xe_vm_pt_destroy(vm);
> > +				drm_exec_retry_on_contention(&exec);
> > +				xe_validation_retry_on_oom(&ctx, &err);
> > +				goto err_close;
> > +			}
> >  		}
> > -	}
> >  
> > -	if (xe_vm_has_scratch(vm)) {
> > +		if (xe_vm_has_scratch(vm)) {
> > +			for_each_tile(tile, xe, id) {
> > +				if (!vm->pt_root[id])
> > +					continue;
> > +
> > +				err = xe_vm_create_scratch(xe, tile, vm, &exec);
> > +				if (err) {
> > +					xe_vm_free_scratch(vm);
> > +					xe_vm_pt_destroy(vm);
> > +					drm_exec_retry_on_contention(&exec);
> > +					xe_validation_retry_on_oom(&ctx, &err);
> > +					goto err_close;
> > +				}
> > +			}
> > +			vm->batch_invalidate_tlb = true;
> > +		}
> > +
> > +		if (vm->flags & XE_VM_FLAG_LR_MODE) {
> > +			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
> > +			vm->batch_invalidate_tlb = false;
> > +		}
> > +
> > +		/* Fill pt_root after allocating scratch tables */
> >  		for_each_tile(tile, xe, id) {
> >  			if (!vm->pt_root[id])
> >  				continue;
> >  
> > -			err = xe_vm_create_scratch(xe, tile, vm);
> > -			if (err)
> > -				goto err_unlock_close;
> > +			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> >  		}
> > -		vm->batch_invalidate_tlb = true;
> > -	}
> > -
> > -	if (vm->flags & XE_VM_FLAG_LR_MODE)
> > -		vm->batch_invalidate_tlb = false;
> > -
> > -	/* Fill pt_root after allocating scratch tables */
> > -	for_each_tile(tile, xe, id) {
> > -		if (!vm->pt_root[id])
> > -			continue;
> > -
> > -		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> >  	}
> > -	xe_vm_unlock(vm);
> > +	if (err)
> > +		goto err_close;
> >  
> >  	/* Kernel migration VM shouldn't have a circular loop.. */
> >  	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> > @@ -1820,7 +1852,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  				      &xe->usm.next_asid, GFP_KERNEL);
> >  		up_write(&xe->usm.lock);
> >  		if (err < 0)
> > -			goto err_unlock_close;
> > +			goto err_close;
> >  
> >  		vm->usm.asid = asid;
> >  	}
> > @@ -1829,8 +1861,6 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  
> >  	return vm;
> >  
> > -err_unlock_close:
> > -	xe_vm_unlock(vm);
> >  err_close:
> >  	xe_vm_close_and_put(vm);
> >  	return ERR_PTR(err);
> > @@ -1959,13 +1989,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> >  	 * destroy the pagetables immediately.
> >  	 */
> >  	xe_vm_free_scratch(vm);
> > -
> > -	for_each_tile(tile, xe, id) {
> > -		if (vm->pt_root[id]) {
> > -			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> > -			vm->pt_root[id] = NULL;
> > -		}
> > -	}
> > +	xe_vm_pt_destroy(vm);
> >  	xe_vm_unlock(vm);
> >  
> >  	/*
> > @@ -3845,7 +3869,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct
> > xe_vm *vm, struct xe_bo *bo,
> >   */
> >  int xe_vm_lock(struct xe_vm *vm, bool intr)
> >  {
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> 
> You add this earlier in the series and then delete it here, which seems
> odd. If the intent is to apply a workaround (WA) earlier and finalize it
> here, that makes sense.

Yes, the patch that introduces these identifies all the call-sites
where we need a fix-up, and uses lockdep to verify that a validation
transaction starting there doesn't violate the dma_resv locking order.

Most of the series then eliminates these _UNIMPLEMENTED markers one by
one. Of course one could do it in the opposite order to avoid
introducing and then eliminating them, but I think that would be harder
for both the developer and the reviewer.
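
The check is conceptually something like the below - a sketch from
memory with made-up names, not the actual patch code:

	/* Dummy map modelling "a drm_exec transaction may start here". */
	static struct lockdep_map xe_val_map =
		STATIC_LOCKDEP_MAP_INIT("xe_validation", &xe_val_map);

	static inline void xe_validation_lockdep_check(void)
	{
		/*
		 * Once the real validation path takes dma_resv locks with
		 * this map held, lockdep will splat for any expansion site
		 * that already holds a dma_resv lock.
		 */
		lock_map_acquire(&xe_val_map);
		lock_map_release(&xe_val_map);
	}

	#define XE_VALIDATION_UNIMPLEMENTED \
		(xe_validation_lockdep_check(), (struct drm_exec *)-1)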

> 
> However, deleting it here appears to leave a problematic path in
> xe_svm.c around xe_vm_range_rebind, where vm->validating._exec may
> remain unset/stale. I’m surprised CI didn’t catch this—perhaps due to an
> unbalanced xe_vm_set_validation_exec leaving stale state—or I’m missing
> something.

Yes, you are correct. There was an imbalance and I will fix that.
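
I.e., every xe_vm_set_validation_exec(vm, exec) must be paired with a
clearing call on all paths out, including the error paths. Roughly
(sketch only, the callee is a made-up placeholder):

	xe_vm_set_validation_exec(vm, exec);
	err = do_range_rebind(vm);	/* placeholder for the real work */
	xe_vm_set_validation_exec(vm, NULL);
	if (err)
		return err;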

Thanks,
Thomas

> 
> Matt
> 
> >  	int ret;
> >  
> >  	if (intr)
> > @@ -3853,9 +3876,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> >  	else
> >  		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> >  
> > -	if (!ret)
> > -		xe_vm_set_validation_exec(vm, exec);
> > -
> >  	return ret;
> >  }
> >  
> > @@ -3867,7 +3887,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> >   */
> >  void xe_vm_unlock(struct xe_vm *vm)
> >  {
> > -	xe_vm_set_validation_exec(vm, NULL);
> >  	dma_resv_unlock(xe_vm_resv(vm));
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-14 13:14     ` Thomas Hellström
@ 2025-08-14 18:39       ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14 18:39 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Thu, Aug 14, 2025 at 03:14:57PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 21:18 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:20PM +0200, Thomas Hellström wrote:
> > > Introduce an xe_bo_create_pin_map_novm() function that does not
> > > take the drm_exec parameter, to simplify the conversion of many
> > > callsites.
> > > For the rest, ensure that the same drm_exec context that was used
> > > for locking the vm is passed down to validation.
> > > 
> > > Use xe_validation_guard() where appropriate.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
> > >  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
> > >  drivers/gpu/drm/xe/display/xe_fb_pin.c        |  39 +++---
> > >  drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
> > >  drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
> > >  drivers/gpu/drm/xe/xe_bo.c                    |  53 +++++++-
> > >  drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
> > >  drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
> > >  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  24 ++--
> > >  drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
> > >  drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
> > >  drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
> > >  drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
> > >  drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
> > >  drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
> > >  drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
> > >  drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
> > >  drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
> > >  drivers/gpu/drm/xe/xe_vm.c                    | 119 ++++++++++--------
> > >  19 files changed, 252 insertions(+), 171 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > > b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > > index d96ba2b51065..8ea9a472113c 100644
> > > --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > > +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > > @@ -42,11 +42,11 @@ struct intel_framebuffer
> > > *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> > >  	obj = ERR_PTR(-ENODEV);
> > >  
> > >  	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe),
> > > 22019338487_display)) {
> > > -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> > > -					   NULL, size,
> > > -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > > -					   XE_BO_FLAG_STOLEN |
> > > -					   XE_BO_FLAG_GGTT);
> > > +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> > > +						size,
> > > +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > > +						XE_BO_FLAG_STOLEN |
> > > +						XE_BO_FLAG_GGTT, false);
> > >  		if (!IS_ERR(obj))
> > >  			drm_info(&xe->drm, "Allocated fbdev into
> > > stolen\n");
> > >  		else
> > > @@ -54,10 +54,10 @@ struct intel_framebuffer
> > > *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> > >  	}
> > >  
> > >  	if (IS_ERR(obj)) {
> > > -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
> > > -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > > -					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > -					   XE_BO_FLAG_GGTT);
> > > +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
> > > +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> > > +						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > +						XE_BO_FLAG_GGTT, false);
> > >  	}
> > >  
> > >  	if (IS_ERR(obj)) {
> > > diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > > b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > > index 9f941fc2e36b..58581d7aaae6 100644
> > > --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > > +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > > @@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc
> > > *crtc, struct intel_dsb_buffer *d
> > >  		return false;
> > >  
> > >  	/* Set scanout flag for WC mapping */
> > > -	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> > > -				   NULL, PAGE_ALIGN(size),
> > > -				   ttm_bo_type_kernel,
> > > -				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > -				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
> > > +	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> > > +					PAGE_ALIGN(size),
> > > +					ttm_bo_type_kernel,
> > > +					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > +					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
> > >  	if (IS_ERR(obj)) {
> > >  		kfree(vma);
> > >  		return false;
> > > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > index d46ff7ebb0a1..d8e15ebb740c 100644
> > > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > @@ -102,32 +102,23 @@ static int __xe_pin_fb_vma_dpt(const struct
> > > intel_framebuffer *fb,
> > >  				 XE_PAGE_SIZE);
> > >  
> > >  	if (IS_DGFX(xe))
> > > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > > -						   dpt_size, ~0ull,
> > > -						   ttm_bo_type_kernel,
> > > -						   true,
> > > -						   XE_BO_FLAG_VRAM0 |
> > > -						   XE_BO_FLAG_GGTT |
> > > -						   XE_BO_FLAG_PAGETABLE,
> > > -						   alignment, false);
> > > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > > +						ttm_bo_type_kernel,
> > > +						XE_BO_FLAG_VRAM0 |
> > > +						XE_BO_FLAG_GGTT |
> > > +						XE_BO_FLAG_PAGETABLE, true);
> > >  	else
> > > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > > -						   dpt_size,  ~0ull,
> > > -						   ttm_bo_type_kernel,
> > > -						   true,
> > > -						   XE_BO_FLAG_STOLEN |
> > > -						   XE_BO_FLAG_GGTT |
> > > -						   XE_BO_FLAG_PAGETABLE,
> > > -						   alignment, false);
> > > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > > +						ttm_bo_type_kernel,
> > > +						XE_BO_FLAG_STOLEN |
> > > +						XE_BO_FLAG_GGTT |
> > > +						XE_BO_FLAG_PAGETABLE, true);
> > >  	if (IS_ERR(dpt))
> > > -		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > > -						   dpt_size,  ~0ull,
> > > -						   ttm_bo_type_kernel,
> > > -						   true,
> > > -						   XE_BO_FLAG_SYSTEM |
> > > -						   XE_BO_FLAG_GGTT |
> > > -						   XE_BO_FLAG_PAGETABLE,
> > > -						   alignment, false);
> > > +		dpt = xe_bo_create_pin_map_novm(xe, tile0, dpt_size,
> > > +						ttm_bo_type_kernel,
> > > +						XE_BO_FLAG_SYSTEM |
> > > +						XE_BO_FLAG_GGTT |
> > > +						XE_BO_FLAG_PAGETABLE, true);
> > >  	if (IS_ERR(dpt))
> > >  		return PTR_ERR(dpt);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > > b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > > index 30f1073141fc..4ae847b628e2 100644
> > > --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > > +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > > @@ -72,10 +72,10 @@ static int
> > > intel_hdcp_gsc_initialize_message(struct xe_device *xe,
> > >  	int ret = 0;
> > >  
> > >  	/* allocate object of two page for HDCP command memory and
> > > store it */
> > > -	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM |
> > > -				  XE_BO_FLAG_GGTT);
> > > +	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
> > > +				       ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM |
> > > +				       XE_BO_FLAG_GGTT, false);
> > >  
> > >  	if (IS_ERR(bo)) {
> > >  		drm_err(&xe->drm, "Failed to allocate bo for HDCP
> > > streaming command!\n");
> > > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > index afa794e56065..5904d658d1f2 100644
> > > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > @@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct
> > > xe_migrate *m, struct kunit *test,
> > >  
> > >  	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
> > >  				   ttm_bo_type_kernel,
> > > -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > > +				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > > +				   exec);
> > >  	if (IS_ERR(big)) {
> > >  		KUNIT_FAIL(test, "Failed to allocate bo: %li\n",
> > > PTR_ERR(big));
> > >  		goto vunmap;
> > > @@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct
> > > xe_migrate *m, struct kunit *test,
> > >  
> > >  	pt = xe_bo_create_pin_map(xe, tile, m->q->vm,
> > > XE_PAGE_SIZE,
> > >  				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > > +				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > > +				  exec);
> > >  	if (IS_ERR(pt)) {
> > >  		KUNIT_FAIL(test, "Failed to allocate fake pt:
> > > %li\n",
> > >  			   PTR_ERR(pt));
> > > @@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct
> > > xe_migrate *m, struct kunit *test,
> > >  	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
> > >  				    2 * SZ_4K,
> > >  				    ttm_bo_type_kernel,
> > > -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > > +				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > > +				    exec);
> > >  	if (IS_ERR(tiny)) {
> > >  		KUNIT_FAIL(test, "Failed to allocate tiny fake pt:
> > > %li\n",
> > >  			   PTR_ERR(tiny));
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > b/drivers/gpu/drm/xe/xe_bo.c
> > > index c9928d4ee5a0..82bf158426ad 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -2343,16 +2343,60 @@ xe_bo_create_pin_map_at_novm(struct
> > > xe_device *xe, struct xe_tile *tile,
> > >  	return ret ? ERR_PTR(ret) : bo;
> > >  }
> > >  
> > > +/**
> > > + * xe_bo_create_pin_map() - Create pinned and mapped bo
> > > + * @xe: The xe device.
> > > + * @tile: The tile to select for migration of this bo, and the tile used for
> > > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > > + * @vm: The vm to associate the buffer object with. The vm's resv must be locked
> > > + * with the transaction represented by @exec.
> > > + * @size: The storage size to use for the bo.
> > > + * @type: The TTM buffer object type.
> > > + * @flags: XE_BO_FLAG_ flags.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction, and
> > > + * previously used for locking @vm's resv.
> > > + *
> > > + * Create a pinned and mapped bo, associated with @vm if @vm is non-NULL.
> > > + *
> > > + * Return: The buffer object on success. Negative error pointer on failure.
> > > + * In particular, the function may return ERR_PTR(%-EINTR) if @exec was
> > > + * set up for interruptible waits.
> > > + */
> > >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> > >  				   struct xe_vm *vm, size_t size,
> > > -				   enum ttm_bo_type type, u32 flags)
> > > +				   enum ttm_bo_type type, u32 flags,
> > > +				   struct drm_exec *exec)
> > >  {
> > > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > > -
> > > +	xe_assert(xe, exec);
> > >  	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> > >  					       true, 0, exec);
> > >  }
> > >  
> > > +/**
> > > + * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
> > > + * @xe: The xe device.
> > > + * @tile: The tile to select for migration of this bo, and the tile used for
> > > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > > + * @size: The storage size to use for the bo.
> > > + * @type: The TTM buffer object type.
> > > + * @flags: XE_BO_FLAG_ flags.
> > > + * @intr: Whether to execute any waits for backing store interruptibly.
> > > + *
> > > + * Create a pinned and mapped bo. The bo will be external and not
> > > + * associated with a VM.
> > > + *
> > > + * Return: The buffer object on success. Negative error pointer on failure.
> > > + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was
> > > + * set to true on entry.
> > > + */
> > > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> > > +					size_t size, enum ttm_bo_type type, u32 flags,
> > > +					bool intr)
> > > +{
> > > +	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, true, 0, intr);
> > > +}
> > > +
> > >  static void __xe_bo_unpin_map_no_vm(void *arg)
> > >  {
> > >  	xe_bo_unpin_map_no_vm(arg);
> > > @@ -2365,8 +2409,7 @@ struct xe_bo
> > > *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
> > >  	int ret;
> > >  
> > >  	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
> > > -
> > > -	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
> > > +	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);
> > >  	if (IS_ERR(bo))
> > >  		return bo;
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.h
> > > b/drivers/gpu/drm/xe/xe_bo.h
> > > index d06266af9662..802e3c7d7872 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > @@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct
> > > xe_device *xe, struct xe_vm *vm, size_t s
> > >  				u16 cpu_caching, u32 flags, struct drm_exec *exec);
> > >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> > >  				   struct xe_vm *vm, size_t size,
> > > -				   enum ttm_bo_type type, u32 flags);
> > > +				   enum ttm_bo_type type, u32 flags,
> > > +				   struct drm_exec *exec);
> > > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> > > +					size_t size, enum ttm_bo_type type, u32 flags,
> > > +					bool intr);
> > >  struct xe_bo *
> > >  xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> > >  			     size_t size, u64 offset, enum ttm_bo_type type,
> > > diff --git a/drivers/gpu/drm/xe/xe_gsc.c
> > > b/drivers/gpu/drm/xe/xe_gsc.c
> > > index f5ae28af60d4..83d61bf8ec62 100644
> > > --- a/drivers/gpu/drm/xe/xe_gsc.c
> > > +++ b/drivers/gpu/drm/xe/xe_gsc.c
> > > @@ -136,10 +136,10 @@ static int query_compatibility_version(struct
> > > xe_gsc *gsc)
> > >  	u64 ggtt_offset;
> > >  	int err;
> > >  
> > > -	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM |
> > > -				  XE_BO_FLAG_GGTT);
> > > +	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
> > > +				       ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM |
> > > +				       XE_BO_FLAG_GGTT, false);
> > >  	if (IS_ERR(bo)) {
> > >  		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
> > >  		return PTR_ERR(bo);
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > index 906011671b60..d0a87d7b028b 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > @@ -1452,7 +1452,6 @@ static bool pf_release_vf_config_lmem(struct
> > > xe_gt *gt, struct xe_gt_sriov_confi
> > >  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int
> > > vfid, u64 size)
> > >  {
> > >  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> > > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_device *xe = gt_to_xe(gt);
> > >  	struct xe_tile *tile = gt_to_tile(gt);
> > >  	struct xe_bo *bo;
> > > @@ -1479,24 +1478,17 @@ static int pf_provision_vf_lmem(struct
> > > xe_gt *gt, unsigned int vfid, u64 size)
> > >  		return 0;
> > >  
> > >  	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
> > > -	bo = xe_bo_create_locked(xe, tile, NULL,
> > > -				 ALIGN(size, PAGE_SIZE),
> > > -				 ttm_bo_type_kernel,
> > > -				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > > -				 XE_BO_FLAG_NEEDS_2M |
> > > -				 XE_BO_FLAG_PINNED |
> > > -				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> > > -				 exec);
> > > +	bo = xe_bo_create_pin_map_at_novm(xe, tile,
> > > +					  ALIGN(size, PAGE_SIZE),
> > > +					  0,
> > > +					  ttm_bo_type_kernel,
> > > +					  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > > +					  XE_BO_FLAG_NEEDS_2M |
> > > +					  XE_BO_FLAG_PINNED_LATE_RESTORE,
> > > +					  false, 0, false);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > -	err = xe_bo_pin(bo, exec);
> > > -	xe_bo_unlock(bo);
> > > -	if (unlikely(err)) {
> > > -		xe_bo_put(bo);
> > > -		return err;
> > > -	}
> > > -
> > >  	config->lmem_obj = bo;
> > >  
> > >  	if (xe_device_has_lmtt(xe)) {
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > > b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > > index c712111aa30d..44cc612b0a75 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > > @@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct
> > > xe_gt *gt, unsigned int vfid,
> > >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> > >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> > >  
> > > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > > -				  ALIGN(size, PAGE_SIZE),
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM |
> > > -				  XE_BO_FLAG_GGTT |
> > > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > > +				       ALIGN(size, PAGE_SIZE),
> > > +				       ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM |
> > > +				       XE_BO_FLAG_GGTT |
> > > +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > @@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct
> > > xe_gt *gt, unsigned int vfid,
> > >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> > >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> > >  
> > > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > > -				  ALIGN(size, PAGE_SIZE),
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM |
> > > -				  XE_BO_FLAG_GGTT |
> > > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > > +				       ALIGN(size, PAGE_SIZE),
> > > +				       ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM |
> > > +				       XE_BO_FLAG_GGTT |
> > > +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > > b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > > index 92e1f9f41b8c..2b99c1ebdd58 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > > @@ -94,16 +94,17 @@ static int
> > > allocate_engine_activity_buffers(struct xe_guc *guc,
> > >  	struct xe_tile *tile = gt_to_tile(gt);
> > >  	struct xe_bo *bo, *metadata_bo;
> > >  
> > > -	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
> > > -					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> > > -					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> > > +	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
> > > +						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> > > +						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
> > > +						false);
> > >  
> > >  	if (IS_ERR(metadata_bo))
> > >  		return PTR_ERR(metadata_bo);
> > >  
> > > -	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
> > > -				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > > -				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> > > +	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
> > > +				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > > +				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
> > >  
> > >  	if (IS_ERR(bo)) {
> > >  		xe_bo_unpin_map_no_vm(metadata_bo);
> > > diff --git a/drivers/gpu/drm/xe/xe_lmtt.c
> > > b/drivers/gpu/drm/xe/xe_lmtt.c
> > > index a78c9d474a6e..4ad468574174 100644
> > > --- a/drivers/gpu/drm/xe/xe_lmtt.c
> > > +++ b/drivers/gpu/drm/xe/xe_lmtt.c
> > > @@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct
> > > xe_lmtt *lmtt, unsigned int level
> > >  		goto out;
> > >  	}
> > >  
> > > -	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
> > > -				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> > > -					     lmtt->ops->lmtt_pte_num(level)),
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > > -				  XE_BO_FLAG_NEEDS_64K);
> > > +	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
> > > +				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> > > +						  lmtt->ops->lmtt_pte_num(level)),
> > > +				       ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > > +				       XE_BO_FLAG_NEEDS_64K, false);
> > >  	if (IS_ERR(bo)) {
> > >  		err = PTR_ERR(bo);
> > >  		goto out_free_pt;
> > > diff --git a/drivers/gpu/drm/xe/xe_lrc.c
> > > b/drivers/gpu/drm/xe/xe_lrc.c
> > > index 8f6c3ba47882..6d52e0eb97f5 100644
> > > --- a/drivers/gpu/drm/xe/xe_lrc.c
> > > +++ b/drivers/gpu/drm/xe/xe_lrc.c
> > > @@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc,
> > > struct xe_hw_engine *hwe,
> > >  	if (vm && vm->xef) /* userspace */
> > >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> > >  
> > > -	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
> > > -				       ttm_bo_type_kernel,
> > > -				       bo_flags);
> > > +	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
> > > +					    bo_size,
> > > +					    ttm_bo_type_kernel,
> > > +					    bo_flags, false);
> > >  	if (IS_ERR(lrc->bo))
> > >  		return PTR_ERR(lrc->bo);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > b/drivers/gpu/drm/xe/xe_migrate.c
> > > index ddfad7506a82..fe0d15ab340e 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > @@ -35,6 +35,7 @@
> > >  #include "xe_sched_job.h"
> > >  #include "xe_sync.h"
> > >  #include "xe_trace_bo.h"
> > > +#include "xe_validation.h"
> > >  #include "xe_vm.h"
> > >  #include "xe_vram.h"
> > >  
> > > @@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct
> > > xe_device *xe, struct xe_vm *vm,
> > >  }
> > >  
> > >  static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > > -				 struct xe_vm *vm)
> > > +				 struct xe_vm *vm, struct drm_exec *exec)
> > >  {
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > >  	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> > > @@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile
> > > *tile, struct xe_migrate *m,
> > >  				  num_entries * XE_PAGE_SIZE,
> > >  				  ttm_bo_type_kernel,
> > >  				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > > -				  XE_BO_FLAG_PAGETABLE);
> > > +				  XE_BO_FLAG_PAGETABLE, exec);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > @@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
> > >  	struct xe_tile *tile = m->tile;
> > >  	struct xe_gt *primary_gt = tile->primary_gt;
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > > +	struct xe_validation_ctx ctx;
> > > +	struct drm_exec exec;
> > >  	struct xe_vm *vm;
> > >  	int err;
> > >  
> > > @@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
> > >  	if (IS_ERR(vm))
> > >  		return PTR_ERR(vm);
> > >  
> > > -	xe_vm_lock(vm, false);
> > > -	err = xe_migrate_prepare_vm(tile, m, vm);
> > > -	xe_vm_unlock(vm);
> > > +	err = 0;
> > > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> > > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > > +		drm_exec_retry_on_contention(&exec);
> > > +		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
> > > +		drm_exec_retry_on_contention(&exec);
> > > +		xe_validation_retry_on_oom(&ctx, &err);
> > > +	}
> > >  	if (err)
> > > -		goto err_out;
> > > +		return err;
> > >  
> > >  	if (xe->info.has_usm) {
> > >  		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
> > > diff --git a/drivers/gpu/drm/xe/xe_oa.c
> > > b/drivers/gpu/drm/xe/xe_oa.c
> > > index a188bad172ad..a4894eb0d7f3 100644
> > > --- a/drivers/gpu/drm/xe/xe_oa.c
> > > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > > @@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct
> > > xe_oa_stream *stream, size_t size)
> > >  {
> > >  	struct xe_bo *bo;
> > >  
> > > -	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
> > > -				  size, ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
> > > +	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
> > > +				       size, ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > b/drivers/gpu/drm/xe/xe_pt.c
> > > index f3a39e734a90..33ad40418ceb 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
> > >   * @vm: The vm to create for.
> > >   * @tile: The tile to create for.
> > >   * @level: The page-table level.
> > > + * @exec: The drm_exec object used to lock the vm.
> > >   *
> > >   * Allocate and initialize a single struct xe_pt metadata
> > > structure. Also
> > >   * create the corresponding page-table bo, but don't initialize
> > > it. If the
> > > @@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
> > >   * error.
> > >   */
> > >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > > -			   unsigned int level)
> > > +			   unsigned int level, struct drm_exec *exec)
> > >  {
> > >  	struct xe_pt *pt;
> > >  	struct xe_bo *bo;
> > > @@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm,
> > > struct xe_tile *tile,
> > >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> > >  
> > >  	pt->level = level;
> > > +
> > > +	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
> > >  	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
> > >  				  ttm_bo_type_kernel,
> > > -				  bo_flags);
> > > +				  bo_flags, exec);
> > >  	if (IS_ERR(bo)) {
> > >  		err = PTR_ERR(bo);
> > >  		goto err_kfree;
> > > @@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> > > pgoff_t offset,
> > >  	if (covers || !*child) {
> > >  		u64 flags = 0;
> > >  
> > > -		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
> > > +		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
> > > +					xe_vm_validation_exec(vm));
> > >  		if (IS_ERR(xe_child))
> > >  			return PTR_ERR(xe_child);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.h
> > > b/drivers/gpu/drm/xe/xe_pt.h
> > > index 5ecf003d513c..4daeebaab5a1 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.h
> > > +++ b/drivers/gpu/drm/xe/xe_pt.h
> > > @@ -10,6 +10,7 @@
> > >  #include "xe_pt_types.h"
> > >  
> > >  struct dma_fence;
> > > +struct drm_exec;
> > >  struct xe_bo;
> > >  struct xe_device;
> > >  struct xe_exec_queue;
> > > @@ -29,7 +30,7 @@ struct xe_vma_ops;
> > >  unsigned int xe_pt_shift(unsigned int level);
> > >  
> > >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > > -			   unsigned int level);
> > > +			   unsigned int level, struct drm_exec *exec);
> > >  
> > >  void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
> > >  			  struct xe_pt *pt);
> > > diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c
> > > b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > > index ca95f2a4d4ef..54bd6b64dc6d 100644
> > > --- a/drivers/gpu/drm/xe/xe_pxp_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > > @@ -54,8 +54,9 @@ static int
> > > allocate_vcs_execution_resources(struct xe_pxp *pxp)
> > >  	 * Each termination is 16 DWORDS, so 4K is enough to contain a
> > >  	 * termination for each sessions.
> > >  	 */
> > > -	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
> > > +	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> > > +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
> > > +				       false);
> > >  	if (IS_ERR(bo)) {
> > >  		err = PTR_ERR(bo);
> > >  		goto out_queue;
> > > @@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct
> > > xe_gt *gt,
> > >  {
> > >  	struct xe_tile *tile = gt_to_tile(gt);
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > > +	struct xe_validation_ctx ctx;
> > >  	struct xe_hw_engine *hwe;
> > > +	struct drm_exec exec;
> > >  	struct xe_vm *vm;
> > >  	struct xe_bo *bo;
> > >  	struct xe_exec_queue *q;
> > > @@ -106,15 +109,26 @@ static int
> > > allocate_gsc_client_resources(struct xe_gt *gt,
> > >  		return PTR_ERR(vm);
> > >  
> > >  	/* We allocate a single object for the batch and the in/out memory */
> > > -	xe_vm_lock(vm, false);
> > > -	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE +
> > > inout_size * 2,
> > > -				  ttm_bo_type_kernel,
> > > -				  XE_BO_FLAG_SYSTEM |
> > > XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
> > > -	xe_vm_unlock(vm);
> > > -	if (IS_ERR(bo)) {
> > > -		err = PTR_ERR(bo);
> > > -		goto vm_out;
> > > +
> > > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, err, false) {
> > > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > > +		drm_exec_retry_on_contention(&exec);
> > > +		if (err)
> > > +			break;
> > > +
> > > +		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> > > +					  ttm_bo_type_kernel,
> > > +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
> > > +					  XE_BO_FLAG_NEEDS_UC, &exec);
> > > +		drm_exec_retry_on_contention(&exec);
> > > +		if (IS_ERR(bo)) {
> > > +			err = PTR_ERR(bo);
> > > +			xe_validation_retry_on_oom(&ctx, &err);
> > > +			break;
> > > +		}
> > >  	}
> > > +	if (err)
> > > +		goto vm_out;
> > >  
> > >  	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
> > >  	if (IS_ERR(fence)) {
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > b/drivers/gpu/drm/xe/xe_vm.c
> > > index 989d84c2e82f..b3ee65126841 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -1606,6 +1606,7 @@ static void vm_destroy_work_func(struct
> > > work_struct *w);
> > >   * @xe: xe device.
> > >   * @tile: tile to set up for.
> > >   * @vm: vm to set up for.
> > > + * @exec: The struct drm_exec object used to lock the vm resv.
> > >   *
> > >   * Sets up a pagetable tree with one page-table per level and a single
> > >   * leaf PTE. All pagetable entries point to the single page-table or,
> > > @@ -1615,20 +1616,19 @@ static void vm_destroy_work_func(struct
> > > work_struct *w);
> > >   * Return: 0 on success, negative error code on error.
> > >   */
> > >  static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
> > > -				struct xe_vm *vm)
> > > +				struct xe_vm *vm, struct drm_exec *exec)
> > >  {
> > >  	u8 id = tile->id;
> > >  	int i;
> > >  
> > >  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
> > > -		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> > > +		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i, exec);
> > >  		if (IS_ERR(vm->scratch_pt[id][i])) {
> > >  			int err = PTR_ERR(vm->scratch_pt[id][i]);
> > >  
> > >  			vm->scratch_pt[id][i] = NULL;
> > >  			return err;
> > >  		}
> > > -
> > >  		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
> > >  	}
> > >  
> > > @@ -1656,9 +1656,26 @@ static void xe_vm_free_scratch(struct xe_vm
> > > *vm)
> > >  	}
> > >  }
> > >  
> > > +static void xe_vm_pt_destroy(struct xe_vm *vm)
> > > +{
> > > +	struct xe_tile *tile;
> > > +	u8 id;
> > > +
> > > +	xe_vm_assert_held(vm);
> > > +
> > > +	for_each_tile(tile, vm->xe, id) {
> > > +		if (vm->pt_root[id]) {
> > > +			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> > > +			vm->pt_root[id] = NULL;
> > > +		}
> > > +	}
> > > +}
> > > +
> > >  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
> > >  {
> > >  	struct drm_gem_object *vm_resv_obj;
> > > +	struct xe_validation_ctx ctx;
> > > +	struct drm_exec exec;
> > >  	struct xe_vm *vm;
> > >  	int err, number_tiles = 0;
> > >  	struct xe_tile *tile;
> > > @@ -1745,49 +1762,64 @@ struct xe_vm *xe_vm_create(struct xe_device
> > > *xe, u32 flags, struct xe_file *xef)
> > >  
> > >  	drm_gem_object_put(vm_resv_obj);
> > >  
> > > -	err = xe_vm_lock(vm, true);
> > > -	if (err)
> > > -		goto err_close;
> > > +	err = 0;
> > > +	xe_validation_guard(&ctx, &xe->val, &exec, DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > +			    err, true) {
> > > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > > +		drm_exec_retry_on_contention(&exec);
> > >  
> > > -	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> > > -		vm->flags |= XE_VM_FLAG_64K;
> > > +		if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> > > +			vm->flags |= XE_VM_FLAG_64K;
> > >  
> > > -	for_each_tile(tile, xe, id) {
> > > -		if (flags & XE_VM_FLAG_MIGRATION &&
> > > -		    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > > -			continue;
> > > +		for_each_tile(tile, xe, id) {
> > > +			if (flags & XE_VM_FLAG_MIGRATION &&
> > > +			    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > > +				continue;
> > >  
> > > -		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
> > > -		if (IS_ERR(vm->pt_root[id])) {
> > > -			err = PTR_ERR(vm->pt_root[id]);
> > > -			vm->pt_root[id] = NULL;
> > > -			goto err_unlock_close;
> > > +			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
> > > +						       &exec);
> > > +			if (IS_ERR(vm->pt_root[id])) {
> > > +				err = PTR_ERR(vm->pt_root[id]);
> > > +				vm->pt_root[id] = NULL;
> > > +				xe_vm_pt_destroy(vm);
> > > +				drm_exec_retry_on_contention(&exec);
> > > +				xe_validation_retry_on_oom(&ctx, &err);
> > > +				goto err_close;
> > > +			}
> > >  		}
> > > -	}
> > >  
> > > -	if (xe_vm_has_scratch(vm)) {
> > > +		if (xe_vm_has_scratch(vm)) {
> > > +			for_each_tile(tile, xe, id) {
> > > +				if (!vm->pt_root[id])
> > > +					continue;
> > > +
> > > +				err = xe_vm_create_scratch(xe, tile, vm, &exec);
> > > +				if (err) {
> > > +					xe_vm_free_scratch(vm);
> > > +					xe_vm_pt_destroy(vm);
> > > +					drm_exec_retry_on_contention(&exec);
> > > +					xe_validation_retry_on_oom(&ctx, &err);
> > > +					goto err_close;
> > > +				}
> > > +			}
> > > +			vm->batch_invalidate_tlb = true;
> > > +		}
> > > +
> > > +		if (vm->flags & XE_VM_FLAG_LR_MODE) {
> > > +			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
> > > +			vm->batch_invalidate_tlb = false;
> > > +		}
> > > +
> > > +		/* Fill pt_root after allocating scratch tables */
> > >  		for_each_tile(tile, xe, id) {
> > >  			if (!vm->pt_root[id])
> > >  				continue;
> > >  
> > > -			err = xe_vm_create_scratch(xe, tile, vm);
> > > -			if (err)
> > > -				goto err_unlock_close;
> > > +			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> > >  		}
> > > -		vm->batch_invalidate_tlb = true;
> > > -	}
> > > -
> > > -	if (vm->flags & XE_VM_FLAG_LR_MODE)
> > > -		vm->batch_invalidate_tlb = false;
> > > -
> > > -	/* Fill pt_root after allocating scratch tables */
> > > -	for_each_tile(tile, xe, id) {
> > > -		if (!vm->pt_root[id])
> > > -			continue;
> > > -
> > > -		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> > >  	}
> > > -	xe_vm_unlock(vm);
> > > +	if (err)
> > > +		goto err_close;
> > >  
> > >  	/* Kernel migration VM shouldn't have a circular loop.. */
> > >  	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> > > @@ -1820,7 +1852,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> > > *xe, u32 flags, struct xe_file *xef)
> > >  				      &xe->usm.next_asid, GFP_KERNEL);
> > >  		up_write(&xe->usm.lock);
> > >  		if (err < 0)
> > > -			goto err_unlock_close;
> > > +			goto err_close;
> > >  
> > >  		vm->usm.asid = asid;
> > >  	}
> > > @@ -1829,8 +1861,6 @@ struct xe_vm *xe_vm_create(struct xe_device
> > > *xe, u32 flags, struct xe_file *xef)
> > >  
> > >  	return vm;
> > >  
> > > -err_unlock_close:
> > > -	xe_vm_unlock(vm);
> > >  err_close:
> > >  	xe_vm_close_and_put(vm);
> > >  	return ERR_PTR(err);
> > > @@ -1959,13 +1989,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> > >  	 * destroy the pagetables immediately.
> > >  	 */
> > >  	xe_vm_free_scratch(vm);
> > > -
> > > -	for_each_tile(tile, xe, id) {
> > > -		if (vm->pt_root[id]) {
> > > -			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> > > -			vm->pt_root[id] = NULL;
> > > -		}
> > > -	}
> > > +	xe_vm_pt_destroy(vm);
> > >  	xe_vm_unlock(vm);
> > >  
> > >  	/*
> > > @@ -3845,7 +3869,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct
> > > xe_vm *vm, struct xe_bo *bo,
> > >   */
> > >  int xe_vm_lock(struct xe_vm *vm, bool intr)
> > >  {
> > > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > 
> > You add this earlier in the series and then delete it here, which seems
> > odd. If the intent is to apply a workaround (WA) earlier and finalize it
> > here, that makes sense.
> 
> Yes, the patch that introduces these identifies all the call-sites
> where we need a fix-up, and uses lockdep to verify that a validation
> transaction starting there doesn't violate the dma_resv locking order.
> 
> Most of the series then eliminates these _UNIMPLEMENTED markers one by
> one. Of course one could do it in the opposite order to avoid
> introducing and then eliminating them, but I think that would be harder
> for both the developer and the reviewer.
> 

Fine with the ordering, I just wanted to make sure I understood why
this was done.
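
To condense the hunks above for anyone following along, the converted
call-sites all end up with roughly this shape (paraphrased, not
verbatim from the patch):

	xe_validation_guard(&ctx, &xe->val, &exec, DRM_EXEC_INTERRUPTIBLE_WAIT,
			    err, true) {
		err = xe_vm_drm_exec_lock(vm, &exec);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		bo = xe_bo_create_pin_map(xe, tile, vm, size,
					  ttm_bo_type_kernel, flags, &exec);
		drm_exec_retry_on_contention(&exec);
		if (IS_ERR(bo)) {
			err = PTR_ERR(bo);
			xe_validation_retry_on_oom(&ctx, &err);
		}
	}
	if (err)
		goto err_out;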

> > 
> > However, deleting it here appears to leave a problematic path in
> > xe_svm.c around xe_vm_range_rebind, where vm->validating._exec may
> > remain unset/stale. I’m surprised CI didn’t catch this—perhaps due to an
> > unbalanced xe_vm_set_validation_exec leaving stale state—or I’m missing
> > something.
> 
> Yes, you are correct. There was an imbalance and I will fix that.
> 

+1

Matt

> Thanks,
> Thomas
> 
> > 
> > Matt
> > 
> > >  	int ret;
> > >  
> > >  	if (intr)
> > > @@ -3853,9 +3876,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> > >  	else
> > >  		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> > >  
> > > -	if (!ret)
> > > -		xe_vm_set_validation_exec(vm, exec);
> > > -
> > >  	return ret;
> > >  }
> > >  
> > > @@ -3867,7 +3887,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> > >   */
> > >  void xe_vm_unlock(struct xe_vm *vm)
> > >  {
> > > -	xe_vm_set_validation_exec(vm, NULL);
> > >  	dma_resv_unlock(xe_vm_resv(vm));
> > >  }
> > >  
> > > -- 
> > > 2.50.1
> > > 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
  2025-08-14  3:58   ` Matthew Brost
  2025-08-14  4:05   ` Matthew Brost
@ 2025-08-14 18:48   ` Matthew Brost
  2025-08-15  9:37     ` Thomas Hellström
  2 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14 18:48 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> Most users of xe_bo_create_pin_map_at() and
> xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> and that simplifies conversion. Introduce an
> xe_bo_create_pin_map_at_novm() function and make the _aligned()
> version static. Use xe_validation_guard() for conversion.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
>  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
>  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
>  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
>  6 files changed, 101 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 1ce1e9da975b..ab48635ddffa 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
> -	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
>  
>  	if (start < SZ_4K)
> @@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  		start = ALIGN(start, align);
>  	}
>  
> -	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> -				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		bo = NULL;
> -		return err;
> -	}
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock_vm_held(bo);
> -
> -	if (err) {
> -		xe_bo_put(fb->bo);
> -		bo = NULL;
> -	}
> +	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
> +					  size, start, ttm_bo_type_kernel, flags,
> +					  false, 0, true);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
>  
>  	fb->bo = bo;
>  
> -	return err;
> +	return 0;
>  }
>  
>  static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 43c45344ea26..d46ff7ebb0a1 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  				 XE_PAGE_SIZE);
>  
>  	if (IS_DGFX(xe))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size, ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_VRAM0 |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size, ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_VRAM0 |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	else
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_STOLEN |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_STOLEN |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_SYSTEM |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   true,
> +						   XE_BO_FLAG_SYSTEM |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> index 826ac3d578b7..79d00127caf4 100644
> --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
>  			page_size);
>  	size -= base;
>  
> -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
> -				     ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
> +					  ttm_bo_type_kernel, flags, true, 0, false);
>  	if (IS_ERR(bo)) {
>  		drm_dbg(&xe->drm,
>  			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 23b28eeef59f..c9928d4ee5a0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
>  	return bo;
>  }
>  
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm,
> -				      size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags)
> -{
> -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
> -					       type, flags, 0);
> -}
> -
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment)
> +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> +						     struct xe_tile *tile,
> +						     struct xe_vm *vm,
> +						     size_t size, u64 offset,
> +						     enum ttm_bo_type type, u32 flags,
> +						     bool vmap, u64 alignment,
> +						     struct drm_exec *exec)
>  {
>  	struct xe_bo *bo;
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
>  	u64 end = offset == ~0ull ? offset : start + size;
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  

General comment for the series: should all BO-layer functions that
allocate (or may allocate) memory include a lockdep assertion that the
xe_validation_device->val lock is held? We already have
xe_validation_assert_exec in several places, which is similar, but IMO
it wouldn’t hurt to also assert xe_validation_device->val in the
relevant driver paths. The new TTM manager functions are good candidates
as well. Consider adding a follow-up patch at the end of the series to
add these assertions once all allocation paths adhere to the new locking
model.

Matt
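
Something like the below is what I have in mind, sketched under the
assumption that the rwsem inside struct xe_validation_device is called
"lock":

	static inline void xe_validation_assert_val_held(struct xe_device *xe)
	{
		lockdep_assert_held(&xe->val.lock);
	}

That could then be called from the BO allocation paths and the TTM
manager functions once everything adheres to the new locking model.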

> -	if (flags & XE_BO_FLAG_STOLEN &&
> +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
>  		flags |= XE_BO_FLAG_GGTT;
>  
> @@ -2289,9 +2280,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	if (err)
>  		goto err_put;
>  
> -	err = xe_bo_vmap(bo);
> -	if (err)
> -		goto err_unpin;
> +	if (vmap) {
> +		err = xe_bo_vmap(bo);
> +		if (err)
> +			goto err_unpin;
> +	}
>  
>  	xe_bo_unlock_vm_held(bo);
>  
> @@ -2305,11 +2298,59 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @offset: Optional VRAM offset, or ~0ull for don't care.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @vmap: Whether to create a buffer object map.
> + * @alignment: GGTT alignment.
> + * @intr: Whether to execute any waits for backing store interruptibly.
> + *
> + * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
> + * options. The bo will be external and not associated with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
> +			     bool vmap, u64 alignment, bool intr)
> +{
> +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *bo;
> +	int ret = 0;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
> +		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
> +						     type, flags, vmap,
> +						     alignment, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +		}
> +	}
> +
> +	return ret ? ERR_PTR(ret) : bo;
> +}
> +
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags)
>  {
> -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +
> +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> +					       true, 0, exec);
>  }
>  
>  static void __xe_bo_unpin_map_no_vm(void *arg)
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index a625806deeb6..d06266af9662 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm, size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment);
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type,
> +			     u32 flags, bool vmap, u64 alignment, bool intr);
>  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  					   size_t size, u32 flags);
>  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index fdd514fec5ef..afabfc125488 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
>  
>  	size = stream->per_xecore_buf_size * last_xecore;
>  
> -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> -					     size, ~0ull, ttm_bo_type_kernel,
> -					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
> +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, true,
> +					  SZ_64, false);
>  	if (IS_ERR(bo)) {
>  		kfree(stream->xecore_buf);
>  		return PTR_ERR(bo);
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/15] drm/xe: Pass down drm_exec context to validation
  2025-08-14  7:49     ` Thomas Hellström
@ 2025-08-14 19:09       ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-14 19:09 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Thu, Aug 14, 2025 at 09:49:59AM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 09:42 -0700, Matthew Brost wrote:
> 
> > On Wed, Aug 13, 2025 at 12:51:10PM +0200, Thomas Hellström wrote:
> > 
> > > We want all validation (potential backing store allocation) to be part
> > > of a drm_exec transaction. Therefore add a drm_exec pointer argument
> > > to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
> > > will deal with making all (or nearly all) calls to these functions
> > > part of a drm_exec transaction. In the meantime, define special values
> > > of the drm_exec pointer:
> > > 
> > 
> > 
> > Would the eventual idea be to pass the exec further down to TTM?
> 
> 
> Yes. The original series did this, and required multiple changes both to drm_exec and to TTM. Christian had some other ideas, although the final goal was the same. So it's a task for us and AMD to agree on something here. The TTM object refcount removal series from Christian is a step on the way there.
> 

Ok, I thought that was the idea and wanted to confirm.

Let me look at Christian's series now too.
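For concreteness, "passing the exec further down" would mean a TTM
validation entry point that carries the caller's transaction. A purely
hypothetical signature sketch, not Christian's actual proposal and not an
existing TTM interface:

	int ttm_bo_validate(struct ttm_buffer_object *bo,
			    struct ttm_placement *placement,
			    struct ttm_operation_ctx *ctx,
			    struct drm_exec *exec);

so that TTM itself could ww-lock eviction victims under the caller's
drm_exec context rather than trylocking them.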

> 
> >  
> > 
> > > XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
> > > has not been done yet.
> > > XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
> > > the drm_exec context to be passed down to map_attachment, where
> > > validation takes place.
> > 
> > 
> > What is the expected long-term implication of paths that are
> > UNIMPLEMENTED and UNSUPPORTED?
> 
> 
> IMO UNIMPLEMENTED should not be allowed moving forward other than for debugging. UNIMPLEMENTED requires a new dma-buf mapping interface with an exec argument. I don't think all peers will support that, though, and those won't participate fully in the scheme.
> 

That was my thinking too—once the dma-buf mapping is fixed up, disallow this.

Matt

> 
> 
> > 
> > > XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
> > > eviction isn't crucial and the ROI of converting those is very
> > > small.
> > > 
> > > For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
> > > a lockdep check that a drm_exec transaction can indeed start at the
> > > location where the macro is expanded. This is to encourage
> > > developers to take this into consideration early in the code
> > > development process.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/Makefile                   |   1 +
> > >  .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
> > >  drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
> > >  drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
> > >  drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
> > >  drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
> > >  drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
> > >  drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
> > >  drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
> > >  drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
> > >  drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
> > >  drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
> > >  drivers/gpu/drm/xe/xe_gt_pagefault.c          |   4 +-
> > >  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
> > >  drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
> > >  drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
> > >  drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
> > >  drivers/gpu/drm/xe/xe_vm.c                    |  26 +++-
> > >  drivers/gpu/drm/xe/xe_vm.h                    |  33 ++++-
> > >  drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
> > >  20 files changed, 401 insertions(+), 105 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/xe/xe_validation.c
> > >  create mode 100644 drivers/gpu/drm/xe/xe_validation.h
> > > 
> > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > index 8e0c3412a757..8ee7d275128d 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -127,6 +127,7 @@ xe-y += xe_bb.o \
> > >  	xe_tuning.o \
> > >  	xe_uc.o \
> > >  	xe_uc_fw.o \
> > > +	xe_validation.o \
> > >  	xe_vm.o \
> > >  	xe_vram.o \
> > >  	xe_vram_freq.o \
> > > diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > > index 41d39d67817a..1ce1e9da975b 100644
> > > --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > > +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > > @@ -8,6 +8,7 @@
> > >  
> > >  #include "xe_ttm_stolen_mgr.h"
> > >  #include "xe_res_cursor.h"
> > > +#include "xe_validation.h"
> > >  
> > >  struct xe_bo;
> > >  
> > > @@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> > >  						       u32 size, u32 align,
> > >  						       u32 start, u32 end)
> > >  {
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_bo *bo;
> > >  	int err;
> > >  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> > > @@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> > >  
> > >  	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> > >  				       NULL, size, start, end,
> > > -				       ttm_bo_type_kernel, flags, 0);
> > > +				       ttm_bo_type_kernel, flags, 0, exec);
> > >  	if (IS_ERR(bo)) {
> > >  		err = PTR_ERR(bo);
> > >  		bo = NULL;
> > >  		return err;
> > >  	}
> > > -	err = xe_bo_pin(bo);
> > > +	err = xe_bo_pin(bo, exec);
> > >  	xe_bo_unlock_vm_held(bo);
> > >  
> > >  	if (err) {
> > > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > index f1f8b5ab53ef..4b0748e6fdd6 100644
> > > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > @@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> > >  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> > >  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
> > >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	int ret;
> > >  
> > >  	if (!vma)
> > > @@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
> > >  		goto err;
> > >  
> > >  	if (IS_DGFX(xe))
> > > -		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
> > > +		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
> > >  	else
> > > -		ret = xe_bo_validate(bo, NULL, true);
> > > +		ret = xe_bo_validate(bo, NULL, true, exec);
> > >  	if (!ret)
> > >  		ttm_bo_pin(&bo->ttm);
> > >  	ttm_bo_unreserve(&bo->ttm);
> > > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> > > index bb469096d072..06ceba6c3c25 100644
> > > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> > > @@ -23,7 +23,7 @@
> > >  
> > >  static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> > >  			    bool clear, u64 get_val, u64 assign_val,
> > > -			    struct kunit *test)
> > > +			    struct kunit *test, struct drm_exec *exec)
> > >  {
> > >  	struct dma_fence *fence;
> > >  	struct ttm_tt *ttm;
> > > @@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> > >  	u32 offset;
> > >  
> > >  	/* Move bo to VRAM if not already there. */
> > > -	ret = xe_bo_validate(bo, NULL, false);
> > > +	ret = xe_bo_validate(bo, NULL, false, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to validate bo.\n");
> > >  		return ret;
> > > @@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
> > >  	}
> > >  
> > >  	/* Evict to system. CCS data should be copied. */
> > > -	ret = xe_bo_evict(bo);
> > > +	ret = xe_bo_evict(bo, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to evict bo.\n");
> > >  		return ret;
> > > @@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
> > >  
> > >  	/* TODO: Sanity check */
> > >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> > > +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> > >  
> > >  	if (IS_DGFX(xe))
> > >  		kunit_info(test, "Testing vram id %u\n", tile->id);
> > > @@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
> > >  
> > >  	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
> > >  	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
> > > -			       test);
> > > +			       test, exec);
> > >  	if (ret)
> > >  		goto out_unlock;
> > >  
> > >  	kunit_info(test, "Verifying that CCS data survives migration.\n");
> > >  	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
> > > -			       0xdeadbeefdeadbeefULL, test);
> > > +			       0xdeadbeefdeadbeefULL, test, exec);
> > >  	if (ret)
> > >  		goto out_unlock;
> > >  
> > >  	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
> > > -	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
> > > +	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
> > >  
> > >  out_unlock:
> > >  	xe_bo_unlock(bo);
> > > @@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> > >  	struct xe_bo *bo, *external;
> > >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> > >  	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
> > > +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> > >  	struct xe_gt *__gt;
> > >  	int err, i, id;
> > >  
> > > @@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> > >  		}
> > >  
> > >  		xe_bo_lock(external, false);
> > > -		err = xe_bo_pin_external(external);
> > > +		err = xe_bo_pin_external(external, exec);
> > >  		xe_bo_unlock(external);
> > >  		if (err) {
> > >  			KUNIT_FAIL(test, "external bo pin err=%pe\n",
> > > @@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> > >  		if (i) {
> > >  			down_read(&vm->lock);
> > >  			xe_vm_lock(vm, false);
> > > -			err = xe_bo_validate(bo, bo->vm, false);
> > > +			err = xe_bo_validate(bo, bo->vm, false, exec);
> > >  			xe_vm_unlock(vm);
> > >  			up_read(&vm->lock);
> > >  			if (err) {
> > > @@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> > >  				goto cleanup_all;
> > >  			}
> > >  			xe_bo_lock(external, false);
> > > -			err = xe_bo_validate(external, NULL, false);
> > > +			err = xe_bo_validate(external, NULL, false, exec);
> > >  			xe_bo_unlock(external);
> > >  			if (err) {
> > >  				KUNIT_FAIL(test, "external bo valid err=%pe\n",
> > > diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > > index cde9530bef8c..965dd3280468 100644
> > > --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > > +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> > > @@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
> > >  }
> > >  
> > >  static void check_residency(struct kunit *test, struct xe_bo *exported,
> > > -			    struct xe_bo *imported, struct dma_buf *dmabuf)
> > > +			    struct xe_bo *imported, struct dma_buf *dmabuf,
> > > +			    struct drm_exec *exec)
> > >  {
> > >  	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
> > >  	u32 mem_type;
> > > @@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
> > >  	 * importer is on a different device. If they're on the same device,
> > >  	 * the exporter and the importer should be the same bo.
> > >  	 */
> > > -	ret = xe_bo_evict(exported);
> > > +	ret = xe_bo_evict(exported, exec);
> > >  	if (ret) {
> > >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> > >  			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
> > > @@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
> > >  	}
> > >  
> > >  	/* Re-validate the importer. This should move also exporter in. */
> > > -	ret = xe_bo_validate(imported, NULL, false);
> > > +	ret = xe_bo_validate(imported, NULL, false, exec);
> > >  	if (ret) {
> > >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> > >  			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
> > > @@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
> > >  			KUNIT_FAIL(test,
> > >  				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
> > >  		} else {
> > > +			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> > >  			int err;
> > >  
> > >  			/* Is everything where we expect it to be? */
> > >  			xe_bo_lock(import_bo, false);
> > > -			err = xe_bo_validate(import_bo, NULL, false);
> > > +			err = xe_bo_validate(import_bo, NULL, false, exec);
> > >  
> > >  			/* Pinning in VRAM is not allowed. */
> > >  			if (!is_dynamic(params) &&
> > > @@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
> > >  						  err == -ERESTARTSYS);
> > >  
> > >  			if (!err)
> > > -				check_residency(test, bo, import_bo, dmabuf);
> > > +				check_residency(test, bo, import_bo, dmabuf, exec);
> > >  			xe_bo_unlock(import_bo);
> > >  		}
> > >  		drm_gem_object_put(import);
> > > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > index edd1e701aa1c..dfb445d09759 100644
> > > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > @@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
> > >  		} } while (0)
> > >  
> > >  static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> > > -		      struct kunit *test, u32 region)
> > > +		      struct kunit *test, u32 region, struct drm_exec *exec)
> > >  {
> > >  	struct xe_device *xe = tile_to_xe(m->tile);
> > >  	u64 retval, expected = 0;
> > > @@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> > >  						   ttm_bo_type_kernel,
> > >  						   region |
> > >  						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > > -						   XE_BO_FLAG_PINNED);
> > > +						   XE_BO_FLAG_PINNED,
> > > +						   exec);
> > >  	if (IS_ERR(remote)) {
> > >  		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
> > >  			   str, remote);
> > >  		return;
> > >  	}
> > >  
> > > -	err = xe_bo_validate(remote, NULL, false);
> > > +	err = xe_bo_validate(remote, NULL, false, exec);
> > >  	if (err) {
> > >  		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
> > >  			   str, err);
> > > @@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> > >  }
> > >  
> > >  static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
> > > -			     struct kunit *test)
> > > +			     struct drm_exec *exec, struct kunit *test)
> > >  {
> > > -	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
> > > +	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
> > >  }
> > >  
> > >  static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> > > -			   struct kunit *test)
> > > +			   struct drm_exec *exec, struct kunit *test)
> > >  {
> > >  	u32 region;
> > >  
> > > @@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> > >  		region = XE_BO_FLAG_VRAM1;
> > >  	else
> > >  		region = XE_BO_FLAG_VRAM0;
> > > -	test_copy(m, bo, test, region);
> > > +	test_copy(m, bo, test, region, exec);
> > >  }
> > >  
> > > -static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> > > +static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> > > +				   struct drm_exec *exec)
> > >  {
> > >  	struct xe_tile *tile = m->tile;
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > > @@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> > >  	check(retval, expected, "Command clear small last value", test);
> > >  
> > >  	kunit_info(test, "Copying small buffer object to system\n");
> > > -	test_copy_sysmem(m, tiny, test);
> > > +	test_copy_sysmem(m, tiny, exec, test);
> > >  	if (xe->info.tile_count > 1) {
> > >  		kunit_info(test, "Copying small buffer object to other vram\n");
> > > -		test_copy_vram(m, tiny, test);
> > > +		test_copy_vram(m, tiny, exec, test);
> > >  	}
> > >  
> > >  	/* Clear a big bo */
> > > @@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> > >  	check(retval, expected, "Command clear big last value", test);
> > >  
> > >  	kunit_info(test, "Copying big buffer object to system\n");
> > > -	test_copy_sysmem(m, big, test);
> > > +	test_copy_sysmem(m, big, exec, test);
> > >  	if (xe->info.tile_count > 1) {
> > >  		kunit_info(test, "Copying big buffer object to other vram\n");
> > > -		test_copy_vram(m, big, test);
> > > +		test_copy_vram(m, big, exec, test);
> > >  	}
> > >  
> > >  out:
> > > @@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
> > >  
> > >  	for_each_tile(tile, xe, id) {
> > >  		struct xe_migrate *m = tile->migrate;
> > > +		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
> > >  
> > >  		kunit_info(test, "Testing tile id %d.\n", id);
> > >  		xe_vm_lock(m->q->vm, false);
> > > -		xe_migrate_sanity_test(m, test);
> > > +		xe_migrate_sanity_test(m, test, exec);
> > >  		xe_vm_unlock(m->q->vm);
> > >  	}
> > >  
> > > @@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
> > >  
> > >  static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> > >  			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
> > > -			 struct kunit *test)
> > > +			 struct drm_exec *exec, struct kunit *test)
> > >  {
> > >  	struct dma_fence *fence;
> > >  	u64 expected, retval;
> > > @@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> > >  	dma_fence_put(fence);
> > >  
> > >  	kunit_info(test, "Evict vram buffer object\n");
> > > -	ret = xe_bo_evict(vram_bo);
> > > +	ret = xe_bo_evict(vram_bo, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to evict bo.\n");
> > >  		return;
> > > @@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
> > >  	dma_fence_put(fence);
> > >  
> > >  	kunit_info(test, "Restore vram buffer object\n");
> > > -	ret = xe_bo_validate(vram_bo, NULL, false);
> > > +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
> > >  		return;
> > > @@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> > >  {
> > >  	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
> > >  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> > > +	struct drm_exec *exec;
> > >  	long ret;
> > >  
> > >  	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> > > @@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> > >  		return;
> > >  	}
> > >  
> > > +	exec = XE_VALIDATION_OPT_OUT;
> > >  	xe_bo_lock(sys_bo, false);
> > > -	ret = xe_bo_validate(sys_bo, NULL, false);
> > > +	ret = xe_bo_validate(sys_bo, NULL, false, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
> > >  		goto free_sysbo;
> > > @@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> > >  	}
> > >  
> > >  	xe_bo_lock(ccs_bo, false);
> > > -	ret = xe_bo_validate(ccs_bo, NULL, false);
> > > +	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
> > >  		goto free_ccsbo;
> > > @@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> > >  	}
> > >  
> > >  	xe_bo_lock(vram_bo, false);
> > > -	ret = xe_bo_validate(vram_bo, NULL, false);
> > > +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
> > >  	if (ret) {
> > >  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
> > >  		goto free_vrambo;
> > > @@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
> > >  	}
> > >  
> > >  	test_clear(xe, tile, sys_bo, vram_bo, test);
> > > -	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
> > > +	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
> > >  	xe_bo_unlock(vram_bo);
> > >  
> > >  	xe_bo_lock(vram_bo, false);
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 11eaf3b06766..e71addf51ed0 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
> > >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> > >  {
> > >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_bo *backup;
> > >  	int ret = 0;
> > >  
> > > @@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> > >  	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> > >  					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > >  					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > > -					XE_BO_FLAG_PINNED);
> > > +					XE_BO_FLAG_PINNED, exec);
> > >  	if (IS_ERR(backup)) {
> > >  		ret = PTR_ERR(backup);
> > >  		goto out_unlock_bo;
> > > @@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
> > >  int xe_bo_evict_pinned(struct xe_bo *bo)
> > >  {
> > >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_bo *backup = bo->backup_obj;
> > >  	bool backup_created = false;
> > >  	bool unmap = false;
> > > @@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
> > >  						NULL, xe_bo_size(bo),
> > >  						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > >  						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > > -						XE_BO_FLAG_PINNED);
> > > +						XE_BO_FLAG_PINNED, exec);
> > >  		if (IS_ERR(backup)) {
> > >  			ret = PTR_ERR(backup);
> > >  			goto out_unlock_bo;
> > > @@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	struct xe_device *xe = to_xe_device(ddev);
> > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > +	struct drm_exec *exec;
> > >  	vm_fault_t ret;
> > >  	int idx;
> > >  
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_get(xe);
> > >  
> > > +	exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> > >  	if (ret)
> > >  		goto out;
> > > @@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	if (drm_dev_enter(ddev, &idx)) {
> > >  		trace_xe_bo_cpu_fault(bo);
> > >  
> > > +		xe_validation_assert_exec(xe, exec, &tbo->base);
> > >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > >  					       TTM_BO_VM_NUM_PREFAULT);
> > >  		drm_dev_exit(idx);
> > > @@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
> > >  	kfree(bo);
> > >  }
> > >  
> > > +/**
> > > + * ___xe_bo_create_locked() - Initialize or create an xe_bo.
> > > + * @xe: The xe device.
> > > + * @bo: An already allocated buffer object or NULL
> > > + * if the function should allocate a new one.
> > > + * @tile: The tile to select for migration of this bo, and the tile used for
> > > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > > + * @resv: Pointer to a locked shared reservation object to use for this bo,
> > > + * or NULL for the xe_bo to use its own.
> > > + * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
> > > + * @size: The storage size to use for the bo.
> > > + * @cpu_caching: The cpu caching used for system memory backing store.
> > > + * @type: The TTM buffer object type.
> > > + * @flags: XE_BO_FLAG_ flags.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > > + *
> > > + * Initialize or create an xe buffer object. On failure, any allocated buffer
> > > + * object passed in @bo will have been unreferenced.
> > > + *
> > > + * Return: The buffer object on success. Negative error pointer on failure.
> > > + */
> > >  struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  				     struct xe_tile *tile, struct dma_resv *resv,
> > >  				     struct ttm_lru_bulk_move *bulk, size_t size,
> > >  				     u16 cpu_caching, enum ttm_bo_type type,
> > > -				     u32 flags)
> > > +				     u32 flags, struct drm_exec *exec)
> > >  {
> > >  	struct ttm_operation_ctx ctx = {
> > >  		.interruptible = true,
> > > @@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  		ctx.resv = resv;
> > >  	}
> > >  
> > > +	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
> > >  	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
> > >  		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
> > >  		if (WARN_ON(err)) {
> > > @@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
> > >  		      struct xe_tile *tile, struct xe_vm *vm,
> > >  		      size_t size, u64 start, u64 end,
> > >  		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
> > > -		      u64 alignment)
> > > +		      u64 alignment, struct drm_exec *exec)
> > >  {
> > >  	struct xe_bo *bo = NULL;
> > >  	int err;
> > > @@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
> > >  				    vm && !xe_vm_in_fault_mode(vm) &&
> > >  				    flags & XE_BO_FLAG_USER ?
> > >  				    &vm->lru_bulk_move : NULL, size,
> > > -				    cpu_caching, type, flags);
> > > +				    cpu_caching, type, flags, exec);
> > >  	if (IS_ERR(bo))
> > >  		return bo;
> > >  
> > > @@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
> > >  
> > >  			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
> > >  				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
> > > -							   start + xe_bo_size(bo), U64_MAX);
> > > +							   start + xe_bo_size(bo), U64_MAX,
> > > +							   exec);
> > >  			} else {
> > > -				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
> > > +				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
> > >  			}
> > >  			if (err)
> > >  				goto err_unlock_put_bo;
> > > @@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
> > >  	return ERR_PTR(err);
> > >  }
> > >  
> > > +/**
> > > + * xe_bo_create_locked_range() - Create a BO with range- and alignment options
> > > + * @xe: The xe device.
> > > + * @tile: The tile to select for migration of this bo, and the tile used for
> > > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > > + * @vm: The local vm or NULL for external objects.
> > > + * @size: The storage size to use for the bo.
> > > + * @start: Start of fixed VRAM range or 0.
> > > + * @end: End of fixed VRAM range or ~0ULL.
> > > + * @type: The TTM buffer object type.
> > > + * @flags: XE_BO_FLAG_ flags.
> > > + * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > > + *
> > > + * Create an Xe BO with range- and alignment options. If @start and @end indicate
> > > + * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
> > > + * only. The @alignment parameter can be used for GGTT alignment.
> > > + *
> > > + * Return: The buffer object on success. Negative error pointer on failure.
> > > + */
> > >  struct xe_bo *
> > >  xe_bo_create_locked_range(struct xe_device *xe,
> > >  			  struct xe_tile *tile, struct xe_vm *vm,
> > >  			  size_t size, u64 start, u64 end,
> > > -			  enum ttm_bo_type type, u32 flags, u64 alignment)
> > > +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> > > +			  struct drm_exec *exec)
> > >  {
> > >  	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
> > > -				     flags, alignment);
> > > +				     flags, alignment, exec);
> > >  }
> > >  
> > > +/**
> > > + * xe_bo_create_locked() - Create a BO
> > > + * @xe: The xe device.
> > > + * @tile: The tile to select for migration of this bo, and the tile used for
> > > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > > + * @vm: The local vm or NULL for external objects.
> > > + * @size: The storage size to use for the bo.
> > > + * @type: The TTM buffer object type.
> > > + * @flags: XE_BO_FLAG_ flags.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > > + *
> > > + * Create a locked xe BO with no range or alignment restrictions.
> > > + *
> > > + * Return: The buffer object on success. Negative error pointer on failure.
> > > + */
> > >  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> > >  				  struct xe_vm *vm, size_t size,
> > > -				  enum ttm_bo_type type, u32 flags)
> > > +				  enum ttm_bo_type type, u32 flags,
> > > +				  struct drm_exec *exec)
> > >  {
> > >  	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
> > > -				     flags, 0);
> > > +				     flags, 0, exec);
> > >  }
> > >  
> > >  struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> > > @@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> > >  				u16 cpu_caching,
> > >  				u32 flags)
> > >  {
> > > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> > >  						 cpu_caching, ttm_bo_type_device,
> > > -						 flags | XE_BO_FLAG_USER, 0);
> > > +						 flags | XE_BO_FLAG_USER, 0, exec);
> > >  	if (!IS_ERR(bo))
> > >  		xe_bo_unlock_vm_held(bo);
> > >  
> > > @@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> > >  			   struct xe_vm *vm, size_t size,
> > >  			   enum ttm_bo_type type, u32 flags)
> > >  {
> > > -	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
> > > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > > +	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
> > >  
> > >  	if (!IS_ERR(bo))
> > >  		xe_bo_unlock_vm_held(bo);
> > > @@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> > >  	int err;
> > >  	u64 start = offset == ~0ull ? 0 : offset;
> > >  	u64 end = offset == ~0ull ? offset : start + size;
> > > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > >  
> > >  	if (flags & XE_BO_FLAG_STOLEN &&
> > >  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> > > @@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> > >  
> > >  	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
> > >  				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
> > > -				       alignment);
> > > +				       alignment, exec);
> > >  	if (IS_ERR(bo))
> > >  		return bo;
> > >  
> > > -	err = xe_bo_pin(bo);
> > > +	err = xe_bo_pin(bo, exec);
> > >  	if (err)
> > >  		goto err_put;
> > >  
> > > @@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
> > >  /**
> > >   * xe_bo_pin_external - pin an external BO
> > >   * @bo: buffer object to be pinned
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
> > >   * BO. Unique call compared to xe_bo_pin as this function has it own set of
> > > @@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
> > >   *
> > >   * Returns 0 for success, negative error code otherwise.
> > >   */
> > > -int xe_bo_pin_external(struct xe_bo *bo)
> > > +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
> > >  {
> > >  	struct xe_device *xe = xe_bo_device(bo);
> > >  	int err;
> > > @@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
> > >  	xe_assert(xe, xe_bo_is_user(bo));
> > >  
> > >  	if (!xe_bo_is_pinned(bo)) {
> > > -		err = xe_bo_validate(bo, NULL, false);
> > > +		err = xe_bo_validate(bo, NULL, false, exec);
> > >  		if (err)
> > >  			return err;
> > >  
> > > @@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
> > >  	return 0;
> > >  }
> > >  
> > > -int xe_bo_pin(struct xe_bo *bo)
> > > +/**
> > > + * xe_bo_pin() - Pin a kernel bo after potentially migrating it
> > > + * @bo: The kernel bo to pin.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > > + *
> > > + * Attempts to migrate a bo to @bo->placement. If that succeeds,
> > > + * pins the bo.
> > > + *
> > > + * Return: %0 on success, negative error code on migration failure.
> > > + */
> > > +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
> > >  {
> > >  	struct ttm_place *place = &bo->placements[0];
> > >  	struct xe_device *xe = xe_bo_device(bo);
> > > @@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
> > >  	/* We only expect at most 1 pin */
> > >  	xe_assert(xe, !xe_bo_is_pinned(bo));
> > >  
> > > -	err = xe_bo_validate(bo, NULL, false);
> > > +	err = xe_bo_validate(bo, NULL, false, exec);
> > >  	if (err)
> > >  		return err;
> > >  
> > > @@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
> > >   *      NULL. Used together with @allow_res_evict.
> > >   * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
> > >   *                   reservation object.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * Make sure the bo is in allowed placement, migrating it if necessary. If
> > >   * needed, other bos will be evicted. If bos selected for eviction shares
> > > @@ -2461,7 +2541,8 @@ void xe_bo_unpin(struct xe_bo *bo)
> > >   * Return: 0 on success, negative error code on failure. May return
> > >   * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
> > >   */
> > > -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> > > +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> > > +		   struct drm_exec *exec)
> > >  {
> > >  	struct ttm_operation_ctx ctx = {
> > >  		.interruptible = true,
> > > @@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> > >  
> > >  	xe_vm_set_validating(vm, allow_res_evict);
> > >  	trace_xe_bo_validate(bo);
> > > +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
> > >  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
> > >  	xe_vm_clear_validating(vm, allow_res_evict);
> > >  
> > > @@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
> > >   * xe_bo_migrate - Migrate an object to the desired region id
> > >   * @bo: The buffer object to migrate.
> > >   * @mem_type: The TTM region type to migrate to.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * Attempt to migrate the buffer object to the desired memory region. The
> > >   * buffer object may not be pinned, and must be locked.
> > > @@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
> > >   * Return: 0 on success. Negative error code on failure. In particular may
> > >   * return -EINTR or -ERESTARTSYS if signal pending.
> > >   */
> > > -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> > > +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
> > >  {
> > >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > >  	struct ttm_operation_ctx ctx = {
> > > @@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> > >  		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
> > >  	}
> > >  
> > > +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
> > >  	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
> > >  }
> > >  
> > >  /**
> > >   * xe_bo_evict - Evict an object to evict placement
> > >   * @bo: The buffer object to migrate.
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * On successful completion, the object memory will be moved to evict
> > >   * placement. This function blocks until the object has been fully moved.
> > >   *
> > >   * Return: 0 on success. Negative error code on failure.
> > >   */
> > > -int xe_bo_evict(struct xe_bo *bo)
> > > +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
> > >  {
> > >  	struct ttm_operation_ctx ctx = {
> > >  		.interruptible = false,
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > index 8cce413b5235..b1b6cb622d71 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > @@ -10,6 +10,7 @@
> > >  
> > >  #include "xe_bo_types.h"
> > >  #include "xe_macros.h"
> > > +#include "xe_validation.h"
> > >  #include "xe_vm_types.h"
> > >  #include "xe_vm.h"
> > >  #include "xe_vram_types.h"
> > > @@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > >  				     struct xe_tile *tile, struct dma_resv *resv,
> > >  				     struct ttm_lru_bulk_move *bulk, size_t size,
> > >  				     u16 cpu_caching, enum ttm_bo_type type,
> > > -				     u32 flags);
> > > +				     u32 flags, struct drm_exec *exec);
> > >  struct xe_bo *
> > >  xe_bo_create_locked_range(struct xe_device *xe,
> > >  			  struct xe_tile *tile, struct xe_vm *vm,
> > >  			  size_t size, u64 start, u64 end,
> > > -			  enum ttm_bo_type type, u32 flags, u64 alignment);
> > > +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> > > +			  struct drm_exec *exec);
> > >  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> > >  				  struct xe_vm *vm, size_t size,
> > > -				  enum ttm_bo_type type, u32 flags);
> > > +				  enum ttm_bo_type type, u32 flags,
> > > +				  struct drm_exec *exec);
> > >  struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> > >  			   struct xe_vm *vm, size_t size,
> > >  			   enum ttm_bo_type type, u32 flags);
> > > @@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
> > >  	}
> > >  }
> > >  
> > > -int xe_bo_pin_external(struct xe_bo *bo);
> > > -int xe_bo_pin(struct xe_bo *bo);
> > > +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
> > > +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
> > >  void xe_bo_unpin_external(struct xe_bo *bo);
> > >  void xe_bo_unpin(struct xe_bo *bo);
> > > -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
> > > +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> > > +		   struct drm_exec *exec);
> > >  
> > >  static inline bool xe_bo_is_pinned(struct xe_bo *bo)
> > >  {
> > > @@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
> > >  
> > >  bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
> > >  
> > > -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
> > > -int xe_bo_evict(struct xe_bo *bo);
> > > +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
> > > +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
> > >  
> > >  int xe_bo_evict_pinned(struct xe_bo *bo);
> > >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
> > > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > index 346f857f3837..78a827d4e726 100644
> > > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > @@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> > >  	struct drm_gem_object *obj = attach->dmabuf->priv;
> > >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > >  	struct xe_device *xe = xe_bo_device(bo);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> > >  	int ret;
> > >  
> > >  	/*
> > > @@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> > >  		return -EINVAL;
> > >  	}
> > >  
> > > -	ret = xe_bo_migrate(bo, XE_PL_TT);
> > > +	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
> > >  	if (ret) {
> > >  		if (ret != -EINTR && ret != -ERESTARTSYS)
> > >  			drm_dbg(&xe->drm,
> > > @@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
> > >  		return ret;
> > >  	}
> > >  
> > > -	ret = xe_bo_pin_external(bo);
> > > +	ret = xe_bo_pin_external(bo, exec);
> > >  	xe_assert(xe, !ret);
> > >  
> > >  	return 0;
> > > @@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
> > >  	struct dma_buf *dma_buf = attach->dmabuf;
> > >  	struct drm_gem_object *obj = dma_buf->priv;
> > >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> > >  	struct sg_table *sgt;
> > >  	int r = 0;
> > >  
> > > @@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
> > >  
> > >  	if (!xe_bo_is_pinned(bo)) {
> > >  		if (!attach->peer2peer)
> > > -			r = xe_bo_migrate(bo, XE_PL_TT);
> > > +			r = xe_bo_migrate(bo, XE_PL_TT, exec);
> > >  		else
> > > -			r = xe_bo_validate(bo, NULL, false);
> > > +			r = xe_bo_validate(bo, NULL, false, exec);
> > >  		if (r)
> > >  			return ERR_PTR(r);
> > >  	}
> > > @@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
> > >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > >  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
> > >  		       direction == DMA_FROM_DEVICE);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  
> > >  	if (!reads)
> > >  		return 0;
> > >  
> > >  	/* Can we do interruptible lock here? */
> > >  	xe_bo_lock(bo, false);
> > > -	(void)xe_bo_migrate(bo, XE_PL_TT);
> > > +	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> > >  	xe_bo_unlock(bo);
> > >  
> > >  	return 0;
> > > @@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> > >  {
> > >  	struct dma_resv *resv = dma_buf->resv;
> > >  	struct xe_device *xe = to_xe_device(dev);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_bo *bo;
> > >  	int ret;
> > >  
> > >  	dma_resv_lock(resv, NULL);
> > >  	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > >  				    0, /* Will require 1way or 2way for vm_bind */
> > > -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
> > > +				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> > >  	if (IS_ERR(bo)) {
> > >  		ret = PTR_ERR(bo);
> > >  		goto error;
> > > @@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
> > >  {
> > >  	struct drm_gem_object *obj = attach->importer_priv;
> > >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
> > >  
> > > -	XE_WARN_ON(xe_bo_evict(bo));
> > > +	XE_WARN_ON(xe_bo_evict(bo, exec));
> > >  }
> > >  
> > >  static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
> > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > index 44364c042ad7..0bcb4fb9a10e 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > @@ -97,9 +97,13 @@
> > >  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
> > >  {
> > >  	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
> > > +	int ret;
> > >  
> > >  	/* The fence slot added here is intended for the exec sched job. */
> > > -	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> > > +	xe_vm_set_validation_exec(vm, &vm_exec->exec);
> > > +	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> > > +	xe_vm_set_validation_exec(vm, NULL);
> > > +	return ret;
> > >  }
> > >  
> > >  int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
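The conversion above pins down the invariant the rest of the series leans
on: the vm's validation exec pointer is only set for the duration of a
locked transaction and is always cleared on the same path. As a sketch of
that contract (scoped_validate_rebind() is an invented name, mirroring
xe_exec_fn() above, not part of the series):

	/* Sketch of the balanced set/clear contract. */
	static int scoped_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
					  unsigned int num_fences)
	{
		int ret;

		xe_vm_set_validation_exec(vm, exec);
		ret = xe_vm_validate_rebind(vm, exec, num_fences);
		xe_vm_set_validation_exec(vm, NULL);	/* Never leaks past the transaction. */

		return ret;
	}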
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > index e03222f5ac5a..a47c0131956b 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > @@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > >  }
> > >  
> > >  static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > > -				  u64 start, u64 end)
> > > +				  u64 start, u64 end, struct drm_exec *exec)
> > >  {
> > >  	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
> > >  	u8 tile_id = ggtt->tile->id;
> > > @@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > >  		return 0;
> > >  	}
> > >  
> > > -	err = xe_bo_validate(bo, NULL, false);
> > > +	err = xe_bo_validate(bo, NULL, false, exec);
> > >  	if (err)
> > >  		return err;
> > >  
> > > @@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > >   * @bo: the &xe_bo to be inserted
> > >   * @start: address where it will be inserted
> > >   * @end: end of the range where it will be inserted
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * Return: 0 on success or a negative error code on failure.
> > >   */
> > >  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > > -			 u64 start, u64 end)
> > > +			 u64 start, u64 end, struct drm_exec *exec)
> > >  {
> > > -	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
> > > +	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
> > >  }
> > >  
> > >  /**
> > >   * xe_ggtt_insert_bo - Insert BO into GGTT
> > >   * @ggtt: the &xe_ggtt where bo will be inserted
> > >   * @bo: the &xe_bo to be inserted
> > > + * @exec: The drm_exec transaction to use for exhaustive eviction.
> > >   *
> > >   * Return: 0 on success or a negative error code on failure.
> > >   */
> > > -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > > +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > > +		      struct drm_exec *exec)
> > >  {
> > > -	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> > > +	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
> > >  }
> > >  
> > >  /**
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> > > index fbe1e397d05d..75fc7a1efea7 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt.h
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> > > @@ -10,6 +10,7 @@
> > >  
> > >  struct drm_printer;
> > >  struct xe_tile;
> > > +struct drm_exec;
> > >  
> > >  struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
> > >  int xe_ggtt_init_early(struct xe_ggtt *ggtt);
> > > @@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
> > >  void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
> > >  		    struct xe_bo *bo, u16 pat_index);
> > >  void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
> > > -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> > > +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
> > >  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> > > -			 u64 start, u64 end);
> > > +			 u64 start, u64 end, struct drm_exec *exec);
> > >  void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> > >  u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > index ab43dec52776..2c7f10cc423f 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > @@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
> > >  		}
> > >  
> > >  		/* Migrate to VRAM, move should invalidate the VMA first */
> > > -		err = xe_bo_migrate(bo, vram->placement);
> > > +		err = xe_bo_migrate(bo, vram->placement, exec);
> > >  		if (err)
> > >  			return err;
> > >  	} else if (bo) {
> > >  		/* Create backing store if needed */
> > > -		err = xe_bo_validate(bo, vm, true);
> > > +		err = xe_bo_validate(bo, vm, true, exec);
> > >  		if (err)
> > >  			return err;
> > >  	}
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > index c8f0320d032f..906011671b60 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> > > @@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
> > >  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> > >  {
> > >  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  	struct xe_device *xe = gt_to_xe(gt);
> > >  	struct xe_tile *tile = gt_to_tile(gt);
> > >  	struct xe_bo *bo;
> > > @@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
> > >  				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > >  				 XE_BO_FLAG_NEEDS_2M |
> > >  				 XE_BO_FLAG_PINNED |
> > > -				 XE_BO_FLAG_PINNED_LATE_RESTORE);
> > > +				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> > > +				 exec);
> > >  	if (IS_ERR(bo))
> > >  		return PTR_ERR(bo);
> > >  
> > > -	err = xe_bo_pin(bo);
> > > +	err = xe_bo_pin(bo, exec);
> > >  	xe_bo_unlock(bo);
> > >  	if (unlikely(err)) {
> > >  		xe_bo_put(bo);
> > > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > > index e35c6d4def20..39e3aa6df25a 100644
> > > --- a/drivers/gpu/drm/xe/xe_svm.c
> > > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > > @@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> > >  	struct device *dev = xe->drm.dev;
> > >  	struct drm_buddy_block *block;
> > >  	struct list_head *blocks;
> > > +	struct drm_exec *exec;
> > >  	struct xe_bo *bo;
> > >  	ktime_t time_end = 0;
> > >  	int err, idx;
> > > @@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> > >  		return -ENODEV;
> > >  
> > >  	xe_pm_runtime_get(xe);
> > > +	exec = XE_VALIDATION_UNIMPLEMENTED;
> > >  
> > >   retry:
> > >  	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
> > >  				 ttm_bo_type_device,
> > >  				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> > > -				 XE_BO_FLAG_CPU_ADDR_MIRROR);
> > > +				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
> > >  	if (IS_ERR(bo)) {
> > >  		err = PTR_ERR(bo);
> > >  		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> > > new file mode 100644
> > > index 000000000000..cc0684d24e02
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > > @@ -0,0 +1,49 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +#include "xe_bo.h"
> > > +#include <drm/drm_exec.h>
> > > +#include <drm/drm_gem.h>
> > > +
> > > +#include "xe_assert.h"
> > > +#include "xe_validation.h"
> > > +
> > > +#ifdef CONFIG_DRM_XE_DEBUG
> > > +/**
> > > + * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
> > > + * for validation.
> > > + * @xe: Pointer to the xe device.
> > > + * @exec: The drm_exec pointer to check.
> > > + * @obj: Pointer to the object subject to validation.
> > > + *
> > > + * NULL exec pointers are not allowed.
> > > + * For XE_VALIDATION_UNIMPLEMENTED, no checking is done.
> > > + * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
> > > + * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
> > > + * validation is a dma-buf, for which support for ww locking is
> > > + * not in place in the dma-buf layer.
> > > + */
> > > +void xe_validation_assert_exec(const struct xe_device *xe,
> > > +			       const struct drm_exec *exec,
> > > +			       const struct drm_gem_object *obj)
> > > +{
> > > +	xe_assert(xe, exec);
> > > +	if (IS_ERR(exec)) {
> > > +		switch (PTR_ERR(exec)) {
> > > +		case __XE_VAL_UNIMPLEMENTED:
> > > +			break;
> > > +		case __XE_VAL_UNSUPPORTED:
> > > +			xe_assert(xe, !!obj->dma_buf);
> > > +			break;
> > > +#if IS_ENABLED(CONFIG_KUNIT)
> > > +		case __XE_VAL_OPT_OUT:
> > > +			xe_assert(xe, current->kunit_test);
> > > +			break;
> > > +#endif
> > > +		default:
> > > +			xe_assert(xe, false);
> > > +		}
> > > +	}
> > > +}
> > > +#endif
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > > new file mode 100644
> > > index 000000000000..db50feacad7a
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > > @@ -0,0 +1,69 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +#ifndef _XE_VALIDATION_H_
> > > +#define _XE_VALIDATION_H_
> > > +
> > > +#include <linux/dma-resv.h>
> > > +#include <linux/types.h>
> > > +
> > > +struct drm_exec;
> > > +struct drm_gem_object;
> > > +struct xe_device;
> > > +
> > > +#ifdef CONFIG_PROVE_LOCKING
> > > +/**
> > > + * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
> > > + * be initialized at this point.
> > > + */
> > > +static inline void xe_validation_lockdep(void)
> > > +{
> > > +	struct ww_acquire_ctx ticket;
> > > +
> > > +	ww_acquire_init(&ticket, &reservation_ww_class);
> > > +	ww_acquire_fini(&ticket);
> > > +}
> > > +#else
> > > +static inline void xe_validation_lockdep(void)
> > > +{
> > > +}
> > > +#endif
> > > +
> > > +/*
> > > + * Various values of the drm_exec pointer where we've not (yet)
> > > + * implemented full ww locking.
> > > + *
> > > + * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
> > > + * A lockdep check is made to ensure that a drm_exec locking
> > > + * transaction can actually take place where the macro is
> > > + * used. If this asserts, the exec pointer needs to be assigned
> > > + * higher up in the callchain and passed down.
> > > + *
> > > + * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
> > > + * the dma-buf layer doesn't support WW locking.
> > > + *
> > > + * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
> > > + * exhaustive eviction isn't necessary.
> > > + */
> > > +#define __XE_VAL_UNIMPLEMENTED -EINVAL
> > > +#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
> > > +				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
> > > +
> > > +#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
> > > +#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
> > > +
> > > +#define __XE_VAL_OPT_OUT -ENOMEM
> > > +#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
> > > +			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
> > > +#ifdef CONFIG_DRM_XE_DEBUG
> > > +void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
> > > +			       const struct drm_gem_object *obj);
> > > +#else
> > > +#define xe_validation_assert_exec(_xe, _exec, _obj)	\
> > > +	do {						\
> > > +		(void)_xe; (void)_exec; (void)_obj;	\
> > > +	} while (0)
> > > +#endif
> > > +
> > > +#endif
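To see how the pieces above fit together at a not-yet-converted call site,
a sketch (illustrative only, mirroring the xe_bo.c hunks earlier in this
patch):

	static int not_yet_converted_path(struct xe_bo *bo)
	{
		/*
		 * Expands to the lockdep probe above (the
		 * ww_acquire_init()/fini() pair on reservation_ww_class)
		 * plus ERR_PTR(-EINVAL), so lockdep verifies that a
		 * drm_exec transaction could legally start here.
		 */
		struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;

		/*
		 * On CONFIG_DRM_XE_DEBUG builds, xe_bo_validate() feeds
		 * this into xe_validation_assert_exec(), which accepts the
		 * sentinel; a NULL pointer or an unknown error pointer
		 * would assert.
		 */
		return xe_bo_validate(bo, NULL, false, exec);
	}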
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 12e661960244..600aaadb4bee 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
> > >  		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
> > >  			       &vm->rebind_list);
> > >  
> > > -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> > > +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > @@ -451,6 +451,7 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
> > >  	if (err)
> > >  		return err;
> > >  
> > > +	xe_vm_set_validation_exec(vm, exec);
> > >  	if (xe_vm_is_idle(vm)) {
> > >  		vm->preempt.rebind_deactivated = true;
> > >  		*done = true;
> > > @@ -516,6 +517,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> > >  		err = xe_preempt_work_begin(&exec, vm, &done);
> > >  		drm_exec_retry_on_contention(&exec);
> > >  		if (err || done) {
> > > +			xe_vm_set_validation_exec(vm, NULL);
> > >  			drm_exec_fini(&exec);
> > >  			if (err && xe_vm_validate_should_retry(&exec, err, &end))
> > >  				err = -EAGAIN;
> > > @@ -565,6 +567,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
> > >  	up_read(&vm->userptr.notifier_lock);
> > >  
> > >  out_unlock:
> > > +	xe_vm_set_validation_exec(vm, NULL);
> > >  	drm_exec_fini(&exec);
> > >  out_unlock_outer:
> > >  	if (err == -EAGAIN) {
> > > @@ -1375,6 +1378,8 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
> > >  	err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
> > >  	if (!err && bo && !bo->vm)
> > >  		err = drm_exec_lock_obj(exec, &bo->ttm.base);
> > > +	if (!err)
> > > +		xe_vm_set_validation_exec(vm, exec);
> > 
> > 
> > Do you have an imbalance here? I see this function called in xe_pf_begin
> > and xe_vma_destroy_unlocked but I don't see
> > xe_vm_set_validation_exec(vm, NULL) called.
> > 
> > 
> > >  
> > >  	return err;
> > >  }
> > > @@ -2889,7 +2894,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
> > >  			err = drm_exec_lock_obj(exec, &bo->ttm.base);
> > >  		if (!err && validate)
> > >  			err = xe_bo_validate(bo, vm,
> > > -					     !xe_vm_in_preempt_fence_mode(vm));
> > > +					     !xe_vm_in_preempt_fence_mode(vm), exec);
> > >  	}
> > >  
> > >  	return err;
> > > @@ -3012,7 +3017,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
> > >  					    false);
> > >  		if (!err && !xe_vma_has_no_bo(vma))
> > >  			err = xe_bo_migrate(xe_vma_bo(vma),
> > > -					    region_to_mem_type[region]);
> > > +					    region_to_mem_type[region],
> > > +					    exec);
> > >  		break;
> > >  	}
> > >  	default:
> > > @@ -3052,6 +3058,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > >  	if (err)
> > >  		return err;
> > >  
> > > +	xe_vm_set_validation_exec(vm, exec);
> > >  	list_for_each_entry(op, &vops->list, link) {
> > >  		err = op_lock_and_prep(exec, vm, op);
> > >  		if (err)
> > > @@ -3850,10 +3857,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
> > >   */
> > >  int xe_vm_lock(struct xe_vm *vm, bool intr)
> > >  {
> > > +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > > +	int ret;
> > > +
> > >  	if (intr)
> > > -		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> > > +		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> > > +	else
> > > +		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> > > +
> > > +	if (!ret)
> > > +		xe_vm_set_validation_exec(vm, exec);
> > >  
> > > -	return dma_resv_lock(xe_vm_resv(vm), NULL);
> > > +	return ret;
> > >  }
> > >  
> > >  /**
> > > @@ -3864,6 +3879,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> > >   */
> > >  void xe_vm_unlock(struct xe_vm *vm)
> > >  {
> > > +	xe_vm_set_validation_exec(vm, NULL);
> > >  	dma_resv_unlock(xe_vm_resv(vm));
> > >  }
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> > > index 2ecb417c19a2..4ba26eed7e96 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm.h
> > > @@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
> > >  	if (vm && !allow_res_evict) {
> > >  		xe_vm_assert_held(vm);
> > >  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> > > -		WRITE_ONCE(vm->validating, current);
> > > +		WRITE_ONCE(vm->validation.validating, current);
> > >  	}
> > >  }
> > >  
> > > @@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
> > >  {
> > >  	if (vm && !allow_res_evict) {
> > >  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> > > -		WRITE_ONCE(vm->validating, NULL);
> > > +		WRITE_ONCE(vm->validation.validating, NULL);
> > >  	}
> > >  }
> > >  
> > > @@ -357,13 +357,40 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
> > >  static inline bool xe_vm_is_validating(struct xe_vm *vm)
> > >  {
> > >  	/* Pairs with WRITE_ONCE in xe_vm_is_validating() */
> > > -	if (READ_ONCE(vm->validating) == current) {
> > > +	if (READ_ONCE(vm->validation.validating) == current) {
> > >  		xe_vm_assert_held(vm);
> > >  		return true;
> > >  	}
> > >  	return false;
> > >  }
> > >  
> > > +/**
> > > + * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
> > > + * @vm: The vm we want to register a drm_exec object with.
> > > + * @exec: The exec object we want to register.
> > > + *
> > > + * Set the drm_exec object used to lock the vm's resv.
> > > + */
> > > +static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
> > > +{
> > > +	xe_vm_assert_held(vm);
> > > +	vm->validation._exec = exec;
> > > +}
> > > +
> > > +/**
> > > + * xe_vm_validation_exec() - Accessor to read the drm_exec object
> > > + * @vm: The vm to read the drm_exec object from.
> > > + *
> > > + * Return: The drm_exec object used to lock the vm's resv. The value
> > > + * is a valid pointer, %NULL, or one of the special values defined in
> > > + * xe_validation.h.
> > > + */
> > > +static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
> > > +{
> > > +	xe_vm_assert_held(vm);
> > > +	return vm->validation._exec;
> > > +}
> > > +
> > >  /**
> > >   * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
> > >   * a valid GPU mapping
> > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > > index 8a07feef503b..2f88808e36bb 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > @@ -312,19 +312,35 @@ struct xe_vm {
> > >  		bool capture_once;
> > >  	} error_capture;
> > >  
> > > +	/**
> > > +	 * @validation: Validation data only valid with the vm resv held.
> > > +	 * Note: This is really task state of the task holding the vm resv,
> > > +	 * and moving forward we should
> > > +	 * come up with a better way of passing this down the call-
> > > +	 * chain.
> > 
> > 
> > I've already mentioned this: attaching the _exec to xe_vma_ops might be a
> > good option, as xe_vma_ops only exists for the duration of the bind
> > (i.e., it is a stack variable), so you'd only need to set it (i.e., no
> > clear required).
> > 
> > I think patch largely makes sense.
> > 
> > Matt 
> > 
> > 
> > > +	 */
> > > +	struct {
> > > +		/**
> > > +		 * @validation.validating: The task that is currently making bos resident
> > > +		 * for this vm.
> > > +		 * Protected by the VM's resv for writing. Opportunistic reading can be done
> > > +		 * using READ_ONCE. Note: This is a workaround for the
> > > +		 * TTM eviction_valuable() callback not being passed a struct
> > > +		 * ttm_operation_context(). Future work might want to address this.
> > > +		 */
> > > +		struct task_struct *validating;
> > > +		/**
> > > +		 *  @validation._exec: The drm_exec context used when locking the vm resv.
> > > +		 *  Protected by the vm's resv.
> > > +		 */
> > > +		struct drm_exec *_exec;
> > > +	} validation;
> > > +
> > >  	/**
> > >  	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
> > >  	 * protected by the vm resv.
> > >  	 */
> > >  	u64 tlb_flush_seqno;
> > > -	/**
> > > -	 * @validating: The task that is currently making bos resident for this vm.
> > > -	 * Protected by the VM's resv for writing. Opportunistic reading can be done
> > > -	 * using READ_ONCE. Note: This is a workaround for the
> > > -	 * TTM eviction_valuable() callback not being passed a struct
> > > -	 * ttm_operation_context(). Future work might want to address this.
> > > -	 */
> > > -	struct task_struct *validating;
> > >  	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
> > >  	bool batch_invalidate_tlb;
> > >  	/** @xef: XE file handle for tracking this VM's drm client */
> > > -- 
> > > 2.50.1
> > > 
> >
> 
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
  2025-08-13 12:13   ` Matthew Auld
@ 2025-08-14 20:30   ` Matthew Brost
  2025-08-15 15:29     ` Thomas Hellström
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14 20:30 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:21PM +0200, Thomas Hellström wrote:
> Pinned suspend eviction and preparation for eviction validate
> system memory for eviction buffers. Do that under a
> validation exclusive lock to avoid interfering with other
> processes validating system graphics memory.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++------------------
>  1 file changed, 108 insertions(+), 97 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 82bf158426ad..efb9c88b6aa7 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_bo *backup;
>  	int ret = 0;
>  
> -	xe_bo_lock(bo, false);
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
> +		xe_assert(xe, !bo->backup_obj);
>  
> -	xe_assert(xe, !bo->backup_obj);
> +		/*
> +		 * Since this is called from the PM notifier we might have raced with
> +		 * someone unpinning this after we dropped the pinned list lock and
> +		 * grabbing the above bo lock.
> +		 */
> +		if (!xe_bo_is_pinned(bo))
> +			break;
>  
> -	/*
> -	 * Since this is called from the PM notifier we might have raced with
> -	 * someone unpinning this after we dropped the pinned list lock and
> -	 * grabbing the above bo lock.
> -	 */
> -	if (!xe_bo_is_pinned(bo))
> -		goto out_unlock_bo;
> +		if (!xe_bo_is_vram(bo))
> +			break;
>  
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
>  
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> +		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> +					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +					   XE_BO_FLAG_PINNED, &exec);
> +		if (IS_ERR(backup)) {
> +			drm_exec_retry_on_contention(&exec);
> +			ret = PTR_ERR(backup);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			break;
> +		}
>  
> -	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				   XE_BO_FLAG_PINNED, exec);
> -	if (IS_ERR(backup)) {
> -		ret = PTR_ERR(backup);
> -		goto out_unlock_bo;
> +		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +		ttm_bo_pin(&backup->ttm);
> +		bo->backup_obj = backup;
>  	}
>  
> -	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -	ttm_bo_pin(&backup->ttm);
> -	bo->backup_obj = backup;
> -
> -out_unlock_bo:
> -	xe_bo_unlock(bo);
>  	return ret;
>  }
>  
> @@ -1215,99 +1219,106 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
>  int xe_bo_evict_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_bo *backup = bo->backup_obj;
>  	bool backup_created = false;
>  	bool unmap = false;
>  	int ret = 0;
>  
> -	xe_bo_lock(bo, false);
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
>  
> -	if (WARN_ON(!bo->ttm.resource)) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> +		if (WARN_ON(!bo->ttm.resource)) {
> +			ret = -EINVAL;
> +			break;
> +		}
>  
> -	if (WARN_ON(!xe_bo_is_pinned(bo))) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> +		if (WARN_ON(!xe_bo_is_pinned(bo))) {
> +			ret = -EINVAL;
> +			break;
> +		}
>  
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> +		if (!xe_bo_is_vram(bo))
> +			break;
>  
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
>  
> -	if (!backup) {
> -		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					   XE_BO_FLAG_PINNED, exec);
> -		if (IS_ERR(backup)) {
> -			ret = PTR_ERR(backup);
> -			goto out_unlock_bo;
> +		if (!backup) {
> +			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
> +						   xe_bo_size(bo),
> +						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +						   XE_BO_FLAG_PINNED, &exec);
> +			if (IS_ERR(backup)) {
> +				drm_exec_retry_on_contention(&exec);
> +				ret = PTR_ERR(backup);
> +				xe_validation_retry_on_oom(&ctx, &ret);
> +				break;
> +			}
> +			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +			backup_created = true;
>  		}
> -		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -		backup_created = true;
> -	}
>  
> -	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> -		struct xe_migrate *migrate;
> -		struct dma_fence *fence;
> -
> -		if (bo->tile)
> -			migrate = bo->tile->migrate;
> -		else
> -			migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> +		if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> +			struct xe_migrate *migrate;
> +			struct dma_fence *fence;
>  
> -		ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> -		if (ret)
> -			goto out_backup;
> +			if (bo->tile)
> +				migrate = bo->tile->migrate;
> +			else
> +				migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
>  
> -		ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> -		if (ret)
> -			goto out_backup;
> +			ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> +			if (ret)
> +				goto out_backup;
>  
> -		fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> -					backup->ttm.resource, false);
> -		if (IS_ERR(fence)) {
> -			ret = PTR_ERR(fence);
> -			goto out_backup;
> -		}
> +			ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> +			if (ret)
> +				goto out_backup;
>  
> -		dma_resv_add_fence(bo->ttm.base.resv, fence,
> -				   DMA_RESV_USAGE_KERNEL);
> -		dma_resv_add_fence(backup->ttm.base.resv, fence,
> -				   DMA_RESV_USAGE_KERNEL);
> -		dma_fence_put(fence);
> -	} else {
> -		ret = xe_bo_vmap(backup);
> -		if (ret)
> -			goto out_backup;
> +			fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> +						backup->ttm.resource, false);
> +			if (IS_ERR(fence)) {
> +				ret = PTR_ERR(fence);
> +				goto out_backup;
> +			}
>  
> -		if (iosys_map_is_null(&bo->vmap)) {
> -			ret = xe_bo_vmap(bo);
> +			dma_resv_add_fence(bo->ttm.base.resv, fence,
> +					   DMA_RESV_USAGE_KERNEL);
> +			dma_resv_add_fence(backup->ttm.base.resv, fence,
> +					   DMA_RESV_USAGE_KERNEL);
> +			dma_fence_put(fence);
> +		} else {
> +			ret = xe_bo_vmap(backup);
>  			if (ret)
>  				goto out_backup;
> -			unmap = true;
> -		}
>  
> -		xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> -				   xe_bo_size(bo));
> -	}
> +			if (iosys_map_is_null(&bo->vmap)) {
> +				ret = xe_bo_vmap(bo);
> +				if (ret)
> +					goto out_vunmap;
> +				unmap = true;
> +			}
>  
> -	if (!bo->backup_obj)
> -		bo->backup_obj = backup;
> +			xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> +					   xe_bo_size(bo));
> +		}
>  
> +		if (!bo->backup_obj)
> +			bo->backup_obj = backup;
> +out_vunmap:

I just want to confirm that this is safe. The cleanup.h documentation
discourages the use of goto because of scoping issues. I assume that
since this label is within the scope of the guard, it is fine.

It might be worth adding a quick note in the validation guard’s
kernel-doc mentioning that goto can be dangerous, explaining what is
allowed, and perhaps referencing the cleanup.h documentation. I could
see this being something developers might trip over.
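For example (made-up helpers, just to illustrate the two cases):

	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
		ret = lock_and_validate(&exec);	/* hypothetical helper */
		if (ret == -EAGAIN)
			goto out_local;		/* label inside the guard: fine */
		if (ret)
			goto out_global;	/* label outside the guard: discouraged */
out_local:
		finish_validation(&exec);	/* hypothetical helper */
	}
out_global:
	return ret;

As far as I understand, the compiler still runs the auto-cleanup when a
goto leaves the scope, but cleanup.h discourages mixing the two styles
since the resulting ordering is easy to misread.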

Patch LGTM, though.

Matt

> +		xe_bo_vunmap(backup);
>  out_backup:
> -	xe_bo_vunmap(backup);
> -	if (ret && backup_created)
> -		xe_bo_put(backup);
> -out_unlock_bo:
> -	if (unmap)
> -		xe_bo_vunmap(bo);
> -	xe_bo_unlock(bo);
> +		if (ret && backup_created)
> +			xe_bo_put(backup);
> +		if (unmap)
> +			xe_bo_vunmap(bo);
> +	}
> +
>  	return ret;
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
  2025-08-13 21:37   ` Matthew Brost
@ 2025-08-14 20:37   ` Matthew Brost
  2025-08-15  6:57     ` Thomas Hellström
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-14 20:37 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 13, 2025 at 12:51:17PM +0200, Thomas Hellström wrote:
> Convert dma-buf migration to XE_PL_TT and dma-buf import to
> support exhaustive eviction, using xe_validation_guard().
> It seems unlikely that the import would result in an -ENOMEM,
> but convert import anyway for completeness.
> 
> The dma-buf map_attachment() functionality unfortunately doesn't
> support passing a drm_exec, which means that foreign devices
> validating a dma-buf that we exported will not, unless they are
> xeKMD devices, participate in the exhaustive eviction scheme.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_dma_buf.c | 59 +++++++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 78a827d4e726..56df1d84df21 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -163,16 +163,27 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
>  		       direction == DMA_FROM_DEVICE);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	int ret = 0;
>  
>  	if (!reads)
>  		return 0;
>  
>  	/* Can we do interruptible lock here? */
> -	xe_bo_lock(bo, false);
> -	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> -	xe_bo_unlock(bo);
> -
> +	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			goto out;

Does this work? The label is out of scope, which I believe is a no-no
with guards if I correctly understand the rules detailed in the
cleanup.h kernel documentation.

> +
> +		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &ret);
> +	}
> +out:
> +	/* If we failed, cpu-access takes place in current placement. */
> +	(void)ret;
>  	return 0;
>  }
>  
> @@ -211,24 +222,38 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  {
>  	struct dma_resv *resv = dma_buf->resv;
>  	struct xe_device *xe = to_xe_device(dev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_gem_object *dummy_obj;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	int ret;
> -
> -	dma_resv_lock(resv, NULL);
> -	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> -				    0, /* Will require 1way or 2way for vm_bind */
> -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> -	if (IS_ERR(bo)) {
> -		ret = PTR_ERR(bo);
> -		goto error;
> +	int ret = 0;
> +
> +	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> +	if (!dummy_obj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dummy_obj->resv = resv;
> +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
> +		ret = drm_exec_lock_obj(&exec, dummy_obj);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			goto error;
> +
> +		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> +					    0, /* Will require 1way or 2way for vm_bind */
> +					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			goto error;

Same question / issue here with goto error label.

Matt

> +		}
>  	}
> -	dma_resv_unlock(resv);
> +	drm_gem_object_put(dummy_obj);
>  
>  	return &bo->ttm.base;
>  
>  error:
> -	dma_resv_unlock(resv);
>  	return ERR_PTR(ret);
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-14 20:37   ` Matthew Brost
@ 2025-08-15  6:57     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15  6:57 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Thu, 2025-08-14 at 13:37 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:17PM +0200, Thomas Hellström wrote:
> > Convert dma-buf migration to XE_PL_TT and dma-buf import to
> > support exhaustive eviction, using xe_validation_guard().
> > It seems unlikely that the import would result in an -ENOMEM,
> > but convert import anyway for completeness.
> > 
> > The dma-buf map_attachment() functionality unfortunately doesn't
> > support passing a drm_exec, which means that foreign devices
> > validating a dma-buf that we exported will not, unless they are
> > xeKMD devices, participate in the exhaustive eviction scheme.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_dma_buf.c | 59 +++++++++++++++++++++++----------
> >  1 file changed, 42 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > index 78a827d4e726..56df1d84df21 100644
> > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > @@ -163,16 +163,27 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> >  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
> >  		       direction == DMA_FROM_DEVICE);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> > +	int ret = 0;
> >  
> >  	if (!reads)
> >  		return 0;
> >  
> >  	/* Can we do interruptible lock here? */
> > -	xe_bo_lock(bo, false);
> > -	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> > -	xe_bo_unlock(bo);
> > -
> > +	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (ret)
> > +			goto out;
> 
> Does this work? The label is out of scope, which I believe is a no-no
> with guards if I correctly understand the rules detailed in the
> cleanup.h kernel documentation.

Good point.

My impression is that it would work in this particular case, since we
just exit a scope.

But since it obviously goes against the guidelines in cleanup.h we
shouldn't be using these; I'll replace them with a break and an error
check after the guard where needed.

I'll fix this up and look at other instances of this too.
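
Concretely, for the hunk above I'm thinking of something like this
sketch (no error check needed after the guard here, since we ignore
the error anyway):

	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, 0, ret, false) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;		/* leaves the guard; cleanup still runs */

		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &ret);
	}
	/* If we failed, cpu-access takes place in current placement. */
	return 0;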

Thanks,
Thomas


> 
> > +
> > +		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_validation_retry_on_oom(&ctx, &ret);
> > +	}
> > +out:
> > +	/* If we failed, cpu-access takes place in current placement. */
> > +	(void)ret;
> >  	return 0;
> >  }
> >  
> > @@ -211,24 +222,38 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> >  {
> >  	struct dma_resv *resv = dma_buf->resv;
> >  	struct xe_device *xe = to_xe_device(dev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_gem_object *dummy_obj;
> > +	struct drm_exec exec;
> >  	struct xe_bo *bo;
> > -	int ret;
> > -
> > -	dma_resv_lock(resv, NULL);
> > -	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > -				    0, /* Will require 1way or 2way for vm_bind */
> > -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> > -	if (IS_ERR(bo)) {
> > -		ret = PTR_ERR(bo);
> > -		goto error;
> > +	int ret = 0;
> > +
> > +	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> > +	if (!dummy_obj)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	dummy_obj->resv = resv;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false) {
> > +		ret = drm_exec_lock_obj(&exec, dummy_obj);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (ret)
> > +			goto error;
> > +
> > +		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > +					    0, /* Will require 1way or 2way for vm_bind */
> > +					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			ret = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +			goto error;
> 
> Same question / issue here with goto error label.
> 
> Matt
> 
> > +		}
> >  	}
> > -	dma_resv_unlock(resv);
> > +	drm_gem_object_put(dummy_obj);
> >  
> >  	return &bo->ttm.base;
> >  
> >  error:
> > -	dma_resv_unlock(resv);
> >  	return ERR_PTR(ret);
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-14 18:48   ` Matthew Brost
@ 2025-08-15  9:37     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15  9:37 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Thu, 2025-08-14 at 11:48 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> > Most users of xe_bo_create_pin_map_at() and
> > xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> > and that simplifies conversion. Introduce an
> > xe_bo_create_pin_map_at_novm() function and make the _aligned()
> > version static. Use xe_validation_guard() for conversion.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
> >  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
> >  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
> >  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-----
> >  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
> >  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
> >  6 files changed, 101 insertions(+), 74 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > index 1ce1e9da975b..ab48635ddffa 100644
> > --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > @@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  						       u32 size, u32 align,
> >  						       u32 start, u32 end)
> >  {
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo;
> > -	int err;
> >  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> >  
> >  	if (start < SZ_4K)
> > @@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  		start = ALIGN(start, align);
> >  	}
> >  
> > -	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> > -				       NULL, size, start, end,
> > -				       ttm_bo_type_kernel, flags, 0, exec);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		bo = NULL;
> > -		return err;
> > -	}
> > -	err = xe_bo_pin(bo, exec);
> > -	xe_bo_unlock_vm_held(bo);
> > -
> > -	if (err) {
> > -		xe_bo_put(fb->bo);
> > -		bo = NULL;
> > -	}
> > +	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
> > +					  size, start, ttm_bo_type_kernel, flags,
> > +					  false, 0, true);
> > +	if (IS_ERR(bo))
> > +		return PTR_ERR(bo);
> >  
> >  	fb->bo = bo;
> >  
> > -	return err;
> > +	return 0;
> >  }
> >  
> >  static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index 43c45344ea26..d46ff7ebb0a1 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> >  				 XE_PAGE_SIZE);
> >  
> >  	if (IS_DGFX(xe))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> > -						      dpt_size, ~0ull,
> > -						      ttm_bo_type_kernel,
> > -						      XE_BO_FLAG_VRAM0 |
> > -						      XE_BO_FLAG_GGTT |
> > -						      XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, ~0ull,
> > +						   ttm_bo_type_kernel,
> > +						   true,
> > +						   XE_BO_FLAG_VRAM0 |
> > +						   XE_BO_FLAG_GGTT |
> > +						   XE_BO_FLAG_PAGETABLE,
> > +						   alignment, false);
> >  	else
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> > -						      dpt_size, ~0ull,
> > -						      ttm_bo_type_kernel,
> > -						      XE_BO_FLAG_STOLEN |
> > -						      XE_BO_FLAG_GGTT |
> > -						      XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, ~0ull,
> > +						   ttm_bo_type_kernel,
> > +						   true,
> > +						   XE_BO_FLAG_STOLEN |
> > +						   XE_BO_FLAG_GGTT |
> > +						   XE_BO_FLAG_PAGETABLE,
> > +						   alignment, false);
> >  	if (IS_ERR(dpt))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> > -						      dpt_size, ~0ull,
> > -						      ttm_bo_type_kernel,
> > -						      XE_BO_FLAG_SYSTEM |
> > -						      XE_BO_FLAG_GGTT |
> > -						      XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, ~0ull,
> > +						   ttm_bo_type_kernel,
> > +						   true,
> > +						   XE_BO_FLAG_SYSTEM |
> > +						   XE_BO_FLAG_GGTT |
> > +						   XE_BO_FLAG_PAGETABLE,
> > +						   alignment, false);
> >  	if (IS_ERR(dpt))
> >  		return PTR_ERR(dpt);
> >  
> > diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > index 826ac3d578b7..79d00127caf4 100644
> > --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
> >  			page_size);
> >  	size -= base;
> >  
> > -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
> > -				     ttm_bo_type_kernel, flags);
> > +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
> > +					  ttm_bo_type_kernel, flags, true, 0, false);
> >  	if (IS_ERR(bo)) {
> >  		drm_dbg(&xe->drm,
> >  			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 23b28eeef59f..c9928d4ee5a0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
> >  	return bo;
> >  }
> >  
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> > -				      struct xe_vm *vm,
> > -				      size_t size, u64 offset,
> > -				      enum ttm_bo_type type, u32 flags)
> > -{
> > -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
> > -					       type, flags, 0);
> > -}
> > -
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> > -					      struct xe_tile *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64 offset,
> > -					      enum ttm_bo_type type, u32 flags,
> > -					      u64 alignment)
> > +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> > +						     struct xe_tile *tile,
> > +						     struct xe_vm *vm,
> > +						     size_t size, u64 offset,
> > +						     enum ttm_bo_type type, u32 flags,
> > +						     bool vmap, u64 alignment,
> > +						     struct drm_exec *exec)
> >  {
> >  	struct xe_bo *bo;
> >  	int err;
> >  	u64 start = offset == ~0ull ? 0 : offset;
> >  	u64 end = offset == ~0ull ? offset : start + size;
> > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> >  
> 
> General comment for the series: should all BO-layer functions that
> allocate (or may allocate) memory include a lockdep assertion that
> the
> xe_validation_device->val lock is held? We already have
> xe_validation_assert_exec in several places, which is similar, but
> IMO
> it wouldn’t hurt to also assert xe_validation_device->val in the
> relevant driver paths. The new TTM manager functions are good
> candidates
> as well. Consider adding a follow-up patch at the end of the series
> to
> add these assertions once all allocation paths adhere to the new
> locking
> model.

Good idea, although the val lock may go away when TTM matures, we can
then perhaps replace that with just a lockdep map.
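
E.g. something like this (helper name invented), which would keep
working unchanged if the rwsem is later reduced to a bare lockdep map:

	/* Assert that the validation domain lock is held by the caller. */
	static inline void xe_validation_assert_held(struct xe_validation_device *val)
	{
		lockdep_assert_held(&val->lock);
	}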

/Thomas



> 
> Matt
> 
> > -	if (flags & XE_BO_FLAG_STOLEN &&
> > +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
> >  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> >  		flags |= XE_BO_FLAG_GGTT;
> >  
> > @@ -2289,9 +2280,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	if (err)
> >  		goto err_put;
> >  
> > -	err = xe_bo_vmap(bo);
> > -	if (err)
> > -		goto err_unpin;
> > +	if (vmap) {
> > +		err = xe_bo_vmap(bo);
> > +		if (err)
> > +			goto err_unpin;
> > +	}
> >  
> >  	xe_bo_unlock_vm_held(bo);
> >  
> > @@ -2305,11 +2298,59 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	return ERR_PTR(err);
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> > + * @size: The storage size to use for the bo.
> > + * @offset: Optional VRAM offset or %0 for don't care.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @vmap: Whether to create a buffer object map.
> > + * @alignment: GGTT alignment.
> > + * @intr: Whether to execute any waits for backing store interruptible.
> > + *
> > + * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
> > + * options. The bo will be external and not associated with a VM.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> > + * to true on entry.
> > + */
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> > +			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
> > +			     bool vmap, u64 alignment, bool intr)
> > +{
> > +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> > +	struct xe_bo *bo;
> > +	int ret = 0;
> > +
> > +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags, ret, false) {
> > +		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
> > +						     type, flags, vmap,
> > +						     alignment, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			ret = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +		}
> > +	}
> > +
> > +	return ret ? ERR_PTR(ret) : bo;
> > +}
> > +
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32 flags)
> >  {
> > -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> > +
> > +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> > +					       true, 0, exec);
> >  }
> >  
> >  static void __xe_bo_unpin_map_no_vm(void *arg)
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > index a625806deeb6..d06266af9662 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32 flags);
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> > -				      struct xe_vm *vm, size_t size, u64 offset,
> > -				      enum ttm_bo_type type, u32 flags);
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> > -					      struct xe_tile *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64 offset,
> > -					      enum ttm_bo_type type, u32 flags,
> > -					      u64 alignment);
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> > +			     size_t size, u64 offset, enum ttm_bo_type type,
> > +			     u32 flags, bool vmap, u64 alignment, bool intr);
> >  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
> >  					   size_t size, u32 flags);
> >  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> > index fdd514fec5ef..afabfc125488 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
> >  
> >  	size = stream->per_xecore_buf_size * last_xecore;
> >  
> > -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> > -					     size, ~0ull, ttm_bo_type_kernel,
> > -					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
> > +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, true,
> > +					  SZ_64, false);
> >  	if (IS_ERR(bo)) {
> >  		kfree(stream->xecore_buf);
> >  		return PTR_ERR(bo);
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-13 17:25   ` Matthew Brost
@ 2025-08-15 15:04     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:04 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 10:25 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> > Introduce a validation wrapper xe_validation_guard() as a helper
> > intended to be used around drm_exec transactions that perform
> > validations. Once TTM can handle exhaustive eviction we could
> > remove this wrapper or make it mostly a NO-OP unless other
> > functionality is added to it.
> > 
> > Currently the wrapper takes a read lock upon entry and if the
> > transaction hits an OOM, all locks are released and the
> > transaction is retried with a write-lock. If all other
> > validations participate in this scheme, the transaction with
> > the write lock will be the only transaction validating and
> > should have access to all available non-pinned memory.
> > 
> > There is currently a problem in that TTM converts -EDEADLOCKS to
> > -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> > -ENOMEMs without having actually run out of memory. We abuse
> > ww_mutex internals to detect such situations until TTM is fixed
> > to not convert the error code. In the meantime, injecting
> > ww_mutex slowpath -EDEADLOCKs is a good way to test
> > the implementation in the absence of real OOMs.
> > 
> > Just introduce the wrapper in this commit. It will be hooked up
> > to the driver in following commits.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_validation.c | 199 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
> >  2 files changed, 306 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> > index cc0684d24e02..cd1424f04237 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.c
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -5,6 +5,7 @@
> >  #include "xe_bo.h"
> >  #include <drm/drm_exec.h>
> >  #include <drm/drm_gem.h>
> > +#include <drm/drm_gpuvm.h>
> >  
> >  #include "xe_assert.h"
> >  #include "xe_validation.h"
> > @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct xe_device *xe,
> >  	}
> >  }
> >  #endif
> > +
> > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > +{
> > +	struct xe_validation_device *val = ctx->val;
> > +	int ret = 0;
> > +
> > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > +		if (ctx->request_exclusive)
> > +			ret = down_write_killable(&val->lock);
> > +		else
> > +			ret = down_read_interruptible(&val->lock);
> > +	} else {
> > +		if (ctx->request_exclusive)
> > +			down_write(&val->lock);
> > +		else
> > +			down_read(&val->lock);
> > +	}
> > +
> > +	if (!ret) {
> > +		ctx->lock_held = true;
> > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> > +{
> > +	if (!ctx->lock_held)
> > +		return;
> > +
> > +	if (ctx->lock_held_exclusive)
> > +		up_write(&ctx->val->lock);
> > +	else
> > +		up_read(&ctx->val->lock);
> > +
> > +	ctx->lock_held = false;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> > + * @ctx: The xe_validation_ctx to initialize.
> > + * @val: The xe_validation_device representing the validation domain.
> > + * @exec: The struct drm_exec to use for the transaction.
> > + * @flags: The flags to use for drm_exec initialization.
> > + * @nr: The number of anticipated buffer object locks. Forwarded to
> > + * drm_exec initialization.
> > + * @exclusive: Whether to use exclusive locking already on first validation.
> > + *
> > + * Initialize and lock an xe_validation transaction using the validation domain
> > + * represented by @val. Also initialize the drm_exec object forwarding
> > + * @flags and @nr to the drm_exec initialization. The @exclusive parameter should
> > + * typically be set to false to avoid locking out other validators from the
> > + * domain until an OOM is hit. For testing- or final attempt purposes it can,
> > + * however, be set to true.
> > + *
> > + * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
> > + * signal pending.
> > + */
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive)
> > +{
> > +	int ret;
> > +
> > +	ctx->exec = exec;
> > +	ctx->val = val;
> > +	ctx->lock_held = false;
> > +	ctx->lock_held_exclusive = false;
> > +	ctx->request_exclusive = exclusive;
> > +	ctx->flags = flags;
> > +	ctx->nr = nr;
> > +
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	drm_exec_init(exec, flags, nr);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> > +/*
> > + * This abuses both drm_exec and ww_mutex internals and should be
> > + * replaced by checking for -EDEADLK when we can make TTM
> > + * stop converting -EDEADLK to -ENOMEM.
> > + * An alternative is to not have exhaustive eviction with
> > + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> > + */
> 
> I vote to keep this the way you have it and live with the abuse until
> TTM is updated.
> 
> > +static bool xe_validation_contention_injected(struct drm_exec *exec)
> > +{
> > +	return !!exec->ticket.contending_lock;
> > +}
> > +
> > +#else
> > +
> > +static bool xe_validation_contention_injected(struct drm_exec *exec)
> > +{
> > +	return false;
> > +}
> > +
> > +#endif
> > +
> > +static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
> > +{
> > +	if (ret == -ENOMEM &&
> > +	    ((ctx->request_exclusive &&
> > +	      xe_validation_contention_injected(ctx->exec)) ||
> > +	     !ctx->request_exclusive)) {
> > +		ctx->request_exclusive = true;
> 
> Can the locking across multiple GPUs fall when request_exclusive is
> held
> and the GPUs are sharing dma-buffers? I suppose we'd need true WW
> locking throughout the stack (TTM) and a ticketed retry to guarantee
> foward progress.

Yes. Either that or widen the validation domain to be per driver rather
than per device.
With the drm_exec retry, dma-buf would in addition need to support passing
a drm_exec, but that OTOH would be a layering violation. So I have
historically advocated for abstracting the ww mutex transaction with a
dma-buf locking context with callbacks rather than a drm_exec. Then
drm_exec could derive from that and keep its functionality for gem
objects.
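
Roughly something like this, as a hand-wavy sketch (nothing like it
exists in dma-buf today):

	/*
	 * Hypothetical dma-buf locking context: dma-buf core would only
	 * know about the ww_acquire_ctx and a contention callback, while
	 * drm_exec could embed this and keep its gem-object bookkeeping
	 * on top.
	 */
	struct dma_buf_lock_ctx {
		struct ww_acquire_ctx ticket;
		/* Called on contention so the owner can unwind its held locks. */
		void (*contended)(struct dma_buf_lock_ctx *ctx,
				  struct dma_resv *contended_resv);
	};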

> 
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/**
> > + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
> > + * transaction.
> > + * @ctx: An uninitialized xe_validation_ctx.
> > + * @vm_exec: An initialized struct drm_gpuvm_exec.
> > + * @val: The validation domain.
> > + *
> > + * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
> > + * transaction and therefore doesn't lend itself very well to using
> > + * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
> > + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
> > + *
> > + * Return: %0 on success, negative error code on failure.
> > + */
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> > +			    struct drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val)
> > +{
> > +	int ret;
> > +
> > +	memset(ctx, 0, sizeof(*ctx));
> > +	ctx->exec = &vm_exec->exec;
> > +	ctx->flags = vm_exec->flags;
> > +	ctx->val = val;
> > +retry:
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = drm_gpuvm_exec_lock(vm_exec);
> > +	if (ret) {
> > +		xe_validation_unlock(ctx);
> > +		if (__xe_validation_should_retry(ctx, ret))
> > +			goto retry;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_fini() - Finalize a validation transaction
> > + * @ctx: The validation transaction to finalize.
> > + *
> > + * Finalize a validation transaction and its related drm_exec transaction.
> > + */
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > +{
> > +	drm_exec_fini(ctx->exec);
> > +	xe_validation_unlock(ctx);
> > +}
> > +
> > +/**
> > + * xe_validation_should_retry() - Determine if a validation transaction should retry
> > + * @ctx: The validation transaction.
> > + * @ret: Pointer to a return value variable.
> > + *
> > + * Determines whether a validation transaction should retry based on the
> > + * internal transaction state and the return value pointed to by @ret.
> > + * If a validation should be retried, the transaction is prepared for that,
> > + * and the validation lock might be re-locked in exclusive mode, and *@ret
> > + * is set to %0. If the re-locking errors, typically due to interruptible
> > + * locking with signal pending, *@ret is instead set to -EINTR and the
> > + * function returns %false.
> > + *
> > + * Return: %true if validation should be retried, %false otherwise.
> > + */
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> > +{
> > +	if (__xe_validation_should_retry(ctx, *ret)) {
> > +		drm_exec_fini(ctx->exec);
> > +		*ret = 0;
> > +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> > +			xe_validation_unlock(ctx);
> > +			*ret = xe_validation_lock(ctx);
> > +		}
> > +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> > +		return !*ret;
> > +	}
> > +
> > +	return false;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > index db50feacad7a..a708c260cf18 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.h
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -7,9 +7,11 @@
> >  
> >  #include <linux/dma-resv.h>
> >  #include <linux/types.h>
> > +#include <linux/rwsem.h>
> >  
> >  struct drm_exec;
> >  struct drm_gem_object;
> > +struct drm_gpuvm_exec;
> >  struct xe_device;
> >  
> >  #ifdef CONFIG_PROVE_LOCKING
> > @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
> >  	} while (0)
> >  #endif
> >  
> > +/**
> > + * struct xe_validation_device - The domain for exhaustive eviction
> > + * @lock: The lock used to exclude other processes from allocating graphics memory
> > + *
> > + * The struct xe_validation_device represents the domain for which we want to use
> > + * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
> > + * but when graphics memory allocation fails, it is retried with the write mode held.
> > + */
> > +struct xe_validation_device {
> > +	struct rw_semaphore lock;
> > +};
> > +
> > +/**
> > + * struct xe_validation_ctx - A struct drm_exec subclass with support for
> > + * exhaustive eviction
> > + * @exec: The drm_exec object base class. Note that we use a pointer instead of
> > + * embedding to avoid diamond inheritance.
> > + * @val: The exhaustive eviction domain.
> > + * @lock_held: Whether the domain lock is currently held.
> > + * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
> > + * @request_exclusive: Whether to lock exclusively (write mode) the next time
> > + * the domain lock is locked.
> > + * @flags: The drm_exec flags used for drm_exec (re-)initialization.
> > + * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
> > + */
> > +struct xe_validation_ctx {
> > +	struct drm_exec *exec;
> > +	struct xe_validation_device *val;
> > +	bool lock_held;
> > +	bool lock_held_exclusive;
> > +	bool request_exclusive;
> > +	u32 flags;
> > +	unsigned int nr;
> > +};
> > +
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive);
> > +
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val);
> > +
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> > +
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> > +
> > +/**
> > + * xe_validation_retry_on_oom() - Retry on oom in an xe_validation transaction
> > + * @_ctx: Pointer to the xe_validation_ctx
> > + * @_ret: The current error value possibly holding -ENOMEM
> > + *
> > + * Use this in a way similar to drm_exec_retry_on_contention().
> > + * If @_ret contains -ENOMEM the transaction is restarted once in a way that
> > + * blocks other transactions and allows exhaustive eviction. If the transaction
> > + * was already restarted once, just return the -ENOMEM. May also set
> > + * _ret to -EINTR if not retrying and waits are interruptible.
> > + * May only be used within a drm_exec_until_all_locked() loop.
> > + */
> > +#define xe_validation_retry_on_oom(_ctx, _ret)				\
> > +	do {								\
> > +		if (xe_validation_should_retry(_ctx, _ret))		\
> > +			goto *__drm_exec_retry_ptr;			\
> > +	} while (0)
> > +
> > +/**
> > + * xe_validation_device_init - Initialize a struct xe_validation_device
> > + * @val: The xe_validation_device to init.
> > + */
> > +static inline void
> > +xe_validation_device_init(struct xe_validation_device *val)
> > +{
> > +	init_rwsem(&val->lock);
> > +}
> > +
> > +/*
> > + * Make guard() and scoped_guard() work with xe_validation_ctx
> > + * so that we can exit transactions without caring about the
> > + * cleanup.
> > + */
> > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> 
> I think this should be 'if (_T)', right?
> 
> > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> > +	       _ret ? NULL : _ctx; }),
> 
> Or here '_ret ? ERR_PTR(_ret) : _ctx;'
> 
> One or the other if I correctly understand how DEFINE_CLASS works.

Right. good catch.
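
I.e. with the constructor and destructor agreeing on the error
representation, something like:

	DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
		     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
		     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
		       _ret ? ERR_PTR(_ret) : _ctx; }),
		     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
		     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);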


> 
> Matt
> 
> > +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> > +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> > +{return *_T; }
> > +#define class_xe_validation_is_conditional false
> > +
> > +/**
> > + * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
> > + * @_ctx: The xe_validation_ctx.
> > + * @_val: The xe_validation_device.
> > + * @_exec: The struct drm_exec object.
> > + * @_flags: Flags for the drm_exec transaction. See the struct drm_exec documentation!
> > + * @_ret: Return in / out parameter. May be set by this macro. Typically 0 when called.
> > + * @_excl: Whether to start in exclusive mode already in the first iteration.
> > + *
> > + * This macro will initiate a drm_exec transaction with additional support for
> > + * exhaustive eviction.
> > + */
> > +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
> > +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
> > +	drm_exec_until_all_locked(_exec)
> > +
> >  #endif
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-13 21:37   ` Matthew Brost
@ 2025-08-15 15:05     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:05 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 14:37 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:17PM +0200, Thomas Hellström wrote:
> > Convert dma-buf migration to XE_PL_TT and dma-buf import to
> > support exhaustive eviction, using xe_validation_guard().
> > It seems unlikely that the import would result in an -ENOMEM,
> > but convert import anyway for completeness.
> > 
> > The dma-buf map_attachment() functionality unfortunately doesn't
> > support passing a drm_exec, which means that foreign devices
> > validating a dma-buf that we exported will not, unless they are
> > xeKMD devices, participate in the exhaustive eviction scheme.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_dma_buf.c | 59 +++++++++++++++++++++++------
> > ----
> >  1 file changed, 42 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c
> > b/drivers/gpu/drm/xe/xe_dma_buf.c
> > index 78a827d4e726..56df1d84df21 100644
> > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > @@ -163,16 +163,27 @@ static int xe_dma_buf_begin_cpu_access(struct
> > dma_buf *dma_buf,
> >  	struct xe_bo *bo = gem_to_xe_bo(obj);
> >  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
> >  		       direction == DMA_FROM_DEVICE);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> > +	int ret = 0;
> >  
> >  	if (!reads)
> >  		return 0;
> >  
> >  	/* Can we do interruptible lock here? */
> > -	xe_bo_lock(bo, false);
> > -	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> > -	xe_bo_unlock(bo);
> > -
> > +	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec,
> > 0, ret, false) {
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (ret)
> > +			goto out;
> > +
> > +		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_validation_retry_on_oom(&ctx, &ret);
> > +	}
> > +out:
> > +	/* If we failed, cpu-access takes place in current
> > placement. */
> > +	(void)ret;
> 
> Do you need the above line of code? I don't see this often in kernel
> code.

It's merely to annotate that we don't care about the returned value.
But I can remove it.
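
For v2 the tail would then simply become (sketch):

	}
out:
	/* If we failed, cpu-access takes place in current placement. */
	return 0;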

/Thomas


> 
> Nit aside, patch LGTM.
> 
> Matt
> 
> >  	return 0;
> >  }
> >  
> > @@ -211,24 +222,38 @@ xe_dma_buf_init_obj(struct drm_device *dev,
> > struct xe_bo *storage,
> >  {
> >  	struct dma_resv *resv = dma_buf->resv;
> >  	struct xe_device *xe = to_xe_device(dev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_gem_object *dummy_obj;
> > +	struct drm_exec exec;
> >  	struct xe_bo *bo;
> > -	int ret;
> > -
> > -	dma_resv_lock(resv, NULL);
> > -	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL,
> > dma_buf->size,
> > -				    0, /* Will require 1way or
> > 2way for vm_bind */
> > -				    ttm_bo_type_sg,
> > XE_BO_FLAG_SYSTEM, exec);
> > -	if (IS_ERR(bo)) {
> > -		ret = PTR_ERR(bo);
> > -		goto error;
> > +	int ret = 0;
> > +
> > +	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> > +	if (!dummy_obj)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	dummy_obj->resv = resv;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, false)
> > {
> > +		ret = drm_exec_lock_obj(&exec, dummy_obj);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (ret)
> > +			goto error;
> > +
> > +		bo = ___xe_bo_create_locked(xe, storage, NULL,
> > resv, NULL, dma_buf->size,
> > +					    0, /* Will require
> > 1way or 2way for vm_bind */
> > +					    ttm_bo_type_sg,
> > XE_BO_FLAG_SYSTEM, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			ret = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +			goto error;
> > +		}
> >  	}
> > -	dma_resv_unlock(resv);
> > +	drm_gem_object_put(dummy_obj);
> >  
> >  	return &bo->ttm.base;
> >  
> >  error:
> > -	dma_resv_unlock(resv);
> >  	return ERR_PTR(ret);
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-13 22:06   ` Matthew Brost
@ 2025-08-15 15:16     ` Thomas Hellström
  2025-08-15 19:04       ` Matthew Brost
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > The CPU fault handler may populate bos and migrate, and in doing
> > so might interfere with other tasks validating.
> > 
> > Convert it for exhaustive eviction. To do this properly without
> > potentially introducing stalls with the mmap lock held requires
> > TTM work. In the meantime, let's live with those stalls that
> > would typically happen on memory pressure.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct
> > vm_fault *vmf)
> >  	struct xe_device *xe = to_xe_device(ddev);
> >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > -	struct drm_exec *exec;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	vm_fault_t ret;
> >  	int idx;
> >  
> >  	if (needs_rpm)
> >  		xe_pm_runtime_get(xe);
> >  
> > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > +				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0,
> > false))
> > +		return VM_FAULT_NOPAGE;
> 
> Any particular reason to not use xe_validation_guard here?

Well this is a bit complicated ATM.
We would need some serious TTM rework here to support drm_exec in these
helpers, and ATM I think upon closer inspection we'd need an
xe_validation_ctx_init that doesn't initialize a drm_exec.

ttm_bo_vm_reserve() might use a bo lock without a drm_exec and that
will cause a lockdep splat if the drm_exec transaction has initialized
the ww ctx, which happens in drm_exec_until_all_locked(). 

I should add a comment about that.
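
Something along these lines (sketch):

	/*
	 * Don't use xe_validation_guard() here: ttm_bo_vm_reserve()
	 * may take the bo lock without a drm_exec, and lockdep would
	 * splat if the drm_exec ww acquire context had already been
	 * initialized, which drm_exec_until_all_locked() does.
	 */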

/Thomas



> 
> Matt
> 
> > +
> >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> >  	if (ret)
> >  		goto out;
> > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct
> > vm_fault *vmf)
> >  	if (drm_dev_enter(ddev, &idx)) {
> >  		trace_xe_bo_cpu_fault(bo);
> >  
> > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > +		xe_validation_assert_exec(xe, &exec, &tbo->base);
> >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > >vm_page_prot,
> >  					      
> > TTM_BO_VM_NUM_PREFAULT);
> >  		drm_dev_exit(idx);
> > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct
> > vm_fault *vmf)
> >  
> >  	dma_resv_unlock(tbo->base.resv);
> >  out:
> > +	xe_validation_ctx_fini(&ctx);
> >  	if (needs_rpm)
> >  		xe_pm_runtime_put(xe);
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-14  2:33   ` Matthew Brost
  2025-08-14  4:23     ` Matthew Brost
@ 2025-08-15 15:23     ` Thomas Hellström
  2025-08-15 19:01       ` Matthew Brost
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:23 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 19:33 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> > Introduce a validation wrapper xe_validation_guard() as a helper
> > intended to be used around drm_exec transactions that perform
> > validations. Once TTM can handle exhaustive eviction we could
> > remove this wrapper or make it mostly a NO-OP unless other
> > functionality is added to it.
> > 
> > Currently the wrapper takes a read lock upon entry and if the
> > transaction hits an OOM, all locks are released and the
> > transaction is retried with a write-lock. If all other
> > validations participate in this scheme, the transaction with
> > the write lock will be the only transaction validating and
> > should have access to all available non-pinned memory.
> > 
> > There is currently a problem in that TTM converts -EDEADLK to
> > -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> > -ENOMEMs without having actually run out of memory. We abuse
> > ww_mutex internals to detect such situations until TTM is fixed
> > to not convert the error code. In the meantime, injecting
> > ww_mutex slowpath -EDEADLKs is a good way to test
> > the implementation in the absence of real OOMs.
> > 
> > Just introduce the wrapper in this commit. It will be hooked up
> > to the driver in following commits.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_validation.c | 199
> > +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
> >  2 files changed, 306 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c
> > b/drivers/gpu/drm/xe/xe_validation.c
> > index cc0684d24e02..cd1424f04237 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.c
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -5,6 +5,7 @@
> >  #include "xe_bo.h"
> >  #include <drm/drm_exec.h>
> >  #include <drm/drm_gem.h>
> > +#include <drm/drm_gpuvm.h>
> >  
> >  #include "xe_assert.h"
> >  #include "xe_validation.h"
> > @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct
> > xe_device *xe,
> >  	}
> >  }
> >  #endif
> > +
> > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > +{
> > +	struct xe_validation_device *val = ctx->val;
> > +	int ret = 0;
> > +
> > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > +		if (ctx->request_exclusive)
> > +			ret = down_write_killable(&val->lock);
> > +		else
> > +			ret = down_read_interruptible(&val->lock);
> > +	} else {
> > +		if (ctx->request_exclusive)
> > +			down_write(&val->lock);
> > +		else
> > +			down_read(&val->lock);
> > +	}
> > +
> > +	if (!ret) {
> > +		ctx->lock_held = true;
> > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> > +{
> > +	if (!ctx->lock_held)
> > +		return;
> > +
> > +	if (ctx->lock_held_exclusive)
> > +		up_write(&ctx->val->lock);
> > +	else
> > +		up_read(&ctx->val->lock);
> > +
> > +	ctx->lock_held = false;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> > + * @ctx: The xe_validation_ctx to initialize.
> > + * @val: The xe_validation_device representing the validation
> > domain.
> > + * @exec: The struct drm_exec to use for the transaction.
> > + * @flags: The flags to use for drm_exec initialization.
> > + * @nr: The number of anticipated buffer object locks. Forwarded
> > to
> > + * drm_exec initialization.
> > + * @exclusive: Whether to use exclusive locking already on first
> > validation.
> 
> The last two parameters of this function are always passed as 0 and
> false in this series. Is it worth keeping them? I don’t see a case
> where
> nr would ever be non-zero.

Right. I'll remove that from the interface but keep it in the struct so
that if we ever need it, we can just change the interface.
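
So for v2, roughly (sketch):

	int xe_validation_ctx_init(struct xe_validation_ctx *ctx,
				   struct xe_validation_device *val,
				   struct drm_exec *exec, u32 flags,
				   bool exclusive);

with ctx->nr kept in the struct and initialized to 0.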

> exclusive is defensible, but it’s still
> unused.

Actually I think with vf provisioning and in the pm notifier it makes
sense to set it to true.

>  Maybe drop both and reserve a bit in flags for a driver-defined
> “exclusive.” That would make the call sites more readable—long
> argument
> lists make it easy to forget what each parameter means or to
> transpose
> them.

Problem is that this is drm_exec flags. I'm not really keen on
overloading driver-defined flags there. We could add a separate const
struct xe_validation_flags, though, but then we'd have a translation
step?
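
Roughly (sketch):

	struct xe_validation_flags {
		u32 exec_flags;		/* forwarded to drm_exec_init() */
		bool exclusive;		/* consumed by the validation layer */
	};

where xe_validation_ctx_init() would forward ->exec_flags to drm_exec
and handle ->exclusive itself.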

> 
> > + *
> > + * Initialize and lock an xe_validation transaction using the
> > validation domain
> > + * represented by @val. Also initialize the drm_exec object
> > forwarding
> > + * @flags and @nr to the drm_exec initialization. The @exclusive
> > parameter should
> > + * typically be set to false to avoid locking out other validators
> > from the
> > + * domain until an OOM is hit. For testing- or final attempt
> > purposes it can,
> > + * however, be set to true.
> > + *
> > + * Return: %0 on success, %-EINTR if interruptible initial locking
> > failed with a
> > + * signal pending.
> > + */
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct
> > xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags,
> > unsigned int nr,
> > +			   bool exclusive)
> > +{
> > +	int ret;
> > +
> > +	ctx->exec = exec;
> > +	ctx->val = val;
> > +	ctx->lock_held = false;
> > +	ctx->lock_held_exclusive = false;
> > +	ctx->request_exclusive = exclusive;
> > +	ctx->flags = flags;
> > +	ctx->nr = nr;
> > +
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	drm_exec_init(exec, flags, nr);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> > +/*
> > + * This abuses both drm_exec and ww_mutex internals and should be
> > + * replaced by checking for -EDEADLK when we can make TTM
> > + * stop converting -EDEADLK to -ENOMEM.
> > + * An alternative is to not have exhaustive eviction with
> > + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> > + */
> > +static bool xe_validation_contention_injected(struct drm_exec
> > *exec)
> > +{
> > +	return !!exec->ticket.contending_lock;
> > +}
> > +
> > +#else
> > +
> > +static bool xe_validation_contention_injected(struct drm_exec
> > *exec)
> > +{
> > +	return false;
> > +}
> > +
> > +#endif
> > +
> > +static bool __xe_validation_should_retry(struct xe_validation_ctx
> > *ctx, int ret)
> > +{
> > +	if (ret == -ENOMEM &&
> > +	    ((ctx->request_exclusive &&
> > +	      xe_validation_contention_injected(ctx->exec)) ||
> > +	     !ctx->request_exclusive)) {
> > +		ctx->request_exclusive = true;
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/**
> > + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within
> > a validation
> > + * transaction.
> > + * @ctx: An uninitialized xe_validation_ctx.
> > + * @vm_exec: An initialized struct drm_gpuvm_exec.
> > + * @val: The validation domain.
> > + *
> > + * The drm_gpuvm_exec_lock() function internally initializes its
> > drm_exec
> > + * transaction and therefore doesn't lend itself very well to
> > using
> > + * xe_validation_ctx_init(). Provide a helper that takes an
> > uninitialized
> > + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM
> > retry.
> > + *
> > + * Return: %0 on success, negative error code on failure.
> > + */
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> > +			    struct drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val)
> > +{
> > +	int ret;
> > +
> > +	memset(ctx, 0, sizeof(*ctx));
> > +	ctx->exec = &vm_exec->exec;
> > +	ctx->flags = vm_exec->flags;
> > +	ctx->val = val;
> > +retry:
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = drm_gpuvm_exec_lock(vm_exec);
> > +	if (ret) {
> > +		xe_validation_unlock(ctx);
> > +		if (__xe_validation_should_retry(ctx, ret))
> > +			goto retry;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * xe_validation_ctx_fini() - Finalize a validation transaction
> > + * @ctx: The validation transaction to finalize.
> > + *
> > + * Finalize a validation transaction and its related drm_exec
> > transaction.
> > + */
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > +{
> > +	drm_exec_fini(ctx->exec);
> > +	xe_validation_unlock(ctx);
> > +}
> > +
> > +/**
> > + * xe_validation_should_retry() - Determine if a validation
> > transaction should retry
> > + * @ctx: The validation transaction.
> > + * @ret: Pointer to a return value variable.
> > + *
> > + * Determines whether a validation transaction should retry based
> > on the
> > + * internal transaction state and the return value pointed to by
> > @ret.
> > + * If a validation should be retried, the transaction is prepared
> > for that,
> > + * and the validation lock might be re-locked in exclusive mode,
> > and *@ret
> > + * is set to %0. If the re-locking fails, typically due to
> > interruptible
> > + * locking with signal pending, *@ret is instead set to -EINTR and
> > the
> > + * function returns %false.
> > + *
> > + * Return: %true if validation should be retried, %false
> > otherwise.
> > + */
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int
> > *ret)
> > +{
> > +	if (__xe_validation_should_retry(ctx, *ret)) {
> > +		drm_exec_fini(ctx->exec);
> > +		*ret = 0;
> > +		if (ctx->request_exclusive != ctx-
> > >lock_held_exclusive) {
> > +			xe_validation_unlock(ctx);
> > +			*ret = xe_validation_lock(ctx);
> > +		}
> > +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> > +		return !*ret;
> > +	}
> > +
> > +	return false;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h
> > b/drivers/gpu/drm/xe/xe_validation.h
> > index db50feacad7a..a708c260cf18 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.h
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -7,9 +7,11 @@
> >  
> >  #include <linux/dma-resv.h>
> >  #include <linux/types.h>
> > +#include <linux/rwsem.h>
> >  
> >  struct drm_exec;
> >  struct drm_gem_object;
> > +struct drm_gpuvm_exec;
> >  struct xe_device;
> >  
> >  #ifdef CONFIG_PROVE_LOCKING
> > @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct
> > xe_device *xe, const struct drm_exec
> >  	} while (0)
> >  #endif
> >  
> > +/**
> > + * struct xe_validation_device - The domain for exhaustive
> > eviction
> > + * @lock: The lock used to exclude other processes from allocating
> > graphics memory
> > + *
> > + * The struct xe_validation_device represents the domain for which
> > we want to use
> > + * exhaustive eviction. The @lock is typically grabbed in read
> > mode for allocations
> > + * but when graphics memory allocation fails, it is retried with
> > the write mode held.
> > + */
> > +struct xe_validation_device {
> > +	struct rw_semaphore lock;
> > +};
> > +
> > +/**
> > + * struct xe_validation_ctx - A struct drm_exec subclass with
> > support for
> > + * exhaustive eviction
> > + * @exec: The drm_exec object base class. Note that we use a
> > pointer instead of
> > + * embedding to avoid diamond inheritance.
> > + * @val: The exhaustive eviction domain.
> > + * @lock_held: Whether the domain lock is currently held.
> > + * @lock_held_exclusive: Whether the domain lock is held in
> > exclusive mode.
> > + * @request_exclusive: Whether to lock exclusively (write mode)
> > the next time
> > + * the domain lock is locked.
> > + * @flags: The drm_exec flags used for drm_exec (re-
> > )initialization.
> > + * @nr: The drm_exec nr parameter used for drm_exec (re-
> > )initialization.
> > + */
> > +struct xe_validation_ctx {
> > +	struct drm_exec *exec;
> > +	struct xe_validation_device *val;
> > +	bool lock_held;
> > +	bool lock_held_exclusive;
> > +	bool request_exclusive;
> > +	u32 flags;
> > +	unsigned int nr;
> > +};
> > +
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct
> > xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags,
> > unsigned int nr,
> > +			   bool exclusive);
> > +
> > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct
> > drm_gpuvm_exec *vm_exec,
> > +			    struct xe_validation_device *val);
> > +
> > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> > +
> > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int
> > *ret);
> > +
> > +/**
> > + * xe_validation_retry_on_oom() - Retry on oom in an xe_validation
> > transaction
> > + * @_ctx: Pointer to the xe_validation_ctx
> > + * @_ret: The current error value possibly holding -ENOMEM
> > + *
> > + * Use this in a way similar to drm_exec_retry_on_contention().
> > + * If @_ret contains -ENOMEM the transaction is restarted once in a
> > way that
> > + * blocks other transactions and allows exhaustive eviction. If the
> > transaction
> > + * was already restarted once, just return the -ENOMEM. May also
> > set
> > + * _ret to -EINTR if not retrying and waits are interruptible.
> > + * May only be used within a drm_exec_until_all_locked() loop.
> > + */
> > +#define xe_validation_retry_on_oom(_ctx,
> > _ret)				\
> > +	do
> > {								\
> > +		if (xe_validation_should_retry(_ctx,
> > _ret))		\
> > +			goto
> > *__drm_exec_retry_ptr;			\
> > +	} while (0)
> > +
> > +/**
> > + * xe_validation_device_init() - Initialize a struct
> > xe_validation_device
> > + * @val: The xe_validation_device to init.
> > + */
> > +static inline void
> > +xe_validation_device_init(struct xe_validation_device *val)
> > +{
> > +	init_rwsem(&val->lock);
> > +}
> > +
> > +/*
> > + * Make guard() and scoped_guard() work with xe_validation_ctx
> > + * so that we can exit transactions without caring about the
> > + * cleanup.
> > + */
> > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec,
> > _flags, 0, _excl);
> > +	       _ret ? NULL : _ctx; }),
> > +	     struct xe_validation_ctx *_ctx, struct
> > xe_validation_device *_val,
> > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool
> > _excl);
> > +static inline void
> > *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> > +{return *_T; }
> > +#define class_xe_validation_is_conditional false
> > +
> > +/**
> > + * xe_validation_guard() - An auto-cleanup xe_validation_ctx
> > transaction
> > + * @_ctx: The xe_validation_ctx.
> > + * @_val: The xe_validation_device.
> > + * @_exec: The struct drm_exec object
> > + * @_flags: Flags for the drm_exec transaction. See the struct
> > drm_exec documentation!
> > + * @_ret: Return in / out parameter. May be set by this macro.
> > Typically 0 when called.
> > + * @_excl: Whether to start in exclusive mode already in the first
> > iteration.
> > + *
> 
> Same comment as above on function xe_validation_ctx_init wrt to
> arguments.

OK, let me know what you think given the above.

/Thomas


> 
> Matt
> 
> > + * This macro will initiate a drm_exec transaction with
> > additional support for
> > + * exhaustive eviction.
> > + */
> > +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret,
> > _excl)	\
> > +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags,
> > _ret, _excl) \
> > +	drm_exec_until_all_locked(_exec)
> > +
> >  #endif
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-14  3:58   ` Matthew Brost
@ 2025-08-15 15:25     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:25 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 20:58 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> > Most users of xe_bo_create_pin_map_at() and
> > xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> > and that simplifies conversion. Introduce an
> > xe_bo_create_pin_map_at_novm() function and make the _aligned()
> > version static. Use xe_validation_guard() for conversion.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
> >  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
> >  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
> >  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-
> > ----
> >  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
> >  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
> >  6 files changed, 101 insertions(+), 74 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/compat-i915-
> > headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-
> > headers/gem/i915_gem_stolen.h
> > index 1ce1e9da975b..ab48635ddffa 100644
> > --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > @@ -21,9 +21,7 @@ static inline int
> > i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  						       u32 size,
> > u32 align,
> >  						       u32 start,
> > u32 end)
> >  {
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo;
> > -	int err;
> >  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> >  
> >  	if (start < SZ_4K)
> > @@ -34,25 +32,15 @@ static inline int
> > i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  		start = ALIGN(start, align);
> >  	}
> >  
> > -	bo = xe_bo_create_locked_range(xe,
> > xe_device_get_root_tile(xe),
> > -				       NULL, size, start, end,
> > -				       ttm_bo_type_kernel, flags,
> > 0, exec);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		bo = NULL;
> > -		return err;
> > -	}
> > -	err = xe_bo_pin(bo, exec);
> > -	xe_bo_unlock_vm_held(bo);
> > -
> > -	if (err) {
> > -		xe_bo_put(fb->bo);
> > -		bo = NULL;
> > -	}
> > +	bo = xe_bo_create_pin_map_at_novm(xe,
> > xe_device_get_root_tile(xe),
> > +					  size, start,
> > ttm_bo_type_kernel, flags,
> > +					  false, 0, true);
> > +	if (IS_ERR(bo))
> > +		return PTR_ERR(bo);
> >  
> >  	fb->bo = bo;
> >  
> > -	return err;
> > +	return 0;
> >  }
> >  
> >  static inline int i915_gem_stolen_insert_node(struct xe_device
> > *xe,
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index 43c45344ea26..d46ff7ebb0a1 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct
> > intel_framebuffer *fb,
> >  				 XE_PAGE_SIZE);
> >  
> >  	if (IS_DGFX(xe))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size,
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_VRAM0 |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size,
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_VRAM0 |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> > +						   alignment,
> > false);
> >  	else
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size, 
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_STOLEN |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, 
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_STOLEN |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> > +						   alignment,
> > false);
> >  	if (IS_ERR(dpt))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size, 
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_SYSTEM |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, 
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_SYSTEM |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> > +						   alignment,
> > false);
> >  	if (IS_ERR(dpt))
> >  		return PTR_ERR(dpt);
> >  
> > diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > index 826ac3d578b7..79d00127caf4 100644
> > --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
> >  			page_size);
> >  	size -= base;
> >  
> > -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size,
> > phys_base,
> > -				     ttm_bo_type_kernel, flags);
> > +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size,
> > phys_base,
> > +					  ttm_bo_type_kernel,
> > flags, true, 0, false);
> >  	if (IS_ERR(bo)) {
> >  		drm_dbg(&xe->drm,
> >  			"Failed to create bo phys_base=%pa size %u
> > with flags %x: %li\n",
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 23b28eeef59f..c9928d4ee5a0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct
> > xe_device *xe,
> >  	return bo;
> >  }
> >  
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				      struct xe_vm *vm,
> > -				      size_t size, u64 offset,
> > -				      enum ttm_bo_type type, u32
> > flags)
> > -{
> > -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size,
> > offset,
> > -					       type, flags, 0);
> > -}
> > -
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device
> > *xe,
> > -					      struct xe_tile
> > *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64
> > offset,
> > -					      enum ttm_bo_type
> > type, u32 flags,
> > -					      u64 alignment)
> > +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct
> > xe_device *xe,
> > +						     struct
> > xe_tile *tile,
> > +						     struct xe_vm
> > *vm,
> > +						     size_t size,
> > u64 offset,
> > +						     enum
> > ttm_bo_type type, u32 flags,
> > +						     bool vmap,
> > u64 alignment,
> > +						     struct
> > drm_exec *exec)
> >  {
> >  	struct xe_bo *bo;
> >  	int err;
> >  	u64 start = offset == ~0ull ? 0 : offset;
> >  	u64 end = offset == ~0ull ? offset : start + size;
> > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) :
> > XE_VALIDATION_UNIMPLEMENTED;
> >  
> > -	if (flags & XE_BO_FLAG_STOLEN &&
> > +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
> >  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> >  		flags |= XE_BO_FLAG_GGTT;
> >  
> > @@ -2289,9 +2280,11 @@ struct xe_bo
> > *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	if (err)
> >  		goto err_put;
> >  
> > -	err = xe_bo_vmap(bo);
> > -	if (err)
> > -		goto err_unpin;
> > +	if (vmap) {
> > +		err = xe_bo_vmap(bo);
> > +		if (err)
> > +			goto err_unpin;
> > +	}
> >  
> >  	xe_bo_unlock_vm_held(bo);
> >  
> > @@ -2305,11 +2298,59 @@ struct xe_bo
> > *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	return ERR_PTR(err);
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at
> > optional VRAM offset
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the
> > tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel
> > bos.
> > + * @size: The storage size to use for the bo.
> > + * @offset: Optional VRAM offset or %~0ull for don't care.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @vmap: Whether to create a buffer object map.
> 
> Can we stick vmap into XE_BO_FLAG_?
> 
> Also why do we need this argument now when it was omitted previously?

It's for the single vf provisioning user that wants to pin without
mapping.

But I'm open-coding that in v2 so dropping the vmap bool.
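
I.e. that user would go back to doing roughly what the removed
i915_gem_stolen code above did (sketch):

	bo = xe_bo_create_locked_range(xe, tile, NULL, size, start, end,
				       ttm_bo_type_kernel, flags, 0, exec);
	if (IS_ERR(bo))
		return PTR_ERR(bo);
	err = xe_bo_pin(bo, exec);	/* pin only, no xe_bo_vmap() */
	xe_bo_unlock_vm_held(bo);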

/Thomas


> 
> Matt
> 
> > + * @alignment: GGTT alignment.
> > + * @intr: Whether to execute any waits for backing store
> > interruptibly.
> > + *
> > + * Create a pinned and optionally mapped bo with VRAM offset and
> > GGTT alignment
> > + * options. The bo will be external and not associated with a VM.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on
> > failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if
> > @intr was set
> > + * to true on entry.
> > + */
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile
> > *tile,
> > +			     size_t size, u64 offset, enum
> > ttm_bo_type type, u32 flags,
> > +			     bool vmap, u64 alignment, bool intr)
> > +{
> > +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT :
> > 0;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> > +	struct xe_bo *bo;
> > +	int ret = 0;
> > +
> > +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags,
> > ret, false) {
> > +		bo = xe_bo_create_pin_map_at_aligned(xe, tile,
> > NULL, size, offset,
> > +						     type, flags,
> > vmap,
> > +						     alignment,
> > &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			ret = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +		}
> > +	}
> > +
> > +	return ret ? ERR_PTR(ret) : bo;
> > +}
> > +
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32
> > flags)
> >  {
> > -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull,
> > type, flags);
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) :
> > XE_VALIDATION_UNIMPLEMENTED;
> > +
> > +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size,
> > ~0ull, type, flags,
> > +					       true, 0, exec);
> >  }
> >  
> >  static void __xe_bo_unpin_map_no_vm(void *arg)
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h
> > b/drivers/gpu/drm/xe/xe_bo.h
> > index a625806deeb6..d06266af9662 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct
> > xe_device *xe, struct xe_vm *vm, size_t s
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32
> > flags);
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				      struct xe_vm *vm, size_t
> > size, u64 offset,
> > -				      enum ttm_bo_type type, u32
> > flags);
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device
> > *xe,
> > -					      struct xe_tile
> > *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64
> > offset,
> > -					      enum ttm_bo_type
> > type, u32 flags,
> > -					      u64 alignment);
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile
> > *tile,
> > +			     size_t size, u64 offset, enum
> > ttm_bo_type type,
> > +			     u32 flags, bool vmap, u64 alignment,
> > bool intr);
> >  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe,
> > struct xe_tile *tile,
> >  					   size_t size, u32
> > flags);
> >  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe,
> > struct xe_tile *tile,
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c
> > b/drivers/gpu/drm/xe/xe_eu_stall.c
> > index fdd514fec5ef..afabfc125488 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct
> > xe_eu_stall_data_stream *stream,
> >  
> >  	size = stream->per_xecore_buf_size * last_xecore;
> >  
> > -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> > -					     size, ~0ull,
> > ttm_bo_type_kernel,
> > -					     XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT, SZ_64);
> > +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size,
> > ~0ull, ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT, true,
> > +					  SZ_64, false);
> >  	if (IS_ERR(bo)) {
> >  		kfree(stream->xecore_buf);
> >  		return PTR_ERR(bo);
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-14  4:05   ` Matthew Brost
@ 2025-08-15 15:27     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:27 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 21:05 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:19PM +0200, Thomas Hellström wrote:
> > Most users of xe_bo_create_pin_map_at() and
> > xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> > and that simplifies conversion. Introduce an
> > xe_bo_create_pin_map_at_novm() function and make the _aligned()
> > version static. Use xe_validation_guard() for conversion.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++----
> >  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 45 +++++-----
> >  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
> >  drivers/gpu/drm/xe/xe_bo.c                    | 83 ++++++++++++++-
> > ----
> >  drivers/gpu/drm/xe/xe_bo.h                    | 13 +--
> >  drivers/gpu/drm/xe/xe_eu_stall.c              |  6 +-
> >  6 files changed, 101 insertions(+), 74 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/compat-i915-
> > headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-
> > headers/gem/i915_gem_stolen.h
> > index 1ce1e9da975b..ab48635ddffa 100644
> > --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> > @@ -21,9 +21,7 @@ static inline int
> > i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  						       u32 size,
> > u32 align,
> >  						       u32 start,
> > u32 end)
> >  {
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	struct xe_bo *bo;
> > -	int err;
> >  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> >  
> >  	if (start < SZ_4K)
> > @@ -34,25 +32,15 @@ static inline int
> > i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
> >  		start = ALIGN(start, align);
> >  	}
> >  
> > -	bo = xe_bo_create_locked_range(xe,
> > xe_device_get_root_tile(xe),
> > -				       NULL, size, start, end,
> > -				       ttm_bo_type_kernel, flags,
> > 0, exec);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		bo = NULL;
> > -		return err;
> > -	}
> > -	err = xe_bo_pin(bo, exec);
> > -	xe_bo_unlock_vm_held(bo);
> > -
> > -	if (err) {
> > -		xe_bo_put(fb->bo);
> > -		bo = NULL;
> > -	}
> > +	bo = xe_bo_create_pin_map_at_novm(xe,
> > xe_device_get_root_tile(xe),
> > +					  size, start,
> > ttm_bo_type_kernel, flags,
> > +					  false, 0, true);
> > +	if (IS_ERR(bo))
> > +		return PTR_ERR(bo);
> >  
> >  	fb->bo = bo;
> >  
> > -	return err;
> > +	return 0;
> >  }
> >  
> >  static inline int i915_gem_stolen_insert_node(struct xe_device
> > *xe,
> > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > index 43c45344ea26..d46ff7ebb0a1 100644
> > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > @@ -102,29 +102,32 @@ static int __xe_pin_fb_vma_dpt(const struct
> > intel_framebuffer *fb,
> >  				 XE_PAGE_SIZE);
> >  
> >  	if (IS_DGFX(xe))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size,
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_VRAM0 |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size,
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_VRAM0 |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> 
> The flags and vmap arguments are swapped.
> 
> > +						   alignment,
> > false);
> >  	else
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size, 
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_STOLEN |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, 
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_STOLEN |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> 
> The flags and vmap arguments are swapped.

Right. This was fixed in the wrong patch (14), but I will drop the vmap
argument in v2.
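
For reference, with the prototype in xe_bo.h the first call would need
to read (sketch; vmap goes after flags):

	dpt = xe_bo_create_pin_map_at_novm(xe, tile0, dpt_size, ~0ull,
					   ttm_bo_type_kernel,
					   XE_BO_FLAG_VRAM0 |
					   XE_BO_FLAG_GGTT |
					   XE_BO_FLAG_PAGETABLE,
					   true, alignment, false);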

/Thomas




> 
> Matt
> 
> > +						   alignment,
> > false);
> >  	if (IS_ERR(dpt))
> > -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0,
> > NULL,
> > -						      dpt_size, 
> > ~0ull,
> > -						     
> > ttm_bo_type_kernel,
> > -						     
> > XE_BO_FLAG_SYSTEM |
> > -						     
> > XE_BO_FLAG_GGTT |
> > -						     
> > XE_BO_FLAG_PAGETABLE,
> > -						      alignment);
> > +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> > +						   dpt_size, 
> > ~0ull,
> > +						  
> > ttm_bo_type_kernel,
> > +						   true,
> > +						  
> > XE_BO_FLAG_SYSTEM |
> > +						   XE_BO_FLAG_GGTT
> > |
> > +						  
> > XE_BO_FLAG_PAGETABLE,
> > +						   alignment,
> > false);
> >  	if (IS_ERR(dpt))
> >  		return PTR_ERR(dpt);
> >  
> > diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > index 826ac3d578b7..79d00127caf4 100644
> > --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> > @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
> >  			page_size);
> >  	size -= base;
> >  
> > -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size,
> > phys_base,
> > -				     ttm_bo_type_kernel, flags);
> > +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size,
> > phys_base,
> > +					  ttm_bo_type_kernel,
> > flags, true, 0, false);
> >  	if (IS_ERR(bo)) {
> >  		drm_dbg(&xe->drm,
> >  			"Failed to create bo phys_base=%pa size %u
> > with flags %x: %li\n",
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 23b28eeef59f..c9928d4ee5a0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2253,29 +2253,20 @@ struct xe_bo *xe_bo_create_user(struct
> > xe_device *xe,
> >  	return bo;
> >  }
> >  
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				      struct xe_vm *vm,
> > -				      size_t size, u64 offset,
> > -				      enum ttm_bo_type type, u32
> > flags)
> > -{
> > -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size,
> > offset,
> > -					       type, flags, 0);
> > -}
> > -
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device
> > *xe,
> > -					      struct xe_tile
> > *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64
> > offset,
> > -					      enum ttm_bo_type
> > type, u32 flags,
> > -					      u64 alignment)
> > +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct
> > xe_device *xe,
> > +						     struct
> > xe_tile *tile,
> > +						     struct xe_vm
> > *vm,
> > +						     size_t size,
> > u64 offset,
> > +						     enum
> > ttm_bo_type type, u32 flags,
> > +						     bool vmap,
> > u64 alignment,
> > +						     struct
> > drm_exec *exec)
> >  {
> >  	struct xe_bo *bo;
> >  	int err;
> >  	u64 start = offset == ~0ull ? 0 : offset;
> >  	u64 end = offset == ~0ull ? offset : start + size;
> > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) :
> > XE_VALIDATION_UNIMPLEMENTED;
> >  
> > -	if (flags & XE_BO_FLAG_STOLEN &&
> > +	if (flags & XE_BO_FLAG_STOLEN && vmap &&
> >  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> >  		flags |= XE_BO_FLAG_GGTT;
> >  
> > @@ -2289,9 +2280,11 @@ struct xe_bo
> > *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	if (err)
> >  		goto err_put;
> >  
> > -	err = xe_bo_vmap(bo);
> > -	if (err)
> > -		goto err_unpin;
> > +	if (vmap) {
> > +		err = xe_bo_vmap(bo);
> > +		if (err)
> > +			goto err_unpin;
> > +	}
> >  
> >  	xe_bo_unlock_vm_held(bo);
> >  
> > @@ -2305,11 +2298,59 @@ struct xe_bo
> > *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> >  	return ERR_PTR(err);
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at
> > optional VRAM offset
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the
> > tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel
> > bos.
> > + * @size: The storage size to use for the bo.
> > + * @offset: Optional VRAM offset or %~0ull for don't care.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @vmap: Whether to create a buffer object map.
> > + * @alignment: GGTT alignment.
> > + * @intr: Whether to execute any waits for backing store
> > interruptibly.
> > + *
> > + * Create a pinned and optionally mapped bo with VRAM offset and
> > GGTT alignment
> > + * options. The bo will be external and not associated with a VM.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on
> > failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if
> > @intr was set
> > + * to true on entry.
> > + */
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile
> > *tile,
> > +			     size_t size, u64 offset, enum
> > ttm_bo_type type, u32 flags,
> > +			     bool vmap, u64 alignment, bool intr)
> > +{
> > +	u32 drm_exec_flags = intr ? DRM_EXEC_INTERRUPTIBLE_WAIT :
> > 0;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> > +	struct xe_bo *bo;
> > +	int ret = 0;
> > +
> > +	xe_validation_guard(&ctx, &xe->val, &exec, drm_exec_flags,
> > ret, false) {
> > +		bo = xe_bo_create_pin_map_at_aligned(xe, tile,
> > NULL, size, offset,
> > +						     type, flags,
> > vmap,
> > +						     alignment,
> > &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			ret = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +		}
> > +	}
> > +
> > +	return ret ? ERR_PTR(ret) : bo;
> > +}
> > +
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32
> > flags)
> >  {
> > -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull,
> > type, flags);
> > +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) :
> > XE_VALIDATION_UNIMPLEMENTED;
> > +
> > +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size,
> > ~0ull, type, flags,
> > +					       true, 0, exec);
> >  }
> >  
> >  static void __xe_bo_unpin_map_no_vm(void *arg)
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h
> > b/drivers/gpu/drm/xe/xe_bo.h
> > index a625806deeb6..d06266af9662 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct
> > xe_device *xe, struct xe_vm *vm, size_t s
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> >  				   enum ttm_bo_type type, u32
> > flags);
> > -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				      struct xe_vm *vm, size_t
> > size, u64 offset,
> > -				      enum ttm_bo_type type, u32
> > flags);
> > -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device
> > *xe,
> > -					      struct xe_tile
> > *tile,
> > -					      struct xe_vm *vm,
> > -					      size_t size, u64
> > offset,
> > -					      enum ttm_bo_type
> > type, u32 flags,
> > -					      u64 alignment);
> > +struct xe_bo *
> > +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile
> > *tile,
> > +			     size_t size, u64 offset, enum
> > ttm_bo_type type,
> > +			     u32 flags, bool vmap, u64 alignment,
> > bool intr);
> >  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe,
> > struct xe_tile *tile,
> >  					   size_t size, u32
> > flags);
> >  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe,
> > struct xe_tile *tile,
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c
> > b/drivers/gpu/drm/xe/xe_eu_stall.c
> > index fdd514fec5ef..afabfc125488 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > @@ -617,9 +617,9 @@ static int xe_eu_stall_data_buf_alloc(struct
> > xe_eu_stall_data_stream *stream,
> >  
> >  	size = stream->per_xecore_buf_size * last_xecore;
> >  
> > -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> > -					     size, ~0ull,
> > ttm_bo_type_kernel,
> > -					     XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT, SZ_64);
> > +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size,
> > ~0ull, ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT, true,
> > +					  SZ_64, false);
> >  	if (IS_ERR(bo)) {
> >  		kfree(stream->xecore_buf);
> >  		return PTR_ERR(bo);
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/15] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-14 20:30   ` Matthew Brost
@ 2025-08-15 15:29     ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-15 15:29 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Thu, 2025-08-14 at 13:30 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:21PM +0200, Thomas Hellström wrote:
> > Pinned suspend eviction and preparation for eviction validate
> > system memory for eviction buffers. Do that under a
> > validation exclusive lock to avoid interfering with other
> > processes validating system graphics memory.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c | 205 +++++++++++++++++++--------------
> > ----
> >  1 file changed, 108 insertions(+), 97 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 82bf158426ad..efb9c88b6aa7 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx
> > *ctx, struct ttm_buffer_object *bo,
> >  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_bo *backup;
> >  	int ret = 0;
> >  
> > -	xe_bo_lock(bo, false);
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_assert(xe, !ret);
> > +		xe_assert(xe, !bo->backup_obj);
> >  
> > -	xe_assert(xe, !bo->backup_obj);
> > +		/*
> > +		 * Since this is called from the PM notifier we
> > might have raced with
> > +		 * someone unpinning this after we dropped the
> > pinned list lock and
> > +		 * grabbing the above bo lock.
> > +		 */
> > +		if (!xe_bo_is_pinned(bo))
> > +			break;
> >  
> > -	/*
> > -	 * Since this is called from the PM notifier we might have
> > raced with
> > -	 * someone unpinning this after we dropped the pinned list
> > lock and
> > -	 * grabbing the above bo lock.
> > -	 */
> > -	if (!xe_bo_is_pinned(bo))
> > -		goto out_unlock_bo;
> > +		if (!xe_bo_is_vram(bo))
> > +			break;
> >  
> > -	if (!xe_bo_is_vram(bo))
> > -		goto out_unlock_bo;
> > +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > +			break;
> >  
> > -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > -		goto out_unlock_bo;
> > +		backup = xe_bo_init_locked(xe, NULL, NULL, bo-
> > >ttm.base.resv, NULL, xe_bo_size(bo),
> > +					  
> > DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > +					   XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > +					   XE_BO_FLAG_PINNED,
> > &exec);
> > +		if (IS_ERR(backup)) {
> > +			drm_exec_retry_on_contention(&exec);
> > +			ret = PTR_ERR(backup);
> > +			xe_validation_retry_on_oom(&ctx, &ret);
> > +			break;
> > +		}
> >  
> > -	backup = xe_bo_init_locked(xe, NULL, NULL, bo-
> > >ttm.base.resv, NULL, xe_bo_size(bo),
> > -				   DRM_XE_GEM_CPU_CACHING_WB,
> > ttm_bo_type_kernel,
> > -				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -				   XE_BO_FLAG_PINNED, exec);
> > -	if (IS_ERR(backup)) {
> > -		ret = PTR_ERR(backup);
> > -		goto out_unlock_bo;
> > +		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > +		ttm_bo_pin(&backup->ttm);
> > +		bo->backup_obj = backup;
> >  	}
> >  
> > -	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > -	ttm_bo_pin(&backup->ttm);
> > -	bo->backup_obj = backup;
> > -
> > -out_unlock_bo:
> > -	xe_bo_unlock(bo);
> >  	return ret;
> >  }
> >  
> > @@ -1215,99 +1219,106 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
> >  int xe_bo_evict_pinned(struct xe_bo *bo)
> >  {
> >  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_bo *backup = bo->backup_obj;
> >  	bool backup_created = false;
> >  	bool unmap = false;
> >  	int ret = 0;
> >  
> > -	xe_bo_lock(bo, false);
> > +	xe_validation_guard(&ctx, &xe->val, &exec, 0, ret, true) {
> > +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_assert(xe, !ret);
> >  
> > -	if (WARN_ON(!bo->ttm.resource)) {
> > -		ret = -EINVAL;
> > -		goto out_unlock_bo;
> > -	}
> > +		if (WARN_ON(!bo->ttm.resource)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  
> > -	if (WARN_ON(!xe_bo_is_pinned(bo))) {
> > -		ret = -EINVAL;
> > -		goto out_unlock_bo;
> > -	}
> > +		if (WARN_ON(!xe_bo_is_pinned(bo))) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  
> > -	if (!xe_bo_is_vram(bo))
> > -		goto out_unlock_bo;
> > +		if (!xe_bo_is_vram(bo))
> > +			break;
> >  
> > -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > -		goto out_unlock_bo;
> > +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> > +			break;
> >  
> > -	if (!backup) {
> > -		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> > -					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > -					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > -					   XE_BO_FLAG_PINNED, exec);
> > -		if (IS_ERR(backup)) {
> > -			ret = PTR_ERR(backup);
> > -			goto out_unlock_bo;
> > +		if (!backup) {
> > +			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
> > +						   xe_bo_size(bo),
> > +						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> > +						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> > +						   XE_BO_FLAG_PINNED, &exec);
> > +			if (IS_ERR(backup)) {
> > +				drm_exec_retry_on_contention(&exec);
> > +				ret = PTR_ERR(backup);
> > +				xe_validation_retry_on_oom(&ctx, &ret);
> > +				break;
> > +			}
> > +			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > +			backup_created = true;
> >  		}
> > -		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> > -		backup_created = true;
> > -	}
> >  
> > -	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> > -		struct xe_migrate *migrate;
> > -		struct dma_fence *fence;
> > -
> > -		if (bo->tile)
> > -			migrate = bo->tile->migrate;
> > -		else
> > -			migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> > +		if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
> > +			struct xe_migrate *migrate;
> > +			struct dma_fence *fence;
> >  
> > -		ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> > -		if (ret)
> > -			goto out_backup;
> > +			if (bo->tile)
> > +				migrate = bo->tile->migrate;
> > +			else
> > +				migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
> >  
> > -		ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> > -		if (ret)
> > -			goto out_backup;
> > +			ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
> > +			if (ret)
> > +				goto out_backup;
> >  
> > -		fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> > -					backup->ttm.resource, false);
> > -		if (IS_ERR(fence)) {
> > -			ret = PTR_ERR(fence);
> > -			goto out_backup;
> > -		}
> > +			ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
> > +			if (ret)
> > +				goto out_backup;
> >  
> > -		dma_resv_add_fence(bo->ttm.base.resv, fence,
> > -				   DMA_RESV_USAGE_KERNEL);
> > -		dma_resv_add_fence(backup->ttm.base.resv, fence,
> > -				   DMA_RESV_USAGE_KERNEL);
> > -		dma_fence_put(fence);
> > -	} else {
> > -		ret = xe_bo_vmap(backup);
> > -		if (ret)
> > -			goto out_backup;
> > +			fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
> > +						backup->ttm.resource, false);
> > +			if (IS_ERR(fence)) {
> > +				ret = PTR_ERR(fence);
> > +				goto out_backup;
> > +			}
> >  
> > -		if (iosys_map_is_null(&bo->vmap)) {
> > -			ret = xe_bo_vmap(bo);
> > +			dma_resv_add_fence(bo->ttm.base.resv, fence,
> > +					   DMA_RESV_USAGE_KERNEL);
> > +			dma_resv_add_fence(backup->ttm.base.resv, fence,
> > +					   DMA_RESV_USAGE_KERNEL);
> > +			dma_fence_put(fence);
> > +		} else {
> > +			ret = xe_bo_vmap(backup);
> >  			if (ret)
> >  				goto out_backup;
> > -			unmap = true;
> > -		}
> >  
> > -		xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> > -				   xe_bo_size(bo));
> > -	}
> > +			if (iosys_map_is_null(&bo->vmap)) {
> > +				ret = xe_bo_vmap(bo);
> > +				if (ret)
> > +					goto out_vunmap;
> > +				unmap = true;
> > +			}
> >  
> > -	if (!bo->backup_obj)
> > -		bo->backup_obj = backup;
> > +			xe_map_memcpy_from(xe, backup->vmap.vaddr, &bo->vmap, 0,
> > +					   xe_bo_size(bo));
> > +		}
> >  
> > +		if (!bo->backup_obj)
> > +			bo->backup_obj = backup;
> > +out_vunmap:
> 
> I just want to confirm that this is safe. The cleanup.h documentation
> discourages the use of goto because of scoping issues. I assume that
> since this label is within the scope of the guard, it is fine.
> 
> It might be worth adding a quick note in the validation guard’s
> kernel-doc mentioning that goto can be dangerous, explaining what is
> allowed, and perhaps referencing the cleanup.h documentation. I could
> see this being something developers might trip over.
> 

Yes you are correct. I'll avoid the gotos in v2.
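
For the suggested kernel-doc note, a minimal sketch of the two cases
(the function and lock names below are made up for illustration; only
the second pattern matches what the patch above does):

static void bad_example(struct mutex *m, bool skip)
{
	if (skip)
		goto out;	/* Jumps over the guard's constructor. */
	guard(mutex)(m);	/* The cleanup destructor still runs at
				 * scope exit on a guard that was never
				 * constructed: rejected by some
				 * compilers, undefined behavior with
				 * others. */
out:
	return;
}

static void ok_example(struct mutex *m, bool fail)
{
	guard(mutex)(m);
	if (fail)
		goto unwind;	/* The label sits after the constructor
				 * and inside the guarded scope, so the
				 * destructor ordering is unaffected. */
	/* ... */
unwind:
	return;
}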

/Thomas



> Patch LGTM, though.
> 
> Matt
> 
> > +		xe_bo_vunmap(backup);
> >  out_backup:
> > -	xe_bo_vunmap(backup);
> > -	if (ret && backup_created)
> > -		xe_bo_put(backup);
> > -out_unlock_bo:
> > -	if (unmap)
> > -		xe_bo_vunmap(bo);
> > -	xe_bo_unlock(bo);
> > +		if (ret && backup_created)
> > +			xe_bo_put(backup);
> > +		if (unmap)
> > +			xe_bo_vunmap(bo);
> > +	}
> > +
> >  	return ret;
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-15 15:23     ` Thomas Hellström
@ 2025-08-15 19:01       ` Matthew Brost
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Brost @ 2025-08-15 19:01 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 15, 2025 at 05:23:41PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 19:33 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> > > Introduce a validation wrapper xe_validation_guard() as a helper
> > > intended to be used around drm_exec transactions that perform
> > > validations. Once TTM can handle exhaustive eviction we could
> > > remove this wrapper or make it mostly a NO-OP unless other
> > > functionality is added to it.
> > > 
> > > Currently the wrapper takes a read lock upon entry and if the
> > > transaction hits an OOM, all locks are released and the
> > > transaction is retried with a write-lock. If all other
> > > validations participate in this scheme, the transaction with
> > > the write lock will be the only transaction validating and
> > > should have access to all available non-pinned memory.
> > > 
> > > There is currently a problem in that TTM converts -EDEADLK to
> > > -ENOMEM, and with ww_mutex slowpath error injections, we can hit
> > > -ENOMEMs without having actually run out of memory. We abuse
> > > ww_mutex internals to detect such situations until TTM is fixed
> > > to not convert the error code. In the meantime, injecting
> > > ww_mutex slowpath -EDEADLKs is a good way to test
> > > the implementation in the absence of real OOMs.
> > > 
> > > Just introduce the wrapper in this commit. It will be hooked up
> > > to the driver in following commits.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_validation.c | 199
> > > +++++++++++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_validation.h | 107 ++++++++++++++++
> > >  2 files changed, 306 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.c
> > > b/drivers/gpu/drm/xe/xe_validation.c
> > > index cc0684d24e02..cd1424f04237 100644
> > > --- a/drivers/gpu/drm/xe/xe_validation.c
> > > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > > @@ -5,6 +5,7 @@
> > >  #include "xe_bo.h"
> > >  #include <drm/drm_exec.h>
> > >  #include <drm/drm_gem.h>
> > > +#include <drm/drm_gpuvm.h>
> > >  
> > >  #include "xe_assert.h"
> > >  #include "xe_validation.h"
> > > @@ -47,3 +48,201 @@ void xe_validation_assert_exec(const struct
> > > xe_device *xe,
> > >  	}
> > >  }
> > >  #endif
> > > +
> > > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > > +{
> > > +	struct xe_validation_device *val = ctx->val;
> > > +	int ret = 0;
> > > +
> > > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > > +		if (ctx->request_exclusive)
> > > +			ret = down_write_killable(&val->lock);
> > > +		else
> > > +			ret = down_read_interruptible(&val->lock);
> > > +	} else {
> > > +		if (ctx->request_exclusive)
> > > +			down_write(&val->lock);
> > > +		else
> > > +			down_read(&val->lock);
> > > +	}
> > > +
> > > +	if (!ret) {
> > > +		ctx->lock_held = true;
> > > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> > > +{
> > > +	if (!ctx->lock_held)
> > > +		return;
> > > +
> > > +	if (ctx->lock_held_exclusive)
> > > +		up_write(&ctx->val->lock);
> > > +	else
> > > +		up_read(&ctx->val->lock);
> > > +
> > > +	ctx->lock_held = false;
> > > +}
> > > +
> > > +/**
> > > + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> > > + * @ctx: The xe_validation_ctx to initialize.
> > > + * @val: The xe_validation_device representing the validation
> > > domain.
> > > + * @exec: The struct drm_exec to use for the transaction.
> > > + * @flags: The flags to use for drm_exec initialization.
> > > + * @nr: The number of anticipated buffer object locks. Forwarded
> > > to
> > > + * drm_exec initialization.
> > > + * @exclusive: Whether to use exclusive locking already on first
> > > validation.
> > 
> > The last two parameters of this function are always passed as 0 and
> > false in this series. Is it worth keeping them? I don’t see a case
> > where
> > nr would ever be non-zero.
> 
> Right. I'll remove that from the interface but keep it in the struct so
> that if we ever need it, we can just change the interface.
> 
> > exclusive is defensible, but it’s still
> > unused.
> 
> Actually I think with vf provisioning and in the pm notifier it makes
> sense to set it to true.
> 
> >  Maybe drop both and reserve a bit in flags for a driver-defined
> > “exclusive.” That would make the call sites more readable—long
> > argument
> > lists make it easy to forget what each parameter means or to
> > transpose
> > them.
> 
> The problem is that these are drm_exec flags. I'm not really keen on
> overloading driver-defined flags there. We could add a separate const

Yes, usually if you want to overload bits there is an explicitly defined
user bit value in the base flags, which doesn't currently exist here.

> struct xe_validation_flags, though, but then we'd have a translation
> step?

xe_validation_flags would make it clearer at the call site what is
going on rather than a bare 'false/true'.
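
Something like this hypothetical sketch (the struct and field names are
made up; nothing like it exists in the series yet):

struct xe_validation_flags {
	u32 exec_flags;		/* forwarded to drm_exec_init() */
	bool exclusive;		/* take the domain rwsem in write mode
				 * already on the first attempt */
};

Call sites would then pass a designated initializer, which documents
itself, instead of a trailing 'false' whose meaning you have to look up.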

Matt

> 
> > 
> > > + *
> > > + * Initialize and lock an xe_validation transaction using the
> > > validation domain
> > > + * represented by @val. Also initialize the drm_exec object
> > > forwarding
> > > + * @flags and @nr to the drm_exec initialization. The @exclusive
> > > parameter should
> > > + * typically be set to false to avoid locking out other validators
> > > from the
> > > + * domain until an OOM is hit. For testing or final-attempt
> > > purposes it can,
> > > + * however, be set to true.
> > > + *
> > > + * Return: %0 on success, %-EINTR if interruptible initial locking
> > > failed with a
> > > + * signal pending.
> > > + */
> > > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct
> > > xe_validation_device *val,
> > > +			   struct drm_exec *exec, u32 flags,
> > > unsigned int nr,
> > > +			   bool exclusive)
> > > +{
> > > +	int ret;
> > > +
> > > +	ctx->exec = exec;
> > > +	ctx->val = val;
> > > +	ctx->lock_held = false;
> > > +	ctx->lock_held_exclusive = false;
> > > +	ctx->request_exclusive = exclusive;
> > > +	ctx->flags = flags;
> > > +	ctx->nr = nr;
> > > +
> > > +	ret = xe_validation_lock(ctx);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	drm_exec_init(exec, flags, nr);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> > > +/*
> > > + * This abuses both drm_exec and ww_mutex internals and should be
> > > + * replaced by checking for -EDEADLK when we can make TTM
> > > + * stop converting -EDEADLK to -ENOMEM.
> > > + * An alternative is to not have exhaustive eviction with
> > > + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> > > + */
> > > +static bool xe_validation_contention_injected(struct drm_exec
> > > *exec)
> > > +{
> > > +	return !!exec->ticket.contending_lock;
> > > +}
> > > +
> > > +#else
> > > +
> > > +static bool xe_validation_contention_injected(struct drm_exec
> > > *exec)
> > > +{
> > > +	return false;
> > > +}
> > > +
> > > +#endif
> > > +
> > > +static bool __xe_validation_should_retry(struct xe_validation_ctx
> > > *ctx, int ret)
> > > +{
> > > +	if (ret == -ENOMEM &&
> > > +	    ((ctx->request_exclusive &&
> > > +	      xe_validation_contention_injected(ctx->exec)) ||
> > > +	     !ctx->request_exclusive)) {
> > > +		ctx->request_exclusive = true;
> > > +		return true;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +/**
> > > + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within
> > > a validation
> > > + * transaction.
> > > + * @ctx: An uninitialized xe_validation_ctx.
> > > + * @vm_exec: An initialized struct vm_exec.
> > > + * @val: The validation domain.
> > > + *
> > > + * The drm_gpuvm_exec_lock() function internally initializes its
> > > drm_exec
> > > + * transaction and therefore doesn't lend itself very well to
> > > using
> > > + * xe_validation_ctx_init(). Provide a helper that takes an
> > > uninitialized
> > > + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM
> > > retry.
> > > + *
> > > + * Return: %0 on success, negative error code on failure.
> > > + */
> > > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> > > +			    struct drm_gpuvm_exec *vm_exec,
> > > +			    struct xe_validation_device *val)
> > > +{
> > > +	int ret;
> > > +
> > > +	memset(ctx, 0, sizeof(*ctx));
> > > +	ctx->exec = &vm_exec->exec;
> > > +	ctx->flags = vm_exec->flags;
> > > +	ctx->val = val;
> > > +retry:
> > > +	ret = xe_validation_lock(ctx);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = drm_gpuvm_exec_lock(vm_exec);
> > > +	if (ret) {
> > > +		xe_validation_unlock(ctx);
> > > +		if (__xe_validation_should_retry(ctx, ret))
> > > +			goto retry;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +/**
> > > + * xe_validation_ctx_fini() - Finalize a validation transaction
> > > + * @ctx: The Validation transaction to finalize.
> > > + *
> > > + * Finalize a validation transaction and its related drm_exec
> > > transaction.
> > > + */
> > > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > > +{
> > > +	drm_exec_fini(ctx->exec);
> > > +	xe_validation_unlock(ctx);
> > > +}
> > > +
> > > +/**
> > > + * xe_validation_should_retry() - Determine if a validation
> > > transaction should retry
> > > + * @ctx: The validation transaction.
> > > + * @ret: Pointer to a return value variable.
> > > + *
> > > + * Determines whether a validation transaction should retry based
> > > on the
> > > + * internal transaction state and the return value pointed to by
> > > @ret.
> > > + * If a validation should be retried, the transaction is prepared
> > > for that,
> > > + * and the validation lock might be re-locked in exclusive mode,
> > > and *@ret
> > > + * is set to %0. If the re-locking fails, typically due to
> > > interruptible
> > > + * locking with a signal pending, *@ret is instead set to -EINTR and
> > > the
> > > + * function returns %false.
> > > + *
> > > + * Return: %true if validation should be retried, %false
> > > otherwise.
> > > + */
> > > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> > > +{
> > > +	if (__xe_validation_should_retry(ctx, *ret)) {
> > > +		drm_exec_fini(ctx->exec);
> > > +		*ret = 0;
> > > +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> > > +			xe_validation_unlock(ctx);
> > > +			*ret = xe_validation_lock(ctx);
> > > +		}
> > > +		drm_exec_init(ctx->exec, ctx->flags, ctx->nr);
> > > +		return !*ret;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.h
> > > b/drivers/gpu/drm/xe/xe_validation.h
> > > index db50feacad7a..a708c260cf18 100644
> > > --- a/drivers/gpu/drm/xe/xe_validation.h
> > > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > > @@ -7,9 +7,11 @@
> > >  
> > >  #include <linux/dma-resv.h>
> > >  #include <linux/types.h>
> > > +#include <linux/rwsem.h>
> > >  
> > >  struct drm_exec;
> > >  struct drm_gem_object;
> > > +struct drm_gpuvm_exec;
> > >  struct xe_device;
> > >  
> > >  #ifdef CONFIG_PROVE_LOCKING
> > > @@ -66,4 +68,109 @@ void xe_validation_assert_exec(const struct
> > > xe_device *xe, const struct drm_exec
> > >  	} while (0)
> > >  #endif
> > >  
> > > +/**
> > > + * struct xe_validation_device - The domain for exhaustive
> > > eviction
> > > + * @lock: The lock used to exclude other processes from allocating
> > > graphics memory
> > > + *
> > > + * The struct xe_validation_device represents the domain for which
> > > we want to use
> > > + * exhaustive eviction. The @lock is typically grabbed in read
> > > mode for allocations
> > > + * but when graphics memory allocation fails, it is retried with
> > > the write mode held.
> > > + */
> > > +struct xe_validation_device {
> > > +	struct rw_semaphore lock;
> > > +};
> > > +
> > > +/**
> > > + * struct xe_validation_ctx - A struct drm_exec subclass with
> > > support for
> > > + * exhaustive eviction
> > > + * @exec: The drm_exec object base class. Note that we use a
> > > pointer instead of
> > > + * embedding to avoid diamond inheritance.
> > > + * @val: The exhaustive eviction domain.
> > > + * @lock_held: Whether the domain lock is currently held.
> > > + * @lock_held_exclusive: Whether the domain lock is held in
> > > exclusive mode.
> > > + * @request_exclusive: Whether to lock exclusively (write mode)
> > > the next time
> > > + * the domain lock is locked.
> > > + * @flags: The drm_exec flags used for drm_exec (re-
> > > )initialization.
> > > + * @nr: The drm_exec nr parameter used for drm_exec (re-
> > > )initialization.
> > > + */
> > > +struct xe_validation_ctx {
> > > +	struct drm_exec *exec;
> > > +	struct xe_validation_device *val;
> > > +	bool lock_held;
> > > +	bool lock_held_exclusive;
> > > +	bool request_exclusive;
> > > +	u32 flags;
> > > +	unsigned int nr;
> > > +};
> > > +
> > > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct
> > > xe_validation_device *val,
> > > +			   struct drm_exec *exec, u32 flags,
> > > unsigned int nr,
> > > +			   bool exclusive);
> > > +
> > > +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct
> > > drm_gpuvm_exec *vm_exec,
> > > +			    struct xe_validation_device *val);
> > > +
> > > +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> > > +
> > > +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int
> > > *ret);
> > > +
> > > +/**
> > > + * xe_validation_retry_on_oom() - Retry on OOM in an xe_validation
> > > transaction
> > > + * @_ctx: Pointer to the xe_validation_ctx
> > > + * @_ret: The current error value possibly holding -ENOMEM
> > > + *
> > > + * Use this in a way similar to drm_exec_retry_on_contention().
> > > + * If @_ret contains -ENOMEM the transaction is restarted once in a
> > > way that
> > > + * blocks other transactions and allows exhaustive eviction. If the
> > > transaction
> > > + * was already restarted once, just return the -ENOMEM. May also
> > > set
> > > + * _ret to -EINTR if not retrying and waits are interruptible.
> > > + * May only be used within a drm_exec_until_all_locked() loop.
> > > + */
> > > +#define xe_validation_retry_on_oom(_ctx, _ret)			\
> > > +	do {								\
> > > +		if (xe_validation_should_retry(_ctx, _ret))		\
> > > +			goto *__drm_exec_retry_ptr;			\
> > > +	} while (0)
> > > +
> > > +/**
> > > + * xe_validation_device_init - Initialize a struct
> > > xe_validation_device
> > > + * @val: The xe_validation_device to init.
> > > + */
> > > +static inline void
> > > +xe_validation_device_init(struct xe_validation_device *val)
> > > +{
> > > +	init_rwsem(&val->lock);
> > > +}
> > > +
> > > +/*
> > > + * Make guard() and scoped_guard() work with xe_validation_ctx
> > > + * so that we can exit transactions without caring about the
> > > + * cleanup.
> > > + */
> > > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> > > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> > > +	       _ret ? NULL : _ctx; }),
> > > +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> > > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> > > +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> > > +{return *_T; }
> > > +#define class_xe_validation_is_conditional false
> > > +
> > > +/**
> > > + * xe_validation_guard() - An auto-cleanup xe_validation_ctx
> > > transaction
> > > + * @_ctx: The xe_validation_ctx.
> > > + * @_val: The xe_validation_device.
> > > + * @_exec: The struct drm_exec object
> > > + * @_flags: Flags for the drm_exec transaction. See the struct
> > > drm_exec documentation!
> > > + * @_ret: Return in / out parameter. May be set by this macro.
> > > Typically 0 when called.
> > > + * @_excl: Whether to start in exclusive mode already in the first
> > > iteration.
> > > + *
> > 
> > Same comment as above on function xe_validation_ctx_init wrt to
> > arguments.
> 
> OK, let me know what you think given the above.
> 
> /Thomas
> 
> 
> > 
> > Matt
> > 
> > > + * This macro will initiate a drm_exec transaction with
> > > additional support for
> > > + * exhaustive eviction.
> > > + */
> > > +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret, _excl)	\
> > > +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret, _excl) \
> > > +	drm_exec_until_all_locked(_exec)
> > > +
> > >  #endif
> > > -- 
> > > 2.50.1
> > > 
> 
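
For reference, the intended usage pattern, modeled on the
xe_bo_evict_pinned() conversion earlier in the series (a sketch only:
'xe' and 'bo' are assumed to be in scope, and the xe_bo_validate()
signature with a drm_exec argument follows the direction of patch 04):

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int ret = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    DRM_EXEC_INTERRUPTIBLE_WAIT, ret, false) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;

		/* On -ENOMEM, retry once with the domain rwsem held in
		 * write mode; a second -ENOMEM is returned as-is.
		 */
		ret = xe_bo_validate(bo, NULL, true, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &ret);
	}
	return ret;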

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-15 15:16     ` Thomas Hellström
@ 2025-08-15 19:04       ` Matthew Brost
  2025-08-18  9:11         ` Thomas Hellström
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-15 19:04 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 15, 2025 at 05:16:54PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > > The CPU fault handler may populate bos and migrate, and in doing
> > > so might interfere with other tasks validating.
> > > 
> > > Convert it for exhaustive eviction. To do this properly without
> > > potentially introducing stalls with the mmap lock held requires
> > > TTM work. In the meantime, let's live with those stalls that
> > > would typically happen on memory pressure.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > b/drivers/gpu/drm/xe/xe_bo.c
> > > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	struct xe_device *xe = to_xe_device(ddev);
> > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > -	struct drm_exec *exec;
> > > +	struct xe_validation_ctx ctx;
> > > +	struct drm_exec exec;
> > >  	vm_fault_t ret;
> > >  	int idx;
> > >  
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_get(xe);
> > >  
> > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > > +				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > > +		return VM_FAULT_NOPAGE;
> > 
> > Any particular reason to not use xe_validation_guard here?
> 
> Well this is a bit complicated ATM.
> We would need some serious TTM rework here to support drm_exec in these
> helpers, and upon closer inspection I think we'd need an
> xe_validation_ctx_init variant that doesn't initialize a drm_exec.
> 

Right, so I think this is an unsupported case then.

Matt

> ttm_bo_vm_reserve() might use a bo lock without a drm_exec and that
> will cause a lockdep splat if the drm_exec transaction has initialized
> the ww ctx, which happens in drm_exec_until_all_locked(). 
> 
> I should add a comment about that.
> 
> /Thomas
> 
> 
> 
> > 
> > Matt
> > 
> > > +
> > >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> > >  	if (ret)
> > >  		goto out;
> > > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	if (drm_dev_enter(ddev, &idx)) {
> > >  		trace_xe_bo_cpu_fault(bo);
> > >  
> > > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > > +		xe_validation_assert_exec(xe, &exec, &tbo->base);
> > >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > >  					      TTM_BO_VM_NUM_PREFAULT);
> > >  		drm_dev_exit(idx);
> > > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  
> > >  	dma_resv_unlock(tbo->base.resv);
> > >  out:
> > > +	xe_validation_ctx_fini(&ctx);
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_put(xe);
> > >  
> > > -- 
> > > 2.50.1
> > > 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
  2025-08-13 17:25   ` Matthew Brost
  2025-08-14  2:33   ` Matthew Brost
@ 2025-08-17 14:05   ` Simon Richter
  2025-08-18  2:19     ` Matthew Brost
  2025-08-18  9:19     ` Thomas Hellström
  2 siblings, 2 replies; 66+ messages in thread
From: Simon Richter @ 2025-08-17 14:05 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Matthew Brost, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

Hi,

On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:

> +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> +{
> +	struct xe_validation_device *val = ctx->val;
> +	int ret = 0;
> +
> +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> +		if (ctx->request_exclusive)
> +			ret = down_write_killable(&val->lock);
> +		else
> +			ret = down_read_interruptible(&val->lock);
> +	} else {
> +		if (ctx->request_exclusive)
> +			down_write(&val->lock);
> +		else
> +			down_read(&val->lock);
> +	}
> +
> +	if (!ret) {
> +		ctx->lock_held = true;
> +		ctx->lock_held_exclusive = ctx->request_exclusive;
> +	}
> +
> +	return ret;
> +}

This can fail if DRM_EXEC_INTERRUPTIBLE_WAIT is set, ...

> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> +			   bool exclusive)
> +{
> +	int ret;
> +
> +	ctx->exec = exec;
> +	ctx->val = val;
> +	ctx->lock_held = false;
> +	ctx->lock_held_exclusive = false;
> +	ctx->request_exclusive = exclusive;
> +	ctx->flags = flags;
> +	ctx->nr = nr;
> +
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;

... causing an error to be returned here, which...

> +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> +	       _ret ? NULL : _ctx; }),
> +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);

... causes a NULL pointer to be recorded for the scoped guard here, which
is then passed to xe_validation_ctx_fini on scope exit, causing

Kernel attempted to read user page (0) - exploit attempt? (uid: 1000)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000014bcf2a8
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat x_tables nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc mei_gsc mei_me snd_hda_codec_intelhdmi mei snd_hda_codec_hdmi mtd_intel_dg xe joydev evdev drm_gpuvm drm_buddy gpu_sched drm_exec drm_suballoc_helper drm_ttm_helper ttm drm_display_helper hid_generic usbhid cec hid rc_core drm_client_lib drm_kms_helper snd_hda_intel snd_intel_dspcfg drm snd_hda_codec snd_hda_core aes_gcm_p10_crypto crypto_simd snd_hwdep cryptd xts snd_pcm drm_panel_orientation_quirks ghash_generic ofpart snd_timer vmx_crypto powernv_flash i2c_algo_bit snd ipmi_powernv gf128mul mtd ipmi_devintf configfs ipmi_msghandler opal_prd at24 soundcore regmap_i2c ext4 crc16 mbcache jbd2 dm_mod xhci_pci xhci_hcd nvme tg3 usbcore nvme_core libphy nvme_keyring nvme_auth mdio_bus usb_common
CPU: 24 UID: 0 PID: 2438 Comm: Xorg Not tainted 6.17.0-rc1+ #1 VOLUNTARY
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-9858186 PowerNV
NIP:  c008000014bcf2a8 LR: c008000014bd5724 CTR: c0080000141a0248
REGS: c00000000fa0f4a0 TRAP: 0300   Not tainted  (6.17.0-rc1+)
MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88002844  XER: 00000

CFAR: c008000014bd5720 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
GPR00: c008000014bd5724 c00000000fa0f740 c008000014df9200 0000000000000000
GPR04: c000000023340000 c000000023340000 0000000000000031 fffffffffffe0000
GPR08: c00000002d0075a8 fffffffffffff000 0000000000000000 c008000014d8d3b8
GPR12: c0080000141a0248 c0000007fffcc800 0000000000000000 0000000000000000
GPR16: 0000000000000001 0000000000000000 0000000000000001 c0002000084dff40
GPR20: c000000019230308 fffffffffffff000 0000000000000000 fffffffffffff000
GPR24: 0000000000000000 0000000000000000 0000000000000001 c0002000134ab000
GPR28: 0000000000000000 c00020001333e818 0000000000000000 0000000000000000
NIP [c008000014bcf2a8] xe_validation_ctx_fini+0x20/0x90 [xe]
LR [c008000014bd5724] new_vma+0x32c/0x400 [xe]
Call Trace:
[c00000000fa0f740] [0000000000000001] 0x1 (unreliable)
[c00000000fa0f770] [c008000014bd5724] new_vma+0x32c/0x400 [xe]
[c00000000fa0f860] [c008000014bd5abc] vm_bind_ioctl_ops_parse+0x2c4/0x9b0 [xe]
[c00000000fa0f920] [c008000014bd9dac] xe_vm_bind_ioctl+0x1344/0x17a0 [xe]
[c00000000fa0faf0] [c00800001349ca58] drm_ioctl_kernel+0x100/0x1a0 [drm]
[c00000000fa0fb50] [c00800001349cd88] drm_ioctl+0x290/0x690 [drm]
[c00000000fa0fcc0] [c008000014b1fcdc] xe_drm_ioctl+0x74/0xd0 [xe]
[c00000000fa0fd10] [c0000000006ec654] sys_ioctl+0x594/0x1020
[c00000000fa0fe10] [c00000000002c020] system_call_exception+0x120/0x240
[c00000000fa0fe50] [c00000000000cfdc] system_call_vectored_common+0x15c/0x2ec

   Simon

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-17 14:05   ` [05/15] " Simon Richter
@ 2025-08-18  2:19     ` Matthew Brost
  2025-08-18  5:24       ` Simon Richter
  2025-08-18  9:19     ` Thomas Hellström
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Brost @ 2025-08-18  2:19 UTC (permalink / raw)
  To: Simon Richter
  Cc: Thomas Hellström, intel-xe, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

On Sun, Aug 17, 2025 at 04:05:18PM +0200, Simon Richter wrote:
> Hi,
> 
> On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> 
> > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > +{
> > +	struct xe_validation_device *val = ctx->val;
> > +	int ret = 0;
> > +
> > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > +		if (ctx->request_exclusive)
> > +			ret = down_write_killable(&val->lock);
> > +		else
> > +			ret = down_read_interruptible(&val->lock);
> > +	} else {
> > +		if (ctx->request_exclusive)
> > +			down_write(&val->lock);
> > +		else
> > +			down_read(&val->lock);
> > +	}
> > +
> > +	if (!ret) {
> > +		ctx->lock_held = true;
> > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > +	}
> > +
> > +	return ret;
> > +}
> 
> This can fail if DRM_EXEC_INTERRUPTIBLE_WAIT is set, ...
> 
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive)
> > +{
> > +	int ret;
> > +
> > +	ctx->exec = exec;
> > +	ctx->val = val;
> > +	ctx->lock_held = false;
> > +	ctx->lock_held_exclusive = false;
> > +	ctx->request_exclusive = exclusive;
> > +	ctx->flags = flags;
> > +	ctx->nr = nr;
> > +
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> 
> ... causing an error to be returned here, which...
> 
> > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> > +	       _ret ? NULL : _ctx; }),
> > +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> 
> ... causes a NULL pointer to be recorded for the scoped guard here, which
> is then passed to xe_validation_ctx_fini on scope exit, causing
> 

Yep. Thomas and I agree [1] this is a bug.

Matt

[1] https://patchwork.freedesktop.org/patch/668295/?series=152882&rev=1#comment_1225697 

> Kernel attempted to read user page (0) - exploit attempt? (uid: 1000)
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc008000014bcf2a8
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat x_tables nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc mei_gsc mei_me snd_hda_codec_intelhdmi mei snd_hda_codec_hdmi mtd_intel_dg xe joydev evdev drm_gpuvm drm_buddy gpu_sched drm_exec drm_suballoc_helper drm_ttm_helper ttm drm_display_helper hid_generic usbhid cec hid rc_core drm_client_lib drm_kms_helper snd_hda_intel snd_intel_dspcfg drm snd_hda_codec snd_hda_core aes_gcm_p10_crypto crypto_simd snd_hwdep cryptd xts snd_pcm drm_panel_orientation_quirks ghash_generic ofpart snd_timer vmx_crypto powernv_flash i2c_algo_bit snd ipmi_powernv gf128mul mtd ipmi_devintf configfs ipmi_msghandler opal_prd at24 soundcore regmap_i2c ext4 crc16 mbcache jbd2 dm_mod xhci_pci xhci_hcd nvme tg3 usbcore nvme_core libphy nvme_keyring nvme_auth mdio_bus usb_common
> CPU: 24 UID: 0 PID: 2438 Comm: Xorg Not tainted 6.17.0-rc1+ #1 VOLUNTARY
> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-9858186 PowerNV
> NIP:  c008000014bcf2a8 LR: c008000014bd5724 CTR: c0080000141a0248
> REGS: c00000000fa0f4a0 TRAP: 0300   Not tainted  (6.17.0-rc1+)
> MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88002844  XER: 00000
> 
> CFAR: c008000014bd5720 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
> GPR00: c008000014bd5724 c00000000fa0f740 c008000014df9200 0000000000000000
> GPR04: c000000023340000 c000000023340000 0000000000000031 fffffffffffe0000
> GPR08: c00000002d0075a8 fffffffffffff000 0000000000000000 c008000014d8d3b8
> GPR12: c0080000141a0248 c0000007fffcc800 0000000000000000 0000000000000000
> GPR16: 0000000000000001 0000000000000000 0000000000000001 c0002000084dff40
> GPR20: c000000019230308 fffffffffffff000 0000000000000000 fffffffffffff000
> GPR24: 0000000000000000 0000000000000000 0000000000000001 c0002000134ab000
> GPR28: 0000000000000000 c00020001333e818 0000000000000000 0000000000000000
> NIP [c008000014bcf2a8] xe_validation_ctx_fini+0x20/0x90 [xe]
> LR [c008000014bd5724] new_vma+0x32c/0x400 [xe]
> Call Trace:
> [c00000000fa0f740] [0000000000000001] 0x1 (unreliable)
> [c00000000fa0f770] [c008000014bd5724] new_vma+0x32c/0x400 [xe]
> [c00000000fa0f860] [c008000014bd5abc] vm_bind_ioctl_ops_parse+0x2c4/0x9b0 [xe]
> [c00000000fa0f920] [c008000014bd9dac] xe_vm_bind_ioctl+0x1344/0x17a0 [xe]
> [c00000000fa0faf0] [c00800001349ca58] drm_ioctl_kernel+0x100/0x1a0 [drm]
> [c00000000fa0fb50] [c00800001349cd88] drm_ioctl+0x290/0x690 [drm]
> [c00000000fa0fcc0] [c008000014b1fcdc] xe_drm_ioctl+0x74/0xd0 [xe]
> [c00000000fa0fd10] [c0000000006ec654] sys_ioctl+0x594/0x1020
> [c00000000fa0fe10] [c00000000002c020] system_call_exception+0x120/0x240
> [c00000000fa0fe50] [c00000000000cfdc] system_call_vectored_common+0x15c/0x2ec
> 
>    Simon

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-18  2:19     ` Matthew Brost
@ 2025-08-18  5:24       ` Simon Richter
  0 siblings, 0 replies; 66+ messages in thread
From: Simon Richter @ 2025-08-18  5:24 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Thomas Hellström, intel-xe, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

Hi,

On 8/18/25 11:19 AM, Matthew Brost wrote:

>> ... causes a NULL pointer to be recorded for the scoped guard here, which
>> is then passed to xe_validation_ctx_fini on scope exit, causing

> Yep. Thomas and I agree [1] this is a bug.

Ah, excellent.

It is also really easy to trigger this; it happened after a few seconds
during a Piglit run for me, on a single GPU as a normal user, just a lot
of clients starting and stopping in parallel.

    Simon

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-15 19:04       ` Matthew Brost
@ 2025-08-18  9:11         ` Thomas Hellström
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-18  9:11 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, 2025-08-15 at 12:04 -0700, Matthew Brost wrote:
> On Fri, Aug 15, 2025 at 05:16:54PM +0200, Thomas Hellström wrote:
> > On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> > > On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > > > The CPU fault handler may populate bos and migrate, and in
> > > > doing
> > > > so might interfere with other tasks validating.
> > > > 
> > > > Convert it for exhaustive eviction. To do this properly without
> > > > potentially introducing stalls with the mmap lock held requires
> > > > TTM work. In the meantime, let's live with those stalls that
> > > > would typically happen on memory pressure.
> > > > 
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > > b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  	struct xe_device *xe = to_xe_device(ddev);
> > > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > > -	struct drm_exec *exec;
> > > > +	struct xe_validation_ctx ctx;
> > > > +	struct drm_exec exec;
> > > >  	vm_fault_t ret;
> > > >  	int idx;
> > > >  
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_get(xe);
> > > >  
> > > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > > +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > > > +				  
> > > > DRM_EXEC_INTERRUPTIBLE_WAIT, 0,
> > > > false))
> > > > +		return VM_FAULT_NOPAGE;
> > > 
> > > Any particular reason to not use xe_validation_guard here?
> > 
> > Well this is a bit complicated ATM.
> > We would need some serious TTM rework here to support drm_exec in
> > these helpers, and upon closer inspection I think we'd need an
> > xe_validation_ctx_init variant that doesn't initialize a drm_exec.
> > 
> 
> Right, so I think this is an unsupported case then.

We should be able to re-lock in write-mode, though.
Let me have a look at this in v2.
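
Roughly what I have in mind, as a sketch only (ctx, exec, tbo and vmf
as in the posted hunk; the error mapping and retry plumbing here are
assumptions, not what the posted patch does):

	bool exclusive = false;
	vm_fault_t ret;

retry:
	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0,
				   exclusive))
		return VM_FAULT_NOPAGE;

	ret = ttm_bo_vm_reserve(tbo, vmf);
	if (ret)
		goto out;

	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
				       TTM_BO_VM_NUM_PREFAULT);
	dma_resv_unlock(tbo->base.resv);
out:
	xe_validation_ctx_fini(&ctx);
	if (ret == VM_FAULT_OOM && !exclusive) {
		/* Re-take the domain rwsem in write mode, retry once. */
		exclusive = true;
		goto retry;
	}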

Thanks,
Thomas


> 
> Matt
> 
> > ttm_bo_vm_reserve() might use a bo lock without a drm_exec and that
> > will cause a lockdep splat if the drm_exec transaction has
> > initialized
> > the ww ctx, which happens in drm_exec_until_all_locked(). 
> > 
> > I should add a comment about that.
> > 
> > /Thomas
> > 
> > 
> > 
> > > 
> > > Matt
> > > 
> > > > +
> > > >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> > > >  	if (ret)
> > > >  		goto out;
> > > > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  	if (drm_dev_enter(ddev, &idx)) {
> > > >  		trace_xe_bo_cpu_fault(bo);
> > > >  
> > > > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > > > +		xe_validation_assert_exec(xe, &exec, &tbo->base);
> > > >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > > >  					      TTM_BO_VM_NUM_PREFAULT);
> > > >  		drm_dev_exit(idx);
> > > > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  
> > > >  	dma_resv_unlock(tbo->base.resv);
> > > >  out:
> > > > +	xe_validation_ctx_fini(&ctx);
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_put(xe);
> > > >  
> > > > -- 
> > > > 2.50.1
> > > > 
> > 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-17 14:05   ` [05/15] " Simon Richter
  2025-08-18  2:19     ` Matthew Brost
@ 2025-08-18  9:19     ` Thomas Hellström
  1 sibling, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-18  9:19 UTC (permalink / raw)
  To: Simon Richter
  Cc: intel-xe, Matthew Brost, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

On Sun, 2025-08-17 at 16:05 +0200, Simon Richter wrote:
> Hi,
> 
> On Wed, Aug 13, 2025 at 12:51:11PM +0200, Thomas Hellström wrote:
> 
> > +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> > +{
> > +	struct xe_validation_device *val = ctx->val;
> > +	int ret = 0;
> > +
> > +	if (ctx->flags & DRM_EXEC_INTERRUPTIBLE_WAIT) {
> > +		if (ctx->request_exclusive)
> > +			ret = down_write_killable(&val->lock);
> > +		else
> > +			ret = down_read_interruptible(&val->lock);
> > +	} else {
> > +		if (ctx->request_exclusive)
> > +			down_write(&val->lock);
> > +		else
> > +			down_read(&val->lock);
> > +	}
> > +
> > +	if (!ret) {
> > +		ctx->lock_held = true;
> > +		ctx->lock_held_exclusive = ctx->request_exclusive;
> > +	}
> > +
> > +	return ret;
> > +}
> 
> This can fail if DRM_EXEC_INTERRUPTIBLE_WAIT is set, ...
> 
> > +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> > +			   struct drm_exec *exec, u32 flags, unsigned int nr,
> > +			   bool exclusive)
> > +{
> > +	int ret;
> > +
> > +	ctx->exec = exec;
> > +	ctx->val = val;
> > +	ctx->lock_held = false;
> > +	ctx->lock_held_exclusive = false;
> > +	ctx->request_exclusive = exclusive;
> > +	ctx->flags = flags;
> > +	ctx->nr = nr;
> > +
> > +	ret = xe_validation_lock(ctx);
> > +	if (ret)
> > +		return ret;
> 
> ... causing an error to be returned here, which...
> 
> > +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> > +	     if (!IS_ERR(_T)) xe_validation_ctx_fini(_T);,
> > +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
> > +	       _ret ? NULL : _ctx; }),
> > +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> > +	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
> 
> ... causes a NULL pointer to be recorded for the scoped guard here,
> which
> is then passed to xe_validation_ctx_fini on scope exit, causing
> 
> Kernel attempted to read user page (0) - exploit attempt? (uid: 1000)
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc008000014bcf2a8
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat
> nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> xfrm_user xfrm_algo xt_addrtype nft_compat x_tables nf_tables
> nfnetlink br_netfilter bridge stp llc overlay binfmt_misc mei_gsc
> mei_me snd_hda_codec_intelhdmi mei snd_hda_codec_hdmi mtd_intel_dg xe
> joydev evdev drm_gpuvm drm_buddy gpu_sched drm_exec
> drm_suballoc_helper drm_ttm_helper ttm drm_display_helper hid_generic
> usbhid cec hid rc_core drm_client_lib drm_kms_helper snd_hda_intel
> snd_intel_dspcfg drm snd_hda_codec snd_hda_core aes_gcm_p10_crypto
> crypto_simd snd_hwdep cryptd xts snd_pcm drm_panel_orientation_quirks
> ghash_generic ofpart snd_timer vmx_crypto powernv_flash i2c_algo_bit
> snd ipmi_powernv gf128mul mtd ipmi_devintf configfs ipmi_msghandler
> opal_prd at24 soundcore regmap_i2c ext4 crc16 mbcache jbd2 dm_mod
> xhci_pci xhci_hcd nvme tg3 usbcore nvme_core libphy nvme_keyring
> nvme_auth mdio_bus usb_common
> CPU: 24 UID: 0 PID: 2438 Comm: Xorg Not tainted 6.17.0-rc1+ #1
> VOLUNTARY
> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-9858186
> PowerNV
> NIP:  c008000014bcf2a8 LR: c008000014bd5724 CTR: c0080000141a0248
> REGS: c00000000fa0f4a0 TRAP: 0300   Not tainted  (6.17.0-rc1+)
> MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> 88002844  XER: 00000
> 
> CFAR: c008000014bd5720 DAR: 0000000000000000 DSISR: 40000000 IRQMASK:
> 0
> GPR00: c008000014bd5724 c00000000fa0f740 c008000014df9200
> 0000000000000000
> GPR04: c000000023340000 c000000023340000 0000000000000031
> fffffffffffe0000
> GPR08: c00000002d0075a8 fffffffffffff000 0000000000000000
> c008000014d8d3b8
> GPR12: c0080000141a0248 c0000007fffcc800 0000000000000000
> 0000000000000000
> GPR16: 0000000000000001 0000000000000000 0000000000000001
> c0002000084dff40
> GPR20: c000000019230308 fffffffffffff000 0000000000000000
> fffffffffffff000
> GPR24: 0000000000000000 0000000000000000 0000000000000001
> c0002000134ab000
> GPR28: 0000000000000000 c00020001333e818 0000000000000000
> 0000000000000000
> NIP [c008000014bcf2a8] xe_validation_ctx_fini+0x20/0x90 [xe]
> LR [c008000014bd5724] new_vma+0x32c/0x400 [xe]
> Call Trace:
> [c00000000fa0f740] [0000000000000001] 0x1 (unreliable)
> [c00000000fa0f770] [c008000014bd5724] new_vma+0x32c/0x400 [xe]
> [c00000000fa0f860] [c008000014bd5abc]
> vm_bind_ioctl_ops_parse+0x2c4/0x9b0 [xe]
> [c00000000fa0f920] [c008000014bd9dac] xe_vm_bind_ioctl+0x1344/0x17a0
> [xe]
> [c00000000fa0faf0] [c00800001349ca58] drm_ioctl_kernel+0x100/0x1a0
> [drm]
> [c00000000fa0fb50] [c00800001349cd88] drm_ioctl+0x290/0x690 [drm]
> [c00000000fa0fcc0] [c008000014b1fcdc] xe_drm_ioctl+0x74/0xd0 [xe]
> [c00000000fa0fd10] [c0000000006ec654] sys_ioctl+0x594/0x1020
> [c00000000fa0fe10] [c00000000002c020]
> system_call_exception+0x120/0x240
> [c00000000fa0fe50] [c00000000000cfdc]
> system_call_vectored_common+0x15c/0x2ec

Right, that's also been pointed out in Matt Brost's review.
The idea here is that if the interruptible lock returns -EINTR, then we
should not enter the code within xe_validation_guard() either, so on top
of the fix for this, it looks like I need to mark the class as conditional.
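
I.e., something along these lines (a sketch; the final v2 plumbing may
differ):

DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
	     if (_T) xe_validation_ctx_fini(_T);,
	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags, 0, _excl);
	       _ret ? NULL : _ctx; }),
	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
	     struct drm_exec *_exec, u32 _flags, int _ret, bool _excl);
#define class_xe_validation_is_conditional true

The constructor only ever returns @_ctx or NULL, never an ERR_PTR, so
the destructor check becomes a plain NULL test, and with the class
marked conditional, scoped_guard() skips the body entirely when the
constructor returns NULL. A -EINTR from xe_validation_ctx_init() then
neither enters the guarded code nor calls xe_validation_ctx_fini() on a
NULL context.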

Thanks,
Thomas



> 
>    Simon


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/15] drm/xe: Pass down drm_exec context to validation
  2025-08-13 16:42   ` Matthew Brost
  2025-08-14  7:49     ` Thomas Hellström
@ 2025-08-22  7:40     ` Thomas Hellström
  1 sibling, 0 replies; 66+ messages in thread
From: Thomas Hellström @ 2025-08-22  7:40 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-13 at 09:42 -0700, Matthew Brost wrote:
> >   
> > +	/**
> > +	 * @validation: Validation data only valid with the vm
> > resv held.
> > +	 * Note: This is really task state of the task holding the
> > vm resv,
> > +	 * and moving forward we should
> > +	 * come up with a better way of passing this down the
> > call-
> > +	 * chain.
> 
> I've already mentioned this, but attaching the _exec to xe_vma_ops might
> be a good option, as xe_vma_ops only lives for the duration of the bind
> (i.e., it is a stack variable), so you'd only need to set it (i.e., no
> clear required).

That is true, although since this is really task state the correct
thing would be to pass this as a parameter to the functions involved,
but when I tried to implement that, it got quite complicated.

So I figure saving this for a future cleanup might make sense. I'll do
that for v2 anyway, and we can discuss it further.
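
For the record, roughly how I understand the suggestion (hypothetical
sketch, not part of this series):

struct xe_vma_ops {
	/* ... existing members ... */
	struct drm_exec *exec;	/* Valid for the duration of the bind.
				 * Since xe_vma_ops lives on the stack
				 * for exactly that long, setting it
				 * once suffices and no clearing is
				 * needed. */
};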

Thanks,
Thomas


> 
> I think patch largely makes sense.
> 
> Matt 
> 
> > +


^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2025-08-22  7:40 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13 10:51 [PATCH 00/15] Driver-managed exhaustive eviction Thomas Hellström
2025-08-13 10:51 ` [PATCH 01/15] drm/xe/vm: Don't use a pin the vm_resv during validation Thomas Hellström
2025-08-13 14:28   ` Matthew Brost
2025-08-13 14:33     ` Thomas Hellström
2025-08-13 15:17       ` Matthew Brost
2025-08-13 10:51 ` [PATCH 02/15] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
2025-08-14  2:52   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 03/15] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
2025-08-13 14:45   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 04/15] drm/xe: Pass down drm_exec context to validation Thomas Hellström
2025-08-13 16:42   ` Matthew Brost
2025-08-14  7:49     ` Thomas Hellström
2025-08-14 19:09       ` Matthew Brost
2025-08-22  7:40     ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 05/15] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
2025-08-13 17:25   ` Matthew Brost
2025-08-15 15:04     ` Thomas Hellström
2025-08-14  2:33   ` Matthew Brost
2025-08-14  4:23     ` Matthew Brost
2025-08-15 15:23     ` Thomas Hellström
2025-08-15 19:01       ` Matthew Brost
2025-08-17 14:05   ` [05/15] " Simon Richter
2025-08-18  2:19     ` Matthew Brost
2025-08-18  5:24       ` Simon Richter
2025-08-18  9:19     ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 06/15] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
2025-08-14  2:23   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 07/15] drm/xe: Convert SVM validation " Thomas Hellström
2025-08-13 15:32   ` Matthew Brost
2025-08-14 12:24     ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 08/15] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
2025-08-14  2:48   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 09/15] drm/xe: Convert the CPU fault handler " Thomas Hellström
2025-08-13 22:06   ` Matthew Brost
2025-08-15 15:16     ` Thomas Hellström
2025-08-15 19:04       ` Matthew Brost
2025-08-18  9:11         ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 10/15] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
2025-08-14  2:35   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 11/15] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
2025-08-13 21:37   ` Matthew Brost
2025-08-15 15:05     ` Thomas Hellström
2025-08-14 20:37   ` Matthew Brost
2025-08-15  6:57     ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 12/15] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
2025-08-13 21:33   ` Matthew Brost
2025-08-13 10:51 ` [PATCH 13/15] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
2025-08-14  3:58   ` Matthew Brost
2025-08-15 15:25     ` Thomas Hellström
2025-08-14  4:05   ` Matthew Brost
2025-08-15 15:27     ` Thomas Hellström
2025-08-14 18:48   ` Matthew Brost
2025-08-15  9:37     ` Thomas Hellström
2025-08-13 10:51 ` [PATCH 14/15] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
2025-08-14  4:18   ` Matthew Brost
2025-08-14 13:14     ` Thomas Hellström
2025-08-14 18:39       ` Matthew Brost
2025-08-13 10:51 ` [PATCH 15/15] drm/xe: Convert pinned suspend eviction " Thomas Hellström
2025-08-13 12:13   ` Matthew Auld
2025-08-13 12:30     ` Thomas Hellström
2025-08-14 20:30   ` Matthew Brost
2025-08-15 15:29     ` Thomas Hellström
2025-08-13 11:54 ` ✗ CI.checkpatch: warning for Driver-managed " Patchwork
2025-08-13 11:55 ` ✓ CI.KUnit: success " Patchwork
2025-08-13 13:20 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-08-13 14:25 ` ✗ Xe.CI.Full: " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).