* [PATCH v2 00/16] Driver-managed exhaustive eviction
@ 2025-08-22  9:40 Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 01/16] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
                   ` (19 more replies)
  0 siblings, 20 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Exhaustive eviction means that every client should in theory be able to
allocate all graphics memory (minus pinned memory). This is done by
evicting other clients' memory.

Currently when TTM wants to evict a buffer object it will typically
trylock that buffer object. It may also optionally try a sleeping lock,
but if deadlock resolution kicks in while doing so (the locking
returns -EDEADLK), that is converted to an -ENOMEM and returned to the
caller. If there are multiple clients simultaneously wanting to evict
each other's buffer objects, there is a chance that they all end
up returning -ENOMEM.
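
Schematically, the behavior described above looks something like the
following (a paraphrase for illustration, not the actual TTM code):

/* Schematic only; not the actual TTM eviction code. */
static int lock_eviction_candidate(struct dma_resv *resv,
				   struct ww_acquire_ctx *ticket)
{
	if (dma_resv_trylock(resv))
		return 0;	/* Locked; proceed to evict. */

	if (!ticket)
		return -EBUSY;	/* No sleeping lock; skip this candidate. */

	/* Optional sleeping lock; the deadlock backoff is masked as OOM. */
	if (dma_resv_lock(resv, ticket) == -EDEADLK)
		return -ENOMEM;

	return 0;
}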

The key to resolving this is that on memory contention, lower
priority clients back off, releasing their buffer object locks and
thereby allowing their memory to be evicted. Eventually their priority
will be elevated and they will succeed. TTM has long been intending to
implement this using full drm_exec locking during eviction. Once that
is implemented, clients wanting to validate memory must pass the
drm_exec context used to lock their buffer objects down to TTM
validation. Most of this series is making sure that is done, both
on exec-type validation and on buffer object creation. The big
benefit of this approach is that it can distinguish between memory
types and avoid lock release rollbacks until really necessary. One
drawback is that it can't handle system memory contention resolved
by a shrinker.
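
Conceptually, a validating client would then look something like the
following sketch against the stock drm_exec API (the exec-aware
xe_bo_validate() is what patch 4 of this series adds):

/* Sketch only. With TTM evicting under the same exec, contention on a
 * victim's lock would surface as -EDEADLK and restart the transaction
 * instead of being converted to -ENOMEM. */
static int validate_in_transaction(struct xe_bo *bo)
{
	struct drm_exec exec;
	int err = 0;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
	drm_exec_until_all_locked(&exec) {
		err = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		err = xe_bo_validate(bo, NULL, false, &exec);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;
	}
	drm_exec_fini(&exec);

	return err;
}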

However, since TTM has yet to implement drm_exec validation, this
series, while preparing for that TTM implementation, takes a different
approach with an outer rw semaphore on top of the drm_exec retry loop.
When a client wants to allocate graphics memory, the lock is taken in
non-exclusive mode. If an OOM is hit, the locks are released and the
outer lock is retaken in exclusive mode. That ensures that on memory
contention, the client holding the exclusive lock is the only client
trying to allocate memory. It requires, however, that all clients
adhere to the same scheme.
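
In rough pseudo-C, the scheme looks like this (the rwsem name and the
helper are illustrative only; the real interface is the xe_validation
wrapper that patch 5 introduces):

/* Illustrative sketch; "val_rwsem" and this helper are made-up names. */
static int validate_with_backoff(struct xe_device *xe,
				 int (*transaction)(void *arg), void *arg)
{
	bool exclusive = false;
	int err;

retry:
	if (exclusive)
		down_write(&xe->val_rwsem);	/* Sole allocating client. */
	else
		down_read(&xe->val_rwsem);	/* Shared; the common case. */

	err = transaction(arg);		/* The inner drm_exec retry loop. */

	if (exclusive) {
		up_write(&xe->val_rwsem);
	} else {
		up_read(&xe->val_rwsem);
		if (err == -ENOMEM) {
			/* OOM hit: retry as the only allocating client. */
			exclusive = true;
			goto retry;
		}
	}

	return err;
}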

The idea is that when TTM implements drm_exec eviction, the driver-
managed scheme could be retired.

Patches 1 to 3 fix problems hit while testing.
Patch 4 identifies the code-paths where we need a drm_exec transaction.
Patch 5 introduces the wrapper with the rw-semaphore.

The rest of the patches ensure that we wrap graphics memory
allocation in the combined rw-semaphore / drm-exec loop.

As a follow-up, additional patches around suspend / resume will
be posted.

v2: (Highlights)
- Fix a number of issues discovered during review.
- Rework the signature of xe_validation_guard() (Matt Brost)
- Rework the CPU fault handler (Matt Brost)

Thomas Hellström (16):
  drm/xe/vm: Don't pin the vm_resv during validation
  drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member
  drm/xe/vm: Clear the scratch_pt pointer on error
  drm/xe: Pass down drm_exec context to validation
  drm/xe: Introduce an xe_validation wrapper around drm_exec
  drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  drm/xe: Convert SVM validation for exhaustive eviction
  drm/xe: Convert existing drm_exec transactions for exhaustive eviction
  drm/xe: Convert the CPU fault handler for exhaustive eviction
  drm/xe/display: Convert __xe_pin_fb_vma()
  drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  drm/xe: Rename ___xe_bo_create_locked()
  drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  drm/xe/sriov: Convert pf_provision_vf_lmem for exhaustive eviction
  drm/xe: Convert pinned suspend eviction for exhaustive eviction

 drivers/gpu/drm/xe/Makefile                   |   1 +
 .../compat-i915-headers/gem/i915_gem_stolen.h |  24 +-
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +-
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |  70 +-
 drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
 drivers/gpu/drm/xe/display/xe_plane_initial.c |   4 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |  36 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  24 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |  66 +-
 drivers/gpu/drm/xe/xe_bo.c                    | 738 +++++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h                    |  56 +-
 drivers/gpu/drm/xe/xe_device.c                |   2 +
 drivers/gpu/drm/xe/xe_device_types.h          |   3 +
 drivers/gpu/drm/xe/xe_dma_buf.c               |  72 +-
 drivers/gpu/drm/xe/xe_eu_stall.c              |   5 +-
 drivers/gpu/drm/xe/xe_exec.c                  |  26 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
 drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
 drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c          |  26 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |  49 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 +-
 drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
 drivers/gpu/drm/xe/xe_migrate.c               |  20 +-
 drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
 drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
 drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +-
 drivers/gpu/drm/xe/xe_svm.c                   |  97 +--
 drivers/gpu/drm/xe/xe_validation.c            | 278 +++++++
 drivers/gpu/drm/xe/xe_validation.h            | 191 +++++
 drivers/gpu/drm/xe/xe_vm.c                    | 296 +++----
 drivers/gpu/drm/xe/xe_vm.h                    |  53 +-
 drivers/gpu/drm/xe/xe_vm_types.h              |  32 +-
 37 files changed, 1659 insertions(+), 683 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_validation.c
 create mode 100644 drivers/gpu/drm/xe/xe_validation.h

-- 
2.50.1



* [PATCH v2 01/16] drm/xe/vm: Don't pin the vm_resv during validation
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 02/16] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

The pinning has the odd side-effect that unlocking *any* resv
during validation triggers an "unlocking pinned lock" warning.

Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: 9d5558649f68 ("drm/xe: Rework eviction rejection of bound external bos")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c |  5 ++---
 drivers/gpu/drm/xe/xe_vm.h | 15 ++-------------
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 6fea39842e1e..11eaf3b06766 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2468,7 +2468,6 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 		.no_wait_gpu = false,
 		.gfp_retry_mayfail = true,
 	};
-	struct pin_cookie cookie;
 	int ret;
 
 	if (vm) {
@@ -2479,10 +2478,10 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 		ctx.resv = xe_vm_resv(vm);
 	}
 
-	cookie = xe_vm_set_validating(vm, allow_res_evict);
+	xe_vm_set_validating(vm, allow_res_evict);
 	trace_xe_bo_validate(bo);
 	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
-	xe_vm_clear_validating(vm, allow_res_evict, cookie);
+	xe_vm_clear_validating(vm, allow_res_evict);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 2f213737c7e5..2ecb417c19a2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -315,22 +315,14 @@ void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
  * Register this task as currently making bos resident for the vm. Intended
  * to avoid eviction by the same task of shared bos bound to the vm.
  * Call with the vm's resv lock held.
- *
- * Return: A pin cookie that should be used for xe_vm_clear_validating().
  */
-static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
-						     bool allow_res_evict)
+static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
 {
-	struct pin_cookie cookie = {};
-
 	if (vm && !allow_res_evict) {
 		xe_vm_assert_held(vm);
-		cookie = lockdep_pin_lock(&xe_vm_resv(vm)->lock.base);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
 		WRITE_ONCE(vm->validating, current);
 	}
-
-	return cookie;
 }
 
 /**
@@ -338,17 +330,14 @@ static inline struct pin_cookie xe_vm_set_validating(struct xe_vm *vm,
  * @vm: Pointer to the vm or NULL
  * @allow_res_evict: Eviction from @vm was allowed. Must be set to the same
  * value as for xe_vm_set_validation().
- * @cookie: Cookie obtained from xe_vm_set_validating().
  *
  * Register this task as currently making bos resident for the vm. Intended
  * to avoid eviction by the same task of shared bos bound to the vm.
  * Call with the vm's resv lock held.
  */
-static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict,
-					  struct pin_cookie cookie)
+static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict)
 {
 	if (vm && !allow_res_evict) {
-		lockdep_unpin_lock(&xe_vm_resv(vm)->lock.base, cookie);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
 		WRITE_ONCE(vm->validating, NULL);
 	}
-- 
2.50.1



* [PATCH v2 02/16] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 01/16] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 03/16] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

This member is set when exporting using prime. However,
xe_gem_prime_export() alone doesn't set it, since that is done
later in the prime export flow.
For the test, set it manually, and remove the hack that set it
temporarily when it was really needed.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index c53f67ce4b0a..cde9530bef8c 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -57,16 +57,12 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 		return;
 
 	/*
-	 * Evict exporter. Note that the gem object dma_buf member isn't
-	 * set from xe_gem_prime_export(), and it's needed for the move_notify()
-	 * functionality, so hack that up here. Evicting the exported bo will
+	 * Evict exporter. Evicting the exported bo will
 	 * evict also the imported bo through the move_notify() functionality if
 	 * importer is on a different device. If they're on the same device,
 	 * the exporter and the importer should be the same bo.
 	 */
-	swap(exported->ttm.base.dma_buf, dmabuf);
 	ret = xe_bo_evict(exported);
-	swap(exported->ttm.base.dma_buf, dmabuf);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
@@ -139,6 +135,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 			   PTR_ERR(dmabuf));
 		goto out;
 	}
+	bo->ttm.base.dma_buf = dmabuf;
 
 	import = xe_gem_prime_import(&xe->drm, dmabuf);
 	if (!IS_ERR(import)) {
@@ -186,6 +183,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 		KUNIT_FAIL(test, "dynamic p2p attachment failed with err=%ld\n",
 			   PTR_ERR(import));
 	}
+	bo->ttm.base.dma_buf = NULL;
 	dma_buf_put(dmabuf);
 out:
 	drm_gem_object_put(&bo->ttm.base);
-- 
2.50.1



* [PATCH v2 03/16] drm/xe/vm: Clear the scratch_pt pointer on error
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 01/16] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 02/16] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation Thomas Hellström
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Brian Welty, Rodrigo Vivi, Lucas De Marchi,
	stable, Matthew Brost, Joonas Lahtinen, Jani Nikula,
	Maarten Lankhorst, Matthew Auld

Avoid triggering a dereference of an error pointer on cleanup in
xe_vm_free_scratch() by clearing any scratch_pt error pointer.

Fixes: 06951c2ee72d ("drm/xe: Use NULL PTEs as scratch PTEs")
Cc: Brian Welty <brian.welty@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index f35d69c0b4c6..529b6767caac 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1635,8 +1635,12 @@ static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
 
 	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
 		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
-		if (IS_ERR(vm->scratch_pt[id][i]))
-			return PTR_ERR(vm->scratch_pt[id][i]);
+		if (IS_ERR(vm->scratch_pt[id][i])) {
+			int err = PTR_ERR(vm->scratch_pt[id][i]);
+
+			vm->scratch_pt[id][i] = NULL;
+			return err;
+		}
 
 		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
 	}
-- 
2.50.1



* [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (2 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 03/16] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22 19:59   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

We want all validation (potential backing store allocation) to be part
of a drm_exec transaction. Therefore add a drm_exec pointer argument
to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
will deal with making all (or nearly all) calls to these functions
part of a drm_exec transaction. In the meantime, define special values
of the drm_exec pointer:

XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
has not been done yet.
XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
the drm_exec context to be passed down to map_attachment, where
validation takes place.
XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
eviction isn't crucial and the ROI of converting those is very
small.

For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
a lockdep check that a drm_exec transaction can indeed start at the
location where the macro is expanded. This is to encourage
developers to take this into consideration early in the code
development process.
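
As an illustration, the special values can be tagged pointers, with the
macro expansion performing the lockdep check. The helper and the values
below are made up for this sketch; the actual definitions live in the
xe_validation.h added by this patch:

/* Sketch only; not the actual xe_validation.h definitions. */
static inline void xe_validation_lockdep_check(void)
{
#ifdef CONFIG_PROVE_LOCKING
	struct ww_acquire_ctx ticket;

	/*
	 * An empty ww transaction: lockdep complains here if a drm_exec
	 * transaction could not legally start, e.g. because a dma_resv
	 * lock is already held.
	 */
	ww_acquire_init(&ticket, &reservation_ww_class);
	ww_acquire_done(&ticket);
	ww_acquire_fini(&ticket);
#endif
}

#define XE_VALIDATION_UNIMPLEMENTED \
	(xe_validation_lockdep_check(), (struct drm_exec *)ERR_PTR(-ENOSYS))
#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(-EOPNOTSUPP))
#define XE_VALIDATION_OPT_OUT \
	(xe_validation_lockdep_check(), (struct drm_exec *)ERR_PTR(-ENODATA))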

v2:
- Fix xe_vm_set_validation_exec() imbalance. Add an assert that
  hopefully catches future instances of this (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   1 +
 .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
 drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
 drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
 drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
 drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
 drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
 drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
 drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c          |   6 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
 drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
 drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
 drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
 drivers/gpu/drm/xe/xe_vm.c                    |  24 +++-
 drivers/gpu/drm/xe/xe_vm.h                    |  34 ++++-
 drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
 20 files changed, 402 insertions(+), 105 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_validation.c
 create mode 100644 drivers/gpu/drm/xe/xe_validation.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 8e0c3412a757..8ee7d275128d 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -127,6 +127,7 @@ xe-y += xe_bb.o \
 	xe_tuning.o \
 	xe_uc.o \
 	xe_uc_fw.o \
+	xe_validation.o \
 	xe_vm.o \
 	xe_vram.o \
 	xe_vram_freq.o \
diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
index 41d39d67817a..1ce1e9da975b 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
@@ -8,6 +8,7 @@
 
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_res_cursor.h"
+#include "xe_validation.h"
 
 struct xe_bo;
 
@@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 						       u32 size, u32 align,
 						       u32 start, u32 end)
 {
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
 	int err;
 	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
@@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 
 	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
 				       NULL, size, start, end,
-				       ttm_bo_type_kernel, flags, 0);
+				       ttm_bo_type_kernel, flags, 0, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		bo = NULL;
 		return err;
 	}
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	xe_bo_unlock_vm_held(bo);
 
 	if (err) {
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index f1f8b5ab53ef..4b0748e6fdd6 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	int ret;
 
 	if (!vma)
@@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 		goto err;
 
 	if (IS_DGFX(xe))
-		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
+		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
 	else
-		ret = xe_bo_validate(bo, NULL, true);
+		ret = xe_bo_validate(bo, NULL, true, exec);
 	if (!ret)
 		ttm_bo_pin(&bo->ttm);
 	ttm_bo_unreserve(&bo->ttm);
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index bb469096d072..06ceba6c3c25 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -23,7 +23,7 @@
 
 static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 			    bool clear, u64 get_val, u64 assign_val,
-			    struct kunit *test)
+			    struct kunit *test, struct drm_exec *exec)
 {
 	struct dma_fence *fence;
 	struct ttm_tt *ttm;
@@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 	u32 offset;
 
 	/* Move bo to VRAM if not already there. */
-	ret = xe_bo_validate(bo, NULL, false);
+	ret = xe_bo_validate(bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate bo.\n");
 		return ret;
@@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
 	}
 
 	/* Evict to system. CCS data should be copied. */
-	ret = xe_bo_evict(bo);
+	ret = xe_bo_evict(bo, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to evict bo.\n");
 		return ret;
@@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 
 	/* TODO: Sanity check */
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
+	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 
 	if (IS_DGFX(xe))
 		kunit_info(test, "Testing vram id %u\n", tile->id);
@@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 
 	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
 	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
-			       test);
+			       test, exec);
 	if (ret)
 		goto out_unlock;
 
 	kunit_info(test, "Verifying that CCS data survives migration.\n");
 	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
-			       0xdeadbeefdeadbeefULL, test);
+			       0xdeadbeefdeadbeefULL, test, exec);
 	if (ret)
 		goto out_unlock;
 
 	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
-	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
+	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
 
 out_unlock:
 	xe_bo_unlock(bo);
@@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 	struct xe_bo *bo, *external;
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
 	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
+	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 	struct xe_gt *__gt;
 	int err, i, id;
 
@@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		}
 
 		xe_bo_lock(external, false);
-		err = xe_bo_pin_external(external);
+		err = xe_bo_pin_external(external, exec);
 		xe_bo_unlock(external);
 		if (err) {
 			KUNIT_FAIL(test, "external bo pin err=%pe\n",
@@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 		if (i) {
 			down_read(&vm->lock);
 			xe_vm_lock(vm, false);
-			err = xe_bo_validate(bo, bo->vm, false);
+			err = xe_bo_validate(bo, bo->vm, false, exec);
 			xe_vm_unlock(vm);
 			up_read(&vm->lock);
 			if (err) {
@@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 				goto cleanup_all;
 			}
 			xe_bo_lock(external, false);
-			err = xe_bo_validate(external, NULL, false);
+			err = xe_bo_validate(external, NULL, false, exec);
 			xe_bo_unlock(external);
 			if (err) {
 				KUNIT_FAIL(test, "external bo valid err=%pe\n",
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index cde9530bef8c..965dd3280468 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
 }
 
 static void check_residency(struct kunit *test, struct xe_bo *exported,
-			    struct xe_bo *imported, struct dma_buf *dmabuf)
+			    struct xe_bo *imported, struct dma_buf *dmabuf,
+			    struct drm_exec *exec)
 {
 	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
 	u32 mem_type;
@@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 	 * importer is on a different device. If they're on the same device,
 	 * the exporter and the importer should be the same bo.
 	 */
-	ret = xe_bo_evict(exported);
+	ret = xe_bo_evict(exported, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
@@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 	}
 
 	/* Re-validate the importer. This should move also exporter in. */
-	ret = xe_bo_validate(imported, NULL, false);
+	ret = xe_bo_validate(imported, NULL, false, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
@@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 			KUNIT_FAIL(test,
 				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
 		} else {
+			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 			int err;
 
 			/* Is everything where we expect it to be? */
 			xe_bo_lock(import_bo, false);
-			err = xe_bo_validate(import_bo, NULL, false);
+			err = xe_bo_validate(import_bo, NULL, false, exec);
 
 			/* Pinning in VRAM is not allowed. */
 			if (!is_dynamic(params) &&
@@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 						  err == -ERESTARTSYS);
 
 			if (!err)
-				check_residency(test, bo, import_bo, dmabuf);
+				check_residency(test, bo, import_bo, dmabuf, exec);
 			xe_bo_unlock(import_bo);
 		}
 		drm_gem_object_put(import);
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index edd1e701aa1c..dfb445d09759 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
 		} } while (0)
 
 static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
-		      struct kunit *test, u32 region)
+		      struct kunit *test, u32 region, struct drm_exec *exec)
 {
 	struct xe_device *xe = tile_to_xe(m->tile);
 	u64 retval, expected = 0;
@@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 						   ttm_bo_type_kernel,
 						   region |
 						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						   XE_BO_FLAG_PINNED);
+						   XE_BO_FLAG_PINNED,
+						   exec);
 	if (IS_ERR(remote)) {
 		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
 			   str, remote);
 		return;
 	}
 
-	err = xe_bo_validate(remote, NULL, false);
+	err = xe_bo_validate(remote, NULL, false, exec);
 	if (err) {
 		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
 			   str, err);
@@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 }
 
 static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
-			     struct kunit *test)
+			     struct drm_exec *exec, struct kunit *test)
 {
-	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
+	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
 }
 
 static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
-			   struct kunit *test)
+			   struct drm_exec *exec, struct kunit *test)
 {
 	u32 region;
 
@@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
 		region = XE_BO_FLAG_VRAM1;
 	else
 		region = XE_BO_FLAG_VRAM0;
-	test_copy(m, bo, test, region);
+	test_copy(m, bo, test, region, exec);
 }
 
-static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
+static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
+				   struct drm_exec *exec)
 {
 	struct xe_tile *tile = m->tile;
 	struct xe_device *xe = tile_to_xe(tile);
@@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	check(retval, expected, "Command clear small last value", test);
 
 	kunit_info(test, "Copying small buffer object to system\n");
-	test_copy_sysmem(m, tiny, test);
+	test_copy_sysmem(m, tiny, exec, test);
 	if (xe->info.tile_count > 1) {
 		kunit_info(test, "Copying small buffer object to other vram\n");
-		test_copy_vram(m, tiny, test);
+		test_copy_vram(m, tiny, exec, test);
 	}
 
 	/* Clear a big bo */
@@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	check(retval, expected, "Command clear big last value", test);
 
 	kunit_info(test, "Copying big buffer object to system\n");
-	test_copy_sysmem(m, big, test);
+	test_copy_sysmem(m, big, exec, test);
 	if (xe->info.tile_count > 1) {
 		kunit_info(test, "Copying big buffer object to other vram\n");
-		test_copy_vram(m, big, test);
+		test_copy_vram(m, big, exec, test);
 	}
 
 out:
@@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
 
 	for_each_tile(tile, xe, id) {
 		struct xe_migrate *m = tile->migrate;
+		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
 
 		kunit_info(test, "Testing tile id %d.\n", id);
 		xe_vm_lock(m->q->vm, false);
-		xe_migrate_sanity_test(m, test);
+		xe_migrate_sanity_test(m, test, exec);
 		xe_vm_unlock(m->q->vm);
 	}
 
@@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
 
 static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
-			 struct kunit *test)
+			 struct drm_exec *exec, struct kunit *test)
 {
 	struct dma_fence *fence;
 	u64 expected, retval;
@@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 	dma_fence_put(fence);
 
 	kunit_info(test, "Evict vram buffer object\n");
-	ret = xe_bo_evict(vram_bo);
+	ret = xe_bo_evict(vram_bo, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to evict bo.\n");
 		return;
@@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
 	dma_fence_put(fence);
 
 	kunit_info(test, "Restore vram buffer object\n");
-	ret = xe_bo_validate(vram_bo, NULL, false);
+	ret = xe_bo_validate(vram_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
 		return;
@@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 {
 	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
 	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
+	struct drm_exec *exec;
 	long ret;
 
 	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
@@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 		return;
 	}
 
+	exec = XE_VALIDATION_OPT_OUT;
 	xe_bo_lock(sys_bo, false);
-	ret = xe_bo_validate(sys_bo, NULL, false);
+	ret = xe_bo_validate(sys_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
 		goto free_sysbo;
@@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	xe_bo_lock(ccs_bo, false);
-	ret = xe_bo_validate(ccs_bo, NULL, false);
+	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
 		goto free_ccsbo;
@@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	xe_bo_lock(vram_bo, false);
-	ret = xe_bo_validate(vram_bo, NULL, false);
+	ret = xe_bo_validate(vram_bo, NULL, false, exec);
 	if (ret) {
 		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
 		goto free_vrambo;
@@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 
 	test_clear(xe, tile, sys_bo, vram_bo, test);
-	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
+	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
 	xe_bo_unlock(vram_bo);
 
 	xe_bo_lock(vram_bo, false);
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 11eaf3b06766..e71addf51ed0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *backup;
 	int ret = 0;
 
@@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
 					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
 					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					XE_BO_FLAG_PINNED);
+					XE_BO_FLAG_PINNED, exec);
 	if (IS_ERR(backup)) {
 		ret = PTR_ERR(backup);
 		goto out_unlock_bo;
@@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
 int xe_bo_evict_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *backup = bo->backup_obj;
 	bool backup_created = false;
 	bool unmap = false;
@@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 						NULL, xe_bo_size(bo),
 						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
 						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						XE_BO_FLAG_PINNED);
+						XE_BO_FLAG_PINNED, exec);
 		if (IS_ERR(backup)) {
 			ret = PTR_ERR(backup);
 			goto out_unlock_bo;
@@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	struct xe_device *xe = to_xe_device(ddev);
 	struct xe_bo *bo = ttm_to_xe_bo(tbo);
 	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
+	struct drm_exec *exec;
 	vm_fault_t ret;
 	int idx;
 
 	if (needs_rpm)
 		xe_pm_runtime_get(xe);
 
+	exec = XE_VALIDATION_UNIMPLEMENTED;
 	ret = ttm_bo_vm_reserve(tbo, vmf);
 	if (ret)
 		goto out;
@@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	if (drm_dev_enter(ddev, &idx)) {
 		trace_xe_bo_cpu_fault(bo);
 
+		xe_validation_assert_exec(xe, exec, &tbo->base);
 		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 					       TTM_BO_VM_NUM_PREFAULT);
 		drm_dev_exit(idx);
@@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
 	kfree(bo);
 }
 
+/**
+ * ___xe_bo_create_locked() - Initialize or create an xe_bo.
+ * @xe: The xe device.
+ * @bo: An already allocated buffer object or NULL
+ * if the function should allocate a new one.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @resv: Pointer to a locked shared reservation object to use for this bo,
+ * or NULL for the xe_bo to use its own.
+ * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
+ * @size: The storage size to use for the bo.
+ * @cpu_caching: The cpu caching used for system memory backing store.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Initialize or create an xe buffer object. On failure, any allocated buffer
+ * object passed in @bo will have been unreferenced.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				     struct xe_tile *tile, struct dma_resv *resv,
 				     struct ttm_lru_bulk_move *bulk, size_t size,
 				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags)
+				     u32 flags, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 		ctx.resv = resv;
 	}
 
+	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
 	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
 		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
 		if (WARN_ON(err)) {
@@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
 		      struct xe_tile *tile, struct xe_vm *vm,
 		      size_t size, u64 start, u64 end,
 		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
-		      u64 alignment)
+		      u64 alignment, struct drm_exec *exec)
 {
 	struct xe_bo *bo = NULL;
 	int err;
@@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
 				    vm && !xe_vm_in_fault_mode(vm) &&
 				    flags & XE_BO_FLAG_USER ?
 				    &vm->lru_bulk_move : NULL, size,
-				    cpu_caching, type, flags);
+				    cpu_caching, type, flags, exec);
 	if (IS_ERR(bo))
 		return bo;
 
@@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
 
 			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
 				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
-							   start + xe_bo_size(bo), U64_MAX);
+							   start + xe_bo_size(bo), U64_MAX,
+							   exec);
 			} else {
-				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
+				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
 			}
 			if (err)
 				goto err_unlock_put_bo;
@@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_bo_create_locked_range() - Create a BO with range- and alignment options
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @start: Start of fixed VRAM range or 0.
+ * @end: End of fixed VRAM range or ~0ULL.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Create an Xe BO with range- and alignment options. If @start and @end indicate
+ * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
+ * only. The @alignment parameter can be used for GGTT alignment.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
 			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags, u64 alignment)
+			  enum ttm_bo_type type, u32 flags, u64 alignment,
+			  struct drm_exec *exec)
 {
 	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
-				     flags, alignment);
+				     flags, alignment, exec);
 }
 
+/**
+ * xe_bo_create_locked() - Create a BO
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Create a locked xe BO with no range- nor alignment restrictions.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
-				  enum ttm_bo_type type, u32 flags)
+				  enum ttm_bo_type type, u32 flags,
+				  struct drm_exec *exec)
 {
 	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
-				     flags, 0);
+				     flags, 0, exec);
 }
 
 struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
@@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
 				u16 cpu_caching,
 				u32 flags)
 {
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
 						 cpu_caching, ttm_bo_type_device,
-						 flags | XE_BO_FLAG_USER, 0);
+						 flags | XE_BO_FLAG_USER, 0, exec);
 	if (!IS_ERR(bo))
 		xe_bo_unlock_vm_held(bo);
 
@@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
 			   struct xe_vm *vm, size_t size,
 			   enum ttm_bo_type type, u32 flags)
 {
-	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
 
 	if (!IS_ERR(bo))
 		xe_bo_unlock_vm_held(bo);
@@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 	int err;
 	u64 start = offset == ~0ull ? 0 : offset;
 	u64 end = offset == ~0ull ? offset : start + size;
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
 
 	if (flags & XE_BO_FLAG_STOLEN &&
 	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
@@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 
 	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
 				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
-				       alignment);
+				       alignment, exec);
 	if (IS_ERR(bo))
 		return bo;
 
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	if (err)
 		goto err_put;
 
@@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
 /**
  * xe_bo_pin_external - pin an external BO
  * @bo: buffer object to be pinned
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
  * BO. Unique call compared to xe_bo_pin as this function has it own set of
@@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
  *
  * Returns 0 for success, negative error code otherwise.
  */
-int xe_bo_pin_external(struct xe_bo *bo)
+int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	int err;
@@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	xe_assert(xe, xe_bo_is_user(bo));
 
 	if (!xe_bo_is_pinned(bo)) {
-		err = xe_bo_validate(bo, NULL, false);
+		err = xe_bo_validate(bo, NULL, false, exec);
 		if (err)
 			return err;
 
@@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	return 0;
 }
 
-int xe_bo_pin(struct xe_bo *bo)
+/**
+ * xe_bo_pin() - Pin a kernel bo after potentially migrating it
+ * @bo: The kernel bo to pin.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
+ *
+ * Attempts to migrate a bo to @bo->placement. If that succeeds,
+ * pins the bo.
+ *
+ * Return: %0 on success, negative error code on migration failure.
+ */
+int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct ttm_place *place = &bo->placements[0];
 	struct xe_device *xe = xe_bo_device(bo);
@@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
 	/* We only expect at most 1 pin */
 	xe_assert(xe, !xe_bo_is_pinned(bo));
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, NULL, false, exec);
 	if (err)
 		return err;
 
@@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
  *      NULL. Used together with @allow_res_evict.
  * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
  *                   reservation object.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Make sure the bo is in allowed placement, migrating it if necessary. If
  * needed, other bos will be evicted. If bos selected for eviction shares
@@ -2461,7 +2541,8 @@ void xe_bo_unpin(struct xe_bo *bo)
  * Return: 0 on success, negative error code on failure. May return
  * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
  */
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
+		   struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
 
 	xe_vm_set_validating(vm, allow_res_evict);
 	trace_xe_bo_validate(bo);
+	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
 	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
 	xe_vm_clear_validating(vm, allow_res_evict);
 
@@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
  * xe_bo_migrate - Migrate an object to the desired region id
  * @bo: The buffer object to migrate.
  * @mem_type: The TTM region type to migrate to.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Attempt to migrate the buffer object to the desired memory region. The
  * buffer object may not be pinned, and must be locked.
@@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
  * Return: 0 on success. Negative error code on failure. In particular may
  * return -EINTR or -ERESTARTSYS if signal pending.
  */
-int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
+int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
 	struct ttm_operation_ctx ctx = {
@@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
 		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
 	}
 
+	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
 	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
 }
 
 /**
  * xe_bo_evict - Evict an object to evict placement
  * @bo: The buffer object to migrate.
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * On successful completion, the object memory will be moved to evict
  * placement. This function blocks until the object has been fully moved.
  *
  * Return: 0 on success. Negative error code on failure.
  */
-int xe_bo_evict(struct xe_bo *bo)
+int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = false,
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 8cce413b5235..b1b6cb622d71 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -10,6 +10,7 @@
 
 #include "xe_bo_types.h"
 #include "xe_macros.h"
+#include "xe_validation.h"
 #include "xe_vm_types.h"
 #include "xe_vm.h"
 #include "xe_vram_types.h"
@@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				     struct xe_tile *tile, struct dma_resv *resv,
 				     struct ttm_lru_bulk_move *bulk, size_t size,
 				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags);
+				     u32 flags, struct drm_exec *exec);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
 			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags, u64 alignment);
+			  enum ttm_bo_type type, u32 flags, u64 alignment,
+			  struct drm_exec *exec);
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
-				  enum ttm_bo_type type, u32 flags);
+				  enum ttm_bo_type type, u32 flags,
+				  struct drm_exec *exec);
 struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
 			   struct xe_vm *vm, size_t size,
 			   enum ttm_bo_type type, u32 flags);
@@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
 	}
 }
 
-int xe_bo_pin_external(struct xe_bo *bo);
-int xe_bo_pin(struct xe_bo *bo);
+int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
+int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
 void xe_bo_unpin_external(struct xe_bo *bo);
 void xe_bo_unpin(struct xe_bo *bo);
-int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
+int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
+		   struct drm_exec *exec);
 
 static inline bool xe_bo_is_pinned(struct xe_bo *bo)
 {
@@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
 
 bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
 
-int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
-int xe_bo_evict(struct xe_bo *bo);
+int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
+int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
 
 int xe_bo_evict_pinned(struct xe_bo *bo);
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 346f857f3837..78a827d4e726 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 	struct drm_gem_object *obj = attach->dmabuf->priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	struct xe_device *xe = xe_bo_device(bo);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 	int ret;
 
 	/*
@@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return -EINVAL;
 	}
 
-	ret = xe_bo_migrate(bo, XE_PL_TT);
+	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
 	if (ret) {
 		if (ret != -EINTR && ret != -ERESTARTSYS)
 			drm_dbg(&xe->drm,
@@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 		return ret;
 	}
 
-	ret = xe_bo_pin_external(bo);
+	ret = xe_bo_pin_external(bo, exec);
 	xe_assert(xe, !ret);
 
 	return 0;
@@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 	struct dma_buf *dma_buf = attach->dmabuf;
 	struct drm_gem_object *obj = dma_buf->priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 	struct sg_table *sgt;
 	int r = 0;
 
@@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 
 	if (!xe_bo_is_pinned(bo)) {
 		if (!attach->peer2peer)
-			r = xe_bo_migrate(bo, XE_PL_TT);
+			r = xe_bo_migrate(bo, XE_PL_TT, exec);
 		else
-			r = xe_bo_validate(bo, NULL, false);
+			r = xe_bo_validate(bo, NULL, false, exec);
 		if (r)
 			return ERR_PTR(r);
 	}
@@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	bool reads =  (direction == DMA_BIDIRECTIONAL ||
 		       direction == DMA_FROM_DEVICE);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 
 	if (!reads)
 		return 0;
 
 	/* Can we do interruptible lock here? */
 	xe_bo_lock(bo, false);
-	(void)xe_bo_migrate(bo, XE_PL_TT);
+	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
 	xe_bo_unlock(bo);
 
 	return 0;
@@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 {
 	struct dma_resv *resv = dma_buf->resv;
 	struct xe_device *xe = to_xe_device(dev);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
 	int ret;
 
 	dma_resv_lock(resv, NULL);
 	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
 				    0, /* Will require 1way or 2way for vm_bind */
-				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
+				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
 	if (IS_ERR(bo)) {
 		ret = PTR_ERR(bo);
 		goto error;
@@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
 {
 	struct drm_gem_object *obj = attach->importer_priv;
 	struct xe_bo *bo = gem_to_xe_bo(obj);
+	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
 
-	XE_WARN_ON(xe_bo_evict(bo));
+	XE_WARN_ON(xe_bo_evict(bo, exec));
 }
 
 static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 44364c042ad7..0bcb4fb9a10e 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -97,9 +97,13 @@
 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
 {
 	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
+	int ret;
 
 	/* The fence slot added here is intended for the exec sched job. */
-	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
+	xe_vm_set_validation_exec(vm, &vm_exec->exec);
+	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
+	xe_vm_set_validation_exec(vm, NULL);
+	return ret;
 }
 
 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index e03222f5ac5a..a47c0131956b 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
 }
 
 static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-				  u64 start, u64 end)
+				  u64 start, u64 end, struct drm_exec *exec)
 {
 	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
 	u8 tile_id = ggtt->tile->id;
@@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 		return 0;
 	}
 
-	err = xe_bo_validate(bo, NULL, false);
+	err = xe_bo_validate(bo, NULL, false, exec);
 	if (err)
 		return err;
 
@@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
  * @bo: the &xe_bo to be inserted
  * @start: address where it will be inserted
  * @end: end of the range where it will be inserted
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Return: 0 on success or a negative error code on failure.
  */
 int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-			 u64 start, u64 end)
+			 u64 start, u64 end, struct drm_exec *exec)
 {
-	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
+	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
 }
 
 /**
  * xe_ggtt_insert_bo - Insert BO into GGTT
  * @ggtt: the &xe_ggtt where bo will be inserted
  * @bo: the &xe_bo to be inserted
+ * @exec: The drm_exec transaction to use for exhaustive eviction.
  *
  * Return: 0 on success or a negative error code on failure.
  */
-int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
+int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
+		      struct drm_exec *exec)
 {
-	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
+	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index fbe1e397d05d..75fc7a1efea7 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -10,6 +10,7 @@
 
 struct drm_printer;
 struct xe_tile;
+struct drm_exec;
 
 struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
 int xe_ggtt_init_early(struct xe_ggtt *ggtt);
@@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
 void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
 		    struct xe_bo *bo, u16 pat_index);
 void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
-int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
+int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
 int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
-			 u64 start, u64 end);
+			 u64 start, u64 end, struct drm_exec *exec);
 void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
 u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index ab43dec52776..4133b9b78f7d 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
 		}
 
 		/* Migrate to VRAM, move should invalidate the VMA first */
-		err = xe_bo_migrate(bo, vram->placement);
+		err = xe_bo_migrate(bo, vram->placement, exec);
 		if (err)
 			return err;
 	} else if (bo) {
 		/* Create backing store if needed */
-		err = xe_bo_validate(bo, vm, true);
+		err = xe_bo_validate(bo, vm, true, exec);
 		if (err)
 			return err;
 	}
@@ -150,7 +150,9 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 
 		/* Bind VMA only to the GT that has faulted */
 		trace_xe_vma_pf_bind(vma);
+		xe_vm_set_validation_exec(vm, &exec);
 		fence = xe_vma_rebind(vm, vma, BIT(tile->id));
+		xe_vm_set_validation_exec(vm, NULL);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
 			if (xe_vm_validate_should_retry(&exec, err, &end))
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index c8f0320d032f..906011671b60 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
 static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo;
@@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
 				 XE_BO_FLAG_NEEDS_2M |
 				 XE_BO_FLAG_PINNED |
-				 XE_BO_FLAG_PINNED_LATE_RESTORE);
+				 XE_BO_FLAG_PINNED_LATE_RESTORE,
+				 exec);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
-	err = xe_bo_pin(bo);
+	err = xe_bo_pin(bo, exec);
 	xe_bo_unlock(bo);
 	if (unlikely(err)) {
 		xe_bo_put(bo);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index e35c6d4def20..39e3aa6df25a 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	struct device *dev = xe->drm.dev;
 	struct drm_buddy_block *block;
 	struct list_head *blocks;
+	struct drm_exec *exec;
 	struct xe_bo *bo;
 	ktime_t time_end = 0;
 	int err, idx;
@@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 		return -ENODEV;
 
 	xe_pm_runtime_get(xe);
+	exec = XE_VALIDATION_UNIMPLEMENTED;
 
  retry:
 	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
 				 ttm_bo_type_device,
 				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
-				 XE_BO_FLAG_CPU_ADDR_MIRROR);
+				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		if (xe_vm_validate_should_retry(NULL, err, &time_end))
diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
new file mode 100644
index 000000000000..cc0684d24e02
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+#include "xe_bo.h"
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+
+#include "xe_assert.h"
+#include "xe_validation.h"
+
+#ifdef CONFIG_DRM_XE_DEBUG
+/**
+ * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
+ * for validation.
+ * @xe: Pointer to the xe device.
+ * @exec: The drm_exec pointer to check.
+ * @obj: Pointer to the object subject to validation.
+ *
+ * NULL exec pointers are not allowed.
+ * For XE_VALIDATION_UNIMPLEMENTED, no checking is done.
+ * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
+ * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
+ * validation is a dma-buf, for which support for ww locking is
+ * not in place in the dma-buf layer.
+ */
+void xe_validation_assert_exec(const struct xe_device *xe,
+			       const struct drm_exec *exec,
+			       const struct drm_gem_object *obj)
+{
+	xe_assert(xe, exec);
+	if (IS_ERR(exec)) {
+		switch (PTR_ERR(exec)) {
+		case __XE_VAL_UNIMPLEMENTED:
+			break;
+		case __XE_VAL_UNSUPPORTED:
+			xe_assert(xe, !!obj->dma_buf);
+			break;
+#if IS_ENABLED(CONFIG_KUNIT)
+		case __XE_VAL_OPT_OUT:
+			xe_assert(xe, current->kunit_test);
+			break;
+#endif
+		default:
+			xe_assert(xe, false);
+		}
+	}
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
new file mode 100644
index 000000000000..db50feacad7a
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_validation.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+#ifndef _XE_VALIDATION_H_
+#define _XE_VALIDATION_H_
+
+#include <linux/dma-resv.h>
+#include <linux/types.h>
+
+struct drm_exec;
+struct drm_gem_object;
+struct xe_device;
+
+#ifdef CONFIG_PROVE_LOCKING
+/**
+ * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
+ * be initialized at this point.
+ */
+static inline void xe_validation_lockdep(void)
+{
+	struct ww_acquire_ctx ticket;
+
+	ww_acquire_init(&ticket, &reservation_ww_class);
+	ww_acquire_fini(&ticket);
+}
+#else
+static inline void xe_validation_lockdep(void)
+{
+}
+#endif
+
+/*
+ * Various values of the drm_exec pointer where we've not (yet)
+ * implemented full ww locking.
+ *
+ * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
+ * A lockdep check is made to assure that a drm_exec locking
+ * transaction can actually take place where the macro is
+ * used. If this asserts, the exec pointer needs to be assigned
+ * higher up in the callchain and passed down.
+ *
+ * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
+ * the dma-buf layer doesn't support WW locking.
+ *
+ * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
+ * exhaustive eviction isn't necessary.
+ */
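+
+/*
+ * Example of a not-yet-converted call site (cf. the pf_provision_vf_lmem()
+ * hunk earlier in this series):
+ *
+ *	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+ *	...
+ *	err = xe_bo_pin(bo, exec);
+ */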
+#define __XE_VAL_UNIMPLEMENTED -EINVAL
+#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
+				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
+
+#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
+#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
+
+#define __XE_VAL_OPT_OUT -ENOMEM
+#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
+			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
+#ifdef CONFIG_DRM_XE_DEBUG
+void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
+			       const struct drm_gem_object *obj);
+#else
+#define xe_validation_assert_exec(_xe, _exec, _obj)	\
+	do {						\
+		(void)_xe; (void)_exec; (void)_obj;	\
+	} while (0)
+#endif
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 529b6767caac..f1e74959f8ff 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
 			       &vm->rebind_list);
 
-	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
+	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
 	if (ret)
 		return ret;
 
@@ -528,7 +528,9 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	if (err)
 		goto out_unlock;
 
+	xe_vm_set_validation_exec(vm, &exec);
 	err = xe_vm_rebind(vm, true);
+	xe_vm_set_validation_exec(vm, NULL);
 	if (err)
 		goto out_unlock;
 
@@ -2896,7 +2898,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
 			err = drm_exec_lock_obj(exec, &bo->ttm.base);
 		if (!err && validate)
 			err = xe_bo_validate(bo, vm,
-					     !xe_vm_in_preempt_fence_mode(vm));
+					     !xe_vm_in_preempt_fence_mode(vm), exec);
 	}
 
 	return err;
@@ -3019,7 +3021,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
 					    false);
 		if (!err && !xe_vma_has_no_bo(vma))
 			err = xe_bo_migrate(xe_vma_bo(vma),
-					    region_to_mem_type[region]);
+					    region_to_mem_type[region],
+					    exec);
 		break;
 	}
 	default:
@@ -3298,7 +3301,9 @@ static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 			goto unlock;
 		}
 
+		xe_vm_set_validation_exec(vm, &exec);
 		fence = ops_execute(vm, vops);
+		xe_vm_set_validation_exec(vm, NULL);
 		if (IS_ERR(fence)) {
 			if (PTR_ERR(fence) == -ENODATA)
 				vm_bind_ioctl_ops_fini(vm, vops, NULL);
@@ -3861,10 +3866,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
  */
 int xe_vm_lock(struct xe_vm *vm, bool intr)
 {
+	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	int ret;
+
 	if (intr)
-		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
+		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
+	else
+		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
 
-	return dma_resv_lock(xe_vm_resv(vm), NULL);
+	if (!ret)
+		xe_vm_set_validation_exec(vm, exec);
+
+	return ret;
 }
 
 /**
@@ -3875,6 +3888,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
  */
 void xe_vm_unlock(struct xe_vm *vm)
 {
+	xe_vm_set_validation_exec(vm, NULL);
 	dma_resv_unlock(xe_vm_resv(vm));
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 2ecb417c19a2..11f4e522cec5 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
 	if (vm && !allow_res_evict) {
 		xe_vm_assert_held(vm);
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
-		WRITE_ONCE(vm->validating, current);
+		WRITE_ONCE(vm->validation.validating, current);
 	}
 }
 
@@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
 {
 	if (vm && !allow_res_evict) {
 		/* Pairs with READ_ONCE in xe_vm_is_validating() */
-		WRITE_ONCE(vm->validating, NULL);
+		WRITE_ONCE(vm->validation.validating, NULL);
 	}
 }
 
@@ -357,13 +357,41 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
 static inline bool xe_vm_is_validating(struct xe_vm *vm)
 {
 	/* Pairs with WRITE_ONCE in xe_vm_is_validating() */
-	if (READ_ONCE(vm->validating) == current) {
+	if (READ_ONCE(vm->validation.validating) == current) {
 		xe_vm_assert_held(vm);
 		return true;
 	}
 	return false;
 }
 
+/**
+ * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
+ * @vm: The vm we want to register a drm_exec object with.
+ * @exec: The exec object we want to register.
+ *
+ * Set the drm_exec object used to lock the vm's resv.
+ */
+static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
+{
+	xe_vm_assert_held(vm);
+	xe_assert(vm->xe, !!exec ^ !!vm->validation._exec);
+	vm->validation._exec = exec;
+}
+
+/**
+ * xe_vm_validation_exec() - Accessor to read the drm_exec object
+ * @vm: The vm whose drm_exec object we want to read.
+ *
+ * Return: The drm_exec object used to lock the vm's resv. The value
+ * is a valid pointer, %NULL, or one of the special values defined in
+ * xe_validation.h.
+ */
+static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
+{
+	xe_vm_assert_held(vm);
+	return vm->validation._exec;
+}
+
 /**
  * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
  * a valid GPU mapping
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 8a07feef503b..2f88808e36bb 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -312,19 +312,35 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;
 
+	/**
+	 * @validation: Validation data only valid with the vm resv held.
+	 * Note: This is really task state of the task holding the vm resv,
+	 * and moving forward we should come up with a better way of
+	 * passing this down the call-chain.
+	 */
+	struct {
+		/**
+		 * @validation.validating: The task that is currently making bos
+		 * resident for this vm.
+		 * Protected by the VM's resv for writing. Opportunistic reading can be done
+		 * using READ_ONCE. Note: This is a workaround for the
+		 * TTM eviction_valuable() callback not being passed a struct
+		 * ttm_operation_ctx. Future work might want to address this.
+		 */
+		struct task_struct *validating;
+		/**
+		 * @validation._exec: The drm_exec context used when locking the vm resv.
+		 *  Protected by the vm's resv.
+		 */
+		struct drm_exec *_exec;
+	} validation;
+
 	/**
 	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
 	 * protected by the vm resv.
 	 */
 	u64 tlb_flush_seqno;
-	/**
-	 * @validating: The task that is currently making bos resident for this vm.
-	 * Protected by the VM's resv for writing. Opportunistic reading can be done
-	 * using READ_ONCE. Note: This is a workaround for the
-	 * TTM eviction_valuable() callback not being passed a struct
-	 * ttm_operation_context(). Future work might want to address this.
-	 */
-	struct task_struct *validating;
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
-- 
2.50.1



* [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (3 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 20:42   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Introduce a validation wrapper xe_validation_guard() as a helper
intended to be used around drm_exec transactions that perform
validations. Once TTM can handle exhaustive eviction we could
remove this wrapper or make it mostly a NO-OP unless other
functionality is added to it.

Currently the wrapper takes a read lock upon entry and if the
transaction hits an OOM, all locks are released and the
transaction is retried with a write-lock. If all other
validations participate in this scheme, the transaction with
the write lock will be the only transaction validating and
should have access to all available non-pinned memory.

There is currently a problem in that TTM converts -EDEADLK to
-ENOMEM, and with ww_mutex slowpath error injections, we can hit
-ENOMEMs without actually having run out of memory. We abuse
ww_mutex internals to detect such situations until TTM is fixed
to not convert the error code. In the meantime, injecting
ww_mutex slowpath -EDEADLKs is a good way to test
the implementation in the absence of real OOMs.

Just introduce the wrapper in this commit. It will be hooked up
to the driver in following commits.
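
As a rough sketch, the intended calling pattern looks as follows
(mirroring the call-site conversions later in this series; the bo
allocation is just a stand-in for any validating operation):

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	struct xe_bo *bo;
	int err = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    (struct xe_val_flags) {.interruptible = true}, err) {
		bo = xe_bo_create_user(xe, vm, size, cpu_caching,
				       bo_flags, &exec);
		drm_exec_retry_on_contention(&exec);
		if (IS_ERR(bo)) {
			err = PTR_ERR(bo);
			xe_validation_retry_on_oom(&ctx, &err);
		}
	}
	return err ? ERR_PTR(err) : bo;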

v2:
- Mark class_xe_validation conditional so that the loop is
  skipped on initialization error.
- Argument sanitization (Matt Brost)
- Fix conditional execution of xe_validation_ctx_fini()
  (Matt Brost)
- Add a no_block mode for upcoming use in the CPU fault handler.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_validation.c | 228 +++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_validation.h | 122 +++++++++++++++
 2 files changed, 350 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
index cc0684d24e02..b90fda3dd5f4 100644
--- a/drivers/gpu/drm/xe/xe_validation.c
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -5,6 +5,7 @@
 #include "xe_bo.h"
 #include <drm/drm_exec.h>
 #include <drm/drm_gem.h>
+#include <drm/drm_gpuvm.h>
 
 #include "xe_assert.h"
 #include "xe_validation.h"
@@ -47,3 +48,230 @@ void xe_validation_assert_exec(const struct xe_device *xe,
 	}
 }
 #endif
+
+static int xe_validation_lock(struct xe_validation_ctx *ctx)
+{
+	struct xe_validation_device *val = ctx->val;
+	int ret = 0;
+
+	if (ctx->val_flags.interruptible) {
+		if (ctx->request_exclusive)
+			ret = down_write_killable(&val->lock);
+		else
+			ret = down_read_interruptible(&val->lock);
+	} else {
+		if (ctx->request_exclusive)
+			down_write(&val->lock);
+		else
+			down_read(&val->lock);
+	}
+
+	if (!ret) {
+		ctx->lock_held = true;
+		ctx->lock_held_exclusive = ctx->request_exclusive;
+	}
+
+	return ret;
+}
+
+static int xe_validation_trylock(struct xe_validation_ctx *ctx)
+{
+	struct xe_validation_device *val = ctx->val;
+	bool locked;
+
+	if (ctx->request_exclusive)
+		locked = down_write_trylock(&val->lock);
+	else
+		locked = down_read_trylock(&val->lock);
+
+	if (locked) {
+		ctx->lock_held = true;
+		ctx->lock_held_exclusive = ctx->request_exclusive;
+	}
+
+	return locked ? 0 : -EWOULDBLOCK;
+}
+
+static void xe_validation_unlock(struct xe_validation_ctx *ctx)
+{
+	if (!ctx->lock_held)
+		return;
+
+	if (ctx->lock_held_exclusive)
+		up_write(&ctx->val->lock);
+	else
+		up_read(&ctx->val->lock);
+
+	ctx->lock_held = false;
+}
+
+/**
+ * xe_validation_ctx_init() - Initialize an xe_validation_ctx
+ * @ctx: The xe_validation_ctx to initialize.
+ * @val: The xe_validation_device representing the validation domain.
+ * @exec: The struct drm_exec to use for the transaction. May be NULL.
+ * @flags: The flags to use for initialization.
+ *
+ * Initialize and lock an xe_validation transaction using the validation domain
+ * represented by @val. Also initialize the drm_exec object, forwarding parts of
+ * @flags to the drm_exec initialization. The @flags.exclusive flag should
+ * typically be set to false to avoid locking out other validators from the
+ * domain until an OOM is hit. For testing or final-attempt purposes it can,
+ * however, be set to true.
+ *
+ * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
+ * signal pending. If @flags.no_block is set to true, a failed trylock
+ * returns %-EWOULDBLOCK.
+ */
+int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
+			   struct drm_exec *exec, const struct xe_val_flags flags)
+{
+	int ret;
+
+	ctx->exec = exec;
+	ctx->val = val;
+	ctx->lock_held = false;
+	ctx->lock_held_exclusive = false;
+	ctx->request_exclusive = flags.exclusive;
+	ctx->val_flags = flags;
+	ctx->exec_flags = 0;
+	ctx->nr = 0;
+
+	if (flags.no_block)
+		ret = xe_validation_trylock(ctx);
+	else
+		ret = xe_validation_lock(ctx);
+	if (ret)
+		return ret;
+
+	if (exec) {
+		if (flags.interruptible)
+			ctx->exec_flags |= DRM_EXEC_INTERRUPTIBLE_WAIT;
+		if (flags.exec_ignore_duplicates)
+			ctx->exec_flags |= DRM_EXEC_IGNORE_DUPLICATES;
+		drm_exec_init(exec, ctx->exec_flags, ctx->nr);
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+/*
+ * This abuses both drm_exec and ww_mutex internals and should be
+ * replaced by checking for -EDEADLK when we can make TTM
+ * stop converting -EDEADLK to -ENOMEM.
+ * An alternative is to not have exhaustive eviction with
+ * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
+ */
+static bool xe_validation_contention_injected(struct drm_exec *exec)
+{
+	return !!exec->ticket.contending_lock;
+}
+
+#else
+
+static bool xe_validation_contention_injected(struct drm_exec *exec)
+{
+	return false;
+}
+
+#endif
+
+static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
+{
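+	/*
+	 * Retry only on -ENOMEM: if the previous attempt was not
+	 * exclusive, upgrade to the exclusive lock and try again. If it
+	 * already was exclusive, retry only if the -ENOMEM stems from
+	 * an injected ww_mutex -EDEADLK rather than a true OOM.
+	 */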
+	if (ret == -ENOMEM &&
+	    ((ctx->request_exclusive &&
+	      xe_validation_contention_injected(ctx->exec)) ||
+	     !ctx->request_exclusive)) {
+		ctx->request_exclusive = true;
+		return true;
+	}
+
+	return false;
+}
+
+/**
+ * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
+ * transaction.
+ * @ctx: An uninitialized xe_validation_ctx.
+ * @vm_exec: An initialized struct vm_exec.
+ * @val: The validation domain.
+ *
+ * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
+ * transaction and therefore doesn't lend itself very well to using
+ * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
+ * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
+ *
+ * Return: %0 on success, negative error code on failure.
+ */
+int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
+			    struct drm_gpuvm_exec *vm_exec,
+			    struct xe_validation_device *val)
+{
+	int ret;
+
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->exec = &vm_exec->exec;
+	ctx->exec_flags = vm_exec->flags;
+	ctx->val = val;
+	if (ctx->exec_flags & DRM_EXEC_INTERRUPTIBLE_WAIT)
+		ctx->val_flags.interruptible = 1;
+	if (ctx->exec_flags & DRM_EXEC_IGNORE_DUPLICATES)
+		ctx->val_flags.exec_ignore_duplicates = 1;
+retry:
+	ret = xe_validation_lock(ctx);
+	if (ret)
+		return ret;
+
+	ret = drm_gpuvm_exec_lock(vm_exec);
+	if (ret) {
+		xe_validation_unlock(ctx);
+		if (__xe_validation_should_retry(ctx, ret))
+			goto retry;
+	}
+
+	return ret;
+}
+
+/**
+ * xe_validation_ctx_fini() - Finalize a validation transaction
+ * @ctx: The Validation transaction to finalize.
+ *
+ * Finalize a validation transaction and its related drm_exec transaction.
+ */
+void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
+{
+	drm_exec_fini(ctx->exec);
+	xe_validation_unlock(ctx);
+}
+
+/**
+ * xe_validation_should_retry() - Determine if a validation transaction should retry
+ * @ctx: The validation transaction.
+ * @ret: Pointer to a return value variable.
+ *
+ * Determines whether a validation transaction should retry based on the
+ * internal transaction state and the return value pointed to by @ret.
+ * If a validation should be retried, the transaction is prepared for that,
+ * and the validation lock might be re-locked in exclusive mode, and *@ret
+ * is set to %0. If the re-locking fails, typically due to interruptible
+ * locking with a signal pending, *@ret is instead set to -EINTR and the
+ * function returns %false.
+ *
+ * Return: %true if validation should be retried, %false otherwise.
+ */
+bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
+{
+	if (__xe_validation_should_retry(ctx, *ret)) {
+		drm_exec_fini(ctx->exec);
+		*ret = 0;
+		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
+			xe_validation_unlock(ctx);
+			*ret = xe_validation_lock(ctx);
+		}
+		drm_exec_init(ctx->exec, ctx->exec_flags, ctx->nr);
+		return !*ret;
+	}
+
+	return false;
+}
diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
index db50feacad7a..36860974165e 100644
--- a/drivers/gpu/drm/xe/xe_validation.h
+++ b/drivers/gpu/drm/xe/xe_validation.h
@@ -7,9 +7,11 @@
 
 #include <linux/dma-resv.h>
 #include <linux/types.h>
+#include <linux/rwsem.h>
 
 struct drm_exec;
 struct drm_gem_object;
+struct drm_gpuvm_exec;
 struct xe_device;
 
 #ifdef CONFIG_PROVE_LOCKING
@@ -66,4 +68,124 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
 	} while (0)
 #endif
 
+/**
+ * struct xe_validation_device - The domain for exhaustive eviction
+ * @lock: The lock used to exclude other processes from allocating graphics memory
+ *
+ * The struct xe_validation_device represents the domain for which we want to use
+ * exhaustive eviction. The @lock is typically taken in read mode for
+ * allocations, but when graphics memory allocation fails, the allocation is
+ * retried with the lock held in write mode.
+ */
+struct xe_validation_device {
+	struct rw_semaphore lock;
+};
+
+/**
+ * struct xe_val_flags - Flags for xe_validation_ctx_init().
+ * @exclusive: Start the validation transaction by locking out all other validators.
+ * @no_block:  Don't block on initialization.
+ * @interruptible: Block interruptibly if blocking. Implies initializing the drm_exec
+ * context with the DRM_EXEC_INTERRUPTIBLE_WAIT flag.
+ * @exec_ignore_duplicates: Initialize the drm_exec context with the
+ * DRM_EXEC_IGNORE_DUPLICATES flag.
+ */
+struct xe_val_flags {
+	u32 exclusive :1;
+	u32 no_block :1;
+	u32 interruptible :1;
+	u32 exec_ignore_duplicates :1;
+};
+
+/**
+ * struct xe_validation_ctx - A struct drm_exec subclass with support for
+ * exhaustive eviction
+ * @exec: The drm_exec object base class. Note that we use a pointer instead of
+ * embedding to avoid diamond inheritance.
+ * @val: The exhaustive eviction domain.
+ * @lock_held: Whether the domain lock is currently held.
+ * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
+ * @request_exclusive: Whether to lock exclusively (write mode) the next time
+ * the domain lock is locked.
+ * @val_flags: The xe_val_flags the context was initialized with.
+ * @exec_flags: The drm_exec flags used for drm_exec (re-)initialization.
+ * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
+ */
+struct xe_validation_ctx {
+	struct drm_exec *exec;
+	struct xe_validation_device *val;
+	struct xe_val_flags val_flags;
+	bool lock_held;
+	bool lock_held_exclusive;
+	bool request_exclusive;
+	u32 exec_flags;
+	unsigned int nr;
+};
+
+int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
+			   struct drm_exec *exec, const struct xe_val_flags flags);
+
+int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
+			    struct xe_validation_device *val);
+
+void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
+
+bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
+
+/**
+ * xe_validation_retry_on_oom() - Retry on oom in an xe_validation transaction
+ * @_ctx: Pointer to the xe_validation_ctx
+ * @_ret: The current error value possibly holding -ENOMEM
+ *
+ * Use this in a way similar to drm_exec_retry_on_contention().
+ * If @_ret contains -ENOMEM the transaction is restarted once in a way that
+ * blocks other transactions and allows exhaustive eviction. If the transaction
+ * was already restarted once, just return the -ENOMEM. May also set
+ * @_ret to -EINTR if not retrying and waits are interruptible.
+ * May only be used within a drm_exec_until_all_locked() loop.
+ */
+#define xe_validation_retry_on_oom(_ctx, _ret)				\
+	do {								\
+		if (xe_validation_should_retry(_ctx, _ret))		\
+			goto *__drm_exec_retry_ptr;			\
+	} while (0)
+
+/**
+ * xe_validation_device_init - Initialize a struct xe_validation_device
+ * @val: The xe_validation_device to init.
+ */
+static inline void
+xe_validation_device_init(struct xe_validation_device *val)
+{
+	init_rwsem(&val->lock);
+}
+
+/*
+ * Make guard() and scoped_guard() work with xe_validation_ctx
+ * so that we can exit transactions without caring about the
+ * cleanup.
+ */
+DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
+	     if (_T) xe_validation_ctx_fini(_T);,
+	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags);
+	       _ret ? NULL : _ctx; }),
+	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
+	     struct drm_exec *_exec, const struct xe_val_flags _flags, int _ret);
+static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
+{return *_T; }
+#define class_xe_validation_is_conditional true
+
+/**
+ * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
+ * @_ctx: The xe_validation_ctx.
+ * @_val: The xe_validation_device.
+ * @_exec: The struct drm_exec object
+ * @_flags: Flags for the xe_validation_ctx initialization.
+ * @_ret: Return in/out parameter. May be set by this macro. Typically %0 when called.
+ *
+ * This macro will initiate a drm_exec transaction with additional support for
+ * exhaustive eviction.
+ */
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret)		\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret) \
+	drm_exec_until_all_locked(_exec)
+
 #endif
-- 
2.50.1



* [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (4 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-23  9:32   ` Simon Richter
  2025-08-22  9:40 ` [PATCH v2 07/16] drm/xe: Convert SVM validation " Thomas Hellström
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Use the xe_validation_guard() to convert xe_bo_create_user()
for exhaustive eviction.
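
After the conversion, xe_bo_create_user() has two calling modes, roughly
as sketched below: callers that already hold the vm resv or are part of
a drm_exec transaction pass the exec object down, whereas callers that
pass a NULL vm and NULL exec get a validation transaction, including the
OOM retry, initiated on their behalf:

	/* Within an existing transaction, with the vm resv held: */
	bo = xe_bo_create_user(xe, vm, size, cpu_caching, bo_flags, &exec);

	/* Standalone; the helper runs its own xe_validation_guard(): */
	bo = xe_bo_create_user(xe, NULL, size, cpu_caching, bo_flags, NULL);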

v2:
- Adapt to argument changes of xe_validation_guard()

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
---
 drivers/gpu/drm/xe/tests/xe_bo.c      |  16 ++--
 drivers/gpu/drm/xe/tests/xe_dma_buf.c |   4 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c |  12 +--
 drivers/gpu/drm/xe/xe_bo.c            | 114 +++++++++++++++++---------
 drivers/gpu/drm/xe/xe_bo.h            |   9 +-
 drivers/gpu/drm/xe/xe_device.c        |   2 +
 drivers/gpu/drm/xe/xe_device_types.h  |   3 +
 drivers/gpu/drm/xe/xe_vm.c            |  14 ++++
 drivers/gpu/drm/xe/xe_vm.h            |   2 +
 9 files changed, 115 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 06ceba6c3c25..42f914692a02 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -139,8 +139,8 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
 	else
 		kunit_info(test, "Testing system memory\n");
 
-	bo = xe_bo_create_user(xe, NULL, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
-			       bo_flags);
+	bo = xe_bo_create_user(xe, NULL, SZ_1M, DRM_XE_GEM_CPU_CACHING_WC,
+			       bo_flags, exec);
 	if (IS_ERR(bo)) {
 		KUNIT_FAIL(test, "Failed to create bo.\n");
 		return;
@@ -220,18 +220,18 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
 
 	for (i = 0; i < 2; ++i) {
 		xe_vm_lock(vm, false);
-		bo = xe_bo_create_user(xe, NULL, vm, 0x10000,
+		bo = xe_bo_create_user(xe, vm, 0x10000,
 				       DRM_XE_GEM_CPU_CACHING_WC,
-				       bo_flags);
+				       bo_flags, exec);
 		xe_vm_unlock(vm);
 		if (IS_ERR(bo)) {
 			KUNIT_FAIL(test, "bo create err=%pe\n", bo);
 			break;
 		}
 
-		external = xe_bo_create_user(xe, NULL, NULL, 0x10000,
+		external = xe_bo_create_user(xe, NULL, 0x10000,
 					     DRM_XE_GEM_CPU_CACHING_WC,
-					     bo_flags);
+					     bo_flags, NULL);
 		if (IS_ERR(external)) {
 			KUNIT_FAIL(test, "external bo create err=%pe\n", external);
 			goto cleanup_bo;
@@ -497,9 +497,9 @@ static int shrink_test_run_device(struct xe_device *xe)
 		INIT_LIST_HEAD(&link->link);
 
 		/* We can create bos using WC caching here. But it is slower. */
-		bo = xe_bo_create_user(xe, NULL, NULL, XE_BO_SHRINK_SIZE,
+		bo = xe_bo_create_user(xe, NULL, XE_BO_SHRINK_SIZE,
 				       DRM_XE_GEM_CPU_CACHING_WB,
-				       XE_BO_FLAG_SYSTEM);
+				       XE_BO_FLAG_SYSTEM, NULL);
 		if (IS_ERR(bo)) {
 			if (bo != ERR_PTR(-ENOMEM) && bo != ERR_PTR(-ENOSPC) &&
 			    bo != ERR_PTR(-EINTR) && bo != ERR_PTR(-ERESTARTSYS))
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index 965dd3280468..8126b35f4aeb 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -122,8 +122,8 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 		size = SZ_64K;
 
 	kunit_info(test, "running %s\n", __func__);
-	bo = xe_bo_create_user(xe, NULL, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
-			       params->mem_mask);
+	bo = xe_bo_create_user(xe, NULL, size, DRM_XE_GEM_CPU_CACHING_WC,
+			       params->mem_mask, NULL);
 	if (IS_ERR(bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
 			   PTR_ERR(bo));
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index dfb445d09759..afa794e56065 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -642,11 +642,11 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	struct drm_exec *exec;
 	long ret;
 
-	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	sys_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
 				   XE_BO_FLAG_SYSTEM |
 				   XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED);
+				   XE_BO_FLAG_PINNED, NULL);
 
 	if (IS_ERR(sys_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -669,10 +669,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 	xe_bo_unlock(sys_bo);
 
-	ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	ccs_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
 				   bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED);
+				   XE_BO_FLAG_PINNED, NULL);
 
 	if (IS_ERR(ccs_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -694,10 +694,10 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 	}
 	xe_bo_unlock(ccs_bo);
 
-	vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+	vram_bo = xe_bo_create_user(xe, NULL, SZ_4M,
 				    DRM_XE_GEM_CPU_CACHING_WC,
 				    bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				    XE_BO_FLAG_PINNED);
+				    XE_BO_FLAG_PINNED, NULL);
 	if (IS_ERR(vram_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
 			   PTR_ERR(vram_bo));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index e71addf51ed0..76e9c93826a2 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2185,30 +2185,66 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				     flags, 0, exec);
 }
 
-struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm, size_t size,
-				u16 cpu_caching,
-				u32 flags)
+static struct xe_bo *xe_bo_create_novm(struct xe_device *xe, struct xe_tile *tile,
+				       size_t size, u16 cpu_caching,
+				       enum ttm_bo_type type, u32 flags,
+				       u64 alignment, bool intr)
 {
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
-						 cpu_caching, ttm_bo_type_device,
-						 flags | XE_BO_FLAG_USER, 0, exec);
-	if (!IS_ERR(bo))
-		xe_bo_unlock_vm_held(bo);
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	struct xe_bo *bo;
+	int ret = 0;
 
-	return bo;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = intr},
+			    ret) {
+		bo = __xe_bo_create_locked(xe, tile, NULL, size, 0, ~0ULL,
+					   cpu_caching, type, flags, alignment, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+		} else {
+			xe_bo_unlock(bo);
+		}
+	}
+
+	return ret ? ERR_PTR(ret) : bo;
 }
 
-struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
-			   struct xe_vm *vm, size_t size,
-			   enum ttm_bo_type type, u32 flags)
+/**
+ * xe_bo_create_user() - Create a user BO
+ * @xe: The xe device.
+ * @vm: The local vm or NULL for external objects.
+ * @size: The storage size to use for the bo.
+ * @cpu_caching: The caching mode to be used for system backing store.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction, or NULL
+ * if such a transaction should be initiated by the call.
+ *
+ * Create a bo on behalf of user-space.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ */
+struct xe_bo *xe_bo_create_user(struct xe_device *xe,
+				struct xe_vm *vm, size_t size,
+				u16 cpu_caching,
+				u32 flags, struct drm_exec *exec)
 {
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
+	struct xe_bo *bo;
 
-	if (!IS_ERR(bo))
-		xe_bo_unlock_vm_held(bo);
+	flags |= XE_BO_FLAG_USER;
+
+	if (vm || exec) {
+		xe_assert(xe, exec);
+		bo = __xe_bo_create_locked(xe, NULL, vm, size, 0, ~0ULL,
+					   cpu_caching, ttm_bo_type_device,
+					   flags, 0, exec);
+		if (!IS_ERR(bo))
+			xe_bo_unlock_vm_held(bo);
+	} else {
+		bo = xe_bo_create_novm(xe, NULL, size, cpu_caching,
+				       ttm_bo_type_device, flags, 0, true);
+	}
 
 	return bo;
 }
@@ -2757,8 +2793,9 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct xe_device *xe = to_xe_device(dev);
 	struct xe_file *xef = to_xe_file(file);
 	struct drm_xe_gem_create *args = data;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm = NULL;
-	ktime_t end = 0;
 	struct xe_bo *bo;
 	unsigned int bo_flags;
 	u32 handle;
@@ -2832,25 +2869,26 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 			return -ENOENT;
 	}
 
-retry:
-	if (vm) {
-		err = xe_vm_lock(vm, true);
-		if (err)
-			goto out_vm;
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
+			    err) {
+		if (vm) {
+			err = xe_vm_drm_exec_lock(vm, &exec);
+			drm_exec_retry_on_contention(&exec);
+			if (err)
+				break;
+		}
+		bo = xe_bo_create_user(xe, vm, args->size, args->cpu_caching,
+				       bo_flags, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &err);
+			break;
+		}
 	}
-
-	bo = xe_bo_create_user(xe, NULL, vm, args->size, args->cpu_caching,
-			       bo_flags);
-
-	if (vm)
-		xe_vm_unlock(vm);
-
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		if (xe_vm_validate_should_retry(NULL, err, &end))
-			goto retry;
+	if (err)
 		goto out_vm;
-	}
 
 	if (args->extensions) {
 		err = gem_create_user_extensions(xe, bo, args->extensions, 0);
@@ -3223,11 +3261,11 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
 	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
 			   page_size);
 
-	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
+	bo = xe_bo_create_user(xe, NULL, args->size,
 			       DRM_XE_GEM_CPU_CACHING_WC,
 			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
 			       XE_BO_FLAG_SCANOUT |
-			       XE_BO_FLAG_NEEDS_CPU_ACCESS);
+			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index b1b6cb622d71..c6bb90ca5c2e 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -104,13 +104,8 @@ struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
 				  enum ttm_bo_type type, u32 flags,
 				  struct drm_exec *exec);
-struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
-			   struct xe_vm *vm, size_t size,
-			   enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm, size_t size,
-				u16 cpu_caching,
-				u32 flags);
+struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t size,
+				u16 cpu_caching, u32 flags, struct drm_exec *exec);
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags);
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 3e0402dff423..6b152aa89dbb 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -452,6 +452,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		goto err;
 
+	xe_validation_device_init(&xe->val);
+
 	init_waitqueue_head(&xe->ufence_wq);
 
 	init_rwsem(&xe->usm.lock);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index e67fbfe59afa..1337b07068e0 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -26,6 +26,7 @@
 #include "xe_sriov_vf_ccs_types.h"
 #include "xe_step_types.h"
 #include "xe_survivability_mode_types.h"
+#include "xe_validation.h"
 
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
 #define TEST_VM_OPS_ERROR
@@ -575,6 +576,8 @@ struct xe_device {
 	 */
 	atomic64_t global_total_pages;
 #endif
+	/** @val: The domain for exhaustive eviction, which is currently per device. */
+	struct xe_validation_device val;
 
 	/* private: */
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index f1e74959f8ff..0f15041b4dde 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -47,6 +47,20 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 	return vm->gpuvm.r_obj;
 }
 
+/**
+ * xe_vm_drm_exec_lock() - Lock the vm's resv with a drm_exec transaction
+ * @vm: The vm whose resv is to be locked.
+ * @exec: The drm_exec transaction.
+ *
+ * Helper to lock the vm's resv as part of a drm_exec transaction.
+ *
+ * Return: %0 on success. See drm_exec_lock_obj() for error codes.
+ */
+int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec)
+{
+	return drm_exec_lock_obj(exec, xe_vm_obj(vm));
+}
+
 /**
  * xe_vma_userptr_check_repin() - Advisory check for repin needed
  * @uvma: The userptr vma
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 11f4e522cec5..5a3456e82cf2 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -292,6 +292,8 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked);
  */
 #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
 
+int xe_vm_drm_exec_lock(struct xe_vm *vm, struct drm_exec *exec);
+
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
 #define vm_dbg drm_dbg
 #else
-- 
2.50.1



* [PATCH v2 07/16] drm/xe: Convert SVM validation for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (5 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22 19:13   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 08/16] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert SVM validation to support exhaustive eviction,
using xe_validation_guard().

v2:
- Wrap also xe_vm_range_rebind (Matt Brost)
- Adapt to argument changes of xe_validation_guard().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 99 +++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 39e3aa6df25a..667ca1f7cc29 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -699,51 +699,48 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	struct xe_device *xe = vr->xe;
 	struct device *dev = xe->drm.dev;
 	struct drm_buddy_block *block;
+	struct xe_validation_ctx vctx;
 	struct list_head *blocks;
-	struct drm_exec *exec;
+	struct drm_exec exec;
 	struct xe_bo *bo;
-	ktime_t time_end = 0;
-	int err, idx;
+	int err = 0, idx;
 
 	if (!drm_dev_enter(&xe->drm, &idx))
 		return -ENODEV;
 
 	xe_pm_runtime_get(xe);
-	exec = XE_VALIDATION_UNIMPLEMENTED;
-
- retry:
-	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
-				 ttm_bo_type_device,
-				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
-				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		if (xe_vm_validate_should_retry(NULL, err, &time_end))
-			goto retry;
-		goto out_pm_put;
-	}
 
-	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
-				&dpagemap_devmem_ops, dpagemap, end - start);
-
-	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
-	list_for_each_entry(block, blocks, link)
-		block->private = vr;
+	xe_validation_guard(&vctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
+		bo = xe_bo_create_locked(xe, NULL, NULL, end - start,
+					 ttm_bo_type_device,
+					 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
+					 XE_BO_FLAG_CPU_ADDR_MIRROR, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&vctx, &err);
+			break;
+		}
 
-	xe_bo_get(bo);
+		drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
+					&dpagemap_devmem_ops, dpagemap, end - start);
 
-	/* Ensure the device has a pm ref while there are device pages active. */
-	xe_pm_runtime_get_noresume(xe);
-	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
-					    start, end, timeslice_ms,
-					    xe_svm_devm_owner(xe));
-	if (err)
-		xe_svm_devmem_release(&bo->devmem_allocation);
+		blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
+		list_for_each_entry(block, blocks, link)
+			block->private = vr;
 
-	xe_bo_unlock(bo);
-	xe_bo_put(bo);
+		xe_bo_get(bo);
 
-out_pm_put:
+		/* Ensure the device has a pm ref while there are device pages active. */
+		xe_pm_runtime_get_noresume(xe);
+		err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
+						    start, end, timeslice_ms,
+						    xe_svm_devm_owner(xe));
+		if (err)
+			xe_svm_devmem_release(&bo->devmem_allocation);
+		xe_bo_unlock(bo);
+		xe_bo_put(bo);
+	}
 	xe_pm_runtime_put(xe);
 	drm_dev_exit(idx);
 
@@ -820,11 +817,12 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
 			vm->xe->atomic_svm_timeslice_ms : 0,
 	};
+	struct xe_validation_ctx vctx;
+	struct drm_exec exec;
 	struct xe_svm_range *range;
 	struct dma_fence *fence;
 	struct xe_tile *tile = gt_to_tile(gt);
 	int migrate_try_count = ctx.devmem_only ? 3 : 1;
-	ktime_t end = 0;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
@@ -894,27 +892,32 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 
 	range_debug(range, "PAGE FAULT - BIND");
 
-retry_bind:
-	xe_vm_lock(vm, false);
-	fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
-	if (IS_ERR(fence)) {
-		xe_vm_unlock(vm);
-		err = PTR_ERR(fence);
-		if (err == -EAGAIN) {
-			ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
-			range_debug(range, "PAGE FAULT - RETRY BIND");
-			goto retry;
+	xe_validation_guard(&vctx, &vm->xe->val, &exec, (struct xe_val_flags) {}, err) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+
+		xe_vm_set_validation_exec(vm, &exec);
+		fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
+		xe_vm_set_validation_exec(vm, NULL);
+		if (IS_ERR(fence)) {
+			drm_exec_retry_on_contention(&exec);
+			err = PTR_ERR(fence);
+			xe_validation_retry_on_oom(&vctx, &err);
 		}
-		if (xe_vm_validate_should_retry(NULL, err, &end))
-			goto retry_bind;
-		goto err_out;
 	}
-	xe_vm_unlock(vm);
+	if (err)
+		goto err_out;
 
 	dma_fence_wait(fence, false);
 	dma_fence_put(fence);
+	return 0;
 
 err_out:
+	if (err == -EAGAIN) {
+		ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
+		range_debug(range, "PAGE FAULT - RETRY BIND");
+		goto retry;
+	}
 
 	return err;
 }
-- 
2.50.1



* [PATCH v2 08/16] drm/xe: Convert existing drm_exec transactions for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (6 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 07/16] drm/xe: Convert SVM validation " Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 09/16] drm/xe: Convert the CPU fault handler " Thomas Hellström
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert existing drm_exec transactions, like the GT pagefault validation,
the non-LR exec() IOCTL and the rebind worker, to support exhaustive
eviction using xe_validation_guard().
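
Where the drm_exec loop must remain open-coded, as in the pagefault
handlers, an explicit xe_validation_ctx_init() / xe_validation_ctx_fini()
pair is used instead of the guard. A rough sketch of that pattern,
based on the handle_vma_pagefault() hunk below:

	err = xe_validation_ctx_init(&ctx, &vm->xe->val, &exec,
				     (struct xe_val_flags) {});
	if (err)
		return err;

	drm_exec_until_all_locked(&exec) {
		err = xe_pf_begin(&exec, vma, atomic, tile->mem.vram);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &err);
		if (err)
			break;
		/* ... rebind and fence handling ... */
	}
	xe_validation_ctx_fini(&ctx);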

v2:
- Adapt to signature change in xe_validation_guard() (Matt Brost)
- Avoid gotos from within xe_validation_guard() (Matt Brost)
- Check error return from xe_validation_guard()

Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c         |  20 ++--
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  20 ++--
 drivers/gpu/drm/xe/xe_vm.c           | 139 +++++++++++----------------
 drivers/gpu/drm/xe/xe_vm.h           |   2 -
 4 files changed, 75 insertions(+), 106 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 0bcb4fb9a10e..cdc3ff931a90 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -119,10 +119,10 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn};
 	struct drm_exec *exec = &vm_exec.exec;
 	u32 i, num_syncs, num_ufence = 0;
+	struct xe_validation_ctx ctx;
 	struct xe_sched_job *job;
 	struct xe_vm *vm;
 	bool write_locked, skip_retry = false;
-	ktime_t end = 0;
 	int err = 0;
 	struct xe_hw_engine_group *group;
 	enum xe_hw_engine_group_execution_mode mode, previous_mode;
@@ -241,17 +241,12 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto err_unlock_list;
 	}
 
-	vm_exec.vm = &vm->gpuvm;
-	vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
-	if (xe_vm_in_lr_mode(vm)) {
-		drm_exec_init(exec, vm_exec.flags, 0);
-	} else {
-		err = drm_gpuvm_exec_lock(&vm_exec);
-		if (err) {
-			if (xe_vm_validate_should_retry(exec, err, &end))
-				err = -EAGAIN;
+	if (!xe_vm_in_lr_mode(vm)) {
+		vm_exec.vm = &vm->gpuvm;
+		vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT;
+		err = xe_validation_exec_lock(&ctx, &vm_exec, &xe->val);
+		if (err)
 			goto err_unlock_list;
-		}
 	}
 
 	if (xe_vm_is_closed_or_banned(q->vm)) {
@@ -345,7 +340,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (err)
 		xe_sched_job_put(job);
 err_exec:
-	drm_exec_fini(exec);
+	if (!xe_vm_in_lr_mode(vm))
+		xe_validation_ctx_fini(&ctx);
 err_unlock_list:
 	up_read(&vm->lock);
 	if (err == -EAGAIN && !skip_retry)
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 4133b9b78f7d..6ef448bd331b 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -112,9 +112,9 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct dma_fence *fence;
-	ktime_t end = 0;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
@@ -139,12 +139,11 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 	}
 
 	/* Lock VM and BOs dma-resv */
-	drm_exec_init(&exec, 0, 0);
+	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct xe_val_flags) {});
 	drm_exec_until_all_locked(&exec) {
 		err = xe_pf_begin(&exec, vma, atomic, tile->mem.vram);
 		drm_exec_retry_on_contention(&exec);
-		if (xe_vm_validate_should_retry(&exec, err, &end))
-			err = -EAGAIN;
+		xe_validation_retry_on_oom(&ctx, &err);
 		if (err)
 			goto unlock_dma_resv;
 
@@ -155,8 +154,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 		xe_vm_set_validation_exec(vm, NULL);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
-			if (xe_vm_validate_should_retry(&exec, err, &end))
-				err = -EAGAIN;
+			xe_validation_retry_on_oom(&ctx, &err);
 			goto unlock_dma_resv;
 		}
 	}
@@ -165,7 +163,7 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
 	dma_fence_put(fence);
 
 unlock_dma_resv:
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 	if (err == -EAGAIN)
 		goto retry_userptr;
 
@@ -547,6 +545,7 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct xe_vm *vm;
 	struct xe_vma *vma;
@@ -576,15 +575,14 @@ static int handle_acc(struct xe_gt *gt, struct acc *acc)
 		goto unlock_vm;
 
 	/* Lock VM and BOs dma-resv */
-	drm_exec_init(&exec, 0, 0);
+	xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct xe_val_flags) {});
 	drm_exec_until_all_locked(&exec) {
 		ret = xe_pf_begin(&exec, vma, true, tile->mem.vram);
 		drm_exec_retry_on_contention(&exec);
-		if (ret)
-			break;
+		xe_validation_retry_on_oom(&ctx, &ret);
 	}
 
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 unlock_vm:
 	up_read(&vm->lock);
 	xe_vm_put(vm);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 0f15041b4dde..23015f369e34 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -241,6 +241,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 		.num_fences = 1,
 	};
 	struct drm_exec *exec = &vm_exec.exec;
+	struct xe_validation_ctx ctx;
 	struct dma_fence *pfence;
 	int err;
 	bool wait;
@@ -248,7 +249,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	xe_assert(vm->xe, xe_vm_in_preempt_fence_mode(vm));
 
 	down_write(&vm->lock);
-	err = drm_gpuvm_exec_lock(&vm_exec);
+	err = xe_validation_exec_lock(&ctx, &vm_exec, &vm->xe->val);
 	if (err)
 		goto out_up_write;
 
@@ -280,7 +281,7 @@ int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	up_read(&vm->userptr.notifier_lock);
 
 out_fini:
-	drm_exec_fini(exec);
+	xe_validation_ctx_fini(&ctx);
 out_up_write:
 	up_write(&vm->lock);
 
@@ -363,39 +364,6 @@ void xe_vm_kill(struct xe_vm *vm, bool unlocked)
 	/* TODO: Inform user the VM is banned */
 }
 
-/**
- * xe_vm_validate_should_retry() - Whether to retry after a validate error.
- * @exec: The drm_exec object used for locking before validation.
- * @err: The error returned from ttm_bo_validate().
- * @end: A ktime_t cookie that should be set to 0 before first use and
- * that should be reused on subsequent calls.
- *
- * With multiple active VMs, under memory pressure, it is possible that
- * ttm_bo_validate() run into -EDEADLK and in such case returns -ENOMEM.
- * Until ttm properly handles locking in such scenarios, best thing the
- * driver can do is retry with a timeout. Check if that is necessary, and
- * if so unlock the drm_exec's objects while keeping the ticket to prepare
- * for a rerun.
- *
- * Return: true if a retry after drm_exec_init() is recommended;
- * false otherwise.
- */
-bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end)
-{
-	ktime_t cur;
-
-	if (err != -ENOMEM)
-		return false;
-
-	cur = ktime_get();
-	*end = *end ? : ktime_add_ms(cur, XE_VM_REBIND_RETRY_TIMEOUT_MS);
-	if (!ktime_before(cur, *end))
-		return false;
-
-	msleep(20);
-	return true;
-}
-
 static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
 {
 	struct xe_vm *vm = gpuvm_to_vm(vm_bo->vm);
@@ -496,10 +464,10 @@ static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm,
 static void preempt_rebind_work_func(struct work_struct *w)
 {
 	struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work);
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	unsigned int fence_count = 0;
 	LIST_HEAD(preempt_fences);
-	ktime_t end = 0;
 	int err = 0;
 	long wait;
 	int __maybe_unused tries = 0;
@@ -522,18 +490,19 @@ static void preempt_rebind_work_func(struct work_struct *w)
 			goto out_unlock_outer;
 	}
 
-	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+	err = xe_validation_ctx_init(&ctx, &vm->xe->val, &exec,
+				     (struct xe_val_flags) {.interruptible = true});
+	if (err)
+		goto out_unlock_outer;
 
 	drm_exec_until_all_locked(&exec) {
 		bool done = false;
 
 		err = xe_preempt_work_begin(&exec, vm, &done);
 		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &err);
 		if (err || done) {
-			drm_exec_fini(&exec);
-			if (err && xe_vm_validate_should_retry(&exec, err, &end))
-				err = -EAGAIN;
-
+			xe_validation_ctx_fini(&ctx);
 			goto out_unlock_outer;
 		}
 	}
@@ -581,7 +550,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 	up_read(&vm->userptr.notifier_lock);
 
 out_unlock:
-	drm_exec_fini(&exec);
+	xe_validation_ctx_fini(&ctx);
 out_unlock_outer:
 	if (err == -EAGAIN) {
 		trace_xe_vm_rebind_worker_retry(vm);
@@ -1397,20 +1366,19 @@ int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma)
 
 static void xe_vma_destroy_unlocked(struct xe_vma *vma)
 {
+	struct xe_device *xe = xe_vma_vm(vma)->xe;
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
-	int err;
+	int err = 0;
 
-	drm_exec_init(&exec, 0, 0);
-	drm_exec_until_all_locked(&exec) {
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
 		err = xe_vm_lock_vma(&exec, vma);
 		drm_exec_retry_on_contention(&exec);
 		if (XE_WARN_ON(err))
 			break;
+		xe_vma_destroy(vma, NULL);
 	}
-
-	xe_vma_destroy(vma, NULL);
-
-	drm_exec_fini(&exec);
+	xe_assert(xe, !err);
 }
 
 struct xe_vma *
@@ -2494,6 +2462,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 			      u16 pat_index, unsigned int flags)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct xe_vma *vma;
 	int err = 0;
@@ -2501,9 +2470,9 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	lockdep_assert_held_write(&vm->lock);
 
 	if (bo) {
-		drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
-		drm_exec_until_all_locked(&exec) {
-			err = 0;
+		err = 0;
+		xe_validation_guard(&ctx, &vm->xe->val, &exec,
+				    (struct xe_val_flags) {.interruptible = true}, err) {
 			if (!bo->vm) {
 				err = drm_exec_lock_obj(&exec, xe_vm_obj(vm));
 				drm_exec_retry_on_contention(&exec);
@@ -2512,27 +2481,35 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
 				drm_exec_retry_on_contention(&exec);
 			}
-			if (err) {
-				drm_exec_fini(&exec);
+			if (err)
 				return ERR_PTR(err);
+
+			vma = xe_vma_create(vm, bo, op->gem.offset,
+					    op->va.addr, op->va.addr +
+					    op->va.range - 1, pat_index, flags);
+			if (IS_ERR(vma))
+				return vma;
+
+			if (!bo->vm) {
+				err = add_preempt_fences(vm, bo);
+				if (err) {
+					prep_vma_destroy(vm, vma, false);
+					xe_vma_destroy_unlocked(vma);
+				}
 			}
 		}
+		if (err)
+			return ERR_PTR(err);
+	} else {
+		vma = xe_vma_create(vm, NULL, op->gem.offset,
+				    op->va.addr, op->va.addr +
+				    op->va.range - 1, pat_index, flags);
+		if (IS_ERR(vma))
+			return vma;
+
+		if (xe_vma_is_userptr(vma))
+			err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
 	}
-	vma = xe_vma_create(vm, bo, op->gem.offset,
-			    op->va.addr, op->va.addr +
-			    op->va.range - 1, pat_index, flags);
-	if (IS_ERR(vma))
-		goto err_unlock;
-
-	if (xe_vma_is_userptr(vma))
-		err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
-	else if (!xe_vma_has_no_bo(vma) && !bo->vm)
-		err = add_preempt_fences(vm, bo);
-
-err_unlock:
-	if (bo)
-		drm_exec_fini(&exec);
-
 	if (err) {
 		prep_vma_destroy(vm, vma, false);
 		xe_vma_destroy_unlocked(vma);
@@ -3299,21 +3276,23 @@ static void vm_bind_ioctl_ops_fini(struct xe_vm *vm, struct xe_vma_ops *vops,
 static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 						   struct xe_vma_ops *vops)
 {
+	struct xe_validation_ctx ctx;
 	struct drm_exec exec;
 	struct dma_fence *fence;
-	int err;
+	int err = 0;
 
 	lockdep_assert_held_write(&vm->lock);
 
-	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
-		      DRM_EXEC_IGNORE_DUPLICATES, 0);
-	drm_exec_until_all_locked(&exec) {
+	xe_validation_guard(&ctx, &vm->xe->val, &exec,
+			    ((struct xe_val_flags) {
+				    .interruptible = true,
+				    .exec_ignore_duplicates = true,
+			    }), err) {
 		err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm, vops);
 		drm_exec_retry_on_contention(&exec);
-		if (err) {
-			fence = ERR_PTR(err);
-			goto unlock;
-		}
+		xe_validation_retry_on_oom(&ctx, &err);
+		if (err)
+			return ERR_PTR(err);
 
 		xe_vm_set_validation_exec(vm, &exec);
 		fence = ops_execute(vm, vops);
@@ -3321,15 +3300,13 @@ static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 		if (IS_ERR(fence)) {
 			if (PTR_ERR(fence) == -ENODATA)
 				vm_bind_ioctl_ops_fini(vm, vops, NULL);
-			goto unlock;
+			return fence;
 		}
 
 		vm_bind_ioctl_ops_fini(vm, vops, fence);
 	}
 
-unlock:
-	drm_exec_fini(&exec);
-	return fence;
+	return err ? ERR_PTR(err) : fence;
 }
 ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO);
 
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 5a3456e82cf2..b5b21da5c777 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -262,8 +262,6 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);
 
 int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);
 
-bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
-
 int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma);
 
 int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec,
-- 
2.50.1



* [PATCH v2 09/16] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (7 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 08/16] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 22:53   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

The CPU fault handler may populate bos and migrate them, and in
doing so it might interfere with other tasks validating.

Rework the CPU fault handler completely into a fastpath
and a slowpath. The fastpath only trylocks the validation lock
in read mode. If that fails, fall back to the slowpath, where
we do a full validation transaction.

This mandates open-coding bo locking, bo idling and bo
populating, but we still call into TTM for fault finalizing.
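
For orientation, the fastpath only ever takes non-blocking locks
and requires the bo to be idle; anything that would block defers
to the slowpath. A condensed sketch of the fastpath (error
unwinding and runtime-pm handling elided; see the full hunk
below):

	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
				     (struct xe_val_flags) {
					     .interruptible = true,
					     .no_block = true
				     });
	if (err)
		goto out_pm;

	if (!dma_resv_trylock(tbo->base.resv))
		goto out_validation;

	if (!dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL))
		goto out_unlock;

	/* ttm_bo_populate() with no_wait_gpu, then __xe_bo_cpu_fault() */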

v2:
- Rework the CPU fault handler to actually take part in
  the exhaustive eviction scheme (Matthew Brost).

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c         | 191 ++++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_validation.c |   3 +-
 2 files changed, 163 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 76e9c93826a2..686ca5d6038a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1713,57 +1713,188 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
 	}
 }
 
-static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
+static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct xe_device *xe, struct xe_bo *bo)
+{
+	vm_fault_t ret;
+
+	trace_xe_bo_cpu_fault(bo);
+
+	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+				       TTM_BO_VM_NUM_PREFAULT);
+	if (ret == VM_FAULT_NOPAGE &&
+	    mem_type_is_vram(bo->ttm.resource->mem_type)) {
+		mutex_lock(&xe->mem_access.vram_userfault.lock);
+		if (list_empty(&bo->vram_userfault_link))
+			list_add(&bo->vram_userfault_link,
+				 &xe->mem_access.vram_userfault.list);
+		mutex_unlock(&xe->mem_access.vram_userfault.lock);
+	}
+
+	return ret;
+}
+
+static vm_fault_t xe_err_to_fault_t(int err)
+{
+	switch (err) {
+	case 0:
+	case -EINTR:
+	case -ERESTARTSYS:
+	case -EAGAIN:
+		return VM_FAULT_NOPAGE;
+	case -ENOMEM:
+	case -ENOSPC:
+		return VM_FAULT_OOM;
+	default:
+		break;
+	}
+	return VM_FAULT_SIGBUS;
+}
+
+static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf, struct xe_device *xe,
+					   struct xe_bo *bo, bool needs_rpm)
+{
+	struct ttm_buffer_object *tbo = &bo->ttm;
+	vm_fault_t ret = VM_FAULT_RETRY;
+	struct xe_validation_ctx ctx;
+	int err;
+
+	if (needs_rpm && !xe_pm_runtime_get_if_active(xe))
+		return VM_FAULT_RETRY;
+
+	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
+				     (struct xe_val_flags) {
+					     .interruptible = true,
+					     .no_block = true
+				     });
+	if (err)
+		goto out_pm;
+
+	if (!dma_resv_trylock(tbo->base.resv))
+		goto out_validation;
+
+	if (!dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL))
+		goto out_unlock;
+
+	if (!tbo->resource->bus.is_iomem) {
+		struct ttm_operation_ctx ctx = {
+			.interruptible = true,
+			.no_wait_gpu = true,
+			.gfp_retry_mayfail = true,
+		};
+
+		err = ttm_bo_populate(tbo, &ctx);
+		if (err) {
+			if (err != -ENOMEM && err != -ENOSPC)
+				ret = xe_err_to_fault_t(err);
+			goto out_unlock;
+		}
+	}
+
+	ret = __xe_bo_cpu_fault(vmf, xe, bo);
+
+out_unlock:
+	dma_resv_unlock(tbo->base.resv);
+out_validation:
+	xe_validation_ctx_fini(&ctx);
+out_pm:
+	if (needs_rpm)
+		xe_pm_runtime_put(xe);
+
+	return ret;
+}
+
+static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
 {
 	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
 	struct drm_device *ddev = tbo->base.dev;
 	struct xe_device *xe = to_xe_device(ddev);
 	struct xe_bo *bo = ttm_to_xe_bo(tbo);
 	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
-	struct drm_exec *exec;
+	bool retry_after_wait = false;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	vm_fault_t ret;
+	int err = 0;
 	int idx;
 
+	if (!drm_dev_enter(&xe->drm, &idx))
+		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
+
+	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
+	if (ret != VM_FAULT_RETRY)
+		goto out;
+
+	if (fault_flag_allow_retry_first(vmf->flags)) {
+		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+			goto out;
+		retry_after_wait = true;
+		xe_bo_get(bo);
+		mmap_read_unlock(vmf->vma->vm_mm);
+	} else {
+		ret = VM_FAULT_NOPAGE;
+	}
+
+	/*
+	 * The fastpath failed and we were not required to return and retry immediately.
+	 * We're now running in one of two modes:
+	 *
+	 * 1) retry_after_wait == true: The mmap_read_lock() has been dropped, so we're
+	 * only resolving blocking waits here; the fault itself can't be resolved while
+	 * the mmap_read_lock() is dropped. The aim is that the fastpath succeeds on the
+	 * retried fault, but it may still fail since we drop the bo lock.
+	 *
+	 * 2) retry_after_wait == false: The fastpath failed, typically even after
+	 * a retry. Do whatever's necessary to resolve the fault.
+	 *
+	 * This construct is recommended to avoid excessive waits under the mmap_lock.
+	 */
+
 	if (needs_rpm)
 		xe_pm_runtime_get(xe);
 
-	exec = XE_VALIDATION_UNIMPLEMENTED;
-	ret = ttm_bo_vm_reserve(tbo, vmf);
-	if (ret)
-		goto out;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
+			    err) {
+		long lerr;
 
-	if (drm_dev_enter(ddev, &idx)) {
-		trace_xe_bo_cpu_fault(bo);
+		err = drm_exec_lock_obj(&exec, &tbo->base);
+		drm_exec_retry_on_contention(&exec);
+		if (err)
+			break;
 
-		xe_validation_assert_exec(xe, exec, &tbo->base);
-		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-					       TTM_BO_VM_NUM_PREFAULT);
-		drm_dev_exit(idx);
+		lerr = dma_resv_wait_timeout(tbo->base.resv,
+					     DMA_RESV_USAGE_KERNEL, true,
+					     MAX_SCHEDULE_TIMEOUT);
+		if (lerr < 0) {
+			err = lerr;
+			break;
+		}
 
-		if (ret == VM_FAULT_RETRY &&
-		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-			goto out;
+		if (!tbo->resource->bus.is_iomem) {
+			struct ttm_operation_ctx tctx = {
+				.interruptible = true,
+				.no_wait_gpu = false,
+				.gfp_retry_mayfail = true,
+			};
 
-		/*
-		 * ttm_bo_vm_reserve() already has dma_resv_lock.
-		 */
-		if (ret == VM_FAULT_NOPAGE &&
-		    mem_type_is_vram(tbo->resource->mem_type)) {
-			mutex_lock(&xe->mem_access.vram_userfault.lock);
-			if (list_empty(&bo->vram_userfault_link))
-				list_add(&bo->vram_userfault_link,
-					 &xe->mem_access.vram_userfault.list);
-			mutex_unlock(&xe->mem_access.vram_userfault.lock);
+			err = ttm_bo_populate(tbo, &tctx);
+			xe_validation_retry_on_oom(&ctx, &err);
+			if (err && (err == -EINTR || err == -ERESTARTSYS))
+				break;
 		}
-	} else {
-		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
+		if (!retry_after_wait)
+			ret = __xe_bo_cpu_fault(vmf, xe, bo);
 	}
+	if (err)
+		ret = xe_err_to_fault_t(err);
 
-	dma_resv_unlock(tbo->base.resv);
-out:
 	if (needs_rpm)
 		xe_pm_runtime_put(xe);
 
+	if (retry_after_wait)
+		xe_bo_put(bo);
+out:
+	drm_dev_exit(idx);
+
 	return ret;
 }
 
@@ -1807,7 +1938,7 @@ int xe_bo_read(struct xe_bo *bo, u64 offset, void *dst, int size)
 }
 
 static const struct vm_operations_struct xe_gem_vm_ops = {
-	.fault = xe_gem_fault,
+	.fault = xe_bo_cpu_fault,
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
 	.access = xe_bo_vm_access,
diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
index b90fda3dd5f4..826cd09966ef 100644
--- a/drivers/gpu/drm/xe/xe_validation.c
+++ b/drivers/gpu/drm/xe/xe_validation.c
@@ -241,7 +241,8 @@ int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
  */
 void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
 {
-	drm_exec_fini(ctx->exec);
+	if (ctx->exec)
+		drm_exec_fini(ctx->exec);
 	xe_validation_unlock(ctx);
 }
 
-- 
2.50.1



* [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma()
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (8 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 09/16] drm/xe: Convert the CPU fault handler " Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 21:29   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert __xe_pin_fb_vma() for exhaustive eviction
using xe_validation_guard().
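
The conversion follows the guard pattern used throughout this
series: lock the object inside the transaction, retry on
contention and retry the whole transaction on OOM. Slightly
condensed from the hunk below (the integrated-graphics validate
branch elided):

	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;

		ret = xe_bo_migrate(bo, XE_PL_VRAM0, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &ret);
		if (!ret)
			ttm_bo_pin(&bo->ttm);
	}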

v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Adapt to signature change of xe_validation_guard(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c | 29 +++++++++++++++-----------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 4b0748e6fdd6..fe0000b211d9 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -281,7 +281,8 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
 	struct xe_bo *bo = gem_to_xe_bo(obj);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	int ret;
 
 	if (!vma)
@@ -309,17 +310,21 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 	 * Pin the framebuffer, we can't use xe_bo_(un)pin functions as the
 	 * assumptions are incorrect for framebuffers
 	 */
-	ret = ttm_bo_reserve(&bo->ttm, false, false, NULL);
-	if (ret)
-		goto err;
-
-	if (IS_DGFX(xe))
-		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
-	else
-		ret = xe_bo_validate(bo, NULL, true, exec);
-	if (!ret)
-		ttm_bo_pin(&bo->ttm);
-	ttm_bo_unreserve(&bo->ttm);
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			break;
+
+		if (IS_DGFX(xe))
+			ret = xe_bo_migrate(bo, XE_PL_VRAM0, &exec);
+		else
+			ret = xe_bo_validate(bo, NULL, true, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &ret);
+		if (!ret)
+			ttm_bo_pin(&bo->ttm);
+	}
 	if (ret)
 		goto err;
 
-- 
2.50.1



* [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (9 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 21:16   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 12/16] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Convert dma-buf migration to XE_PL_TT and dma-buf import to
support exhaustive eviction, using xe_validation_guard().
It seems unlikely that the import would result in an -ENOMEM,
but convert import anyway for completeness.

The dma-buf map_attachment() functionality unfortunately doesn't
support passing a drm_exec, which means that foreign devices
validating a dma-buf that we exported will not participate in the
exhaustive eviction scheme unless they are xeKMD devices.
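
Since the imported reservation object must be locked through the
drm_exec transaction, the import path wraps it in a short-lived
dummy gem object so that drm_exec_lock_obj() can be used. From
the hunk below:

	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
	if (!dummy_obj)
		return ERR_PTR(-ENOMEM);

	dummy_obj->resv = resv;
	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
		ret = drm_exec_lock_obj(&exec, dummy_obj);
		drm_exec_retry_on_contention(&exec);
		/* ... create the bo under the locked resv ... */
	}
	drm_gem_object_put(dummy_obj);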

v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Adapt to signature change of xe_validation_guard(). (Matt Brost)
- Remove an unneeded (void)ret. (Matt Brost)
- Fix up an error path.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_dma_buf.c | 61 ++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 78a827d4e726..3f96101a06f3 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -163,16 +163,26 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 	struct xe_bo *bo = gem_to_xe_bo(obj);
 	bool reads =  (direction == DMA_BIDIRECTIONAL ||
 		       direction == DMA_FROM_DEVICE);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	int ret = 0;
 
 	if (!reads)
 		return 0;
 
 	/* Can we do interruptible lock here? */
-	xe_bo_lock(bo, false);
-	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
-	xe_bo_unlock(bo);
+	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, (struct xe_val_flags) {}, ret) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			break;
+
+		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &ret);
+	}
 
+	/* If we failed, cpu-access takes place in current placement. */
 	return 0;
 }
 
@@ -211,25 +221,36 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 {
 	struct dma_resv *resv = dma_buf->resv;
 	struct xe_device *xe = to_xe_device(dev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_gem_object *dummy_obj;
+	struct drm_exec exec;
 	struct xe_bo *bo;
-	int ret;
-
-	dma_resv_lock(resv, NULL);
-	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-				    0, /* Will require 1way or 2way for vm_bind */
-				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
-	if (IS_ERR(bo)) {
-		ret = PTR_ERR(bo);
-		goto error;
+	int ret = 0;
+
+	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
+	if (!dummy_obj)
+		return ERR_PTR(-ENOMEM);
+
+	dummy_obj->resv = resv;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
+		ret = drm_exec_lock_obj(&exec, dummy_obj);
+		drm_exec_retry_on_contention(&exec);
+		if (ret)
+			break;
+
+		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+					    0, /* Will require 1way or 2way for vm_bind */
+					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+			break;
+		}
 	}
-	dma_resv_unlock(resv);
-
-	return &bo->ttm.base;
+	drm_gem_object_put(dummy_obj);
 
-error:
-	dma_resv_unlock(resv);
-	return ERR_PTR(ret);
+	return ret ? ERR_PTR(ret) : &bo->ttm.base;
 }
 
 static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
-- 
2.50.1



* [PATCH v2 12/16] drm/xe: Rename ___xe_bo_create_locked()
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (10 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22  9:40 ` [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Don't start external function names with underscores.
Rename to xe_bo_init_locked().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c      | 39 ++++++++++++++++-----------------
 drivers/gpu/drm/xe/xe_bo.h      | 10 ++++-----
 drivers/gpu/drm/xe/xe_dma_buf.c |  6 ++---
 3 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 686ca5d6038a..a3b7288f6b3d 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1161,10 +1161,10 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
 		goto out_unlock_bo;
 
-	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					XE_BO_FLAG_PINNED, exec);
+	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+				   XE_BO_FLAG_PINNED, exec);
 	if (IS_ERR(backup)) {
 		ret = PTR_ERR(backup);
 		goto out_unlock_bo;
@@ -1240,11 +1240,10 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 		goto out_unlock_bo;
 
 	if (!backup) {
-		backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv,
-						NULL, xe_bo_size(bo),
-						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-						XE_BO_FLAG_PINNED, exec);
+		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+					   XE_BO_FLAG_PINNED, exec);
 		if (IS_ERR(backup)) {
 			ret = PTR_ERR(backup);
 			goto out_unlock_bo;
@@ -1987,7 +1986,7 @@ void xe_bo_free(struct xe_bo *bo)
 }
 
 /**
- * ___xe_bo_create_locked() - Initialize or create an xe_bo.
+ * xe_bo_init_locked() - Initialize or create an xe_bo.
  * @xe: The xe device.
  * @bo: An already allocated buffer object or NULL
  * if the function should allocate a new one.
@@ -2007,11 +2006,11 @@ void xe_bo_free(struct xe_bo *bo)
  *
  * Return: The buffer object on success. Negative error pointer on failure.
  */
-struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
-				     struct xe_tile *tile, struct dma_resv *resv,
-				     struct ttm_lru_bulk_move *bulk, size_t size,
-				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags, struct drm_exec *exec)
+struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
+				struct xe_tile *tile, struct dma_resv *resv,
+				struct ttm_lru_bulk_move *bulk, size_t size,
+				u16 cpu_caching, enum ttm_bo_type type,
+				u32 flags, struct drm_exec *exec)
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
@@ -2203,11 +2202,11 @@ __xe_bo_create_locked(struct xe_device *xe,
 		}
 	}
 
-	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
-				    vm && !xe_vm_in_fault_mode(vm) &&
-				    flags & XE_BO_FLAG_USER ?
-				    &vm->lru_bulk_move : NULL, size,
-				    cpu_caching, type, flags, exec);
+	bo = xe_bo_init_locked(xe, bo, tile, vm ? xe_vm_resv(vm) : NULL,
+			       vm && !xe_vm_in_fault_mode(vm) &&
+			       flags & XE_BO_FLAG_USER ?
+			       &vm->lru_bulk_move : NULL, size,
+			       cpu_caching, type, flags, exec);
 	if (IS_ERR(bo))
 		return bo;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c6bb90ca5c2e..a625806deeb6 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -89,11 +89,11 @@ struct sg_table;
 struct xe_bo *xe_bo_alloc(void);
 void xe_bo_free(struct xe_bo *bo);
 
-struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
-				     struct xe_tile *tile, struct dma_resv *resv,
-				     struct ttm_lru_bulk_move *bulk, size_t size,
-				     u16 cpu_caching, enum ttm_bo_type type,
-				     u32 flags, struct drm_exec *exec);
+struct xe_bo *xe_bo_init_locked(struct xe_device *xe, struct xe_bo *bo,
+				struct xe_tile *tile, struct dma_resv *resv,
+				struct ttm_lru_bulk_move *bulk, size_t size,
+				u16 cpu_caching, enum ttm_bo_type type,
+				u32 flags, struct drm_exec *exec);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
 			  struct xe_tile *tile, struct xe_vm *vm,
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 3f96101a06f3..a01d9fa7e136 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -238,9 +238,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 		if (ret)
 			break;
 
-		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-					    0, /* Will require 1way or 2way for vm_bind */
-					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
+		bo = xe_bo_init_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+				       0, /* Will require 1way or 2way for vm_bind */
+				       ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
 		drm_exec_retry_on_contention(&exec);
 		if (IS_ERR(bo)) {
 			ret = PTR_ERR(bo);
-- 
2.50.1



* [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (11 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 12/16] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 21:27   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Most users of xe_bo_create_pin_map_at() and
xe_bo_create_pin_map_at_aligned() do not use the vm parameter,
which simplifies conversion. Introduce an
xe_bo_create_pin_map_at_novm() function and make the _aligned()
version static. Use xe_validation_guard() for the conversion.
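
For callers this collapses the previous reserve + create + pin
sequence into a single call, with the trailing bool selecting
interruptible waits for the backing store. Example from the
xe_eu_stall.c hunk below:

	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull,
					  ttm_bo_type_kernel,
					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT,
					  SZ_64, false);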

v2:
- Adapt to signature change of xe_validation_guard(). (Matt Brost)
- Fix up documentation.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++-----
 drivers/gpu/drm/xe/display/xe_fb_pin.c        | 42 +++++------
 drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
 drivers/gpu/drm/xe/xe_bo.c                    | 72 ++++++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h                    | 13 ++--
 drivers/gpu/drm/xe/xe_eu_stall.c              |  5 +-
 6 files changed, 89 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
index 1ce1e9da975b..51afdf2ee98b 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
@@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 						       u32 size, u32 align,
 						       u32 start, u32 end)
 {
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_bo *bo;
-	int err;
 	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
 
 	if (start < SZ_4K)
@@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
 		start = ALIGN(start, align);
 	}
 
-	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
-				       NULL, size, start, end,
-				       ttm_bo_type_kernel, flags, 0, exec);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		bo = NULL;
-		return err;
-	}
-	err = xe_bo_pin(bo, exec);
-	xe_bo_unlock_vm_held(bo);
-
-	if (err) {
-		xe_bo_put(fb->bo);
-		bo = NULL;
-	}
+	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
+					  size, start, ttm_bo_type_kernel, flags,
+					  0, true);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
 
 	fb->bo = bo;
 
-	return err;
+	return 0;
 }
 
 static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index fe0000b211d9..e73994dd4126 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -102,29 +102,29 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 				 XE_PAGE_SIZE);
 
 	if (IS_DGFX(xe))
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size, ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_VRAM0 |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size, ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_VRAM0 |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   alignment, false);
 	else
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size,  ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_STOLEN |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size,  ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_STOLEN |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   alignment, false);
 	if (IS_ERR(dpt))
-		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
-						      dpt_size,  ~0ull,
-						      ttm_bo_type_kernel,
-						      XE_BO_FLAG_SYSTEM |
-						      XE_BO_FLAG_GGTT |
-						      XE_BO_FLAG_PAGETABLE,
-						      alignment);
+		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
+						   dpt_size,  ~0ull,
+						   ttm_bo_type_kernel,
+						   XE_BO_FLAG_SYSTEM |
+						   XE_BO_FLAG_GGTT |
+						   XE_BO_FLAG_PAGETABLE,
+						   alignment, false);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
 
diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
index 826ac3d578b7..94f00def811b 100644
--- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
+++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
@@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
 			page_size);
 	size -= base;
 
-	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
-				     ttm_bo_type_kernel, flags);
+	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
+					  ttm_bo_type_kernel, flags, 0, false);
 	if (IS_ERR(bo)) {
 		drm_dbg(&xe->drm,
 			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index a3b7288f6b3d..d5172cb05078 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2379,27 +2379,17 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
 	return bo;
 }
 
-struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
-				      struct xe_vm *vm,
-				      size_t size, u64 offset,
-				      enum ttm_bo_type type, u32 flags)
-{
-	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
-					       type, flags, 0);
-}
-
-struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
-					      struct xe_tile *tile,
-					      struct xe_vm *vm,
-					      size_t size, u64 offset,
-					      enum ttm_bo_type type, u32 flags,
-					      u64 alignment)
+static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
+						     struct xe_tile *tile,
+						     struct xe_vm *vm,
+						     size_t size, u64 offset,
+						     enum ttm_bo_type type, u32 flags,
+						     u64 alignment, struct drm_exec *exec)
 {
 	struct xe_bo *bo;
 	int err;
 	u64 start = offset == ~0ull ? 0 : offset;
-	u64 end = offset == ~0ull ? offset : start + size;
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
+	u64 end = offset == ~0ull ? ~0ull : start + size;
 
 	if (flags & XE_BO_FLAG_STOLEN &&
 	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
@@ -2431,11 +2421,57 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @size: The storage size to use for the bo.
+ * @offset: Optional VRAM offset or %~0ull for don't care.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @alignment: GGTT alignment.
+ * @intr: Whether to execute any waits for backing store interruptibly.
+ *
+ * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
+ * options. The bo will be external and not associated with a VM.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
+ * to true on entry.
+ */
+struct xe_bo *
+xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
+			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
+			     u64 alignment, bool intr)
+{
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	struct xe_bo *bo;
+	int ret = 0;
+
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = intr},
+			    ret) {
+		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
+						     type, flags, alignment, &exec);
+		if (IS_ERR(bo)) {
+			drm_exec_retry_on_contention(&exec);
+			ret = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &ret);
+		}
+	}
+
+	return ret ? ERR_PTR(ret) : bo;
+}
+
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags)
 {
-	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
+	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
+
+	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
+					       0, exec);
 }
 
 static void __xe_bo_unpin_map_no_vm(void *arg)
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index a625806deeb6..decd601c802d 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
 				   enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
-				      struct xe_vm *vm, size_t size, u64 offset,
-				      enum ttm_bo_type type, u32 flags);
-struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
-					      struct xe_tile *tile,
-					      struct xe_vm *vm,
-					      size_t size, u64 offset,
-					      enum ttm_bo_type type, u32 flags,
-					      u64 alignment);
+struct xe_bo *
+xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
+			     size_t size, u64 offset, enum ttm_bo_type type,
+			     u32 flags, u64 alignment, bool intr);
 struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 					   size_t size, u32 flags);
 struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
index fdd514fec5ef..f5cfdf29fde3 100644
--- a/drivers/gpu/drm/xe/xe_eu_stall.c
+++ b/drivers/gpu/drm/xe/xe_eu_stall.c
@@ -617,9 +617,8 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
 
 	size = stream->per_xecore_buf_size * last_xecore;
 
-	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
-					     size, ~0ull, ttm_bo_type_kernel,
-					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
+	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
+					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64, false);
 	if (IS_ERR(bo)) {
 		kfree(stream->xecore_buf);
 		return PTR_ERR(bo);
-- 
2.50.1



* [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (12 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 21:52   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem " Thomas Hellström
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Introduce an xe_bo_create_pin_map_novm() function that does not
take the drm_exec parameter to simplify the conversion of many
callsites.
For the rest, ensure that the same drm_exec context that was used
for locking the vm is passed down to validation.

Use xe_validation_guard() where appropriate.
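
Where no transaction exists yet, the callsite is wrapped in a
guard and the vm is locked inside it, so that the same drm_exec
context reaches validation. From the xe_migrate.c hunk below:

	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
		err = xe_vm_drm_exec_lock(vm, &exec);
		drm_exec_retry_on_contention(&exec);
		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &err);
	}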

v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Break out the change to pf_provision_vf_lmem to a separate
  patch.
- Adapt to signature change of xe_validation_guard().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
 drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
 drivers/gpu/drm/xe/xe_bo.c                    |  52 +++++++-
 drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
 drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
 drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
 drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
 drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
 drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
 drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
 drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
 drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
 drivers/gpu/drm/xe/xe_vm.c                    | 121 +++++++++++-------
 17 files changed, 231 insertions(+), 130 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index d96ba2b51065..8ea9a472113c 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -42,11 +42,11 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 	obj = ERR_PTR(-ENODEV);
 
 	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
-					   NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_STOLEN |
-					   XE_BO_FLAG_GGTT);
+		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
+						size,
+						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						XE_BO_FLAG_STOLEN |
+						XE_BO_FLAG_GGTT, false);
 		if (!IS_ERR(obj))
 			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
 		else
@@ -54,10 +54,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 	}
 
 	if (IS_ERR(obj)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-					   XE_BO_FLAG_GGTT);
+		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
+						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+						XE_BO_FLAG_GGTT, false);
 	}
 
 	if (IS_ERR(obj)) {
diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
index 9f941fc2e36b..58581d7aaae6 100644
--- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
+++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
@@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *d
 		return false;
 
 	/* Set scanout flag for WC mapping */
-	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
-				   NULL, PAGE_ALIGN(size),
-				   ttm_bo_type_kernel,
-				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
+	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
+					PAGE_ALIGN(size),
+					ttm_bo_type_kernel,
+					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(obj)) {
 		kfree(vma);
 		return false;
diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
index 30f1073141fc..4ae847b628e2 100644
--- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
+++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
@@ -72,10 +72,10 @@ static int intel_hdcp_gsc_initialize_message(struct xe_device *xe,
 	int ret = 0;
 
 	/* allocate object of two page for HDCP command memory and store it */
-	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT, false);
 
 	if (IS_ERR(bo)) {
 		drm_err(&xe->drm, "Failed to allocate bo for HDCP streaming command!\n");
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index afa794e56065..5904d658d1f2 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 
 	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
 				   ttm_bo_type_kernel,
-				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				   exec);
 	if (IS_ERR(big)) {
 		KUNIT_FAIL(test, "Failed to allocate bo: %li\n", PTR_ERR(big));
 		goto vunmap;
@@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 
 	pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
 				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				  exec);
 	if (IS_ERR(pt)) {
 		KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
 			   PTR_ERR(pt));
@@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
 	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
 				    2 * SZ_4K,
 				    ttm_bo_type_kernel,
-				    XE_BO_FLAG_VRAM_IF_DGFX(tile));
+				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
+				    exec);
 	if (IS_ERR(tiny)) {
 		KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n",
 			   PTR_ERR(tiny));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index d5172cb05078..7a62629c88e0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2464,16 +2464,59 @@ xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
 	return ret ? ERR_PTR(ret) : bo;
 }
 
+/**
+ * xe_bo_create_pin_map() - Create pinned and mapped bo
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @vm: The vm to associate the buffer object with. The vm's resv must be locked
+ * with the transaction represented by @exec.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @exec: The drm_exec transaction to use for exhaustive eviction, and
+ * previously used for locking @vm's resv.
+ *
+ * Create a pinned and mapped bo, associated with the vm given by @vm.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if @exec was
+ * configured for interruptible locking.
+ */
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
-				   enum ttm_bo_type type, u32 flags)
+				   enum ttm_bo_type type, u32 flags,
+				   struct drm_exec *exec)
 {
-	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
-
 	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
 					       0, exec);
 }
 
+/**
+ * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
+ * @xe: The xe device.
+ * @tile: The tile to select for migration of this bo, and the tile used for
+ * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
+ * @size: The storage size to use for the bo.
+ * @type: The TTM buffer object type.
+ * @flags: XE_BO_FLAG_ flags.
+ * @intr: Whether to execute any waits for backing store interruptibly.
+ *
+ * Create a pinned and mapped bo. The bo will be external and not associated
+ * with a VM.
+ *
+ * Return: The buffer object on success. Negative error pointer on failure.
+ * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
+ * to true on entry.
+ */
+struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
+					size_t size, enum ttm_bo_type type, u32 flags,
+					bool intr)
+{
+	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, 0, intr);
+}
+
 static void __xe_bo_unpin_map_no_vm(void *arg)
 {
 	xe_bo_unpin_map_no_vm(arg);
@@ -2486,8 +2529,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
 	int ret;
 
 	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
-
-	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
+	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);
 	if (IS_ERR(bo))
 		return bo;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index decd601c802d..6f46f928a0d4 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
 				u16 cpu_caching, u32 flags, struct drm_exec *exec);
 struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
 				   struct xe_vm *vm, size_t size,
-				   enum ttm_bo_type type, u32 flags);
+				   enum ttm_bo_type type, u32 flags,
+				   struct drm_exec *exec);
+struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
+					size_t size, enum ttm_bo_type type, u32 flags,
+					bool intr);
 struct xe_bo *
 xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
 			     size_t size, u64 offset, enum ttm_bo_type type,
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index f5ae28af60d4..83d61bf8ec62 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -136,10 +136,10 @@ static int query_compatibility_version(struct xe_gsc *gsc)
 	u64 ggtt_offset;
 	int err;
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(bo)) {
 		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
 		return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
index c712111aa30d..44cc612b0a75 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
@@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct xe_gt *gt, unsigned int vfid,
 	xe_gt_assert(gt, size % sizeof(u32) == 0);
 	xe_gt_assert(gt, size == ndwords * sizeof(u32));
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL,
-				  ALIGN(size, PAGE_SIZE),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT |
-				  XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(xe, tile,
+				       ALIGN(size, PAGE_SIZE),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT |
+				       XE_BO_FLAG_GGTT_INVALIDATE, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
@@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct xe_gt *gt, unsigned int vfid,
 	xe_gt_assert(gt, size % sizeof(u32) == 0);
 	xe_gt_assert(gt, size == ndwords * sizeof(u32));
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL,
-				  ALIGN(size, PAGE_SIZE),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT |
-				  XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(xe, tile,
+				       ALIGN(size, PAGE_SIZE),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM |
+				       XE_BO_FLAG_GGTT |
+				       XE_BO_FLAG_GGTT_INVALIDATE, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
index 92e1f9f41b8c..2b99c1ebdd58 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
@@ -94,16 +94,17 @@ static int allocate_engine_activity_buffers(struct xe_guc *guc,
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo, *metadata_bo;
 
-	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
-					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
-					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
+						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
+						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
+						false);
 
 	if (IS_ERR(metadata_bo))
 		return PTR_ERR(metadata_bo);
 
-	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
-				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
+				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
 
 	if (IS_ERR(bo)) {
 		xe_bo_unpin_map_no_vm(metadata_bo);
diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
index a78c9d474a6e..4ad468574174 100644
--- a/drivers/gpu/drm/xe/xe_lmtt.c
+++ b/drivers/gpu/drm/xe/xe_lmtt.c
@@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level
 		goto out;
 	}
 
-	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
-				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
-					     lmtt->ops->lmtt_pte_num(level)),
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
-				  XE_BO_FLAG_NEEDS_64K);
+	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
+				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
+						  lmtt->ops->lmtt_pte_num(level)),
+				       ttm_bo_type_kernel,
+				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
+				       XE_BO_FLAG_NEEDS_64K, false);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_free_pt;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 8f6c3ba47882..6d52e0eb97f5 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 	if (vm && vm->xef) /* userspace */
 		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
 
-	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
-				       ttm_bo_type_kernel,
-				       bo_flags);
+	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
+					    bo_size,
+					    ttm_bo_type_kernel,
+					    bo_flags, false);
 	if (IS_ERR(lrc->bo))
 		return PTR_ERR(lrc->bo);
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 57e6d5a8ac39..b27388db42a5 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -35,6 +35,7 @@
 #include "xe_sched_job.h"
 #include "xe_sync.h"
 #include "xe_trace_bo.h"
+#include "xe_validation.h"
 #include "xe_vm.h"
 #include "xe_vram.h"
 
@@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm,
 }
 
 static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
-				 struct xe_vm *vm)
+				 struct xe_vm *vm, struct drm_exec *exec)
 {
 	struct xe_device *xe = tile_to_xe(tile);
 	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
@@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 				  num_entries * XE_PAGE_SIZE,
 				  ttm_bo_type_kernel,
 				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				  XE_BO_FLAG_PAGETABLE);
+				  XE_BO_FLAG_PAGETABLE, exec);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
@@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
 	struct xe_tile *tile = m->tile;
 	struct xe_gt *primary_gt = tile->primary_gt;
 	struct xe_device *xe = tile_to_xe(tile);
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	int err;
 
@@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
 	if (IS_ERR(vm))
 		return PTR_ERR(vm);
 
-	xe_vm_lock(vm, false);
-	err = xe_migrate_prepare_vm(tile, m, vm);
-	xe_vm_unlock(vm);
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		xe_validation_retry_on_oom(&ctx, &err);
+	}
 	if (err)
-		goto err_out;
+		return err;
 
 	if (xe->info.has_usm) {
 		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index a188bad172ad..a4894eb0d7f3 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
 {
 	struct xe_bo *bo;
 
-	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
-				  size, ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
+				       size, ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index f3a39e734a90..33ad40418ceb 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
  * @vm: The vm to create for.
  * @tile: The tile to create for.
  * @level: The page-table level.
+ * @exec: The drm_exec object used to lock the vm.
  *
  * Allocate and initialize a single struct xe_pt metadata structure. Also
  * create the corresponding page-table bo, but don't initialize it. If the
@@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
  * error.
  */
 struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
-			   unsigned int level)
+			   unsigned int level, struct drm_exec *exec)
 {
 	struct xe_pt *pt;
 	struct xe_bo *bo;
@@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
 		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
 
 	pt->level = level;
+
+	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
 	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
 				  ttm_bo_type_kernel,
-				  bo_flags);
+				  bo_flags, exec);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto err_kfree;
@@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 	if (covers || !*child) {
 		u64 flags = 0;
 
-		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
+		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
+					xe_vm_validation_exec(vm));
 		if (IS_ERR(xe_child))
 			return PTR_ERR(xe_child);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 5ecf003d513c..4daeebaab5a1 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -10,6 +10,7 @@
 #include "xe_pt_types.h"
 
 struct dma_fence;
+struct drm_exec;
 struct xe_bo;
 struct xe_device;
 struct xe_exec_queue;
@@ -29,7 +30,7 @@ struct xe_vma_ops;
 unsigned int xe_pt_shift(unsigned int level);
 
 struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
-			   unsigned int level);
+			   unsigned int level, struct drm_exec *exec);
 
 void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
 			  struct xe_pt *pt);
diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c b/drivers/gpu/drm/xe/xe_pxp_submit.c
index ca95f2a4d4ef..e60526e30030 100644
--- a/drivers/gpu/drm/xe/xe_pxp_submit.c
+++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
@@ -54,8 +54,9 @@ static int allocate_vcs_execution_resources(struct xe_pxp *pxp)
 	 * Each termination is 16 DWORDS, so 4K is enough to contain a
 	 * termination for each sessions.
 	 */
-	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
+	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
+				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
+				       false);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_queue;
@@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
 {
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_device *xe = tile_to_xe(tile);
+	struct xe_validation_ctx ctx;
 	struct xe_hw_engine *hwe;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	struct xe_bo *bo;
 	struct xe_exec_queue *q;
@@ -106,15 +109,26 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
 		return PTR_ERR(vm);
 
 	/* We allocate a single object for the batch and the in/out memory */
-	xe_vm_lock(vm, false);
-	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
-	xe_vm_unlock(vm);
-	if (IS_ERR(bo)) {
-		err = PTR_ERR(bo);
-		goto vm_out;
+
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags){}, err) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (err)
+			break;
+
+		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
+					  ttm_bo_type_kernel,
+					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
+					  XE_BO_FLAG_NEEDS_UC, &exec);
+		drm_exec_retry_on_contention(&exec);
+		if (IS_ERR(bo)) {
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &err);
+			break;
+		}
 	}
+	if (err)
+		goto vm_out;
 
 	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
 	if (IS_ERR(fence)) {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 23015f369e34..0d8414bd6caa 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1603,6 +1603,7 @@ static void vm_destroy_work_func(struct work_struct *w);
  * @xe: xe device.
  * @tile: tile to set up for.
  * @vm: vm to set up for.
+ * @exec: The struct drm_exec object used to lock the vm resv.
  *
  * Sets up a pagetable tree with one page-table per level and a single
  * leaf PTE. All pagetable entries point to the single page-table or,
@@ -1612,20 +1613,19 @@ static void vm_destroy_work_func(struct work_struct *w);
  * Return: 0 on success, negative error code on error.
  */
 static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
-				struct xe_vm *vm)
+				struct xe_vm *vm, struct drm_exec *exec)
 {
 	u8 id = tile->id;
 	int i;
 
 	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
-		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
+		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i, exec);
 		if (IS_ERR(vm->scratch_pt[id][i])) {
 			int err = PTR_ERR(vm->scratch_pt[id][i]);
 
 			vm->scratch_pt[id][i] = NULL;
 			return err;
 		}
-
 		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
 	}
 
@@ -1653,9 +1653,26 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
 	}
 }
 
+static void xe_vm_pt_destroy(struct xe_vm *vm)
+{
+	struct xe_tile *tile;
+	u8 id;
+
+	xe_vm_assert_held(vm);
+
+	for_each_tile(tile, vm->xe, id) {
+		if (vm->pt_root[id]) {
+			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
+			vm->pt_root[id] = NULL;
+		}
+	}
+}
+
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 {
 	struct drm_gem_object *vm_resv_obj;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_vm *vm;
 	int err, number_tiles = 0;
 	struct xe_tile *tile;
@@ -1742,49 +1759,68 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 
 	drm_gem_object_put(vm_resv_obj);
 
-	err = xe_vm_lock(vm, true);
-	if (err)
-		goto err_close;
+	err = 0;
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
+			    err) {
+		err = xe_vm_drm_exec_lock(vm, &exec);
+		drm_exec_retry_on_contention(&exec);
 
-	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
-		vm->flags |= XE_VM_FLAG_64K;
+		if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
+			vm->flags |= XE_VM_FLAG_64K;
 
-	for_each_tile(tile, xe, id) {
-		if (flags & XE_VM_FLAG_MIGRATION &&
-		    tile->id != XE_VM_FLAG_TILE_ID(flags))
-			continue;
+		for_each_tile(tile, xe, id) {
+			if (flags & XE_VM_FLAG_MIGRATION &&
+			    tile->id != XE_VM_FLAG_TILE_ID(flags))
+				continue;
 
-		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
-		if (IS_ERR(vm->pt_root[id])) {
-			err = PTR_ERR(vm->pt_root[id]);
-			vm->pt_root[id] = NULL;
-			goto err_unlock_close;
+			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
+						       &exec);
+			if (IS_ERR(vm->pt_root[id])) {
+				err = PTR_ERR(vm->pt_root[id]);
+				vm->pt_root[id] = NULL;
+				xe_vm_pt_destroy(vm);
+				drm_exec_retry_on_contention(&exec);
+				xe_validation_retry_on_oom(&ctx, &err);
+				break;
+			}
 		}
-	}
+		if (err)
+			break;
 
-	if (xe_vm_has_scratch(vm)) {
-		for_each_tile(tile, xe, id) {
-			if (!vm->pt_root[id])
-				continue;
+		if (xe_vm_has_scratch(vm)) {
+			for_each_tile(tile, xe, id) {
+				if (!vm->pt_root[id])
+					continue;
 
-			err = xe_vm_create_scratch(xe, tile, vm);
+				err = xe_vm_create_scratch(xe, tile, vm, &exec);
+				if (err) {
+					xe_vm_free_scratch(vm);
+					xe_vm_pt_destroy(vm);
+					drm_exec_retry_on_contention(&exec);
+					xe_validation_retry_on_oom(&ctx, &err);
+					break;
+				}
+			}
 			if (err)
-				goto err_unlock_close;
+				break;
+			vm->batch_invalidate_tlb = true;
 		}
-		vm->batch_invalidate_tlb = true;
-	}
 
-	if (vm->flags & XE_VM_FLAG_LR_MODE)
-		vm->batch_invalidate_tlb = false;
+		if (vm->flags & XE_VM_FLAG_LR_MODE) {
+			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
+			vm->batch_invalidate_tlb = false;
+		}
 
-	/* Fill pt_root after allocating scratch tables */
-	for_each_tile(tile, xe, id) {
-		if (!vm->pt_root[id])
-			continue;
+		/* Fill pt_root after allocating scratch tables */
+		for_each_tile(tile, xe, id) {
+			if (!vm->pt_root[id])
+				continue;
 
-		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
+			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
+		}
 	}
-	xe_vm_unlock(vm);
+	if (err)
+		goto err_close;
 
 	/* Kernel migration VM shouldn't have a circular loop.. */
 	if (!(flags & XE_VM_FLAG_MIGRATION)) {
@@ -1817,7 +1853,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 				      &xe->usm.next_asid, GFP_KERNEL);
 		up_write(&xe->usm.lock);
 		if (err < 0)
-			goto err_unlock_close;
+			goto err_close;
 
 		vm->usm.asid = asid;
 	}
@@ -1826,8 +1862,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
 
 	return vm;
 
-err_unlock_close:
-	xe_vm_unlock(vm);
 err_close:
 	xe_vm_close_and_put(vm);
 	return ERR_PTR(err);
@@ -1956,13 +1990,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	 * destroy the pagetables immediately.
 	 */
 	xe_vm_free_scratch(vm);
-
-	for_each_tile(tile, xe, id) {
-		if (vm->pt_root[id]) {
-			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
-			vm->pt_root[id] = NULL;
-		}
-	}
+	xe_vm_pt_destroy(vm);
 	xe_vm_unlock(vm);
 
 	/*
@@ -3857,7 +3885,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
  */
 int xe_vm_lock(struct xe_vm *vm, bool intr)
 {
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	int ret;
 
 	if (intr)
@@ -3865,9 +3892,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
 	else
 		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
 
-	if (!ret)
-		xe_vm_set_validation_exec(vm, exec);
-
 	return ret;
 }
 
@@ -3879,7 +3903,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
  */
 void xe_vm_unlock(struct xe_vm *vm)
 {
-	xe_vm_set_validation_exec(vm, NULL);
 	dma_resv_unlock(xe_vm_resv(vm));
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (13 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-22 19:35   ` Matthew Brost
  2025-08-22  9:40 ` [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction " Thomas Hellström
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Open-code the validation transaction, since this is the only
identified instance of pinning without mapping.
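
For reference, the shape of the open-coded transaction, with the retry
semantics spelled out as comments. This is a schematic sketch built from
the validation helpers introduced earlier in the series, not the exact
driver code; "size" and "flags" stand in for the real arguments:

	int err = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    (struct xe_val_flags) {}, err) {
		bo = xe_bo_create_locked(xe, tile, NULL, size,
					 ttm_bo_type_kernel, flags, &exec);
		if (IS_ERR(bo)) {
			/* On -EDEADLK: drop all locks, restart the body. */
			drm_exec_retry_on_contention(&exec);
			err = PTR_ERR(bo);
			/*
			 * On -ENOMEM: release the locks, retake the outer
			 * validation lock exclusively and restart.
			 */
			xe_validation_retry_on_oom(&ctx, &err);
			break;
		}

		err = xe_bo_pin(bo, &exec);
		xe_bo_unlock(bo);
	}
	if (err)
		return err;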

v2:
- Break out this patch from the previous one.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 51 ++++++++++++++--------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
index 906011671b60..c9e3c811c35b 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
@@ -1452,11 +1452,12 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
 static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 {
 	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
 	struct xe_device *xe = gt_to_xe(gt);
 	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_bo *bo;
-	int err;
+	int err = 0;
 
 	xe_gt_assert(gt, vfid);
 	xe_gt_assert(gt, IS_DGFX(xe));
@@ -1479,23 +1480,37 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
 		return 0;
 
 	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
-	bo = xe_bo_create_locked(xe, tile, NULL,
-				 ALIGN(size, PAGE_SIZE),
-				 ttm_bo_type_kernel,
-				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				 XE_BO_FLAG_NEEDS_2M |
-				 XE_BO_FLAG_PINNED |
-				 XE_BO_FLAG_PINNED_LATE_RESTORE,
-				 exec);
-	if (IS_ERR(bo))
-		return PTR_ERR(bo);
-
-	err = xe_bo_pin(bo, exec);
-	xe_bo_unlock(bo);
-	if (unlikely(err)) {
-		xe_bo_put(bo);
-		return err;
+
+	/*
+	 * Open-code for now, since this is the only instance of
+	 * pinning without mapping.
+	 */
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, err) {
+		bo = xe_bo_create_locked(xe, tile, NULL,
+					 ALIGN(size, PAGE_SIZE),
+					 ttm_bo_type_kernel,
+					 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+					 XE_BO_FLAG_NEEDS_2M |
+					 XE_BO_FLAG_PINNED |
+					 XE_BO_FLAG_PINNED_LATE_RESTORE,
+					 &exec);
+		if (IS_ERR(bo)) {
+			drm_exec_retry_on_contention(&exec);
+			err = PTR_ERR(bo);
+			xe_validation_retry_on_oom(&ctx, &err);
+			break;
+		}
+
+		err = xe_bo_pin(bo, &exec);
+		xe_bo_unlock(bo);
+		if (err) {
+			xe_bo_put(bo);
+			drm_exec_retry_on_contention(&exec);
+			xe_validation_retry_on_oom(&ctx, &err);
+		}
 	}
+	if (err)
+		return err;
 
 	config->lmem_obj = bo;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (14 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem " Thomas Hellström
@ 2025-08-22  9:40 ` Thomas Hellström
  2025-08-26 22:08   ` Matthew Brost
  2025-08-22 10:50 ` ✗ CI.checkpatch: warning for Driver-managed exhaustive eviction (rev2) Patchwork
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-22  9:40 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Joonas Lahtinen,
	Jani Nikula, Maarten Lankhorst, Matthew Auld

Pinned suspend eviction and preparation for eviction validate
system memory for eviction buffers. Do this under an
exclusive validation lock to avoid interfering with other
processes validating system graphics memory.
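
Schematically, the exclusive mode is requested through the validation
flags. A sketch using the xe_validation API from this series; the object
locking shown is illustrative:

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int ret = 0;

	/*
	 * .exclusive = true takes the outer validation lock in write
	 * mode, so no other client validates graphics memory while
	 * the backup object is created and populated.
	 */
	xe_validation_guard(&ctx, &xe->val, &exec,
			    (struct xe_val_flags) {.exclusive = true}, ret) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		/* ... create the backup object and copy, as in the patch ... */
	}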

v2:
- Avoid gotos from within xe_validation_guard().
- Adapt to signature change of xe_validation_guard().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 184 +++++++++++++++++++++----------------
 1 file changed, 103 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 7a62629c88e0..9733f742525a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
 {
 	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
 	struct xe_bo *backup;
 	int ret = 0;
 
-	xe_bo_lock(bo, false);
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, ret) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		xe_assert(xe, !ret);
+		xe_assert(xe, !bo->backup_obj);
 
-	xe_assert(xe, !bo->backup_obj);
+		/*
+		 * Since this is called from the PM notifier we might have raced with
+		 * someone unpinning this after we dropped the pinned list lock and
+		 * grabbing the above bo lock.
+		 */
+		if (!xe_bo_is_pinned(bo))
+			break;
 
-	/*
-	 * Since this is called from the PM notifier we might have raced with
-	 * someone unpinning this after we dropped the pinned list lock and
-	 * grabbing the above bo lock.
-	 */
-	if (!xe_bo_is_pinned(bo))
-		goto out_unlock_bo;
+		if (!xe_bo_is_vram(bo))
+			break;
 
-	if (!xe_bo_is_vram(bo))
-		goto out_unlock_bo;
+		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
+			break;
 
-	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
-		goto out_unlock_bo;
+		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
+					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+					   XE_BO_FLAG_PINNED, &exec);
+		if (IS_ERR(backup)) {
+			drm_exec_retry_on_contention(&exec);
+			ret = PTR_ERR(backup);
+			xe_validation_retry_on_oom(&ctx, &ret);
+			break;
+		}
 
-	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-				   XE_BO_FLAG_PINNED, exec);
-	if (IS_ERR(backup)) {
-		ret = PTR_ERR(backup);
-		goto out_unlock_bo;
+		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
+		ttm_bo_pin(&backup->ttm);
+		bo->backup_obj = backup;
 	}
 
-	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
-	ttm_bo_pin(&backup->ttm);
-	bo->backup_obj = backup;
-
-out_unlock_bo:
-	xe_bo_unlock(bo);
 	return ret;
 }
 
@@ -1201,57 +1205,12 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
 	return 0;
 }
 
-/**
- * xe_bo_evict_pinned() - Evict a pinned VRAM object to system memory
- * @bo: The buffer object to move.
- *
- * On successful completion, the object memory will be moved to system memory.
- *
- * This is needed to for special handling of pinned VRAM object during
- * suspend-resume.
- *
- * Return: 0 on success. Negative error code on failure.
- */
-int xe_bo_evict_pinned(struct xe_bo *bo)
+static int xe_bo_evict_pinned_copy(struct xe_bo *bo, struct xe_bo *backup)
 {
-	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
-	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
-	struct xe_bo *backup = bo->backup_obj;
-	bool backup_created = false;
+	struct xe_device *xe = xe_bo_device(bo);
 	bool unmap = false;
 	int ret = 0;
 
-	xe_bo_lock(bo, false);
-
-	if (WARN_ON(!bo->ttm.resource)) {
-		ret = -EINVAL;
-		goto out_unlock_bo;
-	}
-
-	if (WARN_ON(!xe_bo_is_pinned(bo))) {
-		ret = -EINVAL;
-		goto out_unlock_bo;
-	}
-
-	if (!xe_bo_is_vram(bo))
-		goto out_unlock_bo;
-
-	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
-		goto out_unlock_bo;
-
-	if (!backup) {
-		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
-					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
-					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
-					   XE_BO_FLAG_PINNED, exec);
-		if (IS_ERR(backup)) {
-			ret = PTR_ERR(backup);
-			goto out_unlock_bo;
-		}
-		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
-		backup_created = true;
-	}
-
 	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
 		struct xe_migrate *migrate;
 		struct dma_fence *fence;
@@ -1289,7 +1248,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 		if (iosys_map_is_null(&bo->vmap)) {
 			ret = xe_bo_vmap(bo);
 			if (ret)
-				goto out_backup;
+				goto out_vunmap;
 			unmap = true;
 		}
 
@@ -1299,15 +1258,78 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
 
 	if (!bo->backup_obj)
 		bo->backup_obj = backup;
-
-out_backup:
+out_vunmap:
 	xe_bo_vunmap(backup);
-	if (ret && backup_created)
-		xe_bo_put(backup);
-out_unlock_bo:
+out_backup:
 	if (unmap)
 		xe_bo_vunmap(bo);
-	xe_bo_unlock(bo);
+
+	return ret;
+}
+
+/**
+ * xe_bo_evict_pinned() - Evict a pinned VRAM object to system memory
+ * @bo: The buffer object to move.
+ *
+ * On successful completion, the object memory will be moved to system memory.
+ *
+ * This is needed for special handling of pinned VRAM objects during
+ * suspend-resume.
+ *
+ * Return: 0 on success. Negative error code on failure.
+ */
+int xe_bo_evict_pinned(struct xe_bo *bo)
+{
+	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
+	struct xe_validation_ctx ctx;
+	struct drm_exec exec;
+	struct xe_bo *backup = bo->backup_obj;
+	bool backup_created = false;
+	int ret = 0;
+
+	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, ret) {
+		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
+		drm_exec_retry_on_contention(&exec);
+		xe_assert(xe, !ret);
+
+		if (WARN_ON(!bo->ttm.resource)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (WARN_ON(!xe_bo_is_pinned(bo))) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (!xe_bo_is_vram(bo))
+			break;
+
+		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
+			break;
+
+		if (!backup) {
+			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
+						   xe_bo_size(bo),
+						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
+						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+						   XE_BO_FLAG_PINNED, &exec);
+			if (IS_ERR(backup)) {
+				drm_exec_retry_on_contention(&exec);
+				ret = PTR_ERR(backup);
+				xe_validation_retry_on_oom(&ctx, &ret);
+				break;
+			}
+			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
+			backup_created = true;
+		}
+
+		ret = xe_bo_evict_pinned_copy(bo, backup);
+	}
+
+	if (ret && backup_created)
+		xe_bo_put(backup);
+
 	return ret;
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* ✗ CI.checkpatch: warning for Driver-managed exhaustive eviction (rev2)
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (15 preceding siblings ...)
  2025-08-22  9:40 ` [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction " Thomas Hellström
@ 2025-08-22 10:50 ` Patchwork
  2025-08-22 10:51 ` ✓ CI.KUnit: success " Patchwork
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2025-08-22 10:50 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Driver-managed exhaustive eviction (rev2)
URL   : https://patchwork.freedesktop.org/series/152882/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
553439844b6500767ce8aef522cfe9fbb7ece541
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 365a3d09400cf0d55634db3a3ee0490501526dff
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date:   Fri Aug 22 11:40:30 2025 +0200

    drm/xe: Convert pinned suspend eviction for exhaustive eviction
    
    Pinned suspend eviction and preparation for eviction validate
    system memory for eviction buffers. Do this under an
    exclusive validation lock to avoid interfering with other
    processes validating system graphics memory.
    
    v2:
    - Avoid gotos from within xe_validation_guard().
    - Adapt to signature change of xe_validation_guard().
    
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
+ /mt/dim checkpatch cca87ca63e2f5b8a785dc59c23e526987530b27f drm-intel
df7c2f2aaf96 drm/xe/vm: Don't pin the vm_resv during validation
45afdb0d996c drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member
b57290812749 drm/xe/vm: Clear the scratch_pt pointer on error
2f7d25b93f24 drm/xe: Pass down drm_exec context to validation
-:1131: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#1131: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 1236 lines checked
38fbfc5e8c13 drm/xe: Introduce an xe_validation wrapper around drm_exec
-:380: WARNING:MACRO_WITH_FLOW_CONTROL: Macros with flow control statements should be avoided
#380: FILE: drivers/gpu/drm/xe/xe_validation.h:145:
+#define xe_validation_retry_on_oom(_ctx, _ret)				\
+	do {								\
+		if (xe_validation_should_retry(_ctx, _ret))		\
+			goto *__drm_exec_retry_ptr;			\
+	} while (0)

-:402: WARNING:TABSTOP: Statements should start on a tabstop
#402: FILE: drivers/gpu/drm/xe/xe_validation.h:167:
+	     if (_T) xe_validation_ctx_fini(_T);,

-:402: ERROR:SPACING: space required after that ';' (ctx:VxO)
#402: FILE: drivers/gpu/drm/xe/xe_validation.h:167:
+	     if (_T) xe_validation_ctx_fini(_T);,
 	                                       ^

-:402: ERROR:TRAILING_STATEMENTS: trailing statements should be on next line
#402: FILE: drivers/gpu/drm/xe/xe_validation.h:167:
+	     if (_T) xe_validation_ctx_fini(_T);,

-:422: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#422: FILE: drivers/gpu/drm/xe/xe_validation.h:187:
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret)		\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret) \
+	drm_exec_until_all_locked(_exec)

BUT SEE:

   do {} while (0) advice is over-stated in a few situations:

   The more obvious case is macros, like MODULE_PARM_DESC, invoked at
   file-scope, where C disallows code (it must be in functions).  See
   $exceptions if you have one to add by name.

   More troublesome is declarative macros used at top of new scope,
   like DECLARE_PER_CPU.  These might just compile with a do-while-0
   wrapper, but would be incorrect.  Most of these are handled by
   detecting struct,union,etc declaration primitives in $exceptions.

   Theres also macros called inside an if (block), which "return" an
   expression.  These cannot do-while, and need a ({}) wrapper.

   Enjoy this qualification while we work to improve our heuristics.

-:422: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_exec' - possible side-effects?
#422: FILE: drivers/gpu/drm/xe/xe_validation.h:187:
+#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret)		\
+	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret) \
+	drm_exec_until_all_locked(_exec)

total: 3 errors, 2 warnings, 1 checks, 372 lines checked
90db21b8e407 drm/xe: Convert xe_bo_create_user() for exhaustive eviction
48ef01d339f2 drm/xe: Convert SVM validation for exhaustive eviction
6be77af31ecd drm/xe: Convert existing drm_exec transactions for exhaustive eviction
ecd08d38f3ef drm/xe: Convert the CPU fault handler for exhaustive eviction
26e3c40ddb72 drm/xe/display: Convert __xe_pin_fb_vma()
40c78b67e852 drm/xe: Convert xe_dma_buf.c for exhaustive eviction
-:22: WARNING:TYPO_SPELLING: 'unneded' may be misspelled - perhaps 'unneeded'?
#22: 
- Remove an unneded (void)ret. (Matt Brost)
            ^^^^^^^

total: 0 errors, 1 warnings, 0 checks, 82 lines checked
45833160fd92 drm/xe: Rename ___xe_bo_create_locked()
6cf363c2c4ed drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
750c480bf99d drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
-:57: WARNING:LONG_LINE: line length of 102 exceeds 100 columns
#57: FILE: drivers/gpu/drm/xe/display/intel_fbdev_fb.c:59:
+						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |

total: 0 errors, 1 warnings, 0 checks, 649 lines checked
ee1deede2fa7 drm/xe/sriov: Convert pf_provision_vf_lmem for exhaustive eviction
365a3d09400c drm/xe: Convert pinned suspend eviction for exhaustive eviction



^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✓ CI.KUnit: success for Driver-managed exhaustive eviction (rev2)
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (16 preceding siblings ...)
  2025-08-22 10:50 ` ✗ CI.checkpatch: warning for Driver-managed exhaustive eviction (rev2) Patchwork
@ 2025-08-22 10:51 ` Patchwork
  2025-08-22 11:31 ` ✓ Xe.CI.BAT: " Patchwork
  2025-08-23  4:17 ` ✗ Xe.CI.Full: failure " Patchwork
  19 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2025-08-22 10:51 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

== Series Details ==

Series: Driver-managed exhaustive eviction (rev2)
URL   : https://patchwork.freedesktop.org/series/152882/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[10:50:19] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[10:50:24] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[10:50:52] Starting KUnit Kernel (1/1)...
[10:50:52] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[10:50:53] ================== guc_buf (11 subtests) ===================
[10:50:53] [PASSED] test_smallest
[10:50:53] [PASSED] test_largest
[10:50:53] [PASSED] test_granular
[10:50:53] [PASSED] test_unique
[10:50:53] [PASSED] test_overlap
[10:50:53] [PASSED] test_reusable
[10:50:53] [PASSED] test_too_big
[10:50:53] [PASSED] test_flush
[10:50:53] [PASSED] test_lookup
[10:50:53] [PASSED] test_data
[10:50:53] [PASSED] test_class
[10:50:53] ===================== [PASSED] guc_buf =====================
[10:50:53] =================== guc_dbm (7 subtests) ===================
[10:50:53] [PASSED] test_empty
[10:50:53] [PASSED] test_default
[10:50:53] ======================== test_size  ========================
[10:50:53] [PASSED] 4
[10:50:53] [PASSED] 8
[10:50:53] [PASSED] 32
[10:50:53] [PASSED] 256
[10:50:53] ==================== [PASSED] test_size ====================
[10:50:53] ======================= test_reuse  ========================
[10:50:53] [PASSED] 4
[10:50:53] [PASSED] 8
[10:50:53] [PASSED] 32
[10:50:53] [PASSED] 256
[10:50:53] =================== [PASSED] test_reuse ====================
[10:50:53] =================== test_range_overlap  ====================
[10:50:53] [PASSED] 4
[10:50:53] [PASSED] 8
[10:50:53] [PASSED] 32
[10:50:53] [PASSED] 256
[10:50:53] =============== [PASSED] test_range_overlap ================
[10:50:53] =================== test_range_compact  ====================
[10:50:53] [PASSED] 4
[10:50:53] [PASSED] 8
[10:50:53] [PASSED] 32
[10:50:53] [PASSED] 256
[10:50:53] =============== [PASSED] test_range_compact ================
[10:50:53] ==================== test_range_spare  =====================
[10:50:53] [PASSED] 4
[10:50:53] [PASSED] 8
[10:50:53] [PASSED] 32
[10:50:53] [PASSED] 256
[10:50:53] ================ [PASSED] test_range_spare =================
[10:50:53] ===================== [PASSED] guc_dbm =====================
[10:50:53] =================== guc_idm (6 subtests) ===================
[10:50:53] [PASSED] bad_init
[10:50:53] [PASSED] no_init
[10:50:53] [PASSED] init_fini
[10:50:53] [PASSED] check_used
[10:50:53] [PASSED] check_quota
[10:50:53] [PASSED] check_all
[10:50:53] ===================== [PASSED] guc_idm =====================
[10:50:53] ================== no_relay (3 subtests) ===================
[10:50:53] [PASSED] xe_drops_guc2pf_if_not_ready
[10:50:53] [PASSED] xe_drops_guc2vf_if_not_ready
[10:50:53] [PASSED] xe_rejects_send_if_not_ready
[10:50:53] ==================== [PASSED] no_relay =====================
[10:50:53] ================== pf_relay (14 subtests) ==================
[10:50:53] [PASSED] pf_rejects_guc2pf_too_short
[10:50:53] [PASSED] pf_rejects_guc2pf_too_long
[10:50:53] [PASSED] pf_rejects_guc2pf_no_payload
[10:50:53] [PASSED] pf_fails_no_payload
[10:50:53] [PASSED] pf_fails_bad_origin
[10:50:53] [PASSED] pf_fails_bad_type
[10:50:53] [PASSED] pf_txn_reports_error
[10:50:53] [PASSED] pf_txn_sends_pf2guc
[10:50:53] [PASSED] pf_sends_pf2guc
[10:50:53] [SKIPPED] pf_loopback_nop
[10:50:53] [SKIPPED] pf_loopback_echo
[10:50:53] [SKIPPED] pf_loopback_fail
[10:50:53] [SKIPPED] pf_loopback_busy
[10:50:53] [SKIPPED] pf_loopback_retry
[10:50:53] ==================== [PASSED] pf_relay =====================
[10:50:53] ================== vf_relay (3 subtests) ===================
[10:50:53] [PASSED] vf_rejects_guc2vf_too_short
[10:50:53] [PASSED] vf_rejects_guc2vf_too_long
[10:50:53] [PASSED] vf_rejects_guc2vf_no_payload
[10:50:53] ==================== [PASSED] vf_relay =====================
[10:50:53] ===================== lmtt (1 subtest) =====================
[10:50:53] ======================== test_ops  =========================
[10:50:53] [PASSED] 2-level
[10:50:53] [PASSED] multi-level
[10:50:53] ==================== [PASSED] test_ops =====================
[10:50:53] ====================== [PASSED] lmtt =======================
[10:50:53] ================= pf_service (11 subtests) =================
[10:50:53] [PASSED] pf_negotiate_any
[10:50:53] [PASSED] pf_negotiate_base_match
[10:50:53] [PASSED] pf_negotiate_base_newer
[10:50:53] [PASSED] pf_negotiate_base_next
[10:50:53] [SKIPPED] pf_negotiate_base_older
[10:50:53] [PASSED] pf_negotiate_base_prev
[10:50:53] [PASSED] pf_negotiate_latest_match
[10:50:53] [PASSED] pf_negotiate_latest_newer
[10:50:53] [PASSED] pf_negotiate_latest_next
[10:50:53] [SKIPPED] pf_negotiate_latest_older
[10:50:53] [SKIPPED] pf_negotiate_latest_prev
[10:50:53] =================== [PASSED] pf_service ====================
[10:50:53] =================== xe_mocs (2 subtests) ===================
[10:50:53] ================ xe_live_mocs_kernel_kunit  ================
[10:50:53] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[10:50:53] ================ xe_live_mocs_reset_kunit  =================
[10:50:53] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[10:50:53] ==================== [SKIPPED] xe_mocs =====================
[10:50:53] ================= xe_migrate (2 subtests) ==================
[10:50:53] ================= xe_migrate_sanity_kunit  =================
[10:50:53] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[10:50:53] ================== xe_validate_ccs_kunit  ==================
[10:50:53] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[10:50:53] =================== [SKIPPED] xe_migrate ===================
[10:50:53] ================== xe_dma_buf (1 subtest) ==================
[10:50:53] ==================== xe_dma_buf_kunit  =====================
[10:50:53] ================ [SKIPPED] xe_dma_buf_kunit ================
[10:50:53] =================== [SKIPPED] xe_dma_buf ===================
[10:50:53] ================= xe_bo_shrink (1 subtest) =================
[10:50:53] =================== xe_bo_shrink_kunit  ====================
[10:50:53] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[10:50:53] ================== [SKIPPED] xe_bo_shrink ==================
[10:50:53] ==================== xe_bo (2 subtests) ====================
[10:50:53] ================== xe_ccs_migrate_kunit  ===================
[10:50:53] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[10:50:53] ==================== xe_bo_evict_kunit  ====================
[10:50:53] =============== [SKIPPED] xe_bo_evict_kunit ================
[10:50:53] ===================== [SKIPPED] xe_bo ======================
[10:50:53] ==================== args (11 subtests) ====================
[10:50:53] [PASSED] count_args_test
[10:50:53] [PASSED] call_args_example
[10:50:53] [PASSED] call_args_test
[10:50:53] [PASSED] drop_first_arg_example
[10:50:53] [PASSED] drop_first_arg_test
[10:50:53] [PASSED] first_arg_example
[10:50:53] [PASSED] first_arg_test
[10:50:53] [PASSED] last_arg_example
[10:50:53] [PASSED] last_arg_test
[10:50:53] [PASSED] pick_arg_example
[10:50:53] [PASSED] sep_comma_example
[10:50:53] ====================== [PASSED] args =======================
[10:50:53] =================== xe_pci (3 subtests) ====================
[10:50:53] ==================== check_graphics_ip  ====================
[10:50:53] [PASSED] 12.70 Xe_LPG
[10:50:53] [PASSED] 12.71 Xe_LPG
[10:50:53] [PASSED] 12.74 Xe_LPG+
[10:50:53] [PASSED] 20.01 Xe2_HPG
[10:50:53] [PASSED] 20.02 Xe2_HPG
[10:50:53] [PASSED] 20.04 Xe2_LPG
[10:50:53] [PASSED] 30.00 Xe3_LPG
[10:50:53] [PASSED] 30.01 Xe3_LPG
[10:50:53] [PASSED] 30.03 Xe3_LPG
[10:50:53] ================ [PASSED] check_graphics_ip ================
[10:50:53] ===================== check_media_ip  ======================
[10:50:53] [PASSED] 13.00 Xe_LPM+
[10:50:53] [PASSED] 13.01 Xe2_HPM
[10:50:53] [PASSED] 20.00 Xe2_LPM
[10:50:53] [PASSED] 30.00 Xe3_LPM
[10:50:53] [PASSED] 30.02 Xe3_LPM
[10:50:53] ================= [PASSED] check_media_ip ==================
[10:50:53] ================= check_platform_gt_count  =================
[10:50:53] [PASSED] 0x9A60 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A68 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A70 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A40 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A49 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A59 (TIGERLAKE)
[10:50:53] [PASSED] 0x9A78 (TIGERLAKE)
[10:50:53] [PASSED] 0x9AC0 (TIGERLAKE)
[10:50:53] [PASSED] 0x9AC9 (TIGERLAKE)
[10:50:53] [PASSED] 0x9AD9 (TIGERLAKE)
[10:50:53] [PASSED] 0x9AF8 (TIGERLAKE)
[10:50:53] [PASSED] 0x4C80 (ROCKETLAKE)
[10:50:53] [PASSED] 0x4C8A (ROCKETLAKE)
[10:50:53] [PASSED] 0x4C8B (ROCKETLAKE)
[10:50:53] [PASSED] 0x4C8C (ROCKETLAKE)
[10:50:53] [PASSED] 0x4C90 (ROCKETLAKE)
[10:50:53] [PASSED] 0x4C9A (ROCKETLAKE)
[10:50:53] [PASSED] 0x4680 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4682 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4688 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x468A (ALDERLAKE_S)
[10:50:53] [PASSED] 0x468B (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4690 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4692 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4693 (ALDERLAKE_S)
[10:50:53] [PASSED] 0x46A0 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46A1 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46A2 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46A3 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46A6 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46A8 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46AA (ALDERLAKE_P)
[10:50:53] [PASSED] 0x462A (ALDERLAKE_P)
[10:50:53] [PASSED] 0x4626 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x4628 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46B0 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46B1 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46B2 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46B3 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46C0 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46C1 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46C2 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46C3 (ALDERLAKE_P)
[10:50:53] [PASSED] 0x46D0 (ALDERLAKE_N)
[10:50:53] [PASSED] 0x46D1 (ALDERLAKE_N)
[10:50:53] [PASSED] 0x46D2 (ALDERLAKE_N)
[10:50:53] [PASSED] 0x46D3 (ALDERLAKE_N)
[10:50:53] [PASSED] 0x46D4 (ALDERLAKE_N)
[10:50:53] [PASSED] 0xA721 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7A1 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7A9 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7AC (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7AD (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA720 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7A0 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7A8 (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7AA (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA7AB (ALDERLAKE_P)
[10:50:53] [PASSED] 0xA780 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA781 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA782 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA783 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA788 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA789 (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA78A (ALDERLAKE_S)
[10:50:53] [PASSED] 0xA78B (ALDERLAKE_S)
[10:50:53] [PASSED] 0x4905 (DG1)
[10:50:53] [PASSED] 0x4906 (DG1)
[10:50:53] [PASSED] 0x4907 (DG1)
[10:50:53] [PASSED] 0x4908 (DG1)
[10:50:53] [PASSED] 0x4909 (DG1)
[10:50:53] [PASSED] 0x56C0 (DG2)
[10:50:53] [PASSED] 0x56C2 (DG2)
[10:50:53] [PASSED] 0x56C1 (DG2)
[10:50:53] [PASSED] 0x7D51 (METEORLAKE)
[10:50:53] [PASSED] 0x7DD1 (METEORLAKE)
[10:50:53] [PASSED] 0x7D41 (METEORLAKE)
[10:50:53] [PASSED] 0x7D67 (METEORLAKE)
[10:50:53] [PASSED] 0xB640 (METEORLAKE)
[10:50:53] [PASSED] 0x56A0 (DG2)
[10:50:53] [PASSED] 0x56A1 (DG2)
[10:50:53] [PASSED] 0x56A2 (DG2)
[10:50:53] [PASSED] 0x56BE (DG2)
[10:50:53] [PASSED] 0x56BF (DG2)
[10:50:53] [PASSED] 0x5690 (DG2)
[10:50:53] [PASSED] 0x5691 (DG2)
[10:50:53] [PASSED] 0x5692 (DG2)
[10:50:53] [PASSED] 0x56A5 (DG2)
[10:50:53] [PASSED] 0x56A6 (DG2)
[10:50:53] [PASSED] 0x56B0 (DG2)
[10:50:53] [PASSED] 0x56B1 (DG2)
[10:50:53] [PASSED] 0x56BA (DG2)
[10:50:53] [PASSED] 0x56BB (DG2)
[10:50:53] [PASSED] 0x56BC (DG2)
[10:50:53] [PASSED] 0x56BD (DG2)
[10:50:53] [PASSED] 0x5693 (DG2)
[10:50:53] [PASSED] 0x5694 (DG2)
[10:50:53] [PASSED] 0x5695 (DG2)
[10:50:53] [PASSED] 0x56A3 (DG2)
[10:50:53] [PASSED] 0x56A4 (DG2)
[10:50:53] [PASSED] 0x56B2 (DG2)
[10:50:53] [PASSED] 0x56B3 (DG2)
[10:50:53] [PASSED] 0x5696 (DG2)
[10:50:53] [PASSED] 0x5697 (DG2)
[10:50:53] [PASSED] 0xB69 (PVC)
[10:50:53] [PASSED] 0xB6E (PVC)
[10:50:53] [PASSED] 0xBD4 (PVC)
[10:50:53] [PASSED] 0xBD5 (PVC)
[10:50:53] [PASSED] 0xBD6 (PVC)
[10:50:53] [PASSED] 0xBD7 (PVC)
[10:50:53] [PASSED] 0xBD8 (PVC)
[10:50:53] [PASSED] 0xBD9 (PVC)
[10:50:53] [PASSED] 0xBDA (PVC)
[10:50:53] [PASSED] 0xBDB (PVC)
[10:50:53] [PASSED] 0xBE0 (PVC)
[10:50:53] [PASSED] 0xBE1 (PVC)
[10:50:53] [PASSED] 0xBE5 (PVC)
[10:50:53] [PASSED] 0x7D40 (METEORLAKE)
[10:50:53] [PASSED] 0x7D45 (METEORLAKE)
[10:50:53] [PASSED] 0x7D55 (METEORLAKE)
[10:50:53] [PASSED] 0x7D60 (METEORLAKE)
[10:50:53] [PASSED] 0x7DD5 (METEORLAKE)
[10:50:53] [PASSED] 0x6420 (LUNARLAKE)
[10:50:53] [PASSED] 0x64A0 (LUNARLAKE)
[10:50:53] [PASSED] 0x64B0 (LUNARLAKE)
[10:50:53] [PASSED] 0xE202 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE209 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE20B (BATTLEMAGE)
[10:50:53] [PASSED] 0xE20C (BATTLEMAGE)
[10:50:53] [PASSED] 0xE20D (BATTLEMAGE)
[10:50:53] [PASSED] 0xE210 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE211 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE212 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE216 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE220 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE221 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE222 (BATTLEMAGE)
[10:50:53] [PASSED] 0xE223 (BATTLEMAGE)
[10:50:53] [PASSED] 0xB080 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB081 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB082 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB083 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB084 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB085 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB086 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB087 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB08F (PANTHERLAKE)
[10:50:53] [PASSED] 0xB090 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB0A0 (PANTHERLAKE)
[10:50:53] [PASSED] 0xB0B0 (PANTHERLAKE)
[10:50:53] [PASSED] 0xFD80 (PANTHERLAKE)
[10:50:53] [PASSED] 0xFD81 (PANTHERLAKE)
[10:50:53] ============= [PASSED] check_platform_gt_count =============
[10:50:53] ===================== [PASSED] xe_pci ======================
[10:50:53] =================== xe_rtp (2 subtests) ====================
[10:50:53] =============== xe_rtp_process_to_sr_tests  ================
[10:50:53] [PASSED] coalesce-same-reg
[10:50:53] [PASSED] no-match-no-add
[10:50:53] [PASSED] match-or
[10:50:53] [PASSED] match-or-xfail
[10:50:53] [PASSED] no-match-no-add-multiple-rules
[10:50:53] [PASSED] two-regs-two-entries
[10:50:53] [PASSED] clr-one-set-other
[10:50:53] [PASSED] set-field
[10:50:53] [PASSED] conflict-duplicate
[10:50:53] [PASSED] conflict-not-disjoint
[10:50:53] [PASSED] conflict-reg-type
[10:50:53] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[10:50:53] ================== xe_rtp_process_tests  ===================
[10:50:53] [PASSED] active1
[10:50:53] [PASSED] active2
[10:50:53] [PASSED] active-inactive
[10:50:53] [PASSED] inactive-active
[10:50:53] [PASSED] inactive-1st_or_active-inactive
[10:50:53] [PASSED] inactive-2nd_or_active-inactive
[10:50:53] [PASSED] inactive-last_or_active-inactive
[10:50:53] [PASSED] inactive-no_or_active-inactive
[10:50:53] ============== [PASSED] xe_rtp_process_tests ===============
[10:50:53] ===================== [PASSED] xe_rtp ======================
[10:50:53] ==================== xe_wa (1 subtest) =====================
[10:50:53] ======================== xe_wa_gt  =========================
[10:50:53] [PASSED] TIGERLAKE (B0)
[10:50:53] [PASSED] DG1 (A0)
[10:50:53] [PASSED] DG1 (B0)
[10:50:53] [PASSED] ALDERLAKE_S (A0)
[10:50:53] [PASSED] ALDERLAKE_S (B0)
[10:50:53] [PASSED] ALDERLAKE_S (C0)
[10:50:53] [PASSED] ALDERLAKE_S (D0)
[10:50:53] [PASSED] ALDERLAKE_P (A0)
[10:50:53] [PASSED] ALDERLAKE_P (B0)
[10:50:53] [PASSED] ALDERLAKE_P (C0)
[10:50:53] [PASSED] ALDERLAKE_S_RPLS (D0)
[10:50:53] [PASSED] ALDERLAKE_P_RPLU (E0)
[10:50:53] [PASSED] DG2_G10 (C0)
[10:50:53] [PASSED] DG2_G11 (B1)
[10:50:53] [PASSED] DG2_G12 (A1)
[10:50:53] [PASSED] METEORLAKE (g:A0, m:A0)
[10:50:53] [PASSED] METEORLAKE (g:A0, m:A0)
[10:50:53] [PASSED] METEORLAKE (g:A0, m:A0)
[10:50:53] [PASSED] LUNARLAKE (g:A0, m:A0)
[10:50:53] [PASSED] LUNARLAKE (g:B0, m:A0)
stty: 'standard input': Inappropriate ioctl for device
[10:50:53] [PASSED] BATTLEMAGE (g:A0, m:A1)
[10:50:53] [PASSED] PANTHERLAKE (g:A0, m:A0)
[10:50:53] ==================== [PASSED] xe_wa_gt =====================
[10:50:53] ====================== [PASSED] xe_wa ======================
[10:50:53] ============================================================
[10:50:53] Testing complete. Ran 298 tests: passed: 282, skipped: 16
[10:50:53] Elapsed time: 33.222s total, 4.152s configuring, 28.704s building, 0.331s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[10:50:53] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[10:50:55] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[10:51:18] Starting KUnit Kernel (1/1)...
[10:51:18] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[10:51:18] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[10:51:18] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[10:51:18] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[10:51:18] =========== drm_validate_clone_mode (2 subtests) ===========
[10:51:18] ============== drm_test_check_in_clone_mode  ===============
[10:51:18] [PASSED] in_clone_mode
[10:51:18] [PASSED] not_in_clone_mode
[10:51:18] ========== [PASSED] drm_test_check_in_clone_mode ===========
[10:51:18] =============== drm_test_check_valid_clones  ===============
[10:51:18] [PASSED] not_in_clone_mode
[10:51:18] [PASSED] valid_clone
[10:51:18] [PASSED] invalid_clone
[10:51:18] =========== [PASSED] drm_test_check_valid_clones ===========
[10:51:18] ============= [PASSED] drm_validate_clone_mode =============
[10:51:18] ============= drm_validate_modeset (1 subtest) =============
[10:51:18] [PASSED] drm_test_check_connector_changed_modeset
[10:51:18] ============== [PASSED] drm_validate_modeset ===============
[10:51:18] ====== drm_test_bridge_get_current_state (2 subtests) ======
[10:51:18] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[10:51:18] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[10:51:18] ======== [PASSED] drm_test_bridge_get_current_state ========
[10:51:18] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[10:51:18] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[10:51:18] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[10:51:18] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[10:51:18] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[10:51:18] ============== drm_bridge_alloc (2 subtests) ===============
[10:51:18] [PASSED] drm_test_drm_bridge_alloc_basic
[10:51:18] [PASSED] drm_test_drm_bridge_alloc_get_put
[10:51:18] ================ [PASSED] drm_bridge_alloc =================
[10:51:18] ================== drm_buddy (7 subtests) ==================
[10:51:18] [PASSED] drm_test_buddy_alloc_limit
[10:51:18] [PASSED] drm_test_buddy_alloc_optimistic
[10:51:18] [PASSED] drm_test_buddy_alloc_pessimistic
[10:51:18] [PASSED] drm_test_buddy_alloc_pathological
[10:51:18] [PASSED] drm_test_buddy_alloc_contiguous
[10:51:18] [PASSED] drm_test_buddy_alloc_clear
[10:51:18] [PASSED] drm_test_buddy_alloc_range_bias
[10:51:18] ==================== [PASSED] drm_buddy ====================
[10:51:18] ============= drm_cmdline_parser (40 subtests) =============
[10:51:18] [PASSED] drm_test_cmdline_force_d_only
[10:51:18] [PASSED] drm_test_cmdline_force_D_only_dvi
[10:51:18] [PASSED] drm_test_cmdline_force_D_only_hdmi
[10:51:18] [PASSED] drm_test_cmdline_force_D_only_not_digital
[10:51:18] [PASSED] drm_test_cmdline_force_e_only
[10:51:18] [PASSED] drm_test_cmdline_res
[10:51:18] [PASSED] drm_test_cmdline_res_vesa
[10:51:18] [PASSED] drm_test_cmdline_res_vesa_rblank
[10:51:18] [PASSED] drm_test_cmdline_res_rblank
[10:51:18] [PASSED] drm_test_cmdline_res_bpp
[10:51:18] [PASSED] drm_test_cmdline_res_refresh
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[10:51:18] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[10:51:18] [PASSED] drm_test_cmdline_res_margins_force_on
[10:51:18] [PASSED] drm_test_cmdline_res_vesa_margins
[10:51:18] [PASSED] drm_test_cmdline_name
[10:51:18] [PASSED] drm_test_cmdline_name_bpp
[10:51:18] [PASSED] drm_test_cmdline_name_option
[10:51:18] [PASSED] drm_test_cmdline_name_bpp_option
[10:51:18] [PASSED] drm_test_cmdline_rotate_0
[10:51:18] [PASSED] drm_test_cmdline_rotate_90
[10:51:18] [PASSED] drm_test_cmdline_rotate_180
[10:51:18] [PASSED] drm_test_cmdline_rotate_270
[10:51:18] [PASSED] drm_test_cmdline_hmirror
[10:51:18] [PASSED] drm_test_cmdline_vmirror
[10:51:18] [PASSED] drm_test_cmdline_margin_options
[10:51:18] [PASSED] drm_test_cmdline_multiple_options
[10:51:18] [PASSED] drm_test_cmdline_bpp_extra_and_option
[10:51:18] [PASSED] drm_test_cmdline_extra_and_option
[10:51:18] [PASSED] drm_test_cmdline_freestanding_options
[10:51:18] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[10:51:18] [PASSED] drm_test_cmdline_panel_orientation
[10:51:18] ================ drm_test_cmdline_invalid  =================
[10:51:18] [PASSED] margin_only
[10:51:18] [PASSED] interlace_only
[10:51:18] [PASSED] res_missing_x
[10:51:18] [PASSED] res_missing_y
[10:51:18] [PASSED] res_bad_y
[10:51:18] [PASSED] res_missing_y_bpp
[10:51:18] [PASSED] res_bad_bpp
[10:51:18] [PASSED] res_bad_refresh
[10:51:18] [PASSED] res_bpp_refresh_force_on_off
[10:51:18] [PASSED] res_invalid_mode
[10:51:18] [PASSED] res_bpp_wrong_place_mode
[10:51:18] [PASSED] name_bpp_refresh
[10:51:18] [PASSED] name_refresh
[10:51:18] [PASSED] name_refresh_wrong_mode
[10:51:18] [PASSED] name_refresh_invalid_mode
[10:51:18] [PASSED] rotate_multiple
[10:51:18] [PASSED] rotate_invalid_val
[10:51:18] [PASSED] rotate_truncated
[10:51:18] [PASSED] invalid_option
[10:51:18] [PASSED] invalid_tv_option
[10:51:18] [PASSED] truncated_tv_option
[10:51:18] ============ [PASSED] drm_test_cmdline_invalid =============
[10:51:18] =============== drm_test_cmdline_tv_options  ===============
[10:51:18] [PASSED] NTSC
[10:51:18] [PASSED] NTSC_443
[10:51:18] [PASSED] NTSC_J
[10:51:18] [PASSED] PAL
[10:51:18] [PASSED] PAL_M
[10:51:18] [PASSED] PAL_N
[10:51:18] [PASSED] SECAM
[10:51:18] [PASSED] MONO_525
[10:51:18] [PASSED] MONO_625
[10:51:18] =========== [PASSED] drm_test_cmdline_tv_options ===========
[10:51:18] =============== [PASSED] drm_cmdline_parser ================
[10:51:18] ========== drmm_connector_hdmi_init (20 subtests) ==========
[10:51:18] [PASSED] drm_test_connector_hdmi_init_valid
[10:51:18] [PASSED] drm_test_connector_hdmi_init_bpc_8
[10:51:18] [PASSED] drm_test_connector_hdmi_init_bpc_10
[10:51:18] [PASSED] drm_test_connector_hdmi_init_bpc_12
[10:51:18] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[10:51:18] [PASSED] drm_test_connector_hdmi_init_bpc_null
[10:51:18] [PASSED] drm_test_connector_hdmi_init_formats_empty
[10:51:18] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[10:51:18] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[10:51:18] [PASSED] supported_formats=0x9 yuv420_allowed=1
[10:51:18] [PASSED] supported_formats=0x9 yuv420_allowed=0
[10:51:18] [PASSED] supported_formats=0x3 yuv420_allowed=1
[10:51:18] [PASSED] supported_formats=0x3 yuv420_allowed=0
[10:51:18] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[10:51:18] [PASSED] drm_test_connector_hdmi_init_null_ddc
[10:51:18] [PASSED] drm_test_connector_hdmi_init_null_product
[10:51:18] [PASSED] drm_test_connector_hdmi_init_null_vendor
[10:51:18] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[10:51:18] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[10:51:18] [PASSED] drm_test_connector_hdmi_init_product_valid
[10:51:18] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[10:51:18] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[10:51:18] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[10:51:18] ========= drm_test_connector_hdmi_init_type_valid  =========
[10:51:18] [PASSED] HDMI-A
[10:51:18] [PASSED] HDMI-B
[10:51:18] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[10:51:18] ======== drm_test_connector_hdmi_init_type_invalid  ========
[10:51:18] [PASSED] Unknown
[10:51:18] [PASSED] VGA
[10:51:18] [PASSED] DVI-I
[10:51:18] [PASSED] DVI-D
[10:51:18] [PASSED] DVI-A
[10:51:18] [PASSED] Composite
[10:51:18] [PASSED] SVIDEO
[10:51:18] [PASSED] LVDS
[10:51:18] [PASSED] Component
[10:51:18] [PASSED] DIN
[10:51:18] [PASSED] DP
[10:51:18] [PASSED] TV
[10:51:18] [PASSED] eDP
[10:51:18] [PASSED] Virtual
[10:51:18] [PASSED] DSI
[10:51:18] [PASSED] DPI
[10:51:18] [PASSED] Writeback
[10:51:18] [PASSED] SPI
[10:51:18] [PASSED] USB
[10:51:18] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[10:51:18] ============ [PASSED] drmm_connector_hdmi_init =============
[10:51:18] ============= drmm_connector_init (3 subtests) =============
[10:51:18] [PASSED] drm_test_drmm_connector_init
[10:51:18] [PASSED] drm_test_drmm_connector_init_null_ddc
[10:51:18] ========= drm_test_drmm_connector_init_type_valid  =========
[10:51:18] [PASSED] Unknown
[10:51:18] [PASSED] VGA
[10:51:18] [PASSED] DVI-I
[10:51:18] [PASSED] DVI-D
[10:51:18] [PASSED] DVI-A
[10:51:18] [PASSED] Composite
[10:51:18] [PASSED] SVIDEO
[10:51:18] [PASSED] LVDS
[10:51:18] [PASSED] Component
[10:51:18] [PASSED] DIN
[10:51:18] [PASSED] DP
[10:51:18] [PASSED] HDMI-A
[10:51:18] [PASSED] HDMI-B
[10:51:18] [PASSED] TV
[10:51:18] [PASSED] eDP
[10:51:18] [PASSED] Virtual
[10:51:18] [PASSED] DSI
[10:51:18] [PASSED] DPI
[10:51:18] [PASSED] Writeback
[10:51:18] [PASSED] SPI
[10:51:18] [PASSED] USB
[10:51:18] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[10:51:18] =============== [PASSED] drmm_connector_init ===============
[10:51:18] ========= drm_connector_dynamic_init (6 subtests) ==========
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_init
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_init_properties
[10:51:18] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[10:51:18] [PASSED] Unknown
[10:51:18] [PASSED] VGA
[10:51:18] [PASSED] DVI-I
[10:51:18] [PASSED] DVI-D
[10:51:18] [PASSED] DVI-A
[10:51:18] [PASSED] Composite
[10:51:18] [PASSED] SVIDEO
[10:51:18] [PASSED] LVDS
[10:51:18] [PASSED] Component
[10:51:18] [PASSED] DIN
[10:51:18] [PASSED] DP
[10:51:18] [PASSED] HDMI-A
[10:51:18] [PASSED] HDMI-B
[10:51:18] [PASSED] TV
[10:51:18] [PASSED] eDP
[10:51:18] [PASSED] Virtual
[10:51:18] [PASSED] DSI
[10:51:18] [PASSED] DPI
[10:51:18] [PASSED] Writeback
[10:51:18] [PASSED] SPI
[10:51:18] [PASSED] USB
[10:51:18] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[10:51:18] ======== drm_test_drm_connector_dynamic_init_name  =========
[10:51:18] [PASSED] Unknown
[10:51:18] [PASSED] VGA
[10:51:18] [PASSED] DVI-I
[10:51:18] [PASSED] DVI-D
[10:51:18] [PASSED] DVI-A
[10:51:18] [PASSED] Composite
[10:51:18] [PASSED] SVIDEO
[10:51:18] [PASSED] LVDS
[10:51:18] [PASSED] Component
[10:51:18] [PASSED] DIN
[10:51:18] [PASSED] DP
[10:51:18] [PASSED] HDMI-A
[10:51:18] [PASSED] HDMI-B
[10:51:18] [PASSED] TV
[10:51:18] [PASSED] eDP
[10:51:18] [PASSED] Virtual
[10:51:18] [PASSED] DSI
[10:51:18] [PASSED] DPI
[10:51:18] [PASSED] Writeback
[10:51:18] [PASSED] SPI
[10:51:18] [PASSED] USB
[10:51:18] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[10:51:18] =========== [PASSED] drm_connector_dynamic_init ============
[10:51:18] ==== drm_connector_dynamic_register_early (4 subtests) =====
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[10:51:18] ====== [PASSED] drm_connector_dynamic_register_early =======
[10:51:18] ======= drm_connector_dynamic_register (7 subtests) ========
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[10:51:18] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[10:51:18] ========= [PASSED] drm_connector_dynamic_register ==========
[10:51:18] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[10:51:18] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[10:51:18] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[10:51:18] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[10:51:18] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[10:51:18] ========== drm_test_get_tv_mode_from_name_valid  ===========
[10:51:18] [PASSED] NTSC
[10:51:18] [PASSED] NTSC-443
[10:51:18] [PASSED] NTSC-J
[10:51:18] [PASSED] PAL
[10:51:18] [PASSED] PAL-M
[10:51:18] [PASSED] PAL-N
[10:51:18] [PASSED] SECAM
[10:51:18] [PASSED] Mono
[10:51:18] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[10:51:18] [PASSED] drm_test_get_tv_mode_from_name_truncated
[10:51:18] ============ [PASSED] drm_get_tv_mode_from_name ============
[10:51:18] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[10:51:18] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[10:51:18] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[10:51:18] [PASSED] VIC 96
[10:51:18] [PASSED] VIC 97
[10:51:18] [PASSED] VIC 101
[10:51:18] [PASSED] VIC 102
[10:51:18] [PASSED] VIC 106
[10:51:18] [PASSED] VIC 107
[10:51:18] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[10:51:18] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[10:51:18] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[10:51:18] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[10:51:18] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[10:51:18] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[10:51:18] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[10:51:18] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[10:51:18] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[10:51:18] [PASSED] Automatic
[10:51:18] [PASSED] Full
[10:51:18] [PASSED] Limited 16:235
[10:51:18] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[10:51:18] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[10:51:18] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[10:51:18] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[10:51:18] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[10:51:18] [PASSED] RGB
[10:51:18] [PASSED] YUV 4:2:0
[10:51:18] [PASSED] YUV 4:2:2
[10:51:18] [PASSED] YUV 4:4:4
[10:51:18] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[10:51:18] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[10:51:18] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[10:51:18] ============= drm_damage_helper (21 subtests) ==============
[10:51:18] [PASSED] drm_test_damage_iter_no_damage
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_src_moved
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_not_visible
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[10:51:18] [PASSED] drm_test_damage_iter_no_damage_no_fb
[10:51:18] [PASSED] drm_test_damage_iter_simple_damage
[10:51:18] [PASSED] drm_test_damage_iter_single_damage
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_outside_src
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_src_moved
[10:51:18] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[10:51:18] [PASSED] drm_test_damage_iter_damage
[10:51:18] [PASSED] drm_test_damage_iter_damage_one_intersect
[10:51:18] [PASSED] drm_test_damage_iter_damage_one_outside
[10:51:18] [PASSED] drm_test_damage_iter_damage_src_moved
[10:51:18] [PASSED] drm_test_damage_iter_damage_not_visible
[10:51:18] ================ [PASSED] drm_damage_helper ================
[10:51:18] ============== drm_dp_mst_helper (3 subtests) ==============
[10:51:18] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[10:51:18] [PASSED] Clock 154000 BPP 30 DSC disabled
[10:51:18] [PASSED] Clock 234000 BPP 30 DSC disabled
[10:51:18] [PASSED] Clock 297000 BPP 24 DSC disabled
[10:51:18] [PASSED] Clock 332880 BPP 24 DSC enabled
[10:51:18] [PASSED] Clock 324540 BPP 24 DSC enabled
[10:51:18] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[10:51:18] ============== drm_test_dp_mst_calc_pbn_div  ===============
[10:51:18] [PASSED] Link rate 2000000 lane count 4
[10:51:18] [PASSED] Link rate 2000000 lane count 2
[10:51:18] [PASSED] Link rate 2000000 lane count 1
[10:51:18] [PASSED] Link rate 1350000 lane count 4
[10:51:18] [PASSED] Link rate 1350000 lane count 2
[10:51:18] [PASSED] Link rate 1350000 lane count 1
[10:51:18] [PASSED] Link rate 1000000 lane count 4
[10:51:18] [PASSED] Link rate 1000000 lane count 2
[10:51:18] [PASSED] Link rate 1000000 lane count 1
[10:51:18] [PASSED] Link rate 810000 lane count 4
[10:51:18] [PASSED] Link rate 810000 lane count 2
[10:51:18] [PASSED] Link rate 810000 lane count 1
[10:51:18] [PASSED] Link rate 540000 lane count 4
[10:51:18] [PASSED] Link rate 540000 lane count 2
[10:51:18] [PASSED] Link rate 540000 lane count 1
[10:51:18] [PASSED] Link rate 270000 lane count 4
[10:51:18] [PASSED] Link rate 270000 lane count 2
[10:51:18] [PASSED] Link rate 270000 lane count 1
[10:51:18] [PASSED] Link rate 162000 lane count 4
[10:51:18] [PASSED] Link rate 162000 lane count 2
[10:51:18] [PASSED] Link rate 162000 lane count 1
[10:51:18] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[10:51:18] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[10:51:18] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[10:51:18] [PASSED] DP_POWER_UP_PHY with port number
[10:51:18] [PASSED] DP_POWER_DOWN_PHY with port number
[10:51:18] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[10:51:18] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[10:51:18] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[10:51:18] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[10:51:18] [PASSED] DP_QUERY_PAYLOAD with port number
[10:51:18] [PASSED] DP_QUERY_PAYLOAD with VCPI
[10:51:18] [PASSED] DP_REMOTE_DPCD_READ with port number
[10:51:18] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[10:51:18] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[10:51:18] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[10:51:18] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[10:51:18] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[10:51:18] [PASSED] DP_REMOTE_I2C_READ with port number
[10:51:18] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[10:51:18] [PASSED] DP_REMOTE_I2C_READ with transactions array
[10:51:18] [PASSED] DP_REMOTE_I2C_WRITE with port number
[10:51:18] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[10:51:18] [PASSED] DP_REMOTE_I2C_WRITE with data array
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[10:51:18] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[10:51:18] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[10:51:18] ================ [PASSED] drm_dp_mst_helper ================
[10:51:18] ================== drm_exec (7 subtests) ===================
[10:51:18] [PASSED] sanitycheck
[10:51:18] [PASSED] test_lock
[10:51:18] [PASSED] test_lock_unlock
[10:51:18] [PASSED] test_duplicates
[10:51:18] [PASSED] test_prepare
[10:51:18] [PASSED] test_prepare_array
[10:51:18] [PASSED] test_multiple_loops
[10:51:18] ==================== [PASSED] drm_exec =====================
[10:51:18] =========== drm_format_helper_test (17 subtests) ===========
[10:51:18] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[10:51:18] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[10:51:18] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[10:51:18] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[10:51:18] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[10:51:18] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[10:51:18] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[10:51:18] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[10:51:18] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[10:51:18] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[10:51:18] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[10:51:18] ============== drm_test_fb_xrgb8888_to_mono  ===============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[10:51:18] ==================== drm_test_fb_swab  =====================
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ================ [PASSED] drm_test_fb_swab =================
[10:51:18] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[10:51:18] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[10:51:18] [PASSED] single_pixel_source_buffer
[10:51:18] [PASSED] single_pixel_clip_rectangle
[10:51:18] [PASSED] well_known_colors
[10:51:18] [PASSED] destination_pitch
[10:51:18] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[10:51:18] ================= drm_test_fb_clip_offset  =================
[10:51:18] [PASSED] pass through
[10:51:18] [PASSED] horizontal offset
[10:51:18] [PASSED] vertical offset
[10:51:18] [PASSED] horizontal and vertical offset
[10:51:18] [PASSED] horizontal offset (custom pitch)
[10:51:18] [PASSED] vertical offset (custom pitch)
[10:51:18] [PASSED] horizontal and vertical offset (custom pitch)
[10:51:18] ============= [PASSED] drm_test_fb_clip_offset =============
[10:51:18] =================== drm_test_fb_memcpy  ====================
[10:51:18] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[10:51:18] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[10:51:18] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[10:51:18] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[10:51:18] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[10:51:18] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[10:51:18] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[10:51:18] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[10:51:18] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[10:51:18] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[10:51:18] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[10:51:18] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[10:51:18] =============== [PASSED] drm_test_fb_memcpy ================
[10:51:18] ============= [PASSED] drm_format_helper_test ==============
[10:51:18] ================= drm_format (18 subtests) =================
[10:51:18] [PASSED] drm_test_format_block_width_invalid
[10:51:18] [PASSED] drm_test_format_block_width_one_plane
[10:51:18] [PASSED] drm_test_format_block_width_two_plane
[10:51:18] [PASSED] drm_test_format_block_width_three_plane
[10:51:18] [PASSED] drm_test_format_block_width_tiled
[10:51:18] [PASSED] drm_test_format_block_height_invalid
[10:51:18] [PASSED] drm_test_format_block_height_one_plane
[10:51:18] [PASSED] drm_test_format_block_height_two_plane
[10:51:18] [PASSED] drm_test_format_block_height_three_plane
[10:51:18] [PASSED] drm_test_format_block_height_tiled
[10:51:18] [PASSED] drm_test_format_min_pitch_invalid
[10:51:18] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[10:51:18] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[10:51:18] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[10:51:18] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[10:51:18] [PASSED] drm_test_format_min_pitch_two_plane
[10:51:18] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[10:51:18] [PASSED] drm_test_format_min_pitch_tiled
[10:51:18] =================== [PASSED] drm_format ====================
[10:51:18] ============== drm_framebuffer (10 subtests) ===============
[10:51:18] ========== drm_test_framebuffer_check_src_coords  ==========
[10:51:18] [PASSED] Success: source fits into fb
[10:51:18] [PASSED] Fail: overflowing fb with x-axis coordinate
[10:51:18] [PASSED] Fail: overflowing fb with y-axis coordinate
[10:51:18] [PASSED] Fail: overflowing fb with source width
[10:51:18] [PASSED] Fail: overflowing fb with source height
[10:51:18] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[10:51:18] [PASSED] drm_test_framebuffer_cleanup
[10:51:18] =============== drm_test_framebuffer_create  ===============
[10:51:18] [PASSED] ABGR8888 normal sizes
[10:51:18] [PASSED] ABGR8888 max sizes
[10:51:18] [PASSED] ABGR8888 pitch greater than min required
[10:51:18] [PASSED] ABGR8888 pitch less than min required
[10:51:18] [PASSED] ABGR8888 Invalid width
[10:51:18] [PASSED] ABGR8888 Invalid buffer handle
[10:51:18] [PASSED] No pixel format
[10:51:18] [PASSED] ABGR8888 Width 0
[10:51:18] [PASSED] ABGR8888 Height 0
[10:51:18] [PASSED] ABGR8888 Out of bound height * pitch combination
[10:51:18] [PASSED] ABGR8888 Large buffer offset
[10:51:18] [PASSED] ABGR8888 Buffer offset for inexistent plane
[10:51:18] [PASSED] ABGR8888 Invalid flag
[10:51:18] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[10:51:18] [PASSED] ABGR8888 Valid buffer modifier
[10:51:18] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[10:51:18] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] NV12 Normal sizes
[10:51:18] [PASSED] NV12 Max sizes
[10:51:18] [PASSED] NV12 Invalid pitch
[10:51:18] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[10:51:18] [PASSED] NV12 different  modifier per-plane
[10:51:18] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[10:51:18] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] NV12 Modifier for inexistent plane
[10:51:18] [PASSED] NV12 Handle for inexistent plane
[10:51:18] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[10:51:18] [PASSED] YVU420 Normal sizes
[10:51:18] [PASSED] YVU420 Max sizes
[10:51:18] [PASSED] YVU420 Invalid pitch
[10:51:18] [PASSED] YVU420 Different pitches
[10:51:18] [PASSED] YVU420 Different buffer offsets/pitches
[10:51:18] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[10:51:18] [PASSED] YVU420 Valid modifier
[10:51:18] [PASSED] YVU420 Different modifiers per plane
[10:51:18] [PASSED] YVU420 Modifier for inexistent plane
[10:51:18] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[10:51:18] [PASSED] X0L2 Normal sizes
[10:51:18] [PASSED] X0L2 Max sizes
[10:51:18] [PASSED] X0L2 Invalid pitch
[10:51:18] [PASSED] X0L2 Pitch greater than minimum required
[10:51:18] [PASSED] X0L2 Handle for inexistent plane
[10:51:18] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[10:51:18] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[10:51:18] [PASSED] X0L2 Valid modifier
[10:51:18] [PASSED] X0L2 Modifier for inexistent plane
[10:51:18] =========== [PASSED] drm_test_framebuffer_create ===========
[10:51:18] [PASSED] drm_test_framebuffer_free
[10:51:18] [PASSED] drm_test_framebuffer_init
[10:51:18] [PASSED] drm_test_framebuffer_init_bad_format
[10:51:18] [PASSED] drm_test_framebuffer_init_dev_mismatch
[10:51:18] [PASSED] drm_test_framebuffer_lookup
[10:51:18] [PASSED] drm_test_framebuffer_lookup_inexistent
[10:51:18] [PASSED] drm_test_framebuffer_modifiers_not_supported
[10:51:18] ================= [PASSED] drm_framebuffer =================
[10:51:18] ================ drm_gem_shmem (8 subtests) ================
[10:51:18] [PASSED] drm_gem_shmem_test_obj_create
[10:51:18] [PASSED] drm_gem_shmem_test_obj_create_private
[10:51:18] [PASSED] drm_gem_shmem_test_pin_pages
[10:51:18] [PASSED] drm_gem_shmem_test_vmap
[10:51:18] [PASSED] drm_gem_shmem_test_get_pages_sgt
[10:51:18] [PASSED] drm_gem_shmem_test_get_sg_table
[10:51:18] [PASSED] drm_gem_shmem_test_madvise
[10:51:18] [PASSED] drm_gem_shmem_test_purge
[10:51:18] ================== [PASSED] drm_gem_shmem ==================
[10:51:18] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[10:51:18] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[10:51:18] [PASSED] Automatic
[10:51:18] [PASSED] Full
[10:51:18] [PASSED] Limited 16:235
[10:51:18] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[10:51:18] [PASSED] drm_test_check_disable_connector
[10:51:18] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[10:51:18] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[10:51:18] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[10:51:18] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[10:51:18] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[10:51:18] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[10:51:18] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[10:51:18] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[10:51:18] [PASSED] drm_test_check_output_bpc_dvi
[10:51:18] [PASSED] drm_test_check_output_bpc_format_vic_1
[10:51:18] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[10:51:18] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[10:51:18] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[10:51:18] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[10:51:18] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[10:51:18] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[10:51:18] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[10:51:18] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[10:51:18] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[10:51:18] [PASSED] drm_test_check_broadcast_rgb_value
[10:51:18] [PASSED] drm_test_check_bpc_8_value
[10:51:18] [PASSED] drm_test_check_bpc_10_value
[10:51:18] [PASSED] drm_test_check_bpc_12_value
[10:51:18] [PASSED] drm_test_check_format_value
[10:51:18] [PASSED] drm_test_check_tmds_char_value
[10:51:18] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[10:51:18] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[10:51:18] [PASSED] drm_test_check_mode_valid
[10:51:18] [PASSED] drm_test_check_mode_valid_reject
[10:51:18] [PASSED] drm_test_check_mode_valid_reject_rate
[10:51:18] [PASSED] drm_test_check_mode_valid_reject_max_clock
[10:51:18] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[10:51:18] ================= drm_managed (2 subtests) =================
[10:51:18] [PASSED] drm_test_managed_release_action
[10:51:18] [PASSED] drm_test_managed_run_action
[10:51:18] =================== [PASSED] drm_managed ===================
[10:51:18] =================== drm_mm (6 subtests) ====================
[10:51:18] [PASSED] drm_test_mm_init
[10:51:18] [PASSED] drm_test_mm_debug
[10:51:18] [PASSED] drm_test_mm_align32
[10:51:18] [PASSED] drm_test_mm_align64
[10:51:18] [PASSED] drm_test_mm_lowest
[10:51:18] [PASSED] drm_test_mm_highest
[10:51:18] ===================== [PASSED] drm_mm ======================
[10:51:18] ============= drm_modes_analog_tv (5 subtests) =============
[10:51:18] [PASSED] drm_test_modes_analog_tv_mono_576i
[10:51:18] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[10:51:18] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[10:51:18] [PASSED] drm_test_modes_analog_tv_pal_576i
[10:51:18] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[10:51:18] =============== [PASSED] drm_modes_analog_tv ===============
[10:51:18] ============== drm_plane_helper (2 subtests) ===============
[10:51:18] =============== drm_test_check_plane_state  ================
[10:51:18] [PASSED] clipping_simple
[10:51:18] [PASSED] clipping_rotate_reflect
[10:51:18] [PASSED] positioning_simple
[10:51:18] [PASSED] upscaling
[10:51:18] [PASSED] downscaling
[10:51:18] [PASSED] rounding1
[10:51:18] [PASSED] rounding2
[10:51:18] [PASSED] rounding3
[10:51:18] [PASSED] rounding4
[10:51:18] =========== [PASSED] drm_test_check_plane_state ============
[10:51:18] =========== drm_test_check_invalid_plane_state  ============
[10:51:18] [PASSED] positioning_invalid
[10:51:18] [PASSED] upscaling_invalid
[10:51:18] [PASSED] downscaling_invalid
[10:51:18] ======= [PASSED] drm_test_check_invalid_plane_state ========
[10:51:18] ================ [PASSED] drm_plane_helper =================
[10:51:18] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[10:51:18] ====== drm_test_connector_helper_tv_get_modes_check  =======
[10:51:18] [PASSED] None
[10:51:18] [PASSED] PAL
[10:51:18] [PASSED] NTSC
[10:51:18] [PASSED] Both, NTSC Default
[10:51:18] [PASSED] Both, PAL Default
[10:51:18] [PASSED] Both, NTSC Default, with PAL on command-line
[10:51:18] [PASSED] Both, PAL Default, with NTSC on command-line
[10:51:18] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[10:51:18] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[10:51:18] ================== drm_rect (9 subtests) ===================
[10:51:18] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[10:51:18] [PASSED] drm_test_rect_clip_scaled_not_clipped
[10:51:18] [PASSED] drm_test_rect_clip_scaled_clipped
[10:51:18] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[10:51:18] ================= drm_test_rect_intersect  =================
[10:51:18] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[10:51:18] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[10:51:18] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[10:51:18] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[10:51:18] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[10:51:18] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[10:51:18] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[10:51:18] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[10:51:18] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[10:51:18] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[10:51:18] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[10:51:18] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[10:51:18] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[10:51:18] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[10:51:18] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[10:51:18] ============= [PASSED] drm_test_rect_intersect =============
[10:51:18] ================ drm_test_rect_calc_hscale  ================
[10:51:18] [PASSED] normal use
[10:51:18] [PASSED] out of max range
[10:51:18] [PASSED] out of min range
[10:51:18] [PASSED] zero dst
[10:51:18] [PASSED] negative src
[10:51:18] [PASSED] negative dst
[10:51:18] ============ [PASSED] drm_test_rect_calc_hscale ============
[10:51:18] ================ drm_test_rect_calc_vscale  ================
[10:51:18] [PASSED] normal use
[10:51:18] [PASSED] out of max range
[10:51:18] [PASSED] out of min range
[10:51:18] [PASSED] zero dst
[10:51:18] [PASSED] negative src
[10:51:18] [PASSED] negative dst
[10:51:18] ============ [PASSED] drm_test_rect_calc_vscale ============
[10:51:18] ================== drm_test_rect_rotate  ===================
[10:51:18] [PASSED] reflect-x
[10:51:18] [PASSED] reflect-y
[10:51:18] [PASSED] rotate-0
[10:51:18] [PASSED] rotate-90
[10:51:18] [PASSED] rotate-180
[10:51:18] [PASSED] rotate-270
[10:51:18] ============== [PASSED] drm_test_rect_rotate ===============
[10:51:18] ================ drm_test_rect_rotate_inv  =================
[10:51:18] [PASSED] reflect-x
[10:51:18] [PASSED] reflect-y
[10:51:18] [PASSED] rotate-0
[10:51:18] [PASSED] rotate-90
[10:51:18] [PASSED] rotate-180
[10:51:18] [PASSED] rotate-270
[10:51:18] ============ [PASSED] drm_test_rect_rotate_inv =============
[10:51:18] ==================== [PASSED] drm_rect =====================
[10:51:18] ============ drm_sysfb_modeset_test (1 subtest) ============
[10:51:18] ============ drm_test_sysfb_build_fourcc_list  =============
[10:51:18] [PASSED] no native formats
[10:51:18] [PASSED] XRGB8888 as native format
[10:51:18] [PASSED] remove duplicates
[10:51:18] [PASSED] convert alpha formats
[10:51:18] [PASSED] random formats
[10:51:18] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[10:51:18] ============= [PASSED] drm_sysfb_modeset_test ==============
[10:51:18] ============================================================
[10:51:18] Testing complete. Ran 616 tests: passed: 616
[10:51:18] Elapsed time: 24.866s total, 1.735s configuring, 22.962s building, 0.132s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[10:51:18] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[10:51:19] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[10:51:27] Starting KUnit Kernel (1/1)...
[10:51:27] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[10:51:27] ================= ttm_device (5 subtests) ==================
[10:51:27] [PASSED] ttm_device_init_basic
[10:51:27] [PASSED] ttm_device_init_multiple
[10:51:27] [PASSED] ttm_device_fini_basic
[10:51:27] [PASSED] ttm_device_init_no_vma_man
[10:51:27] ================== ttm_device_init_pools  ==================
[10:51:27] [PASSED] No DMA allocations, no DMA32 required
[10:51:27] [PASSED] DMA allocations, DMA32 required
[10:51:27] [PASSED] No DMA allocations, DMA32 required
[10:51:27] [PASSED] DMA allocations, no DMA32 required
[10:51:27] ============== [PASSED] ttm_device_init_pools ==============
[10:51:27] =================== [PASSED] ttm_device ====================
[10:51:27] ================== ttm_pool (8 subtests) ===================
[10:51:27] ================== ttm_pool_alloc_basic  ===================
[10:51:27] [PASSED] One page
[10:51:27] [PASSED] More than one page
[10:51:27] [PASSED] Above the allocation limit
[10:51:27] [PASSED] One page, with coherent DMA mappings enabled
[10:51:27] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[10:51:27] ============== [PASSED] ttm_pool_alloc_basic ===============
[10:51:27] ============== ttm_pool_alloc_basic_dma_addr  ==============
[10:51:27] [PASSED] One page
[10:51:27] [PASSED] More than one page
[10:51:27] [PASSED] Above the allocation limit
[10:51:27] [PASSED] One page, with coherent DMA mappings enabled
[10:51:27] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[10:51:27] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[10:51:27] [PASSED] ttm_pool_alloc_order_caching_match
[10:51:27] [PASSED] ttm_pool_alloc_caching_mismatch
[10:51:27] [PASSED] ttm_pool_alloc_order_mismatch
[10:51:27] [PASSED] ttm_pool_free_dma_alloc
[10:51:27] [PASSED] ttm_pool_free_no_dma_alloc
[10:51:27] [PASSED] ttm_pool_fini_basic
[10:51:27] ==================== [PASSED] ttm_pool =====================
[10:51:27] ================ ttm_resource (8 subtests) =================
[10:51:27] ================= ttm_resource_init_basic  =================
[10:51:27] [PASSED] Init resource in TTM_PL_SYSTEM
[10:51:27] [PASSED] Init resource in TTM_PL_VRAM
[10:51:27] [PASSED] Init resource in a private placement
[10:51:27] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[10:51:27] ============= [PASSED] ttm_resource_init_basic =============
[10:51:27] [PASSED] ttm_resource_init_pinned
[10:51:27] [PASSED] ttm_resource_fini_basic
[10:51:27] [PASSED] ttm_resource_manager_init_basic
[10:51:27] [PASSED] ttm_resource_manager_usage_basic
[10:51:27] [PASSED] ttm_resource_manager_set_used_basic
[10:51:27] [PASSED] ttm_sys_man_alloc_basic
[10:51:27] [PASSED] ttm_sys_man_free_basic
[10:51:27] ================== [PASSED] ttm_resource ===================
[10:51:27] =================== ttm_tt (15 subtests) ===================
[10:51:27] ==================== ttm_tt_init_basic  ====================
[10:51:27] [PASSED] Page-aligned size
[10:51:27] [PASSED] Extra pages requested
[10:51:27] ================ [PASSED] ttm_tt_init_basic ================
[10:51:27] [PASSED] ttm_tt_init_misaligned
[10:51:27] [PASSED] ttm_tt_fini_basic
[10:51:27] [PASSED] ttm_tt_fini_sg
[10:51:27] [PASSED] ttm_tt_fini_shmem
[10:51:27] [PASSED] ttm_tt_create_basic
[10:51:27] [PASSED] ttm_tt_create_invalid_bo_type
[10:51:27] [PASSED] ttm_tt_create_ttm_exists
[10:51:27] [PASSED] ttm_tt_create_failed
[10:51:27] [PASSED] ttm_tt_destroy_basic
[10:51:27] [PASSED] ttm_tt_populate_null_ttm
[10:51:27] [PASSED] ttm_tt_populate_populated_ttm
[10:51:27] [PASSED] ttm_tt_unpopulate_basic
[10:51:27] [PASSED] ttm_tt_unpopulate_empty_ttm
[10:51:27] [PASSED] ttm_tt_swapin_basic
[10:51:27] ===================== [PASSED] ttm_tt ======================
[10:51:27] =================== ttm_bo (14 subtests) ===================
[10:51:27] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[10:51:27] [PASSED] Cannot be interrupted and sleeps
[10:51:27] [PASSED] Cannot be interrupted, locks straight away
[10:51:27] [PASSED] Can be interrupted, sleeps
[10:51:27] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[10:51:27] [PASSED] ttm_bo_reserve_locked_no_sleep
[10:51:27] [PASSED] ttm_bo_reserve_no_wait_ticket
[10:51:28] [PASSED] ttm_bo_reserve_double_resv
[10:51:28] [PASSED] ttm_bo_reserve_interrupted
[10:51:28] [PASSED] ttm_bo_reserve_deadlock
[10:51:28] [PASSED] ttm_bo_unreserve_basic
[10:51:28] [PASSED] ttm_bo_unreserve_pinned
[10:51:28] [PASSED] ttm_bo_unreserve_bulk
[10:51:28] [PASSED] ttm_bo_put_basic
[10:51:28] [PASSED] ttm_bo_put_shared_resv
[10:51:28] [PASSED] ttm_bo_pin_basic
[10:51:28] [PASSED] ttm_bo_pin_unpin_resource
[10:51:28] [PASSED] ttm_bo_multiple_pin_one_unpin
[10:51:28] ===================== [PASSED] ttm_bo ======================
[10:51:28] ============== ttm_bo_validate (21 subtests) ===============
[10:51:28] ============== ttm_bo_init_reserved_sys_man  ===============
[10:51:28] [PASSED] Buffer object for userspace
[10:51:28] [PASSED] Kernel buffer object
[10:51:28] [PASSED] Shared buffer object
[10:51:28] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[10:51:28] ============== ttm_bo_init_reserved_mock_man  ==============
[10:51:28] [PASSED] Buffer object for userspace
[10:51:28] [PASSED] Kernel buffer object
[10:51:28] [PASSED] Shared buffer object
[10:51:28] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[10:51:28] [PASSED] ttm_bo_init_reserved_resv
[10:51:28] ================== ttm_bo_validate_basic  ==================
[10:51:28] [PASSED] Buffer object for userspace
[10:51:28] [PASSED] Kernel buffer object
[10:51:28] [PASSED] Shared buffer object
[10:51:28] ============== [PASSED] ttm_bo_validate_basic ==============
[10:51:28] [PASSED] ttm_bo_validate_invalid_placement
[10:51:28] ============= ttm_bo_validate_same_placement  ==============
[10:51:28] [PASSED] System manager
[10:51:28] [PASSED] VRAM manager
[10:51:28] ========= [PASSED] ttm_bo_validate_same_placement ==========
[10:51:28] [PASSED] ttm_bo_validate_failed_alloc
[10:51:28] [PASSED] ttm_bo_validate_pinned
[10:51:28] [PASSED] ttm_bo_validate_busy_placement
[10:51:28] ================ ttm_bo_validate_multihop  =================
[10:51:28] [PASSED] Buffer object for userspace
[10:51:28] [PASSED] Kernel buffer object
[10:51:28] [PASSED] Shared buffer object
[10:51:28] ============ [PASSED] ttm_bo_validate_multihop =============
[10:51:28] ========== ttm_bo_validate_no_placement_signaled  ==========
[10:51:28] [PASSED] Buffer object in system domain, no page vector
[10:51:28] [PASSED] Buffer object in system domain with an existing page vector
[10:51:28] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[10:51:28] ======== ttm_bo_validate_no_placement_not_signaled  ========
[10:51:28] [PASSED] Buffer object for userspace
[10:51:28] [PASSED] Kernel buffer object
[10:51:28] [PASSED] Shared buffer object
[10:51:28] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[10:51:28] [PASSED] ttm_bo_validate_move_fence_signaled
[10:51:28] ========= ttm_bo_validate_move_fence_not_signaled  =========
[10:51:28] [PASSED] Waits for GPU
[10:51:28] [PASSED] Tries to lock straight away
[10:51:28] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[10:51:28] [PASSED] ttm_bo_validate_happy_evict
[10:51:28] [PASSED] ttm_bo_validate_all_pinned_evict
[10:51:28] [PASSED] ttm_bo_validate_allowed_only_evict
[10:51:28] [PASSED] ttm_bo_validate_deleted_evict
[10:51:28] [PASSED] ttm_bo_validate_busy_domain_evict
[10:51:28] [PASSED] ttm_bo_validate_evict_gutting
[10:51:28] [PASSED] ttm_bo_validate_recrusive_evict
[10:51:28] ================= [PASSED] ttm_bo_validate =================
[10:51:28] ============================================================
[10:51:28] Testing complete. Ran 101 tests: passed: 101
[10:51:28] Elapsed time: 9.839s total, 1.673s configuring, 7.950s building, 0.174s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✓ Xe.CI.BAT: success for Driver-managed exhaustive eviction (rev2)
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (17 preceding siblings ...)
  2025-08-22 10:51 ` ✓ CI.KUnit: success " Patchwork
@ 2025-08-22 11:31 ` Patchwork
  2025-08-23  4:17 ` ✗ Xe.CI.Full: failure " Patchwork
  19 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2025-08-22 11:31 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]

== Series Details ==

Series: Driver-managed exhaustive eviction (rev2)
URL   : https://patchwork.freedesktop.org/series/152882/
State : success

== Summary ==

CI Bug Log - changes from xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f_BAT -> xe-pw-152882v2_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-152882v2_BAT that come from known issues:

### IGT changes ###

  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#5783]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5783


Build changes
-------------

  * IGT: IGT_8503 -> IGT_8504
  * Linux: xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f -> xe-pw-152882v2

  IGT_8503: 8503
  IGT_8504: 8504
  xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f: cca87ca63e2f5b8a785dc59c23e526987530b27f
  xe-pw-152882v2: 152882v2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/index.html

[-- Attachment #2: Type: text/html, Size: 1720 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/16] drm/xe: Convert SVM validation for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 07/16] drm/xe: Convert SVM validation " Thomas Hellström
@ 2025-08-22 19:13   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-22 19:13 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:21AM +0200, Thomas Hellström wrote:
> Convert SVM validation to support exhaustive eviction,
> using xe_validation_guard().
> 
> v2:
> - Wrap also xe_vm_range_rebind (Matt Brost)
> - Adapt to argument changes of xe_validation_guard().
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
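
[A minimal sketch of the conversion pattern this series applies, distilled
from the diff below; the guard semantics are inferred from this patch and
the error paths are simplified:

    struct xe_validation_ctx vctx;
    struct drm_exec exec;
    int err = 0;

    xe_validation_guard(&vctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
        bo = xe_bo_create_locked(xe, NULL, NULL, size, ttm_bo_type_device,
                                 bo_flags, &exec);
        /* Restart the guarded block if another client holds a needed lock. */
        drm_exec_retry_on_contention(&exec);
        if (IS_ERR(bo)) {
            err = PTR_ERR(bo);
            /* On OOM, retake the outer rw-semaphore exclusively and retry. */
            xe_validation_retry_on_oom(&vctx, &err);
            break;
        }
        /* ... use bo under its dma-resv lock ... */
    }
]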

> ---
>  drivers/gpu/drm/xe/xe_svm.c | 99 +++++++++++++++++++------------------
>  1 file changed, 51 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index 39e3aa6df25a..667ca1f7cc29 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -699,51 +699,48 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  	struct xe_device *xe = vr->xe;
>  	struct device *dev = xe->drm.dev;
>  	struct drm_buddy_block *block;
> +	struct xe_validation_ctx vctx;
>  	struct list_head *blocks;
> -	struct drm_exec *exec;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	ktime_t time_end = 0;
> -	int err, idx;
> +	int err = 0, idx;
>  
>  	if (!drm_dev_enter(&xe->drm, &idx))
>  		return -ENODEV;
>  
>  	xe_pm_runtime_get(xe);
> -	exec = XE_VALIDATION_UNIMPLEMENTED;
> -
> - retry:
> -	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
> -				 ttm_bo_type_device,
> -				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> -				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> -			goto retry;
> -		goto out_pm_put;
> -	}
>  
> -	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> -				&dpagemap_devmem_ops, dpagemap, end - start);
> -
> -	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> -	list_for_each_entry(block, blocks, link)
> -		block->private = vr;
> +	xe_validation_guard(&vctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
> +		bo = xe_bo_create_locked(xe, NULL, NULL, end - start,
> +					 ttm_bo_type_device,
> +					 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> +					 XE_BO_FLAG_CPU_ADDR_MIRROR, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&vctx, &err);
> +			break;
> +		}
>  
> -	xe_bo_get(bo);
> +		drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> +					&dpagemap_devmem_ops, dpagemap, end - start);
>  
> -	/* Ensure the device has a pm ref while there are device pages active. */
> -	xe_pm_runtime_get_noresume(xe);
> -	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> -					    start, end, timeslice_ms,
> -					    xe_svm_devm_owner(xe));
> -	if (err)
> -		xe_svm_devmem_release(&bo->devmem_allocation);
> +		blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> +		list_for_each_entry(block, blocks, link)
> +			block->private = vr;
>  
> -	xe_bo_unlock(bo);
> -	xe_bo_put(bo);
> +		xe_bo_get(bo);
>  
> -out_pm_put:
> +		/* Ensure the device has a pm ref while there are device pages active. */
> +		xe_pm_runtime_get_noresume(xe);
> +		err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> +						    start, end, timeslice_ms,
> +						    xe_svm_devm_owner(xe));
> +		if (err)
> +			xe_svm_devmem_release(&bo->devmem_allocation);
> +		xe_bo_unlock(bo);
> +		xe_bo_put(bo);
> +	}
>  	xe_pm_runtime_put(xe);
>  	drm_dev_exit(idx);
>  
> @@ -820,11 +817,12 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
>  			vm->xe->atomic_svm_timeslice_ms : 0,
>  	};
> +	struct xe_validation_ctx vctx;
> +	struct drm_exec exec;
>  	struct xe_svm_range *range;
>  	struct dma_fence *fence;
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	int migrate_try_count = ctx.devmem_only ? 3 : 1;
> -	ktime_t end = 0;
>  	int err;
>  
>  	lockdep_assert_held_write(&vm->lock);
> @@ -894,27 +892,32 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	range_debug(range, "PAGE FAULT - BIND");
>  
> -retry_bind:
> -	xe_vm_lock(vm, false);
> -	fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
> -	if (IS_ERR(fence)) {
> -		xe_vm_unlock(vm);
> -		err = PTR_ERR(fence);
> -		if (err == -EAGAIN) {
> -			ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
> -			range_debug(range, "PAGE FAULT - RETRY BIND");
> -			goto retry;
> +	xe_validation_guard(&vctx, &vm->xe->val, &exec, (struct xe_val_flags) {}, err) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +
> +		xe_vm_set_validation_exec(vm, &exec);
> +		fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id));
> +		xe_vm_set_validation_exec(vm, NULL);
> +		if (IS_ERR(fence)) {
> +			drm_exec_retry_on_contention(&exec);
> +			err = PTR_ERR(fence);
> +			xe_validation_retry_on_oom(&vctx, &err);
>  		}
> -		if (xe_vm_validate_should_retry(NULL, err, &end))
> -			goto retry_bind;
> -		goto err_out;
>  	}
> -	xe_vm_unlock(vm);
> +	if (err)
> +		goto err_out;
>  
>  	dma_fence_wait(fence, false);
>  	dma_fence_put(fence);
> +	return 0;
>  
>  err_out:
> +	if (err == -EAGAIN) {
> +		ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
> +		range_debug(range, "PAGE FAULT - RETRY BIND");
> +		goto retry;
> +	}
>  
>  	return err;
>  }
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem " Thomas Hellström
@ 2025-08-22 19:35   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-22 19:35 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:29AM +0200, Thomas Hellström wrote:
> Open-code since this is the only identified instance of pinning
> without mapping.
> 
> v2:
> - Break out this patch from the previous one.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 51 ++++++++++++++--------
>  1 file changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index 906011671b60..c9e3c811c35b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -1452,11 +1452,12 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
>  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_tile *tile = gt_to_tile(gt);
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	int err;
> +	int err = 0;
>  
>  	xe_gt_assert(gt, vfid);
>  	xe_gt_assert(gt, IS_DGFX(xe));
> @@ -1479,23 +1480,37 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  		return 0;
>  
>  	xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M);
> -	bo = xe_bo_create_locked(xe, tile, NULL,
> -				 ALIGN(size, PAGE_SIZE),
> -				 ttm_bo_type_kernel,
> -				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				 XE_BO_FLAG_NEEDS_2M |
> -				 XE_BO_FLAG_PINNED |
> -				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> -				 exec);
> -	if (IS_ERR(bo))
> -		return PTR_ERR(bo);
> -
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock(bo);
> -	if (unlikely(err)) {
> -		xe_bo_put(bo);
> -		return err;
> +
> +	/*
> +	 * Open-code for now, since this is the only instance of
> +	 * pinning without mapping.
> +	 */
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, err) {
> +		bo = xe_bo_create_locked(xe, tile, NULL,
> +					 ALIGN(size, PAGE_SIZE),
> +					 ttm_bo_type_kernel,
> +					 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> +					 XE_BO_FLAG_NEEDS_2M |
> +					 XE_BO_FLAG_PINNED |
> +					 XE_BO_FLAG_PINNED_LATE_RESTORE,
> +					 &exec);
> +		if (IS_ERR(bo)) {
> +			drm_exec_retry_on_contention(&exec);
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			return PTR_ERR(bo);
> +		}
> +
> +		err = xe_bo_pin(bo, &exec);
> +		xe_bo_unlock(bo);
> +		if (err) {
> +			xe_bo_put(bo);
> +			drm_exec_retry_on_contention(&exec);
> +			xe_validation_retry_on_oom(&ctx, &err);

I think you can do 'return err;' here.

Matt

> +		}
>  	}
> +	if (err)
> +		return err;
>  
>  	config->lmem_obj = bo;
>  
> -- 
> 2.50.1
> 
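
[On the 'return err;' suggestion above: the earlier error branch in the
same guard already returns directly after a failed create, so the
pin-failure branch could do the same rather than falling through to the
post-guard 'if (err)' check. A sketch, assuming, as that earlier branch
does, that returning from inside xe_validation_guard() is safe:

        err = xe_bo_pin(bo, &exec);
        xe_bo_unlock(bo);
        if (err) {
            xe_bo_put(bo);
            drm_exec_retry_on_contention(&exec);
            xe_validation_retry_on_oom(&ctx, &err);
            return err;
        }
]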

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation
  2025-08-22  9:40 ` [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation Thomas Hellström
@ 2025-08-22 19:59   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-22 19:59 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:18AM +0200, Thomas Hellström wrote:
> We want all validation (potential backing store allocation) to be part
> of a drm_exec transaction. Therefore add a drm_exec pointer argument
> to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
> will deal with making all (or nearly all) calls to these functions
> part of a drm_exec transaction. In the meantime, define special values
> of the drm_exec pointer:
> 
> XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
> has not been done yet.
> XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
> the drm_exec context to be passed down to map_attachment where
> validation takes place.
> XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
> eviction isn't crucial and the ROI of converting those is very
> small.
> 
> For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
> a lockdep check that a drm_exec transaction can indeed start at the
> location where the macro is expanded. This is to encourage
> developers to take this into consideration early in the code
> development process.
> 
> v2:
> - Fix xe_vm_set_validation_exec() imbalance. Add an assert that
>   hopefully catches future instances of this (Matt Brost)
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
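
[The three special values are sentinels, not usable lock contexts. Purely
as a hypothetical illustration of the idea — the real definitions live in
the new xe_validation.h, whose body is not quoted in this excerpt, and the
ERR_PTR encoding below is an assumption, not the actual implementation:

    /* Hypothetical sketch -- not the actual xe_validation.h contents. */
    #define XE_VALIDATION_UNIMPLEMENTED ERR_PTR(-EINVAL)    /* assumed encoding */
    #define XE_VALIDATION_UNSUPPORTED   ERR_PTR(-EOPNOTSUPP)
    #define XE_VALIDATION_OPT_OUT       ERR_PTR(-ENOENT)

    static inline bool xe_validation_exec_is_real(const struct drm_exec *exec)
    {
        /* A sentinel means there is no drm_exec transaction to join. */
        return !IS_ERR_OR_NULL(exec);
    }
]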

> ---
>  drivers/gpu/drm/xe/Makefile                   |   1 +
>  .../compat-i915-headers/gem/i915_gem_stolen.h |   6 +-
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        |   5 +-
>  drivers/gpu/drm/xe/tests/xe_bo.c              |  20 +--
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c         |  12 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c         |  45 +++---
>  drivers/gpu/drm/xe/xe_bo.c                    | 129 +++++++++++++++---
>  drivers/gpu/drm/xe/xe_bo.h                    |  20 +--
>  drivers/gpu/drm/xe/xe_dma_buf.c               |  19 ++-
>  drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
>  drivers/gpu/drm/xe/xe_ggtt.c                  |  15 +-
>  drivers/gpu/drm/xe/xe_ggtt.h                  |   5 +-
>  drivers/gpu/drm/xe/xe_gt_pagefault.c          |   6 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c    |   6 +-
>  drivers/gpu/drm/xe/xe_svm.c                   |   4 +-
>  drivers/gpu/drm/xe/xe_validation.c            |  49 +++++++
>  drivers/gpu/drm/xe/xe_validation.h            |  69 ++++++++++
>  drivers/gpu/drm/xe/xe_vm.c                    |  24 +++-
>  drivers/gpu/drm/xe/xe_vm.h                    |  34 ++++-
>  drivers/gpu/drm/xe/xe_vm_types.h              |  32 +++--
>  20 files changed, 402 insertions(+), 105 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_validation.c
>  create mode 100644 drivers/gpu/drm/xe/xe_validation.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 8e0c3412a757..8ee7d275128d 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -127,6 +127,7 @@ xe-y += xe_bb.o \
>  	xe_tuning.o \
>  	xe_uc.o \
>  	xe_uc_fw.o \
> +	xe_validation.o \
>  	xe_vm.o \
>  	xe_vram.o \
>  	xe_vram_freq.o \
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 41d39d67817a..1ce1e9da975b 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -8,6 +8,7 @@
>  
>  #include "xe_ttm_stolen_mgr.h"
>  #include "xe_res_cursor.h"
> +#include "xe_validation.h"
>  
>  struct xe_bo;
>  
> @@ -20,6 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
>  	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
> @@ -34,13 +36,13 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  
>  	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
>  				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0);
> +				       ttm_bo_type_kernel, flags, 0, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		bo = NULL;
>  		return err;
>  	}
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	xe_bo_unlock_vm_held(bo);
>  
>  	if (err) {
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index f1f8b5ab53ef..4b0748e6fdd6 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -281,6 +281,7 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
>  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	int ret;
>  
>  	if (!vma)
> @@ -313,9 +314,9 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  		goto err;
>  
>  	if (IS_DGFX(xe))
> -		ret = xe_bo_migrate(bo, XE_PL_VRAM0);
> +		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
>  	else
> -		ret = xe_bo_validate(bo, NULL, true);
> +		ret = xe_bo_validate(bo, NULL, true, exec);
>  	if (!ret)
>  		ttm_bo_pin(&bo->ttm);
>  	ttm_bo_unreserve(&bo->ttm);
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index bb469096d072..06ceba6c3c25 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -23,7 +23,7 @@
>  
>  static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  			    bool clear, u64 get_val, u64 assign_val,
> -			    struct kunit *test)
> +			    struct kunit *test, struct drm_exec *exec)
>  {
>  	struct dma_fence *fence;
>  	struct ttm_tt *ttm;
> @@ -35,7 +35,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  	u32 offset;
>  
>  	/* Move bo to VRAM if not already there. */
> -	ret = xe_bo_validate(bo, NULL, false);
> +	ret = xe_bo_validate(bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate bo.\n");
>  		return ret;
> @@ -60,7 +60,7 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo,
>  	}
>  
>  	/* Evict to system. CCS data should be copied. */
> -	ret = xe_bo_evict(bo);
> +	ret = xe_bo_evict(bo, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to evict bo.\n");
>  		return ret;
> @@ -132,6 +132,7 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
>  
>  	/* TODO: Sanity check */
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  
>  	if (IS_DGFX(xe))
>  		kunit_info(test, "Testing vram id %u\n", tile->id);
> @@ -149,18 +150,18 @@ static void ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
>  
>  	kunit_info(test, "Verifying that CCS data is cleared on creation.\n");
>  	ret = ccs_test_migrate(tile, bo, false, 0ULL, 0xdeadbeefdeadbeefULL,
> -			       test);
> +			       test, exec);
>  	if (ret)
>  		goto out_unlock;
>  
>  	kunit_info(test, "Verifying that CCS data survives migration.\n");
>  	ret = ccs_test_migrate(tile, bo, false, 0xdeadbeefdeadbeefULL,
> -			       0xdeadbeefdeadbeefULL, test);
> +			       0xdeadbeefdeadbeefULL, test, exec);
>  	if (ret)
>  		goto out_unlock;
>  
>  	kunit_info(test, "Verifying that CCS data can be properly cleared.\n");
> -	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test);
> +	ret = ccs_test_migrate(tile, bo, true, 0ULL, 0ULL, test, exec);
>  
>  out_unlock:
>  	xe_bo_unlock(bo);
> @@ -210,6 +211,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  	struct xe_bo *bo, *external;
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
>  	struct xe_vm *vm = xe_migrate_get_vm(xe_device_get_root_tile(xe)->migrate);
> +	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  	struct xe_gt *__gt;
>  	int err, i, id;
>  
> @@ -236,7 +238,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  		}
>  
>  		xe_bo_lock(external, false);
> -		err = xe_bo_pin_external(external);
> +		err = xe_bo_pin_external(external, exec);
>  		xe_bo_unlock(external);
>  		if (err) {
>  			KUNIT_FAIL(test, "external bo pin err=%pe\n",
> @@ -294,7 +296,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  		if (i) {
>  			down_read(&vm->lock);
>  			xe_vm_lock(vm, false);
> -			err = xe_bo_validate(bo, bo->vm, false);
> +			err = xe_bo_validate(bo, bo->vm, false, exec);
>  			xe_vm_unlock(vm);
>  			up_read(&vm->lock);
>  			if (err) {
> @@ -303,7 +305,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>  				goto cleanup_all;
>  			}
>  			xe_bo_lock(external, false);
> -			err = xe_bo_validate(external, NULL, false);
> +			err = xe_bo_validate(external, NULL, false, exec);
>  			xe_bo_unlock(external);
>  			if (err) {
>  				KUNIT_FAIL(test, "external bo valid err=%pe\n",
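
For anyone skimming the test changes: the kunit code opts out of the
exhaustive-eviction checking explicitly instead of passing NULL, so the
debug assert can verify the caller really is a test. A minimal sketch of
the pattern the tests use, assuming a bo to make resident:

	struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
	int err;

	xe_bo_lock(bo, false);
	/* With CONFIG_DRM_XE_DEBUG this asserts current->kunit_test. */
	err = xe_bo_validate(bo, NULL, false, exec);
	xe_bo_unlock(bo);

Since the sentinel is an ERR_PTR() value it can never be mistaken for a
live drm_exec transaction.
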
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index cde9530bef8c..965dd3280468 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -27,7 +27,8 @@ static bool is_dynamic(struct dma_buf_test_params *params)
>  }
>  
>  static void check_residency(struct kunit *test, struct xe_bo *exported,
> -			    struct xe_bo *imported, struct dma_buf *dmabuf)
> +			    struct xe_bo *imported, struct dma_buf *dmabuf,
> +			    struct drm_exec *exec)
>  {
>  	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
>  	u32 mem_type;
> @@ -62,7 +63,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
>  	 * importer is on a different device. If they're on the same device,
>  	 * the exporter and the importer should be the same bo.
>  	 */
> -	ret = xe_bo_evict(exported);
> +	ret = xe_bo_evict(exported, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			KUNIT_FAIL(test, "Evicting exporter failed with err=%d.\n",
> @@ -77,7 +78,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
>  	}
>  
>  	/* Re-validate the importer. This should move also exporter in. */
> -	ret = xe_bo_validate(imported, NULL, false);
> +	ret = xe_bo_validate(imported, NULL, false, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			KUNIT_FAIL(test, "Validating importer failed with err=%d.\n",
> @@ -150,11 +151,12 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  			KUNIT_FAIL(test,
>  				   "xe_gem_prime_import() succeeded when it shouldn't have\n");
>  		} else {
> +			struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  			int err;
>  
>  			/* Is everything where we expect it to be? */
>  			xe_bo_lock(import_bo, false);
> -			err = xe_bo_validate(import_bo, NULL, false);
> +			err = xe_bo_validate(import_bo, NULL, false, exec);
>  
>  			/* Pinning in VRAM is not allowed. */
>  			if (!is_dynamic(params) &&
> @@ -167,7 +169,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
>  						  err == -ERESTARTSYS);
>  
>  			if (!err)
> -				check_residency(test, bo, import_bo, dmabuf);
> +				check_residency(test, bo, import_bo, dmabuf, exec);
>  			xe_bo_unlock(import_bo);
>  		}
>  		drm_gem_object_put(import);
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index edd1e701aa1c..dfb445d09759 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -70,7 +70,7 @@ static int run_sanity_job(struct xe_migrate *m, struct xe_device *xe,
>  		} } while (0)
>  
>  static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
> -		      struct kunit *test, u32 region)
> +		      struct kunit *test, u32 region, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = tile_to_xe(m->tile);
>  	u64 retval, expected = 0;
> @@ -84,14 +84,15 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
>  						   ttm_bo_type_kernel,
>  						   region |
>  						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -						   XE_BO_FLAG_PINNED);
> +						   XE_BO_FLAG_PINNED,
> +						   exec);
>  	if (IS_ERR(remote)) {
>  		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
>  			   str, remote);
>  		return;
>  	}
>  
> -	err = xe_bo_validate(remote, NULL, false);
> +	err = xe_bo_validate(remote, NULL, false, exec);
>  	if (err) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
>  			   str, err);
> @@ -161,13 +162,13 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
>  }
>  
>  static void test_copy_sysmem(struct xe_migrate *m, struct xe_bo *bo,
> -			     struct kunit *test)
> +			     struct drm_exec *exec, struct kunit *test)
>  {
> -	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM);
> +	test_copy(m, bo, test, XE_BO_FLAG_SYSTEM, exec);
>  }
>  
>  static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
> -			   struct kunit *test)
> +			   struct drm_exec *exec, struct kunit *test)
>  {
>  	u32 region;
>  
> @@ -178,10 +179,11 @@ static void test_copy_vram(struct xe_migrate *m, struct xe_bo *bo,
>  		region = XE_BO_FLAG_VRAM1;
>  	else
>  		region = XE_BO_FLAG_VRAM0;
> -	test_copy(m, bo, test, region);
> +	test_copy(m, bo, test, region, exec);
>  }
>  
> -static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> +static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
> +				   struct drm_exec *exec)
>  {
>  	struct xe_tile *tile = m->tile;
>  	struct xe_device *xe = tile_to_xe(tile);
> @@ -290,10 +292,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	check(retval, expected, "Command clear small last value", test);
>  
>  	kunit_info(test, "Copying small buffer object to system\n");
> -	test_copy_sysmem(m, tiny, test);
> +	test_copy_sysmem(m, tiny, exec, test);
>  	if (xe->info.tile_count > 1) {
>  		kunit_info(test, "Copying small buffer object to other vram\n");
> -		test_copy_vram(m, tiny, test);
> +		test_copy_vram(m, tiny, exec, test);
>  	}
>  
>  	/* Clear a big bo */
> @@ -312,10 +314,10 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	check(retval, expected, "Command clear big last value", test);
>  
>  	kunit_info(test, "Copying big buffer object to system\n");
> -	test_copy_sysmem(m, big, test);
> +	test_copy_sysmem(m, big, exec, test);
>  	if (xe->info.tile_count > 1) {
>  		kunit_info(test, "Copying big buffer object to other vram\n");
> -		test_copy_vram(m, big, test);
> +		test_copy_vram(m, big, exec, test);
>  	}
>  
>  out:
> @@ -343,10 +345,11 @@ static int migrate_test_run_device(struct xe_device *xe)
>  
>  	for_each_tile(tile, xe, id) {
>  		struct xe_migrate *m = tile->migrate;
> +		struct drm_exec *exec = XE_VALIDATION_OPT_OUT;
>  
>  		kunit_info(test, "Testing tile id %d.\n", id);
>  		xe_vm_lock(m->q->vm, false);
> -		xe_migrate_sanity_test(m, test);
> +		xe_migrate_sanity_test(m, test, exec);
>  		xe_vm_unlock(m->q->vm);
>  	}
>  
> @@ -490,7 +493,7 @@ static struct dma_fence *blt_copy(struct xe_tile *tile,
>  
>  static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
> -			 struct kunit *test)
> +			 struct drm_exec *exec, struct kunit *test)
>  {
>  	struct dma_fence *fence;
>  	u64 expected, retval;
> @@ -509,7 +512,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  	dma_fence_put(fence);
>  
>  	kunit_info(test, "Evict vram buffer object\n");
> -	ret = xe_bo_evict(vram_bo);
> +	ret = xe_bo_evict(vram_bo, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to evict bo.\n");
>  		return;
> @@ -538,7 +541,7 @@ static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
>  	dma_fence_put(fence);
>  
>  	kunit_info(test, "Restore vram buffer object\n");
> -	ret = xe_bo_validate(vram_bo, NULL, false);
> +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
>  		return;
> @@ -636,6 +639,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  {
>  	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
>  	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
> +	struct drm_exec *exec;
>  	long ret;
>  
>  	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
> @@ -650,8 +654,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  		return;
>  	}
>  
> +	exec = XE_VALIDATION_OPT_OUT;
>  	xe_bo_lock(sys_bo, false);
> -	ret = xe_bo_validate(sys_bo, NULL, false);
> +	ret = xe_bo_validate(sys_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
>  		goto free_sysbo;
> @@ -676,7 +681,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	xe_bo_lock(ccs_bo, false);
> -	ret = xe_bo_validate(ccs_bo, NULL, false);
> +	ret = xe_bo_validate(ccs_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
>  		goto free_ccsbo;
> @@ -700,7 +705,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	xe_bo_lock(vram_bo, false);
> -	ret = xe_bo_validate(vram_bo, NULL, false);
> +	ret = xe_bo_validate(vram_bo, NULL, false, exec);
>  	if (ret) {
>  		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
>  		goto free_vrambo;
> @@ -713,7 +718,7 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
>  	}
>  
>  	test_clear(xe, tile, sys_bo, vram_bo, test);
> -	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
> +	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, exec, test);
>  	xe_bo_unlock(vram_bo);
>  
>  	xe_bo_lock(vram_bo, false);
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 11eaf3b06766..e71addf51ed0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1139,6 +1139,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *backup;
>  	int ret = 0;
>  
> @@ -1163,7 +1164,7 @@ int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  	backup = ___xe_bo_create_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
>  					DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
>  					XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					XE_BO_FLAG_PINNED);
> +					XE_BO_FLAG_PINNED, exec);
>  	if (IS_ERR(backup)) {
>  		ret = PTR_ERR(backup);
>  		goto out_unlock_bo;
> @@ -1214,6 +1215,7 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
>  int xe_bo_evict_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *backup = bo->backup_obj;
>  	bool backup_created = false;
>  	bool unmap = false;
> @@ -1242,7 +1244,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
>  						NULL, xe_bo_size(bo),
>  						DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
>  						XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -						XE_BO_FLAG_PINNED);
> +						XE_BO_FLAG_PINNED, exec);
>  		if (IS_ERR(backup)) {
>  			ret = PTR_ERR(backup);
>  			goto out_unlock_bo;
> @@ -1718,12 +1720,14 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	struct xe_device *xe = to_xe_device(ddev);
>  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
>  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> +	struct drm_exec *exec;
>  	vm_fault_t ret;
>  	int idx;
>  
>  	if (needs_rpm)
>  		xe_pm_runtime_get(xe);
>  
> +	exec = XE_VALIDATION_UNIMPLEMENTED;
>  	ret = ttm_bo_vm_reserve(tbo, vmf);
>  	if (ret)
>  		goto out;
> @@ -1731,6 +1735,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>  	if (drm_dev_enter(ddev, &idx)) {
>  		trace_xe_bo_cpu_fault(bo);
>  
> +		xe_validation_assert_exec(xe, exec, &tbo->base);
>  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
>  					       TTM_BO_VM_NUM_PREFAULT);
>  		drm_dev_exit(idx);
> @@ -1850,11 +1855,32 @@ void xe_bo_free(struct xe_bo *bo)
>  	kfree(bo);
>  }
>  
> +/**
> + * ___xe_bo_create_locked() - Initialize or create an xe_bo.
> + * @xe: The xe device.
> + * @bo: An already allocated buffer object or NULL
> + * if the function should allocate a new one.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @resv: Pointer to a locked shared reservation object to use for this bo,
> + * or NULL for the xe_bo to use its own.
> + * @bulk: The bulk move to use for LRU bumping, or NULL for external bos.
> + * @size: The storage size to use for the bo.
> + * @cpu_caching: The cpu caching used for system memory backing store.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Initialize or create an xe buffer object. On failure, any allocated buffer
> + * object passed in @bo will have been unreferenced.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				     struct xe_tile *tile, struct dma_resv *resv,
>  				     struct ttm_lru_bulk_move *bulk, size_t size,
>  				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags)
> +				     u32 flags, struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -1923,6 +1949,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  		ctx.resv = resv;
>  	}
>  
> +	xe_validation_assert_exec(xe, exec, &bo->ttm.base);
>  	if (!(flags & XE_BO_FLAG_FIXED_PLACEMENT)) {
>  		err = __xe_bo_placement_for_flags(xe, bo, bo->flags);
>  		if (WARN_ON(err)) {
> @@ -2024,7 +2051,7 @@ __xe_bo_create_locked(struct xe_device *xe,
>  		      struct xe_tile *tile, struct xe_vm *vm,
>  		      size_t size, u64 start, u64 end,
>  		      u16 cpu_caching, enum ttm_bo_type type, u32 flags,
> -		      u64 alignment)
> +		      u64 alignment, struct drm_exec *exec)
>  {
>  	struct xe_bo *bo = NULL;
>  	int err;
> @@ -2049,7 +2076,7 @@ __xe_bo_create_locked(struct xe_device *xe,
>  				    vm && !xe_vm_in_fault_mode(vm) &&
>  				    flags & XE_BO_FLAG_USER ?
>  				    &vm->lru_bulk_move : NULL, size,
> -				    cpu_caching, type, flags);
> +				    cpu_caching, type, flags, exec);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> @@ -2083,9 +2110,10 @@ __xe_bo_create_locked(struct xe_device *xe,
>  
>  			if (flags & XE_BO_FLAG_FIXED_PLACEMENT) {
>  				err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo,
> -							   start + xe_bo_size(bo), U64_MAX);
> +							   start + xe_bo_size(bo), U64_MAX,
> +							   exec);
>  			} else {
> -				err = xe_ggtt_insert_bo(t->mem.ggtt, bo);
> +				err = xe_ggtt_insert_bo(t->mem.ggtt, bo, exec);
>  			}
>  			if (err)
>  				goto err_unlock_put_bo;
> @@ -2102,22 +2130,59 @@ __xe_bo_create_locked(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_locked_range() - Create a BO with range- and alignment options
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The local vm or NULL for external objects.
> + * @size: The storage size to use for the bo.
> + * @start: Start of fixed VRAM range or 0.
> + * @end: End of fixed VRAM range or ~0ULL.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @alignment: For GGTT buffer objects, the minimum GGTT alignment.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Create an Xe BO with range- and alignment options. If @start and @end indicate
> + * a fixed VRAM range, this must be a ttm_bo_type_kernel bo with VRAM placement
> + * only. The @alignment parameter can be used for GGTT alignment.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_tile *tile, struct xe_vm *vm,
>  			  size_t size, u64 start, u64 end,
> -			  enum ttm_bo_type type, u32 flags, u64 alignment)
> +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> +			  struct drm_exec *exec)
>  {
>  	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, type,
> -				     flags, alignment);
> +				     flags, alignment, exec);
>  }
>  
> +/**
> + * xe_bo_create_locked() - Create a BO
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The local vm or NULL for external objects.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Create a locked xe BO with no range or alignment restrictions.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + */
>  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
> -				  enum ttm_bo_type type, u32 flags)
> +				  enum ttm_bo_type type, u32 flags,
> +				  struct drm_exec *exec)
>  {
>  	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, type,
> -				     flags, 0);
> +				     flags, 0, exec);
>  }
>  
>  struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> @@ -2125,9 +2190,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>  				u16 cpu_caching,
>  				u32 flags)
>  {
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>  						 cpu_caching, ttm_bo_type_device,
> -						 flags | XE_BO_FLAG_USER, 0);
> +						 flags | XE_BO_FLAG_USER, 0, exec);
>  	if (!IS_ERR(bo))
>  		xe_bo_unlock_vm_held(bo);
>  
> @@ -2138,7 +2204,8 @@ struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>  			   struct xe_vm *vm, size_t size,
>  			   enum ttm_bo_type type, u32 flags)
>  {
> -	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_bo *bo = xe_bo_create_locked(xe, tile, vm, size, type, flags, exec);
>  
>  	if (!IS_ERR(bo))
>  		xe_bo_unlock_vm_held(bo);
> @@ -2166,6 +2233,7 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
>  	u64 end = offset == ~0ull ? offset : start + size;
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
>  
>  	if (flags & XE_BO_FLAG_STOLEN &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> @@ -2173,11 +2241,11 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  
>  	bo = xe_bo_create_locked_range(xe, tile, vm, size, start, end, type,
>  				       flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | XE_BO_FLAG_PINNED,
> -				       alignment);
> +				       alignment, exec);
>  	if (IS_ERR(bo))
>  		return bo;
>  
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	if (err)
>  		goto err_put;
>  
> @@ -2299,6 +2367,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
>  /**
>   * xe_bo_pin_external - pin an external BO
>   * @bo: buffer object to be pinned
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
>   * BO. Unique call compared to xe_bo_pin as this function has its own set of
> @@ -2306,7 +2375,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
>   *
>   * Returns 0 for success, negative error code otherwise.
>   */
> -int xe_bo_pin_external(struct xe_bo *bo)
> +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = xe_bo_device(bo);
>  	int err;
> @@ -2315,7 +2384,7 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  	xe_assert(xe, xe_bo_is_user(bo));
>  
>  	if (!xe_bo_is_pinned(bo)) {
> -		err = xe_bo_validate(bo, NULL, false);
> +		err = xe_bo_validate(bo, NULL, false, exec);
>  		if (err)
>  			return err;
>  
> @@ -2337,7 +2406,17 @@ int xe_bo_pin_external(struct xe_bo *bo)
>  	return 0;
>  }
>  
> -int xe_bo_pin(struct xe_bo *bo)
> +/**
> + * xe_bo_pin() - Pin a kernel bo after potentially migrating it
> + * @bo: The kernel bo to pin.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
> + *
> + * Attempts to migrate a bo to @bo->placement. If that succeeds,
> + * pins the bo.
> + *
> + * Return: %0 on success, negative error code on migration failure.
> + */
> +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct ttm_place *place = &bo->placements[0];
>  	struct xe_device *xe = xe_bo_device(bo);
> @@ -2359,7 +2438,7 @@ int xe_bo_pin(struct xe_bo *bo)
>  	/* We only expect at most 1 pin */
>  	xe_assert(xe, !xe_bo_is_pinned(bo));
>  
> -	err = xe_bo_validate(bo, NULL, false);
> +	err = xe_bo_validate(bo, NULL, false, exec);
>  	if (err)
>  		return err;
>  
> @@ -2452,6 +2531,7 @@ void xe_bo_unpin(struct xe_bo *bo)
>   *      NULL. Used together with @allow_res_evict.
>   * @allow_res_evict: Whether it's allowed to evict bos sharing @vm's
>   *                   reservation object.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Make sure the bo is in allowed placement, migrating it if necessary. If
>   * needed, other bos will be evicted. If bos selected for eviction share
>   * @vm's reservation object, they can be evicted iff @allow_res_evict is true.
>   * Return: 0 on success, negative error code on failure. May return
>   * -EINTR or -ERESTARTSYS if internal waits are interrupted by a signal.
>   */
> -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
> +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> +		   struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = true,
> @@ -2480,6 +2561,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
>  
>  	xe_vm_set_validating(vm, allow_res_evict);
>  	trace_xe_bo_validate(bo);
> +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
>  	ret = ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
>  	xe_vm_clear_validating(vm, allow_res_evict);
>  
> @@ -2917,6 +2999,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
>   * xe_bo_migrate - Migrate an object to the desired region id
>   * @bo: The buffer object to migrate.
>   * @mem_type: The TTM region type to migrate to.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Attempt to migrate the buffer object to the desired memory region. The
>   * buffer object may not be pinned, and must be locked.
> @@ -2928,7 +3011,7 @@ static void xe_place_from_ttm_type(u32 mem_type, struct ttm_place *place)
>   * Return: 0 on success. Negative error code on failure. In particular may
>   * return -EINTR or -ERESTARTSYS if signal pending.
>   */
> -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>  	struct ttm_operation_ctx ctx = {
> @@ -2966,19 +3049,21 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
>  		add_vram(xe, bo, &requested, bo->flags, mem_type, &c);
>  	}
>  
> +	xe_validation_assert_exec(xe_bo_device(bo), exec, &bo->ttm.base);
>  	return ttm_bo_validate(&bo->ttm, &placement, &ctx);
>  }
>  
>  /**
>   * xe_bo_evict - Evict an object to evict placement
>   * @bo: The buffer object to migrate.
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * On successful completion, the object memory will be moved to evict
>   * placement. This function blocks until the object has been fully moved.
>   *
>   * Return: 0 on success. Negative error code on failure.
>   */
> -int xe_bo_evict(struct xe_bo *bo)
> +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec)
>  {
>  	struct ttm_operation_ctx ctx = {
>  		.interruptible = false,
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 8cce413b5235..b1b6cb622d71 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -10,6 +10,7 @@
>  
>  #include "xe_bo_types.h"
>  #include "xe_macros.h"
> +#include "xe_validation.h"
>  #include "xe_vm_types.h"
>  #include "xe_vm.h"
>  #include "xe_vram_types.h"
> @@ -92,15 +93,17 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				     struct xe_tile *tile, struct dma_resv *resv,
>  				     struct ttm_lru_bulk_move *bulk, size_t size,
>  				     u16 cpu_caching, enum ttm_bo_type type,
> -				     u32 flags);
> +				     u32 flags, struct drm_exec *exec);
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
>  			  struct xe_tile *tile, struct xe_vm *vm,
>  			  size_t size, u64 start, u64 end,
> -			  enum ttm_bo_type type, u32 flags, u64 alignment);
> +			  enum ttm_bo_type type, u32 flags, u64 alignment,
> +			  struct drm_exec *exec);
>  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
> -				  enum ttm_bo_type type, u32 flags);
> +				  enum ttm_bo_type type, u32 flags,
> +				  struct drm_exec *exec);
>  struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>  			   struct xe_vm *vm, size_t size,
>  			   enum ttm_bo_type type, u32 flags);
> @@ -200,11 +203,12 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
>  	}
>  }
>  
> -int xe_bo_pin_external(struct xe_bo *bo);
> -int xe_bo_pin(struct xe_bo *bo);
> +int xe_bo_pin_external(struct xe_bo *bo, struct drm_exec *exec);
> +int xe_bo_pin(struct xe_bo *bo, struct drm_exec *exec);
>  void xe_bo_unpin_external(struct xe_bo *bo);
>  void xe_bo_unpin(struct xe_bo *bo);
> -int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict);
> +int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict,
> +		   struct drm_exec *exec);
>  
>  static inline bool xe_bo_is_pinned(struct xe_bo *bo)
>  {
> @@ -285,8 +289,8 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res);
>  
>  bool xe_bo_can_migrate(struct xe_bo *bo, u32 mem_type);
>  
> -int xe_bo_migrate(struct xe_bo *bo, u32 mem_type);
> -int xe_bo_evict(struct xe_bo *bo);
> +int xe_bo_migrate(struct xe_bo *bo, u32 mem_type, struct drm_exec *exec);
> +int xe_bo_evict(struct xe_bo *bo, struct drm_exec *exec);
>  
>  int xe_bo_evict_pinned(struct xe_bo *bo);
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo);
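
To make the new contract concrete: every path that can reach
ttm_bo_validate() now has to supply an exec pointer, even if only a
sentinel for the time being. A sketch of a typical kernel-bo call site
after this patch, modelled on the i915_gem_stolen.h hunk above (xe, tile,
size and flags are placeholders):

	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
	struct xe_bo *bo;
	int err;

	bo = xe_bo_create_locked(xe, tile, NULL, size,
				 ttm_bo_type_kernel, flags, exec);
	if (IS_ERR(bo))
		return PTR_ERR(bo);

	/* Reuse the same exec for the pin, which may need to validate. */
	err = xe_bo_pin(bo, exec);
	xe_bo_unlock(bo);
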
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 346f857f3837..78a827d4e726 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -51,6 +51,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  	struct drm_gem_object *obj = attach->dmabuf->priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	struct xe_device *xe = xe_bo_device(bo);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  	int ret;
>  
>  	/*
> @@ -63,7 +64,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  		return -EINVAL;
>  	}
>  
> -	ret = xe_bo_migrate(bo, XE_PL_TT);
> +	ret = xe_bo_migrate(bo, XE_PL_TT, exec);
>  	if (ret) {
>  		if (ret != -EINTR && ret != -ERESTARTSYS)
>  			drm_dbg(&xe->drm,
> @@ -72,7 +73,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  		return ret;
>  	}
>  
> -	ret = xe_bo_pin_external(bo);
> +	ret = xe_bo_pin_external(bo, exec);
>  	xe_assert(xe, !ret);
>  
>  	return 0;
> @@ -92,6 +93,7 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
>  	struct dma_buf *dma_buf = attach->dmabuf;
>  	struct drm_gem_object *obj = dma_buf->priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  	struct sg_table *sgt;
>  	int r = 0;
>  
> @@ -100,9 +102,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
>  
>  	if (!xe_bo_is_pinned(bo)) {
>  		if (!attach->peer2peer)
> -			r = xe_bo_migrate(bo, XE_PL_TT);
> +			r = xe_bo_migrate(bo, XE_PL_TT, exec);
>  		else
> -			r = xe_bo_validate(bo, NULL, false);
> +			r = xe_bo_validate(bo, NULL, false, exec);
>  		if (r)
>  			return ERR_PTR(r);
>  	}
> @@ -161,13 +163,14 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
>  		       direction == DMA_FROM_DEVICE);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  
>  	if (!reads)
>  		return 0;
>  
>  	/* Can we do interruptible lock here? */
>  	xe_bo_lock(bo, false);
> -	(void)xe_bo_migrate(bo, XE_PL_TT);
> +	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
>  	xe_bo_unlock(bo);
>  
>  	return 0;
> @@ -208,13 +211,14 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  {
>  	struct dma_resv *resv = dma_buf->resv;
>  	struct xe_device *xe = to_xe_device(dev);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
>  	int ret;
>  
>  	dma_resv_lock(resv, NULL);
>  	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>  				    0, /* Will require 1way or 2way for vm_bind */
> -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM);
> +				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
>  	if (IS_ERR(bo)) {
>  		ret = PTR_ERR(bo);
>  		goto error;
> @@ -232,8 +236,9 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
>  {
>  	struct drm_gem_object *obj = attach->importer_priv;
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> +	struct drm_exec *exec = XE_VALIDATION_UNSUPPORTED;
>  
> -	XE_WARN_ON(xe_bo_evict(bo));
> +	XE_WARN_ON(xe_bo_evict(bo, exec));
>  }
>  
>  static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 44364c042ad7..0bcb4fb9a10e 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -97,9 +97,13 @@
>  static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec)
>  {
>  	struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm);
> +	int ret;
>  
>  	/* The fence slot added here is intended for the exec sched job. */
> -	return xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> +	xe_vm_set_validation_exec(vm, &vm_exec->exec);
> +	ret = xe_vm_validate_rebind(vm, &vm_exec->exec, 1);
> +	xe_vm_set_validation_exec(vm, NULL);
> +	return ret;
>  }
>  
>  int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
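
Registering the exec around the validate/rebind section means deeper
layers don't need an exec argument threaded through every signature at
once; they can recover the active transaction from the vm, as
xe_bo_create_user() above already does. An illustrative sketch of that
consumer side, assuming the vm resv is held:

	/* Somewhere below xe_vm_validate_rebind(), under the vm resv: */
	struct drm_exec *exec = xe_vm_validation_exec(vm);

	/* exec is the transaction registered by xe_exec_fn() above. */
	err = xe_bo_validate(bo, vm, false, exec);
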
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index e03222f5ac5a..a47c0131956b 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -731,7 +731,7 @@ void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
>  }
>  
>  static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -				  u64 start, u64 end)
> +				  u64 start, u64 end, struct drm_exec *exec)
>  {
>  	u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE;
>  	u8 tile_id = ggtt->tile->id;
> @@ -746,7 +746,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
>  		return 0;
>  	}
>  
> -	err = xe_bo_validate(bo, NULL, false);
> +	err = xe_bo_validate(bo, NULL, false, exec);
>  	if (err)
>  		return err;
>  
> @@ -788,25 +788,28 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
>   * @bo: the &xe_bo to be inserted
>   * @start: address where it will be inserted
>   * @end: end of the range where it will be inserted
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Return: 0 on success or a negative error code on failure.
>   */
>  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -			 u64 start, u64 end)
> +			 u64 start, u64 end, struct drm_exec *exec)
>  {
> -	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end);
> +	return __xe_ggtt_insert_bo_at(ggtt, bo, start, end, exec);
>  }
>  
>  /**
>   * xe_ggtt_insert_bo - Insert BO into GGTT
>   * @ggtt: the &xe_ggtt where bo will be inserted
>   * @bo: the &xe_bo to be inserted
> + * @exec: The drm_exec transaction to use for exhaustive eviction.
>   *
>   * Return: 0 on success or a negative error code on failure.
>   */
> -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo,
> +		      struct drm_exec *exec)
>  {
> -	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> +	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX, exec);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> index fbe1e397d05d..75fc7a1efea7 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> @@ -10,6 +10,7 @@
>  
>  struct drm_printer;
>  struct xe_tile;
> +struct drm_exec;
>  
>  struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile);
>  int xe_ggtt_init_early(struct xe_ggtt *ggtt);
> @@ -31,9 +32,9 @@ bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node);
>  void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
>  		    struct xe_bo *bo, u16 pat_index);
>  void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo);
> -int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
> +int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
>  int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
> -			 u64 start, u64 end);
> +			 u64 start, u64 end, struct drm_exec *exec);
>  void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo);
>  u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare);
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index ab43dec52776..4133b9b78f7d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -94,12 +94,12 @@ static int xe_pf_begin(struct drm_exec *exec, struct xe_vma *vma,
>  		}
>  
>  		/* Migrate to VRAM, move should invalidate the VMA first */
> -		err = xe_bo_migrate(bo, vram->placement);
> +		err = xe_bo_migrate(bo, vram->placement, exec);
>  		if (err)
>  			return err;
>  	} else if (bo) {
>  		/* Create backing store if needed */
> -		err = xe_bo_validate(bo, vm, true);
> +		err = xe_bo_validate(bo, vm, true, exec);
>  		if (err)
>  			return err;
>  	}
> @@ -150,7 +150,9 @@ static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
>  
>  		/* Bind VMA only to the GT that has faulted */
>  		trace_xe_vma_pf_bind(vma);
> +		xe_vm_set_validation_exec(vm, &exec);
>  		fence = xe_vma_rebind(vm, vma, BIT(tile->id));
> +		xe_vm_set_validation_exec(vm, NULL);
>  		if (IS_ERR(fence)) {
>  			err = PTR_ERR(fence);
>  			if (xe_vm_validate_should_retry(&exec, err, &end))
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> index c8f0320d032f..906011671b60 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
> @@ -1452,6 +1452,7 @@ static bool pf_release_vf_config_lmem(struct xe_gt *gt, struct xe_gt_sriov_confi
>  static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  {
>  	struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid);
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo;
> @@ -1484,11 +1485,12 @@ static int pf_provision_vf_lmem(struct xe_gt *gt, unsigned int vfid, u64 size)
>  				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
>  				 XE_BO_FLAG_NEEDS_2M |
>  				 XE_BO_FLAG_PINNED |
> -				 XE_BO_FLAG_PINNED_LATE_RESTORE);
> +				 XE_BO_FLAG_PINNED_LATE_RESTORE,
> +				 exec);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> -	err = xe_bo_pin(bo);
> +	err = xe_bo_pin(bo, exec);
>  	xe_bo_unlock(bo);
>  	if (unlikely(err)) {
>  		xe_bo_put(bo);
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index e35c6d4def20..39e3aa6df25a 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -700,6 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  	struct device *dev = xe->drm.dev;
>  	struct drm_buddy_block *block;
>  	struct list_head *blocks;
> +	struct drm_exec *exec;
>  	struct xe_bo *bo;
>  	ktime_t time_end = 0;
>  	int err, idx;
> @@ -708,12 +709,13 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  		return -ENODEV;
>  
>  	xe_pm_runtime_get(xe);
> +	exec = XE_VALIDATION_UNIMPLEMENTED;
>  
>   retry:
>  	bo = xe_bo_create_locked(vr->xe, NULL, NULL, end - start,
>  				 ttm_bo_type_device,
>  				 (IS_DGFX(xe) ? XE_BO_FLAG_VRAM(vr) : XE_BO_FLAG_SYSTEM) |
> -				 XE_BO_FLAG_CPU_ADDR_MIRROR);
> +				 XE_BO_FLAG_CPU_ADDR_MIRROR, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> new file mode 100644
> index 000000000000..cc0684d24e02
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +#include "xe_bo.h"
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +
> +#include "xe_assert.h"
> +#include "xe_validation.h"
> +
> +#ifdef CONFIG_DRM_XE_DEBUG
> +/**
> + * xe_validation_assert_exec() - Assert that the drm_exec pointer is suitable
> + * for validation.
> + * @xe: Pointer to the xe device.
> + * @exec: The drm_exec pointer to check.
> + * @obj: Pointer to the object subject to validation.
> + *
> + * NULL exec pointers are not allowed.
> + * For XE_VALIDATION_UNIMPLEMENTED, no checking is done.
> + * For XE_VALIDATION_OPT_OUT, check that the caller is a kunit test.
> + * For XE_VALIDATION_UNSUPPORTED, check that the object subject to
> + * validation is a dma-buf, for which support for ww locking is
> + * not in place in the dma-buf layer.
> + */
> +void xe_validation_assert_exec(const struct xe_device *xe,
> +			       const struct drm_exec *exec,
> +			       const struct drm_gem_object *obj)
> +{
> +	xe_assert(xe, exec);
> +	if (IS_ERR(exec)) {
> +		switch (PTR_ERR(exec)) {
> +		case __XE_VAL_UNIMPLEMENTED:
> +			break;
> +		case __XE_VAL_UNSUPPORTED:
> +			xe_assert(xe, !!obj->dma_buf);
> +			break;
> +#if IS_ENABLED(CONFIG_KUNIT)
> +		case __XE_VAL_OPT_OUT:
> +			xe_assert(xe, current->kunit_test);
> +			break;
> +#endif
> +		default:
> +			xe_assert(xe, false);
> +		}
> +	}
> +}
> +#endif
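
To spell out what the assert accepts: any real (non-ERR_PTR) drm_exec
passes unconditionally, the three sentinels pass under the conditions
documented above, and NULL or a stray error pointer trips an assert.
Illustrative only, assuming CONFIG_DRM_XE_DEBUG and a bo in scope:

	struct drm_exec exec;

	drm_exec_init(&exec, 0, 0);
	xe_validation_assert_exec(xe, &exec, &bo->ttm.base);	/* passes */
	drm_exec_fini(&exec);

	/* A stray error pointer hits the xe_assert(xe, false) default: */
	xe_validation_assert_exec(xe, ERR_PTR(-ENOENT), &bo->ttm.base);
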
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> new file mode 100644
> index 000000000000..db50feacad7a
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +#ifndef _XE_VALIDATION_H_
> +#define _XE_VALIDATION_H_
> +
> +#include <linux/dma-resv.h>
> +#include <linux/types.h>
> +
> +struct drm_exec;
> +struct drm_gem_object;
> +struct xe_device;
> +
> +#ifdef CONFIG_PROVE_LOCKING
> +/**
> + * xe_validation_lockdep() - Assert that a drm_exec locking transaction can
> + * be initialized at this point.
> + */
> +static inline void xe_validation_lockdep(void)
> +{
> +	struct ww_acquire_ctx ticket;
> +
> +	ww_acquire_init(&ticket, &reservation_ww_class);
> +	ww_acquire_fini(&ticket);
> +}
> +#else
> +static inline void xe_validation_lockdep(void)
> +{
> +}
> +#endif
> +
> +/*
> + * Various values of the drm_exec pointer where we've not (yet)
> + * implemented full ww locking.
> + *
> + * XE_VALIDATION_UNIMPLEMENTED means implementation is pending.
> + * A lockdep check is made to ensure that a drm_exec locking
> + * transaction can actually take place where the macro is
> + * used. If this asserts, the exec pointer needs to be assigned
> + * higher up in the call chain and passed down.
> + *
> + * XE_VALIDATION_UNSUPPORTED is for dma-buf code only where
> + * the dma-buf layer doesn't support WW locking.
> + *
> + * XE_VALIDATION_OPT_OUT is for simplification of kunit tests where
> + * exhaustive eviction isn't necessary.
> + */
> +#define __XE_VAL_UNIMPLEMENTED -EINVAL
> +#define XE_VALIDATION_UNIMPLEMENTED (xe_validation_lockdep(),		\
> +				     (struct drm_exec *)ERR_PTR(__XE_VAL_UNIMPLEMENTED))
> +
> +#define __XE_VAL_UNSUPPORTED -EOPNOTSUPP
> +#define XE_VALIDATION_UNSUPPORTED ((struct drm_exec *)ERR_PTR(__XE_VAL_UNSUPPORTED))
> +
> +#define __XE_VAL_OPT_OUT -ENOMEM
> +#define XE_VALIDATION_OPT_OUT (xe_validation_lockdep(), \
> +			       (struct drm_exec *)ERR_PTR(__XE_VAL_OPT_OUT))
> +#ifdef CONFIG_DRM_XE_DEBUG
> +void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec *exec,
> +			       const struct drm_gem_object *obj);
> +#else
> +#define xe_validation_assert_exec(_xe, _exec, _obj)	\
> +	do {						\
> +		(void)_xe; (void)_exec; (void)_obj;	\
> +	} while (0)
> +#endif
> +
> +#endif
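
As a cheat sheet for which sentinel goes where, per the comment above;
all three are ERR_PTR() values, so IS_ERR() cleanly separates them from
a live transaction:

	/* Exec threading not wired up yet; lockdep-checked at this point: */
	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;

	/* dma-buf callbacks, where ww transactions aren't supported: */
	exec = XE_VALIDATION_UNSUPPORTED;

	/* kunit-only paths that don't need exhaustive eviction: */
	exec = XE_VALIDATION_OPT_OUT;
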
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 529b6767caac..f1e74959f8ff 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -393,7 +393,7 @@ static int xe_gpuvm_validate(struct drm_gpuvm_bo *vm_bo, struct drm_exec *exec)
>  		list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind,
>  			       &vm->rebind_list);
>  
> -	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false);
> +	ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false, exec);
>  	if (ret)
>  		return ret;
>  
> @@ -528,7 +528,9 @@ static void preempt_rebind_work_func(struct work_struct *w)
>  	if (err)
>  		goto out_unlock;
>  
> +	xe_vm_set_validation_exec(vm, &exec);
>  	err = xe_vm_rebind(vm, true);
> +	xe_vm_set_validation_exec(vm, NULL);
>  	if (err)
>  		goto out_unlock;
>  
> @@ -2896,7 +2898,7 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
>  			err = drm_exec_lock_obj(exec, &bo->ttm.base);
>  		if (!err && validate)
>  			err = xe_bo_validate(bo, vm,
> -					     !xe_vm_in_preempt_fence_mode(vm));
> +					     !xe_vm_in_preempt_fence_mode(vm), exec);
>  	}
>  
>  	return err;
> @@ -3019,7 +3021,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
>  					    false);
>  		if (!err && !xe_vma_has_no_bo(vma))
>  			err = xe_bo_migrate(xe_vma_bo(vma),
> -					    region_to_mem_type[region]);
> +					    region_to_mem_type[region],
> +					    exec);
>  		break;
>  	}
>  	default:
> @@ -3298,7 +3301,9 @@ static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>  			goto unlock;
>  		}
>  
> +		xe_vm_set_validation_exec(vm, &exec);
>  		fence = ops_execute(vm, vops);
> +		xe_vm_set_validation_exec(vm, NULL);
>  		if (IS_ERR(fence)) {
>  			if (PTR_ERR(fence) == -ENODATA)
>  				vm_bind_ioctl_ops_fini(vm, vops, NULL);
> @@ -3861,10 +3866,18 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
>   */
>  int xe_vm_lock(struct xe_vm *vm, bool intr)
>  {
> +	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	int ret;
> +
>  	if (intr)
> -		return dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> +		ret = dma_resv_lock_interruptible(xe_vm_resv(vm), NULL);
> +	else
> +		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
>  
> -	return dma_resv_lock(xe_vm_resv(vm), NULL);
> +	if (!ret)
> +		xe_vm_set_validation_exec(vm, exec);
> +
> +	return ret;
>  }
>  
>  /**
> @@ -3875,6 +3888,7 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>   */
>  void xe_vm_unlock(struct xe_vm *vm)
>  {
> +	xe_vm_set_validation_exec(vm, NULL);
>  	dma_resv_unlock(xe_vm_resv(vm));
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 2ecb417c19a2..11f4e522cec5 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -321,7 +321,7 @@ static inline void xe_vm_set_validating(struct xe_vm *vm, bool allow_res_evict)
>  	if (vm && !allow_res_evict) {
>  		xe_vm_assert_held(vm);
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> -		WRITE_ONCE(vm->validating, current);
> +		WRITE_ONCE(vm->validation.validating, current);
>  	}
>  }
>  
> @@ -339,7 +339,7 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
>  {
>  	if (vm && !allow_res_evict) {
>  		/* Pairs with READ_ONCE in xe_vm_is_validating() */
> -		WRITE_ONCE(vm->validating, NULL);
> +		WRITE_ONCE(vm->validation.validating, NULL);
>  	}
>  }
>  
> @@ -357,13 +357,41 @@ static inline void xe_vm_clear_validating(struct xe_vm *vm, bool allow_res_evict
>  static inline bool xe_vm_is_validating(struct xe_vm *vm)
>  {
>  	/* Pairs with WRITE_ONCE in xe_vm_set_validating() */
> -	if (READ_ONCE(vm->validating) == current) {
> +	if (READ_ONCE(vm->validation.validating) == current) {
>  		xe_vm_assert_held(vm);
>  		return true;
>  	}
>  	return false;
>  }
>  
> +/**
> + * xe_vm_set_validation_exec() - Accessor to set the drm_exec object
> + * @vm: The vm we want to register a drm_exec object with.
> + * @exec: The exec object we want to register.
> + *
> + * Set the drm_exec object used to lock the vm's resv.
> + */
> +static inline void xe_vm_set_validation_exec(struct xe_vm *vm, struct drm_exec *exec)
> +{
> +	xe_vm_assert_held(vm);
> +	xe_assert(vm->xe, !!exec ^ !!vm->validation._exec);
> +	vm->validation._exec = exec;
> +}
> +
> +/**
> + * xe_vm_validation_exec() - Accessor to read the drm_exec object
> + * @vm: The vm we want to read the drm_exec object from.
> + *
> + * Return: The drm_exec object used to lock the vm's resv. The value
> + * is a valid pointer, %NULL, or one of the special values defined in
> + * xe_validation.h.
> + */
> +static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
> +{
> +	xe_vm_assert_held(vm);
> +	return vm->validation._exec;
> +}
> +
>  /**
>   * xe_vm_has_valid_gpu_mapping() - Advisory helper to check if VMA or SVM range has
>   * a valid GPU mapping
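
Note that the xe_assert(..., !!exec ^ !!vm->validation._exec) in the
setter enforces strict pairing: registering twice, or clearing when
nothing is registered, asserts. The resulting pattern, as used in the
xe_exec.c and xe_vm.c hunks above:

	xe_vm_set_validation_exec(vm, &exec);
	fence = ops_execute(vm, vops);	/* validation happens under here */
	xe_vm_set_validation_exec(vm, NULL);
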
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 8a07feef503b..2f88808e36bb 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -312,19 +312,35 @@ struct xe_vm {
>  		bool capture_once;
>  	} error_capture;
>  
> +	/**
> +	 * @validation: Validation data only valid with the vm resv held.
> +	 * Note: This is really state of the task holding the vm resv, and
> +	 * moving forward we should come up with a better way of passing
> +	 * this down the call chain.
> +	 */
> +	struct {
> +		/**
> +		 * @validation.validating: The task that is currently making bos
> +		 * resident for this vm.
> +		 * Protected by the VM's resv for writing. Opportunistic reading can be done
> +		 * using READ_ONCE. Note: This is a workaround for the
> +		 * TTM eviction_valuable() callback not being passed a struct
> +		 * ttm_operation_context(). Future work might want to address this.
> +		 */
> +		struct task_struct *validating;
> +		/**
> +		 * @validation._exec: The drm_exec context used when locking
> +		 * the vm resv. Protected by the vm's resv.
> +		 */
> +		struct drm_exec *_exec;
> +	} validation;
> +
>  	/**
>  	 * @tlb_flush_seqno: Required TLB flush seqno for the next exec.
>  	 * protected by the vm resv.
>  	 */
>  	u64 tlb_flush_seqno;
> -	/**
> -	 * @validating: The task that is currently making bos resident for this vm.
> -	 * Protected by the VM's resv for writing. Opportunistic reading can be done
> -	 * using READ_ONCE. Note: This is a workaround for the
> -	 * TTM eviction_valuable() callback not being passed a struct
> -	 * ttm_operation_context(). Future work might want to address this.
> -	 */
> -	struct task_struct *validating;
>  	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
>  	bool batch_invalidate_tlb;
>  	/** @xef: XE file handle for tracking this VM's drm client */
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✗ Xe.CI.Full: failure for Driver-managed exhaustive eviction (rev2)
  2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
                   ` (18 preceding siblings ...)
  2025-08-22 11:31 ` ✓ Xe.CI.BAT: " Patchwork
@ 2025-08-23  4:17 ` Patchwork
  19 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2025-08-23  4:17 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 71871 bytes --]

== Series Details ==

Series: Driver-managed exhaustive eviction (rev2)
URL   : https://patchwork.freedesktop.org/series/152882/
State : failure

== Summary ==

CI Bug Log - changes from xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f_FULL -> xe-pw-152882v2_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-152882v2_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-152882v2_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-152882v2_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_flip@2x-nonexisting-fb-interruptible:
    - shard-dg2-set2:     NOTRUN -> [ABORT][1] +6 other tests abort
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-432/igt@kms_flip@2x-nonexisting-fb-interruptible.html

  * igt@kms_flip@absolute-wf_vblank-interruptible@a-hdmi-a3:
    - shard-bmg:          [PASS][2] -> [ABORT][3] +25 other tests abort
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@kms_flip@absolute-wf_vblank-interruptible@a-hdmi-a3.html
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_flip@absolute-wf_vblank-interruptible@a-hdmi-a3.html

  * igt@kms_flip@flip-vs-dpms-off-vs-modeset-interruptible:
    - shard-dg2-set2:     [PASS][4] -> [ABORT][5] +16 other tests abort
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-466/igt@kms_flip@flip-vs-dpms-off-vs-modeset-interruptible.html
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_flip@flip-vs-dpms-off-vs-modeset-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt:
    - shard-adlp:         [PASS][6] -> [SKIP][7] +12 other tests skip
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-pgflip-blt:
    - shard-dg2-set2:     NOTRUN -> [SKIP][8] +3 other tests skip
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-466/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary:
    - shard-dg2-set2:     [PASS][9] -> [SKIP][10] +56 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-464/igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary.html
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary.html

  * igt@xe_exec_fault_mode@many-rebind-imm:
    - shard-lnl:          [PASS][11] -> [FAIL][12]
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-4/igt@xe_exec_fault_mode@many-rebind-imm.html
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@xe_exec_fault_mode@many-rebind-imm.html

  
Known issues
------------

  Here are the changes found in xe-pw-152882v2_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@intel_hwmon@hwmon-write:
    - shard-bmg:          [PASS][13] -> [FAIL][14] ([Intel XE#4665])
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-4/igt@intel_hwmon@hwmon-write.html
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@intel_hwmon@hwmon-write.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - shard-lnl:          NOTRUN -> [SKIP][15] ([Intel XE#1466])
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_atomic_transition@plane-toggle-modeset-transition:
    - shard-adlp:         [PASS][16] -> [FAIL][17] ([Intel XE#3908]) +1 other test fail
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-8/igt@kms_atomic_transition@plane-toggle-modeset-transition.html
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@kms_atomic_transition@plane-toggle-modeset-transition.html

  * igt@kms_big_fb@4-tiled-addfb-size-overflow:
    - shard-adlp:         NOTRUN -> [SKIP][18] ([Intel XE#610])
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_big_fb@4-tiled-addfb-size-overflow.html

  * igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-adlp:         NOTRUN -> [SKIP][19] ([Intel XE#1124])
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-6/igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@x-tiled-16bpp-rotate-270:
    - shard-dg2-set2:     NOTRUN -> [SKIP][20] ([Intel XE#316])
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_big_fb@x-tiled-16bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-90:
    - shard-bmg:          NOTRUN -> [SKIP][21] ([Intel XE#2327]) +1 other test skip
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_big_fb@x-tiled-32bpp-rotate-90.html
    - shard-lnl:          NOTRUN -> [SKIP][22] ([Intel XE#1407]) +1 other test skip
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@kms_big_fb@x-tiled-32bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0:
    - shard-adlp:         [PASS][23] -> [DMESG-FAIL][24] ([Intel XE#4543]) +7 other tests dmesg-fail
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-6/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0.html
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip:
    - shard-adlp:         NOTRUN -> [DMESG-FAIL][25] ([Intel XE#4543])
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-async-flip.html

  * igt@kms_big_fb@yf-tiled-16bpp-rotate-180:
    - shard-lnl:          NOTRUN -> [SKIP][26] ([Intel XE#1124]) +1 other test skip
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@kms_big_fb@yf-tiled-16bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-32bpp-rotate-0:
    - shard-bmg:          NOTRUN -> [SKIP][27] ([Intel XE#1124]) +3 other tests skip
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@kms_big_fb@yf-tiled-32bpp-rotate-0.html

  * igt@kms_big_fb@yf-tiled-32bpp-rotate-180:
    - shard-dg2-set2:     NOTRUN -> [SKIP][28] ([Intel XE#1124]) +3 other tests skip
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_big_fb@yf-tiled-32bpp-rotate-180.html

  * igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p:
    - shard-adlp:         NOTRUN -> [SKIP][29] ([Intel XE#2191])
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p.html
    - shard-bmg:          NOTRUN -> [SKIP][30] ([Intel XE#2314] / [Intel XE#2894])
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][31] ([Intel XE#2191])
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p.html
    - shard-lnl:          NOTRUN -> [SKIP][32] ([Intel XE#2191])
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_bw@connected-linear-tiling-3-displays-2160x1440p.html

  * igt@kms_bw@linear-tiling-4-displays-3840x2160p:
    - shard-dg2-set2:     NOTRUN -> [SKIP][33] ([Intel XE#367])
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-464/igt@kms_bw@linear-tiling-4-displays-3840x2160p.html

  * igt@kms_ccs@bad-rotation-90-4-tiled-mtl-rc-ccs-cc:
    - shard-lnl:          NOTRUN -> [SKIP][34] ([Intel XE#2887]) +1 other test skip
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_ccs@bad-rotation-90-4-tiled-mtl-rc-ccs-cc.html

  * igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-mc-ccs@pipe-a-hdmi-a-2:
    - shard-dg2-set2:     NOTRUN -> [SKIP][35] ([Intel XE#787]) +97 other tests skip
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-432/igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-mc-ccs@pipe-a-hdmi-a-2.html

  * igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][36] ([Intel XE#455] / [Intel XE#787]) +5 other tests skip
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs.html

  * igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs@pipe-c-hdmi-a-1:
    - shard-adlp:         NOTRUN -> [SKIP][37] ([Intel XE#787]) +8 other tests skip
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs@pipe-c-hdmi-a-1.html

  * igt@kms_ccs@crc-primary-suspend-y-tiled-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][38] ([Intel XE#3432])
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_ccs@crc-primary-suspend-y-tiled-ccs.html
    - shard-lnl:          NOTRUN -> [SKIP][39] ([Intel XE#3432])
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@kms_ccs@crc-primary-suspend-y-tiled-ccs.html

  * igt@kms_ccs@random-ccs-data-4-tiled-bmg-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][40] ([Intel XE#2907])
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-4/igt@kms_ccs@random-ccs-data-4-tiled-bmg-ccs.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][41] ([Intel XE#2907])
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-432/igt@kms_ccs@random-ccs-data-4-tiled-bmg-ccs.html
    - shard-lnl:          NOTRUN -> [SKIP][42] ([Intel XE#2669]) +3 other tests skip
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-8/igt@kms_ccs@random-ccs-data-4-tiled-bmg-ccs.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc:
    - shard-bmg:          NOTRUN -> [SKIP][43] ([Intel XE#2887]) +3 other tests skip
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html
    - shard-dg2-set2:     NOTRUN -> [INCOMPLETE][44] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#3124])
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-dp-4:
    - shard-dg2-set2:     NOTRUN -> [INCOMPLETE][45] ([Intel XE#3124])
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-dp-4.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-6:
    - shard-dg2-set2:     NOTRUN -> [DMESG-WARN][46] ([Intel XE#1727] / [Intel XE#3113])
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-6.html

  * igt@kms_ccs@random-ccs-data-yf-tiled-ccs@pipe-d-dp-4:
    - shard-dg2-set2:     NOTRUN -> [SKIP][47] ([Intel XE#455] / [Intel XE#787]) +17 other tests skip
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-464/igt@kms_ccs@random-ccs-data-yf-tiled-ccs@pipe-d-dp-4.html

  * igt@kms_cdclk@plane-scaling:
    - shard-lnl:          NOTRUN -> [SKIP][48] ([Intel XE#4416]) +3 other tests skip
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_cdclk@plane-scaling.html
    - shard-adlp:         NOTRUN -> [SKIP][49] ([Intel XE#4416] / [Intel XE#455])
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_cdclk@plane-scaling.html

  * igt@kms_cdclk@plane-scaling@pipe-a-hdmi-a-1:
    - shard-adlp:         NOTRUN -> [SKIP][50] ([Intel XE#4416]) +2 other tests skip
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_cdclk@plane-scaling@pipe-a-hdmi-a-1.html

  * igt@kms_cdclk@plane-scaling@pipe-b-dp-4:
    - shard-dg2-set2:     NOTRUN -> [SKIP][51] ([Intel XE#4416]) +3 other tests skip
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-434/igt@kms_cdclk@plane-scaling@pipe-b-dp-4.html

  * igt@kms_chamelium_color@ctm-0-75:
    - shard-dg2-set2:     NOTRUN -> [SKIP][52] ([Intel XE#306])
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_chamelium_color@ctm-0-75.html

  * igt@kms_chamelium_hpd@dp-hpd-storm:
    - shard-adlp:         NOTRUN -> [SKIP][53] ([Intel XE#373]) +2 other tests skip
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@kms_chamelium_hpd@dp-hpd-storm.html
    - shard-bmg:          NOTRUN -> [SKIP][54] ([Intel XE#2252]) +2 other tests skip
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@kms_chamelium_hpd@dp-hpd-storm.html

  * igt@kms_chamelium_hpd@hdmi-hpd:
    - shard-dg2-set2:     NOTRUN -> [SKIP][55] ([Intel XE#373]) +3 other tests skip
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_chamelium_hpd@hdmi-hpd.html

  * igt@kms_chamelium_hpd@vga-hpd-with-enabled-mode:
    - shard-lnl:          NOTRUN -> [SKIP][56] ([Intel XE#373]) +3 other tests skip
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@kms_chamelium_hpd@vga-hpd-with-enabled-mode.html

  * igt@kms_content_protection@lic-type-0@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [FAIL][57] ([Intel XE#1178])
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@kms_content_protection@lic-type-0@pipe-a-dp-2.html

  * igt@kms_content_protection@srm@pipe-a-dp-4:
    - shard-dg2-set2:     NOTRUN -> [FAIL][58] ([Intel XE#1178])
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@kms_content_protection@srm@pipe-a-dp-4.html

  * igt@kms_cursor_crc@cursor-offscreen-64x21:
    - shard-lnl:          NOTRUN -> [SKIP][59] ([Intel XE#1424]) +1 other test skip
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_cursor_crc@cursor-offscreen-64x21.html

  * igt@kms_cursor_crc@cursor-onscreen-128x42:
    - shard-bmg:          NOTRUN -> [SKIP][60] ([Intel XE#2320]) +2 other tests skip
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_cursor_crc@cursor-onscreen-128x42.html

  * igt@kms_cursor_crc@cursor-rapid-movement-512x512:
    - shard-dg2-set2:     NOTRUN -> [SKIP][61] ([Intel XE#308]) +1 other test skip
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-435/igt@kms_cursor_crc@cursor-rapid-movement-512x512.html
    - shard-bmg:          NOTRUN -> [SKIP][62] ([Intel XE#2321])
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_cursor_crc@cursor-rapid-movement-512x512.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-toggle:
    - shard-bmg:          [PASS][63] -> [SKIP][64] ([Intel XE#2291]) +2 other tests skip
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-toggle.html

  * igt@kms_dp_linktrain_fallback@dp-fallback:
    - shard-bmg:          [PASS][65] -> [SKIP][66] ([Intel XE#4294])
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-5/igt@kms_dp_linktrain_fallback@dp-fallback.html
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_dp_linktrain_fallback@dp-fallback.html

  * igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-out-visible-area:
    - shard-adlp:         NOTRUN -> [SKIP][67] ([Intel XE#4422])
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-out-visible-area.html
    - shard-bmg:          NOTRUN -> [SKIP][68] ([Intel XE#4422])
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-out-visible-area.html
    - shard-lnl:          NOTRUN -> [SKIP][69] ([Intel XE#4422])
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-out-visible-area.html

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-bmg:          NOTRUN -> [SKIP][70] ([Intel XE#776])
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@kms_fbcon_fbt@psr-suspend.html

  * igt@kms_flip@2x-flip-vs-blocking-wf-vblank:
    - shard-bmg:          [PASS][71] -> [SKIP][72] ([Intel XE#2316]) +3 other tests skip
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-8/igt@kms_flip@2x-flip-vs-blocking-wf-vblank.html
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_flip@2x-flip-vs-blocking-wf-vblank.html

  * igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible@ad-dp2-hdmi-a3:
    - shard-bmg:          [PASS][73] -> [INCOMPLETE][74] ([Intel XE#2049])
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible@ad-dp2-hdmi-a3.html
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset-interruptible@ad-dp2-hdmi-a3.html

  * igt@kms_flip@2x-flip-vs-dpms-on-nop:
    - shard-lnl:          NOTRUN -> [SKIP][75] ([Intel XE#1421]) +1 other test skip
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_flip@2x-flip-vs-dpms-on-nop.html

  * igt@kms_flip@2x-flip-vs-wf_vblank:
    - shard-bmg:          NOTRUN -> [SKIP][76] ([Intel XE#2316])
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_flip@2x-flip-vs-wf_vblank.html
    - shard-adlp:         NOTRUN -> [SKIP][77] ([Intel XE#310])
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-6/igt@kms_flip@2x-flip-vs-wf_vblank.html

  * igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1:
    - shard-adlp:         [PASS][78] -> [DMESG-WARN][79] ([Intel XE#4543]) +17 other tests dmesg-warn
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-8/igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1.html
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-1/igt@kms_flip@flip-vs-dpms-on-nop-interruptible@a-hdmi-a1.html

  * igt@kms_flip@flip-vs-expired-vblank@c-edp1:
    - shard-lnl:          [PASS][80] -> [FAIL][81] ([Intel XE#301] / [Intel XE#3149]) +1 other test fail
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-8/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling:
    - shard-adlp:         NOTRUN -> [SKIP][82] ([Intel XE#455]) +3 other tests skip
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling.html
    - shard-lnl:          NOTRUN -> [SKIP][83] ([Intel XE#1401] / [Intel XE#1745])
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][84] ([Intel XE#1401])
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-32bpp-yftileccs-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-64bpp-yftile-upscaling:
    - shard-dg2-set2:     NOTRUN -> [SKIP][85] ([Intel XE#455]) +7 other tests skip
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@kms_flip_scaled_crc@flip-32bpp-yftile-to-64bpp-yftile-upscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling:
    - shard-lnl:          NOTRUN -> [SKIP][86] ([Intel XE#1397] / [Intel XE#1745])
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-8/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode:
    - shard-lnl:          NOTRUN -> [SKIP][87] ([Intel XE#1397])
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-8/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling:
    - shard-bmg:          NOTRUN -> [SKIP][88] ([Intel XE#2293] / [Intel XE#2380]) +1 other test skip
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode:
    - shard-bmg:          NOTRUN -> [SKIP][89] ([Intel XE#2293]) +1 other test skip
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode.html

  * igt@kms_frontbuffer_tracking@drrs-1p-offscren-pri-indfb-draw-blt:
    - shard-dg2-set2:     NOTRUN -> [SKIP][90] ([Intel XE#651]) +7 other tests skip
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-435/igt@kms_frontbuffer_tracking@drrs-1p-offscren-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-shrfb-draw-mmap-wc:
    - shard-adlp:         NOTRUN -> [SKIP][91] ([Intel XE#656]) +5 other tests skip
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-shrfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-spr-indfb-move:
    - shard-bmg:          NOTRUN -> [SKIP][92] ([Intel XE#2311]) +3 other tests skip
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-spr-indfb-move.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-render:
    - shard-bmg:          NOTRUN -> [SKIP][93] ([Intel XE#5390]) +2 other tests skip
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-render.html
    - shard-lnl:          NOTRUN -> [SKIP][94] ([Intel XE#656]) +8 other tests skip
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc:
    - shard-bmg:          NOTRUN -> [SKIP][95] ([Intel XE#2312])
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-mmap-wc:
    - shard-lnl:          NOTRUN -> [SKIP][96] ([Intel XE#651])
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-render:
    - shard-adlp:         NOTRUN -> [SKIP][97] ([Intel XE#653]) +1 other test skip
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-shrfb-draw-blt:
    - shard-bmg:          NOTRUN -> [SKIP][98] ([Intel XE#2313]) +5 other tests skip
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-shrfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@plane-fbc-rte:
    - shard-dg2-set2:     NOTRUN -> [SKIP][99] ([Intel XE#1158])
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-435/igt@kms_frontbuffer_tracking@plane-fbc-rte.html
    - shard-adlp:         NOTRUN -> [SKIP][100] ([Intel XE#1158])
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@kms_frontbuffer_tracking@plane-fbc-rte.html
    - shard-bmg:          NOTRUN -> [SKIP][101] ([Intel XE#2350])
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_frontbuffer_tracking@plane-fbc-rte.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-mmap-wc:
    - shard-dg2-set2:     NOTRUN -> [SKIP][102] ([Intel XE#653]) +5 other tests skip
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-mmap-wc.html

  * igt@kms_hdmi_inject@inject-audio:
    - shard-lnl:          NOTRUN -> [SKIP][103] ([Intel XE#1470] / [Intel XE#2853])
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-8/igt@kms_hdmi_inject@inject-audio.html

  * igt@kms_joiner@invalid-modeset-ultra-joiner:
    - shard-dg2-set2:     NOTRUN -> [SKIP][104] ([Intel XE#2927])
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-464/igt@kms_joiner@invalid-modeset-ultra-joiner.html

  * igt@kms_plane@plane-panning-bottom-right-suspend:
    - shard-adlp:         [PASS][105] -> [DMESG-WARN][106] ([Intel XE#2953] / [Intel XE#4173]) +4 other tests dmesg-warn
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-6/igt@kms_plane@plane-panning-bottom-right-suspend.html
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-4/igt@kms_plane@plane-panning-bottom-right-suspend.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5:
    - shard-bmg:          NOTRUN -> [SKIP][107] ([Intel XE#2763]) +4 other tests skip
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5@pipe-c:
    - shard-lnl:          NOTRUN -> [SKIP][108] ([Intel XE#2763]) +3 other tests skip
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-5@pipe-c.html

  * igt@kms_pm_dc@dc3co-vpb-simulation:
    - shard-dg2-set2:     NOTRUN -> [SKIP][109] ([Intel XE#1122])
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_pm_dc@dc6-dpms:
    - shard-lnl:          [PASS][110] -> [FAIL][111] ([Intel XE#718])
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-1/igt@kms_pm_dc@dc6-dpms.html
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@kms_pm_dc@dc6-dpms.html

  * igt@kms_pm_rpm@dpms-mode-unset-non-lpsp:
    - shard-adlp:         NOTRUN -> [SKIP][112] ([Intel XE#836])
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@kms_pm_rpm@dpms-mode-unset-non-lpsp.html
    - shard-lnl:          NOTRUN -> [SKIP][113] ([Intel XE#1439] / [Intel XE#836])
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_pm_rpm@dpms-mode-unset-non-lpsp.html

  * igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area:
    - shard-bmg:          NOTRUN -> [SKIP][114] ([Intel XE#1489] / [Intel XE#5899])
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area.html
    - shard-lnl:          NOTRUN -> [SKIP][115] ([Intel XE#2893] / [Intel XE#5899])
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@kms_psr2_sf@fbc-pr-overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf:
    - shard-adlp:         NOTRUN -> [SKIP][116] ([Intel XE#1489] / [Intel XE#5899]) +1 other test skip
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][117] ([Intel XE#1489] / [Intel XE#5899]) +2 other tests skip
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-464/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-dg2-set2:     NOTRUN -> [SKIP][118] ([Intel XE#1122] / [Intel XE#5899])
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr@fbc-pr-cursor-blt:
    - shard-bmg:          NOTRUN -> [SKIP][119] ([Intel XE#2234] / [Intel XE#2850] / [Intel XE#5899]) +6 other tests skip
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@kms_psr@fbc-pr-cursor-blt.html

  * igt@kms_psr@fbc-pr-dpms:
    - shard-lnl:          NOTRUN -> [SKIP][120] ([Intel XE#1406] / [Intel XE#5899])
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@kms_psr@fbc-pr-dpms.html

  * igt@kms_psr@pr-cursor-blt:
    - shard-adlp:         NOTRUN -> [SKIP][121] ([Intel XE#2850] / [Intel XE#5899] / [Intel XE#929]) +2 other tests skip
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@kms_psr@pr-cursor-blt.html
    - shard-lnl:          NOTRUN -> [SKIP][122] ([Intel XE#5784] / [Intel XE#5899])
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-1/igt@kms_psr@pr-cursor-blt.html

  * igt@kms_psr@psr-dpms:
    - shard-dg2-set2:     NOTRUN -> [SKIP][123] ([Intel XE#2850] / [Intel XE#5899] / [Intel XE#929]) +4 other tests skip
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@kms_psr@psr-dpms.html

  * igt@kms_rotation_crc@primary-yf-tiled-reflect-x-90:
    - shard-bmg:          NOTRUN -> [SKIP][124] ([Intel XE#3414] / [Intel XE#3904])
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-90.html

  * igt@kms_setmode@clone-exclusive-crtc:
    - shard-bmg:          [PASS][125] -> [SKIP][126] ([Intel XE#1435])
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-1/igt@kms_setmode@clone-exclusive-crtc.html
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_setmode@clone-exclusive-crtc.html

  * igt@kms_vrr@cmrr@pipe-a-edp-1:
    - shard-lnl:          [PASS][127] -> [FAIL][128] ([Intel XE#4459]) +1 other test fail
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-1/igt@kms_vrr@cmrr@pipe-a-edp-1.html
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@kms_vrr@cmrr@pipe-a-edp-1.html

  * igt@kms_vrr@seamless-rr-switch-virtual:
    - shard-bmg:          NOTRUN -> [SKIP][129] ([Intel XE#1499])
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@kms_vrr@seamless-rr-switch-virtual.html

  * igt@xe_copy_basic@mem-set-linear-0xfffe:
    - shard-dg2-set2:     NOTRUN -> [SKIP][130] ([Intel XE#1126])
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@xe_copy_basic@mem-set-linear-0xfffe.html

  * igt@xe_eu_stall@blocking-read:
    - shard-dg2-set2:     NOTRUN -> [SKIP][131] ([Intel XE#5626])
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@xe_eu_stall@blocking-read.html

  * igt@xe_eudebug@basic-vm-access-userptr:
    - shard-adlp:         NOTRUN -> [SKIP][132] ([Intel XE#4837] / [Intel XE#5565])
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@xe_eudebug@basic-vm-access-userptr.html

  * igt@xe_eudebug@basic-vm-bind-extended-discovery:
    - shard-dg2-set2:     NOTRUN -> [SKIP][133] ([Intel XE#4837]) +1 other test skip
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@xe_eudebug@basic-vm-bind-extended-discovery.html

  * igt@xe_eudebug_online@basic-breakpoint:
    - shard-bmg:          NOTRUN -> [SKIP][134] ([Intel XE#4837]) +2 other tests skip
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@xe_eudebug_online@basic-breakpoint.html
    - shard-lnl:          NOTRUN -> [SKIP][135] ([Intel XE#4837]) +2 other tests skip
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-1/igt@xe_eudebug_online@basic-breakpoint.html

  * igt@xe_evict@evict-beng-large:
    - shard-adlp:         NOTRUN -> [SKIP][136] ([Intel XE#261] / [Intel XE#5564])
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@xe_evict@evict-beng-large.html
    - shard-lnl:          NOTRUN -> [SKIP][137] ([Intel XE#688])
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@xe_evict@evict-beng-large.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-rebind:
    - shard-lnl:          NOTRUN -> [SKIP][138] ([Intel XE#1392]) +1 other test skip
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-rebind.html

  * igt@xe_exec_basic@multigpu-once-basic-defer-bind:
    - shard-adlp:         NOTRUN -> [SKIP][139] ([Intel XE#1392] / [Intel XE#5575]) +1 other test skip
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-3/igt@xe_exec_basic@multigpu-once-basic-defer-bind.html
    - shard-bmg:          NOTRUN -> [SKIP][140] ([Intel XE#2322]) +3 other tests skip
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@xe_exec_basic@multigpu-once-basic-defer-bind.html

  * igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr-rebind:
    - shard-dg2-set2:     [PASS][141] -> [SKIP][142] ([Intel XE#1392]) +2 other tests skip
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-463/igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr-rebind.html
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-432/igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr-rebind.html

  * igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind-imm:
    - shard-dg2-set2:     NOTRUN -> [SKIP][143] ([Intel XE#288]) +8 other tests skip
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind-imm.html

  * igt@xe_exec_fault_mode@many-execqueues-rebind:
    - shard-adlp:         NOTRUN -> [SKIP][144] ([Intel XE#288] / [Intel XE#5561]) +4 other tests skip
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@xe_exec_fault_mode@many-execqueues-rebind.html

  * igt@xe_exec_mix_modes@exec-simple-batch-store-dma-fence:
    - shard-dg2-set2:     NOTRUN -> [SKIP][145] ([Intel XE#2360]) +1 other test skip
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-463/igt@xe_exec_mix_modes@exec-simple-batch-store-dma-fence.html

  * igt@xe_exec_mix_modes@exec-spinner-interrupted-lr:
    - shard-adlp:         NOTRUN -> [SKIP][146] ([Intel XE#2360] / [Intel XE#5573])
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@xe_exec_mix_modes@exec-spinner-interrupted-lr.html

  * igt@xe_exec_reset@gt-reset-stress:
    - shard-adlp:         NOTRUN -> [ABORT][147] ([Intel XE#5729])
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-6/igt@xe_exec_reset@gt-reset-stress.html

  * igt@xe_exec_reset@parallel-gt-reset:
    - shard-adlp:         [PASS][148] -> [DMESG-WARN][149] ([Intel XE#3876])
   [148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-8/igt@xe_exec_reset@parallel-gt-reset.html
   [149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@xe_exec_reset@parallel-gt-reset.html

  * igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset:
    - shard-lnl:          NOTRUN -> [SKIP][150] ([Intel XE#4943]) +6 other tests skip
   [150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset.html

  * igt@xe_exec_system_allocator@once-mmap-new-huge-nomemset:
    - shard-bmg:          NOTRUN -> [SKIP][151] ([Intel XE#4943]) +9 other tests skip
   [151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@xe_exec_system_allocator@once-mmap-new-huge-nomemset.html

  * igt@xe_exec_system_allocator@threads-shared-vm-many-mmap-shared-nomemset:
    - shard-dg2-set2:     NOTRUN -> [SKIP][152] ([Intel XE#4915]) +81 other tests skip
   [152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@xe_exec_system_allocator@threads-shared-vm-many-mmap-shared-nomemset.html

  * igt@xe_exec_system_allocator@threads-shared-vm-many-mmap-shared-remap:
    - shard-adlp:         NOTRUN -> [SKIP][153] ([Intel XE#4915]) +39 other tests skip
   [153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@xe_exec_system_allocator@threads-shared-vm-many-mmap-shared-remap.html

  * igt@xe_exec_threads@threads-mixed-fd-rebind:
    - shard-adlp:         [PASS][154] -> [ABORT][155] ([Intel XE#3970])
   [154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-6/igt@xe_exec_threads@threads-mixed-fd-rebind.html
   [155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@xe_exec_threads@threads-mixed-fd-rebind.html

  * igt@xe_live_ktest@xe_bo:
    - shard-adlp:         NOTRUN -> [SKIP][156] ([Intel XE#2229] / [Intel XE#455]) +1 other test skip
   [156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@xe_live_ktest@xe_bo.html

  * igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit:
    - shard-lnl:          NOTRUN -> [SKIP][157] ([Intel XE#2229])
   [157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-3/igt@xe_live_ktest@xe_bo@xe_bo_evict_kunit.html

  * igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit:
    - shard-bmg:          NOTRUN -> [SKIP][158] ([Intel XE#2229])
   [158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html
    - shard-adlp:         NOTRUN -> [SKIP][159] ([Intel XE#2229])
   [159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-9/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html

  * igt@xe_oa@create-destroy-userspace-config:
    - shard-dg2-set2:     NOTRUN -> [SKIP][160] ([Intel XE#3573])
   [160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@xe_oa@create-destroy-userspace-config.html

  * igt@xe_oa@oa-regs-whitelisted:
    - shard-adlp:         NOTRUN -> [SKIP][161] ([Intel XE#3573])
   [161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-4/igt@xe_oa@oa-regs-whitelisted.html

  * igt@xe_pat@pat-index-xehpc:
    - shard-adlp:         NOTRUN -> [SKIP][162] ([Intel XE#2838] / [Intel XE#979])
   [162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@xe_pat@pat-index-xehpc.html
    - shard-lnl:          NOTRUN -> [SKIP][163] ([Intel XE#1420] / [Intel XE#2838])
   [163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-7/igt@xe_pat@pat-index-xehpc.html

  * igt@xe_pm@s2idle-vm-bind-userptr:
    - shard-adlp:         [PASS][164] -> [DMESG-WARN][165] ([Intel XE#2953] / [Intel XE#4173] / [Intel XE#4504])
   [164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-9/igt@xe_pm@s2idle-vm-bind-userptr.html
   [165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-8/igt@xe_pm@s2idle-vm-bind-userptr.html

  * igt@xe_pm@s3-d3cold-basic-exec:
    - shard-adlp:         NOTRUN -> [SKIP][166] ([Intel XE#2284] / [Intel XE#366])
   [166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@xe_pm@s3-d3cold-basic-exec.html
    - shard-bmg:          NOTRUN -> [SKIP][167] ([Intel XE#2284])
   [167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@xe_pm@s3-d3cold-basic-exec.html
    - shard-lnl:          NOTRUN -> [SKIP][168] ([Intel XE#2284] / [Intel XE#366])
   [168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-4/igt@xe_pm@s3-d3cold-basic-exec.html

  * igt@xe_pxp@display-pxp-fb:
    - shard-dg2-set2:     NOTRUN -> [SKIP][169] ([Intel XE#4733])
   [169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-433/igt@xe_pxp@display-pxp-fb.html

  * igt@xe_pxp@pxp-stale-bo-exec-post-rpm:
    - shard-bmg:          NOTRUN -> [SKIP][170] ([Intel XE#4733])
   [170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_pxp@pxp-stale-bo-exec-post-rpm.html

  * igt@xe_query@multigpu-query-mem-usage:
    - shard-bmg:          NOTRUN -> [SKIP][171] ([Intel XE#944])
   [171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@xe_query@multigpu-query-mem-usage.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][172] ([Intel XE#944]) +1 other test skip
   [172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@xe_query@multigpu-query-mem-usage.html
    - shard-lnl:          NOTRUN -> [SKIP][173] ([Intel XE#944])
   [173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-2/igt@xe_query@multigpu-query-mem-usage.html
    - shard-adlp:         NOTRUN -> [SKIP][174] ([Intel XE#944])
   [174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-6/igt@xe_query@multigpu-query-mem-usage.html

  
#### Possible fixes ####

  * igt@kms_async_flips@async-flip-suspend-resume@pipe-d-hdmi-a-3:
    - shard-bmg:          [INCOMPLETE][175] ([Intel XE#4912]) -> [PASS][176] +1 other test pass
   [175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-4/igt@kms_async_flips@async-flip-suspend-resume@pipe-d-hdmi-a-3.html
   [176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@kms_async_flips@async-flip-suspend-resume@pipe-d-hdmi-a-3.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-adlp:         [DMESG-FAIL][177] ([Intel XE#4543]) -> [PASS][178] +8 other tests pass
   [177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-4/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html
   [178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-2/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
    - shard-bmg:          [SKIP][179] ([Intel XE#2291]) -> [PASS][180] +1 other test pass
   [179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html
   [180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html

  * igt@kms_dp_aux_dev:
    - shard-bmg:          [SKIP][181] ([Intel XE#3009]) -> [PASS][182]
   [181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_dp_aux_dev.html
   [182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_dp_aux_dev.html

  * igt@kms_flip@2x-plain-flip-ts-check-interruptible:
    - shard-bmg:          [SKIP][183] ([Intel XE#2316]) -> [PASS][184] +3 other tests pass
   [183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html
   [184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html

  * igt@kms_flip@flip-vs-panning-interruptible:
    - shard-adlp:         [DMESG-WARN][185] ([Intel XE#4543] / [Intel XE#5208]) -> [PASS][186]
   [185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-8/igt@kms_flip@flip-vs-panning-interruptible.html
   [186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-4/igt@kms_flip@flip-vs-panning-interruptible.html

  * igt@kms_flip@flip-vs-panning-interruptible@b-hdmi-a1:
    - shard-adlp:         [DMESG-WARN][187] ([Intel XE#4543]) -> [PASS][188] +20 other tests pass
   [187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-adlp-8/igt@kms_flip@flip-vs-panning-interruptible@b-hdmi-a1.html
   [188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-adlp-4/igt@kms_flip@flip-vs-panning-interruptible@b-hdmi-a1.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-bmg:          [INCOMPLETE][189] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][190] +1 other test pass
   [189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-3/igt@kms_flip@flip-vs-suspend.html
   [190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_flip@flip-vs-suspend.html
    - shard-dg2-set2:     [INCOMPLETE][191] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][192] +1 other test pass
   [191]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-433/igt@kms_flip@flip-vs-suspend.html
   [192]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-434/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_hdr@static-swap:
    - shard-bmg:          [SKIP][193] ([Intel XE#1503]) -> [PASS][194]
   [193]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_hdr@static-swap.html
   [194]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@kms_hdr@static-swap.html

  * igt@kms_plane_cursor@overlay:
    - shard-dg2-set2:     [FAIL][195] ([Intel XE#616]) -> [PASS][196] +1 other test pass
   [195]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-432/igt@kms_plane_cursor@overlay.html
   [196]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-466/igt@kms_plane_cursor@overlay.html

  * igt@xe_exec_basic@multigpu-no-exec-null-defer-bind:
    - shard-dg2-set2:     [SKIP][197] ([Intel XE#1392]) -> [PASS][198]
   [197]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-432/igt@xe_exec_basic@multigpu-no-exec-null-defer-bind.html
   [198]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-436/igt@xe_exec_basic@multigpu-no-exec-null-defer-bind.html

  * igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind:
    - shard-bmg:          [FAIL][199] -> [PASS][200]
   [199]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind.html
   [200]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-rebind.html

  * igt@xe_exec_reset@parallel-gt-reset:
    - shard-dg2-set2:     [DMESG-WARN][201] ([Intel XE#3876]) -> [PASS][202]
   [201]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-466/igt@xe_exec_reset@parallel-gt-reset.html
   [202]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-466/igt@xe_exec_reset@parallel-gt-reset.html

  * igt@xe_exec_threads@threads-hang-shared-vm-userptr-invalidate:
    - shard-bmg:          [DMESG-FAIL][203] ([Intel XE#3876]) -> [PASS][204]
   [203]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-3/igt@xe_exec_threads@threads-hang-shared-vm-userptr-invalidate.html
   [204]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_exec_threads@threads-hang-shared-vm-userptr-invalidate.html

  * igt@xe_module_load@load:
    - shard-bmg:          ([PASS][205], [PASS][206], [PASS][207], [PASS][208], [PASS][209], [PASS][210], [PASS][211], [PASS][212], [PASS][213], [PASS][214], [SKIP][215], [PASS][216], [PASS][217], [PASS][218], [PASS][219], [PASS][220], [PASS][221], [PASS][222], [PASS][223], [PASS][224], [PASS][225], [PASS][226], [PASS][227], [PASS][228], [PASS][229], [PASS][230]) ([Intel XE#2457]) -> ([PASS][231], [PASS][232], [PASS][233], [PASS][234], [PASS][235], [PASS][236], [PASS][237], [PASS][238], [PASS][239], [PASS][240], [PASS][241], [PASS][242], [PASS][243], [PASS][244], [PASS][245], [PASS][246], [PASS][247], [PASS][248], [PASS][249], [PASS][250], [PASS][251], [PASS][252], [PASS][253], [PASS][254], [PASS][255])
   [205]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-8/igt@xe_module_load@load.html
   [206]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-8/igt@xe_module_load@load.html
   [207]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-4/igt@xe_module_load@load.html
   [208]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-8/igt@xe_module_load@load.html
   [209]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-4/igt@xe_module_load@load.html
   [210]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-4/igt@xe_module_load@load.html
   [211]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-1/igt@xe_module_load@load.html
   [212]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-1/igt@xe_module_load@load.html
   [213]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@xe_module_load@load.html
   [214]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@xe_module_load@load.html
   [215]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@xe_module_load@load.html
   [216]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@xe_module_load@load.html
   [217]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@xe_module_load@load.html
   [218]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@xe_module_load@load.html
   [219]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@xe_module_load@load.html
   [220]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-2/igt@xe_module_load@load.html
   [221]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@xe_module_load@load.html
   [222]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-5/igt@xe_module_load@load.html
   [223]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-5/igt@xe_module_load@load.html
   [224]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-3/igt@xe_module_load@load.html
   [225]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-5/igt@xe_module_load@load.html
   [226]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-3/igt@xe_module_load@load.html
   [227]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-3/igt@xe_module_load@load.html
   [228]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@xe_module_load@load.html
   [229]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@xe_module_load@load.html
   [230]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@xe_module_load@load.html
   [231]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@xe_module_load@load.html
   [232]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@xe_module_load@load.html
   [233]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_module_load@load.html
   [234]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@xe_module_load@load.html
   [235]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@xe_module_load@load.html
   [236]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_module_load@load.html
   [237]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@xe_module_load@load.html
   [238]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@xe_module_load@load.html
   [239]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_module_load@load.html
   [240]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@xe_module_load@load.html
   [241]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@xe_module_load@load.html
   [242]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@xe_module_load@load.html
   [243]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@xe_module_load@load.html
   [244]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@xe_module_load@load.html
   [245]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@xe_module_load@load.html
   [246]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-7/igt@xe_module_load@load.html
   [247]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@xe_module_load@load.html
   [248]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@xe_module_load@load.html
   [249]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@xe_module_load@load.html
   [250]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@xe_module_load@load.html
   [251]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@xe_module_load@load.html
   [252]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@xe_module_load@load.html
   [253]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-4/igt@xe_module_load@load.html
   [254]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-1/igt@xe_module_load@load.html
   [255]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-2/igt@xe_module_load@load.html

  * igt@xe_pmu@gt-frequency:
    - shard-dg2-set2:     [FAIL][256] ([Intel XE#4819]) -> [PASS][257] +1 other test pass
   [256]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-dg2-435/igt@xe_pmu@gt-frequency.html
   [257]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-dg2-466/igt@xe_pmu@gt-frequency.html
    - shard-lnl:          [FAIL][258] ([Intel XE#5166]) -> [PASS][259] +1 other test pass
   [258]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-7/igt@xe_pmu@gt-frequency.html
   [259]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-7/igt@xe_pmu@gt-frequency.html

  
#### Warnings ####

  * igt@kms_content_protection@lic-type-0:
    - shard-bmg:          [SKIP][260] ([Intel XE#2341]) -> [FAIL][261] ([Intel XE#1178])
   [260]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_content_protection@lic-type-0.html
   [261]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-3/igt@kms_content_protection@lic-type-0.html

  * igt@kms_content_protection@srm:
    - shard-bmg:          [FAIL][262] ([Intel XE#1178]) -> [SKIP][263] ([Intel XE#2341])
   [262]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-1/igt@kms_content_protection@srm.html
   [263]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_content_protection@srm.html

  * igt@kms_flip@bo-too-big-interruptible@a-edp1:
    - shard-lnl:          [TIMEOUT][264] ([Intel XE#1504] / [Intel XE#5737]) -> [TIMEOUT][265] ([Intel XE#1504]) +1 other test timeout
   [264]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-lnl-2/igt@kms_flip@bo-too-big-interruptible@a-edp1.html
   [265]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-lnl-5/igt@kms_flip@bo-too-big-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt:
    - shard-bmg:          [SKIP][266] ([Intel XE#2311]) -> [SKIP][267] ([Intel XE#2312]) +9 other tests skip
   [266]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-7/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt.html
   [267]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render:
    - shard-bmg:          [SKIP][268] ([Intel XE#5390]) -> [SKIP][269] ([Intel XE#2312]) +1 other test skip
   [268]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render.html
   [269]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
    - shard-bmg:          [SKIP][270] ([Intel XE#2312]) -> [SKIP][271] ([Intel XE#5390]) +3 other tests skip
   [270]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff.html
   [271]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-pri-indfb-draw-blt:
    - shard-bmg:          [SKIP][272] ([Intel XE#2312]) -> [SKIP][273] ([Intel XE#2311]) +6 other tests skip
   [272]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-pri-indfb-draw-blt.html
   [273]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-fullscreen:
    - shard-bmg:          [SKIP][274] ([Intel XE#2312]) -> [SKIP][275] ([Intel XE#2313]) +3 other tests skip
   [274]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-fullscreen.html
   [275]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-fullscreen.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt:
    - shard-bmg:          [SKIP][276] ([Intel XE#2313]) -> [SKIP][277] ([Intel XE#2312]) +11 other tests skip
   [276]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt.html
   [277]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#1122]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1122
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1126]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1126
  [Intel XE#1158]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1158
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1397]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1397
  [Intel XE#1401]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1401
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
  [Intel XE#1420]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1420
  [Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
  [Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
  [Intel XE#1435]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1435
  [Intel XE#1439]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1439
  [Intel XE#1466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1466
  [Intel XE#1470]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1470
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
  [Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
  [Intel XE#1504]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1504
  [Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
  [Intel XE#1745]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1745
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
  [Intel XE#2229]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2229
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2314
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2350]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2350
  [Intel XE#2360]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2360
  [Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
  [Intel XE#2457]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2457
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#2669]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2669
  [Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
  [Intel XE#2838]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2838
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#2853]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2853
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
  [Intel XE#2894]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2894
  [Intel XE#2907]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2907
  [Intel XE#2927]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2927
  [Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
  [Intel XE#3009]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3009
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#308]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/308
  [Intel XE#310]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/310
  [Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
  [Intel XE#3124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3124
  [Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
  [Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
  [Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
  [Intel XE#3432]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3432
  [Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
  [Intel XE#366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/366
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#3876]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3876
  [Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
  [Intel XE#3908]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3908
  [Intel XE#3970]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3970
  [Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
  [Intel XE#4294]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4294
  [Intel XE#4416]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4416
  [Intel XE#4422]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4422
  [Intel XE#4459]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4459
  [Intel XE#4504]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4504
  [Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
  [Intel XE#4665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4665
  [Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
  [Intel XE#4819]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4819
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#4912]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4912
  [Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
  [Intel XE#4943]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4943
  [Intel XE#5166]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5166
  [Intel XE#5208]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5208
  [Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
  [Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
  [Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
  [Intel XE#5565]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5565
  [Intel XE#5573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5573
  [Intel XE#5575]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5575
  [Intel XE#5626]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5626
  [Intel XE#5729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5729
  [Intel XE#5737]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5737
  [Intel XE#5784]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5784
  [Intel XE#5899]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5899
  [Intel XE#610]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/610
  [Intel XE#616]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/616
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#718]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/718
  [Intel XE#776]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/776
  [Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
  [Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
  [Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
  [Intel XE#979]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/979


Build changes
-------------

  * IGT: IGT_8503 -> IGT_8504
  * Linux: xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f -> xe-pw-152882v2

  IGT_8503: 8503
  IGT_8504: 8504
  xe-3597-cca87ca63e2f5b8a785dc59c23e526987530b27f: cca87ca63e2f5b8a785dc59c23e526987530b27f
  xe-pw-152882v2: 152882v2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152882v2/index.html



* Re: [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
@ 2025-08-23  9:32   ` Simon Richter
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Richter @ 2025-08-23  9:32 UTC (permalink / raw)
  To: intel-xe



Hi,

On 8/22/25 18:40, Thomas Hellström wrote:

> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
> +			    err) {
> +		if (vm) {
> +			err = xe_vm_drm_exec_lock(vm, &exec);
> +			drm_exec_retry_on_contention(&exec);
> +			if (err)
> +				break;
> +		}
> +		bo = xe_bo_create_user(xe, vm, args->size, args->cpu_caching,
> +				       bo_flags, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			break;
> +		}

I can't try it again because the patch series doesn't apply to the
current drm-tip, but when I was running Piglit against the previous
version, I ended up on the failing path of the xe_validation_guard,
with 64 threads creating a context, submitting a small workload, and
dismantling the context again.

Am I supposed to end up in there?

    Simon



* Re: [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec
  2025-08-22  9:40 ` [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
@ 2025-08-26 20:42   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 20:42 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:19AM +0200, Thomas Hellström wrote:
> Introduce a validation wrapper xe_validation_guard() as a helper
> intended to be used around drm_exec transactions that perform
> validations. Once TTM can handle exhaustive eviction we could
> remove this wrapper or make it mostly a NO-OP unless other
> functionality is added to it.
> 
> Currently the wrapper takes a read lock upon entry and if the
> transaction hits an OOM, all locks are released and the
> transaction is retried with a write-lock. If all other
> validations participate in this scheme, the transaction with
> the write lock will be the only transaction validating and
> should have access to all available non-pinned memory.
> 
> There is currently a problem in that TTM converts -EDEADLK to
> -ENOMEM, and with ww_mutex slowpath error injection, we can hit
> -ENOMEM without actually having run out of memory. We abuse
> ww_mutex internals to detect such situations until TTM is fixed
> to not convert the error code. In the meantime, injecting
> ww_mutex slowpath -EDEADLKs is a good way to test
> the implementation in the absence of real OOMs.
> 
> Just introduce the wrapper in this commit. It will be hooked up
> to the driver in following commits.
> 
> v2:
> - Mark class_xe_validation conditional so that the loop is
>   skipped on initialization error.
> - Argument sanitation (Matt Brost)
> - Fix conditional execution of xe_validation_ctx_fini()
>   (Matt Brost)
> - Add a no_block mode for upcoming use in the CPU fault handler.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
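
(As an aside for readers of the archive: the caller-side pattern this
wrapper enables, assembled from hunks quoted elsewhere in this thread.
A minimal sketch only, assuming "xe" and a previously looked-up "bo"
are in scope; it is not itself a hunk from the series.)

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int err = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    (struct xe_val_flags) {.interruptible = true}, err) {
		/* Lock what the transaction needs; roll back on ww contention. */
		err = drm_exec_lock_obj(&exec, &bo->ttm.base);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;

		/*
		 * A validation -ENOMEM restarts the loop once, now holding
		 * the validation domain's rw-semaphore in write mode.
		 */
		err = xe_bo_validate(bo, NULL, true, &exec);
		drm_exec_retry_on_contention(&exec);
		xe_validation_retry_on_oom(&ctx, &err);
	}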

> ---
>  drivers/gpu/drm/xe/xe_validation.c | 228 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_validation.h | 122 +++++++++++++++
>  2 files changed, 350 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> index cc0684d24e02..b90fda3dd5f4 100644
> --- a/drivers/gpu/drm/xe/xe_validation.c
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -5,6 +5,7 @@
>  #include "xe_bo.h"
>  #include <drm/drm_exec.h>
>  #include <drm/drm_gem.h>
> +#include <drm/drm_gpuvm.h>
>  
>  #include "xe_assert.h"
>  #include "xe_validation.h"
> @@ -47,3 +48,230 @@ void xe_validation_assert_exec(const struct xe_device *xe,
>  	}
>  }
>  #endif
> +
> +static int xe_validation_lock(struct xe_validation_ctx *ctx)
> +{
> +	struct xe_validation_device *val = ctx->val;
> +	int ret = 0;
> +
> +	if (ctx->val_flags.interruptible) {
> +		if (ctx->request_exclusive)
> +			ret = down_write_killable(&val->lock);
> +		else
> +			ret = down_read_interruptible(&val->lock);
> +	} else {
> +		if (ctx->request_exclusive)
> +			down_write(&val->lock);
> +		else
> +			down_read(&val->lock);
> +	}
> +
> +	if (!ret) {
> +		ctx->lock_held = true;
> +		ctx->lock_held_exclusive = ctx->request_exclusive;
> +	}
> +
> +	return ret;
> +}
> +
> +static int xe_validation_trylock(struct xe_validation_ctx *ctx)
> +{
> +	struct xe_validation_device *val = ctx->val;
> +	bool locked;
> +
> +	if (ctx->request_exclusive)
> +		locked = down_write_trylock(&val->lock);
> +	else
> +		locked = down_read_trylock(&val->lock);
> +
> +	if (locked) {
> +		ctx->lock_held = true;
> +		ctx->lock_held_exclusive = ctx->request_exclusive;
> +	}
> +
> +	return locked ? 0 : -EWOULDBLOCK;
> +}
> +
> +static void xe_validation_unlock(struct xe_validation_ctx *ctx)
> +{
> +	if (!ctx->lock_held)
> +		return;
> +
> +	if (ctx->lock_held_exclusive)
> +		up_write(&ctx->val->lock);
> +	else
> +		up_read(&ctx->val->lock);
> +
> +	ctx->lock_held = false;
> +}
> +
> +/**
> + * xe_validation_ctx_init() - Initialize an xe_validation_ctx
> + * @ctx: The xe_validation_ctx to initialize.
> + * @val: The xe_validation_device representing the validation domain.
> + * @exec: The struct drm_exec to use for the transaction. May be NULL.
> + * @flags: The flags to use for initialization.
> + *
> + * Initialize and lock an xe_validation transaction using the validation domain
> + * represented by @val. Also initialize the drm_exec object, forwarding parts of
> + * @flags to the drm_exec initialization. The @flags.exclusive flag should
> + * typically be set to false to avoid locking out other validators from the
> + * domain until an OOM is hit. For testing or final-attempt purposes it can,
> + * however, be set to true.
> + *
> + * Return: %0 on success, %-EINTR if interruptible initial locking failed with a
> + * signal pending. If @flags.no_block is set to true, a failed trylock
> + * returns %-EWOULDBLOCK.
> + */
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, const struct xe_val_flags flags)
> +{
> +	int ret;
> +
> +	ctx->exec = exec;
> +	ctx->val = val;
> +	ctx->lock_held = false;
> +	ctx->lock_held_exclusive = false;
> +	ctx->request_exclusive = flags.exclusive;
> +	ctx->val_flags = flags;
> +	ctx->exec_flags = 0;
> +	ctx->nr = 0;
> +
> +	if (flags.no_block)
> +		ret = xe_validation_trylock(ctx);
> +	else
> +		ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	if (exec) {
> +		if (flags.interruptible)
> +			ctx->exec_flags |= DRM_EXEC_INTERRUPTIBLE_WAIT;
> +		if (flags.exec_ignore_duplicates)
> +			ctx->exec_flags |= DRM_EXEC_IGNORE_DUPLICATES;
> +		drm_exec_init(exec, ctx->exec_flags, ctx->nr);
> +	}
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> +/*
> + * This abuses both drm_exec and ww_mutex internals and should be
> + * replaced by checking for -EDEADLK when we can make TTM
> + * stop converting -EDEADLK to -ENOMEM.
> + * An alternative is to not have exhaustive eviction with
> + * CONFIG_DEBUG_WW_MUTEX_SLOWPATH until that happens.
> + */
> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return !!exec->ticket.contending_lock;
> +}
> +
> +#else
> +
> +static bool xe_validation_contention_injected(struct drm_exec *exec)
> +{
> +	return false;
> +}
> +
> +#endif
> +
> +static bool __xe_validation_should_retry(struct xe_validation_ctx *ctx, int ret)
> +{
> +	if (ret == -ENOMEM &&
> +	    ((ctx->request_exclusive &&
> +	      xe_validation_contention_injected(ctx->exec)) ||
> +	     !ctx->request_exclusive)) {
> +		ctx->request_exclusive = true;
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * xe_validation_exec_lock() - Perform drm_gpuvm_exec_lock within a validation
> + * transaction.
> + * @ctx: An uninitialized xe_validation_ctx.
> + * @vm_exec: An initialized struct vm_exec.
> + * @val: The validation domain.
> + *
> + * The drm_gpuvm_exec_lock() function internally initializes its drm_exec
> + * transaction and therefore doesn't lend itself very well to be using
> + * xe_validation_ctx_init(). Provide a helper that takes an uninitialized
> + * xe_validation_ctx and calls drm_gpuvm_exec_lock() with OOM retry.
> + *
> + * Return: %0 on success, negative error code on failure.
> + */
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
> +			    struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val)
> +{
> +	int ret;
> +
> +	memset(ctx, 0, sizeof(*ctx));
> +	ctx->exec = &vm_exec->exec;
> +	ctx->exec_flags = vm_exec->flags;
> +	ctx->val = val;
> +	if (ctx->exec_flags & DRM_EXEC_INTERRUPTIBLE_WAIT)
> +		ctx->val_flags.interruptible = 1;
> +	if (ctx->exec_flags & DRM_EXEC_IGNORE_DUPLICATES)
> +		ctx->val_flags.exec_ignore_duplicates = 1;
> +retry:
> +	ret = xe_validation_lock(ctx);
> +	if (ret)
> +		return ret;
> +
> +	ret = drm_gpuvm_exec_lock(vm_exec);
> +	if (ret) {
> +		xe_validation_unlock(ctx);
> +		if (__xe_validation_should_retry(ctx, ret))
> +			goto retry;
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * xe_validation_ctx_fini() - Finalize a validation transaction
> + * @ctx: The Validation transaction to finalize.
> + *
> + * Finalize a validation transaction and its related drm_exec transaction.
> + */
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> +{
> +	drm_exec_fini(ctx->exec);
> +	xe_validation_unlock(ctx);
> +}
> +
> +/**
> + * xe_validation_should_retry() - Determine if a validation transaction should retry
> + * @ctx: The validation transaction.
> + * @ret: Pointer to a return value variable.
> + *
> + * Determines whether a validation transaction should retry based on the
> + * internal transaction state and the return value pointed to by @ret.
> + * If a validation should be retried, the transaction is prepared for that,
> + * and the validation lock might be re-acquired in exclusive mode, and *@ret
> + * is set to %0. If the re-locking fails, typically due to interruptible
> + * locking with a signal pending, *@ret is instead set to %-EINTR and the
> + * function returns %false.
> + *
> + * Return: %true if validation should be retried, %false otherwise.
> + */
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret)
> +{
> +	if (__xe_validation_should_retry(ctx, *ret)) {
> +		drm_exec_fini(ctx->exec);
> +		*ret = 0;
> +		if (ctx->request_exclusive != ctx->lock_held_exclusive) {
> +			xe_validation_unlock(ctx);
> +			*ret = xe_validation_lock(ctx);
> +		}
> +		drm_exec_init(ctx->exec, ctx->exec_flags, ctx->nr);
> +		return !*ret;
> +	}
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> index db50feacad7a..36860974165e 100644
> --- a/drivers/gpu/drm/xe/xe_validation.h
> +++ b/drivers/gpu/drm/xe/xe_validation.h
> @@ -7,9 +7,11 @@
>  
>  #include <linux/dma-resv.h>
>  #include <linux/types.h>
> +#include <linux/rwsem.h>
>  
>  struct drm_exec;
>  struct drm_gem_object;
> +struct drm_gpuvm_exec;
>  struct xe_device;
>  
>  #ifdef CONFIG_PROVE_LOCKING
> @@ -66,4 +68,124 @@ void xe_validation_assert_exec(const struct xe_device *xe, const struct drm_exec
>  	} while (0)
>  #endif
>  
> +/**
> + * struct xe_validation_device - The domain for exhaustive eviction
> + * @lock: The lock used to exclude other processes from allocating graphics memory
> + *
> + * The struct xe_validation_device represents the domain for which we want to use
> + * exhaustive eviction. The @lock is typically grabbed in read mode for allocations
> + * but when graphics memory allocation fails, it is retried with the write mode held.
> + */
> +struct xe_validation_device {
> +	struct rw_semaphore lock;
> +};
> +
> +/**
> + * struct xe_val_flags - Flags for xe_validation_ctx_init().
> + * @exclusive: Start the validation transaction by locking out all other validators.
> + * @no_block:  Don't block on initialization.
> + * @interruptible: Block interruptible if blocking. Implies initializing the drm_exec
> + * context with the DRM_EXEC_INTERRUPTIBLE_WAIT flag.
> + * @exec_ignore_duplicates: Initialize the drm_exec context with the
> + * DRM_EXEC_IGNORE_DUPLICATES flag.
> + */
> +struct xe_val_flags {
> +	u32 exclusive :1;
> +	u32 no_block :1;
> +	u32 interruptible :1;
> +	u32 exec_ignore_duplicates :1;
> +};
> +
> +/**
> + * struct xe_validation_ctx - A struct drm_exec subclass with support for
> + * exhaustive eviction
> + * @exec: The drm_exec object base class. Note that we use a pointer instead of
> + * embedding to avoid diamond inheritance.
> + * @val: The exhaustive eviction domain.
> + * @lock_held: Whether the domain lock is currently held.
> + * @lock_held_exclusive: Whether the domain lock is held in exclusive mode.
> + * @request_exclusive: Whether to lock exclusively (write mode) the next time
> + * the domain lock is locked.
> + * @exec_flags: The drm_exec flags used for drm_exec (re-)initialization.
> + * @nr: The drm_exec nr parameter used for drm_exec (re-)initialization.
> + */
> +struct xe_validation_ctx {
> +	struct drm_exec *exec;
> +	struct xe_validation_device *val;
> +	struct xe_val_flags val_flags;
> +	bool lock_held;
> +	bool lock_held_exclusive;
> +	bool request_exclusive;
> +	u32 exec_flags;
> +	unsigned int nr;
> +};
> +
> +int xe_validation_ctx_init(struct xe_validation_ctx *ctx, struct xe_validation_device *val,
> +			   struct drm_exec *exec, const struct xe_val_flags flags);
> +
> +int xe_validation_exec_lock(struct xe_validation_ctx *ctx, struct drm_gpuvm_exec *vm_exec,
> +			    struct xe_validation_device *val);
> +
> +void xe_validation_ctx_fini(struct xe_validation_ctx *ctx);
> +
> +bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> +
> +/**
> + * xe_validation_retry_on_oom() - Retry on OOM in an xe_validation transaction
> + * @_ctx: Pointer to the xe_validation_ctx
> + * @_ret: The current error value possibly holding -ENOMEM
> + *
> + * Use this in a way similar to drm_exec_retry_on_contention().
> + * If @_ret contains -ENOMEM the transaction is restarted once in a way that
> + * blocks other transactions and allows exhaustive eviction. If the transaction
> + * was already restarted once, just return the -ENOMEM. May also set
> + * @_ret to -EINTR if not retrying and waits are interruptible.
> + * May only be used within a drm_exec_until_all_locked() loop.
> + */
> +#define xe_validation_retry_on_oom(_ctx, _ret)				\
> +	do {								\
> +		if (xe_validation_should_retry(_ctx, _ret))		\
> +			goto *__drm_exec_retry_ptr;			\
> +	} while (0)
> +
> +/**
> + * xe_validation_device_init - Initialize a struct xe_validation_device
> + * @val: The xe_validation_device to init.
> + */
> +static inline void
> +xe_validation_device_init(struct xe_validation_device *val)
> +{
> +	init_rwsem(&val->lock);
> +}
> +
> +/*
> + * Make guard() and scoped_guard() work with xe_validation_ctx
> + * so that we can exit transactions without caring about the
> + * cleanup.
> + */
> +DEFINE_CLASS(xe_validation, struct xe_validation_ctx *,
> +	     if (_T) xe_validation_ctx_fini(_T);,
> +	     ({_ret = xe_validation_ctx_init(_ctx, _val, _exec, _flags);
> +	       _ret ? NULL : _ctx; }),
> +	     struct xe_validation_ctx *_ctx, struct xe_validation_device *_val,
> +	     struct drm_exec *_exec, const struct xe_val_flags _flags, int _ret);
> +static inline void *class_xe_validation_lock_ptr(class_xe_validation_t *_T)
> +{return *_T; }
> +#define class_xe_validation_is_conditional true
> +
> +/**
> + * xe_validation_guard() - An auto-cleanup xe_validation_ctx transaction
> + * @_ctx: The xe_validation_ctx.
> + * @_val: The xe_validation_device.
> + * @_exec: The struct drm_exec object
> + * @_flags: Flags for the xe_validation_ctx initialization.
> + * @_ret: Return in / out parameter. May be set by this macro. Typically %0 when called.
> + *
> + * This macro initiates a drm_exec transaction with additional support for
> + * exhaustive eviction.
> + */
> +#define xe_validation_guard(_ctx, _val, _exec, _flags, _ret)		\
> +	scoped_guard(xe_validation, _ctx, _val, _exec, _flags, _ret) \
> +	drm_exec_until_all_locked(_exec)
> +
>  #endif
> -- 
> 2.50.1
> 
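
(Reading aid, not part of the series: the retry flow the above implements,
condensed from the quoted xe_validation_lock() and
xe_validation_should_retry() code.)

	/*
	 * 1. xe_validation_ctx_init(): down_read(&val->lock), drm_exec_init().
	 * 2. The drm_exec_until_all_locked() body runs; validation hits -ENOMEM.
	 * 3. xe_validation_retry_on_oom() -> xe_validation_should_retry():
	 *    request_exclusive was false, so the drm_exec locks are dropped,
	 *    the rwsem is re-taken with down_write(), drm_exec is re-inited
	 *    and the body runs again as the only allocating client.
	 * 4. A second -ENOMEM with the write lock held is returned to the
	 *    caller, unless it stems from injected ww_mutex contention
	 *    (CONFIG_DEBUG_WW_MUTEX_SLOWPATH), in which case it retries again.
	 */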


* Re: [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
@ 2025-08-26 21:16   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 21:16 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:25AM +0200, Thomas Hellström wrote:
> Convert dma-buf migration to XE_PL_TT and dma-buf import to
> support exhaustive eviction, using xe_validation_guard().
> It seems unlikely that the import would result in an -ENOMEM,
> but convert import anyway for completeness.
> 
> The dma-buf map_attachment() functionality unfortunately doesn't
> support passing a drm_exec, which means that foreign devices
> validating a dma-buf that we exported will not, unless they are
> xeKMD devices, participate in the exhaustive eviction scheme.
> 
> v2:
> - Avoid gotos from within xe_validation_guard(). (Matt Brost)
> - Adapt to signature change of xe_validation_guard(). (Matt Brost)
> - Remove an unneeded (void)ret. (Matt Brost)
> - Fix up an error path.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_dma_buf.c | 61 ++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 78a827d4e726..3f96101a06f3 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -163,16 +163,26 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
>  	bool reads =  (direction == DMA_BIDIRECTIONAL ||
>  		       direction == DMA_FROM_DEVICE);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	int ret = 0;
>  
>  	if (!reads)
>  		return 0;
>  
>  	/* Can we do interruptible lock here? */
> -	xe_bo_lock(bo, false);
> -	(void)xe_bo_migrate(bo, XE_PL_TT, exec);
> -	xe_bo_unlock(bo);
> +	xe_validation_guard(&ctx, &xe_bo_device(bo)->val, &exec, (struct xe_val_flags) {}, ret) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			break;
> +
> +		ret = xe_bo_migrate(bo, XE_PL_TT, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &ret);
> +	}
>  
> +	/* If we failed, cpu-access takes place in current placement. */
>  	return 0;
>  }
>  
> @@ -211,25 +221,36 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  {
>  	struct dma_resv *resv = dma_buf->resv;
>  	struct xe_device *xe = to_xe_device(dev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_gem_object *dummy_obj;
> +	struct drm_exec exec;
>  	struct xe_bo *bo;
> -	int ret;
> -
> -	dma_resv_lock(resv, NULL);
> -	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> -				    0, /* Will require 1way or 2way for vm_bind */
> -				    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, exec);
> -	if (IS_ERR(bo)) {
> -		ret = PTR_ERR(bo);
> -		goto error;
> +	int ret = 0;
> +
> +	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
> +	if (!dummy_obj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dummy_obj->resv = resv;
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
> +		ret = drm_exec_lock_obj(&exec, dummy_obj);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			break;
> +
> +		bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> +					    0, /* Will require 1way or 2way for vm_bind */
> +					    ttm_bo_type_sg, XE_BO_FLAG_SYSTEM, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			break;
> +		}
>  	}
> -	dma_resv_unlock(resv);
> -
> -	return &bo->ttm.base;
> +	drm_gem_object_put(dummy_obj);
>  
> -error:
> -	dma_resv_unlock(resv);
> -	return ERR_PTR(ret);
> +	return ret ? ERR_PTR(ret) : &bo->ttm.base;
>  }
>  
>  static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
> -- 
> 2.50.1
> 
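
(An observation on the import hunk above, not part of the series: drm_exec
can only lock a &drm_gem_object, while an imported dma-buf only hands us a
bare &dma_resv. Hence the throwaway object; roughly:)

	/* Illustration of the dummy-object trick from the quoted hunk. */
	dummy_obj = drm_gpuvm_resv_object_alloc(&xe->drm);
	if (!dummy_obj)
		return ERR_PTR(-ENOMEM);
	dummy_obj->resv = dma_buf->resv;	/* alias the exporter's resv */
	ret = drm_exec_lock_obj(&exec, dummy_obj);	/* now lockable via drm_exec */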


* Re: [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
@ 2025-08-26 21:27   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 21:27 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:27AM +0200, Thomas Hellström wrote:
> Most users of xe_bo_create_pin_map_at() and
> xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
> and that simplifies conversion. Introduce an
> xe_bo_create_pin_map_at_novm() function and make the _aligned()
> version static. Use xe_validation_guard() for conversion.
> 
> v2:
> - Adapt to signature change of xe_validation_guard(). (Matt Brost)
> - Fix up documentation.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  .../compat-i915-headers/gem/i915_gem_stolen.h | 24 ++-----
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        | 42 +++++------
>  drivers/gpu/drm/xe/display/xe_plane_initial.c |  4 +-
>  drivers/gpu/drm/xe/xe_bo.c                    | 72 ++++++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h                    | 13 ++--
>  drivers/gpu/drm/xe/xe_eu_stall.c              |  5 +-
>  6 files changed, 89 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> index 1ce1e9da975b..51afdf2ee98b 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_stolen.h
> @@ -21,9 +21,7 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  						       u32 size, u32 align,
>  						       u32 start, u32 end)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	struct xe_bo *bo;
> -	int err;
>  	u32 flags = XE_BO_FLAG_PINNED | XE_BO_FLAG_STOLEN;
>  
>  	if (start < SZ_4K)
> @@ -34,25 +32,15 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe,
>  		start = ALIGN(start, align);
>  	}
>  
> -	bo = xe_bo_create_locked_range(xe, xe_device_get_root_tile(xe),
> -				       NULL, size, start, end,
> -				       ttm_bo_type_kernel, flags, 0, exec);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		bo = NULL;
> -		return err;
> -	}
> -	err = xe_bo_pin(bo, exec);
> -	xe_bo_unlock_vm_held(bo);
> -
> -	if (err) {
> -		xe_bo_put(fb->bo);
> -		bo = NULL;
> -	}
> +	bo = xe_bo_create_pin_map_at_novm(xe, xe_device_get_root_tile(xe),
> +					  size, start, ttm_bo_type_kernel, flags,
> +					  0, true);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
>  
>  	fb->bo = bo;
>  
> -	return err;
> +	return 0;
>  }
>  
>  static inline int i915_gem_stolen_insert_node(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index fe0000b211d9..e73994dd4126 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -102,29 +102,29 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  				 XE_PAGE_SIZE);
>  
>  	if (IS_DGFX(xe))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size, ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_VRAM0 |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size, ~0ull,
> +						   ttm_bo_type_kernel,
> +						   XE_BO_FLAG_VRAM0 |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	else
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_STOLEN |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   XE_BO_FLAG_STOLEN |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
> -		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
> -						      dpt_size,  ~0ull,
> -						      ttm_bo_type_kernel,
> -						      XE_BO_FLAG_SYSTEM |
> -						      XE_BO_FLAG_GGTT |
> -						      XE_BO_FLAG_PAGETABLE,
> -						      alignment);
> +		dpt = xe_bo_create_pin_map_at_novm(xe, tile0,
> +						   dpt_size,  ~0ull,
> +						   ttm_bo_type_kernel,
> +						   XE_BO_FLAG_SYSTEM |
> +						   XE_BO_FLAG_GGTT |
> +						   XE_BO_FLAG_PAGETABLE,
> +						   alignment, false);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> index 826ac3d578b7..94f00def811b 100644
> --- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
> +++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
> @@ -140,8 +140,8 @@ initial_plane_bo(struct xe_device *xe,
>  			page_size);
>  	size -= base;
>  
> -	bo = xe_bo_create_pin_map_at(xe, tile0, NULL, size, phys_base,
> -				     ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_at_novm(xe, tile0, size, phys_base,
> +					  ttm_bo_type_kernel, flags, 0, false);
>  	if (IS_ERR(bo)) {
>  		drm_dbg(&xe->drm,
>  			"Failed to create bo phys_base=%pa size %u with flags %x: %li\n",
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index a3b7288f6b3d..d5172cb05078 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2379,27 +2379,17 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe,
>  	return bo;
>  }
>  
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm,
> -				      size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags)
> -{
> -	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, offset,
> -					       type, flags, 0);
> -}
> -
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment)
> +static struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> +						     struct xe_tile *tile,
> +						     struct xe_vm *vm,
> +						     size_t size, u64 offset,
> +						     enum ttm_bo_type type, u32 flags,
> +						     u64 alignment, struct drm_exec *exec)
>  {
>  	struct xe_bo *bo;
>  	int err;
>  	u64 start = offset == ~0ull ? 0 : offset;
> -	u64 end = offset == ~0ull ? offset : start + size;
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +	u64 end = offset == ~0ull ? ~0ull : start + size;
>  
>  	if (flags & XE_BO_FLAG_STOLEN &&
>  	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
> @@ -2431,11 +2421,57 @@ struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_at_novm() - Create pinned and mapped bo at optional VRAM offset
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @offset: Optional VRAM offset or %~0ull for don't care.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @alignment: GGTT alignment.
> + * @intr: Whether to execute any waits for backing store interruptible.
> + *
> + * Create a pinned and optionally mapped bo with VRAM offset and GGTT alignment
> + * options. The bo will be external and not associated with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type, u32 flags,
> +			     u64 alignment, bool intr)
> +{
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *bo;
> +	int ret = 0;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = intr},
> +			    ret) {
> +		bo = xe_bo_create_pin_map_at_aligned(xe, tile, NULL, size, offset,
> +						     type, flags, alignment, &exec);
> +		if (IS_ERR(bo)) {
> +			drm_exec_retry_on_contention(&exec);
> +			ret = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +		}
> +	}
> +
> +	return ret ? ERR_PTR(ret) : bo;
> +}
> +
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags)
>  {
> -	return xe_bo_create_pin_map_at(xe, tile, vm, size, ~0ull, type, flags);
> +	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> +
> +	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
> +					       0, exec);
>  }
>  
>  static void __xe_bo_unpin_map_no_vm(void *arg)
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index a625806deeb6..decd601c802d 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -109,15 +109,10 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
>  				   enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_tile *tile,
> -				      struct xe_vm *vm, size_t size, u64 offset,
> -				      enum ttm_bo_type type, u32 flags);
> -struct xe_bo *xe_bo_create_pin_map_at_aligned(struct xe_device *xe,
> -					      struct xe_tile *tile,
> -					      struct xe_vm *vm,
> -					      size_t size, u64 offset,
> -					      enum ttm_bo_type type, u32 flags,
> -					      u64 alignment);
> +struct xe_bo *
> +xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
> +			     size_t size, u64 offset, enum ttm_bo_type type,
> +			     u32 flags, u64 alignment, bool intr);
>  struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  					   size_t size, u32 flags);
>  struct xe_bo *xe_managed_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index fdd514fec5ef..f5cfdf29fde3 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -617,9 +617,8 @@ static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream,
>  
>  	size = stream->per_xecore_buf_size * last_xecore;
>  
> -	bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL,
> -					     size, ~0ull, ttm_bo_type_kernel,
> -					     XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64);
> +	bo = xe_bo_create_pin_map_at_novm(tile->xe, tile, size, ~0ull, ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64, false);
>  	if (IS_ERR(bo)) {
>  		kfree(stream->xecore_buf);
>  		return PTR_ERR(bo);
> -- 
> 2.50.1
> 


* Re: [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma()
  2025-08-22  9:40 ` [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
@ 2025-08-26 21:29   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 21:29 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:24AM +0200, Thomas Hellström wrote:
> Convert __xe_pin_fb_vma() for exhaustive eviction
> using xe_validation_guard().
> 
> v2:
> - Avoid gotos from within xe_validation_guard(). (Matt Brost)
> - Adapt to signature change of xe_validation_guard(). (Matt Brost)
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/display/xe_fb_pin.c | 29 +++++++++++++++-----------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 4b0748e6fdd6..fe0000b211d9 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -281,7 +281,8 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
>  	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
>  	struct xe_bo *bo = gem_to_xe_bo(obj);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	int ret;
>  
>  	if (!vma)
> @@ -309,17 +310,21 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
>  	 * Pin the framebuffer, we can't use xe_bo_(un)pin functions as the
>  	 * assumptions are incorrect for framebuffers
>  	 */
> -	ret = ttm_bo_reserve(&bo->ttm, false, false, NULL);
> -	if (ret)
> -		goto err;
> -
> -	if (IS_DGFX(xe))
> -		ret = xe_bo_migrate(bo, XE_PL_VRAM0, exec);
> -	else
> -		ret = xe_bo_validate(bo, NULL, true, exec);
> -	if (!ret)
> -		ttm_bo_pin(&bo->ttm);
> -	ttm_bo_unreserve(&bo->ttm);
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, ret) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (ret)
> +			break;
> +
> +		if (IS_DGFX(xe))
> +			ret = xe_bo_migrate(bo, XE_PL_VRAM0, &exec);
> +		else
> +			ret = xe_bo_validate(bo, NULL, true, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &ret);
> +		if (!ret)
> +			ttm_bo_pin(&bo->ttm);
> +	}
>  	if (ret)
>  		goto err;
>  
> -- 
> 2.50.1
> 


* Re: [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
@ 2025-08-26 21:52   ` Matthew Brost
  2025-09-02 13:32     ` Thomas Hellström
  0 siblings, 1 reply; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 21:52 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:28AM +0200, Thomas Hellström wrote:
> Introduce an xe_bo_create_pin_map_novm() function that does not
> take the drm_exec parameter, to simplify the conversion of many
> callsites.
> For the rest, ensure that the same drm_exec context that was used
> for locking the vm is passed down to validation.
> 
> Use xe_validation_guard() where appropriate.
> 
> v2:
> - Avoid gotos from within xe_validation_guard(). (Matt Brost)
> - Break out the change to pf_provision_vf_lmem8 to a separate
>   patch.
> - Adapt to signature change of xe_validation_guard().
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
>  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
>  drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
>  drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
>  drivers/gpu/drm/xe/xe_bo.c                    |  52 +++++++-
>  drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
>  drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
>  drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
>  drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
>  drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
>  drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
>  drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
>  drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
>  drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
>  drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
>  drivers/gpu/drm/xe/xe_vm.c                    | 121 +++++++++++-------
>  17 files changed, 231 insertions(+), 130 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> index d96ba2b51065..8ea9a472113c 100644
> --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> @@ -42,11 +42,11 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
>  	obj = ERR_PTR(-ENODEV);
>  
>  	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
> -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> -					   NULL, size,
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> -					   XE_BO_FLAG_STOLEN |
> -					   XE_BO_FLAG_GGTT);
> +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> +						size,
> +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						XE_BO_FLAG_STOLEN |
> +						XE_BO_FLAG_GGTT, false);

This was interruptible before, same for a few other display conversions.

I'm not familiar enough with display to know if this is ok.

>  		if (!IS_ERR(obj))
>  			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
>  		else
> @@ -54,10 +54,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
>  	}
>  
>  	if (IS_ERR(obj)) {
> -		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> -					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -					   XE_BO_FLAG_GGTT);
> +		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
> +						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> +						XE_BO_FLAG_GGTT, false);
>  	}
>  
>  	if (IS_ERR(obj)) {
> diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> index 9f941fc2e36b..58581d7aaae6 100644
> --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> @@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *d
>  		return false;
>  
>  	/* Set scanout flag for WC mapping */
> -	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> -				   NULL, PAGE_ALIGN(size),
> -				   ttm_bo_type_kernel,
> -				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -				   XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT);
> +	obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
> +					PAGE_ALIGN(size),
> +					ttm_bo_type_kernel,
> +					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> +					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
>  	if (IS_ERR(obj)) {
>  		kfree(vma);
>  		return false;
> diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> index 30f1073141fc..4ae847b628e2 100644
> --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> @@ -72,10 +72,10 @@ static int intel_hdcp_gsc_initialize_message(struct xe_device *xe,
>  	int ret = 0;
>  
>  	/* allocate object of two page for HDCP command memory and store it */
> -	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, PAGE_SIZE * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), PAGE_SIZE * 2,
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT, false);
>  
>  	if (IS_ERR(bo)) {
>  		drm_err(&xe->drm, "Failed to allocate bo for HDCP streaming command!\n");
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index afa794e56065..5904d658d1f2 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  
>  	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
>  				   ttm_bo_type_kernel,
> -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				   exec);
>  	if (IS_ERR(big)) {
>  		KUNIT_FAIL(test, "Failed to allocate bo: %li\n", PTR_ERR(big));
>  		goto vunmap;
> @@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  
>  	pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
>  				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				  exec);
>  	if (IS_ERR(pt)) {
>  		KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
>  			   PTR_ERR(pt));
> @@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test,
>  	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
>  				    2 * SZ_4K,
>  				    ttm_bo_type_kernel,
> -				    XE_BO_FLAG_VRAM_IF_DGFX(tile));
> +				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
> +				    exec);
>  	if (IS_ERR(tiny)) {
>  		KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n",
>  			   PTR_ERR(tiny));
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index d5172cb05078..7a62629c88e0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2464,16 +2464,59 @@ xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
>  	return ret ? ERR_PTR(ret) : bo;
>  }
>  
> +/**
> + * xe_bo_create_pin_map() - Create pinned and mapped bo
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @vm: The vm to associate the buffer object with. The vm's resv must be locked
> + * with the transaction represented by @exec.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @exec: The drm_exec transaction to use for exhaustive eviction, and
> + * previously used for locking @vm's resv.
> + *
> + * Create a pinned and mapped bo, associated with @vm if @vm is non-NULL.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @exec was
> + * configured for interruptible locking.
> + */
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
> -				   enum ttm_bo_type type, u32 flags)
> +				   enum ttm_bo_type type, u32 flags,
> +				   struct drm_exec *exec)
>  {
> -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) : XE_VALIDATION_UNIMPLEMENTED;
> -
>  	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size, ~0ull, type, flags,
>  					       0, exec);
>  }
>  
> +/**
> + * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
> + * @xe: The xe device.
> + * @tile: The tile to select for migration of this bo, and the tile used for
> + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel bos.
> + * @size: The storage size to use for the bo.
> + * @type: The TTM buffer object type.
> + * @flags: XE_BO_FLAG_ flags.
> + * @intr: Whether to execute any waits for backing store interruptible.
> + *
> + * Create a pinned and mapped bo. The bo will be external and not associated
> + * with a VM.
> + *
> + * Return: The buffer object on success. Negative error pointer on failure.
> + * In particular, the function may return ERR_PTR(%-EINTR) if @intr was set
> + * to true on entry.
> + */
> +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> +					size_t size, enum ttm_bo_type type, u32 flags,
> +					bool intr)
> +{
> +	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull, type, flags, 0, intr);
> +}
> +
>  static void __xe_bo_unpin_map_no_vm(void *arg)
>  {
>  	xe_bo_unpin_map_no_vm(arg);
> @@ -2486,8 +2529,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
>  	int ret;
>  
>  	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map, xe, tile, size, flags);
> -
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel, flags);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, size, ttm_bo_type_kernel, flags, true);

This is a driver load call, so non-interruptible should be fine.

>  	if (IS_ERR(bo))
>  		return bo;
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index decd601c802d..6f46f928a0d4 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_vm *vm, size_t s
>  				u16 cpu_caching, u32 flags, struct drm_exec *exec);
>  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct xe_tile *tile,
>  				   struct xe_vm *vm, size_t size,
> -				   enum ttm_bo_type type, u32 flags);
> +				   enum ttm_bo_type type, u32 flags,
> +				   struct drm_exec *exec);
> +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe, struct xe_tile *tile,
> +					size_t size, enum ttm_bo_type type, u32 flags,
> +					bool intr);
>  struct xe_bo *
>  xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile *tile,
>  			     size_t size, u64 offset, enum ttm_bo_type type,
> diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
> index f5ae28af60d4..83d61bf8ec62 100644
> --- a/drivers/gpu/drm/xe/xe_gsc.c
> +++ b/drivers/gpu/drm/xe/xe_gsc.c
> @@ -136,10 +136,10 @@ static int query_compatibility_version(struct xe_gsc *gsc)
>  	u64 ggtt_offset;
>  	int err;
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ * 2,
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT, false);
>  	if (IS_ERR(bo)) {
>  		xe_gt_err(gt, "failed to allocate bo for GSC version query\n");
>  		return PTR_ERR(bo);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> index c712111aa30d..44cc612b0a75 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> @@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct xe_gt *gt, unsigned int vfid,
>  	xe_gt_assert(gt, size % sizeof(u32) == 0);
>  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> -				  ALIGN(size, PAGE_SIZE),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT |
> -				  XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(xe, tile,
> +				       ALIGN(size, PAGE_SIZE),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT |
> +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> @@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct xe_gt *gt, unsigned int vfid,
>  	xe_gt_assert(gt, size % sizeof(u32) == 0);
>  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
>  
> -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> -				  ALIGN(size, PAGE_SIZE),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM |
> -				  XE_BO_FLAG_GGTT |
> -				  XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(xe, tile,
> +				       ALIGN(size, PAGE_SIZE),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM |
> +				       XE_BO_FLAG_GGTT |
> +				       XE_BO_FLAG_GGTT_INVALIDATE, false);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> index 92e1f9f41b8c..2b99c1ebdd58 100644
> --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> @@ -94,16 +94,17 @@ static int allocate_engine_activity_buffers(struct xe_guc *guc,
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_bo *bo, *metadata_bo;
>  
> -	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
> -					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> -					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> +	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(metadata_size),
> +						ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
> +						XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE,
> +						false);
>  
>  	if (IS_ERR(metadata_bo))
>  		return PTR_ERR(metadata_bo);
>  
> -	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
> -				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
> +	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile, PAGE_ALIGN(size),
> +				       ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> +				       XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE, false);
>  
>  	if (IS_ERR(bo)) {
>  		xe_bo_unpin_map_no_vm(metadata_bo);
> diff --git a/drivers/gpu/drm/xe/xe_lmtt.c b/drivers/gpu/drm/xe/xe_lmtt.c
> index a78c9d474a6e..4ad468574174 100644
> --- a/drivers/gpu/drm/xe/xe_lmtt.c
> +++ b/drivers/gpu/drm/xe/xe_lmtt.c
> @@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level
>  		goto out;
>  	}
>  
> -	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt), NULL,
> -				  PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> -					     lmtt->ops->lmtt_pte_num(level)),
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> -				  XE_BO_FLAG_NEEDS_64K);
> +	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt), lmtt_to_tile(lmtt),
> +				       PAGE_ALIGN(lmtt->ops->lmtt_pte_size(level) *
> +						  lmtt->ops->lmtt_pte_num(level)),
> +				       ttm_bo_type_kernel,
> +				       XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> +				       XE_BO_FLAG_NEEDS_64K, false);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto out_free_pt;
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index 8f6c3ba47882..6d52e0eb97f5 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>  	if (vm && vm->xef) /* userspace */
>  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
>  
> -	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
> -				       ttm_bo_type_kernel,
> -				       bo_flags);
> +	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
> +					    bo_size,
> +					    ttm_bo_type_kernel,
> +					    bo_flags, false);

This is in an IOCTL call path, so it should be interruptible, right?
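
I.e. something like the below (sketch; assumes all callers of
xe_lrc_init() can tolerate -EINTR):

	lrc->bo = xe_bo_create_pin_map_novm(xe, tile, bo_size,
					    ttm_bo_type_kernel,
					    bo_flags, true);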

>  	if (IS_ERR(lrc->bo))
>  		return PTR_ERR(lrc->bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 57e6d5a8ac39..b27388db42a5 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -35,6 +35,7 @@
>  #include "xe_sched_job.h"
>  #include "xe_sync.h"
>  #include "xe_trace_bo.h"
> +#include "xe_validation.h"
>  #include "xe_vm.h"
>  #include "xe_vram.h"
>  
> @@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm,
>  }
>  
>  static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> -				 struct xe_vm *vm)
> +				 struct xe_vm *vm, struct drm_exec *exec)
>  {
>  	struct xe_device *xe = tile_to_xe(tile);
>  	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> @@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  				  num_entries * XE_PAGE_SIZE,
>  				  ttm_bo_type_kernel,
>  				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> -				  XE_BO_FLAG_PAGETABLE);
> +				  XE_BO_FLAG_PAGETABLE, exec);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> @@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
>  	struct xe_tile *tile = m->tile;
>  	struct xe_gt *primary_gt = tile->primary_gt;
>  	struct xe_device *xe = tile_to_xe(tile);
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	int err;
>  
> @@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
>  	if (IS_ERR(vm))
>  		return PTR_ERR(vm);
>  
> -	xe_vm_lock(vm, false);
> -	err = xe_migrate_prepare_vm(tile, m, vm);
> -	xe_vm_unlock(vm);
> +	err = 0;
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {}, err) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_validation_retry_on_oom(&ctx, &err);
> +	}
>  	if (err)
> -		goto err_out;
> +		return err;
>  
>  	if (xe->info.has_usm) {
>  		struct xe_hw_engine *hwe = xe_gt_hw_engine(primary_gt,
> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> index a188bad172ad..a4894eb0d7f3 100644
> --- a/drivers/gpu/drm/xe/xe_oa.c
> +++ b/drivers/gpu/drm/xe/xe_oa.c
> @@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size)
>  {
>  	struct xe_bo *bo;
>  
> -	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL,
> -				  size, ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
> +				       size, ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, false);

This is in an IOCTL call path, so it should be interruptible, right?
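
Same as the LRC case above, i.e. pass true for @intr here as well
(sketch):

	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt->tile,
				       size, ttm_bo_type_kernel,
				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT,
				       true);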

Rest LGTM.

Matt

>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index f3a39e734a90..33ad40418ceb 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
>   * @vm: The vm to create for.
>   * @tile: The tile to create for.
>   * @level: The page-table level.
> + * @exec: The drm_exec object used to lock the vm.
>   *
>   * Allocate and initialize a single struct xe_pt metadata structure. Also
>   * create the corresponding page-table bo, but don't initialize it. If the
> @@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
>   * error.
>   */
>  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> -			   unsigned int level)
> +			   unsigned int level, struct drm_exec *exec)
>  {
>  	struct xe_pt *pt;
>  	struct xe_bo *bo;
> @@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
>  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
>  
>  	pt->level = level;
> +
> +	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
>  	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
>  				  ttm_bo_type_kernel,
> -				  bo_flags);
> +				  bo_flags, exec);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto err_kfree;
> @@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	if (covers || !*child) {
>  		u64 flags = 0;
>  
> -		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1);
> +		xe_child = xe_pt_create(xe_walk->vm, xe_walk->tile, level - 1,
> +					xe_vm_validation_exec(vm));
>  		if (IS_ERR(xe_child))
>  			return PTR_ERR(xe_child);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 5ecf003d513c..4daeebaab5a1 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -10,6 +10,7 @@
>  #include "xe_pt_types.h"
>  
>  struct dma_fence;
> +struct drm_exec;
>  struct xe_bo;
>  struct xe_device;
>  struct xe_exec_queue;
> @@ -29,7 +30,7 @@ struct xe_vma_ops;
>  unsigned int xe_pt_shift(unsigned int level);
>  
>  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> -			   unsigned int level);
> +			   unsigned int level, struct drm_exec *exec);
>  
>  void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
>  			  struct xe_pt *pt);
> diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c b/drivers/gpu/drm/xe/xe_pxp_submit.c
> index ca95f2a4d4ef..e60526e30030 100644
> --- a/drivers/gpu/drm/xe/xe_pxp_submit.c
> +++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
> @@ -54,8 +54,9 @@ static int allocate_vcs_execution_resources(struct xe_pxp *pxp)
>  	 * Each termination is 16 DWORDS, so 4K is enough to contain a
>  	 * termination for each sessions.
>  	 */
> -	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K, ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
> +	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> +				       XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
> +				       false);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto out_queue;
> @@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
>  {
>  	struct xe_tile *tile = gt_to_tile(gt);
>  	struct xe_device *xe = tile_to_xe(tile);
> +	struct xe_validation_ctx ctx;
>  	struct xe_hw_engine *hwe;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	struct xe_bo *bo;
>  	struct xe_exec_queue *q;
> @@ -106,15 +109,26 @@ static int allocate_gsc_client_resources(struct xe_gt *gt,
>  		return PTR_ERR(vm);
>  
>  	/* We allocate a single object for the batch and the in/out memory */
> -	xe_vm_lock(vm, false);
> -	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> -				  ttm_bo_type_kernel,
> -				  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
> -	xe_vm_unlock(vm);
> -	if (IS_ERR(bo)) {
> -		err = PTR_ERR(bo);
> -		goto vm_out;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags){}, err) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (err)
> +			break;
> +
> +		bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE + inout_size * 2,
> +					  ttm_bo_type_kernel,
> +					  XE_BO_FLAG_SYSTEM | XE_BO_FLAG_PINNED |
> +					  XE_BO_FLAG_NEEDS_UC, &exec);
> +		drm_exec_retry_on_contention(&exec);
> +		if (IS_ERR(bo)) {
> +			err = PTR_ERR(bo);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			break;
> +		}
>  	}
> +	if (err)
> +		goto vm_out;
>  
>  	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0, XE_CACHE_WB);
>  	if (IS_ERR(fence)) {
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 23015f369e34..0d8414bd6caa 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1603,6 +1603,7 @@ static void vm_destroy_work_func(struct work_struct *w);
>   * @xe: xe device.
>   * @tile: tile to set up for.
>   * @vm: vm to set up for.
> + * @exec: The struct drm_exec object used to lock the vm resv.
>   *
>   * Sets up a pagetable tree with one page-table per level and a single
>   * leaf PTE. All pagetable entries point to the single page-table or,
> @@ -1612,20 +1613,19 @@ static void vm_destroy_work_func(struct work_struct *w);
>   * Return: 0 on success, negative error code on error.
>   */
>  static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile,
> -				struct xe_vm *vm)
> +				struct xe_vm *vm, struct drm_exec *exec)
>  {
>  	u8 id = tile->id;
>  	int i;
>  
>  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level; i++) {
> -		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> +		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i, exec);
>  		if (IS_ERR(vm->scratch_pt[id][i])) {
>  			int err = PTR_ERR(vm->scratch_pt[id][i]);
>  
>  			vm->scratch_pt[id][i] = NULL;
>  			return err;
>  		}
> -
>  		xe_pt_populate_empty(tile, vm, vm->scratch_pt[id][i]);
>  	}
>  
> @@ -1653,9 +1653,26 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
>  	}
>  }
>  
> +static void xe_vm_pt_destroy(struct xe_vm *vm)
> +{
> +	struct xe_tile *tile;
> +	u8 id;
> +
> +	xe_vm_assert_held(vm);
> +
> +	for_each_tile(tile, vm->xe, id) {
> +		if (vm->pt_root[id]) {
> +			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> +			vm->pt_root[id] = NULL;
> +		}
> +	}
> +}
> +
>  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  {
>  	struct drm_gem_object *vm_resv_obj;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_vm *vm;
>  	int err, number_tiles = 0;
>  	struct xe_tile *tile;
> @@ -1742,49 +1759,68 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  
>  	drm_gem_object_put(vm_resv_obj);
>  
> -	err = xe_vm_lock(vm, true);
> -	if (err)
> -		goto err_close;
> +	err = 0;
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
> +			    err) {
> +		err = xe_vm_drm_exec_lock(vm, &exec);
> +		drm_exec_retry_on_contention(&exec);
>  
> -	if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> -		vm->flags |= XE_VM_FLAG_64K;
> +		if (IS_DGFX(xe) && xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> +			vm->flags |= XE_VM_FLAG_64K;
>  
> -	for_each_tile(tile, xe, id) {
> -		if (flags & XE_VM_FLAG_MIGRATION &&
> -		    tile->id != XE_VM_FLAG_TILE_ID(flags))
> -			continue;
> +		for_each_tile(tile, xe, id) {
> +			if (flags & XE_VM_FLAG_MIGRATION &&
> +			    tile->id != XE_VM_FLAG_TILE_ID(flags))
> +				continue;
>  
> -		vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level);
> -		if (IS_ERR(vm->pt_root[id])) {
> -			err = PTR_ERR(vm->pt_root[id]);
> -			vm->pt_root[id] = NULL;
> -			goto err_unlock_close;
> +			vm->pt_root[id] = xe_pt_create(vm, tile, xe->info.vm_max_level,
> +						       &exec);
> +			if (IS_ERR(vm->pt_root[id])) {
> +				err = PTR_ERR(vm->pt_root[id]);
> +				vm->pt_root[id] = NULL;
> +				xe_vm_pt_destroy(vm);
> +				drm_exec_retry_on_contention(&exec);
> +				xe_validation_retry_on_oom(&ctx, &err);
> +				break;
> +			}
>  		}
> -	}
> +		if (err)
> +			break;
>  
> -	if (xe_vm_has_scratch(vm)) {
> -		for_each_tile(tile, xe, id) {
> -			if (!vm->pt_root[id])
> -				continue;
> +		if (xe_vm_has_scratch(vm)) {
> +			for_each_tile(tile, xe, id) {
> +				if (!vm->pt_root[id])
> +					continue;
>  
> -			err = xe_vm_create_scratch(xe, tile, vm);
> +				err = xe_vm_create_scratch(xe, tile, vm, &exec);
> +				if (err) {
> +					xe_vm_free_scratch(vm);
> +					xe_vm_pt_destroy(vm);
> +					drm_exec_retry_on_contention(&exec);
> +					xe_validation_retry_on_oom(&ctx, &err);
> +					break;
> +				}
> +			}
>  			if (err)
> -				goto err_unlock_close;
> +				break;
> +			vm->batch_invalidate_tlb = true;
>  		}
> -		vm->batch_invalidate_tlb = true;
> -	}
>  
> -	if (vm->flags & XE_VM_FLAG_LR_MODE)
> -		vm->batch_invalidate_tlb = false;
> +		if (vm->flags & XE_VM_FLAG_LR_MODE) {
> +			INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);
> +			vm->batch_invalidate_tlb = false;
> +		}
>  
> -	/* Fill pt_root after allocating scratch tables */
> -	for_each_tile(tile, xe, id) {
> -		if (!vm->pt_root[id])
> -			continue;
> +		/* Fill pt_root after allocating scratch tables */
> +		for_each_tile(tile, xe, id) {
> +			if (!vm->pt_root[id])
> +				continue;
>  
> -		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> +			xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> +		}
>  	}
> -	xe_vm_unlock(vm);
> +	if (err)
> +		goto err_close;
>  
>  	/* Kernel migration VM shouldn't have a circular loop.. */
>  	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> @@ -1817,7 +1853,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  				      &xe->usm.next_asid, GFP_KERNEL);
>  		up_write(&xe->usm.lock);
>  		if (err < 0)
> -			goto err_unlock_close;
> +			goto err_close;
>  
>  		vm->usm.asid = asid;
>  	}
> @@ -1826,8 +1862,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef)
>  
>  	return vm;
>  
> -err_unlock_close:
> -	xe_vm_unlock(vm);
>  err_close:
>  	xe_vm_close_and_put(vm);
>  	return ERR_PTR(err);
> @@ -1956,13 +1990,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
>  	 * destroy the pagetables immediately.
>  	 */
>  	xe_vm_free_scratch(vm);
> -
> -	for_each_tile(tile, xe, id) {
> -		if (vm->pt_root[id]) {
> -			xe_pt_destroy(vm->pt_root[id], vm->flags, NULL);
> -			vm->pt_root[id] = NULL;
> -		}
> -	}
> +	xe_vm_pt_destroy(vm);
>  	xe_vm_unlock(vm);
>  
>  	/*
> @@ -3857,7 +3885,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct xe_vm *vm, struct xe_bo *bo,
>   */
>  int xe_vm_lock(struct xe_vm *vm, bool intr)
>  {
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
>  	int ret;
>  
>  	if (intr)
> @@ -3865,9 +3892,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>  	else
>  		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
>  
> -	if (!ret)
> -		xe_vm_set_validation_exec(vm, exec);
> -
>  	return ret;
>  }
>  
> @@ -3879,7 +3903,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
>   */
>  void xe_vm_unlock(struct xe_vm *vm)
>  {
> -	xe_vm_set_validation_exec(vm, NULL);
>  	dma_resv_unlock(xe_vm_resv(vm));
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction " Thomas Hellström
@ 2025-08-26 22:08   ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 22:08 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:30AM +0200, Thomas Hellström wrote:
> Pinned suspend eviction and preparation for eviction validate
> system memory for eviction buffers. Do that under a
> validation exclusive lock to avoid interfering with other
> processes validating system graphics memory.
> 
> v2:
> - Avoid gotos from within xe_validation_guard().
> - Adapt to signature change of xe_validation_guard().
> 
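
For reference, the conversions in this patch follow the guard pattern
used throughout the series, roughly (sketch, error paths elided):

	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	int ret = 0;

	xe_validation_guard(&ctx, &xe->val, &exec,
			    (struct xe_val_flags) {.exclusive = true}, ret) {
		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
		/* Restart the transaction on ww-mutex contention. */
		drm_exec_retry_on_contention(&exec);

		/* ... allocate and validate bos here ... */

		/* Release everything and retry the transaction on OOM. */
		xe_validation_retry_on_oom(&ctx, &ret);
	}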
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_bo.c | 184 +++++++++++++++++++++----------------
>  1 file changed, 103 insertions(+), 81 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 7a62629c88e0..9733f742525a 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1139,43 +1139,47 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>  int xe_bo_notifier_prepare_pinned(struct xe_bo *bo)
>  {
>  	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	struct xe_bo *backup;
>  	int ret = 0;
>  
> -	xe_bo_lock(bo, false);
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, ret) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
> +		xe_assert(xe, !bo->backup_obj);
>  
> -	xe_assert(xe, !bo->backup_obj);
> +		/*
> +		 * Since this is called from the PM notifier we might have raced with
> +		 * someone unpinning this after we dropped the pinned list lock and
> +		 * grabbing the above bo lock.
> +		 */
> +		if (!xe_bo_is_pinned(bo))
> +			break;
>  
> -	/*
> -	 * Since this is called from the PM notifier we might have raced with
> -	 * someone unpinning this after we dropped the pinned list lock and
> -	 * grabbing the above bo lock.
> -	 */
> -	if (!xe_bo_is_pinned(bo))
> -		goto out_unlock_bo;
> +		if (!xe_bo_is_vram(bo))
> +			break;
>  
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
>  
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> +		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> +					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +					   XE_BO_FLAG_PINNED, &exec);
> +		if (IS_ERR(backup)) {
> +			drm_exec_retry_on_contention(&exec);
> +			ret = PTR_ERR(backup);
> +			xe_validation_retry_on_oom(&ctx, &ret);
> +			break;
> +		}
>  
> -	backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -				   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -				   XE_BO_FLAG_PINNED, exec);
> -	if (IS_ERR(backup)) {
> -		ret = PTR_ERR(backup);
> -		goto out_unlock_bo;
> +		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +		ttm_bo_pin(&backup->ttm);
> +		bo->backup_obj = backup;
>  	}
>  
> -	backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -	ttm_bo_pin(&backup->ttm);
> -	bo->backup_obj = backup;
> -
> -out_unlock_bo:
> -	xe_bo_unlock(bo);
>  	return ret;
>  }
>  
> @@ -1201,57 +1205,12 @@ int xe_bo_notifier_unprepare_pinned(struct xe_bo *bo)
>  	return 0;
>  }
>  
> -/**
> - * xe_bo_evict_pinned() - Evict a pinned VRAM object to system memory
> - * @bo: The buffer object to move.
> - *
> - * On successful completion, the object memory will be moved to system memory.
> - *
> - * This is needed for special handling of pinned VRAM object during
> - * suspend-resume.
> - *
> - * Return: 0 on success. Negative error code on failure.
> - */
> -int xe_bo_evict_pinned(struct xe_bo *bo)
> +static int xe_bo_evict_pinned_copy(struct xe_bo *bo, struct xe_bo *backup)
>  {
> -	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> -	struct xe_bo *backup = bo->backup_obj;
> -	bool backup_created = false;
> +	struct xe_device *xe = xe_bo_device(bo);
>  	bool unmap = false;
>  	int ret = 0;
>  
> -	xe_bo_lock(bo, false);
> -
> -	if (WARN_ON(!bo->ttm.resource)) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> -
> -	if (WARN_ON(!xe_bo_is_pinned(bo))) {
> -		ret = -EINVAL;
> -		goto out_unlock_bo;
> -	}
> -
> -	if (!xe_bo_is_vram(bo))
> -		goto out_unlock_bo;
> -
> -	if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> -		goto out_unlock_bo;
> -
> -	if (!backup) {
> -		backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL, xe_bo_size(bo),
> -					   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> -					   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> -					   XE_BO_FLAG_PINNED, exec);
> -		if (IS_ERR(backup)) {
> -			ret = PTR_ERR(backup);
> -			goto out_unlock_bo;
> -		}
> -		backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> -		backup_created = true;
> -	}
> -
>  	if (xe_bo_is_user(bo) || (bo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) {
>  		struct xe_migrate *migrate;
>  		struct dma_fence *fence;
> @@ -1289,7 +1248,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
>  		if (iosys_map_is_null(&bo->vmap)) {
>  			ret = xe_bo_vmap(bo);
>  			if (ret)
> -				goto out_backup;
> +				goto out_vunmap;
>  			unmap = true;
>  		}
>  
> @@ -1299,15 +1258,78 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
>  
>  	if (!bo->backup_obj)
>  		bo->backup_obj = backup;
> -
> -out_backup:
> +out_vunmap:
>  	xe_bo_vunmap(backup);
> -	if (ret && backup_created)
> -		xe_bo_put(backup);
> -out_unlock_bo:
> +out_backup:
>  	if (unmap)
>  		xe_bo_vunmap(bo);
> -	xe_bo_unlock(bo);
> +
> +	return ret;
> +}
> +
> +/**
> + * xe_bo_evict_pinned() - Evict a pinned VRAM object to system memory
> + * @bo: The buffer object to move.
> + *
> + * On successful completion, the object memory will be moved to system memory.
> + *
> + * This is needed for special handling of pinned VRAM object during
> + * suspend-resume.
> + *
> + * Return: 0 on success. Negative error code on failure.
> + */
> +int xe_bo_evict_pinned(struct xe_bo *bo)
> +{
> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
> +	struct xe_bo *backup = bo->backup_obj;
> +	bool backup_created = false;
> +	int ret = 0;
> +
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.exclusive = true}, ret) {
> +		ret = drm_exec_lock_obj(&exec, &bo->ttm.base);
> +		drm_exec_retry_on_contention(&exec);
> +		xe_assert(xe, !ret);
> +
> +		if (WARN_ON(!bo->ttm.resource)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (WARN_ON(!xe_bo_is_pinned(bo))) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (!xe_bo_is_vram(bo))
> +			break;
> +
> +		if (bo->flags & XE_BO_FLAG_PINNED_NORESTORE)
> +			break;
> +
> +		if (!backup) {
> +			backup = xe_bo_init_locked(xe, NULL, NULL, bo->ttm.base.resv, NULL,
> +						   xe_bo_size(bo),
> +						   DRM_XE_GEM_CPU_CACHING_WB, ttm_bo_type_kernel,
> +						   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS |
> +						   XE_BO_FLAG_PINNED, &exec);
> +			if (IS_ERR(backup)) {
> +				drm_exec_retry_on_contention(&exec);
> +				ret = PTR_ERR(backup);
> +				xe_validation_retry_on_oom(&ctx, &ret);
> +				break;
> +			}
> +			backup->parent_obj = xe_bo_get(bo); /* Released by bo_destroy */
> +			backup_created = true;
> +		}
> +
> +		ret = xe_bo_evict_pinned_copy(bo, backup);
> +	}
> +
> +	if (ret && backup_created)
> +		xe_bo_put(backup);
> +
>  	return ret;
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 09/16] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-22  9:40 ` [PATCH v2 09/16] drm/xe: Convert the CPU fault handler " Thomas Hellström
@ 2025-08-26 22:53   ` Matthew Brost
  2025-08-27 14:16     ` Thomas Hellström
  0 siblings, 1 reply; 36+ messages in thread
From: Matthew Brost @ 2025-08-26 22:53 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Fri, Aug 22, 2025 at 11:40:23AM +0200, Thomas Hellström wrote:
> The CPU fault handler may populate bos and migrate, and in doing
> so might interfere with other tasks validating.
> 
> Rework the CPU fault handler completely into a fastpath
> and a slowpath. The fastpath trylocks only the validation lock
> in read-mode. If that fails, there's a fallback to the
> slowpath, where we do a full validation transaction.
> 
> This mandates open-coding of bo locking, bo idling and
> bo populating, but we still call into TTM for fault
> finalizing.
> 
> v2:
> - Rework the CPU fault handler to actually take part in
>   the exhaustive eviction scheme (Matthew Brost).
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c         | 191 ++++++++++++++++++++++++-----
>  drivers/gpu/drm/xe/xe_validation.c |   3 +-
>  2 files changed, 163 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 76e9c93826a2..686ca5d6038a 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1713,57 +1713,188 @@ static void xe_gem_object_close(struct drm_gem_object *obj,
>  	}
>  }
>  
> -static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> +static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct xe_device *xe, struct xe_bo *bo)
> +{
> +	vm_fault_t ret;
> +
> +	trace_xe_bo_cpu_fault(bo);
> +
> +	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> +				       TTM_BO_VM_NUM_PREFAULT);
> +	if (ret == VM_FAULT_NOPAGE &&
> +	    mem_type_is_vram(bo->ttm.resource->mem_type)) {
> +		mutex_lock(&xe->mem_access.vram_userfault.lock);
> +		if (list_empty(&bo->vram_userfault_link))
> +			list_add(&bo->vram_userfault_link,
> +				 &xe->mem_access.vram_userfault.list);
> +		mutex_unlock(&xe->mem_access.vram_userfault.lock);
> +	}
> +
> +	return ret;
> +}
> +
> +static vm_fault_t xe_err_to_fault_t(int err)
> +{
> +	switch (err) {
> +	case 0:
> +	case -EINTR:
> +	case -ERESTARTSYS:
> +	case -EAGAIN:
> +		return VM_FAULT_NOPAGE;
> +	case -ENOMEM:
> +	case -ENOSPC:
> +		return VM_FAULT_OOM;
> +	default:
> +		break;
> +	}
> +	return VM_FAULT_SIGBUS;
> +}
> +
> +static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf, struct xe_device *xe,
> +					   struct xe_bo *bo, bool needs_rpm)
> +{
> +	struct ttm_buffer_object *tbo = &bo->ttm;
> +	vm_fault_t ret = VM_FAULT_RETRY;
> +	struct xe_validation_ctx ctx;
> +	int err;
> +
> +	if (needs_rpm && !xe_pm_runtime_get_if_active(xe))
> +		return VM_FAULT_RETRY;
> +
> +	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
> +				     (struct xe_val_flags) {
> +					     .interruptible = true,
> +					     .no_block = true
> +				     });
> +	if (err)
> +		goto out_pm;
> +
> +	if (!dma_resv_trylock(tbo->base.resv))
> +		goto out_validation;
> +
> +	if (!dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL))
> +		goto out_unlock;
> +
> +	if (!tbo->resource->bus.is_iomem) {
> +		struct ttm_operation_ctx ctx = {
> +			.interruptible = true,
> +			.no_wait_gpu = true,
> +			.gfp_retry_mayfail = true,
> +		};
> +
> +		err = ttm_bo_populate(tbo, &ctx);

The previous version of the fault handler didn't have a ttm_bo_populate
call. Can you explain why it is added here?

Also, we have this code in ttm_bo_vm_reserve which rejects external
objects marked as unmappable. Do we need something like this?

        if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
                if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
                        dma_resv_unlock(bo->base.resv);
                        return VM_FAULT_SIGBUS;
                }
        }

Matt

> +		if (err) {
> +			if (err != -ENOMEM && err != -ENOSPC)
> +				ret = xe_err_to_fault_t(err);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	ret = __xe_bo_cpu_fault(vmf, xe, bo);
> +
> +out_unlock:
> +	dma_resv_unlock(tbo->base.resv);
> +out_validation:
> +	xe_validation_ctx_fini(&ctx);
> +out_pm:
> +	if (needs_rpm)
> +		xe_pm_runtime_put(xe);
> +
> +	return ret;
> +}
> +
> +static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
>  {
>  	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
>  	struct drm_device *ddev = tbo->base.dev;
>  	struct xe_device *xe = to_xe_device(ddev);
>  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
>  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> -	struct drm_exec *exec;
> +	bool retry_after_wait = false;
> +	struct xe_validation_ctx ctx;
> +	struct drm_exec exec;
>  	vm_fault_t ret;
> +	int err = 0;
>  	int idx;
>  
> +	if (!drm_dev_enter(&xe->drm, &idx))
> +		return ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> +
> +	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
> +	if (ret != VM_FAULT_RETRY)
> +		goto out;
> +
> +	if (fault_flag_allow_retry_first(vmf->flags)) {
> +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> +			goto out;
> +		retry_after_wait = true;
> +		xe_bo_get(bo);
> +		mmap_read_unlock(vmf->vma->vm_mm);
> +	} else {
> +		ret = VM_FAULT_NOPAGE;
> +	}
> +
> +	/*
> +	 * The fastpath failed and we were not required to return and retry immediately.
> +	 * We're now running in one of two modes:
> +	 *
> +	 * 1) retry_after_wait == true: The mmap_read_lock() is dropped, and we're trying
> +	 * to resolve blocking waits. But we can't resolve the fault since the
> +	 * mmap_read_lock() is dropped. After retrying the fault, the aim is that the fastpath
> +	 * should succeed. But it may fail since we drop the bo lock.
> +	 *
> +	 * 2) retry_after_wait == false: The fastpath failed, typically even after
> +	 * a retry. Do whatever's necessary to resolve the fault.
> +	 *
> +	 * This construct is recommended to avoid excessive waits under the mmap_lock.
> +	 */
> +
>  	if (needs_rpm)
>  		xe_pm_runtime_get(xe);
>  
> -	exec = XE_VALIDATION_UNIMPLEMENTED;
> -	ret = ttm_bo_vm_reserve(tbo, vmf);
> -	if (ret)
> -		goto out;
> +	xe_validation_guard(&ctx, &xe->val, &exec, (struct xe_val_flags) {.interruptible = true},
> +			    err) {
> +		long lerr;
>  
> -	if (drm_dev_enter(ddev, &idx)) {
> -		trace_xe_bo_cpu_fault(bo);
> +		err = drm_exec_lock_obj(&exec, &tbo->base);
> +		drm_exec_retry_on_contention(&exec);
> +		if (err)
> +			break;
>  
> -		xe_validation_assert_exec(xe, exec, &tbo->base);
> -		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> -					       TTM_BO_VM_NUM_PREFAULT);
> -		drm_dev_exit(idx);
> +		lerr = dma_resv_wait_timeout(tbo->base.resv,
> +					     DMA_RESV_USAGE_KERNEL, true,
> +					     MAX_SCHEDULE_TIMEOUT);
> +		if (lerr < 0) {
> +			err = lerr;
> +			break;
> +		}
>  
> -		if (ret == VM_FAULT_RETRY &&
> -		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> -			goto out;
> +		if (!tbo->resource->bus.is_iomem) {
> +			struct ttm_operation_ctx tctx = {
> +				.interruptible = true,
> +				.no_wait_gpu = false,
> +				.gfp_retry_mayfail = true,
> +			};
>  
> -		/*
> -		 * ttm_bo_vm_reserve() already has dma_resv_lock.
> -		 */
> -		if (ret == VM_FAULT_NOPAGE &&
> -		    mem_type_is_vram(tbo->resource->mem_type)) {
> -			mutex_lock(&xe->mem_access.vram_userfault.lock);
> -			if (list_empty(&bo->vram_userfault_link))
> -				list_add(&bo->vram_userfault_link,
> -					 &xe->mem_access.vram_userfault.list);
> -			mutex_unlock(&xe->mem_access.vram_userfault.lock);
> +			err = ttm_bo_populate(tbo, &tctx);
> +			xe_validation_retry_on_oom(&ctx, &err);
> +			if (err && (err == -EINTR || err == -ERESTARTSYS))
> +				break;
>  		}
> -	} else {
> -		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> +		if (!retry_after_wait)
> +			ret = __xe_bo_cpu_fault(vmf, xe, bo);
>  	}
> +	if (err)
> +		ret = xe_err_to_fault_t(err);
>  
> -	dma_resv_unlock(tbo->base.resv);
> -out:
>  	if (needs_rpm)
>  		xe_pm_runtime_put(xe);
>  
> +	if (retry_after_wait)
> +		xe_bo_put(bo);
> +out:
> +	drm_dev_exit(idx);
> +
>  	return ret;
>  }
>  
> @@ -1807,7 +1938,7 @@ int xe_bo_read(struct xe_bo *bo, u64 offset, void *dst, int size)
>  }
>  
>  static const struct vm_operations_struct xe_gem_vm_ops = {
> -	.fault = xe_gem_fault,
> +	.fault = xe_bo_cpu_fault,
>  	.open = ttm_bo_vm_open,
>  	.close = ttm_bo_vm_close,
>  	.access = xe_bo_vm_access,
> diff --git a/drivers/gpu/drm/xe/xe_validation.c b/drivers/gpu/drm/xe/xe_validation.c
> index b90fda3dd5f4..826cd09966ef 100644
> --- a/drivers/gpu/drm/xe/xe_validation.c
> +++ b/drivers/gpu/drm/xe/xe_validation.c
> @@ -241,7 +241,8 @@ int xe_validation_exec_lock(struct xe_validation_ctx *ctx,
>   */
>  void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
>  {
> -	drm_exec_fini(ctx->exec);
> +	if (ctx->exec)
> +		drm_exec_fini(ctx->exec);
>  	xe_validation_unlock(ctx);
>  }
>  
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 09/16] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-26 22:53   ` Matthew Brost
@ 2025-08-27 14:16     ` Thomas Hellström
  2025-08-27 15:52       ` Matthew Brost
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Hellström @ 2025-08-27 14:16 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Tue, 2025-08-26 at 15:53 -0700, Matthew Brost wrote:
> On Fri, Aug 22, 2025 at 11:40:23AM +0200, Thomas Hellström wrote:
> > The CPU fault handler may populate bos and migrate, and in doing
> > so might interfere with other tasks validating.
> > 
> > Rework the CPU fault handler completely into a fastpath
> > and a slowpath. The fastpath trylocks only the validation lock
> > in read-mode. If that fails, there's a fallback to the
> > slowpath, where we do a full validation transaction.
> > 
> > This mandates open-coding of bo locking, bo idling and
> > bo populating, but we still call into TTM for fault
> > finalizing.
> > 
> > v2:
> > - Rework the CPU fault handler to actually take part in
> >   the exhaustive eviction scheme (Matthew Brost).
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c         | 191 ++++++++++++++++++++++++-
> > ----
> >  drivers/gpu/drm/xe/xe_validation.c |   3 +-
> >  2 files changed, 163 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 76e9c93826a2..686ca5d6038a 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1713,57 +1713,188 @@ static void xe_gem_object_close(struct
> > drm_gem_object *obj,
> >  	}
> >  }
> >  
> > -static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > +static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct
> > xe_device *xe, struct xe_bo *bo)
> > +{
> > +	vm_fault_t ret;
> > +
> > +	trace_xe_bo_cpu_fault(bo);
> > +
> > +	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > >vm_page_prot,
> > +				       TTM_BO_VM_NUM_PREFAULT);
> > +	if (ret == VM_FAULT_NOPAGE &&
> > +	    mem_type_is_vram(bo->ttm.resource->mem_type)) {
> > +		mutex_lock(&xe->mem_access.vram_userfault.lock);
> > +		if (list_empty(&bo->vram_userfault_link))
> > +			list_add(&bo->vram_userfault_link,
> > +				 &xe-
> > >mem_access.vram_userfault.list);
> > +		mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static vm_fault_t xe_err_to_fault_t(int err)
> > +{
> > +	switch (err) {
> > +	case 0:
> > +	case -EINTR:
> > +	case -ERESTARTSYS:
> > +	case -EAGAIN:
> > +		return VM_FAULT_NOPAGE;
> > +	case -ENOMEM:
> > +	case -ENOSPC:
> > +		return VM_FAULT_OOM;
> > +	default:
> > +		break;
> > +	}
> > +	return VM_FAULT_SIGBUS;
> > +}
> > +
> > +static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf,
> > struct xe_device *xe,
> > +					   struct xe_bo *bo, bool
> > needs_rpm)
> > +{
> > +	struct ttm_buffer_object *tbo = &bo->ttm;
> > +	vm_fault_t ret = VM_FAULT_RETRY;
> > +	struct xe_validation_ctx ctx;
> > +	int err;
> > +
> > +	if (needs_rpm && !xe_pm_runtime_get_if_active(xe))
> > +		return VM_FAULT_RETRY;
> > +
> > +	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
> > +				     (struct xe_val_flags) {
> > +					     .interruptible =
> > true,
> > +					     .no_block = true
> > +				     });
> > +	if (err)
> > +		goto out_pm;
> > +
> > +	if (!dma_resv_trylock(tbo->base.resv))
> > +		goto out_validation;
> > +
> > +	if (!dma_resv_test_signaled(tbo->base.resv,
> > DMA_RESV_USAGE_KERNEL))
> > +		goto out_unlock;
> > +
> > +	if (!tbo->resource->bus.is_iomem) {
> > +		struct ttm_operation_ctx ctx = {
> > +			.interruptible = true,
> > +			.no_wait_gpu = true,
> > +			.gfp_retry_mayfail = true,
> > +		};
> > +
> > +		err = ttm_bo_populate(tbo, &ctx);
> 
> The previous version of the fault handler didn't have a ttm_bo_populate
> call. Can you explain why it is added here?

It's called from within ttm_bo_vm_fault_reserved(), but with a blocking
ttm_operation_ctx. Here we call it non-blocking, and if it succeeds, the
call in ttm_bo_vm_fault_reserved() will be a NOP.

The functionality is to bring in bos from swap if needed, or to be able
to access bos with deferred backing-store allocation.
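
I.e. the fastpath first attempts roughly the following, and only the
slowpath falls back to the blocking populate inside
ttm_bo_vm_fault_reserved() (sketch, mirroring the patch above):

	struct ttm_operation_ctx tctx = {
		.interruptible = true,
		.no_wait_gpu = true,	   /* never block on the GPU here */
		.gfp_retry_mayfail = true, /* allocations may fail with -ENOMEM */
	};

	/* Swap in, or allocate, the backing store without blocking. */
	err = ttm_bo_populate(tbo, &tctx);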

> 
> Also, we have this code in ttm_bo_vm_reserve which rejects external
> objects marked as unmappable. Do we need something like this?
> 
>         if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
>                 if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
>                         dma_resv_unlock(bo->base.resv);
>                         return VM_FAULT_SIGBUS;
>                 }
>         }

Ah, right. This essentially blocks imported dma-bufs from being mapped
here. I'll fix. This also reminds me to add a comment about reworking
the TTM helpers so that this check moves into TTM.
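
Probably something along these lines in the fastpath (sketch only,
mirroring the ttm_bo_vm_reserve() check you quoted):

	if (tbo->ttm && (tbo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
	    !(tbo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
		ret = VM_FAULT_SIGBUS;
		goto out_unlock;
	}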

/Thomas

> 
> Matt
> 
> > +		if (err) {
> > +			if (err != -ENOMEM && err != -ENOSPC)
> > +				ret = xe_err_to_fault_t(err);
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	ret = __xe_bo_cpu_fault(vmf, xe, bo);
> > +
> > +out_unlock:
> > +	dma_resv_unlock(tbo->base.resv);
> > +out_validation:
> > +	xe_validation_ctx_fini(&ctx);
> > +out_pm:
> > +	if (needs_rpm)
> > +		xe_pm_runtime_put(xe);
> > +
> > +	return ret;
> > +}
> > +
> > +static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
> >  {
> >  	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
> >  	struct drm_device *ddev = tbo->base.dev;
> >  	struct xe_device *xe = to_xe_device(ddev);
> >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > -	struct drm_exec *exec;
> > +	bool retry_after_wait = false;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	vm_fault_t ret;
> > +	int err = 0;
> >  	int idx;
> >  
> > +	if (!drm_dev_enter(&xe->drm, &idx))
> > +		return ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > >vm_page_prot);
> > +
> > +	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
> > +	if (ret != VM_FAULT_RETRY)
> > +		goto out;
> > +
> > +	if (fault_flag_allow_retry_first(vmf->flags)) {
> > +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> > +			goto out;
> > +		retry_after_wait = true;
> > +		xe_bo_get(bo);
> > +		mmap_read_unlock(vmf->vma->vm_mm);
> > +	} else {
> > +		ret = VM_FAULT_NOPAGE;
> > +	}
> > +
> > +	/*
> > +	 * The fastpath failed and we were not required to return
> > and retry immediately.
> > +	 * We're now running in one of two modes:
> > +	 *
> > +	 * 1) retry_after_wait == true: The mmap_read_lock() is
> > dropped, and we're trying
> > +	 * to resolve blocking waits. But we can't resolve the
> > fault since the
> > +	 * mmap_read_lock() is dropped. After retrying the fault,
> > the aim is that the fastpath
> > +	 * should succeed. But it may fail since we drop the bo
> > lock.
> > +	 *
> > +	 * 2) retry_after_wait == false: The fastpath failed,
> > typically even after
> > +	 * a retry. Do whatever's necessary to resolve the fault.
> > +	 *
> > +	 * This construct is recommended to avoid excessive waits
> > under the mmap_lock.
> > +	 */
> > +
> >  	if (needs_rpm)
> >  		xe_pm_runtime_get(xe);
> >  
> > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > -	ret = ttm_bo_vm_reserve(tbo, vmf);
> > -	if (ret)
> > -		goto out;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > xe_val_flags) {.interruptible = true},
> > +			    err) {
> > +		long lerr;
> >  
> > -	if (drm_dev_enter(ddev, &idx)) {
> > -		trace_xe_bo_cpu_fault(bo);
> > +		err = drm_exec_lock_obj(&exec, &tbo->base);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (err)
> > +			break;
> >  
> > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > -		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > >vm_page_prot,
> > -					      
> > TTM_BO_VM_NUM_PREFAULT);
> > -		drm_dev_exit(idx);
> > +		lerr = dma_resv_wait_timeout(tbo->base.resv,
> > +					    
> > DMA_RESV_USAGE_KERNEL, true,
> > +					    
> > MAX_SCHEDULE_TIMEOUT);
> > +		if (lerr < 0) {
> > +			err = lerr;
> > +			break;
> > +		}
> >  
> > -		if (ret == VM_FAULT_RETRY &&
> > -		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > -			goto out;
> > +		if (!tbo->resource->bus.is_iomem) {
> > +			struct ttm_operation_ctx tctx = {
> > +				.interruptible = true,
> > +				.no_wait_gpu = false,
> > +				.gfp_retry_mayfail = true,
> > +			};
> >  
> > -		/*
> > -		 * ttm_bo_vm_reserve() already has dma_resv_lock.
> > -		 */
> > -		if (ret == VM_FAULT_NOPAGE &&
> > -		    mem_type_is_vram(tbo->resource->mem_type)) {
> > -			mutex_lock(&xe-
> > >mem_access.vram_userfault.lock);
> > -			if (list_empty(&bo->vram_userfault_link))
> > -				list_add(&bo->vram_userfault_link,
> > -					 &xe-
> > >mem_access.vram_userfault.list);
> > -			mutex_unlock(&xe-
> > >mem_access.vram_userfault.lock);
> > +			err = ttm_bo_populate(tbo, &tctx);
> > +			xe_validation_retry_on_oom(&ctx, &err);
> > +			if (err && (err == -EINTR || err == -
> > ERESTARTSYS))
> > +				break;
> >  		}
> > -	} else {
> > -		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > >vm_page_prot);
> > +		if (!retry_after_wait)
> > +			ret = __xe_bo_cpu_fault(vmf, xe, bo);
> >  	}
> > +	if (err)
> > +		ret = xe_err_to_fault_t(err);
> >  
> > -	dma_resv_unlock(tbo->base.resv);
> > -out:
> >  	if (needs_rpm)
> >  		xe_pm_runtime_put(xe);
> >  
> > +	if (retry_after_wait)
> > +		xe_bo_put(bo);
> > +out:
> > +	drm_dev_exit(idx);
> > +
> >  	return ret;
> >  }
> >  
> > @@ -1807,7 +1938,7 @@ int xe_bo_read(struct xe_bo *bo, u64 offset,
> > void *dst, int size)
> >  }
> >  
> >  static const struct vm_operations_struct xe_gem_vm_ops = {
> > -	.fault = xe_gem_fault,
> > +	.fault = xe_bo_cpu_fault,
> >  	.open = ttm_bo_vm_open,
> >  	.close = ttm_bo_vm_close,
> >  	.access = xe_bo_vm_access,
> > diff --git a/drivers/gpu/drm/xe/xe_validation.c
> > b/drivers/gpu/drm/xe/xe_validation.c
> > index b90fda3dd5f4..826cd09966ef 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.c
> > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > @@ -241,7 +241,8 @@ int xe_validation_exec_lock(struct
> > xe_validation_ctx *ctx,
> >   */
> >  void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> >  {
> > -	drm_exec_fini(ctx->exec);
> > +	if (ctx->exec)
> > +		drm_exec_fini(ctx->exec);
> >  	xe_validation_unlock(ctx);
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 09/16] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-27 14:16     ` Thomas Hellström
@ 2025-08-27 15:52       ` Matthew Brost
  2025-08-28  6:18         ` Thomas Hellström
  0 siblings, 1 reply; 36+ messages in thread
From: Matthew Brost @ 2025-08-27 15:52 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, Aug 27, 2025 at 04:16:25PM +0200, Thomas Hellström wrote:
> On Tue, 2025-08-26 at 15:53 -0700, Matthew Brost wrote:
> > On Fri, Aug 22, 2025 at 11:40:23AM +0200, Thomas Hellström wrote:
> > > The CPU fault handler may populate bos and migrate, and in doing
> > > so might interfere with other tasks validating.
> > > 
> > > Rework the CPU fault handler completely into a fastpath
> > > and a slowpath. The fastpath trylocks only the validation lock
> > > in read-mode. If that fails, there's a fallback to the
> > > slowpath, where we do a full validation transaction.
> > > 
> > > This mandates open-coding of bo locking, bo idling and
> > > bo populating, but we still call into TTM for fault
> > > finalizing.
> > > 
> > > v2:
> > > - Rework the CPU fault handler to actually take part in
> > >   the exhaustive eviction scheme (Matthew Brost).
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c         | 191 ++++++++++++++++++++++++-
> > > ----
> > >  drivers/gpu/drm/xe/xe_validation.c |   3 +-
> > >  2 files changed, 163 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > b/drivers/gpu/drm/xe/xe_bo.c
> > > index 76e9c93826a2..686ca5d6038a 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1713,57 +1713,188 @@ static void xe_gem_object_close(struct
> > > drm_gem_object *obj,
> > >  	}
> > >  }
> > >  
> > > -static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > +static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf, struct
> > > xe_device *xe, struct xe_bo *bo)
> > > +{
> > > +	vm_fault_t ret;
> > > +
> > > +	trace_xe_bo_cpu_fault(bo);
> > > +
> > > +	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > > >vm_page_prot,
> > > +				       TTM_BO_VM_NUM_PREFAULT);
> > > +	if (ret == VM_FAULT_NOPAGE &&
> > > +	    mem_type_is_vram(bo->ttm.resource->mem_type)) {
> > > +		mutex_lock(&xe->mem_access.vram_userfault.lock);
> > > +		if (list_empty(&bo->vram_userfault_link))
> > > +			list_add(&bo->vram_userfault_link,
> > > +				 &xe-
> > > >mem_access.vram_userfault.list);
> > > +		mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static vm_fault_t xe_err_to_fault_t(int err)
> > > +{
> > > +	switch (err) {
> > > +	case 0:
> > > +	case -EINTR:
> > > +	case -ERESTARTSYS:
> > > +	case -EAGAIN:
> > > +		return VM_FAULT_NOPAGE;
> > > +	case -ENOMEM:
> > > +	case -ENOSPC:
> > > +		return VM_FAULT_OOM;
> > > +	default:
> > > +		break;
> > > +	}
> > > +	return VM_FAULT_SIGBUS;
> > > +}
> > > +
> > > +static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault *vmf,
> > > struct xe_device *xe,
> > > +					   struct xe_bo *bo, bool
> > > needs_rpm)
> > > +{
> > > +	struct ttm_buffer_object *tbo = &bo->ttm;
> > > +	vm_fault_t ret = VM_FAULT_RETRY;
> > > +	struct xe_validation_ctx ctx;
> > > +	int err;
> > > +
> > > +	if (needs_rpm && !xe_pm_runtime_get_if_active(xe))
> > > +		return VM_FAULT_RETRY;
> > > +
> > > +	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
> > > +				     (struct xe_val_flags) {
> > > +					     .interruptible =
> > > true,
> > > +					     .no_block = true
> > > +				     });
> > > +	if (err)
> > > +		goto out_pm;
> > > +
> > > +	if (!dma_resv_trylock(tbo->base.resv))
> > > +		goto out_validation;
> > > +
> > > +	if (!dma_resv_test_signaled(tbo->base.resv,
> > > DMA_RESV_USAGE_KERNEL))
> > > +		goto out_unlock;
> > > +
> > > +	if (!tbo->resource->bus.is_iomem) {
> > > +		struct ttm_operation_ctx ctx = {
> > > +			.interruptible = true,
> > > +			.no_wait_gpu = true,
> > > +			.gfp_retry_mayfail = true,
> > > +		};
> > > +
> > > +		err = ttm_bo_populate(tbo, &ctx);
> > 
> > The previous version of the fault handler didn't have a ttm_bo_populate
> > call. Can you explain why it is added here?
> 
> It's called from within ttm_bo_vm_fault_reserved(), but with a blocking
> ttm_operation_ctx. Here we call it non-blocking, and if it succeeds, the
> call in ttm_bo_vm_fault_reserved() will be a NOP.
> 
> The functionality is to bring in bos from swap if needed, or to be able
> to access bos with deferred backing-store allocation.
> 

Ah, yes. I see that now. This made me notice another potential issue. See below.

> > 
> > Also, we have this code in ttm_bo_vm_reserve which rejects external
> > objects marked as unmappable. Do we need something like this?
> > 
> >         if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
> >                 if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
> >                         dma_resv_unlock(bo->base.resv);
> >                         return VM_FAULT_SIGBUS;
> >                 }
> >         }
> 
> Ah, right. This essentially blocks imported dma-bufs from being mapped
> here. I'll fix. This also reminds me to add a comment about reworking
> the TTM helpers so that this check moves into TTM.
> 
> /Thomas
> 
> > 
> > Matt
> > 
> > > +		if (err) {
> > > +			if (err != -ENOMEM && err != -ENOSPC)
> > > +				ret = xe_err_to_fault_t(err);
> > > +			goto out_unlock;
> > > +		}
> > > +	}
> > > +
> > > +	ret = __xe_bo_cpu_fault(vmf, xe, bo);
> > > +
> > > +out_unlock:
> > > +	dma_resv_unlock(tbo->base.resv);
> > > +out_validation:
> > > +	xe_validation_ctx_fini(&ctx);
> > > +out_pm:
> > > +	if (needs_rpm)
> > > +		xe_pm_runtime_put(xe);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
> > >  {
> > >  	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
> > >  	struct drm_device *ddev = tbo->base.dev;
> > >  	struct xe_device *xe = to_xe_device(ddev);
> > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > -	struct drm_exec *exec;
> > > +	bool retry_after_wait = false;
> > > +	struct xe_validation_ctx ctx;
> > > +	struct drm_exec exec;
> > >  	vm_fault_t ret;
> > > +	int err = 0;
> > >  	int idx;
> > >  
> > > +	if (!drm_dev_enter(&xe->drm, &idx))
> > > +		return ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > > >vm_page_prot);
> > > +
> > > +	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo, needs_rpm);
> > > +	if (ret != VM_FAULT_RETRY)
> > > +		goto out;
> > > +
> > > +	if (fault_flag_allow_retry_first(vmf->flags)) {
> > > +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> > > +			goto out;
> > > +		retry_after_wait = true;
> > > +		xe_bo_get(bo);
> > > +		mmap_read_unlock(vmf->vma->vm_mm);
> > > +	} else {
> > > +		ret = VM_FAULT_NOPAGE;
> > > +	}
> > > +
> > > +	/*
> > > +	 * The fastpath failed and we were not required to return
> > > and retry immediately.
> > > +	 * We're now running in one of two modes:
> > > +	 *
> > > +	 * 1) retry_after_wait == true: The mmap_read_lock() is
> > > dropped, and we're trying
> > > +	 * to resolve blocking waits. But we can't resolve the
> > > fault since the
> > > +	 * mmap_read_lock() is dropped. After retrying the fault,
> > > the aim is that the fastpath
> > > +	 * should succeed. But it may fail since we drop the bo
> > > lock.
> > > +	 *
> > > +	 * 2) retry_after_wait == false: The fastpath failed,
> > > typically even after
> > > +	 * a retry. Do whatever's necessary to resolve the fault.
> > > +	 *
> > > +	 * This construct is recommended to avoid excessive waits
> > > under the mmap_lock.
> > > +	 */
> > > +
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_get(xe);
> > >  
> > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > -	ret = ttm_bo_vm_reserve(tbo, vmf);
> > > -	if (ret)
> > > -		goto out;
> > > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > > xe_val_flags) {.interruptible = true},
> > > +			    err) {
> > > +		long lerr;
> > >  
> > > -	if (drm_dev_enter(ddev, &idx)) {
> > > -		trace_xe_bo_cpu_fault(bo);
> > > +		err = drm_exec_lock_obj(&exec, &tbo->base);
> > > +		drm_exec_retry_on_contention(&exec);
> > > +		if (err)
> > > +			break;
> > >  
> > > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > > -		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > > >vm_page_prot,
> > > -					      
> > > TTM_BO_VM_NUM_PREFAULT);
> > > -		drm_dev_exit(idx);
> > > +		lerr = dma_resv_wait_timeout(tbo->base.resv,
> > > +					    
> > > DMA_RESV_USAGE_KERNEL, true,
> > > +					    
> > > MAX_SCHEDULE_TIMEOUT);
> > > +		if (lerr < 0) {
> > > +			err = lerr;
> > > +			break;
> > > +		}
> > >  
> > > -		if (ret == VM_FAULT_RETRY &&
> > > -		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > -			goto out;
> > > +		if (!tbo->resource->bus.is_iomem) {
> > > +			struct ttm_operation_ctx tctx = {
> > > +				.interruptible = true,
> > > +				.no_wait_gpu = false,
> > > +				.gfp_retry_mayfail = true,
> > > +			};
> > >  
> > > -		/*
> > > -		 * ttm_bo_vm_reserve() already has dma_resv_lock.
> > > -		 */
> > > -		if (ret == VM_FAULT_NOPAGE &&
> > > -		    mem_type_is_vram(tbo->resource->mem_type)) {
> > > -			mutex_lock(&xe-
> > > >mem_access.vram_userfault.lock);
> > > -			if (list_empty(&bo->vram_userfault_link))
> > > -				list_add(&bo->vram_userfault_link,
> > > -					 &xe-
> > > >mem_access.vram_userfault.list);
> > > -			mutex_unlock(&xe-
> > > >mem_access.vram_userfault.lock);
> > > +			err = ttm_bo_populate(tbo, &tctx);
> > > +			xe_validation_retry_on_oom(&ctx, &err);
> > > +			if (err && (err == -EINTR || err == -
> > > ERESTARTSYS))

This if statement looks odd.

'err && (err == -EINTR || err == -ERESTARTSYS)'

The 'err &&' is not required in the way this logic is written.

Should this be:

if (err)
	break;
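
If the intent is just to bail out on signals while letting -ENOMEM fall
through, the equivalent minimal form would be (sketch):

	err = ttm_bo_populate(tbo, &tctx);
	xe_validation_retry_on_oom(&ctx, &err);
	if (err == -EINTR || err == -ERESTARTSYS)
		break;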

Or is this intended to fall back to ttm_bo_vm_fault_reserved(), which
calls ttm_bo_populate() without gfp_retry_mayfail, in OOM situations?

Also, you don't check for err == -EAGAIN here, which
ttm_bo_vm_fault_reserved() does on the return of ttm_bo_populate().

Can you explain the reasoing here? Also a few comments in code
explaining the reasoning of the error handling would be helpful.

Matt

> > > +				break;
> > >  		}
> > > -	} else {
> > > -		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > > >vm_page_prot);
> > > +		if (!retry_after_wait)
> > > +			ret = __xe_bo_cpu_fault(vmf, xe, bo);
> > >  	}
> > > +	if (err)
> > > +		ret = xe_err_to_fault_t(err);
> > >  
> > > -	dma_resv_unlock(tbo->base.resv);
> > > -out:
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_put(xe);
> > >  
> > > +	if (retry_after_wait)
> > > +		xe_bo_put(bo);
> > > +out:
> > > +	drm_dev_exit(idx);
> > > +
> > >  	return ret;
> > >  }
> > >  
> > > @@ -1807,7 +1938,7 @@ int xe_bo_read(struct xe_bo *bo, u64 offset,
> > > void *dst, int size)
> > >  }
> > >  
> > >  static const struct vm_operations_struct xe_gem_vm_ops = {
> > > -	.fault = xe_gem_fault,
> > > +	.fault = xe_bo_cpu_fault,
> > >  	.open = ttm_bo_vm_open,
> > >  	.close = ttm_bo_vm_close,
> > >  	.access = xe_bo_vm_access,
> > > diff --git a/drivers/gpu/drm/xe/xe_validation.c
> > > b/drivers/gpu/drm/xe/xe_validation.c
> > > index b90fda3dd5f4..826cd09966ef 100644
> > > --- a/drivers/gpu/drm/xe/xe_validation.c
> > > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > > @@ -241,7 +241,8 @@ int xe_validation_exec_lock(struct
> > > xe_validation_ctx *ctx,
> > >   */
> > >  void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > >  {
> > > -	drm_exec_fini(ctx->exec);
> > > +	if (ctx->exec)
> > > +		drm_exec_fini(ctx->exec);
> > >  	xe_validation_unlock(ctx);
> > >  }
> > >  
> > > -- 
> > > 2.50.1
> > > 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 09/16] drm/xe: Convert the CPU fault handler for exhaustive eviction
  2025-08-27 15:52       ` Matthew Brost
@ 2025-08-28  6:18         ` Thomas Hellström
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-08-28  6:18 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Wed, 2025-08-27 at 08:52 -0700, Matthew Brost wrote:
> On Wed, Aug 27, 2025 at 04:16:25PM +0200, Thomas Hellström wrote:
> > On Tue, 2025-08-26 at 15:53 -0700, Matthew Brost wrote:
> > > On Fri, Aug 22, 2025 at 11:40:23AM +0200, Thomas Hellström wrote:
> > > > The CPU fault handler may populate bos and migrate, and in
> > > > doing
> > > > so might interfere with other tasks validating.
> > > > 
> > > > Rework the CPU fault handler completely into a fastpath
> > > > and a slowpath. The fastpath trylocks only the validation lock
> > > > in read-mode. If that fails, there's a fallback to the
> > > > slowpath, where we do a full validation transaction.
> > > > 
> > > > This mandates open-coding of bo locking, bo idling and
> > > > bo populating, but we still call into TTM for fault
> > > > finalizing.
> > > > 
> > > > v2:
> > > > - Rework the CPU fault handler to actually take part in
> > > >   the exhaustive eviction scheme (Matthew Brost).
> > > > 
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_bo.c         | 191
> > > > ++++++++++++++++++++++++-
> > > > ----
> > > >  drivers/gpu/drm/xe/xe_validation.c |   3 +-
> > > >  2 files changed, 163 insertions(+), 31 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > > > b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 76e9c93826a2..686ca5d6038a 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1713,57 +1713,188 @@ static void xe_gem_object_close(struct
> > > > drm_gem_object *obj,
> > > >  	}
> > > >  }
> > > >  
> > > > -static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > +static vm_fault_t __xe_bo_cpu_fault(struct vm_fault *vmf,
> > > > struct
> > > > xe_device *xe, struct xe_bo *bo)
> > > > +{
> > > > +	vm_fault_t ret;
> > > > +
> > > > +	trace_xe_bo_cpu_fault(bo);
> > > > +
> > > > +	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > > > > vm_page_prot,
> > > > +				      
> > > > TTM_BO_VM_NUM_PREFAULT);
> > > > +	if (ret == VM_FAULT_NOPAGE &&
> > > > +	    mem_type_is_vram(bo->ttm.resource->mem_type)) {
> > > > +		mutex_lock(&xe-
> > > > >mem_access.vram_userfault.lock);
> > > > +		if (list_empty(&bo->vram_userfault_link))
> > > > +			list_add(&bo->vram_userfault_link,
> > > > +				 &xe-
> > > > > mem_access.vram_userfault.list);
> > > > +		mutex_unlock(&xe-
> > > > >mem_access.vram_userfault.lock);
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static vm_fault_t xe_err_to_fault_t(int err)
> > > > +{
> > > > +	switch (err) {
> > > > +	case 0:
> > > > +	case -EINTR:
> > > > +	case -ERESTARTSYS:
> > > > +	case -EAGAIN:
> > > > +		return VM_FAULT_NOPAGE;
> > > > +	case -ENOMEM:
> > > > +	case -ENOSPC:
> > > > +		return VM_FAULT_OOM;
> > > > +	default:
> > > > +		break;
> > > > +	}
> > > > +	return VM_FAULT_SIGBUS;
> > > > +}
> > > > +
> > > > +static vm_fault_t xe_bo_cpu_fault_fastpath(struct vm_fault
> > > > *vmf,
> > > > struct xe_device *xe,
> > > > +					   struct xe_bo *bo,
> > > > bool
> > > > needs_rpm)
> > > > +{
> > > > +	struct ttm_buffer_object *tbo = &bo->ttm;
> > > > +	vm_fault_t ret = VM_FAULT_RETRY;
> > > > +	struct xe_validation_ctx ctx;
> > > > +	int err;
> > > > +
> > > > +	if (needs_rpm && !xe_pm_runtime_get_if_active(xe))
> > > > +		return VM_FAULT_RETRY;
> > > > +
> > > > +	err = xe_validation_ctx_init(&ctx, &xe->val, NULL,
> > > > +				     (struct xe_val_flags) {
> > > > +					     .interruptible =
> > > > true,
> > > > +					     .no_block = true
> > > > +				     });
> > > > +	if (err)
> > > > +		goto out_pm;
> > > > +
> > > > +	if (!dma_resv_trylock(tbo->base.resv))
> > > > +		goto out_validation;
> > > > +
> > > > +	if (!dma_resv_test_signaled(tbo->base.resv,
> > > > DMA_RESV_USAGE_KERNEL))
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (!tbo->resource->bus.is_iomem) {
> > > > +		struct ttm_operation_ctx ctx = {
> > > > +			.interruptible = true,
> > > > +			.no_wait_gpu = true,
> > > > +			.gfp_retry_mayfail = true,
> > > > +		};
> > > > +
> > > > +		err = ttm_bo_populate(tbo, &ctx);
> > > 
> > > The version of the fault handler before didn't have a
> > > ttm_bo_populate
> > > call. Can you explain why it is added here?
> > 
> > It's called from within ttm_bo_vm_fault_reserved() but with a
> > blocking
> > ttm_operation_ctx. Here we call it non-blocking and if it succeeds
> > the
> > version in ttm_bo_vm_fault_reserved() will be a NOP.
> > 
> > The functionality is to bring in bos from swap if needed. Or to be
> > able
> > to access bos with deferred backing store allocation.
> > 
> 
> Ah, yes. I see that now. This made me notice another potential
> issue. See below.
> 
> > > 
> > > Also, we have this code in ttm_bo_vm_reserve which rejects
> > > external objects marked as unmappable. Do we need something like
> > > this?
> > > 
> > >         if (bo->ttm && (bo->ttm->page_flags &
> > > TTM_TT_FLAG_EXTERNAL))
> > > {
> > >                 if (!(bo->ttm->page_flags &
> > > TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
> > >                         dma_resv_unlock(bo->base.resv);
> > >                         return VM_FAULT_SIGBUS;
> > >                 }
> > >         }
> > 
> > Ah, right. This essentially blocks imported dma-bufs from being
> > mapped here. I'll fix. This reminds me to add a TODO about
> > reworking the TTM helpers so that this check moves into TTM.
> > 
> > /Thomas
> > 
> > > 
> > > Matt
> > > 
> > > > +		if (err) {
> > > > +			if (err != -ENOMEM && err != -ENOSPC)
> > > > +				ret = xe_err_to_fault_t(err);
> > > > +			goto out_unlock;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	ret = __xe_bo_cpu_fault(vmf, xe, bo);
> > > > +
> > > > +out_unlock:
> > > > +	dma_resv_unlock(tbo->base.resv);
> > > > +out_validation:
> > > > +	xe_validation_ctx_fini(&ctx);
> > > > +out_pm:
> > > > +	if (needs_rpm)
> > > > +		xe_pm_runtime_put(xe);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
> > > >  {
> > > >  	struct ttm_buffer_object *tbo = vmf->vma-
> > > > >vm_private_data;
> > > >  	struct drm_device *ddev = tbo->base.dev;
> > > >  	struct xe_device *xe = to_xe_device(ddev);
> > > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > > -	struct drm_exec *exec;
> > > > +	bool retry_after_wait = false;
> > > > +	struct xe_validation_ctx ctx;
> > > > +	struct drm_exec exec;
> > > >  	vm_fault_t ret;
> > > > +	int err = 0;
> > > >  	int idx;
> > > >  
> > > > +	if (!drm_dev_enter(&xe->drm, &idx))
> > > > +		return ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > > > > vm_page_prot);
> > > > +
> > > > +	ret = xe_bo_cpu_fault_fastpath(vmf, xe, bo,
> > > > needs_rpm);
> > > > +	if (ret != VM_FAULT_RETRY)
> > > > +		goto out;
> > > > +
> > > > +	if (fault_flag_allow_retry_first(vmf->flags)) {
> > > > +		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> > > > +			goto out;
> > > > +		retry_after_wait = true;
> > > > +		xe_bo_get(bo);
> > > > +		mmap_read_unlock(vmf->vma->vm_mm);
> > > > +	} else {
> > > > +		ret = VM_FAULT_NOPAGE;
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * The fastpath failed and we were not required to
> > > > return
> > > > and retry immediately.
> > > > +	 * We're now running in one of two modes:
> > > > +	 *
> > > > +	 * 1) retry_after_wait == true: The mmap_read_lock()
> > > > is
> > > > dropped, and we're trying
> > > > +	 * to resolve blocking waits. But we can't resolve the
> > > > fault since the
> > > > +	 * mmap_read_lock() is dropped. After retrying the
> > > > fault,
> > > > the aim is that the fastpath
> > > > +	 * should succeed. But it may fail since we drop the
> > > > bo
> > > > lock.
> > > > +	 *
> > > > +	 * 2) retry_after_wait == false: The fastpath failed,
> > > > typically even after
> > > > +	 * a retry. Do whatever's necessary to resolve the
> > > > fault.
> > > > +	 *
> > > > +	 * This construct is recommended to avoid excessive
> > > > waits
> > > > under the mmap_lock.
> > > > +	 */
> > > > +
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_get(xe);
> > > >  
> > > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > > -	ret = ttm_bo_vm_reserve(tbo, vmf);
> > > > -	if (ret)
> > > > -		goto out;
> > > > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > > > xe_val_flags) {.interruptible = true},
> > > > +			    err) {
> > > > +		long lerr;
> > > >  
> > > > -	if (drm_dev_enter(ddev, &idx)) {
> > > > -		trace_xe_bo_cpu_fault(bo);
> > > > +		err = drm_exec_lock_obj(&exec, &tbo->base);
> > > > +		drm_exec_retry_on_contention(&exec);
> > > > +		if (err)
> > > > +			break;
> > > >  
> > > > -		xe_validation_assert_exec(xe, exec, &tbo-
> > > > >base);
> > > > -		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma-
> > > > > vm_page_prot,
> > > > -					      
> > > > TTM_BO_VM_NUM_PREFAULT);
> > > > -		drm_dev_exit(idx);
> > > > +		lerr = dma_resv_wait_timeout(tbo->base.resv,
> > > > +					    
> > > > DMA_RESV_USAGE_KERNEL, true,
> > > > +					    
> > > > MAX_SCHEDULE_TIMEOUT);
> > > > +		if (lerr < 0) {
> > > > +			err = lerr;
> > > > +			break;
> > > > +		}
> > > >  
> > > > -		if (ret == VM_FAULT_RETRY &&
> > > > -		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > > -			goto out;
> > > > +		if (!tbo->resource->bus.is_iomem) {
> > > > +			struct ttm_operation_ctx tctx = {
> > > > +				.interruptible = true,
> > > > +				.no_wait_gpu = false,
> > > > +				.gfp_retry_mayfail = true,
> > > > +			};
> > > >  
> > > > -		/*
> > > > -		 * ttm_bo_vm_reserve() already has
> > > > dma_resv_lock.
> > > > -		 */
> > > > -		if (ret == VM_FAULT_NOPAGE &&
> > > > -		    mem_type_is_vram(tbo->resource->mem_type))
> > > > {
> > > > -			mutex_lock(&xe-
> > > > > mem_access.vram_userfault.lock);
> > > > -			if (list_empty(&bo-
> > > > >vram_userfault_link))
> > > > -				list_add(&bo-
> > > > >vram_userfault_link,
> > > > -					 &xe-
> > > > > mem_access.vram_userfault.list);
> > > > -			mutex_unlock(&xe-
> > > > > mem_access.vram_userfault.lock);
> > > > +			err = ttm_bo_populate(tbo, &tctx);
> > > > +			xe_validation_retry_on_oom(&ctx,
> > > > &err);
> > > > +			if (err && (err == -EINTR || err == -
> > > > ERESTARTSYS))
> 
> This if statement looks odd.
> 
> 'err && (err == -EINTR || err == -ERESTARTSYS)'
> 
> The 'err &&' is not required in the way this logic is written.
> 
> Should this be:
> 
> if (err)
> 	break;
> 
> Or is this an attempt to call ttm_bo_vm_fault_reserved, which calls
> ttm_bo_populate without gfp_retry_mayfail, in OOM situations?

Yes, the 'if (err &&' is there to short-circuit the rest of the
evaluation in the common case of err == 0. Not that I expect a
dramatic performance improvement, but that explains why it's written
this way. And yes, I should include -EAGAIN; if removing the
short-circuit is warranted, that should be a separate patch. Finally,
yes, the intention is to retry without gfp_retry_mayfail as a last
resort.

There is a VM_FAULT_OOM error code, but it seems parts of the mm don't
really expect it to be returned and end up printing a confusing
message from the OOM subsystem. Without gfp_retry_mayfail, on the
other hand, the OOM killer will from what I can tell be invoked for
the faulting process instead.
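
To spell out the intended flow with the comments you're asking for,
it's roughly something like this (sketch only, not the final patch):

	err = ttm_bo_populate(tbo, &tctx); /* gfp_retry_mayfail set */
	xe_validation_retry_on_oom(&ctx, &err);
	/* Back off on signals; the fault is simply retried. */
	if (err == -EINTR || err == -ERESTARTSYS || err == -EAGAIN)
		break;
	/*
	 * On -ENOMEM / -ENOSPC, deliberately fall through:
	 * ttm_bo_vm_fault_reserved() below repeats the populate
	 * without gfp_retry_mayfail as a last resort, so that the
	 * OOM killer can be invoked rather than returning
	 * VM_FAULT_OOM, which parts of the mm handle poorly.
	 */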

> 
> Also, you don't check err == -EAGAIN here, which
> ttm_bo_vm_fault_reserved does on the return of ttm_bo_populate.
> 
> Can you explain the reasoning here? Also, a few comments in the code
> explaining the error handling would be helpful.

Agreed. I'll add that.

/Thomas




> 
> Matt
> 
> > > > +				break;
> > > >  		}
> > > > -	} else {
> > > > -		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma-
> > > > > vm_page_prot);
> > > > +		if (!retry_after_wait)
> > > > +			ret = __xe_bo_cpu_fault(vmf, xe, bo);
> > > >  	}
> > > > +	if (err)
> > > > +		ret = xe_err_to_fault_t(err);
> > > >  
> > > > -	dma_resv_unlock(tbo->base.resv);
> > > > -out:
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_put(xe);
> > > >  
> > > > +	if (retry_after_wait)
> > > > +		xe_bo_put(bo);
> > > > +out:
> > > > +	drm_dev_exit(idx);
> > > > +
> > > >  	return ret;
> > > >  }
> > > >  
> > > > @@ -1807,7 +1938,7 @@ int xe_bo_read(struct xe_bo *bo, u64
> > > > offset,
> > > > void *dst, int size)
> > > >  }
> > > >  
> > > >  static const struct vm_operations_struct xe_gem_vm_ops = {
> > > > -	.fault = xe_gem_fault,
> > > > +	.fault = xe_bo_cpu_fault,
> > > >  	.open = ttm_bo_vm_open,
> > > >  	.close = ttm_bo_vm_close,
> > > >  	.access = xe_bo_vm_access,
> > > > diff --git a/drivers/gpu/drm/xe/xe_validation.c
> > > > b/drivers/gpu/drm/xe/xe_validation.c
> > > > index b90fda3dd5f4..826cd09966ef 100644
> > > > --- a/drivers/gpu/drm/xe/xe_validation.c
> > > > +++ b/drivers/gpu/drm/xe/xe_validation.c
> > > > @@ -241,7 +241,8 @@ int xe_validation_exec_lock(struct
> > > > xe_validation_ctx *ctx,
> > > >   */
> > > >  void xe_validation_ctx_fini(struct xe_validation_ctx *ctx)
> > > >  {
> > > > -	drm_exec_fini(ctx->exec);
> > > > +	if (ctx->exec)
> > > > +		drm_exec_fini(ctx->exec);
> > > >  	xe_validation_unlock(ctx);
> > > >  }
> > > >  
> > > > -- 
> > > > 2.50.1
> > > > 
> > 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
  2025-08-26 21:52   ` Matthew Brost
@ 2025-09-02 13:32     ` Thomas Hellström
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Hellström @ 2025-09-02 13:32 UTC (permalink / raw)
  To: Matthew Brost
  Cc: intel-xe, Joonas Lahtinen, Jani Nikula, Maarten Lankhorst,
	Matthew Auld

On Tue, 2025-08-26 at 14:52 -0700, Matthew Brost wrote:
> On Fri, Aug 22, 2025 at 11:40:28AM +0200, Thomas Hellström wrote:
> > Introduce an xe_bo_create_pin_map_novm() function that does not
> > take the drm_exec parameter, to simplify the conversion of many
> > callsites.
> > For the rest, ensure that the same drm_exec context that was used
> > for locking the vm is passed down to validation.
> > 
> > Use xe_validation_guard() where appropriate.
> > 
> > v2:
> > - Avoid gotos from within xe_validation_guard(). (Matt Brost)
> > - Break out the change to pf_provision_vf_lmem to a separate
> >   patch.
> > - Adapt to signature change of xe_validation_guard().
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  18 +--
> >  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  10 +-
> >  drivers/gpu/drm/xe/display/xe_hdcp_gsc.c      |   8 +-
> >  drivers/gpu/drm/xe/tests/xe_migrate.c         |   9 +-
> >  drivers/gpu/drm/xe/xe_bo.c                    |  52 +++++++-
> >  drivers/gpu/drm/xe/xe_bo.h                    |   6 +-
> >  drivers/gpu/drm/xe/xe_gsc.c                   |   8 +-
> >  drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c |  24 ++--
> >  drivers/gpu/drm/xe/xe_guc_engine_activity.c   |  13 +-
> >  drivers/gpu/drm/xe/xe_lmtt.c                  |  12 +-
> >  drivers/gpu/drm/xe/xe_lrc.c                   |   7 +-
> >  drivers/gpu/drm/xe/xe_migrate.c               |  20 ++-
> >  drivers/gpu/drm/xe/xe_oa.c                    |   6 +-
> >  drivers/gpu/drm/xe/xe_pt.c                    |  10 +-
> >  drivers/gpu/drm/xe/xe_pt.h                    |   3 +-
> >  drivers/gpu/drm/xe/xe_pxp_submit.c            |  34 +++--
> >  drivers/gpu/drm/xe/xe_vm.c                    | 121 +++++++++++---
> > ----
> >  17 files changed, 231 insertions(+), 130 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > index d96ba2b51065..8ea9a472113c 100644
> > --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> > @@ -42,11 +42,11 @@ struct intel_framebuffer
> > *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> >  	obj = ERR_PTR(-ENODEV);
> >  
> >  	if (!IS_DGFX(xe) && !XE_GT_WA(xe_root_mmio_gt(xe),
> > 22019338487_display)) {
> > -		obj = xe_bo_create_pin_map(xe,
> > xe_device_get_root_tile(xe),
> > -					   NULL, size,
> > -					   ttm_bo_type_kernel,
> > XE_BO_FLAG_SCANOUT |
> > -					   XE_BO_FLAG_STOLEN |
> > -					   XE_BO_FLAG_GGTT);
> > +		obj = xe_bo_create_pin_map_novm(xe,
> > xe_device_get_root_tile(xe),
> > +						size,
> > +						ttm_bo_type_kernel
> > , XE_BO_FLAG_SCANOUT |
> > +						XE_BO_FLAG_STOLEN
> > |
> > +						XE_BO_FLAG_GGTT,
> > false);
> 
> This was interruptible before, same for a few other display
> conversions.
> 
> I'm not familiar enough with display to know if this is ok.

So I added a comment in the cover letter regarding this. Basically,
where one can deduce that an -EINTR or -ERESTARTSYS might have
happened anyway, I've made the waits interruptible. For the other
instances I keep uninterruptible waits, but we probably want a
follow-up audit of the callers to see whether interruptible waits are
indeed possible there as well.
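
In other words, roughly (illustrative only; the actual callsites
differ):

	/*
	 * IOCTL path: waits could already return -EINTR /
	 * -ERESTARTSYS, so ask for interruptible waits.
	 */
	bo = xe_bo_create_pin_map_novm(xe, tile, size,
				       ttm_bo_type_kernel, flags, true);

	/*
	 * Driver-load path: keep waits uninterruptible for now,
	 * pending the follow-up audit.
	 */
	bo = xe_bo_create_pin_map_novm(xe, tile, size,
				       ttm_bo_type_kernel, flags, false);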

/Thomas

> 
> >  		if (!IS_ERR(obj))
> >  			drm_info(&xe->drm, "Allocated fbdev into
> > stolen\n");
> >  		else
> > @@ -54,10 +54,10 @@ struct intel_framebuffer
> > *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
> >  	}
> >  
> >  	if (IS_ERR(obj)) {
> > -		obj = xe_bo_create_pin_map(xe,
> > xe_device_get_root_tile(xe), NULL, size,
> > -					   ttm_bo_type_kernel,
> > XE_BO_FLAG_SCANOUT |
> > -					  
> > XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > -					   XE_BO_FLAG_GGTT);
> > +		obj = xe_bo_create_pin_map_novm(xe,
> > xe_device_get_root_tile(xe), size,
> > +						ttm_bo_type_kernel
> > , XE_BO_FLAG_SCANOUT |
> > +						XE_BO_FLAG_VRAM_IF
> > _DGFX(xe_device_get_root_tile(xe)) |
> > +						XE_BO_FLAG_GGTT,
> > false);
> >  	}
> >  
> >  	if (IS_ERR(obj)) {
> > diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > index 9f941fc2e36b..58581d7aaae6 100644
> > --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> > @@ -43,11 +43,11 @@ bool intel_dsb_buffer_create(struct intel_crtc
> > *crtc, struct intel_dsb_buffer *d
> >  		return false;
> >  
> >  	/* Set scanout flag for WC mapping */
> > -	obj = xe_bo_create_pin_map(xe,
> > xe_device_get_root_tile(xe),
> > -				   NULL, PAGE_ALIGN(size),
> > -				   ttm_bo_type_kernel,
> > -				  
> > XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > -				   XE_BO_FLAG_SCANOUT |
> > XE_BO_FLAG_GGTT);
> > +	obj = xe_bo_create_pin_map_novm(xe,
> > xe_device_get_root_tile(xe),
> > +					PAGE_ALIGN(size),
> > +					ttm_bo_type_kernel,
> > +					XE_BO_FLAG_VRAM_IF_DGFX(xe
> > _device_get_root_tile(xe)) |
> > +					XE_BO_FLAG_SCANOUT |
> > XE_BO_FLAG_GGTT, false);
> >  	if (IS_ERR(obj)) {
> >  		kfree(vma);
> >  		return false;
> > diff --git a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > index 30f1073141fc..4ae847b628e2 100644
> > --- a/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > +++ b/drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
> > @@ -72,10 +72,10 @@ static int
> > intel_hdcp_gsc_initialize_message(struct xe_device *xe,
> >  	int ret = 0;
> >  
> >  	/* allocate object of two page for HDCP command memory and
> > store it */
> > -	bo = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
> > NULL, PAGE_SIZE * 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe,
> > xe_device_get_root_tile(xe), PAGE_SIZE * 2,
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT, false);
> >  
> >  	if (IS_ERR(bo)) {
> >  		drm_err(&xe->drm, "Failed to allocate bo for HDCP
> > streaming command!\n");
> > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > index afa794e56065..5904d658d1f2 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > @@ -204,7 +204,8 @@ static void xe_migrate_sanity_test(struct
> > xe_migrate *m, struct kunit *test,
> >  
> >  	big = xe_bo_create_pin_map(xe, tile, m->q->vm, SZ_4M,
> >  				   ttm_bo_type_kernel,
> > -				   XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				   XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				   exec);
> >  	if (IS_ERR(big)) {
> >  		KUNIT_FAIL(test, "Failed to allocate bo: %li\n",
> > PTR_ERR(big));
> >  		goto vunmap;
> > @@ -212,7 +213,8 @@ static void xe_migrate_sanity_test(struct
> > xe_migrate *m, struct kunit *test,
> >  
> >  	pt = xe_bo_create_pin_map(xe, tile, m->q->vm,
> > XE_PAGE_SIZE,
> >  				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				  XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				  exec);
> >  	if (IS_ERR(pt)) {
> >  		KUNIT_FAIL(test, "Failed to allocate fake pt:
> > %li\n",
> >  			   PTR_ERR(pt));
> > @@ -222,7 +224,8 @@ static void xe_migrate_sanity_test(struct
> > xe_migrate *m, struct kunit *test,
> >  	tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
> >  				    2 * SZ_4K,
> >  				    ttm_bo_type_kernel,
> > -				   
> > XE_BO_FLAG_VRAM_IF_DGFX(tile));
> > +				    XE_BO_FLAG_VRAM_IF_DGFX(tile),
> > +				    exec);
> >  	if (IS_ERR(tiny)) {
> >  		KUNIT_FAIL(test, "Failed to allocate tiny fake pt:
> > %li\n",
> >  			   PTR_ERR(tiny));
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index d5172cb05078..7a62629c88e0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2464,16 +2464,59 @@ xe_bo_create_pin_map_at_novm(struct
> > xe_device *xe, struct xe_tile *tile,
> >  	return ret ? ERR_PTR(ret) : bo;
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map() - Create pinned and mapped bo
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the
> > tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel
> > bos.
> > + * @vm: The vm to associate the buffer object with. The vm's resv
> > must be locked
> > + * with the transaction represented by @exec.
> > + * @size: The storage size to use for the bo.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @exec: The drm_exec transaction to use for exhaustive eviction,
> > and
> > + * previously used for locking @vm's resv.
> > + *
> > + * Create a pinned and mapped bo, associated with @vm if non-NULL.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on
> > failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if
> > @exec was
> > + * configured for interruptible locking.
> > + */
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> > -				   enum ttm_bo_type type, u32
> > flags)
> > +				   enum ttm_bo_type type, u32
> > flags,
> > +				   struct drm_exec *exec)
> >  {
> > -	struct drm_exec *exec = vm ? xe_vm_validation_exec(vm) :
> > XE_VALIDATION_UNIMPLEMENTED;
> > -
> >  	return xe_bo_create_pin_map_at_aligned(xe, tile, vm, size,
> > ~0ull, type, flags,
> >  					       0, exec);
> >  }
> >  
> > +/**
> > + * xe_bo_create_pin_map_novm() - Create pinned and mapped bo
> > + * @xe: The xe device.
> > + * @tile: The tile to select for migration of this bo, and the
> > tile used for
> > + * GGTT binding if any. Only to be non-NULL for ttm_bo_type_kernel
> > bos.
> > + * @size: The storage size to use for the bo.
> > + * @type: The TTM buffer object type.
> > + * @flags: XE_BO_FLAG_ flags.
> > + * @intr: Whether to execute any waits for backing store
> > interruptibly.
> > + *
> > + * Create a pinned and mapped bo. The bo will be external and not
> > associated
> > + * with a VM.
> > + *
> > + * Return: The buffer object on success. Negative error pointer on
> > failure.
> > + * In particular, the function may return ERR_PTR(%-EINTR) if
> > @intr was set
> > + * to true on entry.
> > + */
> > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe,
> > struct xe_tile *tile,
> > +					size_t size, enum
> > ttm_bo_type type, u32 flags,
> > +					bool intr)
> > +{
> > +	return xe_bo_create_pin_map_at_novm(xe, tile, size, ~0ull,
> > type, flags, 0, intr);
> > +}
> > +
> >  static void __xe_bo_unpin_map_no_vm(void *arg)
> >  {
> >  	xe_bo_unpin_map_no_vm(arg);
> > @@ -2486,8 +2529,7 @@ struct xe_bo
> > *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
> >  	int ret;
> >  
> >  	KUNIT_STATIC_STUB_REDIRECT(xe_managed_bo_create_pin_map,
> > xe, tile, size, flags);
> > -
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, size,
> > ttm_bo_type_kernel, flags);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, size,
> > ttm_bo_type_kernel, flags, true);
> 
> This is a driver load call, so non-interruptable should be fine.
> 
> >  	if (IS_ERR(bo))
> >  		return bo;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_bo.h
> > b/drivers/gpu/drm/xe/xe_bo.h
> > index decd601c802d..6f46f928a0d4 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.h
> > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > @@ -108,7 +108,11 @@ struct xe_bo *xe_bo_create_user(struct
> > xe_device *xe, struct xe_vm *vm, size_t s
> >  				u16 cpu_caching, u32 flags, struct
> > drm_exec *exec);
> >  struct xe_bo *xe_bo_create_pin_map(struct xe_device *xe, struct
> > xe_tile *tile,
> >  				   struct xe_vm *vm, size_t size,
> > -				   enum ttm_bo_type type, u32
> > flags);
> > +				   enum ttm_bo_type type, u32
> > flags,
> > +				   struct drm_exec *exec);
> > +struct xe_bo *xe_bo_create_pin_map_novm(struct xe_device *xe,
> > struct xe_tile *tile,
> > +					size_t size, enum
> > ttm_bo_type type, u32 flags,
> > +					bool intr);
> >  struct xe_bo *
> >  xe_bo_create_pin_map_at_novm(struct xe_device *xe, struct xe_tile
> > *tile,
> >  			     size_t size, u64 offset, enum
> > ttm_bo_type type,
> > diff --git a/drivers/gpu/drm/xe/xe_gsc.c
> > b/drivers/gpu/drm/xe/xe_gsc.c
> > index f5ae28af60d4..83d61bf8ec62 100644
> > --- a/drivers/gpu/drm/xe/xe_gsc.c
> > +++ b/drivers/gpu/drm/xe/xe_gsc.c
> > @@ -136,10 +136,10 @@ static int query_compatibility_version(struct
> > xe_gsc *gsc)
> >  	u64 ggtt_offset;
> >  	int err;
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_VER_PKT_SZ *
> > 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, GSC_VER_PKT_SZ *
> > 2,
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT, false);
> >  	if (IS_ERR(bo)) {
> >  		xe_gt_err(gt, "failed to allocate bo for GSC
> > version query\n");
> >  		return PTR_ERR(bo);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > index c712111aa30d..44cc612b0a75 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> > @@ -55,12 +55,12 @@ static int pf_send_guc_save_vf_state(struct
> > xe_gt *gt, unsigned int vfid,
> >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > -				  ALIGN(size, PAGE_SIZE),
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT |
> > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > +				       ALIGN(size, PAGE_SIZE),
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT |
> > +				       XE_BO_FLAG_GGTT_INVALIDATE,
> > false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > @@ -91,12 +91,12 @@ static int pf_send_guc_restore_vf_state(struct
> > xe_gt *gt, unsigned int vfid,
> >  	xe_gt_assert(gt, size % sizeof(u32) == 0);
> >  	xe_gt_assert(gt, size == ndwords * sizeof(u32));
> >  
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL,
> > -				  ALIGN(size, PAGE_SIZE),
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > -				  XE_BO_FLAG_GGTT |
> > -				  XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile,
> > +				       ALIGN(size, PAGE_SIZE),
> > +				       ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > +				       XE_BO_FLAG_GGTT |
> > +				       XE_BO_FLAG_GGTT_INVALIDATE,
> > false);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > index 92e1f9f41b8c..2b99c1ebdd58 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_engine_activity.c
> > @@ -94,16 +94,17 @@ static int
> > allocate_engine_activity_buffers(struct xe_guc *guc,
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_bo *bo, *metadata_bo;
> >  
> > -	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile,
> > NULL, PAGE_ALIGN(metadata_size),
> > -					   ttm_bo_type_kernel,
> > XE_BO_FLAG_SYSTEM |
> > -					   XE_BO_FLAG_GGTT |
> > XE_BO_FLAG_GGTT_INVALIDATE);
> > +	metadata_bo = xe_bo_create_pin_map_novm(gt_to_xe(gt),
> > tile, PAGE_ALIGN(metadata_size),
> > +						ttm_bo_type_kernel
> > , XE_BO_FLAG_SYSTEM |
> > +						XE_BO_FLAG_GGTT |
> > XE_BO_FLAG_GGTT_INVALIDATE,
> > +						false);
> >  
> >  	if (IS_ERR(metadata_bo))
> >  		return PTR_ERR(metadata_bo);
> >  
> > -	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL,
> > PAGE_ALIGN(size),
> > -				  ttm_bo_type_kernel,
> > XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > -				  XE_BO_FLAG_GGTT |
> > XE_BO_FLAG_GGTT_INVALIDATE);
> > +	bo = xe_bo_create_pin_map_novm(gt_to_xe(gt), tile,
> > PAGE_ALIGN(size),
> > +				       ttm_bo_type_kernel,
> > XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > +				       XE_BO_FLAG_GGTT |
> > XE_BO_FLAG_GGTT_INVALIDATE, false);
> >  
> >  	if (IS_ERR(bo)) {
> >  		xe_bo_unpin_map_no_vm(metadata_bo);
> > diff --git a/drivers/gpu/drm/xe/xe_lmtt.c
> > b/drivers/gpu/drm/xe/xe_lmtt.c
> > index a78c9d474a6e..4ad468574174 100644
> > --- a/drivers/gpu/drm/xe/xe_lmtt.c
> > +++ b/drivers/gpu/drm/xe/xe_lmtt.c
> > @@ -67,12 +67,12 @@ static struct xe_lmtt_pt *lmtt_pt_alloc(struct
> > xe_lmtt *lmtt, unsigned int level
> >  		goto out;
> >  	}
> >  
> > -	bo = xe_bo_create_pin_map(lmtt_to_xe(lmtt),
> > lmtt_to_tile(lmtt), NULL,
> > -				  PAGE_ALIGN(lmtt->ops-
> > >lmtt_pte_size(level) *
> > -					     lmtt->ops-
> > >lmtt_pte_num(level)),
> > -				  ttm_bo_type_kernel,
> > -				 
> > XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > -				  XE_BO_FLAG_NEEDS_64K);
> > +	bo = xe_bo_create_pin_map_novm(lmtt_to_xe(lmtt),
> > lmtt_to_tile(lmtt),
> > +				       PAGE_ALIGN(lmtt->ops-
> > >lmtt_pte_size(level) *
> > +						  lmtt->ops-
> > >lmtt_pte_num(level)),
> > +				       ttm_bo_type_kernel,
> > +				      
> > XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) |
> > +				       XE_BO_FLAG_NEEDS_64K,
> > false);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto out_free_pt;
> > diff --git a/drivers/gpu/drm/xe/xe_lrc.c
> > b/drivers/gpu/drm/xe/xe_lrc.c
> > index 8f6c3ba47882..6d52e0eb97f5 100644
> > --- a/drivers/gpu/drm/xe/xe_lrc.c
> > +++ b/drivers/gpu/drm/xe/xe_lrc.c
> > @@ -1340,9 +1340,10 @@ static int xe_lrc_init(struct xe_lrc *lrc,
> > struct xe_hw_engine *hwe,
> >  	if (vm && vm->xef) /* userspace */
> >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> >  
> > -	lrc->bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size,
> > -				       ttm_bo_type_kernel,
> > -				       bo_flags);
> > +	lrc->bo = xe_bo_create_pin_map_novm(xe, tile,
> > +					    bo_size,
> > +					    ttm_bo_type_kernel,
> > +					    bo_flags, false);
> 
> This is in the IOCTL call path, so it should be interruptible, right?
> 
> >  	if (IS_ERR(lrc->bo))
> >  		return PTR_ERR(lrc->bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index 57e6d5a8ac39..b27388db42a5 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -35,6 +35,7 @@
> >  #include "xe_sched_job.h"
> >  #include "xe_sync.h"
> >  #include "xe_trace_bo.h"
> > +#include "xe_validation.h"
> >  #include "xe_vm.h"
> >  #include "xe_vram.h"
> >  
> > @@ -173,7 +174,7 @@ static void xe_migrate_program_identity(struct
> > xe_device *xe, struct xe_vm *vm,
> >  }
> >  
> >  static int xe_migrate_prepare_vm(struct xe_tile *tile, struct
> > xe_migrate *m,
> > -				 struct xe_vm *vm)
> > +				 struct xe_vm *vm, struct drm_exec
> > *exec)
> >  {
> >  	struct xe_device *xe = tile_to_xe(tile);
> >  	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> > @@ -200,7 +201,7 @@ static int xe_migrate_prepare_vm(struct xe_tile
> > *tile, struct xe_migrate *m,
> >  				  num_entries * XE_PAGE_SIZE,
> >  				  ttm_bo_type_kernel,
> >  				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > -				  XE_BO_FLAG_PAGETABLE);
> > +				  XE_BO_FLAG_PAGETABLE, exec);
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > @@ -404,6 +405,8 @@ int xe_migrate_init(struct xe_migrate *m)
> >  	struct xe_tile *tile = m->tile;
> >  	struct xe_gt *primary_gt = tile->primary_gt;
> >  	struct xe_device *xe = tile_to_xe(tile);
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	int err;
> >  
> > @@ -413,11 +416,16 @@ int xe_migrate_init(struct xe_migrate *m)
> >  	if (IS_ERR(vm))
> >  		return PTR_ERR(vm);
> >  
> > -	xe_vm_lock(vm, false);
> > -	err = xe_migrate_prepare_vm(tile, m, vm);
> > -	xe_vm_unlock(vm);
> > +	err = 0;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > xe_val_flags) {}, err) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		err = xe_migrate_prepare_vm(tile, m, vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		xe_validation_retry_on_oom(&ctx, &err);
> > +	}
> >  	if (err)
> > -		goto err_out;
> > +		return err;
> >  
> >  	if (xe->info.has_usm) {
> >  		struct xe_hw_engine *hwe =
> > xe_gt_hw_engine(primary_gt,
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c
> > b/drivers/gpu/drm/xe/xe_oa.c
> > index a188bad172ad..a4894eb0d7f3 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -883,9 +883,9 @@ static int xe_oa_alloc_oa_buffer(struct
> > xe_oa_stream *stream, size_t size)
> >  {
> >  	struct xe_bo *bo;
> >  
> > -	bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt-
> > >tile, NULL,
> > -				  size, ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(stream->oa->xe, stream->gt-
> > >tile,
> > +				       size, ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_GGTT, false);
> 
> This is in the IOCTL call path, so it should be interruptible, right?
> 
> Rest LGTM.
> 
> Matt
> 
> >  	if (IS_ERR(bo))
> >  		return PTR_ERR(bo);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > b/drivers/gpu/drm/xe/xe_pt.c
> > index f3a39e734a90..33ad40418ceb 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -88,6 +88,7 @@ static void xe_pt_free(struct xe_pt *pt)
> >   * @vm: The vm to create for.
> >   * @tile: The tile to create for.
> >   * @level: The page-table level.
> > + * @exec: The drm_exec object used to lock the vm.
> >   *
> >   * Allocate and initialize a single struct xe_pt metadata
> > structure. Also
> >   * create the corresponding page-table bo, but don't initialize
> > it. If the
> > @@ -99,7 +100,7 @@ static void xe_pt_free(struct xe_pt *pt)
> >   * error.
> >   */
> >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > -			   unsigned int level)
> > +			   unsigned int level, struct drm_exec
> > *exec)
> >  {
> >  	struct xe_pt *pt;
> >  	struct xe_bo *bo;
> > @@ -123,9 +124,11 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm,
> > struct xe_tile *tile,
> >  		bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE;
> >  
> >  	pt->level = level;
> > +
> > +	drm_WARN_ON(&vm->xe->drm, IS_ERR_OR_NULL(exec));
> >  	bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
> >  				  ttm_bo_type_kernel,
> > -				  bo_flags);
> > +				  bo_flags, exec);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto err_kfree;
> > @@ -589,7 +592,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent,
> > pgoff_t offset,
> >  	if (covers || !*child) {
> >  		u64 flags = 0;
> >  
> > -		xe_child = xe_pt_create(xe_walk->vm, xe_walk-
> > >tile, level - 1);
> > +		xe_child = xe_pt_create(xe_walk->vm, xe_walk-
> > >tile, level - 1,
> > +					xe_vm_validation_exec(vm))
> > ;
> >  		if (IS_ERR(xe_child))
> >  			return PTR_ERR(xe_child);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pt.h
> > b/drivers/gpu/drm/xe/xe_pt.h
> > index 5ecf003d513c..4daeebaab5a1 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.h
> > +++ b/drivers/gpu/drm/xe/xe_pt.h
> > @@ -10,6 +10,7 @@
> >  #include "xe_pt_types.h"
> >  
> >  struct dma_fence;
> > +struct drm_exec;
> >  struct xe_bo;
> >  struct xe_device;
> >  struct xe_exec_queue;
> > @@ -29,7 +30,7 @@ struct xe_vma_ops;
> >  unsigned int xe_pt_shift(unsigned int level);
> >  
> >  struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
> > -			   unsigned int level);
> > +			   unsigned int level, struct drm_exec
> > *exec);
> >  
> >  void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
> >  			  struct xe_pt *pt);
> > diff --git a/drivers/gpu/drm/xe/xe_pxp_submit.c
> > b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > index ca95f2a4d4ef..e60526e30030 100644
> > --- a/drivers/gpu/drm/xe/xe_pxp_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_pxp_submit.c
> > @@ -54,8 +54,9 @@ static int
> > allocate_vcs_execution_resources(struct xe_pxp *pxp)
> >  	 * Each termination is 16 DWORDS, so 4K is enough to
> > contain a
> >  	 * termination for each sessions.
> >  	 */
> > -	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4K,
> > ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
> > +	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K,
> > ttm_bo_type_kernel,
> > +				       XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT,
> > +				       false);
> >  	if (IS_ERR(bo)) {
> >  		err = PTR_ERR(bo);
> >  		goto out_queue;
> > @@ -87,7 +88,9 @@ static int allocate_gsc_client_resources(struct
> > xe_gt *gt,
> >  {
> >  	struct xe_tile *tile = gt_to_tile(gt);
> >  	struct xe_device *xe = tile_to_xe(tile);
> > +	struct xe_validation_ctx ctx;
> >  	struct xe_hw_engine *hwe;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	struct xe_bo *bo;
> >  	struct xe_exec_queue *q;
> > @@ -106,15 +109,26 @@ static int
> > allocate_gsc_client_resources(struct xe_gt *gt,
> >  		return PTR_ERR(vm);
> >  
> >  	/* We allocate a single object for the batch and the
> > in/out memory */
> > -	xe_vm_lock(vm, false);
> > -	bo = xe_bo_create_pin_map(xe, tile, vm, PXP_BB_SIZE +
> > inout_size * 2,
> > -				  ttm_bo_type_kernel,
> > -				  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_PINNED | XE_BO_FLAG_NEEDS_UC);
> > -	xe_vm_unlock(vm);
> > -	if (IS_ERR(bo)) {
> > -		err = PTR_ERR(bo);
> > -		goto vm_out;
> > +
> > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > xe_val_flags){}, err) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (err)
> > +			break;
> > +
> > +		bo = xe_bo_create_pin_map(xe, tile, vm,
> > PXP_BB_SIZE + inout_size * 2,
> > +					  ttm_bo_type_kernel,
> > +					  XE_BO_FLAG_SYSTEM |
> > XE_BO_FLAG_PINNED |
> > +					  XE_BO_FLAG_NEEDS_UC,
> > &exec);
> > +		drm_exec_retry_on_contention(&exec);
> > +		if (IS_ERR(bo)) {
> > +			err = PTR_ERR(bo);
> > +			xe_validation_retry_on_oom(&ctx, &err);
> > +			break;
> > +		}
> >  	}
> > +	if (err)
> > +		goto vm_out;
> >  
> >  	fence = xe_vm_bind_kernel_bo(vm, bo, NULL, 0,
> > XE_CACHE_WB);
> >  	if (IS_ERR(fence)) {
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 23015f369e34..0d8414bd6caa 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -1603,6 +1603,7 @@ static void vm_destroy_work_func(struct
> > work_struct *w);
> >   * @xe: xe device.
> >   * @tile: tile to set up for.
> >   * @vm: vm to set up for.
> > + * @exec: The struct drm_exec object used to lock the vm resv.
> >   *
> >   * Sets up a pagetable tree with one page-table per level and a
> > single
> >   * leaf PTE. All pagetable entries point to the single page-table
> > or,
> > @@ -1612,20 +1613,19 @@ static void vm_destroy_work_func(struct
> > work_struct *w);
> >   * Return: 0 on success, negative error code on error.
> >   */
> >  static int xe_vm_create_scratch(struct xe_device *xe, struct
> > xe_tile *tile,
> > -				struct xe_vm *vm)
> > +				struct xe_vm *vm, struct drm_exec
> > *exec)
> >  {
> >  	u8 id = tile->id;
> >  	int i;
> >  
> >  	for (i = MAX_HUGEPTE_LEVEL; i < vm->pt_root[id]->level;
> > i++) {
> > -		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i);
> > +		vm->scratch_pt[id][i] = xe_pt_create(vm, tile, i,
> > exec);
> >  		if (IS_ERR(vm->scratch_pt[id][i])) {
> >  			int err = PTR_ERR(vm->scratch_pt[id][i]);
> >  
> >  			vm->scratch_pt[id][i] = NULL;
> >  			return err;
> >  		}
> > -
> >  		xe_pt_populate_empty(tile, vm, vm-
> > >scratch_pt[id][i]);
> >  	}
> >  
> > @@ -1653,9 +1653,26 @@ static void xe_vm_free_scratch(struct xe_vm
> > *vm)
> >  	}
> >  }
> >  
> > +static void xe_vm_pt_destroy(struct xe_vm *vm)
> > +{
> > +	struct xe_tile *tile;
> > +	u8 id;
> > +
> > +	xe_vm_assert_held(vm);
> > +
> > +	for_each_tile(tile, vm->xe, id) {
> > +		if (vm->pt_root[id]) {
> > +			xe_pt_destroy(vm->pt_root[id], vm->flags,
> > NULL);
> > +			vm->pt_root[id] = NULL;
> > +		}
> > +	}
> > +}
> > +
> >  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct
> > xe_file *xef)
> >  {
> >  	struct drm_gem_object *vm_resv_obj;
> > +	struct xe_validation_ctx ctx;
> > +	struct drm_exec exec;
> >  	struct xe_vm *vm;
> >  	int err, number_tiles = 0;
> >  	struct xe_tile *tile;
> > @@ -1742,49 +1759,68 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  
> >  	drm_gem_object_put(vm_resv_obj);
> >  
> > -	err = xe_vm_lock(vm, true);
> > -	if (err)
> > -		goto err_close;
> > +	err = 0;
> > +	xe_validation_guard(&ctx, &xe->val, &exec, (struct
> > xe_val_flags) {.interruptible = true},
> > +			    err) {
> > +		err = xe_vm_drm_exec_lock(vm, &exec);
> > +		drm_exec_retry_on_contention(&exec);
> >  
> > -	if (IS_DGFX(xe) && xe->info.vram_flags &
> > XE_VRAM_FLAGS_NEED64K)
> > -		vm->flags |= XE_VM_FLAG_64K;
> > +		if (IS_DGFX(xe) && xe->info.vram_flags &
> > XE_VRAM_FLAGS_NEED64K)
> > +			vm->flags |= XE_VM_FLAG_64K;
> >  
> > -	for_each_tile(tile, xe, id) {
> > -		if (flags & XE_VM_FLAG_MIGRATION &&
> > -		    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > -			continue;
> > +		for_each_tile(tile, xe, id) {
> > +			if (flags & XE_VM_FLAG_MIGRATION &&
> > +			    tile->id != XE_VM_FLAG_TILE_ID(flags))
> > +				continue;
> >  
> > -		vm->pt_root[id] = xe_pt_create(vm, tile, xe-
> > >info.vm_max_level);
> > -		if (IS_ERR(vm->pt_root[id])) {
> > -			err = PTR_ERR(vm->pt_root[id]);
> > -			vm->pt_root[id] = NULL;
> > -			goto err_unlock_close;
> > +			vm->pt_root[id] = xe_pt_create(vm, tile,
> > xe->info.vm_max_level,
> > +						       &exec);
> > +			if (IS_ERR(vm->pt_root[id])) {
> > +				err = PTR_ERR(vm->pt_root[id]);
> > +				vm->pt_root[id] = NULL;
> > +				xe_vm_pt_destroy(vm);
> > +				drm_exec_retry_on_contention(&exec
> > );
> > +				xe_validation_retry_on_oom(&ctx,
> > &err);
> > +				break;
> > +			}
> >  		}
> > -	}
> > +		if (err)
> > +			break;
> >  
> > -	if (xe_vm_has_scratch(vm)) {
> > -		for_each_tile(tile, xe, id) {
> > -			if (!vm->pt_root[id])
> > -				continue;
> > +		if (xe_vm_has_scratch(vm)) {
> > +			for_each_tile(tile, xe, id) {
> > +				if (!vm->pt_root[id])
> > +					continue;
> >  
> > -			err = xe_vm_create_scratch(xe, tile, vm);
> > +				err = xe_vm_create_scratch(xe,
> > tile, vm, &exec);
> > +				if (err) {
> > +					xe_vm_free_scratch(vm);
> > +					xe_vm_pt_destroy(vm);
> > +					drm_exec_retry_on_contenti
> > on(&exec);
> > +					xe_validation_retry_on_oom
> > (&ctx, &err);
> > +					break;
> > +				}
> > +			}
> >  			if (err)
> > -				goto err_unlock_close;
> > +				break;
> > +			vm->batch_invalidate_tlb = true;
> >  		}
> > -		vm->batch_invalidate_tlb = true;
> > -	}
> >  
> > -	if (vm->flags & XE_VM_FLAG_LR_MODE)
> > -		vm->batch_invalidate_tlb = false;
> > +		if (vm->flags & XE_VM_FLAG_LR_MODE) {
> > +			INIT_WORK(&vm->preempt.rebind_work,
> > preempt_rebind_work_func);
> > +			vm->batch_invalidate_tlb = false;
> > +		}
> >  
> > -	/* Fill pt_root after allocating scratch tables */
> > -	for_each_tile(tile, xe, id) {
> > -		if (!vm->pt_root[id])
> > -			continue;
> > +		/* Fill pt_root after allocating scratch tables */
> > +		for_each_tile(tile, xe, id) {
> > +			if (!vm->pt_root[id])
> > +				continue;
> >  
> > -		xe_pt_populate_empty(tile, vm, vm->pt_root[id]);
> > +			xe_pt_populate_empty(tile, vm, vm-
> > >pt_root[id]);
> > +		}
> >  	}
> > -	xe_vm_unlock(vm);
> > +	if (err)
> > +		goto err_close;
> >  
> >  	/* Kernel migration VM shouldn't have a circular loop.. */
> >  	if (!(flags & XE_VM_FLAG_MIGRATION)) {
> > @@ -1817,7 +1853,7 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  				      &xe->usm.next_asid,
> > GFP_KERNEL);
> >  		up_write(&xe->usm.lock);
> >  		if (err < 0)
> > -			goto err_unlock_close;
> > +			goto err_close;
> >  
> >  		vm->usm.asid = asid;
> >  	}
> > @@ -1826,8 +1862,6 @@ struct xe_vm *xe_vm_create(struct xe_device
> > *xe, u32 flags, struct xe_file *xef)
> >  
> >  	return vm;
> >  
> > -err_unlock_close:
> > -	xe_vm_unlock(vm);
> >  err_close:
> >  	xe_vm_close_and_put(vm);
> >  	return ERR_PTR(err);
> > @@ -1956,13 +1990,7 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> >  	 * destroy the pagetables immediately.
> >  	 */
> >  	xe_vm_free_scratch(vm);
> > -
> > -	for_each_tile(tile, xe, id) {
> > -		if (vm->pt_root[id]) {
> > -			xe_pt_destroy(vm->pt_root[id], vm->flags,
> > NULL);
> > -			vm->pt_root[id] = NULL;
> > -		}
> > -	}
> > +	xe_vm_pt_destroy(vm);
> >  	xe_vm_unlock(vm);
> >  
> >  	/*
> > @@ -3857,7 +3885,6 @@ struct dma_fence *xe_vm_bind_kernel_bo(struct
> > xe_vm *vm, struct xe_bo *bo,
> >   */
> >  int xe_vm_lock(struct xe_vm *vm, bool intr)
> >  {
> > -	struct drm_exec *exec = XE_VALIDATION_UNIMPLEMENTED;
> >  	int ret;
> >  
> >  	if (intr)
> > @@ -3865,9 +3892,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> >  	else
> >  		ret = dma_resv_lock(xe_vm_resv(vm), NULL);
> >  
> > -	if (!ret)
> > -		xe_vm_set_validation_exec(vm, exec);
> > -
> >  	return ret;
> >  }
> >  
> > @@ -3879,7 +3903,6 @@ int xe_vm_lock(struct xe_vm *vm, bool intr)
> >   */
> >  void xe_vm_unlock(struct xe_vm *vm)
> >  {
> > -	xe_vm_set_validation_exec(vm, NULL);
> >  	dma_resv_unlock(xe_vm_resv(vm));
> >  }
> >  
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2025-09-02 13:33 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22  9:40 [PATCH v2 00/16] Driver-managed exhaustive eviction Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 01/16] drm/xe/vm: Don't pin the vm_resv during validation Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 02/16] drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 03/16] drm/xe/vm: Clear the scratch_pt pointer on error Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 04/16] drm/xe: Pass down drm_exec context to validation Thomas Hellström
2025-08-22 19:59   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 05/16] drm/xe: Introduce an xe_validation wrapper around drm_exec Thomas Hellström
2025-08-26 20:42   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 06/16] drm/xe: Convert xe_bo_create_user() for exhaustive eviction Thomas Hellström
2025-08-23  9:32   ` Simon Richter
2025-08-22  9:40 ` [PATCH v2 07/16] drm/xe: Convert SVM validation " Thomas Hellström
2025-08-22 19:13   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 08/16] drm/xe: Convert existing drm_exec transactions " Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 09/16] drm/xe: Convert the CPU fault handler " Thomas Hellström
2025-08-26 22:53   ` Matthew Brost
2025-08-27 14:16     ` Thomas Hellström
2025-08-27 15:52       ` Matthew Brost
2025-08-28  6:18         ` Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 10/16] drm/xe/display: Convert __xe_pin_fb_vma() Thomas Hellström
2025-08-26 21:29   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 11/16] drm/xe: Convert xe_dma_buf.c for exhaustive eviction Thomas Hellström
2025-08-26 21:16   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 12/16] drm/xe: Rename ___xe_bo_create_locked() Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 13/16] drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Thomas Hellström
2025-08-26 21:27   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 14/16] drm/xe: Convert xe_bo_create_pin_map() " Thomas Hellström
2025-08-26 21:52   ` Matthew Brost
2025-09-02 13:32     ` Thomas Hellström
2025-08-22  9:40 ` [PATCH v2 15/16] drm/xe/sriov: Convert pf_provision_vf_lmem " Thomas Hellström
2025-08-22 19:35   ` Matthew Brost
2025-08-22  9:40 ` [PATCH v2 16/16] drm/xe: Convert pinned suspend eviction " Thomas Hellström
2025-08-26 22:08   ` Matthew Brost
2025-08-22 10:50 ` ✗ CI.checkpatch: warning for Driver-managed exhaustive eviction (rev2) Patchwork
2025-08-22 10:51 ` ✓ CI.KUnit: success " Patchwork
2025-08-22 11:31 ` ✓ Xe.CI.BAT: " Patchwork
2025-08-23  4:17 ` ✗ Xe.CI.Full: failure " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).