* [PATCH v4 01/12] drm/xe: Fine grained page fault locking
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-02-26 4:28 ` [PATCH v4 02/12] drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock Matthew Brost
` (15 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Enable page faults to be serviced while holding vm->lock in read mode.
Introduce additional locks to:
- Ensure only one page fault thread services a given range or VMA
- Serialize SVM garbage collection
- Protect SVM range insertion and removal
While these locks may contend during page faults, expensive operations
like migration can now run in parallel within a single VM.
In addition to new locking, ranges must be reference-counted after
lookup, as another thread could immediately remove them from the GPU SVM
tree, potentially dropping the last reference.
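In pseudo-code, the fault-handler pattern this enables is roughly the
following (condensed from the __xe_svm_handle_pagefault() changes below;
fault_addr stands in for the faulting address):

retry:
	/* Lookup takes a reference under vm->svm.range_lock */
	range = xe_svm_range_find_or_insert(vm, fault_addr, vma, &ctx);

	mutex_lock(&range->lock);
	if (xe_svm_range_is_removed(range)) {
		/* Lost a race with the garbage collector, look up again */
		mutex_unlock(&range->lock);
		drm_gpusvm_range_put(&range->base);
		goto retry;
	}

	/* Migrate, get pages, and bind, serialized per range */

	mutex_unlock(&range->lock);
	drm_gpusvm_range_put(&range->base);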
Lastly, decouple the VM’s ASID from the page fault queue selection to
allow parallel page fault handling within the same VM.
This also lays the groundwork for prefetch IOCTLs to use threaded migration.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/drm_gpusvm.c | 2 +-
drivers/gpu/drm/xe/xe_device_types.h | 2 +
drivers/gpu/drm/xe/xe_pagefault.c | 100 +++++++++++++++------------
drivers/gpu/drm/xe/xe_svm.c | 92 +++++++++++++++++-------
drivers/gpu/drm/xe/xe_svm.h | 44 ++++++++++++
drivers/gpu/drm/xe/xe_userptr.c | 20 +++++-
drivers/gpu/drm/xe/xe_vm.c | 40 +++++++++--
drivers/gpu/drm/xe/xe_vm_types.h | 24 ++++++-
8 files changed, 243 insertions(+), 81 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 35dd07297dd0..c71dba009d32 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -1624,7 +1624,7 @@ void drm_gpusvm_unmap_pages(struct drm_gpusvm *gpusvm,
const struct drm_gpusvm_ctx *ctx)
{
if (ctx->in_notifier)
- lockdep_assert_held_write(&gpusvm->notifier_lock);
+ lockdep_assert_held(&gpusvm->notifier_lock);
else
drm_gpusvm_notifier_lock(gpusvm);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 8f3ef836541e..1eb0fe118940 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -304,6 +304,8 @@ struct xe_device {
struct xarray asid_to_vm;
/** @usm.next_asid: next ASID, used to cyclical alloc asids */
u32 next_asid;
+ /** @usm.current_pf_queue: current page fault queue */
+ u32 current_pf_queue;
/** @usm.lock: protects UM state */
struct rw_semaphore lock;
/** @usm.pf_wq: page fault work queue, unbound, high priority */
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index ea4857acf28d..421262c2a63a 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -71,9 +71,9 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
struct xe_validation_ctx ctx;
struct drm_exec exec;
struct dma_fence *fence;
- int err, needs_vram;
+ int err = 0, needs_vram;
- lockdep_assert_held_write(&vm->lock);
+ lockdep_assert_held(&vm->lock);
needs_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
if (needs_vram < 0 || (needs_vram && xe_vma_is_userptr(vma)))
@@ -85,50 +85,52 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
trace_xe_vma_pagefault(vma);
+ guard(mutex)(&vma->fault_lock);
+
/* Check if VMA is valid, opportunistic check only */
if (xe_vm_has_valid_gpu_mapping(tile, vma->tile_present,
vma->tile_invalidated) && !atomic)
return 0;
-retry_userptr:
- if (xe_vma_is_userptr(vma) &&
- xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
- struct xe_userptr_vma *uvma = to_userptr_vma(vma);
+ do {
+ if (xe_vma_is_userptr(vma) &&
+ xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
+ struct xe_userptr_vma *uvma = to_userptr_vma(vma);
- err = xe_vma_userptr_pin_pages(uvma);
- if (err)
- return err;
- }
+ err = xe_vma_userptr_pin_pages(uvma);
+ if (err)
+ return err;
+ }
- /* Lock VM and BOs dma-resv */
- xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct xe_val_flags) {});
- drm_exec_until_all_locked(&exec) {
- err = xe_pagefault_begin(&exec, vma, tile->mem.vram,
- needs_vram == 1);
- drm_exec_retry_on_contention(&exec);
- xe_validation_retry_on_oom(&ctx, &err);
- if (err)
- goto unlock_dma_resv;
-
- /* Bind VMA only to the GT that has faulted */
- trace_xe_vma_pf_bind(vma);
- xe_vm_set_validation_exec(vm, &exec);
- fence = xe_vma_rebind(vm, vma, BIT(tile->id));
- xe_vm_set_validation_exec(vm, NULL);
- if (IS_ERR(fence)) {
- err = PTR_ERR(fence);
+ /* Lock VM and BOs dma-resv */
+ xe_validation_ctx_init(&ctx, &vm->xe->val, &exec,
+ (struct xe_val_flags) {});
+ drm_exec_until_all_locked(&exec) {
+ err = xe_pagefault_begin(&exec, vma, tile->mem.vram,
+ needs_vram == 1);
+ drm_exec_retry_on_contention(&exec);
xe_validation_retry_on_oom(&ctx, &err);
- goto unlock_dma_resv;
+ if (err)
+ break;
+
+ /* Bind VMA only to the GT that has faulted */
+ trace_xe_vma_pf_bind(vma);
+ xe_vm_set_validation_exec(vm, &exec);
+ fence = xe_vma_rebind(vm, vma, BIT(tile->id));
+ xe_vm_set_validation_exec(vm, NULL);
+ if (IS_ERR(fence)) {
+ err = PTR_ERR(fence);
+ xe_validation_retry_on_oom(&ctx, &err);
+ break;
+ }
}
- }
+ xe_validation_ctx_fini(&ctx);
+ } while (err == -EAGAIN);
- dma_fence_wait(fence, false);
- dma_fence_put(fence);
-
-unlock_dma_resv:
- xe_validation_ctx_fini(&ctx);
- if (err == -EAGAIN)
- goto retry_userptr;
+ if (!err) {
+ dma_fence_wait(fence, false);
+ dma_fence_put(fence);
+ }
return err;
}
@@ -171,10 +173,7 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
if (IS_ERR(vm))
return PTR_ERR(vm);
- /*
- * TODO: Change to read lock? Using write lock for simplicity.
- */
- down_write(&vm->lock);
+ down_read(&vm->lock);
if (xe_vm_is_closed(vm)) {
err = -ENOENT;
@@ -198,7 +197,7 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
unlock_vm:
if (!err)
vm->usm.last_fault_vma = vma;
- up_write(&vm->lock);
+ up_read(&vm->lock);
xe_vm_put(vm);
return err;
@@ -418,6 +417,19 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
xe_pagefault_entry_size();
}
+/*
+ * This function can race with multiple page fault producers, but worst case we
+ * stick a page fault on the same queue for consumption.
+ */
+static int xe_pagefault_queue_index(struct xe_device *xe)
+{
+ u32 old_pf_queue = READ_ONCE(xe->usm.current_pf_queue);
+
+ WRITE_ONCE(xe->usm.current_pf_queue, (old_pf_queue + 1));
+
+ return old_pf_queue % XE_PAGEFAULT_QUEUE_COUNT;
+}
+
/**
* xe_pagefault_handler() - Page fault handler
* @xe: xe device instance
@@ -430,8 +442,8 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
*/
int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
{
- struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue +
- (pf->consumer.asid % XE_PAGEFAULT_QUEUE_COUNT);
+ int queue_index = xe_pagefault_queue_index(xe);
+ struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue + queue_index;
unsigned long flags;
bool full;
@@ -445,7 +457,7 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
} else {
drm_warn(&xe->drm,
"PageFault Queue (%d) full, shouldn't be possible\n",
- pf->consumer.asid % XE_PAGEFAULT_QUEUE_COUNT);
+ queue_index);
}
spin_unlock_irqrestore(&pf_queue->lock, flags);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 002b6c22ad3f..3e59695e0c01 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -114,6 +114,7 @@ xe_svm_range_alloc(struct drm_gpusvm *gpusvm)
return NULL;
INIT_LIST_HEAD(&range->garbage_collector_link);
+ mutex_init(&range->lock);
xe_vm_get(gpusvm_to_vm(gpusvm));
return &range->base;
@@ -121,6 +122,7 @@ xe_svm_range_alloc(struct drm_gpusvm *gpusvm)
static void xe_svm_range_free(struct drm_gpusvm_range *range)
{
+ mutex_destroy(&to_xe_range(range)->lock);
xe_vm_put(range_to_vm(range));
kfree(range);
}
@@ -135,11 +137,11 @@ xe_svm_garbage_collector_add_range(struct xe_vm *vm, struct xe_svm_range *range,
drm_gpusvm_range_set_unmapped(&range->base, mmu_range);
- spin_lock(&vm->svm.garbage_collector.lock);
+ spin_lock(&vm->svm.garbage_collector.list_lock);
if (list_empty(&range->garbage_collector_link))
list_add_tail(&range->garbage_collector_link,
&vm->svm.garbage_collector.range_list);
- spin_unlock(&vm->svm.garbage_collector.lock);
+ spin_unlock(&vm->svm.garbage_collector.list_lock);
queue_work(xe->usm.pf_wq, &vm->svm.garbage_collector.work);
}
@@ -297,16 +299,24 @@ static int __xe_svm_garbage_collector(struct xe_vm *vm,
{
struct dma_fence *fence;
- range_debug(range, "GARBAGE COLLECTOR");
+ scoped_guard(mutex, &range->lock) {
+ drm_gpusvm_range_get(&range->base);
+ range->removed = true;
- xe_vm_lock(vm, false);
- fence = xe_vm_range_unbind(vm, range);
- xe_vm_unlock(vm);
- if (IS_ERR(fence))
- return PTR_ERR(fence);
- dma_fence_put(fence);
+ range_debug(range, "GARBAGE COLLECTOR");
+
+ xe_vm_lock(vm, false);
+ fence = xe_vm_range_unbind(vm, range);
+ xe_vm_unlock(vm);
+ if (IS_ERR(fence))
+ return PTR_ERR(fence);
+ dma_fence_put(fence);
- drm_gpusvm_range_remove(&vm->svm.gpusvm, &range->base);
+ scoped_guard(mutex, &vm->svm.range_lock)
+ drm_gpusvm_range_remove(&vm->svm.gpusvm, &range->base);
+ }
+
+ drm_gpusvm_range_put(&range->base);
return 0;
}
@@ -378,13 +388,15 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
u64 range_end;
int err, ret = 0;
- lockdep_assert_held_write(&vm->lock);
+ lockdep_assert_held(&vm->lock);
if (xe_vm_is_closed_or_banned(vm))
return -ENOENT;
+ guard(mutex)(&vm->svm.garbage_collector.lock);
+
for (;;) {
- spin_lock(&vm->svm.garbage_collector.lock);
+ spin_lock(&vm->svm.garbage_collector.list_lock);
range = list_first_entry_or_null(&vm->svm.garbage_collector.range_list,
typeof(*range),
garbage_collector_link);
@@ -395,7 +407,7 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
range_end = xe_svm_range_end(range);
list_del(&range->garbage_collector_link);
- spin_unlock(&vm->svm.garbage_collector.lock);
+ spin_unlock(&vm->svm.garbage_collector.list_lock);
err = __xe_svm_garbage_collector(vm, range);
if (err) {
@@ -414,7 +426,7 @@ static int xe_svm_garbage_collector(struct xe_vm *vm)
return err;
}
}
- spin_unlock(&vm->svm.garbage_collector.lock);
+ spin_unlock(&vm->svm.garbage_collector.list_lock);
return ret;
}
@@ -424,9 +436,8 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
struct xe_vm *vm = container_of(w, struct xe_vm,
svm.garbage_collector.work);
- down_write(&vm->lock);
+ guard(rwsem_read)(&vm->lock);
xe_svm_garbage_collector(vm);
- up_write(&vm->lock);
}
#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
@@ -855,8 +866,11 @@ int xe_svm_init(struct xe_vm *vm)
{
int err;
+ mutex_init(&vm->svm.range_lock);
+ mutex_init(&vm->svm.garbage_collector.lock);
+
if (vm->flags & XE_VM_FLAG_FAULT_MODE) {
- spin_lock_init(&vm->svm.garbage_collector.lock);
+ spin_lock_init(&vm->svm.garbage_collector.list_lock);
INIT_LIST_HEAD(&vm->svm.garbage_collector.range_list);
INIT_WORK(&vm->svm.garbage_collector.work,
xe_svm_garbage_collector_work_func);
@@ -878,7 +892,7 @@ int xe_svm_init(struct xe_vm *vm)
xe_modparam.svm_notifier_size * SZ_1M,
&gpusvm_ops, fault_chunk_sizes,
ARRAY_SIZE(fault_chunk_sizes));
- drm_gpusvm_driver_set_lock(&vm->svm.gpusvm, &vm->lock);
+ drm_gpusvm_driver_set_lock(&vm->svm.gpusvm, &vm->svm.range_lock);
if (err) {
xe_svm_put_pagemaps(vm);
@@ -918,7 +932,10 @@ void xe_svm_fini(struct xe_vm *vm)
{
xe_assert(vm->xe, xe_vm_is_closed(vm));
- drm_gpusvm_fini(&vm->svm.gpusvm);
+ scoped_guard(mutex, &vm->svm.range_lock)
+ drm_gpusvm_fini(&vm->svm.gpusvm);
+ mutex_destroy(&vm->svm.range_lock);
+ mutex_destroy(&vm->svm.garbage_collector.lock);
}
static bool xe_svm_range_has_pagemap_locked(const struct xe_svm_range *range,
@@ -1198,20 +1215,26 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
};
struct xe_validation_ctx vctx;
struct drm_exec exec;
- struct xe_svm_range *range;
+ struct xe_svm_range *range = NULL;
struct dma_fence *fence;
struct drm_pagemap *dpagemap;
struct xe_tile *tile = gt_to_tile(gt);
int migrate_try_count = ctx.devmem_only ? 3 : 1;
ktime_t start = xe_gt_stats_ktime_get(), bind_start, get_pages_start;
- int err;
+ int err = 0;
- lockdep_assert_held_write(&vm->lock);
+ lockdep_assert_held(&vm->lock);
xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(vma));
xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT, 1);
retry:
+ /* Release old range */
+ if (range) {
+ mutex_unlock(&range->lock);
+ drm_gpusvm_range_put(&range->base);
+ }
+
/* Always process UNMAPs first so view SVM ranges is current */
err = xe_svm_garbage_collector(vm);
if (err)
@@ -1227,6 +1250,11 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
xe_svm_range_fault_count_stats_incr(gt, range);
+ mutex_lock(&range->lock);
+
+ if (xe_svm_range_is_removed(range))
+ goto retry;
+
if (ctx.devmem_only && !range->base.pages.flags.migrate_devmem) {
err = -EACCES;
goto out;
@@ -1268,7 +1296,7 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
drm_err(&vm->xe->drm,
"VRAM allocation failed, retry count exceeded, asid=%u, errno=%pe\n",
vm->usm.asid, ERR_PTR(err));
- return err;
+ goto err_out;
}
}
}
@@ -1330,6 +1358,8 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
out:
xe_svm_range_fault_us_stats_incr(gt, range, start);
+ mutex_unlock(&range->lock);
+ drm_gpusvm_range_put(&range->base);
return 0;
err_out:
@@ -1339,6 +1369,9 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
goto retry;
}
+ mutex_unlock(&range->lock);
+ drm_gpusvm_range_put(&range->base);
+
return err;
}
@@ -1421,9 +1454,9 @@ void xe_svm_unmap_address_range(struct xe_vm *vm, u64 start, u64 end)
drm_gpusvm_range_get(range);
__xe_svm_garbage_collector(vm, to_xe_range(range));
if (!list_empty(&to_xe_range(range)->garbage_collector_link)) {
- spin_lock(&vm->svm.garbage_collector.lock);
+ spin_lock(&vm->svm.garbage_collector.list_lock);
list_del(&to_xe_range(range)->garbage_collector_link);
- spin_unlock(&vm->svm.garbage_collector.lock);
+ spin_unlock(&vm->svm.garbage_collector.list_lock);
}
drm_gpusvm_range_put(range);
}
@@ -1453,7 +1486,7 @@ int xe_svm_bo_evict(struct xe_bo *bo)
* @ctx: GPU SVM context
*
* This function finds or inserts a newly allocated a SVM range based on the
- * address.
+ * address. Takes a reference to the SVM range on success.
*
* Return: Pointer to the SVM range on success, ERR_PTR() on failure.
*/
@@ -1462,11 +1495,15 @@ struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm, u64 addr,
{
struct drm_gpusvm_range *r;
+ guard(mutex)(&vm->svm.range_lock);
+
r = drm_gpusvm_range_find_or_insert(&vm->svm.gpusvm, max(addr, xe_vma_start(vma)),
xe_vma_start(vma), xe_vma_end(vma), ctx);
if (IS_ERR(r))
return ERR_CAST(r);
+ drm_gpusvm_range_get(r);
+
return to_xe_range(r);
}
@@ -1486,6 +1523,8 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
{
int err = 0;
+ lockdep_assert_held(&range->lock);
+
err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, &range->base, ctx);
if (err == -EOPNOTSUPP) {
range_debug(range, "PAGE FAULT - EVICT PAGES");
@@ -1602,6 +1641,7 @@ int xe_svm_alloc_vram(struct xe_svm_range *range, const struct drm_gpusvm_ctx *c
int err, retries = 1;
bool write_locked = false;
+ lockdep_assert_held(&range->lock);
xe_assert(range_to_vm(&range->base)->xe, range->base.pages.flags.migrate_devmem);
range_debug(range, "ALLOCATE VRAM");
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index b7b8eeacf196..fd26bfeb4a07 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -36,6 +36,13 @@ struct xe_svm_range {
* list. Protected by VM's garbage collect lock.
*/
struct list_head garbage_collector_link;
+ /**
+ * @lock: Protects fault handler, garbage collector, and prefetch
+ * critical sections, ensuring only one thread operates on a range at a
+ * time. Locking order: inside vm->lock and the garbage collector
+ * lock, outside dma-resv locks and vm->svm.range_lock.
+ */
+ struct mutex lock;
/**
* @tile_present: Tile mask of binding is present for this range.
* Protected by GPU SVM notifier lock.
@@ -46,8 +53,22 @@ struct xe_svm_range {
* range. Protected by GPU SVM notifier lock.
*/
u8 tile_invalidated;
+ /**
+ * @removed: Range has been removed from GPU SVM tree, protected by
+ * @lock.
+ */
+ bool removed;
};
+/**
+ * xe_svm_range_put() - SVM range put
+ * @range: SVM range
+ */
+static inline void xe_svm_range_put(struct xe_svm_range *range)
+{
+ drm_gpusvm_range_put(&range->base);
+}
+
/**
* struct xe_pagemap - Manages xe device_private memory for SVM.
* @pagemap: The struct dev_pagemap providing the struct pages.
@@ -135,6 +156,19 @@ static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
return range->base.pages.flags.has_dma_mapping;
}
+/**
+ * xe_svm_range_is_removed() - SVM range is removed from GPU SVM tree
+ * @range: SVM range
+ *
+ * Return: True if SVM range is removed from GPU SVM tree, False otherwise
+ */
+static inline bool xe_svm_range_is_removed(struct xe_svm_range *range)
+{
+ lockdep_assert_held(&range->lock);
+
+ return range->removed;
+}
+
/**
* to_xe_range - Convert a drm_gpusvm_range pointer to a xe_svm_range
* @r: Pointer to the drm_gpusvm_range structure
@@ -214,10 +248,15 @@ struct xe_svm_range {
const struct drm_pagemap_addr *dma_addr;
} pages;
} base;
+ struct mutex lock;
u32 tile_present;
u32 tile_invalidated;
};
+static inline void xe_svm_range_put(struct xe_svm_range *range)
+{
+}
+
static inline bool xe_svm_range_pages_valid(struct xe_svm_range *range)
{
return false;
@@ -387,6 +426,11 @@ static inline struct drm_pagemap *xe_drm_pagemap_from_fd(int fd, u32 region_inst
return ERR_PTR(-ENOENT);
}
+static inline bool xe_svm_range_is_removed(struct xe_svm_range *range)
+{
+ return false;
+}
+
#define xe_svm_range_has_dma_mapping(...) false
#endif /* CONFIG_DRM_XE_GPUSVM */
diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userptr.c
index e120323c43bc..bf6043de1b8e 100644
--- a/drivers/gpu/drm/xe/xe_userptr.c
+++ b/drivers/gpu/drm/xe/xe_userptr.c
@@ -48,6 +48,22 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
list_empty(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
}
+#if IS_ENABLED(CONFIG_PROVE_LOCKING)
+static bool __xe_vma_userptr_lockdep(struct xe_userptr_vma *uvma)
+{
+ struct xe_vma *vma = &uvma->vma;
+ struct xe_vm *vm = xe_vma_vm(vma);
+
+ return lockdep_is_held_type(&vm->lock, 0) ||
+ (lockdep_is_held_type(&vm->lock, 1) &&
+ lockdep_is_held_type(&vma->fault_lock, 0));
+}
+#define xe_vma_userptr_lockdep(uvma) \
+ lockdep_assert(__xe_vma_userptr_lockdep(uvma))
+#else
+#define xe_vma_userptr_lockdep(uvma)
+#endif
+
int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
{
struct xe_vma *vma = &uvma->vma;
@@ -59,7 +75,7 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
.allow_mixed = true,
};
- lockdep_assert_held(&vm->lock);
+ xe_vma_userptr_lockdep(uvma);
xe_assert(xe, xe_vma_is_userptr(vma));
if (vma->gpuva.flags & XE_VMA_DESTROYED)
@@ -167,7 +183,7 @@ void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma)
struct xe_vm *vm = xe_vma_vm(&uvma->vma);
/* Protect against concurrent userptr pinning */
- lockdep_assert_held(&vm->lock);
+ xe_vma_userptr_lockdep(uvma);
/* Protect against concurrent notifiers */
lockdep_assert_held(&vm->svm.gpusvm.notifier_lock);
/*
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 548b0769b3ef..3332a86f464f 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -597,6 +597,17 @@ static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
}
ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO);
+static void xe_vma_svm_prefetch_ranges_fini(struct xe_vma_op *op)
+{
+ struct xe_svm_range *svm_range;
+ unsigned long i;
+
+ xa_for_each(&op->prefetch_range.range, i, svm_range)
+ xe_svm_range_put(svm_range);
+
+ xa_destroy(&op->prefetch_range.range);
+}
+
static void xe_vma_svm_prefetch_op_fini(struct xe_vma_op *op)
{
struct xe_vma *vma;
@@ -604,7 +615,7 @@ static void xe_vma_svm_prefetch_op_fini(struct xe_vma_op *op)
vma = gpuva_to_vma(op->base.prefetch.va);
if (op->base.op == DRM_GPUVA_OP_PREFETCH && xe_vma_is_cpu_addr_mirror(vma))
- xa_destroy(&op->prefetch_range.range);
+ xe_vma_svm_prefetch_ranges_fini(op);
}
static void xe_vma_svm_prefetch_ops_fini(struct xe_vma_ops *vops)
@@ -838,6 +849,7 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
u8 id;
int err;
+ lockdep_assert_held(&range->lock);
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);
xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
@@ -920,6 +932,7 @@ struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
u8 id;
int err;
+ lockdep_assert_held(&range->lock);
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);
xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
@@ -1023,6 +1036,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
vma->gpuva.gem.obj = &bo->ttm.base;
}
+ mutex_init(&vma->fault_lock);
+
INIT_LIST_HEAD(&vma->combined_links.rebind);
INIT_LIST_HEAD(&vma->gpuva.gem.entry);
@@ -1095,6 +1110,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
xe_bo_put(xe_vma_bo(vma));
}
+ mutex_destroy(&vma->fault_lock);
xe_vma_free(vma);
}
@@ -1115,11 +1131,18 @@ static void vma_destroy_cb(struct dma_fence *fence,
queue_work(system_dfl_wq, &vma->destroy_work);
}
+static void xe_vm_assert_write_mode_or_garbage_collector(struct xe_vm *vm)
+{
+ lockdep_assert(lockdep_is_held_type(&vm->lock, 0) ||
+ (lockdep_is_held_type(&vm->lock, 1) &&
+ lockdep_is_held_type(&vm->svm.garbage_collector.lock, 0)));
+}
+
static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
{
struct xe_vm *vm = xe_vma_vm(vma);
- lockdep_assert_held_write(&vm->lock);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
xe_assert(vm->xe, list_empty(&vma->combined_links.destroy));
if (xe_vma_is_userptr(vma)) {
@@ -2462,7 +2485,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
struct xe_vma *vma;
int err = 0;
- lockdep_assert_held_write(&vm->lock);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
if (bo) {
err = 0;
@@ -2559,7 +2582,7 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
{
int err = 0;
- lockdep_assert_held_write(&vm->lock);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
@@ -2650,7 +2673,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
u8 id, tile_mask = 0;
int err = 0;
- lockdep_assert_held_write(&vm->lock);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
for_each_tile(tile, vm->xe, id)
tile_mask |= 0x1 << id;
@@ -2826,7 +2849,7 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
bool post_commit, bool prev_post_commit,
bool next_post_commit)
{
- lockdep_assert_held_write(&vm->lock);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
@@ -2956,6 +2979,11 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
/* TODO: Threading the migration */
xa_for_each(&op->prefetch_range.range, i, svm_range) {
+ guard(mutex)(&svm_range->lock);
+
+ if (xe_svm_range_is_removed(svm_range))
+ return -ENODATA;
+
if (!dpagemap)
xe_svm_range_migrate_to_smem(vm, svm_range);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1f6f7e30e751..9c91934ec47f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -119,6 +119,12 @@ struct xe_vma {
struct work_struct destroy_work;
};
+ /**
+ * @fault_lock: Synchronizes fault processing. Locking order: inside
+ * vm->lock, outside dma-resv.
+ */
+ struct mutex fault_lock;
+
/**
* @tile_invalidated: Tile mask of binding are invalidated for this VMA.
* protected by BO's resv and for userptrs, vm->svm.gpusvm.notifier_lock in
@@ -183,13 +189,27 @@ struct xe_vm {
struct {
/** @svm.gpusvm: base GPUSVM used to track fault allocations */
struct drm_gpusvm gpusvm;
+ /**
+ * @svm.range_lock: Protects insertion and removal of ranges
+ * from GPU SVM tree.
+ */
+ struct mutex range_lock;
/**
* @svm.garbage_collector: Garbage collector which is used unmap
* SVM range's GPU bindings and destroy the ranges.
*/
struct {
- /** @svm.garbage_collector.lock: Protect's range list */
- spinlock_t lock;
+ /**
+ * @svm.garbage_collector.lock: Ensures only one thread
+ * runs the garbage collector at a time. Locking order:
+ * inside vm->lock, outside range->lock and dma-resv.
+ */
+ struct mutex lock;
+ /**
+ * @svm.garbage_collector.list_lock: Protects the range
+ * list
+ */
+ spinlock_t list_lock;
/**
* @svm.garbage_collector.range_list: List of SVM ranges
* in the garbage collector.
--
2.34.1
* [PATCH v4 02/12] drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
2026-02-26 4:28 ` [PATCH v4 01/12] drm/xe: Fine grained page fault locking Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-02-26 4:28 ` [PATCH v4 03/12] drm/xe: Thread prefetch of SVM ranges Matthew Brost
` (14 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Prefetch-only VM bind IOCTLs do not modify VMAs after pinning userptr
pages. Downgrade vm->lock to read mode once pinning is complete.
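Condensed from the xe_vm_bind_ioctl() changes below, the resulting flow
is roughly:

	down_write(&vm->lock);
	/* Create ops, pin userptr pages, commit */

	if (!(vops.flags & XE_VMA_OPS_FLAG_MODIFIES_GPUVA)) {
		vops.flags |= XE_VMA_OPS_FLAG_DOWNGRADE_LOCK;
		downgrade_write(&vm->lock);	/* faults may now proceed */
	}

	/* Prefetch ranges and execute ops */

	if (vops.flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK)
		up_read(&vm->lock);
	else
		up_write(&vm->lock);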
Lays the groundwork for prefetch IOCTLs to use threaded migration.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 36 +++++++++++++++++++++++++++-----
drivers/gpu/drm/xe/xe_vm_types.h | 2 ++
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3332a86f464f..204a89ca3397 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2336,10 +2336,12 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
.map.gem.offset = bo_offset_or_userptr,
};
+ vops->flags |= XE_VMA_OPS_FLAG_MODIFIES_GPUVA;
ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, &map_req);
break;
}
case DRM_XE_VM_BIND_OP_UNMAP:
+ vops->flags |= XE_VMA_OPS_FLAG_MODIFIES_GPUVA;
ops = drm_gpuvm_sm_unmap_ops_create(&vm->gpuvm, addr, range);
break;
case DRM_XE_VM_BIND_OP_PREFETCH:
@@ -2348,6 +2350,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
case DRM_XE_VM_BIND_OP_UNMAP_ALL:
xe_assert(vm->xe, bo);
+ vops->flags |= XE_VMA_OPS_FLAG_MODIFIES_GPUVA;
err = xe_bo_lock(bo, true);
if (err)
return ERR_PTR(err);
@@ -2397,6 +2400,9 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
u8 id, tile_mask = 0;
u32 i;
+ if (xe_vma_is_userptr(vma))
+ vops->flags |= XE_VMA_OPS_FLAG_MODIFIES_GPUVA;
+
if (!xe_vma_is_cpu_addr_mirror(vma)) {
op->prefetch.region = prefetch_region;
break;
@@ -2582,10 +2588,12 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
{
int err = 0;
- xe_vm_assert_write_mode_or_garbage_collector(vm);
+ lockdep_assert_held(&vm->lock);
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
err |= xe_vm_insert_vma(vm, op->map.vma);
if (!err)
op->flags |= XE_VMA_OP_COMMITTED;
@@ -2595,6 +2603,8 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
u8 tile_present =
gpuva_to_vma(op->base.remap.unmap->va)->tile_present;
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
prep_vma_destroy(vm, gpuva_to_vma(op->base.remap.unmap->va),
true);
op->flags |= XE_VMA_OP_COMMITTED;
@@ -2628,6 +2638,8 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
break;
}
case DRM_GPUVA_OP_UNMAP:
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
prep_vma_destroy(vm, gpuva_to_vma(op->base.unmap.va), true);
op->flags |= XE_VMA_OP_COMMITTED;
break;
@@ -2849,10 +2861,12 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
bool post_commit, bool prev_post_commit,
bool next_post_commit)
{
- xe_vm_assert_write_mode_or_garbage_collector(vm);
+ lockdep_assert_held(&vm->lock);
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
if (op->map.vma) {
prep_vma_destroy(vm, op->map.vma, post_commit);
xe_vma_destroy_unlocked(op->map.vma);
@@ -2862,6 +2876,8 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
{
struct xe_vma *vma = gpuva_to_vma(op->base.unmap.va);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
if (vma) {
xe_svm_notifier_lock(vm);
vma->gpuva.flags &= ~XE_VMA_DESTROYED;
@@ -2875,6 +2891,8 @@ static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
{
struct xe_vma *vma = gpuva_to_vma(op->base.remap.unmap->va);
+ xe_vm_assert_write_mode_or_garbage_collector(vm);
+
if (op->remap.prev) {
prep_vma_destroy(vm, op->remap.prev, prev_post_commit);
xe_vma_destroy_unlocked(op->remap.prev);
@@ -3362,7 +3380,7 @@ static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm,
struct dma_fence *fence;
int err = 0;
- lockdep_assert_held_write(&vm->lock);
+ lockdep_assert_held(&vm->lock);
xe_validation_guard(&ctx, &vm->xe->val, &exec,
((struct xe_val_flags) {
@@ -3664,7 +3682,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
u32 num_syncs, num_ufence = 0;
struct xe_sync_entry *syncs = NULL;
struct drm_xe_vm_bind_op *bind_ops = NULL;
- struct xe_vma_ops vops;
+ struct xe_vma_ops vops = { .flags = 0, };
struct dma_fence *fence;
int err;
int i;
@@ -3839,6 +3857,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
goto unwind_ops;
}
+ if (!(vops.flags & XE_VMA_OPS_FLAG_MODIFIES_GPUVA)) {
+ vops.flags |= XE_VMA_OPS_FLAG_DOWNGRADE_LOCK;
+ downgrade_write(&vm->lock);
+ }
+
err = xe_vma_ops_alloc(&vops, args->num_binds > 1);
if (err)
goto unwind_ops;
@@ -3875,7 +3898,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
free_bos:
kvfree(bos);
release_vm_lock:
- up_write(&vm->lock);
+ if (vops.flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK)
+ up_read(&vm->lock);
+ else
+ up_write(&vm->lock);
put_exec_queue:
if (q)
xe_exec_queue_put(q);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 9c91934ec47f..db6e8e22a69f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -518,6 +518,8 @@ struct xe_vma_ops {
#define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2)
#define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3)
#define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4)
+#define XE_VMA_OPS_FLAG_MODIFIES_GPUVA BIT(5)
+#define XE_VMA_OPS_FLAG_DOWNGRADE_LOCK BIT(6)
u32 flags;
#ifdef TEST_VM_OPS_ERROR
/** @inject_error: inject error to test error handling */
--
2.34.1
* [PATCH v4 03/12] drm/xe: Thread prefetch of SVM ranges
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
2026-02-26 4:28 ` [PATCH v4 01/12] drm/xe: Fine grained page fault locking Matthew Brost
2026-02-26 4:28 ` [PATCH v4 02/12] drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-02-26 4:28 ` [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers Matthew Brost
` (13 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
The migrate_vma_* functions are very CPU-intensive; as a result,
prefetching SVM ranges is limited by CPU performance rather than paging
copy engine bandwidth. To accelerate SVM range prefetching, the step
that calls migrate_vma_* is now threaded. Reuses the page fault work
queue for threading.
Running xe_exec_system_allocator --r prefetch-benchmark, which tests
64MB prefetches, shows an increase from ~4.35 GB/s to ~12.25 GB/s with
this patch on drm-tip. Enabling high SLPC further increases throughput
to ~15.25 GB/s, and combining SLPC with ULLS raises it to ~16 GB/s. Both
of these optimizations are upcoming.
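The fan-out, condensed from prefetch_ranges() below, is roughly:

	xa_for_each(&op->prefetch_range.range, i, svm_range) {
		struct prefetch_thread *thread = prefetches + idx++;

		INIT_WORK(&thread->work, prefetch_work_func);
		thread->svm_range = svm_range;
		/* ctx, vma, and dpagemap are stashed on the work item too */
		queue_work(vm->xe->usm.pf_wq, &thread->work);
	}

	for (i = 0; i < idx; ++i) {
		flush_work(&prefetches[i].work);	/* wait for migration */
		if (prefetches[i].err && (!err || err == -ENODATA))
			err = prefetches[i].err;
	}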
v2:
- Use dedicated prefetch workqueue
- Pick dedicated prefetch thread count based on profiling
- Skip threaded prefetch for only 1 range or if prefetching to SRAM
- Fully tested
v3:
- Use page fault work queue
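Because the page-fault workqueue is now shared with prefetch flushes,
workers must not block on a pending vm->lock writer. The
garbage-collector worker (see xe_svm_garbage_collector_work_func()
below) becomes roughly:

	if (down_read_trylock(&vm->lock)) {
		xe_svm_garbage_collector(vm);
		up_read(&vm->lock);
	} else {
		/* A writer is pending; requeue rather than deadlock */
		queue_work(vm->xe->usm.pf_wq,
			   &vm->svm.garbage_collector.work);
	}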
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pagefault.c | 31 +++++-
drivers/gpu/drm/xe/xe_svm.c | 23 ++++-
drivers/gpu/drm/xe/xe_svm.h | 6 +-
drivers/gpu/drm/xe/xe_vm.c | 150 +++++++++++++++++++++++-------
drivers/gpu/drm/xe/xe_vm_types.h | 15 +--
5 files changed, 175 insertions(+), 50 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 421262c2a63a..a372db7cd839 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -173,7 +173,17 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
if (IS_ERR(vm))
return PTR_ERR(vm);
- down_read(&vm->lock);
+ /*
+ * We can't block threaded prefetches from completing. down_read() can
+ * block on a pending down_write(), so without a trylock here, we could
+ * deadlock, since the page fault workqueue is shared with prefetches,
+ * prefetches flush work items onto the same workqueue, and a
+ * down_write() could be pending.
+ */
+ if (!down_read_trylock(&vm->lock)) {
+ err = -EAGAIN;
+ goto put_vm;
+ }
if (xe_vm_is_closed(vm)) {
err = -ENOENT;
@@ -198,11 +208,23 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
if (!err)
vm->usm.last_fault_vma = vma;
up_read(&vm->lock);
+put_vm:
xe_vm_put(vm);
return err;
}
+static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault *pf)
+{
+ spin_lock_irq(&pf_queue->lock);
+ if (!pf_queue->tail)
+ pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
+ else
+ pf_queue->tail -= xe_pagefault_entry_size();
+ spin_unlock_irq(&pf_queue->lock);
+}
+
static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
struct xe_pagefault *pf)
{
@@ -260,7 +282,12 @@ static void xe_pagefault_queue_work(struct work_struct *w)
continue;
err = xe_pagefault_service(&pf);
- if (err) {
+
+ if (err == -EAGAIN) {
+ xe_pagefault_queue_retry(pf_queue, &pf);
+ queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
+ break;
+ } else if (err) {
if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
xe_pagefault_print(&pf);
xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n",
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 3e59695e0c01..66eee490a0c3 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -436,8 +436,19 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
struct xe_vm *vm = container_of(w, struct xe_vm,
svm.garbage_collector.work);
- guard(rwsem_read)(&vm->lock);
- xe_svm_garbage_collector(vm);
+ /*
+ * We can't block threaded prefetches from completing. down_read() can
+ * block on a pending down_write(), so without a trylock here, we could
+ * deadlock, since the page fault workqueue is shared with prefetches,
+ * prefetches flush work items onto the same workqueue, and a
+ * down_write() could be pending.
+ */
+ if (down_read_trylock(&vm->lock)) {
+ xe_svm_garbage_collector(vm);
+ up_read(&vm->lock);
+ } else {
+ queue_work(vm->xe->usm.pf_wq, &vm->svm.garbage_collector.work);
+ }
}
#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
@@ -988,6 +999,7 @@ void xe_svm_range_migrate_to_smem(struct xe_vm *vm, struct xe_svm_range *range)
* @tile_mask: Mask representing the tiles to be checked
* @dpagemap: if !%NULL, the range is expected to be present
* in device memory identified by this parameter.
+ * @valid_pages: Set to whether the range's pages are valid, written back to the caller
*
* The xe_svm_range_validate() function checks if a range is
* valid and located in the desired memory region.
@@ -996,7 +1008,8 @@ void xe_svm_range_migrate_to_smem(struct xe_vm *vm, struct xe_svm_range *range)
*/
bool xe_svm_range_validate(struct xe_vm *vm,
struct xe_svm_range *range,
- u8 tile_mask, const struct drm_pagemap *dpagemap)
+ u8 tile_mask, const struct drm_pagemap *dpagemap,
+ bool *valid_pages)
{
bool ret;
@@ -1008,6 +1021,8 @@ bool xe_svm_range_validate(struct xe_vm *vm,
else
ret = ret && !range->base.pages.dpagemap;
+ *valid_pages = xe_svm_range_pages_valid(range);
+
xe_svm_notifier_unlock(vm);
return ret;
@@ -2064,5 +2079,5 @@ struct drm_pagemap *xe_drm_pagemap_from_fd(int fd, u32 region_instance)
void xe_svm_flush(struct xe_vm *vm)
{
if (xe_vm_in_fault_mode(vm))
- flush_work(&vm->svm.garbage_collector.work);
+ __flush_workqueue(vm->xe->usm.pf_wq);
}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index fd26bfeb4a07..ebcca34f7f4d 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -132,7 +132,8 @@ void xe_svm_range_migrate_to_smem(struct xe_vm *vm, struct xe_svm_range *range);
bool xe_svm_range_validate(struct xe_vm *vm,
struct xe_svm_range *range,
- u8 tile_mask, const struct drm_pagemap *dpagemap);
+ u8 tile_mask, const struct drm_pagemap *dpagemap,
+ bool *valid_pages);
u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vma);
@@ -374,7 +375,8 @@ void xe_svm_range_migrate_to_smem(struct xe_vm *vm, struct xe_svm_range *range)
static inline
bool xe_svm_range_validate(struct xe_vm *vm,
struct xe_svm_range *range,
- u8 tile_mask, bool devmem_preferred)
+ u8 tile_mask, const struct drm_pagemap *dpagemap,
+ bool *valid_pages)
{
return false;
}
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 204a89ca3397..06669e9c500d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2399,6 +2399,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
struct drm_pagemap *dpagemap = NULL;
u8 id, tile_mask = 0;
u32 i;
+ bool valid_pages;
if (xe_vma_is_userptr(vma))
vops->flags |= XE_VMA_OPS_FLAG_MODIFIES_GPUVA;
@@ -2446,8 +2447,10 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
goto unwind_prefetch_ops;
}
- if (xe_svm_range_validate(vm, svm_range, tile_mask, dpagemap)) {
+ if (xe_svm_range_validate(vm, svm_range, tile_mask,
+ dpagemap, &valid_pages)) {
xe_svm_range_debug(svm_range, "PREFETCH - RANGE IS VALID");
+ xe_assert(vm->xe, valid_pages);
goto check_next_range;
}
@@ -2460,6 +2463,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_vma_ops *vops,
op->prefetch_range.ranges_count++;
vops->flags |= XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH;
+ if (valid_pages)
+ vops->flags |= XE_VMA_OPS_FLAG_HAS_SVM_VALID_RANGE;
xe_svm_range_debug(svm_range, "PREFETCH - RANGE CREATED");
check_next_range:
if (range_end > xe_svm_range_end(svm_range) &&
@@ -2976,16 +2981,83 @@ static int check_ufence(struct xe_vma *vma)
return 0;
}
-static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
+struct prefetch_thread {
+ struct work_struct work;
+ struct drm_gpusvm_ctx *ctx;
+ struct xe_vma *vma;
+ struct xe_svm_range *svm_range;
+ struct drm_pagemap *dpagemap;
+ int err;
+};
+
+static void prefetch_thread_func(struct prefetch_thread *thread)
+{
+ struct xe_vma *vma = thread->vma;
+ struct xe_vm *vm = xe_vma_vm(vma);
+ struct xe_svm_range *svm_range = thread->svm_range;
+ struct drm_pagemap *dpagemap = thread->dpagemap;
+ int err = 0;
+
+ guard(mutex)(&svm_range->lock);
+
+ if (xe_svm_range_is_removed(svm_range)) {
+ thread->err = -ENODATA;
+ return;
+ }
+
+ if (!dpagemap)
+ xe_svm_range_migrate_to_smem(vm, svm_range);
+
+ if (IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)) {
+ drm_dbg(&vm->xe->drm,
+ "Prefetch pagemap is %s start 0x%016lx end 0x%016lx\n",
+ dpagemap ? dpagemap->drm->unique : "system",
+ xe_svm_range_start(svm_range), xe_svm_range_end(svm_range));
+ }
+
+ if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, dpagemap)) {
+ err = xe_svm_alloc_vram(svm_range, thread->ctx, dpagemap);
+ if (err) {
+ drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
+ vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
+ thread->err = -ENODATA;
+ return;
+ }
+ xe_svm_range_debug(svm_range, "PREFETCH - RANGE MIGRATED TO VRAM");
+ }
+
+ err = xe_svm_range_get_pages(vm, svm_range, thread->ctx);
+ if (err) {
+ drm_dbg(&vm->xe->drm, "Get pages failed, asid=%u, gpusvm=%p, errno=%pe\n",
+ vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
+ if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM)
+ err = -ENODATA;
+ thread->err = -ENODATA;
+ return;
+ }
+ xe_svm_range_debug(svm_range, "PREFETCH - RANGE GET PAGES DONE");
+}
+
+static void prefetch_work_func(struct work_struct *w)
+{
+ struct prefetch_thread *thread =
+ container_of(w, struct prefetch_thread, work);
+
+ prefetch_thread_func(thread);
+}
+
+static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops *vops,
+ struct xe_vma_op *op)
{
bool devmem_possible = IS_DGFX(vm->xe) && IS_ENABLED(CONFIG_DRM_XE_PAGEMAP);
struct xe_vma *vma = gpuva_to_vma(op->base.prefetch.va);
struct drm_pagemap *dpagemap = op->prefetch_range.dpagemap;
- int err = 0;
-
struct xe_svm_range *svm_range;
struct drm_gpusvm_ctx ctx = {};
+ struct prefetch_thread stack_thread, *thread, *prefetches;
unsigned long i;
+ int err = 0, idx = 0;
+ bool skip_threads;
if (!xe_vma_is_cpu_addr_mirror(vma))
return 0;
@@ -2995,42 +3067,49 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
ctx.check_pages_threshold = devmem_possible ? SZ_64K : 0;
ctx.device_private_page_owner = xe_svm_private_page_owner(vm, !dpagemap);
- /* TODO: Threading the migration */
- xa_for_each(&op->prefetch_range.range, i, svm_range) {
- guard(mutex)(&svm_range->lock);
-
- if (xe_svm_range_is_removed(svm_range))
- return -ENODATA;
+ skip_threads = op->prefetch_range.ranges_count == 1 ||
+ (!dpagemap && !(vops->flags &
+ XE_VMA_OPS_FLAG_HAS_SVM_VALID_RANGE)) ||
+ !(vops->flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK);
+ thread = skip_threads ? &stack_thread : NULL;
- if (!dpagemap)
- xe_svm_range_migrate_to_smem(vm, svm_range);
+ if (!skip_threads) {
+ prefetches = kvmalloc_array(op->prefetch_range.ranges_count,
+ sizeof(*prefetches), GFP_KERNEL);
+ if (!prefetches)
+ return -ENOMEM;
+ }
- if (IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)) {
- drm_dbg(&vm->xe->drm,
- "Prefetch pagemap is %s start 0x%016lx end 0x%016lx\n",
- dpagemap ? dpagemap->drm->unique : "system",
- xe_svm_range_start(svm_range), xe_svm_range_end(svm_range));
+ xa_for_each(&op->prefetch_range.range, i, svm_range) {
+ if (!skip_threads) {
+ thread = prefetches + idx++;
+ INIT_WORK(&thread->work, prefetch_work_func);
}
- if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, dpagemap)) {
- err = xe_svm_alloc_vram(svm_range, &ctx, dpagemap);
- if (err) {
- drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
- vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
- return -ENODATA;
- }
- xe_svm_range_debug(svm_range, "PREFETCH - RANGE MIGRATED TO VRAM");
+ thread->ctx = &ctx;
+ thread->vma = vma;
+ thread->svm_range = svm_range;
+ thread->dpagemap = dpagemap;
+ thread->err = 0;
+
+ if (skip_threads) {
+ prefetch_thread_func(thread);
+ if (thread->err)
+ return thread->err;
+ } else {
+ queue_work(vm->xe->usm.pf_wq, &thread->work);
}
+ }
- err = xe_svm_range_get_pages(vm, svm_range, &ctx);
- if (err) {
- drm_dbg(&vm->xe->drm, "Get pages failed, asid=%u, gpusvm=%p, errno=%pe\n",
- vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
- if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM)
- err = -ENODATA;
- return err;
+ if (!skip_threads) {
+ for (i = 0; i < idx; ++i) {
+ thread = prefetches + i;
+
+ flush_work(&thread->work);
+ if (thread->err && (!err || err == -ENODATA))
+ err = thread->err;
}
- xe_svm_range_debug(svm_range, "PREFETCH - RANGE GET PAGES DONE");
+ kvfree(prefetches);
}
return err;
@@ -3109,7 +3188,8 @@ static int op_lock_and_prep(struct drm_exec *exec, struct xe_vm *vm,
return err;
}
-static int vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops *vops)
+static int vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm,
+ struct xe_vma_ops *vops)
{
struct xe_vma_op *op;
int err;
@@ -3119,7 +3199,7 @@ static int vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops
list_for_each_entry(op, &vops->list, link) {
if (op->base.op == DRM_GPUVA_OP_PREFETCH) {
- err = prefetch_ranges(vm, op);
+ err = prefetch_ranges(vm, vops, op);
if (err)
return err;
}
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index db6e8e22a69f..7d5a82b2b64f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -513,13 +513,14 @@ struct xe_vma_ops {
/** @pt_update_ops: page table update operations */
struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
/** @flag: signify the properties within xe_vma_ops*/
-#define XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH BIT(0)
-#define XE_VMA_OPS_FLAG_MADVISE BIT(1)
-#define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2)
-#define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3)
-#define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4)
-#define XE_VMA_OPS_FLAG_MODIFIES_GPUVA BIT(5)
-#define XE_VMA_OPS_FLAG_DOWNGRADE_LOCK BIT(6)
+#define XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH BIT(0)
+#define XE_VMA_OPS_FLAG_MADVISE BIT(1)
+#define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2)
+#define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3)
+#define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4)
+#define XE_VMA_OPS_FLAG_MODIFIES_GPUVA BIT(5)
+#define XE_VMA_OPS_FLAG_DOWNGRADE_LOCK BIT(6)
+#define XE_VMA_OPS_FLAG_HAS_SVM_VALID_RANGE BIT(7)
u32 flags;
#ifdef TEST_VM_OPS_ERROR
/** @inject_error: inject error to test error handling */
--
2.34.1
* [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (2 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 03/12] drm/xe: Thread prefetch of SVM ranges Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-06 15:46 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 05/12] drm/xe: Add num_pf_work modparam Matthew Brost
` (12 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
With fine-grained page-fault locking, it no longer makes sense to
maintain multiple page-fault queues, as we no longer hash queues based
on the VM’s ASID. Multiple workers can pull page faults from a single
queue, eliminating any head-of-queue blocking. Refactor the structures
and code to use a single shared queue.
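Condensed from the xe_pagefault_handler() changes below, the producer
side round-robins work items over the single queue roughly as:

	spin_lock_irqsave(&pf_queue->lock, flags);
	work_index = xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
	if (!xe_pagefault_queue_full(pf_queue)) {
		memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
		pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
			pf_queue->size;
		queue_work(xe->usm.pf_wq,
			   &xe->usm.pf_workers[work_index].work);
	}
	spin_unlock_irqrestore(&pf_queue->lock, flags);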
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_device_types.h | 12 +++---
drivers/gpu/drm/xe/xe_pagefault.c | 52 +++++++++++++------------
drivers/gpu/drm/xe/xe_pagefault_types.h | 17 +++++++-
3 files changed, 50 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 1eb0fe118940..0558dfd52541 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -304,8 +304,8 @@ struct xe_device {
struct xarray asid_to_vm;
/** @usm.next_asid: next ASID, used to cyclical alloc asids */
u32 next_asid;
- /** @usm.current_pf_queue: current page fault queue */
- u32 current_pf_queue;
+ /** @usm.current_pf_work: current page fault work item */
+ u32 current_pf_work;
/** @usm.lock: protects UM state */
struct rw_semaphore lock;
/** @usm.pf_wq: page fault work queue, unbound, high priority */
@@ -315,9 +315,11 @@ struct xe_device {
* yields the best bandwidth utilization of the kernel paging
* engine.
*/
-#define XE_PAGEFAULT_QUEUE_COUNT 4
- /** @usm.pf_queue: Page fault queues */
- struct xe_pagefault_queue pf_queue[XE_PAGEFAULT_QUEUE_COUNT];
+#define XE_PAGEFAULT_WORK_COUNT 4
+ /** @usm.pf_workers: Page fault workers */
+ struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
+ /** @usm.pf_queue: Page fault queue */
+ struct xe_pagefault_queue pf_queue;
#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
/** @usm.pagemap_shrinker: Shrinker for unused pagemaps */
struct drm_pagemap_shrinker *dpagemap_shrinker;
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index a372db7cd839..7880fc7e7eb4 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -222,6 +222,7 @@ static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
else
pf_queue->tail -= xe_pagefault_entry_size();
+ memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
spin_unlock_irq(&pf_queue->lock);
}
@@ -267,8 +268,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
static void xe_pagefault_queue_work(struct work_struct *w)
{
- struct xe_pagefault_queue *pf_queue =
- container_of(w, typeof(*pf_queue), worker);
+ struct xe_pagefault_work *pf_work =
+ container_of(w, typeof(*pf_work), work);
+ struct xe_device *xe = pf_work->xe;
+ struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
struct xe_pagefault pf;
unsigned long threshold;
@@ -285,7 +288,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
if (err == -EAGAIN) {
xe_pagefault_queue_retry(pf_queue, &pf);
- queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
+ queue_work(xe->usm.pf_wq, w);
break;
} else if (err) {
if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
@@ -302,7 +305,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
pf.producer.ops->ack_fault(&pf, err);
if (time_after(jiffies, threshold)) {
- queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
+ queue_work(xe->usm.pf_wq, w);
break;
}
}
@@ -348,7 +351,6 @@ static int xe_pagefault_queue_init(struct xe_device *xe,
xe_pagefault_entry_size(), total_num_eus, pf_queue->size);
spin_lock_init(&pf_queue->lock);
- INIT_WORK(&pf_queue->worker, xe_pagefault_queue_work);
pf_queue->data = drmm_kzalloc(&xe->drm, pf_queue->size, GFP_KERNEL);
if (!pf_queue->data)
@@ -381,14 +383,20 @@ int xe_pagefault_init(struct xe_device *xe)
xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
WQ_UNBOUND | WQ_HIGHPRI,
- XE_PAGEFAULT_QUEUE_COUNT);
+ XE_PAGEFAULT_WORK_COUNT);
if (!xe->usm.pf_wq)
return -ENOMEM;
- for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i) {
- err = xe_pagefault_queue_init(xe, xe->usm.pf_queue + i);
- if (err)
- goto err_out;
+ err = xe_pagefault_queue_init(xe, &xe->usm.pf_queue);
+ if (err)
+ goto err_out;
+
+ for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
+ struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
+
+ pf_work->xe = xe;
+ pf_work->id = i;
+ INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
}
return devm_add_action_or_reset(xe->drm.dev, xe_pagefault_fini, xe);
@@ -430,10 +438,7 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
*/
void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
{
- int i;
-
- for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
- xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue + i);
+ xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
}
static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
@@ -448,13 +453,11 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
* This function can race with multiple page fault producers, but worst case we
* stick a page fault on the same queue for consumption.
*/
-static int xe_pagefault_queue_index(struct xe_device *xe)
+static int xe_pagefault_work_index(struct xe_device *xe)
{
- u32 old_pf_queue = READ_ONCE(xe->usm.current_pf_queue);
-
- WRITE_ONCE(xe->usm.current_pf_queue, (old_pf_queue + 1));
+ lockdep_assert_held(&xe->usm.pf_queue.lock);
- return old_pf_queue % XE_PAGEFAULT_QUEUE_COUNT;
+ return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
}
/**
@@ -469,22 +472,23 @@ static int xe_pagefault_queue_index(struct xe_device *xe)
*/
int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
{
- int queue_index = xe_pagefault_queue_index(xe);
- struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue + queue_index;
+ struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
unsigned long flags;
+ int work_index;
bool full;
spin_lock_irqsave(&pf_queue->lock, flags);
+ work_index = xe_pagefault_work_index(xe);
full = xe_pagefault_queue_full(pf_queue);
if (!full) {
memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
pf_queue->size;
- queue_work(xe->usm.pf_wq, &pf_queue->worker);
+ queue_work(xe->usm.pf_wq,
+ &xe->usm.pf_workers[work_index].work);
} else {
drm_warn(&xe->drm,
- "PageFault Queue (%d) full, shouldn't be possible\n",
- queue_index);
+ "PageFault Queue full, shouldn't be possible\n");
}
spin_unlock_irqrestore(&pf_queue->lock, flags);
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index b3289219b1be..45065c25c25f 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -131,8 +131,21 @@ struct xe_pagefault_queue {
u32 tail;
/** @lock: protects page fault queue */
spinlock_t lock;
- /** @worker: to process page faults */
- struct work_struct worker;
+};
+
+/**
+ * struct xe_pagefault_work - Xe page fault work item (consumer)
+ *
+ * Represents a worker that pops a &struct xe_pagefault from the page fault
+ * queue and processes it.
+ */
+struct xe_pagefault_work {
+ /** @xe: Back-pointer to the Xe device */
+ struct xe_device *xe;
+ /** @id: Identifier for this work item */
+ int id;
+ /** @work: Work item used to process the page fault */
+ struct work_struct work;
};
#endif
--
2.34.1
* Re: [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers
2026-02-26 4:28 ` [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers Matthew Brost
@ 2026-05-06 15:46 ` Maciej Patelczyk
2026-05-06 19:42 ` Matthew Brost
0 siblings, 1 reply; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-06 15:46 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> With fine-grained page-fault locking, it no longer makes sense to
> maintain multiple page-fault queues, as we no longer hash queues based
> on the VM’s ASID. Multiple workers can pull page faults from a single
> queue, eliminating any head-of-queue blocking. Refactor the structures
> and code to use a single shared queue.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device_types.h | 12 +++---
> drivers/gpu/drm/xe/xe_pagefault.c | 52 +++++++++++++------------
> drivers/gpu/drm/xe/xe_pagefault_types.h | 17 +++++++-
> 3 files changed, 50 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 1eb0fe118940..0558dfd52541 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -304,8 +304,8 @@ struct xe_device {
> struct xarray asid_to_vm;
> /** @usm.next_asid: next ASID, used to cyclical alloc asids */
> u32 next_asid;
> - /** @usm.current_pf_queue: current page fault queue */
> - u32 current_pf_queue;
> + /** @usm.current_pf_work: current page fault work item */
> + u32 current_pf_work;
> /** @usm.lock: protects UM state */
> struct rw_semaphore lock;
> /** @usm.pf_wq: page fault work queue, unbound, high priority */
> @@ -315,9 +315,11 @@ struct xe_device {
> * yields the best bandwidth utilization of the kernel paging
> * engine.
> */
> -#define XE_PAGEFAULT_QUEUE_COUNT 4
> - /** @usm.pf_queue: Page fault queues */
> - struct xe_pagefault_queue pf_queue[XE_PAGEFAULT_QUEUE_COUNT];
> +#define XE_PAGEFAULT_WORK_COUNT 4
> + /** @usm.pf_workers: Page fault workers */
> + struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
> + /** @usm.pf_queue: Page fault queue */
> + struct xe_pagefault_queue pf_queue;
> #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> /** @usm.pagemap_shrinker: Shrinker for unused pagemaps */
> struct drm_pagemap_shrinker *dpagemap_shrinker;
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index a372db7cd839..7880fc7e7eb4 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -222,6 +222,7 @@ static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
> pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
> else
> pf_queue->tail -= xe_pagefault_entry_size();
> + memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
> spin_unlock_irq(&pf_queue->lock);
> }
>
> @@ -267,8 +268,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
>
> static void xe_pagefault_queue_work(struct work_struct *w)
> {
> - struct xe_pagefault_queue *pf_queue =
> - container_of(w, typeof(*pf_queue), worker);
> + struct xe_pagefault_work *pf_work =
> + container_of(w, typeof(*pf_work), work);
> + struct xe_device *xe = pf_work->xe;
> + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> struct xe_pagefault pf;
> unsigned long threshold;
>
> @@ -285,7 +288,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
>
> if (err == -EAGAIN) {
> xe_pagefault_queue_retry(pf_queue, &pf);
> - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
> + queue_work(xe->usm.pf_wq, w);
> break;
> } else if (err) {
> if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
> @@ -302,7 +305,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> pf.producer.ops->ack_fault(&pf, err);
>
> if (time_after(jiffies, threshold)) {
> - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
> + queue_work(xe->usm.pf_wq, w);
> break;
> }
> }
> @@ -348,7 +351,6 @@ static int xe_pagefault_queue_init(struct xe_device *xe,
> xe_pagefault_entry_size(), total_num_eus, pf_queue->size);
>
> spin_lock_init(&pf_queue->lock);
> - INIT_WORK(&pf_queue->worker, xe_pagefault_queue_work);
>
> pf_queue->data = drmm_kzalloc(&xe->drm, pf_queue->size, GFP_KERNEL);
> if (!pf_queue->data)
> @@ -381,14 +383,20 @@ int xe_pagefault_init(struct xe_device *xe)
>
> xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
> WQ_UNBOUND | WQ_HIGHPRI,
> - XE_PAGEFAULT_QUEUE_COUNT);
> + XE_PAGEFAULT_WORK_COUNT);
> if (!xe->usm.pf_wq)
> return -ENOMEM;
>
> - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i) {
> - err = xe_pagefault_queue_init(xe, xe->usm.pf_queue + i);
> - if (err)
> - goto err_out;
> + err = xe_pagefault_queue_init(xe, &xe->usm.pf_queue);
> + if (err)
> + goto err_out;
> +
> + for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
> + struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
> +
> + pf_work->xe = xe;
> + pf_work->id = i;
> + INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
> }
>
> return devm_add_action_or_reset(xe->drm.dev, xe_pagefault_fini, xe);
> @@ -430,10 +438,7 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
> */
> void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> {
> - int i;
> -
> - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
> - xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue + i);
> + xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
> }
>
> static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> @@ -448,13 +453,11 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> * This function can race with multiple page fault producers, but worst case we
> * stick a page fault on the same queue for consumption.
> */
> -static int xe_pagefault_queue_index(struct xe_device *xe)
> +static int xe_pagefault_work_index(struct xe_device *xe)
> {
> - u32 old_pf_queue = READ_ONCE(xe->usm.current_pf_queue);
> -
> - WRITE_ONCE(xe->usm.current_pf_queue, (old_pf_queue + 1));
> + lockdep_assert_held(&xe->usm.pf_queue.lock);
>
> - return old_pf_queue % XE_PAGEFAULT_QUEUE_COUNT;
> + return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
> }
>
> /**
> @@ -469,22 +472,23 @@ static int xe_pagefault_queue_index(struct xe_device *xe)
> */
> int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> {
> - int queue_index = xe_pagefault_queue_index(xe);
> - struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue + queue_index;
> + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> unsigned long flags;
> + int work_index;
> bool full;
>
> spin_lock_irqsave(&pf_queue->lock, flags);
> + work_index = xe_pagefault_work_index(xe);
> full = xe_pagefault_queue_full(pf_queue);
> if (!full) {
> memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
> pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
> pf_queue->size;
> - queue_work(xe->usm.pf_wq, &pf_queue->worker);
> + queue_work(xe->usm.pf_wq,
> + &xe->usm.pf_workers[work_index].work);
> } else {
> drm_warn(&xe->drm,
> - "PageFault Queue (%d) full, shouldn't be possible\n",
> - queue_index);
> + "PageFault Queue full, shouldn't be possible\n");
> }
> spin_unlock_irqrestore(&pf_queue->lock, flags);
>
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> index b3289219b1be..45065c25c25f 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -131,8 +131,21 @@ struct xe_pagefault_queue {
> u32 tail;
> /** @lock: protects page fault queue */
> spinlock_t lock;
> - /** @worker: to process page faults */
> - struct work_struct worker;
> +};
> +
> +/**
> + * struct xe_pagefault_work - Xe page fault work item (consumer)
> + *
> + * Represents a worker that pops a &struct xe_pagefault from the page fault
> + * queue and processes it.
> + */
> +struct xe_pagefault_work {
> + /** @xe: Back-pointer to the Xe device */
> + struct xe_device *xe;
> + /** @id: Identifier for this work item */
> + int id;
> + /** @work: Work item used to process the page fault */
> + struct work_struct work;
> };
>
> #endif
Matt,
Previously there were four pf_queues in total, each of size
(total_num_eus + XE_NUM_HW_ENGINES) * xe_pagefault_entry_size() *
PF_MULTIPLIER, further rounded up by roundup_pow_of_two(). Each of
these queues had a dedicated worker.
There is a comment on the queue size calculation in
xe_pagefault_queue_init():
"XXX: Multiplier required as compute UMD are getting PF queue errors
without it. Follow on why this multiplier is required."
The PF queue errors could be due to slow fault processing by the
handler in the KMD, combined with a single VM (ASID) generating faults
and therefore constantly hitting the same queue.
Now there is a single queue which is four times smaller overall, but it
has four workers, and there are optimizations which potentially
drastically reduce processing time.
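To put rough numbers on it (illustrative only; "per_queue" is just my
shorthand for the value computed in xe_pagefault_queue_init(), and the
inputs are hardware dependent):
	per_queue = roundup_pow_of_two((total_num_eus + XE_NUM_HW_ENGINES) *
				       xe_pagefault_entry_size() *
				       PF_MULTIPLIER);
	before: 4 queues * per_queue bytes, 1 worker per queue
	after:  1 queue  * per_queue bytes, 4 workers total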
In the end this may reduce to the case of a single queue with four
workers instead of one, which would still be faster than before.
Still, I am not sure the queue size isn't too small. Did you give this
any thought?
And I think the XXX comment becomes obsolete with such a change.
Regards,
Maciej
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers
2026-05-06 15:46 ` Maciej Patelczyk
@ 2026-05-06 19:42 ` Matthew Brost
2026-05-07 12:41 ` Maciej Patelczyk
0 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-05-06 19:42 UTC (permalink / raw)
To: Maciej Patelczyk
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On Wed, May 06, 2026 at 05:46:30PM +0200, Maciej Patelczyk wrote:
> On 26/02/2026 05:28, Matthew Brost wrote:
>
> > With fine-grained page-fault locking, it no longer makes sense to
> > maintain multiple page-fault queues, as we no longer hash queues based
> > on the VM’s ASID. Multiple workers can pull page faults from a single
> > queue, eliminating any head-of-queue blocking. Refactor the structures
> > and code to use a single shared queue.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_device_types.h | 12 +++---
> > drivers/gpu/drm/xe/xe_pagefault.c | 52 +++++++++++++------------
> > drivers/gpu/drm/xe/xe_pagefault_types.h | 17 +++++++-
> > 3 files changed, 50 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 1eb0fe118940..0558dfd52541 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -304,8 +304,8 @@ struct xe_device {
> > struct xarray asid_to_vm;
> > /** @usm.next_asid: next ASID, used to cyclical alloc asids */
> > u32 next_asid;
> > - /** @usm.current_pf_queue: current page fault queue */
> > - u32 current_pf_queue;
> > + /** @usm.current_pf_work: current page fault work item */
> > + u32 current_pf_work;
> > /** @usm.lock: protects UM state */
> > struct rw_semaphore lock;
> > /** @usm.pf_wq: page fault work queue, unbound, high priority */
> > @@ -315,9 +315,11 @@ struct xe_device {
> > * yields the best bandwidth utilization of the kernel paging
> > * engine.
> > */
> > -#define XE_PAGEFAULT_QUEUE_COUNT 4
> > - /** @usm.pf_queue: Page fault queues */
> > - struct xe_pagefault_queue pf_queue[XE_PAGEFAULT_QUEUE_COUNT];
> > +#define XE_PAGEFAULT_WORK_COUNT 4
> > + /** @usm.pf_workers: Page fault workers */
> > + struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
> > + /** @usm.pf_queue: Page fault queue */
> > + struct xe_pagefault_queue pf_queue;
> > #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> > /** @usm.pagemap_shrinker: Shrinker for unused pagemaps */
> > struct drm_pagemap_shrinker *dpagemap_shrinker;
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> > index a372db7cd839..7880fc7e7eb4 100644
> > --- a/drivers/gpu/drm/xe/xe_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> > @@ -222,6 +222,7 @@ static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
> > pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
> > else
> > pf_queue->tail -= xe_pagefault_entry_size();
> > + memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
> > spin_unlock_irq(&pf_queue->lock);
> > }
> > @@ -267,8 +268,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
> > static void xe_pagefault_queue_work(struct work_struct *w)
> > {
> > - struct xe_pagefault_queue *pf_queue =
> > - container_of(w, typeof(*pf_queue), worker);
> > + struct xe_pagefault_work *pf_work =
> > + container_of(w, typeof(*pf_work), work);
> > + struct xe_device *xe = pf_work->xe;
> > + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> > struct xe_pagefault pf;
> > unsigned long threshold;
> > @@ -285,7 +288,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> > if (err == -EAGAIN) {
> > xe_pagefault_queue_retry(pf_queue, &pf);
> > - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
> > + queue_work(xe->usm.pf_wq, w);
> > break;
> > } else if (err) {
> > if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
> > @@ -302,7 +305,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> > pf.producer.ops->ack_fault(&pf, err);
> > if (time_after(jiffies, threshold)) {
> > - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
> > + queue_work(xe->usm.pf_wq, w);
> > break;
> > }
> > }
> > @@ -348,7 +351,6 @@ static int xe_pagefault_queue_init(struct xe_device *xe,
> > xe_pagefault_entry_size(), total_num_eus, pf_queue->size);
> > spin_lock_init(&pf_queue->lock);
> > - INIT_WORK(&pf_queue->worker, xe_pagefault_queue_work);
> > pf_queue->data = drmm_kzalloc(&xe->drm, pf_queue->size, GFP_KERNEL);
> > if (!pf_queue->data)
> > @@ -381,14 +383,20 @@ int xe_pagefault_init(struct xe_device *xe)
> > xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
> > WQ_UNBOUND | WQ_HIGHPRI,
> > - XE_PAGEFAULT_QUEUE_COUNT);
> > + XE_PAGEFAULT_WORK_COUNT);
> > if (!xe->usm.pf_wq)
> > return -ENOMEM;
> > - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i) {
> > - err = xe_pagefault_queue_init(xe, xe->usm.pf_queue + i);
> > - if (err)
> > - goto err_out;
> > + err = xe_pagefault_queue_init(xe, &xe->usm.pf_queue);
> > + if (err)
> > + goto err_out;
> > +
> > + for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
> > + struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
> > +
> > + pf_work->xe = xe;
> > + pf_work->id = i;
> > + INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
> > }
> > return devm_add_action_or_reset(xe->drm.dev, xe_pagefault_fini, xe);
> > @@ -430,10 +438,7 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
> > */
> > void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> > {
> > - int i;
> > -
> > - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
> > - xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue + i);
> > + xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
> > }
> > static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> > @@ -448,13 +453,11 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> > * This function can race with multiple page fault producers, but worst case we
> > * stick a page fault on the same queue for consumption.
> > */
> > -static int xe_pagefault_queue_index(struct xe_device *xe)
> > +static int xe_pagefault_work_index(struct xe_device *xe)
> > {
> > - u32 old_pf_queue = READ_ONCE(xe->usm.current_pf_queue);
> > -
> > - WRITE_ONCE(xe->usm.current_pf_queue, (old_pf_queue + 1));
> > + lockdep_assert_held(&xe->usm.pf_queue.lock);
> > - return old_pf_queue % XE_PAGEFAULT_QUEUE_COUNT;
> > + return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
> > }
> > /**
> > @@ -469,22 +472,23 @@ static int xe_pagefault_queue_index(struct xe_device *xe)
> > */
> > int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> > {
> > - int queue_index = xe_pagefault_queue_index(xe);
> > - struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue + queue_index;
> > + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> > unsigned long flags;
> > + int work_index;
> > bool full;
> > spin_lock_irqsave(&pf_queue->lock, flags);
> > + work_index = xe_pagefault_work_index(xe);
> > full = xe_pagefault_queue_full(pf_queue);
> > if (!full) {
> > memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
> > pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
> > pf_queue->size;
> > - queue_work(xe->usm.pf_wq, &pf_queue->worker);
> > + queue_work(xe->usm.pf_wq,
> > + &xe->usm.pf_workers[work_index].work);
> > } else {
> > drm_warn(&xe->drm,
> > - "PageFault Queue (%d) full, shouldn't be possible\n",
> > - queue_index);
> > + "PageFault Queue full, shouldn't be possible\n");
> > }
> > spin_unlock_irqrestore(&pf_queue->lock, flags);
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > index b3289219b1be..45065c25c25f 100644
> > --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> > +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > @@ -131,8 +131,21 @@ struct xe_pagefault_queue {
> > u32 tail;
> > /** @lock: protects page fault queue */
> > spinlock_t lock;
> > - /** @worker: to process page faults */
> > - struct work_struct worker;
> > +};
> > +
> > +/**
> > + * struct xe_pagefault_work - Xe page fault work item (consumer)
> > + *
> > + * Represents a worker that pops a &struct xe_pagefault from the page fault
> > + * queue and processes it.
> > + */
> > +struct xe_pagefault_work {
> > + /** @xe: Back-pointer to the Xe device */
> > + struct xe_device *xe;
> > + /** @id: Identifier for this work item */
> > + int id;
> > + /** @work: Work item used to process the page fault */
> > + struct work_struct work;
> > };
> > #endif
>
> Matt,
>
> Previously there were four pf_queues in total, each of size
> (total_num_eus + XE_NUM_HW_ENGINES) * xe_pagefault_entry_size() *
> PF_MULTIPLIER, further rounded up by roundup_pow_of_two(). Each of
> these queues had a dedicated worker.
>
> There is a comment on the queue size calculation in
> xe_pagefault_queue_init():
>
> "XXX: Multiplier required as compute UMD are getting PF queue errors
> without it. Follow on why this multiplier is required."
>
> The PF queue errors could be due to slow fault processing by the
> handler in the KMD, combined with a single VM (ASID) generating faults
> and therefore constantly hitting the same queue.
>
> Now there is a single queue which is four times smaller overall, but it
> has four workers, and there are optimizations which potentially
> drastically reduce processing time.
>
> In the end this may reduce to the case of a single queue with four
> workers instead of one, which would still be faster than before.
>
> Still, I am not sure the queue size isn't too small. Did you give this
> any thought?
>
> And I think the XXX comment becomes obsolete with such a change.
>
I think the XXX comment was always wrong. We kept increasing the queue
size because of random overflows, but the actual bug was that we didn’t
round up to a power of two, and CIRC_SPACE relies on values being powers
of two.
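For reference, the macros in include/linux/circ_buf.h mask with
(size)-1 instead of taking a true modulo:

	#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size)-1))
	#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head)+1), (size))

With a non-power-of-two size, say 48, the mask is 47 (0b101111), so the
computed count silently goes wrong once the indices wrap.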
I believe we never got around to deleting the XXX comment or removing
the multiplier. I'd like a large change like this to sit for a while so
we can test and ensure there are no regressions; once it has settled,
we can clean up the XXX comment and the multiplier in a follow-up.
Matt
>
> Regards,
>
> Maciej
>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers
2026-05-06 19:42 ` Matthew Brost
@ 2026-05-07 12:41 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 12:41 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 06/05/2026 21:42, Matthew Brost wrote:
> On Wed, May 06, 2026 at 05:46:30PM +0200, Maciej Patelczyk wrote:
>> On 26/02/2026 05:28, Matthew Brost wrote:
>>
>>> With fine-grained page-fault locking, it no longer makes sense to
>>> maintain multiple page-fault queues, as we no longer hash queues based
>>> on the VM’s ASID. Multiple workers can pull page faults from a single
>>> queue, eliminating any head-of-queue blocking. Refactor the structures
>>> and code to use a single shared queue.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_device_types.h | 12 +++---
>>> drivers/gpu/drm/xe/xe_pagefault.c | 52 +++++++++++++------------
>>> drivers/gpu/drm/xe/xe_pagefault_types.h | 17 +++++++-
>>> 3 files changed, 50 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>>> index 1eb0fe118940..0558dfd52541 100644
>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>> @@ -304,8 +304,8 @@ struct xe_device {
>>> struct xarray asid_to_vm;
>>> /** @usm.next_asid: next ASID, used to cyclical alloc asids */
>>> u32 next_asid;
>>> - /** @usm.current_pf_queue: current page fault queue */
>>> - u32 current_pf_queue;
>>> + /** @usm.current_pf_work: current page fault work item */
>>> + u32 current_pf_work;
>>> /** @usm.lock: protects UM state */
>>> struct rw_semaphore lock;
>>> /** @usm.pf_wq: page fault work queue, unbound, high priority */
>>> @@ -315,9 +315,11 @@ struct xe_device {
>>> * yields the best bandwidth utilization of the kernel paging
>>> * engine.
>>> */
>>> -#define XE_PAGEFAULT_QUEUE_COUNT 4
>>> - /** @usm.pf_queue: Page fault queues */
>>> - struct xe_pagefault_queue pf_queue[XE_PAGEFAULT_QUEUE_COUNT];
>>> +#define XE_PAGEFAULT_WORK_COUNT 4
>>> + /** @usm.pf_workers: Page fault workers */
>>> + struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
>>> + /** @usm.pf_queue: Page fault queue */
>>> + struct xe_pagefault_queue pf_queue;
>>> #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>>> /** @usm.pagemap_shrinker: Shrinker for unused pagemaps */
>>> struct drm_pagemap_shrinker *dpagemap_shrinker;
>>> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
>>> index a372db7cd839..7880fc7e7eb4 100644
>>> --- a/drivers/gpu/drm/xe/xe_pagefault.c
>>> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
>>> @@ -222,6 +222,7 @@ static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
>>> pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
>>> else
>>> pf_queue->tail -= xe_pagefault_entry_size();
>>> + memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
>>> spin_unlock_irq(&pf_queue->lock);
>>> }
>>> @@ -267,8 +268,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
>>> static void xe_pagefault_queue_work(struct work_struct *w)
>>> {
>>> - struct xe_pagefault_queue *pf_queue =
>>> - container_of(w, typeof(*pf_queue), worker);
>>> + struct xe_pagefault_work *pf_work =
>>> + container_of(w, typeof(*pf_work), work);
>>> + struct xe_device *xe = pf_work->xe;
>>> + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
>>> struct xe_pagefault pf;
>>> unsigned long threshold;
>>> @@ -285,7 +288,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
>>> if (err == -EAGAIN) {
>>> xe_pagefault_queue_retry(pf_queue, &pf);
>>> - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
>>> + queue_work(xe->usm.pf_wq, w);
>>> break;
>>> } else if (err) {
>>> if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
>>> @@ -302,7 +305,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
>>> pf.producer.ops->ack_fault(&pf, err);
>>> if (time_after(jiffies, threshold)) {
>>> - queue_work(gt_to_xe(pf.gt)->usm.pf_wq, w);
>>> + queue_work(xe->usm.pf_wq, w);
>>> break;
>>> }
>>> }
>>> @@ -348,7 +351,6 @@ static int xe_pagefault_queue_init(struct xe_device *xe,
>>> xe_pagefault_entry_size(), total_num_eus, pf_queue->size);
>>> spin_lock_init(&pf_queue->lock);
>>> - INIT_WORK(&pf_queue->worker, xe_pagefault_queue_work);
>>> pf_queue->data = drmm_kzalloc(&xe->drm, pf_queue->size, GFP_KERNEL);
>>> if (!pf_queue->data)
>>> @@ -381,14 +383,20 @@ int xe_pagefault_init(struct xe_device *xe)
>>> xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
>>> WQ_UNBOUND | WQ_HIGHPRI,
>>> - XE_PAGEFAULT_QUEUE_COUNT);
>>> + XE_PAGEFAULT_WORK_COUNT);
>>> if (!xe->usm.pf_wq)
>>> return -ENOMEM;
>>> - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i) {
>>> - err = xe_pagefault_queue_init(xe, xe->usm.pf_queue + i);
>>> - if (err)
>>> - goto err_out;
>>> + err = xe_pagefault_queue_init(xe, &xe->usm.pf_queue);
>>> + if (err)
>>> + goto err_out;
>>> +
>>> + for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
>>> + struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
>>> +
>>> + pf_work->xe = xe;
>>> + pf_work->id = i;
>>> + INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
>>> }
>>> return devm_add_action_or_reset(xe->drm.dev, xe_pagefault_fini, xe);
>>> @@ -430,10 +438,7 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
>>> */
>>> void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
>>> {
>>> - int i;
>>> -
>>> - for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
>>> - xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue + i);
>>> + xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
>>> }
>>> static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
>>> @@ -448,13 +453,11 @@ static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
>>> * This function can race with multiple page fault producers, but worst case we
>>> * stick a page fault on the same queue for consumption.
>>> */
>>> -static int xe_pagefault_queue_index(struct xe_device *xe)
>>> +static int xe_pagefault_work_index(struct xe_device *xe)
>>> {
>>> - u32 old_pf_queue = READ_ONCE(xe->usm.current_pf_queue);
>>> -
>>> - WRITE_ONCE(xe->usm.current_pf_queue, (old_pf_queue + 1));
>>> + lockdep_assert_held(&xe->usm.pf_queue.lock);
>>> - return old_pf_queue % XE_PAGEFAULT_QUEUE_COUNT;
>>> + return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
>>> }
>>> /**
>>> @@ -469,22 +472,23 @@ static int xe_pagefault_queue_index(struct xe_device *xe)
>>> */
>>> int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
>>> {
>>> - int queue_index = xe_pagefault_queue_index(xe);
>>> - struct xe_pagefault_queue *pf_queue = xe->usm.pf_queue + queue_index;
>>> + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
>>> unsigned long flags;
>>> + int work_index;
>>> bool full;
>>> spin_lock_irqsave(&pf_queue->lock, flags);
>>> + work_index = xe_pagefault_work_index(xe);
>>> full = xe_pagefault_queue_full(pf_queue);
>>> if (!full) {
>>> memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
>>> pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
>>> pf_queue->size;
>>> - queue_work(xe->usm.pf_wq, &pf_queue->worker);
>>> + queue_work(xe->usm.pf_wq,
>>> + &xe->usm.pf_workers[work_index].work);
>>> } else {
>>> drm_warn(&xe->drm,
>>> - "PageFault Queue (%d) full, shouldn't be possible\n",
>>> - queue_index);
>>> + "PageFault Queue full, shouldn't be possible\n");
>>> }
>>> spin_unlock_irqrestore(&pf_queue->lock, flags);
>>> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
>>> index b3289219b1be..45065c25c25f 100644
>>> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
>>> @@ -131,8 +131,21 @@ struct xe_pagefault_queue {
>>> u32 tail;
>>> /** @lock: protects page fault queue */
>>> spinlock_t lock;
>>> - /** @worker: to process page faults */
>>> - struct work_struct worker;
>>> +};
>>> +
>>> +/**
>>> + * struct xe_pagefault_work - Xe page fault work item (consumer)
>>> + *
>>> + * Represents a worker that pops a &struct xe_pagefault from the page fault
>>> + * queue and processes it.
>>> + */
>>> +struct xe_pagefault_work {
>>> + /** @xe: Back-pointer to the Xe device */
>>> + struct xe_device *xe;
>>> + /** @id: Identifier for this work item */
>>> + int id;
>>> + /** @work: Work item used to process the page fault */
>>> + struct work_struct work;
>>> };
>>> #endif
>> Matt,
>>
>> Previously there were four pf_queues in total, each of size
>> (total_num_eus + XE_NUM_HW_ENGINES) * xe_pagefault_entry_size() *
>> PF_MULTIPLIER, further rounded up by roundup_pow_of_two(). Each of
>> these queues had a dedicated worker.
>>
>> There is a comment on the queue size calculation in
>> xe_pagefault_queue_init():
>>
>> "XXX: Multiplier required as compute UMD are getting PF queue errors
>> without it. Follow on why this multiplier is required."
>>
>> The PF queue errors could be due to slow fault processing by the
>> handler in the KMD, combined with a single VM (ASID) generating faults
>> and therefore constantly hitting the same queue.
>>
>> Now there is a single queue which is four times smaller overall, but it
>> has four workers, and there are optimizations which potentially
>> drastically reduce processing time.
>>
>> In the end this may reduce to the case of a single queue with four
>> workers instead of one, which would still be faster than before.
>>
>> Still, I am not sure the queue size isn't too small. Did you give this
>> any thought?
>>
>> And I think the XXX comment becomes obsolete with such a change.
>>
> I think the XXX comment was always wrong. We kept increasing the queue
> size because of random overflows, but the actual bug was that we didn’t
> round up to a power of two, and CIRC_SPACE relies on values being powers
> of two.
>
> I believe we never got around to deleting the XXX comment or removing
> the multiplier. I'd like a large change like this to sit for a while so
> we can test and ensure there are no regressions; once it has settled,
> we can clean up the XXX comment and the multiplier in a follow-up.
>
> Matt
All right,
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>> Regards,
>>
>> Maciej
>>
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 05/12] drm/xe: Add num_pf_work modparam
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (3 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 04/12] drm/xe: Use a single page-fault queue with multiple workers Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-06 15:59 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 06/12] drm/xe: Engine class and instance into a u8 Matthew Brost
` (11 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add a module parameter to control the number of page-fault work threads,
making it easy to experiment with how different numbers of work threads
impact performance.
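A sketch of intended usage (the value here is arbitrary):

	modprobe xe num_pf_work=4

Since the parameter is registered with 0600 permissions, it also shows
up at /sys/module/xe/parameters/num_pf_work, though the value is only
consumed at device probe time.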
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_defaults.h | 1 +
drivers/gpu/drm/xe/xe_device.c | 17 ++++++++++++++---
drivers/gpu/drm/xe/xe_device_types.h | 11 ++++-------
drivers/gpu/drm/xe/xe_module.c | 4 ++++
drivers/gpu/drm/xe/xe_module.h | 1 +
drivers/gpu/drm/xe/xe_pagefault.c | 6 +++---
drivers/gpu/drm/xe/xe_vm.c | 3 ++-
7 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_defaults.h b/drivers/gpu/drm/xe/xe_defaults.h
index 5d5d41d067c5..2e615cf896b2 100644
--- a/drivers/gpu/drm/xe/xe_defaults.h
+++ b/drivers/gpu/drm/xe/xe_defaults.h
@@ -22,5 +22,6 @@
#define XE_DEFAULT_WEDGED_MODE XE_WEDGED_MODE_UPON_CRITICAL_ERROR
#define XE_DEFAULT_WEDGED_MODE_STR "upon-critical-error"
#define XE_DEFAULT_SVM_NOTIFIER_SIZE 512
+#define XE_DEFAULT_NUM_PF_WORK 2
#endif
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 3462645ca13c..0571079a09e8 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -436,6 +436,18 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
ttm_device_fini(&xe->ttm);
}
+static void xe_device_parse_modparam(struct xe_device *xe)
+{
+ xe->info.force_execlist = xe_modparam.force_execlist;
+ xe->atomic_svm_timeslice_ms = 5;
+ xe->min_run_period_lr_ms = 5;
+ xe->info.num_pf_work = xe_modparam.num_pf_work;
+ if (xe->info.num_pf_work < 1)
+ xe->info.num_pf_work = 1;
+ else if (xe->info.num_pf_work > XE_PAGEFAULT_WORK_MAX)
+ xe->info.num_pf_work = XE_PAGEFAULT_WORK_MAX;
+}
+
struct xe_device *xe_device_create(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -469,9 +481,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
xe->info.devid = pdev->device;
xe->info.revid = pdev->revision;
- xe->info.force_execlist = xe_modparam.force_execlist;
- xe->atomic_svm_timeslice_ms = 5;
- xe->min_run_period_lr_ms = 5;
+
+ xe_device_parse_modparam(xe);
err = xe_irq_init(xe);
if (err)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 0558dfd52541..a027ca5f6828 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -130,6 +130,8 @@ struct xe_device {
u8 revid;
/** @info.step: stepping information for each IP */
struct xe_step_info step;
+ /** @info.num_pf_work: Number of page fault work threads */
+ int num_pf_work;
/** @info.dma_mask_size: DMA address bits */
u8 dma_mask_size;
/** @info.vram_flags: Vram flags */
@@ -310,14 +312,9 @@ struct xe_device {
struct rw_semaphore lock;
/** @usm.pf_wq: page fault work queue, unbound, high priority */
struct workqueue_struct *pf_wq;
- /*
- * We pick 4 here because, in the current implementation, it
- * yields the best bandwidth utilization of the kernel paging
- * engine.
- */
-#define XE_PAGEFAULT_WORK_COUNT 4
+#define XE_PAGEFAULT_WORK_MAX 8
/** @usm.pf_workers: Page fault workers */
- struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
+ struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_MAX];
/** @usm.pf_queue: Page fault queue */
struct xe_pagefault_queue pf_queue;
#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 903d3b433421..c750db4b579c 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -28,6 +28,7 @@ struct xe_modparam xe_modparam = {
.max_vfs = XE_DEFAULT_MAX_VFS,
#endif
.wedged_mode = XE_DEFAULT_WEDGED_MODE,
+ .num_pf_work = XE_DEFAULT_NUM_PF_WORK,
.svm_notifier_size = XE_DEFAULT_SVM_NOTIFIER_SIZE,
/* the rest are 0 by default */
};
@@ -81,6 +82,9 @@ MODULE_PARM_DESC(wedged_mode,
"Module's default policy for the wedged mode (0=never, 1=upon-critical-error, 2=upon-any-hang-no-reset "
"[default=" XE_DEFAULT_WEDGED_MODE_STR "])");
+module_param_named(num_pf_work, xe_modparam.num_pf_work, int, 0600);
+MODULE_PARM_DESC(num_pf_work, "Number of page fault work threads, default=2, min=1, max=8");
+
static int xe_check_nomodeset(void)
{
if (drm_firmware_drivers_only())
diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
index 79cb9639c0f3..c6642523184a 100644
--- a/drivers/gpu/drm/xe/xe_module.h
+++ b/drivers/gpu/drm/xe/xe_module.h
@@ -22,6 +22,7 @@ struct xe_modparam {
unsigned int max_vfs;
#endif
unsigned int wedged_mode;
+ unsigned int num_pf_work;
u32 svm_notifier_size;
};
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 7880fc7e7eb4..64b1dc574ab7 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -383,7 +383,7 @@ int xe_pagefault_init(struct xe_device *xe)
xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
WQ_UNBOUND | WQ_HIGHPRI,
- XE_PAGEFAULT_WORK_COUNT);
+ xe->info.num_pf_work);
if (!xe->usm.pf_wq)
return -ENOMEM;
@@ -391,7 +391,7 @@ int xe_pagefault_init(struct xe_device *xe)
if (err)
goto err_out;
- for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
+ for (i = 0; i < xe->info.num_pf_work; ++i) {
struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
pf_work->xe = xe;
@@ -457,7 +457,7 @@ static int xe_pagefault_work_index(struct xe_device *xe)
{
lockdep_assert_held(&xe->usm.pf_queue.lock);
- return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
+ return xe->usm.current_pf_work++ % xe->info.num_pf_work;
}
/**
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 06669e9c500d..54c7d0f791e1 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3070,7 +3070,8 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops *vops,
skip_threads = op->prefetch_range.ranges_count == 1 ||
(!dpagemap && !(vops->flags &
XE_VMA_OPS_FLAG_HAS_SVM_VALID_RANGE)) ||
- !(vops->flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK);
+ !(vops->flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK) ||
+ vm->xe->info.num_pf_work == 1;
thread = skip_threads ? &stack_thread : NULL;
if (!skip_threads) {
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 05/12] drm/xe: Add num_pf_work modparam
2026-02-26 4:28 ` [PATCH v4 05/12] drm/xe: Add num_pf_work modparam Matthew Brost
@ 2026-05-06 15:59 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-06 15:59 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> Add a module parameter to control the number of page-fault work threads,
> making it easy to experiment with how different numbers of work threads
> impact performance.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 17 ++++++++++++++---
> drivers/gpu/drm/xe/xe_device_types.h | 11 ++++-------
> drivers/gpu/drm/xe/xe_module.c | 4 ++++
> drivers/gpu/drm/xe/xe_module.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 6 +++---
> drivers/gpu/drm/xe/xe_vm.c | 3 ++-
> 7 files changed, 29 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_defaults.h b/drivers/gpu/drm/xe/xe_defaults.h
> index 5d5d41d067c5..2e615cf896b2 100644
> --- a/drivers/gpu/drm/xe/xe_defaults.h
> +++ b/drivers/gpu/drm/xe/xe_defaults.h
> @@ -22,5 +22,6 @@
> #define XE_DEFAULT_WEDGED_MODE XE_WEDGED_MODE_UPON_CRITICAL_ERROR
> #define XE_DEFAULT_WEDGED_MODE_STR "upon-critical-error"
> #define XE_DEFAULT_SVM_NOTIFIER_SIZE 512
> +#define XE_DEFAULT_NUM_PF_WORK 2
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 3462645ca13c..0571079a09e8 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -436,6 +436,18 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
> ttm_device_fini(&xe->ttm);
> }
>
> +static void xe_device_parse_modparam(struct xe_device *xe)
> +{
> + xe->info.force_execlist = xe_modparam.force_execlist;
> + xe->atomic_svm_timeslice_ms = 5;
> + xe->min_run_period_lr_ms = 5;
> + xe->info.num_pf_work = xe_modparam.num_pf_work;
> + if (xe->info.num_pf_work < 1)
> + xe->info.num_pf_work = 1;
> + else if (xe->info.num_pf_work > XE_PAGEFAULT_WORK_MAX)
> + xe->info.num_pf_work = XE_PAGEFAULT_WORK_MAX;
> +}
> +
> struct xe_device *xe_device_create(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> {
> @@ -469,9 +481,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>
> xe->info.devid = pdev->device;
> xe->info.revid = pdev->revision;
> - xe->info.force_execlist = xe_modparam.force_execlist;
> - xe->atomic_svm_timeslice_ms = 5;
> - xe->min_run_period_lr_ms = 5;
> +
> + xe_device_parse_modparam(xe);
>
> err = xe_irq_init(xe);
> if (err)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 0558dfd52541..a027ca5f6828 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -130,6 +130,8 @@ struct xe_device {
> u8 revid;
> /** @info.step: stepping information for each IP */
> struct xe_step_info step;
> + /** @info.num_pf_work: Number of page fault work threads */
> + int num_pf_work;
> /** @info.dma_mask_size: DMA address bits */
> u8 dma_mask_size;
> /** @info.vram_flags: Vram flags */
> @@ -310,14 +312,9 @@ struct xe_device {
> struct rw_semaphore lock;
> /** @usm.pf_wq: page fault work queue, unbound, high priority */
> struct workqueue_struct *pf_wq;
> - /*
> - * We pick 4 here because, in the current implementation, it
> - * yields the best bandwidth utilization of the kernel paging
> - * engine.
> - */
> -#define XE_PAGEFAULT_WORK_COUNT 4
> +#define XE_PAGEFAULT_WORK_MAX 8
> /** @usm.pf_workers: Page fault workers */
> - struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_COUNT];
> + struct xe_pagefault_work pf_workers[XE_PAGEFAULT_WORK_MAX];
> /** @usm.pf_queue: Page fault queue */
> struct xe_pagefault_queue pf_queue;
> #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index 903d3b433421..c750db4b579c 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -28,6 +28,7 @@ struct xe_modparam xe_modparam = {
> .max_vfs = XE_DEFAULT_MAX_VFS,
> #endif
> .wedged_mode = XE_DEFAULT_WEDGED_MODE,
> + .num_pf_work = XE_DEFAULT_NUM_PF_WORK,
> .svm_notifier_size = XE_DEFAULT_SVM_NOTIFIER_SIZE,
> /* the rest are 0 by default */
> };
> @@ -81,6 +82,9 @@ MODULE_PARM_DESC(wedged_mode,
> "Module's default policy for the wedged mode (0=never, 1=upon-critical-error, 2=upon-any-hang-no-reset "
> "[default=" XE_DEFAULT_WEDGED_MODE_STR "])");
>
> +module_param_named(num_pf_work, xe_modparam.num_pf_work, int, 0600);
> +MODULE_PARM_DESC(num_pf_work, "Number of page fault work threads, default=2, min=1, max=8");
> +
> static int xe_check_nomodeset(void)
> {
> if (drm_firmware_drivers_only())
> diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> index 79cb9639c0f3..c6642523184a 100644
> --- a/drivers/gpu/drm/xe/xe_module.h
> +++ b/drivers/gpu/drm/xe/xe_module.h
> @@ -22,6 +22,7 @@ struct xe_modparam {
> unsigned int max_vfs;
> #endif
> unsigned int wedged_mode;
> + unsigned int num_pf_work;
> u32 svm_notifier_size;
> };
>
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 7880fc7e7eb4..64b1dc574ab7 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -383,7 +383,7 @@ int xe_pagefault_init(struct xe_device *xe)
>
> xe->usm.pf_wq = alloc_workqueue("xe_page_fault_work_queue",
> WQ_UNBOUND | WQ_HIGHPRI,
> - XE_PAGEFAULT_WORK_COUNT);
> + xe->info.num_pf_work);
> if (!xe->usm.pf_wq)
> return -ENOMEM;
>
> @@ -391,7 +391,7 @@ int xe_pagefault_init(struct xe_device *xe)
> if (err)
> goto err_out;
>
> - for (i = 0; i < XE_PAGEFAULT_WORK_COUNT; ++i) {
> + for (i = 0; i < xe->info.num_pf_work; ++i) {
> struct xe_pagefault_work *pf_work = xe->usm.pf_workers + i;
>
> pf_work->xe = xe;
> @@ -457,7 +457,7 @@ static int xe_pagefault_work_index(struct xe_device *xe)
> {
> lockdep_assert_held(&xe->usm.pf_queue.lock);
>
> - return xe->usm.current_pf_work++ % XE_PAGEFAULT_WORK_COUNT;
> + return xe->usm.current_pf_work++ % xe->info.num_pf_work;
> }
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 06669e9c500d..54c7d0f791e1 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3070,7 +3070,8 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops *vops,
> skip_threads = op->prefetch_range.ranges_count == 1 ||
> (!dpagemap && !(vops->flags &
> XE_VMA_OPS_FLAG_HAS_SVM_VALID_RANGE)) ||
> - !(vops->flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK);
> + !(vops->flags & XE_VMA_OPS_FLAG_DOWNGRADE_LOCK) ||
> + vm->xe->info.num_pf_work == 1;
> thread = skip_threads ? &stack_thread : NULL;
>
> if (!skip_threads) {
In addition to patch 04, this drops the default number of workers to 2.
Good idea to have this as a modparam!
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 06/12] drm/xe: Engine class and instance into a u8
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (4 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 05/12] drm/xe: Add num_pf_work modparam Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-06 16:04 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 07/12] drm/xe: Track pagefault worker runtime Matthew Brost
` (10 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Pack the engine class and instance fields into a single u8 to save space
in struct xe_pagefault. This also makes future extensions easier.
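With the 4-bit masks introduced below, the class and instance values
each have to fit in the 0-15 range. A minimal sketch of the round trip,
using FIELD_PREP()/FIELD_GET() from <linux/bitfield.h>:

	u8 v = FIELD_PREP(XE_PAGEFAULT_ENGINE_CLASS_MASK, class) |
	       FIELD_PREP(XE_PAGEFAULT_ENGINE_INSTANCE_MASK, instance);
	/* FIELD_GET(XE_PAGEFAULT_ENGINE_CLASS_MASK, v) == class */
	/* FIELD_GET(XE_PAGEFAULT_ENGINE_INSTANCE_MASK, v) == instance */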
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_pagefault.c | 7 +++++--
drivers/gpu/drm/xe/xe_pagefault.c | 12 ++++++++----
drivers/gpu/drm/xe/xe_pagefault_types.h | 10 ++++++----
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
index 607e32392f46..2470faf3d5d8 100644
--- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
@@ -89,8 +89,11 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
FIELD_GET(PFD_FAULT_LEVEL, msg[0])) |
FIELD_PREP(XE_PAGEFAULT_TYPE_MASK,
FIELD_GET(PFD_FAULT_TYPE, msg[2]));
- pf.consumer.engine_class = FIELD_GET(PFD_ENG_CLASS, msg[0]);
- pf.consumer.engine_instance = FIELD_GET(PFD_ENG_INSTANCE, msg[0]);
+ pf.consumer.engine_class_instance =
+ FIELD_PREP(XE_PAGEFAULT_ENGINE_CLASS_MASK,
+ FIELD_GET(PFD_ENG_CLASS, msg[0])) |
+ FIELD_PREP(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
+ FIELD_GET(PFD_ENG_INSTANCE, msg[0]));
pf.producer.private = guc;
pf.producer.ops = &guc_pagefault_ops;
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 64b1dc574ab7..a6fa790774c5 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -245,13 +245,16 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
static void xe_pagefault_print(struct xe_pagefault *pf)
{
+ u8 engine_class = FIELD_GET(XE_PAGEFAULT_ENGINE_CLASS_MASK,
+ pf->consumer.engine_class_instance);
+
xe_gt_info(pf->gt, "\n\tASID: %d\n"
"\tFaulted Address: 0x%08x%08x\n"
"\tFaultType: %lu\n"
"\tAccessType: %lu\n"
"\tFaultLevel: %lu\n"
"\tEngineClass: %d %s\n"
- "\tEngineInstance: %d\n",
+ "\tEngineInstance: %lu\n",
pf->consumer.asid,
upper_32_bits(pf->consumer.page_addr),
lower_32_bits(pf->consumer.page_addr),
@@ -261,9 +264,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
pf->consumer.access_type),
FIELD_GET(XE_PAGEFAULT_LEVEL_MASK,
pf->consumer.fault_type_level),
- pf->consumer.engine_class,
- xe_hw_engine_class_to_str(pf->consumer.engine_class),
- pf->consumer.engine_instance);
+ engine_class,
+ xe_hw_engine_class_to_str(engine_class),
+ FIELD_GET(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
+ pf->consumer.engine_class_instance));
}
static void xe_pagefault_queue_work(struct work_struct *w)
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index 45065c25c25f..75bc53205601 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -82,10 +82,12 @@ struct xe_pagefault {
#define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */
#define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0)
#define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4)
- /** @consumer.engine_class: engine class */
- u8 engine_class;
- /** @consumer.engine_instance: engine instance */
- u8 engine_instance;
+ /** @consumer.engine_class_instance: engine class and instance */
+ u8 engine_class_instance;
+#define XE_PAGEFAULT_ENGINE_CLASS_MASK GENMASK(3, 0)
+#define XE_PAGEFAULT_ENGINE_INSTANCE_MASK GENMASK(7, 4)
+ /** @consumer.pad: alignment padding */
+ u8 pad;
/** consumer.reserved: reserved bits for future expansion */
u64 reserved;
} consumer;
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 06/12] drm/xe: Engine class and instance into a u8
2026-02-26 4:28 ` [PATCH v4 06/12] drm/xe: Engine class and instance into a u8 Matthew Brost
@ 2026-05-06 16:04 ` Maciej Patelczyk
2026-05-07 16:20 ` Maciej Patelczyk
0 siblings, 1 reply; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-06 16:04 UTC (permalink / raw)
To: intel-xe
On 26/02/2026 05:28, Matthew Brost wrote:
> Pack the engine class and instance fields into a single u8 to save space
> in struct xe_pagefault. This also makes future extensions easier.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_pagefault.c | 7 +++++--
> drivers/gpu/drm/xe/xe_pagefault.c | 12 ++++++++----
> drivers/gpu/drm/xe/xe_pagefault_types.h | 10 ++++++----
> 3 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> index 607e32392f46..2470faf3d5d8 100644
> --- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> @@ -89,8 +89,11 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
> FIELD_GET(PFD_FAULT_LEVEL, msg[0])) |
> FIELD_PREP(XE_PAGEFAULT_TYPE_MASK,
> FIELD_GET(PFD_FAULT_TYPE, msg[2]));
> - pf.consumer.engine_class = FIELD_GET(PFD_ENG_CLASS, msg[0]);
> - pf.consumer.engine_instance = FIELD_GET(PFD_ENG_INSTANCE, msg[0]);
> + pf.consumer.engine_class_instance =
> + FIELD_PREP(XE_PAGEFAULT_ENGINE_CLASS_MASK,
> + FIELD_GET(PFD_ENG_CLASS, msg[0])) |
> + FIELD_PREP(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
> + FIELD_GET(PFD_ENG_INSTANCE, msg[0]));
>
> pf.producer.private = guc;
> pf.producer.ops = &guc_pagefault_ops;
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 64b1dc574ab7..a6fa790774c5 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -245,13 +245,16 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
>
> static void xe_pagefault_print(struct xe_pagefault *pf)
> {
> + u8 engine_class = FIELD_GET(XE_PAGEFAULT_ENGINE_CLASS_MASK,
> + pf->consumer.engine_class_instance);
> +
> xe_gt_info(pf->gt, "\n\tASID: %d\n"
> "\tFaulted Address: 0x%08x%08x\n"
> "\tFaultType: %lu\n"
> "\tAccessType: %lu\n"
> "\tFaultLevel: %lu\n"
> "\tEngineClass: %d %s\n"
> - "\tEngineInstance: %d\n",
> + "\tEngineInstance: %lu\n",
> pf->consumer.asid,
> upper_32_bits(pf->consumer.page_addr),
> lower_32_bits(pf->consumer.page_addr),
> @@ -261,9 +264,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
> pf->consumer.access_type),
> FIELD_GET(XE_PAGEFAULT_LEVEL_MASK,
> pf->consumer.fault_type_level),
> - pf->consumer.engine_class,
> - xe_hw_engine_class_to_str(pf->consumer.engine_class),
> - pf->consumer.engine_instance);
> + engine_class,
> + xe_hw_engine_class_to_str(engine_class),
> + FIELD_GET(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
> + pf->consumer.engine_class_instance));
> }
>
> static void xe_pagefault_queue_work(struct work_struct *w)
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> index 45065c25c25f..75bc53205601 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -82,10 +82,12 @@ struct xe_pagefault {
> #define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */
> #define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0)
> #define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4)
> - /** @consumer.engine_class: engine class */
> - u8 engine_class;
> - /** @consumer.engine_instance: engine instance */
> - u8 engine_instance;
> + /** @consumer.engine_class_instance: engine class and instance */
> + u8 engine_class_instance;
> +#define XE_PAGEFAULT_ENGINE_CLASS_MASK GENMASK(3, 0)
> +#define XE_PAGEFAULT_ENGINE_INSTANCE_MASK GENMASK(7, 4)
> + /** @consumer.pad: alignment padding */
> + u8 pad;
> /** consumer.reserved: reserved bits for future expansion */
> u64 reserved;
> } consumer;
Looks good,
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 06/12] drm/xe: Engine class and instance into a u8
2026-05-06 16:04 ` Maciej Patelczyk
@ 2026-05-07 16:20 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 16:20 UTC (permalink / raw)
To: intel-xe, Matthew Brost
On 06/05/2026 18:04, Maciej Patelczyk wrote:
> On 26/02/2026 05:28, Matthew Brost wrote:
>
>> Pack the engine class and instance fields into a single u8 to save space
>> in struct xe_pagefault. This also makes future extensions easier.
>>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc_pagefault.c | 7 +++++--
>> drivers/gpu/drm/xe/xe_pagefault.c | 12 ++++++++----
>> drivers/gpu/drm/xe/xe_pagefault_types.h | 10 ++++++----
>> 3 files changed, 19 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
>> index 607e32392f46..2470faf3d5d8 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
>> @@ -89,8 +89,11 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> FIELD_GET(PFD_FAULT_LEVEL, msg[0])) |
>> FIELD_PREP(XE_PAGEFAULT_TYPE_MASK,
>> FIELD_GET(PFD_FAULT_TYPE, msg[2]));
>> - pf.consumer.engine_class = FIELD_GET(PFD_ENG_CLASS, msg[0]);
>> - pf.consumer.engine_instance = FIELD_GET(PFD_ENG_INSTANCE, msg[0]);
>> + pf.consumer.engine_class_instance =
>> + FIELD_PREP(XE_PAGEFAULT_ENGINE_CLASS_MASK,
>> + FIELD_GET(PFD_ENG_CLASS, msg[0])) |
>> + FIELD_PREP(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
>> + FIELD_GET(PFD_ENG_INSTANCE, msg[0]));
>> pf.producer.private = guc;
>> pf.producer.ops = &guc_pagefault_ops;
>> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
>> index 64b1dc574ab7..a6fa790774c5 100644
>> --- a/drivers/gpu/drm/xe/xe_pagefault.c
>> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
>> @@ -245,13 +245,16 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
>> static void xe_pagefault_print(struct xe_pagefault *pf)
>> {
>> + u8 engine_class = FIELD_GET(XE_PAGEFAULT_ENGINE_CLASS_MASK,
>> + pf->consumer.engine_class_instance);
>> +
>> xe_gt_info(pf->gt, "\n\tASID: %d\n"
>> "\tFaulted Address: 0x%08x%08x\n"
>> "\tFaultType: %lu\n"
>> "\tAccessType: %lu\n"
>> "\tFaultLevel: %lu\n"
>> "\tEngineClass: %d %s\n"
>> - "\tEngineInstance: %d\n",
>> + "\tEngineInstance: %lu\n",
>> pf->consumer.asid,
>> upper_32_bits(pf->consumer.page_addr),
>> lower_32_bits(pf->consumer.page_addr),
>> @@ -261,9 +264,10 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
>> pf->consumer.access_type),
>> FIELD_GET(XE_PAGEFAULT_LEVEL_MASK,
>> pf->consumer.fault_type_level),
>> - pf->consumer.engine_class,
>> - xe_hw_engine_class_to_str(pf->consumer.engine_class),
>> - pf->consumer.engine_instance);
>> + engine_class,
>> + xe_hw_engine_class_to_str(engine_class),
>> + FIELD_GET(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
>> + pf->consumer.engine_class_instance));
>> }
>> static void xe_pagefault_queue_work(struct work_struct *w)
>> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
>> index 45065c25c25f..75bc53205601 100644
>> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
>> @@ -82,10 +82,12 @@ struct xe_pagefault {
>> #define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */
>> #define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0)
>> #define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4)
>> - /** @consumer.engine_class: engine class */
>> - u8 engine_class;
>> - /** @consumer.engine_instance: engine instance */
>> - u8 engine_instance;
>> + /** @consumer.engine_class_instance: engine class and instance */
>> + u8 engine_class_instance;
>> +#define XE_PAGEFAULT_ENGINE_CLASS_MASK GENMASK(3, 0)
>> +#define XE_PAGEFAULT_ENGINE_INSTANCE_MASK GENMASK(7, 4)
>> + /** @consumer.pad: alignment padding */
>> + u8 pad;
>> /** consumer.reserved: reserved bits for future expansion */
>> u64 reserved;
>> } consumer;
>
> Looks good,
>
> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>
Sorry,
../drivers/gpu/drm/xe/xe_vm.c: In function ‘xe_vm_add_fault_entry_pf’:
../drivers/gpu/drm/xe/xe_vm.c:605:51: error: ‘struct <anonymous>’ has no member named ‘engine_class’
605 | hwe = xe_gt_hw_engine(pf->gt, pf->consumer.engine_class,
| ^
../drivers/gpu/drm/xe/xe_vm.c:606:44: error: ‘struct <anonymous>’ has no member named ‘engine_instance’; did you mean ‘engine_class_instance’?
606 | pf->consumer.engine_instance, false);
| ^~~~~~~~~~~~~~~
| engine_class_instance
Missing:
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -602,8 +602,12 @@ void xe_vm_add_fault_entry_pf(struct xe_vm *vm, struct xe_pagefault *pf)
struct xe_hw_engine *hwe;
/* Do not report faults on reserved engines */
- hwe = xe_gt_hw_engine(pf->gt, pf->consumer.engine_class,
- pf->consumer.engine_instance, false);
+ hwe = xe_gt_hw_engine(pf->gt,
+ FIELD_GET(XE_PAGEFAULT_ENGINE_CLASS_MASK,
+ pf->consumer.engine_class_instance),
+ FIELD_GET(XE_PAGEFAULT_ENGINE_INSTANCE_MASK,
+ pf->consumer.engine_class_instance),
+ false);
if (!hwe || xe_hw_engine_is_reserved(hwe))
Maciej
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 07/12] drm/xe: Track pagefault worker runtime
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (5 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 06/12] drm/xe: Engine class and instance into a u8 Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-07 12:51 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 08/12] drm/xe: Chain page faults via queue-resident cache to avoid fault storms Matthew Brost
` (9 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add a GT stat measuring total time spent servicing pagefault workqueue
iterations. The counter accumulates the runtime of the pagefault worker
in microseconds, allowing correlation of fault storms and chaining
behavior with CPU time spent in the fault handler.
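The accumulation follows the usual ktime delta pattern; assuming the
xe_gt_stats_ktime_*() helpers are thin wrappers around it, roughly:

	ktime_t start = ktime_get();	/* xe_gt_stats_ktime_get() */
	/* ... service queued page faults ... */
	xe_gt_stats_incr(gt, XE_GT_STATS_ID_PAGEFAULT_US,
			 ktime_us_delta(ktime_get(), start));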
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gt_stats.c | 1 +
drivers/gpu/drm/xe/xe_gt_stats_types.h | 1 +
drivers/gpu/drm/xe/xe_pagefault.c | 8 ++++++++
3 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
index 81cec441b449..c1af3ecb429b 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats.c
+++ b/drivers/gpu/drm/xe/xe_gt_stats.c
@@ -60,6 +60,7 @@ static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
DEF_STAT_STR(SVM_TLB_INVAL_US, "svm_tlb_inval_us"),
DEF_STAT_STR(VMA_PAGEFAULT_COUNT, "vma_pagefault_count"),
DEF_STAT_STR(VMA_PAGEFAULT_KB, "vma_pagefault_kb"),
+ DEF_STAT_STR(PAGEFAULT_US, "pagefault_us"),
DEF_STAT_STR(INVALID_PREFETCH_PAGEFAULT_COUNT, "invalid_prefetch_pagefault_count"),
DEF_STAT_STR(SVM_4K_PAGEFAULT_COUNT, "svm_4K_pagefault_count"),
DEF_STAT_STR(SVM_64K_PAGEFAULT_COUNT, "svm_64K_pagefault_count"),
diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
index b6081c312474..129260bfdfe6 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
@@ -15,6 +15,7 @@ enum xe_gt_stats_id {
XE_GT_STATS_ID_SVM_TLB_INVAL_US,
XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT,
XE_GT_STATS_ID_VMA_PAGEFAULT_KB,
+ XE_GT_STATS_ID_PAGEFAULT_US,
XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT,
XE_GT_STATS_ID_SVM_4K_PAGEFAULT_COUNT,
XE_GT_STATS_ID_SVM_64K_PAGEFAULT_COUNT,
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index a6fa790774c5..030452923ab9 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -277,6 +277,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
struct xe_device *xe = pf_work->xe;
struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
struct xe_pagefault pf;
+ ktime_t start = xe_gt_stats_ktime_get();
+ struct xe_gt *gt = NULL;
unsigned long threshold;
#define USM_QUEUE_MAX_RUNTIME_MS 20
@@ -288,6 +290,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
if (!pf.gt) /* Fault squashed during reset */
continue;
+ gt = pf.gt;
err = xe_pagefault_service(&pf);
if (err == -EAGAIN) {
@@ -314,6 +317,11 @@ static void xe_pagefault_queue_work(struct work_struct *w)
}
}
#undef USM_QUEUE_MAX_RUNTIME_MS
+
+ if (gt)
+ xe_gt_stats_incr(xe_root_mmio_gt(gt_to_xe(gt)),
+ XE_GT_STATS_ID_PAGEFAULT_US,
+ xe_gt_stats_ktime_us_delta(start));
}
static int xe_pagefault_queue_init(struct xe_device *xe,
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 07/12] drm/xe: Track pagefault worker runtime
2026-02-26 4:28 ` [PATCH v4 07/12] drm/xe: Track pagefault worker runtime Matthew Brost
@ 2026-05-07 12:51 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 12:51 UTC (permalink / raw)
To: intel-xe
On 26/02/2026 05:28, Matthew Brost wrote:
> Add a GT stat measuring total time spent servicing pagefault workqueue
> iterations. The counter accumulates the runtime of the pagefault worker
> in microseconds, allowing correlation of fault storms and chaining
> behavior with CPU time spent in the fault handler.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_stats.c | 1 +
> drivers/gpu/drm/xe/xe_gt_stats_types.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 8 ++++++++
> 3 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> index 81cec441b449..c1af3ecb429b 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> @@ -60,6 +60,7 @@ static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
> DEF_STAT_STR(SVM_TLB_INVAL_US, "svm_tlb_inval_us"),
> DEF_STAT_STR(VMA_PAGEFAULT_COUNT, "vma_pagefault_count"),
> DEF_STAT_STR(VMA_PAGEFAULT_KB, "vma_pagefault_kb"),
> + DEF_STAT_STR(PAGEFAULT_US, "pagefault_us"),
> DEF_STAT_STR(INVALID_PREFETCH_PAGEFAULT_COUNT, "invalid_prefetch_pagefault_count"),
> DEF_STAT_STR(SVM_4K_PAGEFAULT_COUNT, "svm_4K_pagefault_count"),
> DEF_STAT_STR(SVM_64K_PAGEFAULT_COUNT, "svm_64K_pagefault_count"),
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> index b6081c312474..129260bfdfe6 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> @@ -15,6 +15,7 @@ enum xe_gt_stats_id {
> XE_GT_STATS_ID_SVM_TLB_INVAL_US,
> XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_VMA_PAGEFAULT_KB,
> + XE_GT_STATS_ID_PAGEFAULT_US,
> XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_SVM_4K_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_SVM_64K_PAGEFAULT_COUNT,
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index a6fa790774c5..030452923ab9 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -277,6 +277,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> struct xe_device *xe = pf_work->xe;
> struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> struct xe_pagefault pf;
> + ktime_t start = xe_gt_stats_ktime_get();
> + struct xe_gt *gt = NULL;
> unsigned long threshold;
>
> #define USM_QUEUE_MAX_RUNTIME_MS 20
> @@ -288,6 +290,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> if (!pf.gt) /* Fault squashed during reset */
> continue;
>
> + gt = pf.gt;
> err = xe_pagefault_service(&pf);
>
> if (err == -EAGAIN) {
> @@ -314,6 +317,11 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> }
> }
> #undef USM_QUEUE_MAX_RUNTIME_MS
> +
> + if (gt)
> + xe_gt_stats_incr(xe_root_mmio_gt(gt_to_xe(gt)),
> + XE_GT_STATS_ID_PAGEFAULT_US,
> + xe_gt_stats_ktime_us_delta(start));
> }
>
> static int xe_pagefault_queue_init(struct xe_device *xe,
I would use plural here to emphasize pagefaults and workers.
My first impression when reading 'pagefault_us' is that it is a single
pagefault's processing time.
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 08/12] drm/xe: Chain page faults via queue-resident cache to avoid fault storms
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (6 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 07/12] drm/xe: Track pagefault worker runtime Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-08 12:03 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 09/12] drm/xe: Add pagefault chaining stats Matthew Brost
` (8 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Some Xe platforms can generate pagefault storms where many faults target
the same address range in a short time window (e.g. many EU threads
faulting the same page). The current worker/locking model effectively
serializes faults for a given range and repeatedly performs VMA/range
lookups for each fault, which creates head-of-queue blocking and wastes
CPU in the hot path.
Introduce a page fault chaining cache that coalesces faults targeting
the same ASID and address range.
Each worker tracks the active fault range it is servicing. Fault entries
reside in stable queue storage, allowing the IRQ handler to match new
faults against the worker cache and directly chain cache hits onto the
active entry without allocation or waiting for dequeue. Once the leading
fault completes, the worker acknowledges the entire chain.
A small allocation state is added to each entry so queue, worker, and
IRQ pathd can safely reference the same fault object. This prevents
reuse while the fault is active and guarantees that chained faults
remain valid until acknowledged.
Fault handlers also record the serviced range so subsequent faults can
be acknowledged without re-running the full resolution path.
This removes repeated fault resolution during fault storms and
significantly improves forward progress in SVM workloads.
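Conceptually, once the leading fault is resolved, the worker
acknowledges the whole chain in one pass; a simplified sketch of that
walk (condensed from the ack loop in the patch below):

	static void ack_chain(struct xe_pagefault *pf, int err)
	{
		while (pf) {
			struct xe_pagefault *next = pf->consumer.next;

			/* the same result is reported for every chained fault */
			pf->producer.ops->ack_fault(pf, err);
			pf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_FREE;
			pf = next;
		}
	}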
Assisted-by: Chat-GPT # Documentation
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_pagefault.c | 434 +++++++++++++++++++++---
drivers/gpu/drm/xe/xe_pagefault.h | 71 ++++
drivers/gpu/drm/xe/xe_pagefault_types.h | 78 +++--
drivers/gpu/drm/xe/xe_svm.c | 16 +-
drivers/gpu/drm/xe/xe_svm.h | 9 +-
5 files changed, 523 insertions(+), 85 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 030452923ab9..9c14f9505faf 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -35,6 +35,70 @@
* xe_pagefault.c implements the consumer layer.
*/
+/**
+ * DOC: Xe page fault cache
+ *
+ * Some Xe hardware can trigger “fault storms,” which are many page faults to
+ * the same address within a short period of time. An example is many EU threads
+ * faulting on the same page simultaneously. With the current page fault locking
+ * structure, only one page fault for a given address range can be processed at
+ * a time. This causes head-of-queue blocking across workers, killing
+ * parallelism. If the page fault handler must repeatedly look up resources
+ * (VMAs, ranges) to determine that the pages are valid for each fault in the
+ * storm, the time complexity grows rapidly.
+ *
+ * To address this, each page fault worker maintains a cache of the active fault
+ * being processed. Subsequent faults that hit in the cache are chained to the
+ * pending fault, and all chained faults are acknowledged once the initial fault
+ * completes. This alleviates head-of-queue blocking and quickly chains faults
+ * in the upper layers, avoiding expensive lookups in the main fault-handling
+ * path.
+ *
+ * Faults are buffered in the page fault queue in a way that provides stable
+ * storage for outstanding faults. In particular, faults may be chained directly
+ * while still resident in the queue storage (i.e., outside the worker’s current
+ * head/tail dequeue position). This allows the IRQ handler to match newly
+ * arrived faults against the per-worker cache and immediately chain cache hits
+ * onto the active fault under the queue lock, without allocating memory or
+ * waiting for the worker to pop the fault first.
+ *
+ * A per-fault state field is used to assert correctness of these invariants.
+ * The state tracks whether an entry is free, queued, chained, or currently
+ * active. Transitions are performed under the page fault queue lock, and the
+ * worker acknowledges faults by walking the chain and returning entries to the
+ * free state once they are complete.
+ */
+
+/**
+ * enum xe_pagefault_alloc_state - lifetime state for a page fault queue entry
+ * @XE_PAGEFAULT_ALLOC_STATE_FREE:
+ * Entry is unused and may be overwritten by the producer, consumer retry
+ * or requeue.
+ * @XE_PAGEFAULT_ALLOC_STATE_QUEUED:
+ * Entry has been enqueued and may be dequeued by a worker.
+ * @XE_PAGEFAULT_ALLOC_STATE_ACTIVE:
+ * Entry has been dequeued and is the worker's currently serviced fault.
+ * The worker may attach additional faults to it via consumer.next.
+ * @XE_PAGEFAULT_ALLOC_STATE_CHAINED:
+ * Entry is not independently serviced; it has been chained onto an
+ * ACTIVE entry via consumer.next and will be acknowledged when the
+ * leading fault completes.
+ *
+ * The page fault queue provides stable storage for outstanding faults so the
+ * IRQ handler can chain new cache hits directly onto a worker's active fault.
+ * Because entries may remain referenced outside the consumer dequeue window,
+ * the producer must only write into entries in the FREE state.
+ *
+ * State transitions are protected by the page fault queue lock. Workers return
+ * entries to FREE after acknowledging the fault (either as ACTIVE or CHAINED).
+ */
+enum xe_pagefault_alloc_state {
+ XE_PAGEFAULT_ALLOC_STATE_FREE = 0,
+ XE_PAGEFAULT_ALLOC_STATE_QUEUED = 1,
+ XE_PAGEFAULT_ALLOC_STATE_CHAINED = 2,
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE = 3,
+};
+
static int xe_pagefault_entry_size(void)
{
/*
@@ -64,7 +128,7 @@ static int xe_pagefault_begin(struct drm_exec *exec, struct xe_vma *vma,
}
static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
- bool atomic)
+ struct xe_pagefault *pf, bool atomic)
{
struct xe_vm *vm = xe_vma_vm(vma);
struct xe_tile *tile = gt_to_tile(gt);
@@ -89,8 +153,11 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
/* Check if VMA is valid, opportunistic check only */
if (xe_vm_has_valid_gpu_mapping(tile, vma->tile_present,
- vma->tile_invalidated) && !atomic)
+ vma->tile_invalidated) && !atomic) {
+ xe_pagefault_set_start_addr(pf, xe_vma_start(vma));
+ xe_pagefault_set_end_addr(pf, xe_vma_end(vma));
return 0;
+ }
do {
if (xe_vma_is_userptr(vma) &&
@@ -128,6 +195,10 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
} while (err == -EAGAIN);
if (!err) {
+ /* Give hint to immediately ack faults */
+ xe_pagefault_set_start_addr(pf, xe_vma_start(vma));
+ xe_pagefault_set_end_addr(pf, xe_vma_end(vma));
+
dma_fence_wait(fence, false);
dma_fence_put(fence);
}
@@ -199,10 +270,10 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
atomic = xe_pagefault_access_is_atomic(pf->consumer.access_type);
if (xe_vma_is_cpu_addr_mirror(vma))
- err = xe_svm_handle_pagefault(vm, vma, gt,
+ err = xe_svm_handle_pagefault(vm, vma, pf, gt,
pf->consumer.page_addr, atomic);
else
- err = xe_pagefault_handle_vma(gt, vma, atomic);
+ err = xe_pagefault_handle_vma(gt, vma, pf, atomic);
unlock_vm:
if (!err)
@@ -214,33 +285,228 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
return err;
}
+#define XE_PAGEFAULT_CACHE_START_INVALID U64_MAX
+#define xe_pagefault_cache_start_invalidate(val) \
+ (val = XE_PAGEFAULT_CACHE_START_INVALID)
+
+static void
+__xe_pagefault_cache_invalidate(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault_work *pf_work)
+{
+ lockdep_assert_held(&pf_queue->lock);
+
+ xe_pagefault_cache_start_invalidate(pf_work->cache.start);
+}
+
+static void
+xe_pagefault_cache_invalidate(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault_work *pf_work)
+{
+ spin_lock_irq(&pf_queue->lock);
+ __xe_pagefault_cache_invalidate(pf_queue, pf_work);
+ spin_unlock_irq(&pf_queue->lock);
+}
+
+static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
+{
+ lockdep_assert_held(&pf_queue->lock);
+
+ return CIRC_SPACE(pf_queue->head, pf_queue->tail,
+ pf_queue->size) <= xe_pagefault_entry_size();
+}
+
+static struct xe_pagefault *
+__xe_pagefault_queue_add(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault *pf)
+{
+ struct xe_device *xe = container_of(pf_queue, typeof(*xe),
+ usm.pf_queue);
+ struct xe_pagefault *lpf;
+
+ lockdep_assert_held(&pf_queue->lock);
+
+ do {
+ xe_assert(xe, !xe_pagefault_queue_full(pf_queue));
+
+ lpf = (pf_queue->data + pf_queue->head);
+ pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
+ pf_queue->size;
+ } while (lpf->consumer.alloc_state != XE_PAGEFAULT_ALLOC_STATE_FREE);
+
+ memcpy(lpf, pf, sizeof(*pf));
+ lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_QUEUED;
+
+ return lpf;
+}
+
static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
- struct xe_pagefault *pf)
+ struct xe_pagefault *pf,
+ struct xe_pagefault_work *pf_work)
{
+ struct xe_device *xe = container_of(pf_queue, typeof(*xe),
+ usm.pf_queue);
+
+ xe_assert(xe, pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+
spin_lock_irq(&pf_queue->lock);
- if (!pf_queue->tail)
- pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
- else
- pf_queue->tail -= xe_pagefault_entry_size();
- memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
+ pf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_FREE;
+ __xe_pagefault_queue_add(pf_queue, pf);
+ __xe_pagefault_cache_invalidate(pf_queue, pf_work);
spin_unlock_irq(&pf_queue->lock);
}
-static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
- struct xe_pagefault *pf)
+static struct xe_pagefault *
+xe_pagefault_queue_requeue(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault *pf, struct xe_gt *gt)
{
- bool found_fault = false;
+ struct xe_device *xe = container_of(pf_queue, typeof(*xe),
+ usm.pf_queue);
+ struct xe_pagefault *next = pf->consumer.next, *lpf;
+
+ xe_assert(xe, pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_CHAINED);
spin_lock_irq(&pf_queue->lock);
- if (pf_queue->tail != pf_queue->head) {
- memcpy(pf, pf_queue->data + pf_queue->tail, sizeof(*pf));
- pf_queue->tail = (pf_queue->tail + xe_pagefault_entry_size()) %
- pf_queue->size;
- found_fault = true;
- }
+ pf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_FREE;
+ lpf = __xe_pagefault_queue_add(pf_queue, pf);
+ lpf->consumer.next = NULL;
+ lpf->consumer.fault_type_level |= XE_PAGEFAULT_REQUEUE_MASK;
spin_unlock_irq(&pf_queue->lock);
- return found_fault;
+ return next;
+}
+
+static bool xe_pagefault_cache_match(struct xe_pagefault *pf, u64 start,
+ u64 end, u64 cache_asid)
+{
+ struct xe_device *xe = gt_to_xe(pf->gt);
+ u64 page_addr = pf->consumer.page_addr;
+ u32 pf_asid = pf->consumer.asid;
+
+ xe_assert(xe, pf->consumer.alloc_state !=
+ XE_PAGEFAULT_ALLOC_STATE_FREE);
+
+ return page_addr >= start && page_addr < end &&
+ pf_asid == cache_asid;
+}
+
+static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault *pf)
+{
+ struct xe_device *xe = container_of(pf_queue, typeof(*xe),
+ usm.pf_queue);
+ struct xe_pagefault_work *pf_work;
+ bool requeue = FIELD_GET(XE_PAGEFAULT_REQUEUE_MASK,
+ pf->consumer.fault_type_level);
+ int i;
+
+ lockdep_assert_held(&pf_queue->lock);
+ xe_assert(xe, pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_QUEUED);
+
+ /*
+ * If this is a retry, we may already have a chain attached. In that
+ * case, we cannot hit in the cache because chains cannot easily be
+ * combined.
+ */
+ if (pf->consumer.next)
+ return false;
+
+ for (i = 0, pf_work = xe->usm.pf_workers;
+ i < xe->info.num_pf_work; ++i, ++pf_work) {
+ u64 start = pf_work->cache.start;
+ u64 end = requeue ? start + SZ_4K : pf_work->cache.end;
+ u32 asid = pf_work->cache.asid;
+
+ if (xe_pagefault_cache_match(pf, start, end, asid)) {
+ xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+
+ pf->consumer.alloc_state =
+ XE_PAGEFAULT_ALLOC_STATE_CHAINED;
+ pf->consumer.next = pf_work->cache.pf->consumer.next;
+ pf_work->cache.pf->consumer.next = pf;
+
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static void xe_pagefault_queue_advance(struct xe_pagefault_queue *pf_queue)
+{
+ lockdep_assert_held(&pf_queue->lock);
+
+ pf_queue->tail = (pf_queue->tail + xe_pagefault_entry_size()) %
+ pf_queue->size;
+}
+
+static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
+ struct xe_pagefault **pf, int id)
+{
+ struct xe_device *xe = container_of(pf_queue, typeof(*xe),
+ usm.pf_queue);
+ struct xe_pagefault_work *pf_work;
+ struct xe_pagefault *lpf;
+ size_t align = SZ_2M;
+
+ guard(spinlock_irq)(&pf_queue->lock);
+
+ for (*pf = NULL; !*pf;) {
+ if (pf_queue->tail == pf_queue->head)
+ return false;
+
+ lpf = (pf_queue->data + pf_queue->tail);
+ xe_pagefault_queue_advance(pf_queue);
+
+ if (lpf->consumer.alloc_state !=
+ XE_PAGEFAULT_ALLOC_STATE_QUEUED)
+ continue;
+
+ if (xe_pagefault_cache_hit(pf_queue, lpf))
+ continue;
+
+ *pf = lpf; /* Hand back page fault for processing */
+ }
+
+ /*
+ * No cache hit; allocate a new cache entry. We assume most faults
+ * within a 2M range will hit the same pages. If this assumption proves
+ * false, the mismatched fault is requeued after the initial fault is
+ * acknowledged.
+ */
+ pf_work = xe->usm.pf_workers + id;
+ if (FIELD_GET(XE_PAGEFAULT_REQUEUE_MASK,
+ lpf->consumer.fault_type_level))
+ align = SZ_4K;
+ pf_work->cache.start = ALIGN_DOWN(lpf->consumer.page_addr, align);
+ pf_work->cache.end = pf_work->cache.start + align;
+ pf_work->cache.asid = lpf->consumer.asid;
+ pf_work->cache.pf = lpf;
+ lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_ACTIVE;
+
+ /* Drain queue until empty or new fault found */
+ while (1) {
+ if (pf_queue->tail == pf_queue->head)
+ break;
+
+ lpf = (pf_queue->data + pf_queue->tail);
+
+ if (lpf->consumer.alloc_state !=
+ XE_PAGEFAULT_ALLOC_STATE_QUEUED) {
+ xe_pagefault_queue_advance(pf_queue);
+ continue;
+ }
+
+ if (!xe_pagefault_cache_hit(pf_queue, lpf))
+ break;
+
+ xe_pagefault_queue_advance(pf_queue);
+ }
+
+ return true;
}
static void xe_pagefault_print(struct xe_pagefault *pf)
@@ -276,40 +542,91 @@ static void xe_pagefault_queue_work(struct work_struct *w)
container_of(w, typeof(*pf_work), work);
struct xe_device *xe = pf_work->xe;
struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
- struct xe_pagefault pf;
+ struct xe_pagefault *pf;
ktime_t start = xe_gt_stats_ktime_get();
- struct xe_gt *gt = NULL;
unsigned long threshold;
+ u64 cache_start = XE_PAGEFAULT_CACHE_START_INVALID, cache_end = 0;
+ u32 cache_asid = 0;
#define USM_QUEUE_MAX_RUNTIME_MS 20
threshold = jiffies + msecs_to_jiffies(USM_QUEUE_MAX_RUNTIME_MS);
- while (xe_pagefault_queue_pop(pf_queue, &pf)) {
- int err;
- if (!pf.gt) /* Fault squashed during reset */
- continue;
+ while (xe_pagefault_queue_pop(pf_queue, &pf, pf_work->id)) {
+ struct xe_gt *gt = pf->gt;
+ u32 asid = pf->consumer.asid;
+ int err = 0;
+
+ /* Last fault same address, ack immediately */
+ if (xe_pagefault_cache_match(pf, cache_start, cache_end,
+ cache_asid))
+ goto ack_fault;
- gt = pf.gt;
- err = xe_pagefault_service(&pf);
+ err = xe_pagefault_service(pf);
if (err == -EAGAIN) {
- xe_pagefault_queue_retry(pf_queue, &pf);
+ xe_pagefault_queue_retry(pf_queue, pf, pf_work);
queue_work(xe->usm.pf_wq, w);
break;
- } else if (err) {
- if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
- xe_pagefault_print(&pf);
- xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n",
+ } else if (err ) {
+ if (!(pf->consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
+ xe_pagefault_cache_start_invalidate(cache_start);
+ xe_pagefault_print(pf);
+ xe_gt_info(pf->gt, "Fault response: Unsuccessful %pe\n",
ERR_PTR(err));
} else {
- xe_gt_stats_incr(pf.gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1);
- xe_gt_dbg(pf.gt, "Prefetch Fault response: Unsuccessful %pe\n",
+ xe_gt_stats_incr(pf->gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1);
+ xe_gt_dbg(pf->gt, "Prefetch Fault response: Unsuccessful %pe\n",
ERR_PTR(err));
}
+ } else {
+ /* Cache valid fault locally */
+ cache_start = xe_pagefault_start_addr(pf);
+ cache_end = xe_pagefault_end_addr(pf);
+ cache_asid = asid;
}
- pf.producer.ops->ack_fault(&pf, err);
+ack_fault:
+ xe_assert(xe, pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+ xe_assert(xe, pf == pf_work->cache.pf);
+
+ while (pf) {
+ struct xe_pagefault *next;
+
+ xe_assert(xe, pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_CHAINED ||
+ pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+
+ pf->producer.ops->ack_fault(pf, err);
+
+ if (pf->consumer.alloc_state ==
+ XE_PAGEFAULT_ALLOC_STATE_ACTIVE)
+ xe_pagefault_cache_invalidate(pf_queue,
+ pf_work);
+
+ /*
+ * Removed from the cache, so next is stable within this
+ * chain. Once alloc_state transitions to
+ * XE_PAGEFAULT_ALLOC_STATE_FREE, the local entry must
+ * not be touched.
+ */
+ next = pf->consumer.next;
+ WRITE_ONCE(pf->consumer.alloc_state,
+ XE_PAGEFAULT_ALLOC_STATE_FREE);
+ pf = next;
+
+ /*
+ * Requeue chained faults which do not match the last
+ * fault processed
+ */
+ while (pf && !xe_pagefault_cache_match(pf, cache_start,
+ cache_end,
+ cache_asid))
+ pf = xe_pagefault_queue_requeue(pf_queue, pf,
+ gt);
+ }
if (time_after(jiffies, threshold)) {
queue_work(xe->usm.pf_wq, w);
@@ -318,10 +635,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
}
#undef USM_QUEUE_MAX_RUNTIME_MS
- if (gt)
- xe_gt_stats_incr(xe_root_mmio_gt(gt_to_xe(gt)),
- XE_GT_STATS_ID_PAGEFAULT_US,
- xe_gt_stats_ktime_us_delta(start));
+ xe_gt_stats_incr(xe_root_mmio_gt(xe), XE_GT_STATS_ID_PAGEFAULT_US,
+ xe_gt_stats_ktime_us_delta(start));
}
static int xe_pagefault_queue_init(struct xe_device *xe,
@@ -408,6 +723,7 @@ int xe_pagefault_init(struct xe_device *xe)
pf_work->xe = xe;
pf_work->id = i;
+ xe_pagefault_cache_start_invalidate(pf_work->cache.start);
INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
}
@@ -430,12 +746,15 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
/* Squash all pending faults on the GT */
spin_lock_irq(&pf_queue->lock);
- for (i = pf_queue->tail; i != pf_queue->head;
- i = (i + xe_pagefault_entry_size()) % pf_queue->size) {
+ for (i = 0; i < pf_queue->size; i += xe_pagefault_entry_size()) {
struct xe_pagefault *pf = pf_queue->data + i;
- if (pf->gt == gt)
- pf->gt = NULL;
+ if (pf->gt != gt)
+ continue;
+
+ pf->consumer.alloc_state =
+ XE_PAGEFAULT_ALLOC_STATE_FREE;
+ pf->consumer.next = NULL;
}
spin_unlock_irq(&pf_queue->lock);
}
@@ -453,12 +772,11 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
}
-static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
+static bool xe_pagefault_queue_empty(struct xe_pagefault_queue *pf_queue)
{
lockdep_assert_held(&pf_queue->lock);
- return CIRC_SPACE(pf_queue->head, pf_queue->tail, pf_queue->size) <=
- xe_pagefault_entry_size();
+ return pf_queue->head == pf_queue->tail;
}
/*
@@ -486,18 +804,26 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
{
struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
unsigned long flags;
- int work_index;
bool full;
spin_lock_irqsave(&pf_queue->lock, flags);
- work_index = xe_pagefault_work_index(xe);
full = xe_pagefault_queue_full(pf_queue);
if (!full) {
- memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
- pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
- pf_queue->size;
- queue_work(xe->usm.pf_wq,
- &xe->usm.pf_workers[work_index].work);
+ struct xe_pagefault *lpf;
+ bool empty = xe_pagefault_queue_empty(pf_queue);
+
+ lpf = __xe_pagefault_queue_add(pf_queue, pf);
+ lpf->consumer.next = NULL;
+
+ if (xe_pagefault_cache_hit(pf_queue, lpf)) {
+ if (empty)
+ xe_pagefault_queue_advance(pf_queue);
+ } else {
+ int work_index = xe_pagefault_work_index(xe);
+
+ queue_work(xe->usm.pf_wq,
+ &xe->usm.pf_workers[work_index].work);
+ }
} else {
drm_warn(&xe->drm,
"PageFault Queue full, shouldn't be possible\n");
diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
index bd0cdf9ed37f..feaf2a69674a 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_pagefault.h
@@ -6,6 +6,8 @@
#ifndef _XE_PAGEFAULT_H_
#define _XE_PAGEFAULT_H_
+#include "xe_pagefault_types.h"
+
struct xe_device;
struct xe_gt;
struct xe_pagefault;
@@ -16,4 +18,73 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
+#define XE_PAGEFAULT_END_ADDR_MASK (~0xfffull)
+
+/**
+ * xe_pagefault_set_end_addr() - store serviced range end for a pagefault
+ * @pf: Pagefault entry
+ * @end_addr: Inclusive end address of the serviced fault range
+ *
+ * The pagefault consumer stores the resolved fault range so subsequent faults
+ * hitting the same range can be immediately acknowledged without re-running
+ * the full fault handling path.
+ *
+ * The end address shares storage with other consumer metadata and therefore
+ * must be masked with %XE_PAGEFAULT_END_ADDR_MASK before storing. Bits outside
+ * the mask are reserved for internal state tracking and must be preserved.
+ */
+static inline void
+xe_pagefault_set_end_addr(struct xe_pagefault *pf, u64 end_addr)
+{
+ pf->consumer.end_addr &= ~XE_PAGEFAULT_END_ADDR_MASK;
+ pf->consumer.end_addr |= end_addr;
+}
+
+/**
+ * xe_pagefault_end_addr() - read serviced range end for a pagefault
+ * @pf: Pagefault entry
+ *
+ * Returns the inclusive end address of the range previously recorded by
+ * xe_pagefault_set_end_addr(). Only the bits covered by
+ * %XE_PAGEFAULT_END_ADDR_MASK are returned; other bits in the storage are
+ * reserved for internal state.
+ *
+ * Return: End address of the serviced fault range.
+ */
+static inline u64 xe_pagefault_end_addr(struct xe_pagefault *pf)
+{
+ return pf->consumer.end_addr & XE_PAGEFAULT_END_ADDR_MASK;
+}
+
+#undef XE_PAGEFAULT_END_ADDR_MASK
+
+/**
+ * xe_pagefault_set_start_addr() - store serviced range start for a pagefault
+ * @pf: Pagefault entry
+ * @start_addr: Start address of the serviced fault range
+ *
+ * The pagefault consumer stores the resolved fault range so subsequent faults
+ * hitting the same range can be immediately acknowledged without re-running
+ * the full fault handling path.
+ */
+static inline void
+xe_pagefault_set_start_addr(struct xe_pagefault *pf, u64 start_addr)
+{
+ pf->consumer.page_addr = start_addr;
+}
+
+/**
+ * xe_pagefault_start_addr() - read serviced range start for a pagefault
+ * @pf: Pagefault entry
+ *
+ * Returns the inclusive start address of the range previously recorded by
+ * xe_pagefault_set_start_addr().
+ *
+ * Return: Start address of the serviced fault range.
+ */
+static inline u64 xe_pagefault_start_addr(struct xe_pagefault *pf)
+{
+ return pf->consumer.page_addr;
+}
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index 75bc53205601..57cb292105d7 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -60,36 +60,55 @@ struct xe_pagefault {
/**
* @consumer: State for the software handling the fault. Populated by
* the producer and may be modified by the consumer to communicate
- * information back to the producer upon fault acknowledgment.
+ * information back to the producer upon fault acknowledgment. After
+ * fault acknowledgment, the producer should only access consumer fields
+ * via well defined helpers.
*/
struct {
- /** @consumer.page_addr: address of page fault */
- u64 page_addr;
- /** @consumer.asid: address space ID */
- u32 asid;
/**
- * @consumer.access_type: access type and prefetch flag packed
- * into a u8.
+ * @consumer.page_addr: address of page fault, populated by
+ * consumer after fault completion
*/
- u8 access_type;
+ u64 page_addr;
+ union {
+ struct {
+ /** @alloc_state: page fault allocation state */
+ u8 alloc_state;
+ /**
+ * @consumer.access_type: access type, u8 rather
+ * than enum to keep size compact
+ */
+ u8 access_type;
#define XE_PAGEFAULT_ACCESS_TYPE_MASK GENMASK(1, 0)
#define XE_PAGEFAULT_ACCESS_PREFETCH BIT(7)
- /**
- * @consumer.fault_type_level: fault type and level, u8 rather
- * than enum to keep size compact
- */
- u8 fault_type_level;
+ /**
+ * @consumer.fault_type_level: fault type and
+ * level, u8 rather than enum to keep size
+ * compact
+ */
+ u8 fault_type_level;
#define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */
-#define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0)
-#define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4)
- /** @consumer.engine_class_instance: engine class and instance */
- u8 engine_class_instance;
+#define XE_PAGEFAULT_LEVEL_MASK GENMASK(2, 0)
+#define XE_PAGEFAULT_TYPE_MASK GENMASK(6, 3)
+#define XE_PAGEFAULT_REQUEUE_MASK BIT(7)
+ /** @consumer.engine_class_instance: engine class and instance */
+ u8 engine_class_instance;
#define XE_PAGEFAULT_ENGINE_CLASS_MASK GENMASK(3, 0)
#define XE_PAGEFAULT_ENGINE_INSTANCE_MASK GENMASK(7, 4)
- /** @pad: alignment padding */
- u8 pad;
- /** consumer.reserved: reserved bits for future expansion */
- u64 reserved;
+ /** @consumer.asid: address space ID */
+ u32 asid;
+ };
+ /**
+ * @consumer.end_addr: end address of page fault,
+ * populated by consumer after fault completion
+ */
+ u64 end_addr;
+ };
+ /**
+ * @consumer.next: next pagefault chained to this fault,
+ * protected by pf_queue lock
+ */
+ struct xe_pagefault *next;
} consumer;
/**
* @producer: State for the producer (i.e., HW/FW interface). Populated
@@ -131,7 +150,7 @@ struct xe_pagefault_queue {
u32 head;
/** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
u32 tail;
- /** @lock: protects page fault queue */
+ /** @lock: protects page fault queue, workers caches */
spinlock_t lock;
};
@@ -146,6 +165,21 @@ struct xe_pagefault_work {
struct xe_device *xe;
/** @id: Identifier for this work item */
int id;
+ /**
+ * @cache: Page fault cache for the currently processed fault
+ *
+ * Protected by the page fault queue lock.
+ */
+ struct {
+ /** @cache.start: Start address of the current page fault */
+ u64 start;
+ /** @cache.end: End address of the current page fault */
+ u64 end;
+ /** @cache.asid: Address space ID of the current page fault */
+ u32 asid;
+ /** @cache.pf: Pointer to the current page fault */
+ struct xe_pagefault *pf;
+ } cache;
/** @work: Work item used to process the page fault */
struct work_struct work;
};
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 66eee490a0c3..fc439fd85187 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -15,6 +15,7 @@
#include "xe_gt_stats.h"
#include "xe_migrate.h"
#include "xe_module.h"
+#include "xe_pagefault.h"
#include "xe_pm.h"
#include "xe_pt.h"
#include "xe_svm.h"
@@ -1215,8 +1216,8 @@ DECL_SVM_RANGE_US_STATS(bind, BIND)
DECL_SVM_RANGE_US_STATS(fault, PAGEFAULT)
static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
- struct xe_gt *gt, u64 fault_addr,
- bool need_vram)
+ struct xe_pagefault *pf, struct xe_gt *gt,
+ u64 fault_addr, bool need_vram)
{
int devmem_possible = IS_DGFX(vm->xe) &&
IS_ENABLED(CONFIG_DRM_XE_PAGEMAP);
@@ -1372,6 +1373,10 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
xe_svm_range_bind_us_stats_incr(gt, range, bind_start);
out:
+ /* Give hint to immediately ack faults */
+ xe_pagefault_set_start_addr(pf, xe_svm_range_start(range));
+ xe_pagefault_set_end_addr(pf, xe_svm_range_end(range));
+
xe_svm_range_fault_us_stats_incr(gt, range, start);
mutex_unlock(&range->lock);
drm_gpusvm_range_put(&range->base);
@@ -1394,6 +1399,7 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
* xe_svm_handle_pagefault() - SVM handle page fault
* @vm: The VM.
* @vma: The CPU address mirror VMA.
+ * @pf: Pagefault structure
* @gt: The gt upon the fault occurred.
* @fault_addr: The GPU fault address.
* @atomic: The fault atomic access bit.
@@ -1404,8 +1410,8 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
* Return: 0 on success, negative error code on error.
*/
int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
- struct xe_gt *gt, u64 fault_addr,
- bool atomic)
+ struct xe_pagefault *pf, struct xe_gt *gt,
+ u64 fault_addr, bool atomic)
{
int need_vram, ret;
retry:
@@ -1413,7 +1419,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
if (need_vram < 0)
return need_vram;
- ret = __xe_svm_handle_pagefault(vm, vma, gt, fault_addr,
+ ret = __xe_svm_handle_pagefault(vm, vma, pf, gt, fault_addr,
need_vram ? true : false);
if (ret == -EAGAIN) {
/*
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index ebcca34f7f4d..07be92579971 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -21,6 +21,7 @@ struct drm_file;
struct xe_bo;
struct xe_gt;
struct xe_device;
+struct xe_pagefault;
struct xe_vram_region;
struct xe_tile;
struct xe_vm;
@@ -107,8 +108,8 @@ void xe_svm_fini(struct xe_vm *vm);
void xe_svm_close(struct xe_vm *vm);
int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
- struct xe_gt *gt, u64 fault_addr,
- bool atomic);
+ struct xe_pagefault *pf, struct xe_gt *gt,
+ u64 fault_addr, bool atomic);
bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end);
@@ -296,8 +297,8 @@ void xe_svm_close(struct xe_vm *vm)
static inline
int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
- struct xe_gt *gt, u64 fault_addr,
- bool atomic)
+ struct xe_pagefault *pf, struct xe_gt *gt,
+ u64 fault_addr, bool atomic)
{
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/12] drm/xe: Chain page faults via queue-resident cache to avoid fault storms
2026-02-26 4:28 ` [PATCH v4 08/12] drm/xe: Chain page faults via queue-resident cache to avoid fault storms Matthew Brost
@ 2026-05-08 12:03 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-08 12:03 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> Some Xe platforms can generate pagefault storms where many faults target
> the same address range in a short time window (e.g. many EU threads
> faulting the same page). The current worker/locking model effectively
> serializes faults for a given range and repeatedly performs VMA/range
> lookups for each fault, which creates head-of-queue blocking and wastes
> CPU in the hot path.
>
> Introduce a page fault chaining cache that coalesces faults targeting
> the same ASID and address range.
>
> Each worker tracks the active fault range it is servicing. Fault entries
> reside in stable queue storage, allowing the IRQ handler to match new
> faults against the worker cache and directly chain cache hits onto the
> active entry without allocation or waiting for dequeue. Once the leading
> fault completes, the worker acknowledges the entire chain.
>
> A small allocation state is added to each entry so queue, worker, and
> IRQ paths can safely reference the same fault object. This prevents
> reuse while the fault is active and guarantees that chained faults
> remain valid until acknowledged.
>
> Fault handlers also record the serviced range so subsequent faults can
> be acknowledged without re-running the full resolution path.
>
> This removes repeated fault resolution during fault storms and
> significantly improves forward progress in SVM workloads.
>
> Assisted-by: Chat-GPT # Documentation
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pagefault.c | 434 +++++++++++++++++++++---
> drivers/gpu/drm/xe/xe_pagefault.h | 71 ++++
> drivers/gpu/drm/xe/xe_pagefault_types.h | 78 +++--
> drivers/gpu/drm/xe/xe_svm.c | 16 +-
> drivers/gpu/drm/xe/xe_svm.h | 9 +-
> 5 files changed, 523 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 030452923ab9..9c14f9505faf 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -35,6 +35,70 @@
> * xe_pagefault.c implements the consumer layer.
> */
>
> +/**
> + * DOC: Xe page fault cache
> + *
> + * Some Xe hardware can trigger “fault storms,” which are many page faults to
> + * the same address within a short period of time. An example is many EU threads
> + * faulting on the same page simultaneously. With the current page fault locking
> + * structure, only one page fault for a given address range can be processed at
> + * a time. This causes head-of-queue blocking across workers, killing
> + * parallelism. If the page fault handler must repeatedly look up resources
> + * (VMAs, ranges) to determine that the pages are valid for each fault in the
> + * storm, the time complexity grows rapidly.
> + *
> + * To address this, each page fault worker maintains a cache of the active fault
> + * being processed. Subsequent faults that hit in the cache are chained to the
> + * pending fault, and all chained faults are acknowledged once the initial fault
> + * completes. This alleviates head-of-queue blocking and quickly chains faults
> + * in the upper layers, avoiding expensive lookups in the main fault-handling
> + * path.
> + *
> + * Faults are buffered in the page fault queue in a way that provides stable
> + * storage for outstanding faults. In particular, faults may be chained directly
> + * while still resident in the queue storage (i.e., outside the worker’s current
> + * head/tail dequeue position). This allows the IRQ handler to match newly
> + * arrived faults against the per-worker cache and immediately chain cache hits
> + * onto the active fault under the queue lock, without allocating memory or
> + * waiting for the worker to pop the fault first.
> + *
> + * A per-fault state field is used to assert correctness of these invariants.
> + * The state tracks whether an entry is free, queued, chained, or currently
> + * active. Transitions are performed under the page fault queue lock, and the
> + * worker acknowledges faults by walking the chain and returning entries to the
> + * free state once they are complete.
> + */
> +
> +/**
> + * enum xe_pagefault_alloc_state - lifetime state for a page fault queue entry
> + * @XE_PAGEFAULT_ALLOC_STATE_FREE:
> + * Entry is unused and may be overwritten by the producer, consumer retry
> + * or requeue.
> + * @XE_PAGEFAULT_ALLOC_STATE_QUEUED:
> + * Entry has been enqueued and may be dequeued by a worker.
> + * @XE_PAGEFAULT_ALLOC_STATE_ACTIVE:
> + * Entry has been dequeued and is the worker's currently serviced fault.
> + * The worker may attach additional faults to it via consumer.next.
> + * @XE_PAGEFAULT_ALLOC_STATE_CHAINED:
> + * Entry is not independently serviced; it has been chained onto an
> + * ACTIVE entry via consumer.next and will be acknowledged when the
> + * leading fault completes.
> + *
> + * The page fault queue provides stable storage for outstanding faults so the
> + * IRQ handler can chain new cache hits directly onto a worker's active fault.
> + * Because entries may remain referenced outside the consumer dequeue window,
> + * the producer must only write into entries in the FREE state.
> + *
> + * State transitions are protected by the page fault queue lock. Workers return
> + * entries to FREE after acknowledging the fault (either as ACTIVE or CHAINED).
> + */
> +enum xe_pagefault_alloc_state {
> + XE_PAGEFAULT_ALLOC_STATE_FREE = 0,
> + XE_PAGEFAULT_ALLOC_STATE_QUEUED = 1,
> + XE_PAGEFAULT_ALLOC_STATE_CHAINED = 2,
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE = 3,
> +};
> +
> static int xe_pagefault_entry_size(void)
> {
> /*
> @@ -64,7 +128,7 @@ static int xe_pagefault_begin(struct drm_exec *exec, struct xe_vma *vma,
> }
>
> static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
> - bool atomic)
> + struct xe_pagefault *pf, bool atomic)
> {
> struct xe_vm *vm = xe_vma_vm(vma);
> struct xe_tile *tile = gt_to_tile(gt);
> @@ -89,8 +153,11 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
>
> /* Check if VMA is valid, opportunistic check only */
> if (xe_vm_has_valid_gpu_mapping(tile, vma->tile_present,
> - vma->tile_invalidated) && !atomic)
> + vma->tile_invalidated) && !atomic) {
> + xe_pagefault_set_start_addr(pf, xe_vma_start(vma));
> + xe_pagefault_set_end_addr(pf, xe_vma_end(vma));
> return 0;
> + }
>
> do {
> if (xe_vma_is_userptr(vma) &&
> @@ -128,6 +195,10 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
> } while (err == -EAGAIN);
>
> if (!err) {
> + /* Give hint to immediately ack faults */
> + xe_pagefault_set_start_addr(pf, xe_vma_start(vma));
> + xe_pagefault_set_end_addr(pf, xe_vma_end(vma));
> +
> dma_fence_wait(fence, false);
> dma_fence_put(fence);
> }
> @@ -199,10 +270,10 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
> atomic = xe_pagefault_access_is_atomic(pf->consumer.access_type);
>
> if (xe_vma_is_cpu_addr_mirror(vma))
> - err = xe_svm_handle_pagefault(vm, vma, gt,
> + err = xe_svm_handle_pagefault(vm, vma, pf, gt,
> pf->consumer.page_addr, atomic);
> else
> - err = xe_pagefault_handle_vma(gt, vma, atomic);
> + err = xe_pagefault_handle_vma(gt, vma, pf, atomic);
>
> unlock_vm:
> if (!err)
> @@ -214,33 +285,228 @@ static int xe_pagefault_service(struct xe_pagefault *pf)
> return err;
> }
>
> +#define XE_PAGEFAULT_CACHE_START_INVALID U64_MAX
> +#define xe_pagefault_cache_start_invalidate(val) \
> + (val = XE_PAGEFAULT_CACHE_START_INVALID)
> +
> +static void
> +__xe_pagefault_cache_invalidate(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault_work *pf_work)
> +{
> + lockdep_assert_held(&pf_queue->lock);
> +
> + xe_pagefault_cache_start_invalidate(pf_work->cache.start);
> +}
> +
> +static void
> +xe_pagefault_cache_invalidate(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault_work *pf_work)
> +{
> + spin_lock_irq(&pf_queue->lock);
> + __xe_pagefault_cache_invalidate(pf_queue, pf_work);
> + spin_unlock_irq(&pf_queue->lock);
> +}
> +
> +static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> +{
> + lockdep_assert_held(&pf_queue->lock);
> +
> + return CIRC_SPACE(pf_queue->head, pf_queue->tail,
> + pf_queue->size) <= xe_pagefault_entry_size();
> +}
> +
> +static struct xe_pagefault *
> +__xe_pagefault_queue_add(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault *pf)
> +{
> + struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> + usm.pf_queue);
> + struct xe_pagefault *lpf;
> +
> + lockdep_assert_held(&pf_queue->lock);
> +
> + do {
> + xe_assert(xe, !xe_pagefault_queue_full(pf_queue));
> +
> + lpf = (pf_queue->data + pf_queue->head);
> + pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
> + pf_queue->size;
> + } while (lpf->consumer.alloc_state != XE_PAGEFAULT_ALLOC_STATE_FREE);
> +
> + memcpy(lpf, pf, sizeof(*pf));
> + lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_QUEUED;
> +
> + return lpf;
> +}
> +
> static void xe_pagefault_queue_retry(struct xe_pagefault_queue *pf_queue,
> - struct xe_pagefault *pf)
> + struct xe_pagefault *pf,
> + struct xe_pagefault_work *pf_work)
> {
> + struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> + usm.pf_queue);
> +
> + xe_assert(xe, pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> +
> spin_lock_irq(&pf_queue->lock);
> - if (!pf_queue->tail)
> - pf_queue->tail = pf_queue->size - xe_pagefault_entry_size();
> - else
> - pf_queue->tail -= xe_pagefault_entry_size();
> - memcpy(pf_queue->data + pf_queue->tail, pf, sizeof(*pf));
> + pf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_FREE;
> + __xe_pagefault_queue_add(pf_queue, pf);
> + __xe_pagefault_cache_invalidate(pf_queue, pf_work);
> spin_unlock_irq(&pf_queue->lock);
> }
>
> -static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> - struct xe_pagefault *pf)
> +static struct xe_pagefault *
> +xe_pagefault_queue_requeue(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault *pf, struct xe_gt *gt)
This is specifically for chained entries, and it's actually
_unchain_and_requeue_.
> {
> - bool found_fault = false;
> + struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> + usm.pf_queue);
> + struct xe_pagefault *next = pf->consumer.next, *lpf;
> +
> + xe_assert(xe, pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_CHAINED);
>
> spin_lock_irq(&pf_queue->lock);
> - if (pf_queue->tail != pf_queue->head) {
> - memcpy(pf, pf_queue->data + pf_queue->tail, sizeof(*pf));
> - pf_queue->tail = (pf_queue->tail + xe_pagefault_entry_size()) %
> - pf_queue->size;
> - found_fault = true;
> - }
> + pf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_FREE;
> + lpf = __xe_pagefault_queue_add(pf_queue, pf);
> + lpf->consumer.next = NULL;
> + lpf->consumer.fault_type_level |= XE_PAGEFAULT_REQUEUE_MASK;
> spin_unlock_irq(&pf_queue->lock);
>
> - return found_fault;
> + return next;
> +}
> +
> +static bool xe_pagefault_cache_match(struct xe_pagefault *pf, u64 start,
> + u64 end, u64 cache_asid)
Maybe drop 'cache' from the name, since the comparison is against the
function's arguments.
> +{
> + struct xe_device *xe = gt_to_xe(pf->gt);
> + u64 page_addr = pf->consumer.page_addr;
> + u32 pf_asid = pf->consumer.asid;
> +
> + xe_assert(xe, pf->consumer.alloc_state !=
> + XE_PAGEFAULT_ALLOC_STATE_FREE);
> +
> + return page_addr >= start && page_addr < end &&
> + pf_asid == cache_asid;
> +}
> +
> +static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault *pf)
I would say that the function chains the entry onto a worker's active
chain. I think you could rename it to xe_pagefault_entry_chained() or
similar. From 'cache_hit' I expect just an answer as to whether it
matches any cache.
> +{
> + struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> + usm.pf_queue);
> + struct xe_pagefault_work *pf_work;
> + bool requeue = FIELD_GET(XE_PAGEFAULT_REQUEUE_MASK,
> + pf->consumer.fault_type_level);
> + int i;
> +
> + lockdep_assert_held(&pf_queue->lock);
> + xe_assert(xe, pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_QUEUED);
> +
> + /*
> + * If this is a retry, we may already have a chain attached. In that
> + * case, we cannot hit in the cache because chains cannot easily be
> + * combined.
> + */
> + if (pf->consumer.next)
> + return false;
> +
> + for (i = 0, pf_work = xe->usm.pf_workers;
> + i < xe->info.num_pf_work; ++i, ++pf_work) {
> + u64 start = pf_work->cache.start;
> + u64 end = requeue ? start + SZ_4K : pf_work->cache.end;
> + u32 asid = pf_work->cache.asid;
> +
> + if (xe_pagefault_cache_match(pf, start, end, asid)) {
> + xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> +
> + pf->consumer.alloc_state =
> + XE_PAGEFAULT_ALLOC_STATE_CHAINED;
> + pf->consumer.next = pf_work->cache.pf->consumer.next;
> + pf_work->cache.pf->consumer.next = pf;
> +
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> +static void xe_pagefault_queue_advance(struct xe_pagefault_queue *pf_queue)
> +{
> + lockdep_assert_held(&pf_queue->lock);
> +
> + pf_queue->tail = (pf_queue->tail + xe_pagefault_entry_size()) %
> + pf_queue->size;
> +}
> +
> +static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> + struct xe_pagefault **pf, int id)
> +{
> + struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> + usm.pf_queue);
> + struct xe_pagefault_work *pf_work;
> + struct xe_pagefault *lpf;
> + size_t align = SZ_2M;
> +
> + guard(spinlock_irq)(&pf_queue->lock);
> +
> + for (*pf = NULL; !*pf;) {
> + if (pf_queue->tail == pf_queue->head)
> + return false;
xe_pagefault_queue_empty(pf_queue)?
> +
> + lpf = (pf_queue->data + pf_queue->tail);
Maybe add a helper
static inline struct xe_pagefault *
xe_pagefault_queue_get_tail_locked(struct xe_pagefault_queue *pf_queue)
{
	return pf_queue->data + pf_queue->tail;
}

Getting the tail is used more than once in the code.
> + xe_pagefault_queue_advance(pf_queue);
> +
> + if (lpf->consumer.alloc_state !=
> + XE_PAGEFAULT_ALLOC_STATE_QUEUED)
> + continue;
> +
> + if (xe_pagefault_cache_hit(pf_queue, lpf))
> + continue;
This chains a STATE_QUEUED entry onto the matching worker's active
fault.
> +
> + *pf = lpf; /* Hand back page fault for processing */
> + }
> +
> + /*
> + * No cache hit; allocate a new cache entry. We assume most faults
> + * within a 2M range will hit the same pages. If this assumption proves
> + * false, the mismatched fault is requeued after the initial fault is
> + * acknowledged.
> + */
> + pf_work = xe->usm.pf_workers + id;
> + if (FIELD_GET(XE_PAGEFAULT_REQUEUE_MASK,
> + lpf->consumer.fault_type_level))
> + align = SZ_4K;
> + pf_work->cache.start = ALIGN_DOWN(lpf->consumer.page_addr, align);
> + pf_work->cache.end = pf_work->cache.start + align;
> + pf_work->cache.asid = lpf->consumer.asid;
> + pf_work->cache.pf = lpf;
> + lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_ACTIVE;
> +
> + /* Drain queue until empty or new fault found */
> + while (1) {
> + if (pf_queue->tail == pf_queue->head)
> + break;
> +
> + lpf = (pf_queue->data + pf_queue->tail);
> +
> + if (lpf->consumer.alloc_state !=
> + XE_PAGEFAULT_ALLOC_STATE_QUEUED) {
> + xe_pagefault_queue_advance(pf_queue);
> + continue;
> + }
> +
> + if (!xe_pagefault_cache_hit(pf_queue, lpf))
> + break;
> +
> + xe_pagefault_queue_advance(pf_queue);
> + }
> +
In this while(1) the initial chaining begins. If there are pagefaults
queued before any worker starts, they will be chained for the first
time in this loop.
> + return true;
> }
>
> static void xe_pagefault_print(struct xe_pagefault *pf)
> @@ -276,40 +542,91 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> container_of(w, typeof(*pf_work), work);
> struct xe_device *xe = pf_work->xe;
> struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> - struct xe_pagefault pf;
> + struct xe_pagefault *pf;
> ktime_t start = xe_gt_stats_ktime_get();
> - struct xe_gt *gt = NULL;
> unsigned long threshold;
> + u64 cache_start = XE_PAGEFAULT_CACHE_START_INVALID, cache_end = 0;
> + u32 cache_asid = 0;
>
> #define USM_QUEUE_MAX_RUNTIME_MS 20
> threshold = jiffies + msecs_to_jiffies(USM_QUEUE_MAX_RUNTIME_MS);
>
> - while (xe_pagefault_queue_pop(pf_queue, &pf)) {
> - int err;
>
> - if (!pf.gt) /* Fault squashed during reset */
> - continue;
> + while (xe_pagefault_queue_pop(pf_queue, &pf, pf_work->id)) {
> + struct xe_gt *gt = pf->gt;
> + u32 asid = pf->consumer.asid;
> + int err = 0;
> +
> + /* Last fault same address, ack immediately */
> + if (xe_pagefault_cache_match(pf, cache_start, cache_end,
> + cache_asid))
> + goto ack_fault;
>
> - gt = pf.gt;
> - err = xe_pagefault_service(&pf);
> + err = xe_pagefault_service(pf);
>
> if (err == -EAGAIN) {
> - xe_pagefault_queue_retry(pf_queue, &pf);
> + xe_pagefault_queue_retry(pf_queue, pf, pf_work);
> queue_work(xe->usm.pf_wq, w);
> break;
> - } else if (err) {
> - if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
> - xe_pagefault_print(&pf);
> - xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n",
> + } else if (err ) {
Checkpatch will catch it anyway. Space after err.
> + if (!(pf->consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
> + xe_pagefault_cache_start_invalidate(cache_start);
> + xe_pagefault_print(pf);
> + xe_gt_info(pf->gt, "Fault response: Unsuccessful %pe\n",
> ERR_PTR(err));
> } else {
> - xe_gt_stats_incr(pf.gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1);
> - xe_gt_dbg(pf.gt, "Prefetch Fault response: Unsuccessful %pe\n",
> + xe_gt_stats_incr(pf->gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1);
> + xe_gt_dbg(pf->gt, "Prefetch Fault response: Unsuccessful %pe\n",
> ERR_PTR(err));
> }
> + } else {
> + /* Cache valid fault locally */
> + cache_start = xe_pagefault_start_addr(pf);
> + cache_end = xe_pagefault_end_addr(pf);
> + cache_asid = asid;
> }
>
> - pf.producer.ops->ack_fault(&pf, err);
> +ack_fault:
> + xe_assert(xe, pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> + xe_assert(xe, pf == pf_work->cache.pf);
> +
> + while (pf) {
> + struct xe_pagefault *next;
> +
> + xe_assert(xe, pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_CHAINED ||
> + pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> +
> + pf->producer.ops->ack_fault(pf, err);
> +
> + if (pf->consumer.alloc_state ==
> + XE_PAGEFAULT_ALLOC_STATE_ACTIVE)
> + xe_pagefault_cache_invalidate(pf_queue,
> + pf_work);
This gives me a lot to think about.
Why invalidate the cache now? xe_pagefault_handler tries to chain
incoming entries onto the existing chain, but it has limitations.
I think a lock could be grabbed here or above, since the whole section
here would become protected.
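Roughly what I have in mind (hypothetical sketch, not a drop-in):

	spin_lock_irq(&pf_queue->lock);
	/* Walk the chain: ack entries, mark them FREE, requeue
	 * mismatches, and refresh the worker cache under one lock
	 * hold, so the IRQ handler never observes a half-updated
	 * cache.
	 */
	spin_unlock_irq(&pf_queue->lock);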
> +
> + /*
> + * Removed from the cache, so next is stable within this
> + * chain. Once alloc_state transitions to
> + * XE_PAGEFAULT_ALLOC_STATE_FREE, the local entry must
> + * not be touched.
> + */
> + next = pf->consumer.next;
> + WRITE_ONCE(pf->consumer.alloc_state,
> + XE_PAGEFAULT_ALLOC_STATE_FREE);
> + pf = next;
> +
> + /*
> + * Requeue chained faults which do not match the last
> + * fault processed
> + */
> + while (pf && !xe_pagefault_cache_match(pf, cache_start,
> + cache_end,
> + cache_asid))
> + pf = xe_pagefault_queue_requeue(pf_queue, pf,
> + gt);
After the 'while', if pf is non-NULL, update pf->consumer.alloc_state
to ACTIVE, update the worker cache to pf, and then release the lock.
xe_pagefault_queue_requeue becomes _locked then. This way the handler
could add new entries while the worker is active. Or potentially
queue_pop could later catch such a loose entry and chain it...
Second thing: I'm not sure I understand this requeue action. Is it, for
instance, because initially all pagefaults in the 2M range were chained,
and after xe_pagefault_service() the range was narrowed down to 4K?
If so, then some of those pagefaults should really be unchained and
requeued.
> + }
>
> if (time_after(jiffies, threshold)) {
> queue_work(xe->usm.pf_wq, w);
> @@ -318,10 +635,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> }
> #undef USM_QUEUE_MAX_RUNTIME_MS
>
> - if (gt)
> - xe_gt_stats_incr(xe_root_mmio_gt(gt_to_xe(gt)),
> - XE_GT_STATS_ID_PAGEFAULT_US,
> - xe_gt_stats_ktime_us_delta(start));
> + xe_gt_stats_incr(xe_root_mmio_gt(xe), XE_GT_STATS_ID_PAGEFAULT_US,
> + xe_gt_stats_ktime_us_delta(start));
> }
>
> static int xe_pagefault_queue_init(struct xe_device *xe,
> @@ -408,6 +723,7 @@ int xe_pagefault_init(struct xe_device *xe)
>
> pf_work->xe = xe;
> pf_work->id = i;
> + xe_pagefault_cache_start_invalidate(pf_work->cache.start);
> INIT_WORK(&pf_work->work, xe_pagefault_queue_work);
> }
>
> @@ -430,12 +746,15 @@ static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
> /* Squash all pending faults on the GT */
>
> spin_lock_irq(&pf_queue->lock);
> - for (i = pf_queue->tail; i != pf_queue->head;
> - i = (i + xe_pagefault_entry_size()) % pf_queue->size) {
> + for (i = 0; i < pf_queue->size; i += xe_pagefault_entry_size()) {
> struct xe_pagefault *pf = pf_queue->data + i;
>
> - if (pf->gt == gt)
> - pf->gt = NULL;
> + if (pf->gt != gt)
> + continue;
> +
> + pf->consumer.alloc_state =
> + XE_PAGEFAULT_ALLOC_STATE_FREE;
> + pf->consumer.next = NULL;
> }
> spin_unlock_irq(&pf_queue->lock);
> }
> @@ -453,12 +772,11 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> xe_pagefault_queue_reset(xe, gt, &xe->usm.pf_queue);
> }
>
> -static bool xe_pagefault_queue_full(struct xe_pagefault_queue *pf_queue)
> +static bool xe_pagefault_queue_empty(struct xe_pagefault_queue *pf_queue)
> {
> lockdep_assert_held(&pf_queue->lock);
>
> - return CIRC_SPACE(pf_queue->head, pf_queue->tail, pf_queue->size) <=
> - xe_pagefault_entry_size();
> + return pf_queue->head == pf_queue->tail;
> }
>
> /*
> @@ -486,18 +804,26 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> {
> struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> unsigned long flags;
> - int work_index;
> bool full;
>
> spin_lock_irqsave(&pf_queue->lock, flags);
> - work_index = xe_pagefault_work_index(xe);
> full = xe_pagefault_queue_full(pf_queue);
> if (!full) {
> - memcpy(pf_queue->data + pf_queue->head, pf, sizeof(*pf));
> - pf_queue->head = (pf_queue->head + xe_pagefault_entry_size()) %
> - pf_queue->size;
> - queue_work(xe->usm.pf_wq,
> - &xe->usm.pf_workers[work_index].work);
> + struct xe_pagefault *lpf;
> + bool empty = xe_pagefault_queue_empty(pf_queue);
> +
> + lpf = __xe_pagefault_queue_add(pf_queue, pf);
> + lpf->consumer.next = NULL;
> +
> + if (xe_pagefault_cache_hit(pf_queue, lpf)) {
> + if (empty)
> + xe_pagefault_queue_advance(pf_queue);
> + } else {
> + int work_index = xe_pagefault_work_index(xe);
> +
> + queue_work(xe->usm.pf_wq,
> + &xe->usm.pf_workers[work_index].work);
> + }
> } else {
> drm_warn(&xe->drm,
> "PageFault Queue full, shouldn't be possible\n");
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> index bd0cdf9ed37f..feaf2a69674a 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> @@ -6,6 +6,8 @@
> #ifndef _XE_PAGEFAULT_H_
> #define _XE_PAGEFAULT_H_
>
> +#include "xe_pagefault_types.h"
> +
> struct xe_device;
> struct xe_gt;
> struct xe_pagefault;
> @@ -16,4 +18,73 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
>
> int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
>
> +#define XE_PAGEFAULT_END_ADDR_MASK (~0xfffull)
> +
> +/**
> + * xe_pagefault_set_end_addr() - store serviced range end for a pagefault
> + * @pf: Pagefault entry
> + * @end_addr: Inclusive end address of the serviced fault range
> + *
> + * The pagefault consumer stores the resolved fault range so subsequent faults
> + * hitting the same range can be immediately acknowledged without re-running
> + * the full fault handling path.
> + *
> + * The end address shares storage with other consumer metadata and therefore
> + * must be masked with %XE_PAGEFAULT_END_ADDR_MASK before storing. Bits outside
> + * the mask are reserved for internal state tracking and must be preserved.
> + */
> +static inline void
> +xe_pagefault_set_end_addr(struct xe_pagefault *pf, u64 end_addr)
> +{
> + pf->consumer.end_addr &= ~XE_PAGEFAULT_END_ADDR_MASK;
> + pf->consumer.end_addr |= end_addr;
> +}
> +
> +/**
> + * xe_pagefault_end_addr() - read serviced range end for a pagefault
> + * @pf: Pagefault entry
> + *
> + * Returns the inclusive end address of the range previously recorded by
> + * xe_pagefault_set_end_addr(). Only the bits covered by
> + * %XE_PAGEFAULT_END_ADDR_MASK are returned; other bits in the storage are
> + * reserved for internal state.
> + *
> + * Return: End address of the serviced fault range.
> + */
> +static inline u64 xe_pagefault_end_addr(struct xe_pagefault *pf)
> +{
> + return pf->consumer.end_addr & XE_PAGEFAULT_END_ADDR_MASK;
> +}
> +
> +#undef XE_PAGEFAULT_END_ADDR_MASK
> +
> +/**
> + * xe_pagefault_set_start_addr() - store serviced range start for a pagefault
> + * @pf: Pagefault entry
> + * @start_addr: Start address of the serviced fault range
> + *
> + * The pagefault consumer stores the resolved fault range so subsequent faults
> + * hitting the same range can be immediately acknowledged without re-running
> + * the full fault handling path.
> + */
> +static inline void
> +xe_pagefault_set_start_addr(struct xe_pagefault *pf, u64 start_addr)
> +{
> + pf->consumer.page_addr = start_addr;
> +}
> +
> +/**
> + * xe_pagefault_start_addr() - read serviced range start for a pagefault
> + * @pf: Pagefault entry
> + *
> + * Returns the inclusive start address of the range previously recorded by
> + * xe_pagefault_set_start_addr().
> + *
> + * Return: Start address of the serviced fault range.
> + */
> +static inline u64 xe_pagefault_start_addr(struct xe_pagefault *pf)
> +{
> + return pf->consumer.page_addr;
> +}
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> index 75bc53205601..57cb292105d7 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -60,36 +60,55 @@ struct xe_pagefault {
> /**
> * @consumer: State for the software handling the fault. Populated by
> * the producer and may be modified by the consumer to communicate
> - * information back to the producer upon fault acknowledgment.
> + * information back to the producer upon fault acknowledgment. After
> + * fault acknowledgment, the producer should only access consumer fields
> + * via well defined helpers.
> */
> struct {
> - /** @consumer.page_addr: address of page fault */
> - u64 page_addr;
> - /** @consumer.asid: address space ID */
> - u32 asid;
> /**
> - * @consumer.access_type: access type and prefetch flag packed
> - * into a u8.
> + * @consumer.page_addr: address of page fault, populated by
> + * consumer after fault completion
> */
> - u8 access_type;
> + u64 page_addr;
> + union {
> + struct {
> + /** @consumer.alloc_state: page fault allocation state */
> + u8 alloc_state;
> + /**
> + * @consumer.access_type: access type, u8 rather
> + * than enum to keep size compact
> + */
> + u8 access_type;
> #define XE_PAGEFAULT_ACCESS_TYPE_MASK GENMASK(1, 0)
> #define XE_PAGEFAULT_ACCESS_PREFETCH BIT(7)
> - /**
> - * @consumer.fault_type_level: fault type and level, u8 rather
> - * than enum to keep size compact
> - */
> - u8 fault_type_level;
> + /**
> + * @consumer.fault_type_level: fault type and
> + * level, u8 rather than enum to keep size
> + * compact
> + */
> + u8 fault_type_level;
> #define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */
> -#define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0)
> -#define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4)
> - /** @consumer.engine_class_instance: engine class and instance */
> - u8 engine_class_instance;
> +#define XE_PAGEFAULT_LEVEL_MASK GENMASK(2, 0)
> +#define XE_PAGEFAULT_TYPE_MASK GENMASK(6, 3)
> +#define XE_PAGEFAULT_REQUEUE_MASK BIT(7)
> + /** @consumer.engine_class_instance: engine class and instance */
> + u8 engine_class_instance;
> #define XE_PAGEFAULT_ENGINE_CLASS_MASK GENMASK(3, 0)
> #define XE_PAGEFAULT_ENGINE_INSTANCE_MASK GENMASK(7, 4)
> - /** @pad: alignment padding */
> - u8 pad;
> - /** consumer.reserved: reserved bits for future expansion */
> - u64 reserved;
> + /** @consumer.asid: address space ID */
> + u32 asid;
> + };
> + /**
> + * @consumer.end_addr: end address of page fault,
> + * populated by consumer after fault completion
> + */
> + u64 end_addr;
> + };
> + /**
> + * @consumer.next: next pagefault chained to this fault,
> + * protected by pf_queue lock
> + */
> + struct xe_pagefault *next;
> } consumer;
> /**
> * @producer: State for the producer (i.e., HW/FW interface). Populated
> @@ -131,7 +150,7 @@ struct xe_pagefault_queue {
> u32 head;
> /** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
> u32 tail;
> - /** @lock: protects page fault queue */
> + /** @lock: protects page fault queue, workers caches */
> spinlock_t lock;
> };
>
> @@ -146,6 +165,21 @@ struct xe_pagefault_work {
> struct xe_device *xe;
> /** @id: Identifier for this work item */
> int id;
> + /**
> + * @cache: Page fault cache for the currently processed fault
> + *
> + * Protected by the page fault queue lock.
> + */
> + struct {
> + /** @cache.start: Start address of the current page fault */
> + u64 start;
> + /** @cache.end: End address of the current page fault */
> + u64 end;
> + /** @cache.asid: Address space ID of the current page fault */
> + u32 asid;
> + /** @cache.pf: Pointer to the current page fault */
> + struct xe_pagefault *pf;
> + } cache;
> /** @work: Work item used to process the page fault */
> struct work_struct work;
> };
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index 66eee490a0c3..fc439fd85187 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -15,6 +15,7 @@
> #include "xe_gt_stats.h"
> #include "xe_migrate.h"
> #include "xe_module.h"
> +#include "xe_pagefault.h"
> #include "xe_pm.h"
> #include "xe_pt.h"
> #include "xe_svm.h"
> @@ -1215,8 +1216,8 @@ DECL_SVM_RANGE_US_STATS(bind, BIND)
> DECL_SVM_RANGE_US_STATS(fault, PAGEFAULT)
>
> static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> - struct xe_gt *gt, u64 fault_addr,
> - bool need_vram)
> + struct xe_pagefault *pf, struct xe_gt *gt,
> + u64 fault_addr, bool need_vram)
> {
> int devmem_possible = IS_DGFX(vm->xe) &&
> IS_ENABLED(CONFIG_DRM_XE_PAGEMAP);
> @@ -1372,6 +1373,10 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> xe_svm_range_bind_us_stats_incr(gt, range, bind_start);
>
> out:
> + /* Give hint to immediately ack faults */
> + xe_pagefault_set_start_addr(pf, xe_svm_range_start(range));
> + xe_pagefault_set_end_addr(pf, xe_svm_range_end(range));
> +
> xe_svm_range_fault_us_stats_incr(gt, range, start);
> mutex_unlock(&range->lock);
> drm_gpusvm_range_put(&range->base);
> @@ -1394,6 +1399,7 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> * xe_svm_handle_pagefault() - SVM handle page fault
> * @vm: The VM.
> * @vma: The CPU address mirror VMA.
> + * @pf: Pagefault structure
> * @gt: The gt upon the fault occurred.
> * @fault_addr: The GPU fault address.
> * @atomic: The fault atomic access bit.
> @@ -1404,8 +1410,8 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> * Return: 0 on success, negative error code on error.
> */
> int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> - struct xe_gt *gt, u64 fault_addr,
> - bool atomic)
> + struct xe_pagefault *pf, struct xe_gt *gt,
> + u64 fault_addr, bool atomic)
> {
> int need_vram, ret;
> retry:
> @@ -1413,7 +1419,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> if (need_vram < 0)
> return need_vram;
>
> - ret = __xe_svm_handle_pagefault(vm, vma, gt, fault_addr,
> + ret = __xe_svm_handle_pagefault(vm, vma, pf, gt, fault_addr,
> need_vram ? true : false);
> if (ret == -EAGAIN) {
> /*
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index ebcca34f7f4d..07be92579971 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -21,6 +21,7 @@ struct drm_file;
> struct xe_bo;
> struct xe_gt;
> struct xe_device;
> +struct xe_pagefault;
> struct xe_vram_region;
> struct xe_tile;
> struct xe_vm;
> @@ -107,8 +108,8 @@ void xe_svm_fini(struct xe_vm *vm);
> void xe_svm_close(struct xe_vm *vm);
>
> int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> - struct xe_gt *gt, u64 fault_addr,
> - bool atomic);
> + struct xe_pagefault *pf, struct xe_gt *gt,
> + u64 fault_addr, bool atomic);
>
> bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end);
>
> @@ -296,8 +297,8 @@ void xe_svm_close(struct xe_vm *vm)
>
> static inline
> int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> - struct xe_gt *gt, u64 fault_addr,
> - bool atomic)
> + struct xe_pagefault *pf, struct xe_gt *gt,
> + u64 fault_addr, bool atomic)
> {
> return 0;
> }
This is huge and impressive work, both this patch and the whole series.
I can't say I'm done with the review; I will continue it, as I need to
understand this work for
Intel Xe GPU Debug Support (eudebug)
(https://patchwork.freedesktop.org/series/165777/)
Regards,
Maciej
* [PATCH v4 09/12] drm/xe: Add pagefault chaining stats
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (7 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 08/12] drm/xe: Chain page faults via queue-resident cache to avoid fault storms Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-07 13:15 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 10/12] drm/xe: Add debugfs pagefault_info Matthew Brost
` (7 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add GT stats to quantify pagefault chaining behavior during fault storms.
Track total chained faults, faults chained directly from the IRQ
handler, cases where IRQ chaining also drained the queue, chained faults
that had to be requeued due to range mismatch, and cases where the last
serviced range allowed immediate ack.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gt_stats.c | 5 +++++
drivers/gpu/drm/xe/xe_gt_stats_types.h | 5 +++++
drivers/gpu/drm/xe/xe_pagefault.c | 18 ++++++++++++++++--
3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
index c1af3ecb429b..cdd467dfb46d 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats.c
+++ b/drivers/gpu/drm/xe/xe_gt_stats.c
@@ -54,6 +54,11 @@ void xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, int incr)
#define DEF_STAT_STR(ID, name) [XE_GT_STATS_ID_##ID] = name
static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
+ DEF_STAT_STR(CHAIN_PAGEFAULT_COUNT, "chain_pagefault_count"),
+ DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
+ DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
+ DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
+ DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
DEF_STAT_STR(SVM_TLB_INVAL_COUNT, "svm_tlb_inval_count"),
diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
index 129260bfdfe6..591e614e1cfc 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
@@ -9,6 +9,11 @@
#include <linux/types.h>
enum xe_gt_stats_id {
+ XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
+ XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
+ XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
+ XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
+ XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
XE_GT_STATS_ID_TLB_INVAL,
XE_GT_STATS_ID_SVM_TLB_INVAL_COUNT,
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 9c14f9505faf..c497dd8d9724 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -364,6 +364,7 @@ xe_pagefault_queue_requeue(struct xe_pagefault_queue *pf_queue,
usm.pf_queue);
struct xe_pagefault *next = pf->consumer.next, *lpf;
+ xe_gt_stats_incr(gt, XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT, 1);
xe_assert(xe, pf->consumer.alloc_state ==
XE_PAGEFAULT_ALLOC_STATE_CHAINED);
@@ -423,6 +424,10 @@ static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+ xe_gt_stats_incr(pf->gt,
+ XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
+ 1);
+
pf->consumer.alloc_state =
XE_PAGEFAULT_ALLOC_STATE_CHAINED;
pf->consumer.next = pf_work->cache.pf->consumer.next;
@@ -559,8 +564,10 @@ static void xe_pagefault_queue_work(struct work_struct *w)
/* Last fault same address, ack immediately */
if (xe_pagefault_cache_match(pf, cache_start, cache_end,
- cache_asid))
+ cache_asid)) {
+ xe_gt_stats_incr(gt, XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT, 1);
goto ack_fault;
+ }
err = xe_pagefault_service(pf);
@@ -816,8 +823,15 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
lpf->consumer.next = NULL;
if (xe_pagefault_cache_hit(pf_queue, lpf)) {
- if (empty)
+ xe_gt_stats_incr(pf->gt,
+ XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
+ 1);
+ if (empty) {
xe_pagefault_queue_advance(pf_queue);
+ xe_gt_stats_incr(pf->gt,
+ XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
+ 1);
+ }
} else {
int work_index = xe_pagefault_work_index(xe);
--
2.34.1
* Re: [PATCH v4 09/12] drm/xe: Add pagefault chaining stats
2026-02-26 4:28 ` [PATCH v4 09/12] drm/xe: Add pagefault chaining stats Matthew Brost
@ 2026-05-07 13:15 ` Maciej Patelczyk
2026-05-07 13:52 ` Francois Dugast
0 siblings, 1 reply; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 13:15 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> Add GT stats to quantify pagefault chaining behavior during fault storms.
>
> Track total chained faults, faults chained directly from the IRQ
> handler, cases where IRQ chaining also drained the queue, chained faults
> that had to be requeued due to range mismatch, and cases where the last
> serviced range allowed immediate ack.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_stats.c | 5 +++++
> drivers/gpu/drm/xe/xe_gt_stats_types.h | 5 +++++
> drivers/gpu/drm/xe/xe_pagefault.c | 18 ++++++++++++++++--
> 3 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> index c1af3ecb429b..cdd467dfb46d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> @@ -54,6 +54,11 @@ void xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, int incr)
> #define DEF_STAT_STR(ID, name) [XE_GT_STATS_ID_##ID] = name
>
> static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
> + DEF_STAT_STR(CHAIN_PAGEFAULT_COUNT, "chain_pagefault_count"),
> + DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
> + DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
> + DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
> + DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
> DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
> DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
> DEF_STAT_STR(SVM_TLB_INVAL_COUNT, "svm_tlb_inval_count"),
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> index 129260bfdfe6..591e614e1cfc 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> @@ -9,6 +9,11 @@
> #include <linux/types.h>
>
> enum xe_gt_stats_id {
> + XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
> + XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> + XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> + XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
> + XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_TLB_INVAL,
> XE_GT_STATS_ID_SVM_TLB_INVAL_COUNT,
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 9c14f9505faf..c497dd8d9724 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -364,6 +364,7 @@ xe_pagefault_queue_requeue(struct xe_pagefault_queue *pf_queue,
> usm.pf_queue);
> struct xe_pagefault *next = pf->consumer.next, *lpf;
>
> + xe_gt_stats_incr(gt, XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT, 1);
> xe_assert(xe, pf->consumer.alloc_state ==
> XE_PAGEFAULT_ALLOC_STATE_CHAINED);
>
> @@ -423,6 +424,10 @@ static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
> xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
>
> + xe_gt_stats_incr(pf->gt,
> + XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
> + 1);
> +
> pf->consumer.alloc_state =
> XE_PAGEFAULT_ALLOC_STATE_CHAINED;
> pf->consumer.next = pf_work->cache.pf->consumer.next;
> @@ -559,8 +564,10 @@ static void xe_pagefault_queue_work(struct work_struct *w)
>
> /* Last fault same address, ack immediately */
> if (xe_pagefault_cache_match(pf, cache_start, cache_end,
> - cache_asid))
> + cache_asid)) {
> + xe_gt_stats_incr(gt, XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT, 1);
> goto ack_fault;
> + }
>
> err = xe_pagefault_service(pf);
>
> @@ -816,8 +823,15 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> lpf->consumer.next = NULL;
>
> if (xe_pagefault_cache_hit(pf_queue, lpf)) {
> - if (empty)
> + xe_gt_stats_incr(pf->gt,
> + XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> + 1);
> + if (empty) {
> xe_pagefault_queue_advance(pf_queue);
> + xe_gt_stats_incr(pf->gt,
> + XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> + 1);
> + }
> } else {
> int work_index = xe_pagefault_work_index(xe);
>
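FWIW, with this applied the new counters show up alongside the other GT
stats (the per-GT 'stats' file in debugfs). During a synthetic fault
storm I'd expect output roughly like this (illustrative numbers only):

	chain_pagefault_count: 1024
	chain_irq_pagefault_count: 256
	chain_drain_irq_pagefault_count: 32
	chain_mismatch_pagefault_count: 8
	last_pagefault_count: 512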
Looks good,
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
* Re: [PATCH v4 09/12] drm/xe: Add pagefault chaining stats
2026-05-07 13:15 ` Maciej Patelczyk
@ 2026-05-07 13:52 ` Francois Dugast
0 siblings, 0 replies; 33+ messages in thread
From: Francois Dugast @ 2026-05-07 13:52 UTC (permalink / raw)
To: Maciej Patelczyk
Cc: Matthew Brost, intel-xe, stuart.summers, arvind.yadav,
himal.prasad.ghimiray, thomas.hellstrom
On Thu, May 07, 2026 at 03:15:12PM +0200, Maciej Patelczyk wrote:
> On 26/02/2026 05:28, Matthew Brost wrote:
>
> > Add GT stats to quantify pagefault chaining behavior during fault storms.
> >
> > Track total chained faults, faults chained directly from the IRQ
> > handler, cases where IRQ chaining also drained the queue, chained faults
> > that had to be requeued due to range mismatch, and cases where the last
> > serviced range allowed immediate ack.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt_stats.c | 5 +++++
> > drivers/gpu/drm/xe/xe_gt_stats_types.h | 5 +++++
> > drivers/gpu/drm/xe/xe_pagefault.c | 18 ++++++++++++++++--
> > 3 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> > index c1af3ecb429b..cdd467dfb46d 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> > @@ -54,6 +54,11 @@ void xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, int incr)
> > #define DEF_STAT_STR(ID, name) [XE_GT_STATS_ID_##ID] = name
> > static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
> > + DEF_STAT_STR(CHAIN_PAGEFAULT_COUNT, "chain_pagefault_count"),
> > + DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
> > + DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
> > + DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
> > + DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
> > DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
> > DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
> > DEF_STAT_STR(SVM_TLB_INVAL_COUNT, "svm_tlb_inval_count"),
> > diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > index 129260bfdfe6..591e614e1cfc 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > @@ -9,6 +9,11 @@
> > #include <linux/types.h>
> > enum xe_gt_stats_id {
> > + XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
> > + XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> > + XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> > + XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
> > + XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_TLB_INVAL,
> > XE_GT_STATS_ID_SVM_TLB_INVAL_COUNT,
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> > index 9c14f9505faf..c497dd8d9724 100644
> > --- a/drivers/gpu/drm/xe/xe_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> > @@ -364,6 +364,7 @@ xe_pagefault_queue_requeue(struct xe_pagefault_queue *pf_queue,
> > usm.pf_queue);
> > struct xe_pagefault *next = pf->consumer.next, *lpf;
> > + xe_gt_stats_incr(gt, XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT, 1);
> > xe_assert(xe, pf->consumer.alloc_state ==
> > XE_PAGEFAULT_ALLOC_STATE_CHAINED);
> > @@ -423,6 +424,10 @@ static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
> > xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
> > XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> > + xe_gt_stats_incr(pf->gt,
> > + XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
> > + 1);
> > +
> > pf->consumer.alloc_state =
> > XE_PAGEFAULT_ALLOC_STATE_CHAINED;
> > pf->consumer.next = pf_work->cache.pf->consumer.next;
> > @@ -559,8 +564,10 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> > /* Last fault same address, ack immediately */
> > if (xe_pagefault_cache_match(pf, cache_start, cache_end,
> > - cache_asid))
> > + cache_asid)) {
> > + xe_gt_stats_incr(gt, XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT, 1);
> > goto ack_fault;
> > + }
> > err = xe_pagefault_service(pf);
> > @@ -816,8 +823,15 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> > lpf->consumer.next = NULL;
> > if (xe_pagefault_cache_hit(pf_queue, lpf)) {
> > - if (empty)
> > + xe_gt_stats_incr(pf->gt,
> > + XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> > + 1);
> > + if (empty) {
> > xe_pagefault_queue_advance(pf_queue);
> > + xe_gt_stats_incr(pf->gt,
> > + XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> > + 1);
> > + }
> > } else {
> > int work_index = xe_pagefault_work_index(xe);
>
> Looks good,
>
> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>
>
Actually the patch now causes this warning during build:
drivers/gpu/drm/xe/xe_gt_stats_types.h:186 Enum value 'XE_GT_STATS_ID_PAGEFAULT_US' not described in enum 'xe_gt_stats_id'
So please add a description in the kernel doc, like for other GT stats entries.
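Something along these lines in the enum's kernel-doc block should do
(exact wording up to you):

	 * @XE_GT_STATS_ID_PAGEFAULT_US: Time in microseconds spent
	 *	servicing page faults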
Francois
* [PATCH v4 10/12] drm/xe: Add debugfs pagefault_info
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (8 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 09/12] drm/xe: Add pagefault chaining stats Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-07 10:07 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 11/12] drm/xe: batch CT pagefault acks with periodic flush Matthew Brost
` (6 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add a debugfs entry to dump Xe page fault queue state. The output
includes queue geometry (entry size, total size, head/tail), per-entry
allocation state counts, and whether each page fault worker cache is
currently valid.
This is intended to help debug page fault storms, chaining, and retry
behaviour without needing tracing.
Assisted-by: Chat-GPT # Documentation
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_debugfs.c | 11 ++++++
drivers/gpu/drm/xe/xe_pagefault.c | 62 +++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_pagefault.h | 3 ++
3 files changed, 76 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 844cfafe1ec7..f02481be2501 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -19,6 +19,7 @@
#include "xe_gt_printk.h"
#include "xe_guc_ads.h"
#include "xe_mmio.h"
+#include "xe_pagefault.h"
#include "xe_pm.h"
#include "xe_psmi.h"
#include "xe_pxp_debugfs.h"
@@ -109,6 +110,15 @@ static int sriov_info(struct seq_file *m, void *data)
return 0;
}
+static int pagefault_info(struct seq_file *m, void *data)
+{
+ struct xe_device *xe = node_to_xe(m->private);
+ struct drm_printer p = drm_seq_file_printer(m);
+
+ xe_pagefault_print_info(xe, &p);
+ return 0;
+}
+
static int workarounds(struct xe_device *xe, struct drm_printer *p)
{
guard(xe_pm_runtime)(xe);
@@ -184,6 +194,7 @@ static const struct drm_info_list debugfs_list[] = {
{"info", info, 0},
{ .name = "sriov_info", .show = sriov_info, },
{ .name = "workarounds", .show = workaround_info, },
+ { .name = "pagefault_info", .show = pagefault_info, },
};
static const struct drm_info_list debugfs_residencies[] = {
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index c497dd8d9724..2cfda29321c9 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -97,6 +97,7 @@ enum xe_pagefault_alloc_state {
XE_PAGEFAULT_ALLOC_STATE_QUEUED = 1,
XE_PAGEFAULT_ALLOC_STATE_CHAINED = 2,
XE_PAGEFAULT_ALLOC_STATE_ACTIVE = 3,
+ XE_PAGEFAULT_ALLOC_STATE_COUNT = 4,
};
static int xe_pagefault_entry_size(void)
@@ -846,3 +847,64 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
return full ? -ENOSPC : 0;
}
+
+/**
+ * xe_pagefault_print_info() - dump page fault queue/cache debug information
+ * @xe: Xe device
+ * @p: DRM printer to emit output to
+ *
+ * Print a snapshot of the page fault queue state for debugging. The output
+ * includes queue parameters (entry size, total size, head/tail), a histogram
+ * of per-entry allocation state values, and the validity of each per-worker
+ * page fault cache.
+ *
+ * This function is intended for debugfs and similar diagnostics. It acquires
+ * the page fault queue spinlock internally to serialize against IRQ-side
+ * producers and the worker consumer path, so callers must not hold the queue
+ * lock.
+ */
+void xe_pagefault_print_info(struct xe_device *xe, struct drm_printer *p)
+{
+ struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
+ struct xe_pagefault_work *pf_work;
+ static const char * const alloc_state_names[] = {
+ [XE_PAGEFAULT_ALLOC_STATE_FREE] = "free",
+ [XE_PAGEFAULT_ALLOC_STATE_QUEUED] = "queued",
+ [XE_PAGEFAULT_ALLOC_STATE_CHAINED] = "chained",
+ [XE_PAGEFAULT_ALLOC_STATE_ACTIVE] = "active",
+ };
+ u32 i, counts[XE_PAGEFAULT_ALLOC_STATE_COUNT] = {};
+
+ guard(spinlock_irq)(&pf_queue->lock);
+
+ drm_printf(p, "pagefault size: %u\n", xe_pagefault_entry_size());
+ drm_printf(p, "pagefault queue size: %u\n", pf_queue->size);
+ drm_printf(p, "pagefault queue head: %u\n", pf_queue->head);
+ drm_printf(p, "pagefault queue tail: %u\n", pf_queue->tail);
+
+ for (i = 0; i < pf_queue->size; i += xe_pagefault_entry_size()) {
+ struct xe_pagefault *pf = pf_queue->data + i;
+
+ if (pf->consumer.alloc_state >=
+ XE_PAGEFAULT_ALLOC_STATE_COUNT) {
+ drm_printf(p, "pagefault[%u] corrupted alloc_state=%u\n",
+ i, pf->consumer.alloc_state);
+ continue;
+ }
+
+ counts[pf->consumer.alloc_state]++;
+ }
+
+ for (i = 0; i < XE_PAGEFAULT_ALLOC_STATE_COUNT; ++i)
+ drm_printf(p, "pagefault queue %s count: %u\n",
+ alloc_state_names[i], counts[i]);
+
+ for (i = 0, pf_work = xe->usm.pf_workers;
+ i < xe->info.num_pf_work; ++i, ++pf_work) {
+ if (pf_work->cache.start == XE_PAGEFAULT_CACHE_START_INVALID)
+ drm_printf(p, "pagefault work[%u] cache invalid\n", i);
+ else
+ drm_printf(p, "pagefault work[%u] cache valid\n", i);
+
+ }
+}
diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
index feaf2a69674a..e9c5d1f03760 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_pagefault.h
@@ -8,6 +8,7 @@
#include "xe_pagefault_types.h"
+struct drm_printer;
struct xe_device;
struct xe_gt;
struct xe_pagefault;
@@ -18,6 +19,8 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
+void xe_pagefault_print_info(struct xe_device *xe, struct drm_printer *p);
+
#define XE_PAGEFAULT_END_ADDR_MASK (~0xfffull)
/**
--
2.34.1
* Re: [PATCH v4 10/12] drm/xe: Add debugfs pagefault_info
2026-02-26 4:28 ` [PATCH v4 10/12] drm/xe: Add debugfs pagefault_info Matthew Brost
@ 2026-05-07 10:07 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 10:07 UTC (permalink / raw)
To: intel-xe
On 26/02/2026 05:28, Matthew Brost wrote:
> Add a debugfs entry to dump Xe page fault queue state. The output
> includes queue geometry (entry size, total size, head/tail), per-entry
> allocation state counts, and whether each page fault worker cache is
> currently valid.
>
> This is intended to help debug page fault storms, chaining, and retry
> behaviour without needing tracing.
>
> Assisted-by: Chat-GPT # Documentation
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_debugfs.c | 11 ++++++
> drivers/gpu/drm/xe/xe_pagefault.c | 62 +++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_pagefault.h | 3 ++
> 3 files changed, 76 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 844cfafe1ec7..f02481be2501 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -19,6 +19,7 @@
> #include "xe_gt_printk.h"
> #include "xe_guc_ads.h"
> #include "xe_mmio.h"
> +#include "xe_pagefault.h"
> #include "xe_pm.h"
> #include "xe_psmi.h"
> #include "xe_pxp_debugfs.h"
> @@ -109,6 +110,15 @@ static int sriov_info(struct seq_file *m, void *data)
> return 0;
> }
>
> +static int pagefault_info(struct seq_file *m, void *data)
> +{
> + struct xe_device *xe = node_to_xe(m->private);
> + struct drm_printer p = drm_seq_file_printer(m);
> +
> + xe_pagefault_print_info(xe, &p);
> + return 0;
> +}
> +
> static int workarounds(struct xe_device *xe, struct drm_printer *p)
> {
> guard(xe_pm_runtime)(xe);
> @@ -184,6 +194,7 @@ static const struct drm_info_list debugfs_list[] = {
> {"info", info, 0},
> { .name = "sriov_info", .show = sriov_info, },
> { .name = "workarounds", .show = workaround_info, },
> + { .name = "pagefault_info", .show = pagefault_info, },
> };
>
> static const struct drm_info_list debugfs_residencies[] = {
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index c497dd8d9724..2cfda29321c9 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -97,6 +97,7 @@ enum xe_pagefault_alloc_state {
> XE_PAGEFAULT_ALLOC_STATE_QUEUED = 1,
> XE_PAGEFAULT_ALLOC_STATE_CHAINED = 2,
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE = 3,
> + XE_PAGEFAULT_ALLOC_STATE_COUNT = 4,
> };
>
> static int xe_pagefault_entry_size(void)
> @@ -846,3 +847,64 @@ int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
>
> return full ? -ENOSPC : 0;
> }
> +
> +/**
> + * xe_pagefault_print_info() - dump page fault queue/cache debug information
> + * @xe: Xe device
> + * @p: DRM printer to emit output to
> + *
> + * Print a snapshot of the page fault queue state for debugging. The output
> + * includes queue parameters (entry size, total size, head/tail), a histogram
> + * of per-entry allocation state values, and the validity of each per-worker
> + * page fault cache.
> + *
> + * This function is intended for debugfs and similar diagnostics. It acquires
> + * the page fault queue spinlock internally to serialize against IRQ-side
> + * producers and the worker consumer path, so callers must not hold the queue
> + * lock.
> + */
> +void xe_pagefault_print_info(struct xe_device *xe, struct drm_printer *p)
> +{
> + struct xe_pagefault_queue *pf_queue = &xe->usm.pf_queue;
> + struct xe_pagefault_work *pf_work;
> + static const char * const alloc_state_names[] = {
> + [XE_PAGEFAULT_ALLOC_STATE_FREE] = "free",
> + [XE_PAGEFAULT_ALLOC_STATE_QUEUED] = "queued",
> + [XE_PAGEFAULT_ALLOC_STATE_CHAINED] = "chained",
> + [XE_PAGEFAULT_ALLOC_STATE_ACTIVE] = "active",
> + };
> + u32 i, counts[XE_PAGEFAULT_ALLOC_STATE_COUNT] = {};
> +
> + guard(spinlock_irq)(&pf_queue->lock);
> +
> + drm_printf(p, "pagefault size: %u\n", xe_pagefault_entry_size());
> + drm_printf(p, "pagefault queue size: %u\n", pf_queue->size);
> + drm_printf(p, "pagefault queue head: %u\n", pf_queue->head);
> + drm_printf(p, "pagefault queue tail: %u\n", pf_queue->tail);
> +
> + for (i = 0; i < pf_queue->size; i += xe_pagefault_entry_size()) {
> + struct xe_pagefault *pf = pf_queue->data + i;
> +
> + if (pf->consumer.alloc_state >=
> + XE_PAGEFAULT_ALLOC_STATE_COUNT) {
> + drm_printf(p, "pagefault[%u] corrupted alloc_state=%u\n",
> + i, pf->consumer.alloc_state);
> + continue;
> + }
> +
> + counts[pf->consumer.alloc_state]++;
> + }
> +
> + for (i = 0; i < XE_PAGEFAULT_ALLOC_STATE_COUNT; ++i)
> + drm_printf(p, "pagefault queue %s count: %u\n",
> + alloc_state_names[i], counts[i]);
> +
> + for (i = 0, pf_work = xe->usm.pf_workers;
> + i < xe->info.num_pf_work; ++i, ++pf_work) {
> + if (pf_work->cache.start == XE_PAGEFAULT_CACHE_START_INVALID)
> + drm_printf(p, "pagefault work[%u] cache invalid\n", i);
> + else
> + drm_printf(p, "pagefault work[%u] cache valid\n", i);
> +
> + }
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> index feaf2a69674a..e9c5d1f03760 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> @@ -8,6 +8,7 @@
>
> #include "xe_pagefault_types.h"
>
> +struct drm_printer;
> struct xe_device;
> struct xe_gt;
> struct xe_pagefault;
> @@ -18,6 +19,8 @@ void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
>
> int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
>
> +void xe_pagefault_print_info(struct xe_device *xe, struct drm_printer *p);
> +
> #define XE_PAGEFAULT_END_ADDR_MASK (~0xfffull)
>
> /**
There can only be as many chains as there are workers. In a very
specific case, right after a pf goes from ACTIVE to FREE, this could
take a snapshot from which one would deduce fewer chains than there
really are. Just wondering whether there could ever be 0 active and,
say, 100 chained entries. Looks like it's possible.
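For reference, such a snapshot would read something like (made-up
numbers):

	pagefault queue free count: 924
	pagefault queue queued count: 0
	pagefault queue chained count: 100
	pagefault queue active count: 0

i.e. chained entries whose ACTIVE head has already been freed.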
Looks good,
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
* [PATCH v4 11/12] drm/xe: batch CT pagefault acks with periodic flush
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (9 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 10/12] drm/xe: Add debugfs pagefault_info Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-08 9:24 ` Maciej Patelczyk
2026-02-26 4:28 ` [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats Matthew Brost
` (5 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Pagefault storms can generate long chains of acknowledgments back to the
GuC. Sending each ack as a full CT submission forces a barrier,
descriptor update and doorbell per fault.
Extend xe_guc_ct_send_locked() with a “write-only” mode that copies the
message into the H2G ring but defers publishing the descriptor and
ringing the doorbell. Add xe_guc_ct_send_flush() to publish pending
writes and notify GuC once per batch. Wire this into the pagefault
producer via new ack_fault_begin/ack_fault_end callbacks and CT lock
wrappers.
To avoid excessive flush latency while still amortizing MMIO costs, use
a simple periodic flush heuristic for GuC pagefault acks: batch most
acks as write-only and force a publish at a fixed interval (e.g., every
16th ack), with a final flush at end-of-batch.
Also increase the H2G CTB size to 16K to better absorb bursts.
Assisted-by: Chat-GPT # Documentation
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_ct.c | 94 +++++++++++++++++++------
drivers/gpu/drm/xe/xe_guc_ct.h | 35 ++++++++-
drivers/gpu/drm/xe/xe_guc_pagefault.c | 28 +++++++-
drivers/gpu/drm/xe/xe_guc_types.h | 6 ++
drivers/gpu/drm/xe/xe_pagefault.c | 12 +++-
drivers/gpu/drm/xe/xe_pagefault_types.h | 14 ++++
6 files changed, 164 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 3a262d3af8cf..5a126e19c53e 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -255,7 +255,7 @@ static bool g2h_fence_needs_alloc(struct g2h_fence *g2h_fence)
#define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
#define CTB_H2G_BUFFER_OFFSET (CTB_DESC_SIZE * 2)
-#define CTB_H2G_BUFFER_SIZE (SZ_4K)
+#define CTB_H2G_BUFFER_SIZE (SZ_16K)
#define CTB_H2G_BUFFER_DWORDS (CTB_H2G_BUFFER_SIZE / sizeof(u32))
#define CTB_G2H_BUFFER_SIZE (SZ_128K)
#define CTB_G2H_BUFFER_DWORDS (CTB_G2H_BUFFER_SIZE / sizeof(u32))
@@ -912,7 +912,7 @@ static bool vf_action_can_safely_fail(struct xe_device *xe, u32 action)
#define H2G_CT_HEADERS (GUC_CTB_HDR_LEN + 1) /* one DW CTB header and one DW HxG header */
static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
- u32 ct_fence_value, bool want_response)
+ u32 ct_fence_value, bool want_response, bool write_only)
{
struct xe_device *xe = ct_to_xe(ct);
struct xe_gt *gt = ct_to_gt(ct);
@@ -936,15 +936,8 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
}
if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) {
- u32 desc_tail = desc_read(xe, h2g, tail);
u32 desc_head = desc_read(xe, h2g, head);
- if (tail != desc_tail) {
- desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_MISMATCH);
- xe_gt_err(gt, "CT write: tail was modified %u != %u\n", desc_tail, tail);
- goto corrupted;
- }
-
if (tail > h2g->info.size) {
desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_OVERFLOW);
xe_gt_err(gt, "CT write: tail out of range: %u vs %u\n",
@@ -966,7 +959,8 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
(h2g->info.size - tail) * sizeof(u32));
h2g_reserve_space(ct, (h2g->info.size - tail));
h2g->info.tail = 0;
- desc_write(xe, h2g, tail, h2g->info.tail);
+ if (!write_only)
+ desc_write(xe, h2g, tail, h2g->info.tail);
return -EAGAIN;
}
@@ -997,14 +991,15 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
/* Write H2G ensuring visible before descriptor update */
xe_map_memcpy_to(xe, &map, 0, cmd, H2G_CT_HEADERS * sizeof(u32));
xe_map_memcpy_to(xe, &map, H2G_CT_HEADERS * sizeof(u32), action, len * sizeof(u32));
- xe_device_wmb(xe);
-
/* Update local copies */
h2g->info.tail = (tail + full_len) % h2g->info.size;
h2g_reserve_space(ct, full_len);
/* Update descriptor */
- desc_write(xe, h2g, tail, h2g->info.tail);
+ if (!write_only) {
+ xe_device_wmb(xe);
+ desc_write(xe, h2g, tail, h2g->info.tail);
+ }
trace_xe_guc_ctb_h2g(xe, gt->info.id, *(action - 1), full_len,
desc_read(xe, h2g, head), h2g->info.tail);
@@ -1018,7 +1013,7 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
u32 len, u32 g2h_len, u32 num_g2h,
- struct g2h_fence *g2h_fence)
+ struct g2h_fence *g2h_fence, bool write_only)
{
struct xe_gt *gt = ct_to_gt(ct);
u16 seqno;
@@ -1073,7 +1068,7 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
if (unlikely(ret))
goto out_unlock;
- ret = h2g_write(ct, action, len, seqno, !!g2h_fence);
+ ret = h2g_write(ct, action, len, seqno, !!g2h_fence, write_only);
if (unlikely(ret)) {
if (ret == -EAGAIN)
goto retry;
@@ -1081,7 +1076,8 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
}
__g2h_reserve_space(ct, g2h_len, num_g2h);
- xe_guc_notify(ct_to_guc(ct));
+ if (!write_only)
+ xe_guc_notify(ct_to_guc(ct));
out_unlock:
if (g2h_len)
spin_unlock_irq(&ct->fast_lock);
@@ -1157,7 +1153,7 @@ static bool guc_ct_send_wait_for_retry(struct xe_guc_ct *ct, u32 len,
static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
u32 g2h_len, u32 num_g2h,
- struct g2h_fence *g2h_fence)
+ struct g2h_fence *g2h_fence, bool write_only)
{
struct xe_gt *gt = ct_to_gt(ct);
unsigned int sleep_period_ms = 1;
@@ -1170,9 +1166,11 @@ static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
try_again:
ret = __guc_ct_send_locked(ct, action, len, g2h_len, num_g2h,
- g2h_fence);
+ g2h_fence, write_only);
if (unlikely(ret == -EBUSY)) {
+ if (write_only)
+ xe_guc_ct_send_flush(ct);
if (!guc_ct_send_wait_for_retry(ct, len, g2h_len, g2h_fence,
&sleep_period_ms, &sleep_total_ms))
goto broken;
@@ -1196,7 +1194,8 @@ static int guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
xe_gt_assert(ct_to_gt(ct), !g2h_len || !g2h_fence);
mutex_lock(&ct->lock);
- ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, g2h_fence);
+ ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, g2h_fence,
+ false);
mutex_unlock(&ct->lock);
return ret;
@@ -1214,25 +1213,76 @@ int xe_guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
return ret;
}
+/**
+ * xe_guc_ct_send_locked() - submit a GuC CT H2G message with CT lock held
+ * @ct: GuC CT object
+ * @action: payload dwords (HxG header dword is expected at @action[-1])
+ * @len: number of payload dwords in @action
+ * @write_only: defer publishing/doorbell for batching
+ *
+ * Sends a single H2G message to the GuC CT buffer while the caller already
+ * holds @ct->lock.
+ *
+ * If @write_only is false, the function completes the submission immediately:
+ * it makes the payload visible to the device, updates the H2G descriptor and
+ * rings the GuC doorbell.
+ *
+ * If @write_only is true, the message payload is copied into the H2G ring and
+ * the software tail is advanced, but the descriptor update and doorbell are
+ * deferred so multiple messages can be batched. In this mode, the caller must
+ * eventually call xe_guc_ct_send_flush() (still holding @ct->lock) to publish
+ * the descriptor and notify the GuC. On internal retry paths (-EBUSY), the
+ * implementation may force a flush to ensure forward progress.
+ *
+ * Return: 0 on success, negative errno on failure.
+ *
+ * Locking:
+ * Must be called with @ct->lock held.
+ */
int xe_guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
- u32 g2h_len, u32 num_g2h)
+ bool write_only)
{
int ret;
- ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, NULL);
+ ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL, write_only);
if (ret == -EDEADLK)
kick_reset(ct);
return ret;
}
+/**
+ * xe_guc_ct_send_flush() - flush pending GuC CT H2G writes
+ * @ct: GuC CT instance
+ *
+ * Some callers batch multiple H2G writes using xe_guc_ct_send_locked() in
+ * "write-only" mode (i.e., queue the message payloads but defer ringing the
+ * doorbell / updating the CT descriptor). This helper completes the submission
+ * by ensuring the payload writes are visible to the device, updating the H2G
+ * descriptor, and ringing the GuC CT doorbell.
+ *
+ * Locking:
+ * Must be called with @ct->lock held.
+ */
+void xe_guc_ct_send_flush(struct xe_guc_ct *ct)
+{
+ struct xe_device *xe = ct_to_xe(ct);
+ struct guc_ctb *h2g = &ct->ctbs.h2g;
+
+ lockdep_assert_held(&ct->lock);
+
+ xe_device_wmb(xe);
+ desc_write(xe, h2g, tail, h2g->info.tail);
+ xe_guc_notify(ct_to_guc(ct));
+}
+
int xe_guc_ct_send_g2h_handler(struct xe_guc_ct *ct, const u32 *action, u32 len)
{
int ret;
lockdep_assert_held(&ct->lock);
- ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL);
+ ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL, false);
if (ret == -EDEADLK)
kick_reset(ct);
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
index 767365a33dee..2db4dded6b96 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.h
+++ b/drivers/gpu/drm/xe/xe_guc_ct.h
@@ -54,7 +54,7 @@ static inline void xe_guc_ct_irq_handler(struct xe_guc_ct *ct)
int xe_guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
u32 g2h_len, u32 num_g2h);
int xe_guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
- u32 g2h_len, u32 num_g2h);
+ bool write_only);
int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buffer);
static inline int
@@ -62,6 +62,7 @@ xe_guc_ct_send_block(struct xe_guc_ct *ct, const u32 *action, u32 len)
{
return xe_guc_ct_send_recv(ct, action, len, NULL);
}
+void xe_guc_ct_send_flush(struct xe_guc_ct *ct);
/* This is only version of the send CT you can call from a G2H handler */
int xe_guc_ct_send_g2h_handler(struct xe_guc_ct *ct, const u32 *action,
@@ -87,4 +88,36 @@ static inline void xe_guc_ct_wake_waiters(struct xe_guc_ct *ct)
wake_up_all(&ct->wq);
}
+/**
+ * xe_guc_ct_lock() - take the GuC CT mutex
+ * @ct: GuC CT object
+ *
+ * Wrapper around mutex_lock(&ct->lock) for cases where CT operations need to be
+ * performed from contexts that want an explicit "CT locked" pair without
+ * exporting the lock itself.
+ *
+ * Return/Locking:
+ * Acquires @ct->lock.
+ */
+static inline void xe_guc_ct_lock(struct xe_guc_ct *ct)
+__acquires(&ct->lock)
+{
+ mutex_lock(&ct->lock);
+}
+
+/**
+ * xe_guc_ct_unlock() - release the GuC CT mutex
+ * @ct: GuC CT object
+ *
+ * Counterpart to xe_guc_ct_lock().
+ *
+ * Locking:
+ * Releases @ct->lock.
+ */
+static inline void xe_guc_ct_unlock(struct xe_guc_ct *ct)
+__releases(&ct->lock)
+{
+ mutex_unlock(&ct->lock);
+}
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
index 2470faf3d5d8..cee653bf463b 100644
--- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
@@ -10,6 +10,19 @@
#include "xe_pagefault.h"
#include "xe_pagefault_types.h"
+#define XE_GUC_PAGEFAULT_FLUSH_PERIOD BIT(4) /* Sixteen */
+
+static void guc_ack_fault_begin(void *private)
+{
+ struct xe_guc *guc = private;
+
+ xe_guc_ct_lock(&guc->ct);
+
+ /* Ack the 2nd, then 18th, etc... */
+ guc->pagefault_ack_counter =
+ XE_GUC_PAGEFAULT_FLUSH_PERIOD - 2;
+}
+
static void guc_ack_fault(struct xe_pagefault *pf, int err)
{
u32 vfid = FIELD_GET(PFD_VFID, pf->producer.msg[2]);
@@ -36,12 +49,25 @@ static void guc_ack_fault(struct xe_pagefault *pf, int err)
FIELD_PREP(PFR_PDATA, pdata),
};
struct xe_guc *guc = pf->producer.private;
+ bool write_only = guc->pagefault_ack_counter++ &
+ (XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1);
+
+ xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
+ write_only);
+}
+
+static void guc_ack_fault_end(void *private)
+{
+ struct xe_guc *guc = private;
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
+ xe_guc_ct_send_flush(&guc->ct);
+ xe_guc_ct_unlock(&guc->ct);
}
static const struct xe_pagefault_ops guc_pagefault_ops = {
+ .ack_fault_begin = guc_ack_fault_begin,
.ack_fault = guc_ack_fault,
+ .ack_fault_end = guc_ack_fault_end,
};
/**
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index c7b9642b41ba..2996e5903ccb 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -124,6 +124,12 @@ struct xe_guc {
struct xe_reg notify_reg;
/** @params: Control params for fw initialization */
u32 params[GUC_CTL_MAX_DWORDS];
+
+ /**
+ * @pagefault_ack_counter: Counter to determine when to periodically
+ * ack pagefaults in a batch.
+ */
+ u32 pagefault_ack_counter;
};
#endif
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index 2cfda29321c9..d252a8c9d88c 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -425,6 +425,10 @@ static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+ if (pf->producer.private !=
+ pf_work->cache.pf->producer.private)
+ continue;
+
xe_gt_stats_incr(pf->gt,
XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
1);
@@ -559,6 +563,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
while (xe_pagefault_queue_pop(pf_queue, &pf, pf_work->id)) {
+ const struct xe_pagefault_ops *ops = pf->producer.ops;
+ void *private = pf->producer.private;
struct xe_gt *gt = pf->gt;
u32 asid = pf->consumer.asid;
int err = 0;
@@ -599,6 +605,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
xe_assert(xe, pf == pf_work->cache.pf);
+ ops->ack_fault_begin(private);
while (pf) {
struct xe_pagefault *next;
@@ -606,8 +613,10 @@ static void xe_pagefault_queue_work(struct work_struct *w)
XE_PAGEFAULT_ALLOC_STATE_CHAINED ||
pf->consumer.alloc_state ==
XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
+ xe_assert(xe, ops == pf->producer.ops);
+ xe_assert(xe, gt == pf->gt);
- pf->producer.ops->ack_fault(pf, err);
+ ops->ack_fault(pf, err);
if (pf->consumer.alloc_state ==
XE_PAGEFAULT_ALLOC_STATE_ACTIVE)
@@ -635,6 +644,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
pf = xe_pagefault_queue_requeue(pf_queue, pf,
gt);
}
+ ops->ack_fault_end(private);
if (time_after(jiffies, threshold)) {
queue_work(xe->usm.pf_wq, w);
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index 57cb292105d7..bc8f582b4e03 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -33,6 +33,13 @@ enum xe_pagefault_type {
/** struct xe_pagefault_ops - Xe pagefault ops (producer) */
struct xe_pagefault_ops {
+ /**
+ * @ack_fault_begin: Ack fault begin
+ * @private: producer private data
+ *
+ * Page fault producer begins acknowledgment from the consumer.
+ */
+ void (*ack_fault_begin)(void *private);
/**
* @ack_fault: Ack fault
* @pf: Page fault
@@ -42,6 +49,13 @@ struct xe_pagefault_ops {
* sends the result to the HW/FW interface.
*/
void (*ack_fault)(struct xe_pagefault *pf, int err);
+ /**
+ * @ack_fault_end: Ack fault end
+ * @private: producer private data
+ *
+ * Page fault producer ends acknowledgment from the consumer.
+ */
+ void (*ack_fault_end)(void *private);
};
/**
--
2.34.1
* Re: [PATCH v4 11/12] drm/xe: batch CT pagefault acks with periodic flush
2026-02-26 4:28 ` [PATCH v4 11/12] drm/xe: batch CT pagefault acks with periodic flush Matthew Brost
@ 2026-05-08 9:24 ` Maciej Patelczyk
0 siblings, 0 replies; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-08 9:24 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> Pagefault storms can generate long chains of acknowledgments back to the
> GuC. Sending each ack as a full CT submission forces a barrier,
> descriptor update and doorbell per fault.
>
> Extend xe_guc_ct_send_locked() with a “write-only” mode that copies the
> message into the H2G ring but defers publishing the descriptor and
> ringing the doorbell. Add xe_guc_ct_send_flush() to publish pending
> writes and notify GuC once per batch. Wire this into the pagefault
> producer via new ack_fault_begin/ack_fault_end callbacks and CT lock
> wrappers.
>
> To avoid excessive flush latency while still amortizing MMIO costs, use
> a simple periodic flush heuristic for GuC pagefault acks: batch most
> acks as write-only and force a publish at a fixed interval (e.g., every
> 16th ack), with a final flush at end-of-batch.
>
> Also increase the H2G CTB size to 16K to better absorb bursts.
>
> Assisted-by: Chat-GPT # Documentation
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Interesting addition to chaining.
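If I read the flow correctly, the worker-side ack path now becomes,
roughly (pseudo-code, simplified; next_in_chain_or_requeue is just
shorthand for the chain walk in xe_pagefault_queue_work, not a real
function):

	ops->ack_fault_begin(private);	/* xe_guc_ct_lock() + counter reset */
	while (pf) {
		ops->ack_fault(pf, err);	/* h2g_write(), mostly write-only */
		pf = next_in_chain_or_requeue(pf);
	}
	ops->ack_fault_end(private);	/* xe_guc_ct_send_flush() + unlock */

with guc_ack_fault() forcing a real descriptor update and doorbell on
every XE_GUC_PAGEFAULT_FLUSH_PERIOD-th message via
pagefault_ack_counter, so a long chain is never more than 16 acks away
from a flush.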
> ---
> drivers/gpu/drm/xe/xe_guc_ct.c | 94 +++++++++++++++++++------
> drivers/gpu/drm/xe/xe_guc_ct.h | 35 ++++++++-
> drivers/gpu/drm/xe/xe_guc_pagefault.c | 28 +++++++-
> drivers/gpu/drm/xe/xe_guc_types.h | 6 ++
> drivers/gpu/drm/xe/xe_pagefault.c | 12 +++-
> drivers/gpu/drm/xe/xe_pagefault_types.h | 14 ++++
> 6 files changed, 164 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 3a262d3af8cf..5a126e19c53e 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -255,7 +255,7 @@ static bool g2h_fence_needs_alloc(struct g2h_fence *g2h_fence)
>
> #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
> #define CTB_H2G_BUFFER_OFFSET (CTB_DESC_SIZE * 2)
> -#define CTB_H2G_BUFFER_SIZE (SZ_4K)
> +#define CTB_H2G_BUFFER_SIZE (SZ_16K)
> #define CTB_H2G_BUFFER_DWORDS (CTB_H2G_BUFFER_SIZE / sizeof(u32))
> #define CTB_G2H_BUFFER_SIZE (SZ_128K)
> #define CTB_G2H_BUFFER_DWORDS (CTB_G2H_BUFFER_SIZE / sizeof(u32))
> @@ -912,7 +912,7 @@ static bool vf_action_can_safely_fail(struct xe_device *xe, u32 action)
> #define H2G_CT_HEADERS (GUC_CTB_HDR_LEN + 1) /* one DW CTB header and one DW HxG header */
>
> static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
> - u32 ct_fence_value, bool want_response)
> + u32 ct_fence_value, bool want_response, bool write_only)
> {
> struct xe_device *xe = ct_to_xe(ct);
> struct xe_gt *gt = ct_to_gt(ct);
> @@ -936,15 +936,8 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
> }
>
> if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) {
> - u32 desc_tail = desc_read(xe, h2g, tail);
> u32 desc_head = desc_read(xe, h2g, head);
>
> - if (tail != desc_tail) {
> - desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_MISMATCH);
> - xe_gt_err(gt, "CT write: tail was modified %u != %u\n", desc_tail, tail);
> - goto corrupted;
> - }
> -
> if (tail > h2g->info.size) {
> desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_OVERFLOW);
> xe_gt_err(gt, "CT write: tail out of range: %u vs %u\n",
> @@ -966,7 +959,8 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
> (h2g->info.size - tail) * sizeof(u32));
> h2g_reserve_space(ct, (h2g->info.size - tail));
> h2g->info.tail = 0;
> - desc_write(xe, h2g, tail, h2g->info.tail);
> + if (!write_only)
> + desc_write(xe, h2g, tail, h2g->info.tail);
>
> return -EAGAIN;
> }
> @@ -997,14 +991,15 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
> /* Write H2G ensuring visible before descriptor update */
> xe_map_memcpy_to(xe, &map, 0, cmd, H2G_CT_HEADERS * sizeof(u32));
> xe_map_memcpy_to(xe, &map, H2G_CT_HEADERS * sizeof(u32), action, len * sizeof(u32));
> - xe_device_wmb(xe);
> -
> /* Update local copies */
> h2g->info.tail = (tail + full_len) % h2g->info.size;
> h2g_reserve_space(ct, full_len);
>
> /* Update descriptor */
> - desc_write(xe, h2g, tail, h2g->info.tail);
> + if (!write_only) {
> + xe_device_wmb(xe);
> + desc_write(xe, h2g, tail, h2g->info.tail);
> + }
>
> trace_xe_guc_ctb_h2g(xe, gt->info.id, *(action - 1), full_len,
> desc_read(xe, h2g, head), h2g->info.tail);
> @@ -1018,7 +1013,7 @@ static int h2g_write(struct xe_guc_ct *ct, const u32 *action, u32 len,
>
> static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
> u32 len, u32 g2h_len, u32 num_g2h,
> - struct g2h_fence *g2h_fence)
> + struct g2h_fence *g2h_fence, bool write_only)
> {
> struct xe_gt *gt = ct_to_gt(ct);
> u16 seqno;
> @@ -1073,7 +1068,7 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
> if (unlikely(ret))
> goto out_unlock;
>
> - ret = h2g_write(ct, action, len, seqno, !!g2h_fence);
> + ret = h2g_write(ct, action, len, seqno, !!g2h_fence, write_only);
> if (unlikely(ret)) {
> if (ret == -EAGAIN)
> goto retry;
> @@ -1081,7 +1076,8 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
> }
>
> __g2h_reserve_space(ct, g2h_len, num_g2h);
> - xe_guc_notify(ct_to_guc(ct));
> + if (!write_only)
> + xe_guc_notify(ct_to_guc(ct));
> out_unlock:
> if (g2h_len)
> spin_unlock_irq(&ct->fast_lock);
> @@ -1157,7 +1153,7 @@ static bool guc_ct_send_wait_for_retry(struct xe_guc_ct *ct, u32 len,
>
> static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
> u32 g2h_len, u32 num_g2h,
> - struct g2h_fence *g2h_fence)
> + struct g2h_fence *g2h_fence, bool write_only)
> {
> struct xe_gt *gt = ct_to_gt(ct);
> unsigned int sleep_period_ms = 1;
> @@ -1170,9 +1166,11 @@ static int guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
>
> try_again:
> ret = __guc_ct_send_locked(ct, action, len, g2h_len, num_g2h,
> - g2h_fence);
> + g2h_fence, write_only);
>
> if (unlikely(ret == -EBUSY)) {
> + if (write_only)
> + xe_guc_ct_send_flush(ct);
Just checking whether the 'if' is correct here: so the flush for 'write_only'
happens only when -EBUSY is returned?
> if (!guc_ct_send_wait_for_retry(ct, len, g2h_len, g2h_fence,
> &sleep_period_ms, &sleep_total_ms))
> goto broken;
> @@ -1196,7 +1194,8 @@ static int guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
> xe_gt_assert(ct_to_gt(ct), !g2h_len || !g2h_fence);
>
> mutex_lock(&ct->lock);
> - ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, g2h_fence);
> + ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, g2h_fence,
> + false);
> mutex_unlock(&ct->lock);
>
> return ret;
> @@ -1214,25 +1213,76 @@ int xe_guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
> return ret;
> }
>
> +/**
> + * xe_guc_ct_send_locked() - submit a GuC CT H2G message with CT lock held
> + * @ct: GuC CT object
> + * @action: payload dwords (HxG header dword is expected at @action[-1])
> + * @len: number of payload dwords in @action
> + * @write_only: defer publishing/doorbell for batching
> + *
> + * Sends a single H2G message to the GuC CT buffer while the caller already
> + * holds @ct->lock.
> + *
> + * If @write_only is false, the function completes the submission immediately:
> + * it makes the payload visible to the device, updates the H2G descriptor and
> + * rings the GuC doorbell.
> + *
> + * If @write_only is true, the message payload is copied into the H2G ring and
> + * the software tail is advanced, but the descriptor update and doorbell are
> + * deferred so multiple messages can be batched. In this mode, the caller must
> + * eventually call xe_guc_ct_send_flush() (still holding @ct->lock) to publish
> + * the descriptor and notify the GuC. On internal retry paths (-EBUSY), the
> + * implementation may force a flush to ensure forward progress.
> + *
> + * Return: 0 on success, negative errno on failure.
> + *
> + * Locking:
> + * Must be called with @ct->lock held.
> + */
> int xe_guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
> - u32 g2h_len, u32 num_g2h)
> + bool write_only)
> {
> int ret;
>
> - ret = guc_ct_send_locked(ct, action, len, g2h_len, num_g2h, NULL);
> + ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL, write_only);
> if (ret == -EDEADLK)
> kick_reset(ct);
>
> return ret;
> }
>
> +/**
> + * xe_guc_ct_send_flush() - flush pending GuC CT H2G writes
> + * @ct: GuC CT instance
> + *
> + * Some callers batch multiple H2G writes using xe_guc_ct_send_locked() in
> + * "write-only" mode (i.e., queue the message payloads but defer ringing the
> + * doorbell / updating the CT descriptor). This helper completes the submission
> + * by ensuring the payload writes are visible to the device, updating the H2G
> + * descriptor, and ringing the GuC CT doorbell.
> + *
> + * Locking:
> + * Must be called with @ct->lock held.
> + */
> +void xe_guc_ct_send_flush(struct xe_guc_ct *ct)
> +{
> + struct xe_device *xe = ct_to_xe(ct);
> + struct guc_ctb *h2g = &ct->ctbs.h2g;
> +
> + lockdep_assert_held(&ct->lock);
> +
> + xe_device_wmb(xe);
> + desc_write(xe, h2g, tail, h2g->info.tail);
> + xe_guc_notify(ct_to_guc(ct));
> +}
> +
> int xe_guc_ct_send_g2h_handler(struct xe_guc_ct *ct, const u32 *action, u32 len)
> {
> int ret;
>
> lockdep_assert_held(&ct->lock);
>
> - ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL);
> + ret = guc_ct_send_locked(ct, action, len, 0, 0, NULL, false);
> if (ret == -EDEADLK)
> kick_reset(ct);
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> index 767365a33dee..2db4dded6b96 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> @@ -54,7 +54,7 @@ static inline void xe_guc_ct_irq_handler(struct xe_guc_ct *ct)
> int xe_guc_ct_send(struct xe_guc_ct *ct, const u32 *action, u32 len,
> u32 g2h_len, u32 num_g2h);
> int xe_guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action, u32 len,
> - u32 g2h_len, u32 num_g2h);
> + bool write_only);
> int xe_guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
> u32 *response_buffer);
> static inline int
> @@ -62,6 +62,7 @@ xe_guc_ct_send_block(struct xe_guc_ct *ct, const u32 *action, u32 len)
> {
> return xe_guc_ct_send_recv(ct, action, len, NULL);
> }
> +void xe_guc_ct_send_flush(struct xe_guc_ct *ct);
>
> /* This is only version of the send CT you can call from a G2H handler */
> int xe_guc_ct_send_g2h_handler(struct xe_guc_ct *ct, const u32 *action,
> @@ -87,4 +88,36 @@ static inline void xe_guc_ct_wake_waiters(struct xe_guc_ct *ct)
> wake_up_all(&ct->wq);
> }
>
> +/**
> + * xe_guc_ct_lock() - take the GuC CT mutex
> + * @ct: GuC CT object
> + *
> + * Wrapper around mutex_lock(&ct->lock) for cases where CT operations need to be
> + * performed from contexts that want an explicit "CT locked" pair without
> + * exporting the lock itself.
> + *
> + * Return/Locking:
> + * Acquires @ct->lock.
> + */
> +static inline void xe_guc_ct_lock(struct xe_guc_ct *ct)
> +__acquires(&ct->lock)
> +{
> + mutex_lock(&ct->lock);
> +}
> +
> +/**
> + * xe_guc_ct_unlock() - release the GuC CT mutex
> + * @ct: GuC CT object
> + *
> + * Counterpart to xe_guc_ct_lock().
> + *
> + * Locking:
> + * Releases @ct->lock.
> + */
> +static inline void xe_guc_ct_unlock(struct xe_guc_ct *ct)
> +__releases(&ct->lock)
> +{
> + mutex_unlock(&ct->lock);
> +}
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> index 2470faf3d5d8..cee653bf463b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
> @@ -10,6 +10,19 @@
> #include "xe_pagefault.h"
> #include "xe_pagefault_types.h"
>
> +#define XE_GUC_PAGEFAULT_FLUSH_PERIOD BIT(4) /* Sixteen */
> +
> +static void guc_ack_fault_begin(void *private)
> +{
> + struct xe_guc *guc = private;
> +
> + xe_guc_ct_lock(&guc->ct);
> +
> + /* Ack the 2nd, then 18th, etc... */
> + guc->pagefault_ack_counter =
> + XE_GUC_PAGEFAULT_FLUSH_PERIOD - 2;
> +}
> +
> static void guc_ack_fault(struct xe_pagefault *pf, int err)
> {
> u32 vfid = FIELD_GET(PFD_VFID, pf->producer.msg[2]);
> @@ -36,12 +49,25 @@ static void guc_ack_fault(struct xe_pagefault *pf, int err)
> FIELD_PREP(PFR_PDATA, pdata),
> };
> struct xe_guc *guc = pf->producer.private;
> + bool write_only = guc->pagefault_ack_counter++ &
> + (XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1);
> +
> + xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
> + write_only);
> +}
> +
> +static void guc_ack_fault_end(void *private)
> +{
> + struct xe_guc *guc = private;
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
Maybe:

if (!((guc->pagefault_ack_counter - 1) & (XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1)))
> + xe_guc_ct_send_flush(&guc->ct);
so we don't wake up the GuC when there is no new ack to publish, i.e. when
the last one was !write_only.
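Spelled out, the idea would make guc_ack_fault_end() look roughly like this —
a sketch of the stated intent (flush only when the last ack was queued
write-only); note this is the opposite polarity of the condition as quoted
above, so worth double-checking which way was meant:

	static void guc_ack_fault_end(void *private)
	{
		struct xe_guc *guc = private;

		/*
		 * (counter - 1) is the value the last ack saw before the
		 * post-increment; a non-zero masked value means it went out
		 * write-only and is still unpublished.
		 */
		if ((guc->pagefault_ack_counter - 1) &
		    (XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1))
			xe_guc_ct_send_flush(&guc->ct);

		xe_guc_ct_unlock(&guc->ct);
	}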
> + xe_guc_ct_unlock(&guc->ct);
> }
>
> static const struct xe_pagefault_ops guc_pagefault_ops = {
> + .ack_fault_begin = guc_ack_fault_begin,
> .ack_fault = guc_ack_fault,
> + .ack_fault_end = guc_ack_fault_end,
> };
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> index c7b9642b41ba..2996e5903ccb 100644
> --- a/drivers/gpu/drm/xe/xe_guc_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> @@ -124,6 +124,12 @@ struct xe_guc {
> struct xe_reg notify_reg;
> /** @params: Control params for fw initialization */
> u32 params[GUC_CTL_MAX_DWORDS];
> +
> + /**
> + * @pagefault_ack_counter: Counter to determine when to periodically
> + * ack pagefaults in a batch.
> + */
> + u32 pagefault_ack_counter;
> };
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 2cfda29321c9..d252a8c9d88c 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -425,6 +425,10 @@ static bool xe_pagefault_cache_hit(struct xe_pagefault_queue *pf_queue,
> xe_assert(xe, pf_work->cache.pf->consumer.alloc_state ==
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
>
> + if (pf->producer.private !=
> + pf_work->cache.pf->producer.private)
> + continue;
> +
> xe_gt_stats_incr(pf->gt,
> XE_GT_STATS_ID_CHAIN_PAGEFAULT_COUNT,
> 1);
> @@ -559,6 +563,8 @@ static void xe_pagefault_queue_work(struct work_struct *w)
>
>
> while (xe_pagefault_queue_pop(pf_queue, &pf, pf_work->id)) {
> + const struct xe_pagefault_ops *ops = pf->producer.ops;
> + void *private = pf->producer.private;
> struct xe_gt *gt = pf->gt;
> u32 asid = pf->consumer.asid;
> int err = 0;
> @@ -599,6 +605,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> xe_assert(xe, pf == pf_work->cache.pf);
>
> + ops->ack_fault_begin(private);
> while (pf) {
> struct xe_pagefault *next;
>
> @@ -606,8 +613,10 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> XE_PAGEFAULT_ALLOC_STATE_CHAINED ||
> pf->consumer.alloc_state ==
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE);
> + xe_assert(xe, ops == pf->producer.ops);
> + xe_assert(xe, gt == pf->gt);
>
> - pf->producer.ops->ack_fault(pf, err);
> + ops->ack_fault(pf, err);
>
> if (pf->consumer.alloc_state ==
> XE_PAGEFAULT_ALLOC_STATE_ACTIVE)
> @@ -635,6 +644,7 @@ static void xe_pagefault_queue_work(struct work_struct *w)
> pf = xe_pagefault_queue_requeue(pf_queue, pf,
> gt);
> }
> + ops->ack_fault_end(private);
For every chain the worker processes, every 16th pagefault updates the
GuC, with an additional update at the end.
>
> if (time_after(jiffies, threshold)) {
> queue_work(xe->usm.pf_wq, w);
> diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> index 57cb292105d7..bc8f582b4e03 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault_types.h
> +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> @@ -33,6 +33,13 @@ enum xe_pagefault_type {
>
> /** struct xe_pagefault_ops - Xe pagefault ops (producer) */
> struct xe_pagefault_ops {
> + /**
> + * @ack_fault_begin: Ack fault begin
> + * @private: producer private data
> + *
> + * Page fault producer begins acknowledgment from the consumer.
> + */
> + void (*ack_fault_begin)(void *private);
> /**
> * @ack_fault: Ack fault
> * @pf: Page fault
> @@ -42,6 +49,13 @@ struct xe_pagefault_ops {
> * sends the result to the HW/FW interface.
> */
> void (*ack_fault)(struct xe_pagefault *pf, int err);
> + /**
> + * @ack_fault_end: Ack fault end
> + * @private: producer private data
> + *
> + * Page fault producer ends acknowledgment from the consumer.
> + */
> + void (*ack_fault_end)(void *private);
> };
>
> /**
In the end, each chain results in a GuC update. For long chains, the
updates happen on every 16th ack.
Maciej
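For what it's worth, the mask arithmetic can be checked standalone — a
user-space sketch, not driver code. By this reading the immediate publishes
land on the 3rd, 19th and 35th acks of a batch, which seems worth reconciling
with the "2nd, then 18th" comment in guc_ack_fault_begin():

	#include <stdbool.h>
	#include <stdio.h>

	#define PERIOD 16 /* XE_GUC_PAGEFAULT_FLUSH_PERIOD */

	int main(void)
	{
		unsigned int counter = PERIOD - 2; /* as in guc_ack_fault_begin() */
		int i;

		for (i = 1; i <= 40; i++) {
			/* post-increment read, as in guc_ack_fault() */
			bool write_only = counter++ & (PERIOD - 1);

			if (!write_only)
				printf("ack %d published immediately\n", i);
		}

		return 0;
	}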
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (10 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 11/12] drm/xe: batch CT pagefault acks with periodic flush Matthew Brost
@ 2026-02-26 4:28 ` Matthew Brost
2026-05-07 13:56 ` Maciej Patelczyk
2026-02-26 4:35 ` ✗ CI.checkpatch: warning for Fine grained fault locking, threaded prefetch, storm cache (rev4) Patchwork
` (4 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 4:28 UTC (permalink / raw)
To: intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
Add a new GT statistic, PARALLEL_PAGEFAULT_COUNT, to record when
multiple page fault workers are active concurrently.
When a worker dequeues a fault, scan peer workers for an active
cache entry and increment the counter if another fault is already
in flight. This provides basic visibility into parallel fault
handling behavior for performance analysis and tuning.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gt_stats.c | 1 +
drivers/gpu/drm/xe/xe_gt_stats_types.h | 1 +
drivers/gpu/drm/xe/xe_pagefault.c | 18 +++++++++++++++++-
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
index cdd467dfb46d..621d1a2df067 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats.c
+++ b/drivers/gpu/drm/xe/xe_gt_stats.c
@@ -58,6 +58,7 @@ static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
+ DEF_STAT_STR(PARALLEL_PAGEFAULT_COUNT, "parallel_pagefault_count"),
DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
index 591e614e1cfc..075a12152ae2 100644
--- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
@@ -13,6 +13,7 @@ enum xe_gt_stats_id {
XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
+ XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
XE_GT_STATS_ID_TLB_INVAL,
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index d252a8c9d88c..2a37a4c97aad 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -458,9 +458,10 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
{
struct xe_device *xe = container_of(pf_queue, typeof(*xe),
usm.pf_queue);
- struct xe_pagefault_work *pf_work;
+ struct xe_pagefault_work *pf_work, *__pf_work;
struct xe_pagefault *lpf;
size_t align = SZ_2M;
+ int i;
guard(spinlock_irq)(&pf_queue->lock);
@@ -497,6 +498,21 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
pf_work->cache.pf = lpf;
lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_ACTIVE;
+ for (i = 0, __pf_work = xe->usm.pf_workers;
+ i < xe->info.num_pf_work; ++i, ++__pf_work) {
+ u64 cache_start = __pf_work->cache.start;
+
+ if (__pf_work == pf_work)
+ continue;
+
+ if (cache_start != XE_PAGEFAULT_CACHE_START_INVALID) {
+ xe_gt_stats_incr(xe_root_mmio_gt(xe),
+ XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
+ 1);
+ break;
+ }
+ }
+
/* Drain queue until empty or new fault found */
while (1) {
if (pf_queue->tail == pf_queue->head)
--
2.34.1
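The peer scan above reads naturally as a small predicate; a hypothetical
refactor for illustration only, using names taken from the diff:

	/* True if any other worker currently holds an active cache entry */
	static bool xe_pagefault_peer_active(struct xe_device *xe,
					     struct xe_pagefault_work *pf_work)
	{
		struct xe_pagefault_work *peer = xe->usm.pf_workers;
		int i;

		for (i = 0; i < xe->info.num_pf_work; ++i, ++peer) {
			if (peer == pf_work)
				continue;
			if (peer->cache.start != XE_PAGEFAULT_CACHE_START_INVALID)
				return true;
		}

		return false;
	}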
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats
2026-02-26 4:28 ` [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats Matthew Brost
@ 2026-05-07 13:56 ` Maciej Patelczyk
2026-05-07 14:23 ` Francois Dugast
0 siblings, 1 reply; 33+ messages in thread
From: Maciej Patelczyk @ 2026-05-07 13:56 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
thomas.hellstrom, francois.dugast
On 26/02/2026 05:28, Matthew Brost wrote:
> Add a new GT statistic, PARALLEL_PAGEFAULT_COUNT, to record when
> multiple page fault workers are active concurrently.
>
> When a worker dequeues a fault, scan peer workers for an active
> cache entry and increment the counter if another fault is already
> in flight. This provides basic visibility into parallel fault
> handling behavior for performance analysis and tuning.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_stats.c | 1 +
> drivers/gpu/drm/xe/xe_gt_stats_types.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 18 +++++++++++++++++-
> 3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> index cdd467dfb46d..621d1a2df067 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> @@ -58,6 +58,7 @@ static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
> DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
> DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
> DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
> + DEF_STAT_STR(PARALLEL_PAGEFAULT_COUNT, "parallel_pagefault_count"),
> DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
> DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
> DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
> diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> index 591e614e1cfc..075a12152ae2 100644
> --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> @@ -13,6 +13,7 @@ enum xe_gt_stats_id {
> XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
> + XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
> XE_GT_STATS_ID_TLB_INVAL,
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index d252a8c9d88c..2a37a4c97aad 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -458,9 +458,10 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> {
> struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> usm.pf_queue);
> - struct xe_pagefault_work *pf_work;
> + struct xe_pagefault_work *pf_work, *__pf_work;
> struct xe_pagefault *lpf;
> size_t align = SZ_2M;
> + int i;
>
> guard(spinlock_irq)(&pf_queue->lock);
>
> @@ -497,6 +498,21 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> pf_work->cache.pf = lpf;
> lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_ACTIVE;
>
> + for (i = 0, __pf_work = xe->usm.pf_workers;
> + i < xe->info.num_pf_work; ++i, ++__pf_work) {
> + u64 cache_start = __pf_work->cache.start;
> +
> + if (__pf_work == pf_work)
> + continue;
> +
> + if (cache_start != XE_PAGEFAULT_CACHE_START_INVALID) {
> + xe_gt_stats_incr(xe_root_mmio_gt(xe),
> + XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
> + 1);
In xe_pagefault_queue_work(), right after the first ACTIVE pagefault is
ACK'ed, the worker's cache is invalidated.
The worker may still be running, releasing chained entries, but this stat
won't catch that.
Maybe it doesn't matter; maybe the work is already done by then
(xe_pagefault_service() being the core work).
Just noticing.
If it doesn't matter then
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> + break;
> + }
> + }
> +
> /* Drain queue until empty or new fault found */
> while (1) {
> if (pf_queue->tail == pf_queue->head)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats
2026-05-07 13:56 ` Maciej Patelczyk
@ 2026-05-07 14:23 ` Francois Dugast
0 siblings, 0 replies; 33+ messages in thread
From: Francois Dugast @ 2026-05-07 14:23 UTC (permalink / raw)
To: Maciej Patelczyk
Cc: Matthew Brost, intel-xe, stuart.summers, arvind.yadav,
himal.prasad.ghimiray, thomas.hellstrom
On Thu, May 07, 2026 at 03:56:37PM +0200, Maciej Patelczyk wrote:
> On 26/02/2026 05:28, Matthew Brost wrote:
>
> > Add a new GT statistic, PARALLEL_PAGEFAULT_COUNT, to record when
> > multiple page fault workers are active concurrently.
> >
> > When a worker dequeues a fault, scan peer workers for an active
> > cache entry and increment the counter if another fault is already
> > in flight. This provides basic visibility into parallel fault
> > handling behavior for performance analysis and tuning.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt_stats.c | 1 +
> > drivers/gpu/drm/xe/xe_gt_stats_types.h | 1 +
> > drivers/gpu/drm/xe/xe_pagefault.c | 18 +++++++++++++++++-
> > 3 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_stats.c b/drivers/gpu/drm/xe/xe_gt_stats.c
> > index cdd467dfb46d..621d1a2df067 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_stats.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_stats.c
> > @@ -58,6 +58,7 @@ static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
> > DEF_STAT_STR(CHAIN_IRQ_PAGEFAULT_COUNT, "chain_irq_pagefault_count"),
> > DEF_STAT_STR(CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT, "chain_drain_irq_pagefault_count"),
> > DEF_STAT_STR(CHAIN_MISMATCH_PAGEFAULT_COUNT, "chain_mismatch_pagefault_count"),
> > + DEF_STAT_STR(PARALLEL_PAGEFAULT_COUNT, "parallel_pagefault_count"),
> > DEF_STAT_STR(LAST_PAGEFAULT_COUNT, "last_pagefault_count"),
> > DEF_STAT_STR(SVM_PAGEFAULT_COUNT, "svm_pagefault_count"),
> > DEF_STAT_STR(TLB_INVAL, "tlb_inval_count"),
> > diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > index 591e614e1cfc..075a12152ae2 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h
> > @@ -13,6 +13,7 @@ enum xe_gt_stats_id {
> > XE_GT_STATS_ID_CHAIN_IRQ_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_CHAIN_DRAIN_IRQ_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_CHAIN_MISMATCH_PAGEFAULT_COUNT,
> > + XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_LAST_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT,
> > XE_GT_STATS_ID_TLB_INVAL,
> > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> > index d252a8c9d88c..2a37a4c97aad 100644
> > --- a/drivers/gpu/drm/xe/xe_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> > @@ -458,9 +458,10 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> > {
> > struct xe_device *xe = container_of(pf_queue, typeof(*xe),
> > usm.pf_queue);
> > - struct xe_pagefault_work *pf_work;
> > + struct xe_pagefault_work *pf_work, *__pf_work;
> > struct xe_pagefault *lpf;
> > size_t align = SZ_2M;
> > + int i;
> > guard(spinlock_irq)(&pf_queue->lock);
> > @@ -497,6 +498,21 @@ static bool xe_pagefault_queue_pop(struct xe_pagefault_queue *pf_queue,
> > pf_work->cache.pf = lpf;
> > lpf->consumer.alloc_state = XE_PAGEFAULT_ALLOC_STATE_ACTIVE;
> > + for (i = 0, __pf_work = xe->usm.pf_workers;
> > + i < xe->info.num_pf_work; ++i, ++__pf_work) {
> > + u64 cache_start = __pf_work->cache.start;
> > +
> > + if (__pf_work == pf_work)
> > + continue;
> > +
> > + if (cache_start != XE_PAGEFAULT_CACHE_START_INVALID) {
> > + xe_gt_stats_incr(xe_root_mmio_gt(xe),
> > + XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,
> > + 1);
>
> In the xe_pagefault_queue_work() right after first ACTIVE pagefault is
> ACK'ed the worke's cache is invalidated.
>
> Worker may still run releasing chained entries but this stat won't catch it.
>
> Maybe it doesn't matter. Maybe the work is done (the xe_pagefault_service()
> being the core work) at that time.
>
> Just noticing.
>
> If it doesn't matter then
>
> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>
> > + break;
> > + }
> > + }
> > +
> > /* Drain queue until empty or new fault found */
> > while (1) {
> > if (pf_queue->tail == pf_queue->head)
Same comment as https://patchwork.freedesktop.org/patch/707296/?series=162167&rev=4#comment_1332066
Please add a description in the kernel doc for
XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT.
Francois
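One possible wording for that kernel-doc, as a sketch to drop into the enum
(the description text here is illustrative, not from the patch):

	/**
	 * @XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT: Incremented when a page
	 * fault worker dequeues a fault while another worker already has an
	 * active cache entry, i.e. faults are being serviced in parallel.
	 */
	XE_GT_STATS_ID_PARALLEL_PAGEFAULT_COUNT,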
^ permalink raw reply [flat|nested] 33+ messages in thread
* ✗ CI.checkpatch: warning for Fine grained fault locking, threaded prefetch, storm cache (rev4)
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (11 preceding siblings ...)
2026-02-26 4:28 ` [PATCH v4 12/12] drm/xe: Track parallel page fault activity in GT stats Matthew Brost
@ 2026-02-26 4:35 ` Patchwork
2026-02-26 4:36 ` ✓ CI.KUnit: success " Patchwork
` (3 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2026-02-26 4:35 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Fine grained fault locking, threaded prefetch, storm cache (rev4)
URL : https://patchwork.freedesktop.org/series/162167/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit b22e50c3acc2d6942febf7f04758fd75c289a9b2
Author: Matthew Brost <matthew.brost@intel.com>
Date: Wed Feb 25 20:28:34 2026 -0800
drm/xe: Track parallel page fault activity in GT stats
Add a new GT statistic, PARALLEL_PAGEFAULT_COUNT, to record when
multiple page fault workers are active concurrently.
When a worker dequeues a fault, scan peer workers for an active
cache entry and increment the counter if another fault is already
in flight. This provides basic visibility into parallel fault
handling behavior for performance analysis and tuning.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440 drm-intel
cc4a6a960ced drm/xe: Fine grained page fault locking
-:557: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#557: FILE: drivers/gpu/drm/xe/xe_svm.h:251:
+ struct mutex lock;
-:599: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#599: FILE: drivers/gpu/drm/xe/xe_userptr.c:61:
+}
+#define xe_vma_userptr_lockdep(uvma) \
total: 0 errors, 0 warnings, 2 checks, 691 lines checked
ac537356e9fc drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock
0aeac3bc3f4e drm/xe: Thread prefetch of SVM ranges
6b966b3ffe9b drm/xe: Use a single page-fault queue with multiple workers
95d9cfd0aa92 drm/xe: Add num_pf_work modparam
d4cdaa9ba507 drm/xe: Engine class and instance into a u8
5aec41437b2b drm/xe: Track pagefault worker runtime
2c38f9bd67bf drm/xe: Chain page faults via queue-resident cache to avoid fault storms
-:34: WARNING:BAD_SIGN_OFF: Non-standard signature: Assisted-by:
#34:
Assisted-by: Chat-GPT # Documentation
-:34: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Chat-GPT # Documentation'
#34:
Assisted-by: Chat-GPT # Documentation
-:446: ERROR:SPACING: space prohibited before that close parenthesis ')'
#446: FILE: drivers/gpu/drm/xe/xe_pagefault.c:571:
+ } else if (err ) {
-:455: WARNING:LONG_LINE: line length of 109 exceeds 100 columns
#455: FILE: drivers/gpu/drm/xe/xe_pagefault.c:578:
+ xe_gt_stats_incr(pf->gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1);
total: 2 errors, 2 warnings, 0 checks, 813 lines checked
b79f27af94ef drm/xe: Add pagefault chaining stats
3f11dd2c5784 drm/xe: Add debugfs pagefault_info
-:14: WARNING:BAD_SIGN_OFF: Non-standard signature: Assisted-by:
#14:
Assisted-by: Chat-GPT # Documentation
-:14: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Chat-GPT # Documentation'
#14:
Assisted-by: Chat-GPT # Documentation
-:128: CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
#128: FILE: drivers/gpu/drm/xe/xe_pagefault.c:909:
+
+ }
total: 1 errors, 1 warnings, 1 checks, 115 lines checked
955d0956e486 drm/xe: batch CT pagefault acks with periodic flush
-:27: WARNING:BAD_SIGN_OFF: Non-standard signature: Assistent-by:
#27:
Assistent-by: Chat-GPT # Documentation
-:27: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Chat-GPT # Documentation'
#27:
Assistent-by: Chat-GPT # Documentation
-:254: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#254: FILE: drivers/gpu/drm/xe/xe_guc_ct.h:65:
}
+void xe_guc_ct_send_flush(struct xe_guc_ct *ct);
total: 1 errors, 1 warnings, 1 checks, 368 lines checked
b22e50c3acc2 drm/xe: Track parallel page fault activity in GT stats
^ permalink raw reply [flat|nested] 33+ messages in thread
* ✓ CI.KUnit: success for Fine grained fault locking, threaded prefetch, storm cache (rev4)
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (12 preceding siblings ...)
2026-02-26 4:35 ` ✗ CI.checkpatch: warning for Fine grained fault locking, threaded prefetch, storm cache (rev4) Patchwork
@ 2026-02-26 4:36 ` Patchwork
2026-02-26 5:26 ` ✗ Xe.CI.BAT: failure " Patchwork
` (2 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2026-02-26 4:36 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Fine grained fault locking, threaded prefetch, storm cache (rev4)
URL : https://patchwork.freedesktop.org/series/162167/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[04:35:18] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[04:35:22] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[04:35:53] Starting KUnit Kernel (1/1)...
[04:35:53] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[04:35:53] ================== guc_buf (11 subtests) ===================
[04:35:53] [PASSED] test_smallest
[04:35:53] [PASSED] test_largest
[04:35:53] [PASSED] test_granular
[04:35:53] [PASSED] test_unique
[04:35:53] [PASSED] test_overlap
[04:35:53] [PASSED] test_reusable
[04:35:53] [PASSED] test_too_big
[04:35:53] [PASSED] test_flush
[04:35:53] [PASSED] test_lookup
[04:35:53] [PASSED] test_data
[04:35:53] [PASSED] test_class
[04:35:53] ===================== [PASSED] guc_buf =====================
[04:35:53] =================== guc_dbm (7 subtests) ===================
[04:35:53] [PASSED] test_empty
[04:35:53] [PASSED] test_default
[04:35:53] ======================== test_size ========================
[04:35:53] [PASSED] 4
[04:35:53] [PASSED] 8
[04:35:53] [PASSED] 32
[04:35:53] [PASSED] 256
[04:35:53] ==================== [PASSED] test_size ====================
[04:35:53] ======================= test_reuse ========================
[04:35:53] [PASSED] 4
[04:35:53] [PASSED] 8
[04:35:53] [PASSED] 32
[04:35:53] [PASSED] 256
[04:35:53] =================== [PASSED] test_reuse ====================
[04:35:53] =================== test_range_overlap ====================
[04:35:53] [PASSED] 4
[04:35:53] [PASSED] 8
[04:35:53] [PASSED] 32
[04:35:53] [PASSED] 256
[04:35:53] =============== [PASSED] test_range_overlap ================
[04:35:53] =================== test_range_compact ====================
[04:35:53] [PASSED] 4
[04:35:53] [PASSED] 8
[04:35:53] [PASSED] 32
[04:35:53] [PASSED] 256
[04:35:53] =============== [PASSED] test_range_compact ================
[04:35:53] ==================== test_range_spare =====================
[04:35:53] [PASSED] 4
[04:35:53] [PASSED] 8
[04:35:53] [PASSED] 32
[04:35:53] [PASSED] 256
[04:35:53] ================ [PASSED] test_range_spare =================
[04:35:53] ===================== [PASSED] guc_dbm =====================
[04:35:53] =================== guc_idm (6 subtests) ===================
[04:35:53] [PASSED] bad_init
[04:35:53] [PASSED] no_init
[04:35:53] [PASSED] init_fini
[04:35:53] [PASSED] check_used
[04:35:53] [PASSED] check_quota
[04:35:53] [PASSED] check_all
[04:35:53] ===================== [PASSED] guc_idm =====================
[04:35:53] ================== no_relay (3 subtests) ===================
[04:35:53] [PASSED] xe_drops_guc2pf_if_not_ready
[04:35:53] [PASSED] xe_drops_guc2vf_if_not_ready
[04:35:53] [PASSED] xe_rejects_send_if_not_ready
[04:35:53] ==================== [PASSED] no_relay =====================
[04:35:53] ================== pf_relay (14 subtests) ==================
[04:35:53] [PASSED] pf_rejects_guc2pf_too_short
[04:35:53] [PASSED] pf_rejects_guc2pf_too_long
[04:35:53] [PASSED] pf_rejects_guc2pf_no_payload
[04:35:53] [PASSED] pf_fails_no_payload
[04:35:53] [PASSED] pf_fails_bad_origin
[04:35:53] [PASSED] pf_fails_bad_type
[04:35:53] [PASSED] pf_txn_reports_error
[04:35:53] [PASSED] pf_txn_sends_pf2guc
[04:35:53] [PASSED] pf_sends_pf2guc
[04:35:53] [SKIPPED] pf_loopback_nop
[04:35:53] [SKIPPED] pf_loopback_echo
[04:35:53] [SKIPPED] pf_loopback_fail
[04:35:53] [SKIPPED] pf_loopback_busy
[04:35:53] [SKIPPED] pf_loopback_retry
[04:35:53] ==================== [PASSED] pf_relay =====================
[04:35:53] ================== vf_relay (3 subtests) ===================
[04:35:53] [PASSED] vf_rejects_guc2vf_too_short
[04:35:53] [PASSED] vf_rejects_guc2vf_too_long
[04:35:53] [PASSED] vf_rejects_guc2vf_no_payload
[04:35:53] ==================== [PASSED] vf_relay =====================
[04:35:53] ================ pf_gt_config (9 subtests) =================
[04:35:53] [PASSED] fair_contexts_1vf
[04:35:53] [PASSED] fair_doorbells_1vf
[04:35:53] [PASSED] fair_ggtt_1vf
[04:35:53] ====================== fair_vram_1vf ======================
[04:35:53] [PASSED] 3.50 GiB
[04:35:53] [PASSED] 11.5 GiB
[04:35:53] [PASSED] 15.5 GiB
[04:35:53] [PASSED] 31.5 GiB
[04:35:53] [PASSED] 63.5 GiB
[04:35:53] [PASSED] 13.9 GiB
[04:35:53] ================== [PASSED] fair_vram_1vf ==================
[04:35:53] ================ fair_vram_1vf_admin_only =================
[04:35:53] [PASSED] 3.50 GiB
[04:35:53] [PASSED] 11.5 GiB
[04:35:53] [PASSED] 15.5 GiB
[04:35:53] [PASSED] 31.5 GiB
[04:35:53] [PASSED] 63.5 GiB
[04:35:53] [PASSED] 13.9 GiB
[04:35:53] ============ [PASSED] fair_vram_1vf_admin_only =============
[04:35:53] ====================== fair_contexts ======================
[04:35:53] [PASSED] 1 VF
[04:35:53] [PASSED] 2 VFs
[04:35:53] [PASSED] 3 VFs
[04:35:53] [PASSED] 4 VFs
[04:35:53] [PASSED] 5 VFs
[04:35:53] [PASSED] 6 VFs
[04:35:53] [PASSED] 7 VFs
[04:35:53] [PASSED] 8 VFs
[04:35:53] [PASSED] 9 VFs
[04:35:53] [PASSED] 10 VFs
[04:35:53] [PASSED] 11 VFs
[04:35:53] [PASSED] 12 VFs
[04:35:53] [PASSED] 13 VFs
[04:35:53] [PASSED] 14 VFs
[04:35:53] [PASSED] 15 VFs
[04:35:53] [PASSED] 16 VFs
[04:35:53] [PASSED] 17 VFs
[04:35:53] [PASSED] 18 VFs
[04:35:53] [PASSED] 19 VFs
[04:35:53] [PASSED] 20 VFs
[04:35:53] [PASSED] 21 VFs
[04:35:53] [PASSED] 22 VFs
[04:35:53] [PASSED] 23 VFs
[04:35:53] [PASSED] 24 VFs
[04:35:53] [PASSED] 25 VFs
[04:35:53] [PASSED] 26 VFs
[04:35:53] [PASSED] 27 VFs
[04:35:53] [PASSED] 28 VFs
[04:35:53] [PASSED] 29 VFs
[04:35:53] [PASSED] 30 VFs
[04:35:53] [PASSED] 31 VFs
[04:35:53] [PASSED] 32 VFs
[04:35:53] [PASSED] 33 VFs
[04:35:53] [PASSED] 34 VFs
[04:35:53] [PASSED] 35 VFs
[04:35:53] [PASSED] 36 VFs
[04:35:53] [PASSED] 37 VFs
[04:35:53] [PASSED] 38 VFs
[04:35:53] [PASSED] 39 VFs
[04:35:53] [PASSED] 40 VFs
[04:35:53] [PASSED] 41 VFs
[04:35:53] [PASSED] 42 VFs
[04:35:53] [PASSED] 43 VFs
[04:35:53] [PASSED] 44 VFs
[04:35:53] [PASSED] 45 VFs
[04:35:53] [PASSED] 46 VFs
[04:35:53] [PASSED] 47 VFs
[04:35:53] [PASSED] 48 VFs
[04:35:53] [PASSED] 49 VFs
[04:35:53] [PASSED] 50 VFs
[04:35:53] [PASSED] 51 VFs
[04:35:53] [PASSED] 52 VFs
[04:35:53] [PASSED] 53 VFs
[04:35:53] [PASSED] 54 VFs
[04:35:53] [PASSED] 55 VFs
[04:35:53] [PASSED] 56 VFs
[04:35:53] [PASSED] 57 VFs
[04:35:53] [PASSED] 58 VFs
[04:35:53] [PASSED] 59 VFs
[04:35:53] [PASSED] 60 VFs
[04:35:53] [PASSED] 61 VFs
[04:35:53] [PASSED] 62 VFs
[04:35:53] [PASSED] 63 VFs
[04:35:53] ================== [PASSED] fair_contexts ==================
[04:35:53] ===================== fair_doorbells ======================
[04:35:53] [PASSED] 1 VF
[04:35:53] [PASSED] 2 VFs
[04:35:53] [PASSED] 3 VFs
[04:35:53] [PASSED] 4 VFs
[04:35:53] [PASSED] 5 VFs
[04:35:53] [PASSED] 6 VFs
[04:35:53] [PASSED] 7 VFs
[04:35:53] [PASSED] 8 VFs
[04:35:53] [PASSED] 9 VFs
[04:35:53] [PASSED] 10 VFs
[04:35:53] [PASSED] 11 VFs
[04:35:53] [PASSED] 12 VFs
[04:35:53] [PASSED] 13 VFs
[04:35:53] [PASSED] 14 VFs
[04:35:53] [PASSED] 15 VFs
[04:35:53] [PASSED] 16 VFs
[04:35:53] [PASSED] 17 VFs
[04:35:53] [PASSED] 18 VFs
[04:35:53] [PASSED] 19 VFs
[04:35:53] [PASSED] 20 VFs
[04:35:53] [PASSED] 21 VFs
[04:35:53] [PASSED] 22 VFs
[04:35:53] [PASSED] 23 VFs
[04:35:53] [PASSED] 24 VFs
[04:35:53] [PASSED] 25 VFs
[04:35:53] [PASSED] 26 VFs
[04:35:53] [PASSED] 27 VFs
[04:35:53] [PASSED] 28 VFs
[04:35:53] [PASSED] 29 VFs
[04:35:53] [PASSED] 30 VFs
[04:35:53] [PASSED] 31 VFs
[04:35:53] [PASSED] 32 VFs
[04:35:53] [PASSED] 33 VFs
[04:35:53] [PASSED] 34 VFs
[04:35:53] [PASSED] 35 VFs
[04:35:53] [PASSED] 36 VFs
[04:35:53] [PASSED] 37 VFs
[04:35:53] [PASSED] 38 VFs
[04:35:53] [PASSED] 39 VFs
[04:35:53] [PASSED] 40 VFs
[04:35:53] [PASSED] 41 VFs
[04:35:53] [PASSED] 42 VFs
[04:35:53] [PASSED] 43 VFs
[04:35:53] [PASSED] 44 VFs
[04:35:53] [PASSED] 45 VFs
[04:35:53] [PASSED] 46 VFs
[04:35:53] [PASSED] 47 VFs
[04:35:53] [PASSED] 48 VFs
[04:35:53] [PASSED] 49 VFs
[04:35:53] [PASSED] 50 VFs
[04:35:53] [PASSED] 51 VFs
[04:35:53] [PASSED] 52 VFs
[04:35:53] [PASSED] 53 VFs
[04:35:53] [PASSED] 54 VFs
[04:35:53] [PASSED] 55 VFs
[04:35:53] [PASSED] 56 VFs
[04:35:53] [PASSED] 57 VFs
[04:35:53] [PASSED] 58 VFs
[04:35:53] [PASSED] 59 VFs
[04:35:53] [PASSED] 60 VFs
[04:35:53] [PASSED] 61 VFs
[04:35:53] [PASSED] 62 VFs
[04:35:53] [PASSED] 63 VFs
[04:35:53] ================= [PASSED] fair_doorbells ==================
[04:35:53] ======================== fair_ggtt ========================
[04:35:53] [PASSED] 1 VF
[04:35:53] [PASSED] 2 VFs
[04:35:53] [PASSED] 3 VFs
[04:35:53] [PASSED] 4 VFs
[04:35:53] [PASSED] 5 VFs
[04:35:53] [PASSED] 6 VFs
[04:35:53] [PASSED] 7 VFs
[04:35:53] [PASSED] 8 VFs
[04:35:53] [PASSED] 9 VFs
[04:35:53] [PASSED] 10 VFs
[04:35:53] [PASSED] 11 VFs
[04:35:53] [PASSED] 12 VFs
[04:35:53] [PASSED] 13 VFs
[04:35:53] [PASSED] 14 VFs
[04:35:53] [PASSED] 15 VFs
[04:35:53] [PASSED] 16 VFs
[04:35:53] [PASSED] 17 VFs
[04:35:53] [PASSED] 18 VFs
[04:35:53] [PASSED] 19 VFs
[04:35:53] [PASSED] 20 VFs
[04:35:53] [PASSED] 21 VFs
[04:35:53] [PASSED] 22 VFs
[04:35:53] [PASSED] 23 VFs
[04:35:53] [PASSED] 24 VFs
[04:35:53] [PASSED] 25 VFs
[04:35:53] [PASSED] 26 VFs
[04:35:53] [PASSED] 27 VFs
[04:35:53] [PASSED] 28 VFs
[04:35:53] [PASSED] 29 VFs
[04:35:53] [PASSED] 30 VFs
[04:35:53] [PASSED] 31 VFs
[04:35:53] [PASSED] 32 VFs
[04:35:53] [PASSED] 33 VFs
[04:35:53] [PASSED] 34 VFs
[04:35:53] [PASSED] 35 VFs
[04:35:53] [PASSED] 36 VFs
[04:35:53] [PASSED] 37 VFs
[04:35:53] [PASSED] 38 VFs
[04:35:53] [PASSED] 39 VFs
[04:35:53] [PASSED] 40 VFs
[04:35:53] [PASSED] 41 VFs
[04:35:53] [PASSED] 42 VFs
[04:35:53] [PASSED] 43 VFs
[04:35:53] [PASSED] 44 VFs
[04:35:53] [PASSED] 45 VFs
[04:35:53] [PASSED] 46 VFs
[04:35:53] [PASSED] 47 VFs
[04:35:53] [PASSED] 48 VFs
[04:35:53] [PASSED] 49 VFs
[04:35:53] [PASSED] 50 VFs
[04:35:53] [PASSED] 51 VFs
[04:35:53] [PASSED] 52 VFs
[04:35:53] [PASSED] 53 VFs
[04:35:53] [PASSED] 54 VFs
[04:35:53] [PASSED] 55 VFs
[04:35:53] [PASSED] 56 VFs
[04:35:53] [PASSED] 57 VFs
[04:35:53] [PASSED] 58 VFs
[04:35:53] [PASSED] 59 VFs
[04:35:53] [PASSED] 60 VFs
[04:35:53] [PASSED] 61 VFs
[04:35:53] [PASSED] 62 VFs
[04:35:53] [PASSED] 63 VFs
[04:35:53] ==================== [PASSED] fair_ggtt ====================
[04:35:53] ======================== fair_vram ========================
[04:35:53] [PASSED] 1 VF
[04:35:53] [PASSED] 2 VFs
[04:35:53] [PASSED] 3 VFs
[04:35:53] [PASSED] 4 VFs
[04:35:53] [PASSED] 5 VFs
[04:35:53] [PASSED] 6 VFs
[04:35:53] [PASSED] 7 VFs
[04:35:53] [PASSED] 8 VFs
[04:35:53] [PASSED] 9 VFs
[04:35:53] [PASSED] 10 VFs
[04:35:53] [PASSED] 11 VFs
[04:35:53] [PASSED] 12 VFs
[04:35:53] [PASSED] 13 VFs
[04:35:53] [PASSED] 14 VFs
[04:35:53] [PASSED] 15 VFs
[04:35:53] [PASSED] 16 VFs
[04:35:53] [PASSED] 17 VFs
[04:35:53] [PASSED] 18 VFs
[04:35:53] [PASSED] 19 VFs
[04:35:53] [PASSED] 20 VFs
[04:35:53] [PASSED] 21 VFs
[04:35:53] [PASSED] 22 VFs
[04:35:53] [PASSED] 23 VFs
[04:35:53] [PASSED] 24 VFs
[04:35:53] [PASSED] 25 VFs
[04:35:53] [PASSED] 26 VFs
[04:35:53] [PASSED] 27 VFs
[04:35:53] [PASSED] 28 VFs
[04:35:53] [PASSED] 29 VFs
[04:35:53] [PASSED] 30 VFs
[04:35:53] [PASSED] 31 VFs
[04:35:53] [PASSED] 32 VFs
[04:35:53] [PASSED] 33 VFs
[04:35:53] [PASSED] 34 VFs
[04:35:53] [PASSED] 35 VFs
[04:35:53] [PASSED] 36 VFs
[04:35:53] [PASSED] 37 VFs
[04:35:53] [PASSED] 38 VFs
[04:35:53] [PASSED] 39 VFs
[04:35:53] [PASSED] 40 VFs
[04:35:53] [PASSED] 41 VFs
[04:35:53] [PASSED] 42 VFs
[04:35:53] [PASSED] 43 VFs
[04:35:53] [PASSED] 44 VFs
[04:35:53] [PASSED] 45 VFs
[04:35:53] [PASSED] 46 VFs
[04:35:53] [PASSED] 47 VFs
[04:35:53] [PASSED] 48 VFs
[04:35:53] [PASSED] 49 VFs
[04:35:53] [PASSED] 50 VFs
[04:35:53] [PASSED] 51 VFs
[04:35:53] [PASSED] 52 VFs
[04:35:53] [PASSED] 53 VFs
[04:35:53] [PASSED] 54 VFs
[04:35:53] [PASSED] 55 VFs
[04:35:53] [PASSED] 56 VFs
[04:35:53] [PASSED] 57 VFs
[04:35:53] [PASSED] 58 VFs
[04:35:53] [PASSED] 59 VFs
[04:35:53] [PASSED] 60 VFs
[04:35:53] [PASSED] 61 VFs
[04:35:53] [PASSED] 62 VFs
[04:35:53] [PASSED] 63 VFs
[04:35:53] ==================== [PASSED] fair_vram ====================
[04:35:53] ================== [PASSED] pf_gt_config ===================
[04:35:53] ===================== lmtt (1 subtest) =====================
[04:35:53] ======================== test_ops =========================
[04:35:53] [PASSED] 2-level
[04:35:53] [PASSED] multi-level
[04:35:53] ==================== [PASSED] test_ops =====================
[04:35:53] ====================== [PASSED] lmtt =======================
[04:35:53] ================= pf_service (11 subtests) =================
[04:35:53] [PASSED] pf_negotiate_any
[04:35:53] [PASSED] pf_negotiate_base_match
[04:35:53] [PASSED] pf_negotiate_base_newer
[04:35:53] [PASSED] pf_negotiate_base_next
[04:35:53] [SKIPPED] pf_negotiate_base_older
[04:35:53] [PASSED] pf_negotiate_base_prev
[04:35:53] [PASSED] pf_negotiate_latest_match
[04:35:53] [PASSED] pf_negotiate_latest_newer
[04:35:53] [PASSED] pf_negotiate_latest_next
[04:35:53] [SKIPPED] pf_negotiate_latest_older
[04:35:53] [SKIPPED] pf_negotiate_latest_prev
[04:35:53] =================== [PASSED] pf_service ====================
[04:35:53] ================= xe_guc_g2g (2 subtests) ==================
[04:35:53] ============== xe_live_guc_g2g_kunit_default ==============
[04:35:53] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[04:35:53] ============== xe_live_guc_g2g_kunit_allmem ===============
[04:35:53] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[04:35:53] =================== [SKIPPED] xe_guc_g2g ===================
[04:35:53] =================== xe_mocs (2 subtests) ===================
[04:35:53] ================ xe_live_mocs_kernel_kunit ================
[04:35:53] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[04:35:53] ================ xe_live_mocs_reset_kunit =================
[04:35:53] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[04:35:53] ==================== [SKIPPED] xe_mocs =====================
[04:35:53] ================= xe_migrate (2 subtests) ==================
[04:35:53] ================= xe_migrate_sanity_kunit =================
[04:35:53] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[04:35:53] ================== xe_validate_ccs_kunit ==================
[04:35:53] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[04:35:53] =================== [SKIPPED] xe_migrate ===================
[04:35:53] ================== xe_dma_buf (1 subtest) ==================
[04:35:53] ==================== xe_dma_buf_kunit =====================
[04:35:53] ================ [SKIPPED] xe_dma_buf_kunit ================
[04:35:53] =================== [SKIPPED] xe_dma_buf ===================
[04:35:53] ================= xe_bo_shrink (1 subtest) =================
[04:35:53] =================== xe_bo_shrink_kunit ====================
[04:35:53] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[04:35:53] ================== [SKIPPED] xe_bo_shrink ==================
[04:35:53] ==================== xe_bo (2 subtests) ====================
[04:35:53] ================== xe_ccs_migrate_kunit ===================
[04:35:53] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[04:35:53] ==================== xe_bo_evict_kunit ====================
[04:35:53] =============== [SKIPPED] xe_bo_evict_kunit ================
[04:35:53] ===================== [SKIPPED] xe_bo ======================
[04:35:53] ==================== args (13 subtests) ====================
[04:35:53] [PASSED] count_args_test
[04:35:53] [PASSED] call_args_example
[04:35:53] [PASSED] call_args_test
[04:35:53] [PASSED] drop_first_arg_example
[04:35:53] [PASSED] drop_first_arg_test
[04:35:53] [PASSED] first_arg_example
[04:35:53] [PASSED] first_arg_test
[04:35:53] [PASSED] last_arg_example
[04:35:53] [PASSED] last_arg_test
[04:35:53] [PASSED] pick_arg_example
[04:35:53] [PASSED] if_args_example
[04:35:53] [PASSED] if_args_test
[04:35:53] [PASSED] sep_comma_example
[04:35:53] ====================== [PASSED] args =======================
[04:35:53] =================== xe_pci (3 subtests) ====================
[04:35:53] ==================== check_graphics_ip ====================
[04:35:53] [PASSED] 12.00 Xe_LP
[04:35:53] [PASSED] 12.10 Xe_LP+
[04:35:53] [PASSED] 12.55 Xe_HPG
[04:35:53] [PASSED] 12.60 Xe_HPC
[04:35:53] [PASSED] 12.70 Xe_LPG
[04:35:53] [PASSED] 12.71 Xe_LPG
[04:35:53] [PASSED] 12.74 Xe_LPG+
[04:35:53] [PASSED] 20.01 Xe2_HPG
[04:35:53] [PASSED] 20.02 Xe2_HPG
[04:35:53] [PASSED] 20.04 Xe2_LPG
[04:35:53] [PASSED] 30.00 Xe3_LPG
[04:35:53] [PASSED] 30.01 Xe3_LPG
[04:35:53] [PASSED] 30.03 Xe3_LPG
[04:35:53] [PASSED] 30.04 Xe3_LPG
[04:35:53] [PASSED] 30.05 Xe3_LPG
[04:35:53] [PASSED] 35.10 Xe3p_LPG
[04:35:53] [PASSED] 35.11 Xe3p_XPC
[04:35:53] ================ [PASSED] check_graphics_ip ================
[04:35:53] ===================== check_media_ip ======================
[04:35:53] [PASSED] 12.00 Xe_M
[04:35:53] [PASSED] 12.55 Xe_HPM
[04:35:53] [PASSED] 13.00 Xe_LPM+
[04:35:53] [PASSED] 13.01 Xe2_HPM
[04:35:53] [PASSED] 20.00 Xe2_LPM
[04:35:53] [PASSED] 30.00 Xe3_LPM
[04:35:53] [PASSED] 30.02 Xe3_LPM
[04:35:53] [PASSED] 35.00 Xe3p_LPM
[04:35:53] [PASSED] 35.03 Xe3p_HPM
[04:35:53] ================= [PASSED] check_media_ip ==================
[04:35:53] =================== check_platform_desc ===================
[04:35:53] [PASSED] 0x9A60 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A68 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A70 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A40 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A49 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A59 (TIGERLAKE)
[04:35:53] [PASSED] 0x9A78 (TIGERLAKE)
[04:35:53] [PASSED] 0x9AC0 (TIGERLAKE)
[04:35:53] [PASSED] 0x9AC9 (TIGERLAKE)
[04:35:53] [PASSED] 0x9AD9 (TIGERLAKE)
[04:35:53] [PASSED] 0x9AF8 (TIGERLAKE)
[04:35:53] [PASSED] 0x4C80 (ROCKETLAKE)
[04:35:53] [PASSED] 0x4C8A (ROCKETLAKE)
[04:35:53] [PASSED] 0x4C8B (ROCKETLAKE)
[04:35:53] [PASSED] 0x4C8C (ROCKETLAKE)
[04:35:53] [PASSED] 0x4C90 (ROCKETLAKE)
[04:35:53] [PASSED] 0x4C9A (ROCKETLAKE)
[04:35:53] [PASSED] 0x4680 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4682 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4688 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x468A (ALDERLAKE_S)
[04:35:53] [PASSED] 0x468B (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4690 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4692 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4693 (ALDERLAKE_S)
[04:35:53] [PASSED] 0x46A0 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46A1 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46A2 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46A3 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46A6 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46A8 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46AA (ALDERLAKE_P)
[04:35:53] [PASSED] 0x462A (ALDERLAKE_P)
[04:35:53] [PASSED] 0x4626 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x4628 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46B0 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46B1 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46B2 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46B3 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46C0 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46C1 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46C2 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46C3 (ALDERLAKE_P)
[04:35:53] [PASSED] 0x46D0 (ALDERLAKE_N)
[04:35:53] [PASSED] 0x46D1 (ALDERLAKE_N)
[04:35:53] [PASSED] 0x46D2 (ALDERLAKE_N)
[04:35:53] [PASSED] 0x46D3 (ALDERLAKE_N)
[04:35:53] [PASSED] 0x46D4 (ALDERLAKE_N)
[04:35:53] [PASSED] 0xA721 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7A1 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7A9 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7AC (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7AD (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA720 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7A0 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7A8 (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7AA (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA7AB (ALDERLAKE_P)
[04:35:53] [PASSED] 0xA780 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA781 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA782 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA783 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA788 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA789 (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA78A (ALDERLAKE_S)
[04:35:53] [PASSED] 0xA78B (ALDERLAKE_S)
[04:35:53] [PASSED] 0x4905 (DG1)
[04:35:53] [PASSED] 0x4906 (DG1)
[04:35:53] [PASSED] 0x4907 (DG1)
[04:35:53] [PASSED] 0x4908 (DG1)
[04:35:53] [PASSED] 0x4909 (DG1)
[04:35:53] [PASSED] 0x56C0 (DG2)
[04:35:53] [PASSED] 0x56C2 (DG2)
[04:35:53] [PASSED] 0x56C1 (DG2)
[04:35:53] [PASSED] 0x7D51 (METEORLAKE)
[04:35:53] [PASSED] 0x7DD1 (METEORLAKE)
[04:35:53] [PASSED] 0x7D41 (METEORLAKE)
[04:35:53] [PASSED] 0x7D67 (METEORLAKE)
[04:35:53] [PASSED] 0xB640 (METEORLAKE)
[04:35:53] [PASSED] 0x56A0 (DG2)
[04:35:53] [PASSED] 0x56A1 (DG2)
[04:35:53] [PASSED] 0x56A2 (DG2)
[04:35:53] [PASSED] 0x56BE (DG2)
[04:35:53] [PASSED] 0x56BF (DG2)
[04:35:53] [PASSED] 0x5690 (DG2)
[04:35:53] [PASSED] 0x5691 (DG2)
[04:35:53] [PASSED] 0x5692 (DG2)
[04:35:53] [PASSED] 0x56A5 (DG2)
[04:35:53] [PASSED] 0x56A6 (DG2)
[04:35:53] [PASSED] 0x56B0 (DG2)
[04:35:53] [PASSED] 0x56B1 (DG2)
[04:35:53] [PASSED] 0x56BA (DG2)
[04:35:53] [PASSED] 0x56BB (DG2)
[04:35:53] [PASSED] 0x56BC (DG2)
[04:35:53] [PASSED] 0x56BD (DG2)
[04:35:53] [PASSED] 0x5693 (DG2)
[04:35:53] [PASSED] 0x5694 (DG2)
[04:35:53] [PASSED] 0x5695 (DG2)
[04:35:53] [PASSED] 0x56A3 (DG2)
[04:35:53] [PASSED] 0x56A4 (DG2)
[04:35:53] [PASSED] 0x56B2 (DG2)
[04:35:53] [PASSED] 0x56B3 (DG2)
[04:35:53] [PASSED] 0x5696 (DG2)
[04:35:53] [PASSED] 0x5697 (DG2)
[04:35:53] [PASSED] 0xB69 (PVC)
[04:35:53] [PASSED] 0xB6E (PVC)
[04:35:53] [PASSED] 0xBD4 (PVC)
[04:35:53] [PASSED] 0xBD5 (PVC)
[04:35:53] [PASSED] 0xBD6 (PVC)
[04:35:53] [PASSED] 0xBD7 (PVC)
[04:35:53] [PASSED] 0xBD8 (PVC)
[04:35:53] [PASSED] 0xBD9 (PVC)
[04:35:53] [PASSED] 0xBDA (PVC)
[04:35:53] [PASSED] 0xBDB (PVC)
[04:35:53] [PASSED] 0xBE0 (PVC)
[04:35:53] [PASSED] 0xBE1 (PVC)
[04:35:53] [PASSED] 0xBE5 (PVC)
[04:35:53] [PASSED] 0x7D40 (METEORLAKE)
[04:35:53] [PASSED] 0x7D45 (METEORLAKE)
[04:35:53] [PASSED] 0x7D55 (METEORLAKE)
[04:35:53] [PASSED] 0x7D60 (METEORLAKE)
[04:35:53] [PASSED] 0x7DD5 (METEORLAKE)
[04:35:53] [PASSED] 0x6420 (LUNARLAKE)
[04:35:53] [PASSED] 0x64A0 (LUNARLAKE)
[04:35:53] [PASSED] 0x64B0 (LUNARLAKE)
[04:35:53] [PASSED] 0xE202 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE209 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE20B (BATTLEMAGE)
[04:35:53] [PASSED] 0xE20C (BATTLEMAGE)
[04:35:53] [PASSED] 0xE20D (BATTLEMAGE)
[04:35:53] [PASSED] 0xE210 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE211 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE212 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE216 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE220 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE221 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE222 (BATTLEMAGE)
[04:35:53] [PASSED] 0xE223 (BATTLEMAGE)
[04:35:53] [PASSED] 0xB080 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB081 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB082 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB083 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB084 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB085 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB086 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB087 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB08F (PANTHERLAKE)
[04:35:53] [PASSED] 0xB090 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB0A0 (PANTHERLAKE)
[04:35:53] [PASSED] 0xB0B0 (PANTHERLAKE)
[04:35:53] [PASSED] 0xFD80 (PANTHERLAKE)
[04:35:53] [PASSED] 0xFD81 (PANTHERLAKE)
[04:35:53] [PASSED] 0xD740 (NOVALAKE_S)
[04:35:53] [PASSED] 0xD741 (NOVALAKE_S)
[04:35:53] [PASSED] 0xD742 (NOVALAKE_S)
[04:35:53] [PASSED] 0xD743 (NOVALAKE_S)
[04:35:53] [PASSED] 0xD744 (NOVALAKE_S)
[04:35:53] [PASSED] 0xD745 (NOVALAKE_S)
[04:35:53] [PASSED] 0x674C (CRESCENTISLAND)
[04:35:53] [PASSED] 0xD750 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD751 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD752 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD753 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD754 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD755 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD756 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD757 (NOVALAKE_P)
[04:35:53] [PASSED] 0xD75F (NOVALAKE_P)
[04:35:53] =============== [PASSED] check_platform_desc ===============
[04:35:53] ===================== [PASSED] xe_pci ======================
[04:35:53] =================== xe_rtp (2 subtests) ====================
[04:35:53] =============== xe_rtp_process_to_sr_tests ================
[04:35:53] [PASSED] coalesce-same-reg
[04:35:53] [PASSED] no-match-no-add
[04:35:53] [PASSED] match-or
[04:35:53] [PASSED] match-or-xfail
[04:35:53] [PASSED] no-match-no-add-multiple-rules
[04:35:53] [PASSED] two-regs-two-entries
[04:35:53] [PASSED] clr-one-set-other
[04:35:53] [PASSED] set-field
[04:35:53] [PASSED] conflict-duplicate
stty: 'standard input': Inappropriate ioctl for device
[04:35:53] [PASSED] conflict-not-disjoint
[04:35:53] [PASSED] conflict-reg-type
[04:35:53] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[04:35:53] ================== xe_rtp_process_tests ===================
[04:35:53] [PASSED] active1
[04:35:53] [PASSED] active2
[04:35:53] [PASSED] active-inactive
[04:35:53] [PASSED] inactive-active
[04:35:53] [PASSED] inactive-1st_or_active-inactive
[04:35:53] [PASSED] inactive-2nd_or_active-inactive
[04:35:53] [PASSED] inactive-last_or_active-inactive
[04:35:53] [PASSED] inactive-no_or_active-inactive
[04:35:53] ============== [PASSED] xe_rtp_process_tests ===============
[04:35:53] ===================== [PASSED] xe_rtp ======================
[04:35:53] ==================== xe_wa (1 subtest) =====================
[04:35:53] ======================== xe_wa_gt =========================
[04:35:53] [PASSED] TIGERLAKE B0
[04:35:53] [PASSED] DG1 A0
[04:35:53] [PASSED] DG1 B0
[04:35:53] [PASSED] ALDERLAKE_S A0
[04:35:53] [PASSED] ALDERLAKE_S B0
[04:35:53] [PASSED] ALDERLAKE_S C0
[04:35:53] [PASSED] ALDERLAKE_S D0
[04:35:53] [PASSED] ALDERLAKE_P A0
[04:35:53] [PASSED] ALDERLAKE_P B0
[04:35:53] [PASSED] ALDERLAKE_P C0
[04:35:53] [PASSED] ALDERLAKE_S RPLS D0
[04:35:53] [PASSED] ALDERLAKE_P RPLU E0
[04:35:53] [PASSED] DG2 G10 C0
[04:35:53] [PASSED] DG2 G11 B1
[04:35:53] [PASSED] DG2 G12 A1
[04:35:53] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[04:35:53] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[04:35:53] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[04:35:53] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[04:35:53] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[04:35:53] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[04:35:53] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[04:35:53] ==================== [PASSED] xe_wa_gt =====================
[04:35:53] ====================== [PASSED] xe_wa ======================
[04:35:53] ============================================================
[04:35:53] Testing complete. Ran 597 tests: passed: 579, skipped: 18
[04:35:53] Elapsed time: 35.565s total, 4.215s configuring, 30.682s building, 0.639s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[04:35:54] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[04:35:55] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[04:36:19] Starting KUnit Kernel (1/1)...
[04:36:19] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[04:36:19] ============ drm_test_pick_cmdline (2 subtests) ============
[04:36:19] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[04:36:19] =============== drm_test_pick_cmdline_named ===============
[04:36:19] [PASSED] NTSC
[04:36:19] [PASSED] NTSC-J
[04:36:19] [PASSED] PAL
[04:36:19] [PASSED] PAL-M
[04:36:19] =========== [PASSED] drm_test_pick_cmdline_named ===========
[04:36:19] ============== [PASSED] drm_test_pick_cmdline ==============
[04:36:19] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[04:36:19] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[04:36:19] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[04:36:19] =========== drm_validate_clone_mode (2 subtests) ===========
[04:36:19] ============== drm_test_check_in_clone_mode ===============
[04:36:19] [PASSED] in_clone_mode
[04:36:19] [PASSED] not_in_clone_mode
[04:36:19] ========== [PASSED] drm_test_check_in_clone_mode ===========
[04:36:19] =============== drm_test_check_valid_clones ===============
[04:36:19] [PASSED] not_in_clone_mode
[04:36:19] [PASSED] valid_clone
[04:36:19] [PASSED] invalid_clone
[04:36:19] =========== [PASSED] drm_test_check_valid_clones ===========
[04:36:19] ============= [PASSED] drm_validate_clone_mode =============
[04:36:19] ============= drm_validate_modeset (1 subtest) =============
[04:36:19] [PASSED] drm_test_check_connector_changed_modeset
[04:36:19] ============== [PASSED] drm_validate_modeset ===============
[04:36:19] ====== drm_test_bridge_get_current_state (2 subtests) ======
[04:36:19] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[04:36:19] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[04:36:19] ======== [PASSED] drm_test_bridge_get_current_state ========
[04:36:19] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[04:36:19] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[04:36:19] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[04:36:19] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[04:36:19] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[04:36:19] ============== drm_bridge_alloc (2 subtests) ===============
[04:36:19] [PASSED] drm_test_drm_bridge_alloc_basic
[04:36:19] [PASSED] drm_test_drm_bridge_alloc_get_put
[04:36:19] ================ [PASSED] drm_bridge_alloc =================
[04:36:19] ============= drm_cmdline_parser (40 subtests) =============
[04:36:19] [PASSED] drm_test_cmdline_force_d_only
[04:36:19] [PASSED] drm_test_cmdline_force_D_only_dvi
[04:36:19] [PASSED] drm_test_cmdline_force_D_only_hdmi
[04:36:19] [PASSED] drm_test_cmdline_force_D_only_not_digital
[04:36:19] [PASSED] drm_test_cmdline_force_e_only
[04:36:19] [PASSED] drm_test_cmdline_res
[04:36:19] [PASSED] drm_test_cmdline_res_vesa
[04:36:19] [PASSED] drm_test_cmdline_res_vesa_rblank
[04:36:19] [PASSED] drm_test_cmdline_res_rblank
[04:36:19] [PASSED] drm_test_cmdline_res_bpp
[04:36:19] [PASSED] drm_test_cmdline_res_refresh
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[04:36:19] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[04:36:19] [PASSED] drm_test_cmdline_res_margins_force_on
[04:36:19] [PASSED] drm_test_cmdline_res_vesa_margins
[04:36:19] [PASSED] drm_test_cmdline_name
[04:36:19] [PASSED] drm_test_cmdline_name_bpp
[04:36:19] [PASSED] drm_test_cmdline_name_option
[04:36:19] [PASSED] drm_test_cmdline_name_bpp_option
[04:36:19] [PASSED] drm_test_cmdline_rotate_0
[04:36:19] [PASSED] drm_test_cmdline_rotate_90
[04:36:19] [PASSED] drm_test_cmdline_rotate_180
[04:36:19] [PASSED] drm_test_cmdline_rotate_270
[04:36:19] [PASSED] drm_test_cmdline_hmirror
[04:36:19] [PASSED] drm_test_cmdline_vmirror
[04:36:19] [PASSED] drm_test_cmdline_margin_options
[04:36:19] [PASSED] drm_test_cmdline_multiple_options
[04:36:19] [PASSED] drm_test_cmdline_bpp_extra_and_option
[04:36:19] [PASSED] drm_test_cmdline_extra_and_option
[04:36:19] [PASSED] drm_test_cmdline_freestanding_options
[04:36:19] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[04:36:19] [PASSED] drm_test_cmdline_panel_orientation
[04:36:19] ================ drm_test_cmdline_invalid =================
[04:36:19] [PASSED] margin_only
[04:36:19] [PASSED] interlace_only
[04:36:19] [PASSED] res_missing_x
[04:36:19] [PASSED] res_missing_y
[04:36:19] [PASSED] res_bad_y
[04:36:19] [PASSED] res_missing_y_bpp
[04:36:19] [PASSED] res_bad_bpp
[04:36:19] [PASSED] res_bad_refresh
[04:36:19] [PASSED] res_bpp_refresh_force_on_off
[04:36:19] [PASSED] res_invalid_mode
[04:36:19] [PASSED] res_bpp_wrong_place_mode
[04:36:19] [PASSED] name_bpp_refresh
[04:36:19] [PASSED] name_refresh
[04:36:19] [PASSED] name_refresh_wrong_mode
[04:36:19] [PASSED] name_refresh_invalid_mode
[04:36:19] [PASSED] rotate_multiple
[04:36:19] [PASSED] rotate_invalid_val
[04:36:19] [PASSED] rotate_truncated
[04:36:19] [PASSED] invalid_option
[04:36:19] [PASSED] invalid_tv_option
[04:36:19] [PASSED] truncated_tv_option
[04:36:19] ============ [PASSED] drm_test_cmdline_invalid =============
[04:36:19] =============== drm_test_cmdline_tv_options ===============
[04:36:19] [PASSED] NTSC
[04:36:19] [PASSED] NTSC_443
[04:36:19] [PASSED] NTSC_J
[04:36:19] [PASSED] PAL
[04:36:19] [PASSED] PAL_M
[04:36:19] [PASSED] PAL_N
[04:36:19] [PASSED] SECAM
[04:36:19] [PASSED] MONO_525
[04:36:19] [PASSED] MONO_625
[04:36:19] =========== [PASSED] drm_test_cmdline_tv_options ===========
[04:36:19] =============== [PASSED] drm_cmdline_parser ================
[04:36:19] ========== drmm_connector_hdmi_init (20 subtests) ==========
[04:36:19] [PASSED] drm_test_connector_hdmi_init_valid
[04:36:19] [PASSED] drm_test_connector_hdmi_init_bpc_8
[04:36:19] [PASSED] drm_test_connector_hdmi_init_bpc_10
[04:36:19] [PASSED] drm_test_connector_hdmi_init_bpc_12
[04:36:19] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[04:36:19] [PASSED] drm_test_connector_hdmi_init_bpc_null
[04:36:19] [PASSED] drm_test_connector_hdmi_init_formats_empty
[04:36:19] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[04:36:19] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[04:36:19] [PASSED] supported_formats=0x9 yuv420_allowed=1
[04:36:19] [PASSED] supported_formats=0x9 yuv420_allowed=0
[04:36:19] [PASSED] supported_formats=0x3 yuv420_allowed=1
[04:36:19] [PASSED] supported_formats=0x3 yuv420_allowed=0
[04:36:19] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[04:36:19] [PASSED] drm_test_connector_hdmi_init_null_ddc
[04:36:19] [PASSED] drm_test_connector_hdmi_init_null_product
[04:36:19] [PASSED] drm_test_connector_hdmi_init_null_vendor
[04:36:19] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[04:36:19] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[04:36:19] [PASSED] drm_test_connector_hdmi_init_product_valid
[04:36:19] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[04:36:19] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[04:36:19] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[04:36:19] ========= drm_test_connector_hdmi_init_type_valid =========
[04:36:19] [PASSED] HDMI-A
[04:36:19] [PASSED] HDMI-B
[04:36:19] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[04:36:19] ======== drm_test_connector_hdmi_init_type_invalid ========
[04:36:19] [PASSED] Unknown
[04:36:19] [PASSED] VGA
[04:36:19] [PASSED] DVI-I
[04:36:19] [PASSED] DVI-D
[04:36:19] [PASSED] DVI-A
[04:36:19] [PASSED] Composite
[04:36:19] [PASSED] SVIDEO
[04:36:19] [PASSED] LVDS
[04:36:19] [PASSED] Component
[04:36:19] [PASSED] DIN
[04:36:19] [PASSED] DP
[04:36:19] [PASSED] TV
[04:36:19] [PASSED] eDP
[04:36:19] [PASSED] Virtual
[04:36:19] [PASSED] DSI
[04:36:19] [PASSED] DPI
[04:36:19] [PASSED] Writeback
[04:36:19] [PASSED] SPI
[04:36:19] [PASSED] USB
[04:36:19] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[04:36:19] ============ [PASSED] drmm_connector_hdmi_init =============
[04:36:19] ============= drmm_connector_init (3 subtests) =============
[04:36:19] [PASSED] drm_test_drmm_connector_init
[04:36:19] [PASSED] drm_test_drmm_connector_init_null_ddc
[04:36:19] ========= drm_test_drmm_connector_init_type_valid =========
[04:36:19] [PASSED] Unknown
[04:36:19] [PASSED] VGA
[04:36:19] [PASSED] DVI-I
[04:36:19] [PASSED] DVI-D
[04:36:19] [PASSED] DVI-A
[04:36:19] [PASSED] Composite
[04:36:19] [PASSED] SVIDEO
[04:36:19] [PASSED] LVDS
[04:36:19] [PASSED] Component
[04:36:19] [PASSED] DIN
[04:36:19] [PASSED] DP
[04:36:19] [PASSED] HDMI-A
[04:36:19] [PASSED] HDMI-B
[04:36:19] [PASSED] TV
[04:36:19] [PASSED] eDP
[04:36:19] [PASSED] Virtual
[04:36:19] [PASSED] DSI
[04:36:19] [PASSED] DPI
[04:36:19] [PASSED] Writeback
[04:36:19] [PASSED] SPI
[04:36:19] [PASSED] USB
[04:36:19] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[04:36:19] =============== [PASSED] drmm_connector_init ===============
[04:36:19] ========= drm_connector_dynamic_init (6 subtests) ==========
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_init
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_init_properties
[04:36:19] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[04:36:19] [PASSED] Unknown
[04:36:19] [PASSED] VGA
[04:36:19] [PASSED] DVI-I
[04:36:19] [PASSED] DVI-D
[04:36:19] [PASSED] DVI-A
[04:36:19] [PASSED] Composite
[04:36:19] [PASSED] SVIDEO
[04:36:19] [PASSED] LVDS
[04:36:19] [PASSED] Component
[04:36:19] [PASSED] DIN
[04:36:19] [PASSED] DP
[04:36:19] [PASSED] HDMI-A
[04:36:19] [PASSED] HDMI-B
[04:36:19] [PASSED] TV
[04:36:19] [PASSED] eDP
[04:36:19] [PASSED] Virtual
[04:36:19] [PASSED] DSI
[04:36:19] [PASSED] DPI
[04:36:19] [PASSED] Writeback
[04:36:19] [PASSED] SPI
[04:36:19] [PASSED] USB
[04:36:19] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[04:36:19] ======== drm_test_drm_connector_dynamic_init_name =========
[04:36:19] [PASSED] Unknown
[04:36:19] [PASSED] VGA
[04:36:19] [PASSED] DVI-I
[04:36:19] [PASSED] DVI-D
[04:36:19] [PASSED] DVI-A
[04:36:19] [PASSED] Composite
[04:36:19] [PASSED] SVIDEO
[04:36:19] [PASSED] LVDS
[04:36:19] [PASSED] Component
[04:36:19] [PASSED] DIN
[04:36:19] [PASSED] DP
[04:36:19] [PASSED] HDMI-A
[04:36:19] [PASSED] HDMI-B
[04:36:19] [PASSED] TV
[04:36:19] [PASSED] eDP
[04:36:19] [PASSED] Virtual
[04:36:19] [PASSED] DSI
[04:36:19] [PASSED] DPI
[04:36:19] [PASSED] Writeback
[04:36:19] [PASSED] SPI
[04:36:19] [PASSED] USB
[04:36:19] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[04:36:19] =========== [PASSED] drm_connector_dynamic_init ============
[04:36:19] ==== drm_connector_dynamic_register_early (4 subtests) =====
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[04:36:19] ====== [PASSED] drm_connector_dynamic_register_early =======
[04:36:19] ======= drm_connector_dynamic_register (7 subtests) ========
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[04:36:19] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[04:36:19] ========= [PASSED] drm_connector_dynamic_register ==========
[04:36:19] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[04:36:19] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[04:36:19] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[04:36:19] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[04:36:19] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[04:36:19] ========== drm_test_get_tv_mode_from_name_valid ===========
[04:36:19] [PASSED] NTSC
[04:36:19] [PASSED] NTSC-443
[04:36:19] [PASSED] NTSC-J
[04:36:19] [PASSED] PAL
[04:36:19] [PASSED] PAL-M
[04:36:19] [PASSED] PAL-N
[04:36:19] [PASSED] SECAM
[04:36:19] [PASSED] Mono
[04:36:19] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[04:36:19] [PASSED] drm_test_get_tv_mode_from_name_truncated
[04:36:19] ============ [PASSED] drm_get_tv_mode_from_name ============
[04:36:19] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[04:36:19] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[04:36:19] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[04:36:19] [PASSED] VIC 96
[04:36:19] [PASSED] VIC 97
[04:36:19] [PASSED] VIC 101
[04:36:19] [PASSED] VIC 102
[04:36:19] [PASSED] VIC 106
[04:36:19] [PASSED] VIC 107
[04:36:19] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[04:36:19] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[04:36:19] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[04:36:19] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[04:36:19] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[04:36:19] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[04:36:19] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[04:36:19] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[04:36:19] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[04:36:19] [PASSED] Automatic
[04:36:19] [PASSED] Full
[04:36:19] [PASSED] Limited 16:235
[04:36:19] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[04:36:19] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[04:36:19] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[04:36:19] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[04:36:19] === drm_test_drm_hdmi_connector_get_output_format_name ====
[04:36:19] [PASSED] RGB
[04:36:19] [PASSED] YUV 4:2:0
[04:36:19] [PASSED] YUV 4:2:2
[04:36:19] [PASSED] YUV 4:4:4
[04:36:19] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[04:36:19] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[04:36:19] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[04:36:19] ============= drm_damage_helper (21 subtests) ==============
[04:36:19] [PASSED] drm_test_damage_iter_no_damage
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_src_moved
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_not_visible
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[04:36:19] [PASSED] drm_test_damage_iter_no_damage_no_fb
[04:36:19] [PASSED] drm_test_damage_iter_simple_damage
[04:36:19] [PASSED] drm_test_damage_iter_single_damage
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_outside_src
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_src_moved
[04:36:19] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[04:36:19] [PASSED] drm_test_damage_iter_damage
[04:36:19] [PASSED] drm_test_damage_iter_damage_one_intersect
[04:36:19] [PASSED] drm_test_damage_iter_damage_one_outside
[04:36:19] [PASSED] drm_test_damage_iter_damage_src_moved
[04:36:19] [PASSED] drm_test_damage_iter_damage_not_visible
[04:36:19] ================ [PASSED] drm_damage_helper ================
[04:36:19] ============== drm_dp_mst_helper (3 subtests) ==============
[04:36:19] ============== drm_test_dp_mst_calc_pbn_mode ==============
[04:36:19] [PASSED] Clock 154000 BPP 30 DSC disabled
[04:36:19] [PASSED] Clock 234000 BPP 30 DSC disabled
[04:36:19] [PASSED] Clock 297000 BPP 24 DSC disabled
[04:36:19] [PASSED] Clock 332880 BPP 24 DSC enabled
[04:36:19] [PASSED] Clock 324540 BPP 24 DSC enabled
[04:36:19] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[04:36:19] ============== drm_test_dp_mst_calc_pbn_div ===============
[04:36:19] [PASSED] Link rate 2000000 lane count 4
[04:36:19] [PASSED] Link rate 2000000 lane count 2
[04:36:19] [PASSED] Link rate 2000000 lane count 1
[04:36:19] [PASSED] Link rate 1350000 lane count 4
[04:36:19] [PASSED] Link rate 1350000 lane count 2
[04:36:19] [PASSED] Link rate 1350000 lane count 1
[04:36:19] [PASSED] Link rate 1000000 lane count 4
[04:36:19] [PASSED] Link rate 1000000 lane count 2
[04:36:19] [PASSED] Link rate 1000000 lane count 1
[04:36:19] [PASSED] Link rate 810000 lane count 4
[04:36:19] [PASSED] Link rate 810000 lane count 2
[04:36:19] [PASSED] Link rate 810000 lane count 1
[04:36:19] [PASSED] Link rate 540000 lane count 4
[04:36:19] [PASSED] Link rate 540000 lane count 2
[04:36:19] [PASSED] Link rate 540000 lane count 1
[04:36:19] [PASSED] Link rate 270000 lane count 4
[04:36:19] [PASSED] Link rate 270000 lane count 2
[04:36:19] [PASSED] Link rate 270000 lane count 1
[04:36:19] [PASSED] Link rate 162000 lane count 4
[04:36:19] [PASSED] Link rate 162000 lane count 2
[04:36:19] [PASSED] Link rate 162000 lane count 1
[04:36:19] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[04:36:19] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[04:36:19] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[04:36:19] [PASSED] DP_POWER_UP_PHY with port number
[04:36:19] [PASSED] DP_POWER_DOWN_PHY with port number
[04:36:19] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[04:36:19] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[04:36:19] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[04:36:19] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[04:36:19] [PASSED] DP_QUERY_PAYLOAD with port number
[04:36:19] [PASSED] DP_QUERY_PAYLOAD with VCPI
[04:36:19] [PASSED] DP_REMOTE_DPCD_READ with port number
[04:36:19] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[04:36:19] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[04:36:19] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[04:36:19] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[04:36:19] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[04:36:19] [PASSED] DP_REMOTE_I2C_READ with port number
[04:36:19] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[04:36:19] [PASSED] DP_REMOTE_I2C_READ with transactions array
[04:36:19] [PASSED] DP_REMOTE_I2C_WRITE with port number
[04:36:19] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[04:36:19] [PASSED] DP_REMOTE_I2C_WRITE with data array
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[04:36:19] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[04:36:19] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[04:36:19] ================ [PASSED] drm_dp_mst_helper ================
[04:36:19] ================== drm_exec (7 subtests) ===================
[04:36:19] [PASSED] sanitycheck
[04:36:19] [PASSED] test_lock
[04:36:19] [PASSED] test_lock_unlock
[04:36:19] [PASSED] test_duplicates
[04:36:19] [PASSED] test_prepare
[04:36:19] [PASSED] test_prepare_array
[04:36:19] [PASSED] test_multiple_loops
[04:36:19] ==================== [PASSED] drm_exec =====================
[04:36:19] =========== drm_format_helper_test (17 subtests) ===========
[04:36:19] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[04:36:19] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[04:36:19] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[04:36:19] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[04:36:19] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[04:36:19] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[04:36:19] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[04:36:19] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[04:36:19] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[04:36:19] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[04:36:19] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[04:36:19] ============== drm_test_fb_xrgb8888_to_mono ===============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[04:36:19] ==================== drm_test_fb_swab =====================
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ================ [PASSED] drm_test_fb_swab =================
[04:36:19] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[04:36:19] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[04:36:19] [PASSED] single_pixel_source_buffer
[04:36:19] [PASSED] single_pixel_clip_rectangle
[04:36:19] [PASSED] well_known_colors
[04:36:19] [PASSED] destination_pitch
[04:36:19] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[04:36:19] ================= drm_test_fb_clip_offset =================
[04:36:19] [PASSED] pass through
[04:36:19] [PASSED] horizontal offset
[04:36:19] [PASSED] vertical offset
[04:36:19] [PASSED] horizontal and vertical offset
[04:36:19] [PASSED] horizontal offset (custom pitch)
[04:36:19] [PASSED] vertical offset (custom pitch)
[04:36:19] [PASSED] horizontal and vertical offset (custom pitch)
[04:36:19] ============= [PASSED] drm_test_fb_clip_offset =============
[04:36:19] =================== drm_test_fb_memcpy ====================
[04:36:19] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[04:36:19] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[04:36:19] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[04:36:19] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[04:36:19] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[04:36:19] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[04:36:19] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[04:36:19] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[04:36:19] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[04:36:19] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[04:36:19] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[04:36:19] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[04:36:19] =============== [PASSED] drm_test_fb_memcpy ================
[04:36:19] ============= [PASSED] drm_format_helper_test ==============
[04:36:19] ================= drm_format (18 subtests) =================
[04:36:19] [PASSED] drm_test_format_block_width_invalid
[04:36:19] [PASSED] drm_test_format_block_width_one_plane
[04:36:19] [PASSED] drm_test_format_block_width_two_plane
[04:36:19] [PASSED] drm_test_format_block_width_three_plane
[04:36:19] [PASSED] drm_test_format_block_width_tiled
[04:36:19] [PASSED] drm_test_format_block_height_invalid
[04:36:19] [PASSED] drm_test_format_block_height_one_plane
[04:36:19] [PASSED] drm_test_format_block_height_two_plane
[04:36:19] [PASSED] drm_test_format_block_height_three_plane
[04:36:19] [PASSED] drm_test_format_block_height_tiled
[04:36:19] [PASSED] drm_test_format_min_pitch_invalid
[04:36:19] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[04:36:19] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[04:36:19] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[04:36:19] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[04:36:19] [PASSED] drm_test_format_min_pitch_two_plane
[04:36:19] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[04:36:19] [PASSED] drm_test_format_min_pitch_tiled
[04:36:19] =================== [PASSED] drm_format ====================
[04:36:19] ============== drm_framebuffer (10 subtests) ===============
[04:36:19] ========== drm_test_framebuffer_check_src_coords ==========
[04:36:19] [PASSED] Success: source fits into fb
[04:36:19] [PASSED] Fail: overflowing fb with x-axis coordinate
[04:36:19] [PASSED] Fail: overflowing fb with y-axis coordinate
[04:36:19] [PASSED] Fail: overflowing fb with source width
[04:36:19] [PASSED] Fail: overflowing fb with source height
[04:36:19] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[04:36:19] [PASSED] drm_test_framebuffer_cleanup
[04:36:19] =============== drm_test_framebuffer_create ===============
[04:36:19] [PASSED] ABGR8888 normal sizes
[04:36:19] [PASSED] ABGR8888 max sizes
[04:36:19] [PASSED] ABGR8888 pitch greater than min required
[04:36:19] [PASSED] ABGR8888 pitch less than min required
[04:36:19] [PASSED] ABGR8888 Invalid width
[04:36:19] [PASSED] ABGR8888 Invalid buffer handle
[04:36:19] [PASSED] No pixel format
[04:36:19] [PASSED] ABGR8888 Width 0
[04:36:19] [PASSED] ABGR8888 Height 0
[04:36:19] [PASSED] ABGR8888 Out of bound height * pitch combination
[04:36:19] [PASSED] ABGR8888 Large buffer offset
[04:36:19] [PASSED] ABGR8888 Buffer offset for inexistent plane
[04:36:19] [PASSED] ABGR8888 Invalid flag
[04:36:19] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[04:36:19] [PASSED] ABGR8888 Valid buffer modifier
[04:36:19] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[04:36:19] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] NV12 Normal sizes
[04:36:19] [PASSED] NV12 Max sizes
[04:36:19] [PASSED] NV12 Invalid pitch
[04:36:19] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[04:36:19] [PASSED] NV12 different modifier per-plane
[04:36:19] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[04:36:19] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] NV12 Modifier for inexistent plane
[04:36:19] [PASSED] NV12 Handle for inexistent plane
[04:36:19] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[04:36:19] [PASSED] YVU420 Normal sizes
[04:36:19] [PASSED] YVU420 Max sizes
[04:36:19] [PASSED] YVU420 Invalid pitch
[04:36:19] [PASSED] YVU420 Different pitches
[04:36:19] [PASSED] YVU420 Different buffer offsets/pitches
[04:36:19] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[04:36:19] [PASSED] YVU420 Valid modifier
[04:36:19] [PASSED] YVU420 Different modifiers per plane
[04:36:19] [PASSED] YVU420 Modifier for inexistent plane
[04:36:19] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[04:36:19] [PASSED] X0L2 Normal sizes
[04:36:19] [PASSED] X0L2 Max sizes
[04:36:19] [PASSED] X0L2 Invalid pitch
[04:36:19] [PASSED] X0L2 Pitch greater than minimum required
[04:36:19] [PASSED] X0L2 Handle for inexistent plane
[04:36:19] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[04:36:19] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[04:36:19] [PASSED] X0L2 Valid modifier
[04:36:19] [PASSED] X0L2 Modifier for inexistent plane
[04:36:19] =========== [PASSED] drm_test_framebuffer_create ===========
[04:36:19] [PASSED] drm_test_framebuffer_free
[04:36:19] [PASSED] drm_test_framebuffer_init
[04:36:19] [PASSED] drm_test_framebuffer_init_bad_format
[04:36:19] [PASSED] drm_test_framebuffer_init_dev_mismatch
[04:36:19] [PASSED] drm_test_framebuffer_lookup
[04:36:19] [PASSED] drm_test_framebuffer_lookup_inexistent
[04:36:19] [PASSED] drm_test_framebuffer_modifiers_not_supported
[04:36:19] ================= [PASSED] drm_framebuffer =================
[04:36:19] ================ drm_gem_shmem (8 subtests) ================
[04:36:19] [PASSED] drm_gem_shmem_test_obj_create
[04:36:19] [PASSED] drm_gem_shmem_test_obj_create_private
[04:36:19] [PASSED] drm_gem_shmem_test_pin_pages
[04:36:19] [PASSED] drm_gem_shmem_test_vmap
[04:36:19] [PASSED] drm_gem_shmem_test_get_sg_table
[04:36:19] [PASSED] drm_gem_shmem_test_get_pages_sgt
[04:36:19] [PASSED] drm_gem_shmem_test_madvise
[04:36:19] [PASSED] drm_gem_shmem_test_purge
[04:36:19] ================== [PASSED] drm_gem_shmem ==================
[04:36:19] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[04:36:19] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[04:36:19] [PASSED] Automatic
[04:36:19] [PASSED] Full
[04:36:19] [PASSED] Limited 16:235
[04:36:19] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[04:36:19] [PASSED] drm_test_check_disable_connector
[04:36:19] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[04:36:19] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[04:36:19] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[04:36:19] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[04:36:19] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[04:36:19] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[04:36:19] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[04:36:19] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[04:36:19] [PASSED] drm_test_check_output_bpc_dvi
[04:36:19] [PASSED] drm_test_check_output_bpc_format_vic_1
[04:36:19] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[04:36:19] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[04:36:19] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[04:36:19] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[04:36:19] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[04:36:19] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[04:36:19] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[04:36:19] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[04:36:19] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[04:36:19] [PASSED] drm_test_check_broadcast_rgb_value
[04:36:19] [PASSED] drm_test_check_bpc_8_value
[04:36:19] [PASSED] drm_test_check_bpc_10_value
[04:36:19] [PASSED] drm_test_check_bpc_12_value
[04:36:19] [PASSED] drm_test_check_format_value
[04:36:19] [PASSED] drm_test_check_tmds_char_value
[04:36:19] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[04:36:19] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[04:36:19] [PASSED] drm_test_check_mode_valid
[04:36:19] [PASSED] drm_test_check_mode_valid_reject
[04:36:19] [PASSED] drm_test_check_mode_valid_reject_rate
[04:36:19] [PASSED] drm_test_check_mode_valid_reject_max_clock
[04:36:19] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[04:36:19] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[04:36:19] [PASSED] drm_test_check_infoframes
[04:36:19] [PASSED] drm_test_check_reject_avi_infoframe
[04:36:19] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[04:36:19] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[04:36:19] [PASSED] drm_test_check_reject_audio_infoframe
[04:36:19] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[04:36:19] ================= drm_managed (2 subtests) =================
[04:36:19] [PASSED] drm_test_managed_release_action
[04:36:19] [PASSED] drm_test_managed_run_action
[04:36:19] =================== [PASSED] drm_managed ===================
[04:36:19] =================== drm_mm (6 subtests) ====================
[04:36:19] [PASSED] drm_test_mm_init
[04:36:19] [PASSED] drm_test_mm_debug
[04:36:19] [PASSED] drm_test_mm_align32
[04:36:19] [PASSED] drm_test_mm_align64
[04:36:19] [PASSED] drm_test_mm_lowest
[04:36:19] [PASSED] drm_test_mm_highest
[04:36:19] ===================== [PASSED] drm_mm ======================
[04:36:19] ============= drm_modes_analog_tv (5 subtests) =============
[04:36:19] [PASSED] drm_test_modes_analog_tv_mono_576i
[04:36:19] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[04:36:19] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[04:36:19] [PASSED] drm_test_modes_analog_tv_pal_576i
[04:36:19] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[04:36:19] =============== [PASSED] drm_modes_analog_tv ===============
[04:36:19] ============== drm_plane_helper (2 subtests) ===============
[04:36:19] =============== drm_test_check_plane_state ================
[04:36:19] [PASSED] clipping_simple
[04:36:19] [PASSED] clipping_rotate_reflect
[04:36:19] [PASSED] positioning_simple
[04:36:19] [PASSED] upscaling
[04:36:19] [PASSED] downscaling
[04:36:19] [PASSED] rounding1
[04:36:19] [PASSED] rounding2
[04:36:19] [PASSED] rounding3
[04:36:19] [PASSED] rounding4
[04:36:19] =========== [PASSED] drm_test_check_plane_state ============
[04:36:19] =========== drm_test_check_invalid_plane_state ============
[04:36:19] [PASSED] positioning_invalid
[04:36:19] [PASSED] upscaling_invalid
[04:36:19] [PASSED] downscaling_invalid
[04:36:19] ======= [PASSED] drm_test_check_invalid_plane_state ========
[04:36:19] ================ [PASSED] drm_plane_helper =================
[04:36:19] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[04:36:19] ====== drm_test_connector_helper_tv_get_modes_check =======
[04:36:19] [PASSED] None
[04:36:19] [PASSED] PAL
[04:36:19] [PASSED] NTSC
[04:36:19] [PASSED] Both, NTSC Default
[04:36:19] [PASSED] Both, PAL Default
[04:36:19] [PASSED] Both, NTSC Default, with PAL on command-line
[04:36:19] [PASSED] Both, PAL Default, with NTSC on command-line
[04:36:19] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[04:36:19] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[04:36:19] ================== drm_rect (9 subtests) ===================
[04:36:19] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[04:36:19] [PASSED] drm_test_rect_clip_scaled_not_clipped
[04:36:19] [PASSED] drm_test_rect_clip_scaled_clipped
[04:36:19] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[04:36:19] ================= drm_test_rect_intersect =================
[04:36:19] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[04:36:19] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[04:36:19] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[04:36:19] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[04:36:19] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[04:36:19] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[04:36:19] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[04:36:19] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[04:36:19] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[04:36:19] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[04:36:19] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[04:36:19] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[04:36:19] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[04:36:19] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[04:36:19] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[04:36:19] ============= [PASSED] drm_test_rect_intersect =============
[04:36:19] ================ drm_test_rect_calc_hscale ================
[04:36:19] [PASSED] normal use
[04:36:19] [PASSED] out of max range
[04:36:19] [PASSED] out of min range
[04:36:19] [PASSED] zero dst
[04:36:19] [PASSED] negative src
[04:36:19] [PASSED] negative dst
[04:36:19] ============ [PASSED] drm_test_rect_calc_hscale ============
[04:36:19] ================ drm_test_rect_calc_vscale ================
[04:36:19] [PASSED] normal use
[04:36:19] [PASSED] out of max range
[04:36:19] [PASSED] out of min range
[04:36:19] [PASSED] zero dst
[04:36:19] [PASSED] negative src
[04:36:19] [PASSED] negative dst
[04:36:19] ============ [PASSED] drm_test_rect_calc_vscale ============
[04:36:19] ================== drm_test_rect_rotate ===================
[04:36:19] [PASSED] reflect-x
[04:36:19] [PASSED] reflect-y
[04:36:19] [PASSED] rotate-0
[04:36:19] [PASSED] rotate-90
[04:36:19] [PASSED] rotate-180
[04:36:19] [PASSED] rotate-270
[04:36:19] ============== [PASSED] drm_test_rect_rotate ===============
[04:36:19] ================ drm_test_rect_rotate_inv =================
[04:36:19] [PASSED] reflect-x
[04:36:19] [PASSED] reflect-y
[04:36:19] [PASSED] rotate-0
[04:36:19] [PASSED] rotate-90
[04:36:19] [PASSED] rotate-180
[04:36:19] [PASSED] rotate-270
[04:36:19] ============ [PASSED] drm_test_rect_rotate_inv =============
[04:36:19] ==================== [PASSED] drm_rect =====================
[04:36:19] ============ drm_sysfb_modeset_test (1 subtest) ============
[04:36:19] ============ drm_test_sysfb_build_fourcc_list =============
[04:36:19] [PASSED] no native formats
[04:36:19] [PASSED] XRGB8888 as native format
[04:36:19] [PASSED] remove duplicates
[04:36:19] [PASSED] convert alpha formats
[04:36:19] [PASSED] random formats
[04:36:19] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[04:36:19] ============= [PASSED] drm_sysfb_modeset_test ==============
[04:36:19] ================== drm_fixp (2 subtests) ===================
[04:36:19] [PASSED] drm_test_int2fixp
[04:36:19] [PASSED] drm_test_sm2fixp
[04:36:19] ==================== [PASSED] drm_fixp =====================
[04:36:19] ============================================================
[04:36:19] Testing complete. Ran 621 tests: passed: 621
[04:36:19] Elapsed time: 25.898s total, 1.739s configuring, 23.993s building, 0.121s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[04:36:20] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[04:36:21] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[04:36:31] Starting KUnit Kernel (1/1)...
[04:36:31] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[04:36:31] ================= ttm_device (5 subtests) ==================
[04:36:31] [PASSED] ttm_device_init_basic
[04:36:31] [PASSED] ttm_device_init_multiple
[04:36:31] [PASSED] ttm_device_fini_basic
[04:36:31] [PASSED] ttm_device_init_no_vma_man
[04:36:31] ================== ttm_device_init_pools ==================
[04:36:31] [PASSED] No DMA allocations, no DMA32 required
[04:36:31] [PASSED] DMA allocations, DMA32 required
[04:36:31] [PASSED] No DMA allocations, DMA32 required
[04:36:31] [PASSED] DMA allocations, no DMA32 required
[04:36:31] ============== [PASSED] ttm_device_init_pools ==============
[04:36:31] =================== [PASSED] ttm_device ====================
[04:36:31] ================== ttm_pool (8 subtests) ===================
[04:36:31] ================== ttm_pool_alloc_basic ===================
[04:36:31] [PASSED] One page
[04:36:31] [PASSED] More than one page
[04:36:31] [PASSED] Above the allocation limit
[04:36:31] [PASSED] One page, with coherent DMA mappings enabled
[04:36:31] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[04:36:31] ============== [PASSED] ttm_pool_alloc_basic ===============
[04:36:31] ============== ttm_pool_alloc_basic_dma_addr ==============
[04:36:31] [PASSED] One page
[04:36:31] [PASSED] More than one page
[04:36:31] [PASSED] Above the allocation limit
[04:36:31] [PASSED] One page, with coherent DMA mappings enabled
[04:36:31] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[04:36:31] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[04:36:31] [PASSED] ttm_pool_alloc_order_caching_match
[04:36:31] [PASSED] ttm_pool_alloc_caching_mismatch
[04:36:31] [PASSED] ttm_pool_alloc_order_mismatch
[04:36:31] [PASSED] ttm_pool_free_dma_alloc
[04:36:31] [PASSED] ttm_pool_free_no_dma_alloc
[04:36:31] [PASSED] ttm_pool_fini_basic
[04:36:31] ==================== [PASSED] ttm_pool =====================
[04:36:31] ================ ttm_resource (8 subtests) =================
[04:36:31] ================= ttm_resource_init_basic =================
[04:36:31] [PASSED] Init resource in TTM_PL_SYSTEM
[04:36:31] [PASSED] Init resource in TTM_PL_VRAM
[04:36:31] [PASSED] Init resource in a private placement
[04:36:31] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[04:36:31] ============= [PASSED] ttm_resource_init_basic =============
[04:36:31] [PASSED] ttm_resource_init_pinned
[04:36:31] [PASSED] ttm_resource_fini_basic
[04:36:31] [PASSED] ttm_resource_manager_init_basic
[04:36:31] [PASSED] ttm_resource_manager_usage_basic
[04:36:31] [PASSED] ttm_resource_manager_set_used_basic
[04:36:31] [PASSED] ttm_sys_man_alloc_basic
[04:36:31] [PASSED] ttm_sys_man_free_basic
[04:36:31] ================== [PASSED] ttm_resource ===================
[04:36:31] =================== ttm_tt (15 subtests) ===================
[04:36:31] ==================== ttm_tt_init_basic ====================
[04:36:31] [PASSED] Page-aligned size
[04:36:31] [PASSED] Extra pages requested
[04:36:31] ================ [PASSED] ttm_tt_init_basic ================
[04:36:31] [PASSED] ttm_tt_init_misaligned
[04:36:31] [PASSED] ttm_tt_fini_basic
[04:36:31] [PASSED] ttm_tt_fini_sg
[04:36:31] [PASSED] ttm_tt_fini_shmem
[04:36:31] [PASSED] ttm_tt_create_basic
[04:36:31] [PASSED] ttm_tt_create_invalid_bo_type
[04:36:31] [PASSED] ttm_tt_create_ttm_exists
[04:36:31] [PASSED] ttm_tt_create_failed
[04:36:31] [PASSED] ttm_tt_destroy_basic
[04:36:31] [PASSED] ttm_tt_populate_null_ttm
[04:36:31] [PASSED] ttm_tt_populate_populated_ttm
[04:36:31] [PASSED] ttm_tt_unpopulate_basic
[04:36:31] [PASSED] ttm_tt_unpopulate_empty_ttm
[04:36:31] [PASSED] ttm_tt_swapin_basic
[04:36:31] ===================== [PASSED] ttm_tt ======================
[04:36:31] =================== ttm_bo (14 subtests) ===================
[04:36:31] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[04:36:31] [PASSED] Cannot be interrupted and sleeps
[04:36:31] [PASSED] Cannot be interrupted, locks straight away
[04:36:31] [PASSED] Can be interrupted, sleeps
[04:36:31] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[04:36:31] [PASSED] ttm_bo_reserve_locked_no_sleep
[04:36:31] [PASSED] ttm_bo_reserve_no_wait_ticket
[04:36:31] [PASSED] ttm_bo_reserve_double_resv
[04:36:31] [PASSED] ttm_bo_reserve_interrupted
[04:36:31] [PASSED] ttm_bo_reserve_deadlock
[04:36:31] [PASSED] ttm_bo_unreserve_basic
[04:36:31] [PASSED] ttm_bo_unreserve_pinned
[04:36:31] [PASSED] ttm_bo_unreserve_bulk
[04:36:31] [PASSED] ttm_bo_fini_basic
[04:36:31] [PASSED] ttm_bo_fini_shared_resv
[04:36:31] [PASSED] ttm_bo_pin_basic
[04:36:31] [PASSED] ttm_bo_pin_unpin_resource
[04:36:31] [PASSED] ttm_bo_multiple_pin_one_unpin
[04:36:31] ===================== [PASSED] ttm_bo ======================
[04:36:31] ============== ttm_bo_validate (21 subtests) ===============
[04:36:31] ============== ttm_bo_init_reserved_sys_man ===============
[04:36:31] [PASSED] Buffer object for userspace
[04:36:31] [PASSED] Kernel buffer object
[04:36:31] [PASSED] Shared buffer object
[04:36:31] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[04:36:31] ============== ttm_bo_init_reserved_mock_man ==============
[04:36:31] [PASSED] Buffer object for userspace
[04:36:31] [PASSED] Kernel buffer object
[04:36:31] [PASSED] Shared buffer object
[04:36:31] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[04:36:31] [PASSED] ttm_bo_init_reserved_resv
[04:36:31] ================== ttm_bo_validate_basic ==================
[04:36:31] [PASSED] Buffer object for userspace
[04:36:31] [PASSED] Kernel buffer object
[04:36:31] [PASSED] Shared buffer object
[04:36:31] ============== [PASSED] ttm_bo_validate_basic ==============
[04:36:31] [PASSED] ttm_bo_validate_invalid_placement
[04:36:31] ============= ttm_bo_validate_same_placement ==============
[04:36:31] [PASSED] System manager
[04:36:31] [PASSED] VRAM manager
[04:36:31] ========= [PASSED] ttm_bo_validate_same_placement ==========
[04:36:31] [PASSED] ttm_bo_validate_failed_alloc
[04:36:31] [PASSED] ttm_bo_validate_pinned
[04:36:31] [PASSED] ttm_bo_validate_busy_placement
[04:36:31] ================ ttm_bo_validate_multihop =================
[04:36:31] [PASSED] Buffer object for userspace
[04:36:31] [PASSED] Kernel buffer object
[04:36:31] [PASSED] Shared buffer object
[04:36:31] ============ [PASSED] ttm_bo_validate_multihop =============
[04:36:31] ========== ttm_bo_validate_no_placement_signaled ==========
[04:36:31] [PASSED] Buffer object in system domain, no page vector
[04:36:31] [PASSED] Buffer object in system domain with an existing page vector
[04:36:31] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[04:36:31] ======== ttm_bo_validate_no_placement_not_signaled ========
[04:36:31] [PASSED] Buffer object for userspace
[04:36:31] [PASSED] Kernel buffer object
[04:36:31] [PASSED] Shared buffer object
[04:36:31] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[04:36:31] [PASSED] ttm_bo_validate_move_fence_signaled
[04:36:31] ========= ttm_bo_validate_move_fence_not_signaled =========
[04:36:31] [PASSED] Waits for GPU
[04:36:31] [PASSED] Tries to lock straight away
[04:36:31] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[04:36:31] [PASSED] ttm_bo_validate_happy_evict
[04:36:31] [PASSED] ttm_bo_validate_all_pinned_evict
[04:36:31] [PASSED] ttm_bo_validate_allowed_only_evict
[04:36:31] [PASSED] ttm_bo_validate_deleted_evict
[04:36:31] [PASSED] ttm_bo_validate_busy_domain_evict
[04:36:31] [PASSED] ttm_bo_validate_evict_gutting
[04:36:31] [PASSED] ttm_bo_validate_recrusive_evict
[04:36:31] ================= [PASSED] ttm_bo_validate =================
[04:36:31] ============================================================
[04:36:31] Testing complete. Ran 101 tests: passed: 101
[04:36:31] Elapsed time: 11.429s total, 1.727s configuring, 9.485s building, 0.170s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
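For reference, a minimal sketch of reproducing the drm and ttm KUnit runs logged above from a kernel checkout (kunitconfig paths are taken from the logged invocations; /kernel is assumed to be the CI container's checkout root, so run these from your own tree instead):

$ ./tools/testing/kunit/kunit.py run --kunitconfig drivers/gpu/drm/tests/.kunitconfig
$ ./tools/testing/kunit/kunit.py run --kunitconfig drivers/gpu/drm/ttm/tests/.kunitconfig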
^ permalink raw reply [flat|nested] 33+ messages in thread

* ✗ Xe.CI.BAT: failure for Fine grained fault locking, threaded prefetch, storm cache (rev4)
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (13 preceding siblings ...)
2026-02-26 4:36 ` ✓ CI.KUnit: success " Patchwork
@ 2026-02-26 5:26 ` Patchwork
2026-02-26 8:59 ` ✗ Xe.CI.FULL: " Patchwork
2026-02-26 13:43 ` [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Thomas Hellström
16 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2026-02-26 5:26 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 2048 bytes --]
== Series Details ==
Series: Fine grained fault locking, threaded prefetch, storm cache (rev4)
URL : https://patchwork.freedesktop.org/series/162167/
State : failure
== Summary ==
CI Bug Log - changes from xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440_BAT -> xe-pw-162167v4_BAT
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-162167v4_BAT absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-162167v4_BAT, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
to document this new failure mode, which will reduce false positives in CI.
Participating hosts (14 -> 14)
------------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-162167v4_BAT:
### IGT changes ###
#### Possible regressions ####
* igt@core_debugfs@read-all-entries:
- bat-adlp-vm: [PASS][1] -> [ABORT][2]
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/bat-adlp-vm/igt@core_debugfs@read-all-entries.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/bat-adlp-vm/igt@core_debugfs@read-all-entries.html
- bat-atsm-2: [PASS][3] -> [ABORT][4]
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/bat-atsm-2/igt@core_debugfs@read-all-entries.html
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/bat-atsm-2/igt@core_debugfs@read-all-entries.html
Build changes
-------------
* Linux: xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440 -> xe-pw-162167v4
IGT_8772: 8772
xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440: f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440
xe-pw-162167v4: 162167v4
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/index.html
[-- Attachment #2: Type: text/html, Size: 2641 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread

* ✗ Xe.CI.FULL: failure for Fine grained fault locking, threaded prefetch, storm cache (rev4)
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (14 preceding siblings ...)
2026-02-26 5:26 ` ✗ Xe.CI.BAT: failure " Patchwork
@ 2026-02-26 8:59 ` Patchwork
2026-02-26 13:43 ` [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Thomas Hellström
16 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2026-02-26 8:59 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 29940 bytes --]
== Series Details ==
Series: Fine grained fault locking, threaded prefetch, storm cache (rev4)
URL : https://patchwork.freedesktop.org/series/162167/
State : failure
== Summary ==
CI Bug Log - changes from xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440_FULL -> xe-pw-162167v4_FULL
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-162167v4_FULL need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-162167v4_FULL, please notify your bug team
(I915-ci-infra@lists.freedesktop.org) so they can document this new
failure mode, which will reduce false positives in CI.
Participating hosts (2 -> 2)
------------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-162167v4_FULL:
### IGT changes ###
#### Possible regressions ####
* igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bd-dp2-hdmi-a3:
- shard-bmg: [PASS][1] -> [FAIL][2] +1 other test fail
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-1/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bd-dp2-hdmi-a3.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bd-dp2-hdmi-a3.html
* igt@kms_pm_backlight@brightness-with-dpms:
- shard-lnl: [PASS][3] -> [SKIP][4] +1 other test skip
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-8/igt@kms_pm_backlight@brightness-with-dpms.html
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-8/igt@kms_pm_backlight@brightness-with-dpms.html
* igt@xe_exec_system_allocator@process-many-mmap-remap-dontunmap:
- shard-lnl: [PASS][5] -> [INCOMPLETE][6]
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-4/igt@xe_exec_system_allocator@process-many-mmap-remap-dontunmap.html
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@xe_exec_system_allocator@process-many-mmap-remap-dontunmap.html
* igt@xe_pm@s4-d3hot-basic-exec:
- shard-bmg: [PASS][7] -> [WARN][8]
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-6/igt@xe_pm@s4-d3hot-basic-exec.html
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-4/igt@xe_pm@s4-d3hot-basic-exec.html
Known issues
------------
Here are the changes found in xe-pw-162167v4_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_big_fb@linear-32bpp-rotate-270:
- shard-bmg: NOTRUN -> [SKIP][9] ([Intel XE#2327])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_big_fb@linear-32bpp-rotate-270.html
* igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip:
- shard-bmg: NOTRUN -> [SKIP][10] ([Intel XE#7059])
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip.html
* igt@kms_big_fb@y-tiled-16bpp-rotate-0:
- shard-bmg: NOTRUN -> [SKIP][11] ([Intel XE#1124]) +2 other tests skip
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_big_fb@y-tiled-16bpp-rotate-0.html
* igt@kms_ccs@bad-pixel-format-y-tiled-gen12-rc-ccs-cc:
- shard-bmg: NOTRUN -> [SKIP][12] ([Intel XE#2887]) +3 other tests skip
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_ccs@bad-pixel-format-y-tiled-gen12-rc-ccs-cc.html
* igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs@pipe-b-dp-2:
- shard-bmg: NOTRUN -> [SKIP][13] ([Intel XE#2652]) +3 other tests skip
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-2/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs@pipe-b-dp-2.html
* igt@kms_cdclk@mode-transition:
- shard-bmg: NOTRUN -> [SKIP][14] ([Intel XE#2724])
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_cdclk@mode-transition.html
* igt@kms_chamelium_color@ctm-0-50:
- shard-bmg: NOTRUN -> [SKIP][15] ([Intel XE#2325])
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_chamelium_color@ctm-0-50.html
* igt@kms_chamelium_edid@dp-edid-resolution-list:
- shard-bmg: NOTRUN -> [SKIP][16] ([Intel XE#2252]) +3 other tests skip
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_chamelium_edid@dp-edid-resolution-list.html
* igt@kms_content_protection@lic-type-0-hdcp14@pipe-a-dp-2:
- shard-bmg: NOTRUN -> [FAIL][17] ([Intel XE#1178] / [Intel XE#3304]) +1 other test fail
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-6/igt@kms_content_protection@lic-type-0-hdcp14@pipe-a-dp-2.html
* igt@kms_cursor_crc@cursor-offscreen-32x32:
- shard-bmg: NOTRUN -> [SKIP][18] ([Intel XE#2320]) +1 other test skip
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_cursor_crc@cursor-offscreen-32x32.html
* igt@kms_cursor_crc@cursor-onscreen-512x512:
- shard-bmg: NOTRUN -> [SKIP][19] ([Intel XE#2321])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_cursor_crc@cursor-onscreen-512x512.html
* igt@kms_cursor_crc@cursor-sliding-256x256:
- shard-bmg: [PASS][20] -> [FAIL][21] ([Intel XE#6747]) +1 other test fail
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-4/igt@kms_cursor_crc@cursor-sliding-256x256.html
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-2/igt@kms_cursor_crc@cursor-sliding-256x256.html
* igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
- shard-bmg: [PASS][22] -> [DMESG-WARN][23] ([Intel XE#5354])
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-4/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html
* igt@kms_dsc@dsc-with-bpc:
- shard-bmg: NOTRUN -> [SKIP][24] ([Intel XE#2244])
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_dsc@dsc-with-bpc.html
* igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests:
- shard-bmg: NOTRUN -> [SKIP][25] ([Intel XE#4422])
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-dirtyfb-tests.html
* igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling:
- shard-bmg: NOTRUN -> [SKIP][26] ([Intel XE#7178]) +2 other tests skip
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling.html
* igt@kms_frontbuffer_tracking@drrs-1p-offscreen-pri-shrfb-draw-blt:
- shard-lnl: NOTRUN -> [SKIP][27] ([Intel XE#6312])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-3/igt@kms_frontbuffer_tracking@drrs-1p-offscreen-pri-shrfb-draw-blt.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-shrfb-msflip-blt:
- shard-bmg: NOTRUN -> [SKIP][28] ([Intel XE#2311]) +8 other tests skip
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-shrfb-msflip-blt.html
* igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-render:
- shard-bmg: NOTRUN -> [SKIP][29] ([Intel XE#4141]) +3 other tests skip
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc:
- shard-lnl: NOTRUN -> [SKIP][30] ([Intel XE#656])
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-3/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbc-tiling-y:
- shard-bmg: NOTRUN -> [SKIP][31] ([Intel XE#2352])
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-tiling-y.html
* igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-blt:
- shard-bmg: NOTRUN -> [SKIP][32] ([Intel XE#2313]) +5 other tests skip
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-blt.html
* igt@kms_hdr@brightness-with-hdr:
- shard-bmg: NOTRUN -> [SKIP][33] ([Intel XE#3374] / [Intel XE#3544])
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_hdr@brightness-with-hdr.html
* igt@kms_plane@pixel-format-4-tiled-dg2-rc-ccs-cc-modifier:
- shard-bmg: NOTRUN -> [SKIP][34] ([Intel XE#7283])
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_plane@pixel-format-4-tiled-dg2-rc-ccs-cc-modifier.html
* igt@kms_plane_lowres@tiling-yf:
- shard-bmg: NOTRUN -> [SKIP][35] ([Intel XE#2393])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_plane_lowres@tiling-yf.html
* igt@kms_pm_backlight@fade-with-dpms:
- shard-lnl: [PASS][36] -> [SKIP][37] ([Intel XE#7300])
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-1/igt@kms_pm_backlight@fade-with-dpms.html
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@kms_pm_backlight@fade-with-dpms.html
* igt@kms_pm_rpm@drm-resources-equal:
- shard-bmg: NOTRUN -> [SKIP][38] ([Intel XE#7197]) +1 other test skip
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_pm_rpm@drm-resources-equal.html
* igt@kms_pm_rpm@modeset-lpsp-stress-no-wait:
- shard-lnl: [PASS][39] -> [SKIP][40] ([Intel XE#7197]) +1 other test skip
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-5/igt@kms_pm_rpm@modeset-lpsp-stress-no-wait.html
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-2/igt@kms_pm_rpm@modeset-lpsp-stress-no-wait.html
* igt@kms_pm_rpm@system-suspend-modeset:
- shard-bmg: [PASS][41] -> [SKIP][42] ([Intel XE#7197]) +3 other tests skip
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-1/igt@kms_pm_rpm@system-suspend-modeset.html
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_pm_rpm@system-suspend-modeset.html
* igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-exceed-fully-sf:
- shard-bmg: NOTRUN -> [SKIP][43] ([Intel XE#1489]) +2 other tests skip
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-exceed-fully-sf.html
* igt@kms_psr@pr-sprite-plane-onoff:
- shard-bmg: NOTRUN -> [SKIP][44] ([Intel XE#2234] / [Intel XE#2850]) +4 other tests skip
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_psr@pr-sprite-plane-onoff.html
* igt@kms_sharpness_filter@filter-dpms:
- shard-bmg: NOTRUN -> [SKIP][45] ([Intel XE#6503]) +1 other test skip
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-7/igt@kms_sharpness_filter@filter-dpms.html
* igt@kms_vblank@ts-continuation-dpms-rpm:
- shard-lnl: [PASS][46] -> [SKIP][47] ([Intel XE#7287]) +2 other tests skip
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-6/igt@kms_vblank@ts-continuation-dpms-rpm.html
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-4/igt@kms_vblank@ts-continuation-dpms-rpm.html
* igt@xe_eudebug@basic-vm-access-faultable:
- shard-bmg: NOTRUN -> [SKIP][48] ([Intel XE#4837])
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_eudebug@basic-vm-access-faultable.html
* igt@xe_eudebug_online@interrupt-all-set-breakpoint:
- shard-bmg: NOTRUN -> [SKIP][49] ([Intel XE#4837] / [Intel XE#6665])
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_eudebug_online@interrupt-all-set-breakpoint.html
* igt@xe_exec_basic@multigpu-once-rebind:
- shard-bmg: NOTRUN -> [SKIP][50] ([Intel XE#2322])
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_exec_basic@multigpu-once-rebind.html
* igt@xe_exec_fault_mode@many-multi-queue-rebind-prefetch:
- shard-bmg: NOTRUN -> [SKIP][51] ([Intel XE#7136]) +6 other tests skip
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_exec_fault_mode@many-multi-queue-rebind-prefetch.html
* igt@xe_exec_multi_queue@few-execs-priority:
- shard-bmg: NOTRUN -> [SKIP][52] ([Intel XE#6874]) +3 other tests skip
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_exec_multi_queue@few-execs-priority.html
* igt@xe_exec_multi_queue@one-queue-preempt-mode-fault-basic:
- shard-lnl: NOTRUN -> [SKIP][53] ([Intel XE#6874]) +1 other test skip
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-3/igt@xe_exec_multi_queue@one-queue-preempt-mode-fault-basic.html
* igt@xe_exec_threads@threads-multi-queue-cm-fd-userptr-rebind:
- shard-bmg: NOTRUN -> [SKIP][54] ([Intel XE#7138]) +3 other tests skip
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_exec_threads@threads-multi-queue-cm-fd-userptr-rebind.html
* igt@xe_peer2peer@read:
- shard-bmg: NOTRUN -> [SKIP][55] ([Intel XE#2427] / [Intel XE#6953] / [Intel XE#7326])
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@xe_peer2peer@read.html
* igt@xe_pm@d3hot-basic:
- shard-bmg: [PASS][56] -> [FAIL][57] ([Intel XE#7165])
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-1/igt@xe_pm@d3hot-basic.html
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_pm@d3hot-basic.html
* igt@xe_pm@s4-d3cold-basic-exec:
- shard-bmg: NOTRUN -> [SKIP][58] ([Intel XE#2284]) +1 other test skip
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_pm@s4-d3cold-basic-exec.html
* igt@xe_pxp@pxp-optout:
- shard-bmg: NOTRUN -> [SKIP][59] ([Intel XE#4733]) +1 other test skip
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@xe_pxp@pxp-optout.html
#### Possible fixes ####
* igt@kms_atomic_transition@plane-all-modeset-transition:
- shard-bmg: [ABORT][60] ([Intel XE#5545]) -> [PASS][61] +1 other test pass
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-2/igt@kms_atomic_transition@plane-all-modeset-transition.html
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_atomic_transition@plane-all-modeset-transition.html
* igt@kms_cursor_legacy@flip-vs-cursor-atomic:
- shard-bmg: [FAIL][62] ([Intel XE#7480]) -> [PASS][63]
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-1/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-8/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
* igt@kms_cursor_legacy@flip-vs-cursor-legacy:
- shard-bmg: [FAIL][64] -> [PASS][65]
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-2/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-1/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
* igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:
- shard-lnl: [FAIL][66] ([Intel XE#2142]) -> [PASS][67] +1 other test pass
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-2/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-2/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
* igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma:
- shard-lnl: [FAIL][68] ([Intel XE#5625]) -> [PASS][69]
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-8/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma.html
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-4/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma.html
* igt@xe_pm@s4-vm-bind-prefetch:
- shard-lnl: [INCOMPLETE][70] -> [PASS][71]
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-2/igt@xe_pm@s4-vm-bind-prefetch.html
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-3/igt@xe_pm@s4-vm-bind-prefetch.html
#### Warnings ####
* igt@kms_content_protection@suspend-resume:
- shard-lnl: [SKIP][72] ([Intel XE#6705]) -> [SKIP][73] ([Intel XE#6705] / [Intel XE#6973])
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-1/igt@kms_content_protection@suspend-resume.html
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@kms_content_protection@suspend-resume.html
* igt@kms_content_protection@uevent:
- shard-lnl: [SKIP][74] ([Intel XE#3278]) -> [SKIP][75] ([Intel XE#3278] / [Intel XE#6973]) +7 other tests skip
[74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-8/igt@kms_content_protection@uevent.html
[75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-8/igt@kms_content_protection@uevent.html
* igt@kms_pm_rpm@dpms-lpsp:
- shard-bmg: [SKIP][76] ([Intel XE#1439] / [Intel XE#3141] / [Intel XE#836]) -> [SKIP][77] ([Intel XE#7197]) +2 other tests skip
[76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-7/igt@kms_pm_rpm@dpms-lpsp.html
[77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-1/igt@kms_pm_rpm@dpms-lpsp.html
* igt@kms_pm_rpm@dpms-mode-unset-non-lpsp:
- shard-lnl: [SKIP][78] ([Intel XE#1439] / [Intel XE#836]) -> [SKIP][79] ([Intel XE#7197])
[78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-5/igt@kms_pm_rpm@dpms-mode-unset-non-lpsp.html
[79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-2/igt@kms_pm_rpm@dpms-mode-unset-non-lpsp.html
* igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait:
- shard-lnl: [SKIP][80] ([Intel XE#1439] / [Intel XE#3141]) -> [SKIP][81] ([Intel XE#7197])
[80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-1/igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait.html
[81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait.html
* igt@kms_pm_rpm@package-g7:
- shard-lnl: [SKIP][82] ([Intel XE#6813]) -> [SKIP][83] ([Intel XE#7197])
[82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-8/igt@kms_pm_rpm@package-g7.html
[83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-1/igt@kms_pm_rpm@package-g7.html
* igt@kms_psr2_sf@fbc-psr2-overlay-plane-update-sf-dmg-area@pipe-a-edp-1:
- shard-lnl: [SKIP][84] ([Intel XE#1406] / [Intel XE#4608]) -> [SKIP][85] ([Intel XE#4608]) +19 other tests skip
[84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-1/igt@kms_psr2_sf@fbc-psr2-overlay-plane-update-sf-dmg-area@pipe-a-edp-1.html
[85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-7/igt@kms_psr2_sf@fbc-psr2-overlay-plane-update-sf-dmg-area@pipe-a-edp-1.html
* igt@kms_psr2_sf@fbc-psr2-overlay-primary-update-sf-dmg-area:
- shard-lnl: [SKIP][86] ([Intel XE#1406] / [Intel XE#2893] / [Intel XE#4608]) -> [SKIP][87] ([Intel XE#2893] / [Intel XE#4608]) +9 other tests skip
[86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-4/igt@kms_psr2_sf@fbc-psr2-overlay-primary-update-sf-dmg-area.html
[87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@kms_psr2_sf@fbc-psr2-overlay-primary-update-sf-dmg-area.html
* igt@kms_psr2_sf@pr-overlay-plane-move-continuous-sf:
- shard-lnl: [SKIP][88] ([Intel XE#1406] / [Intel XE#2893]) -> [SKIP][89] ([Intel XE#2893]) +22 other tests skip
[88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-1/igt@kms_psr2_sf@pr-overlay-plane-move-continuous-sf.html
[89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-5/igt@kms_psr2_sf@pr-overlay-plane-move-continuous-sf.html
* igt@kms_psr2_sf@psr2-plane-move-sf-dmg-area:
- shard-bmg: [SKIP][90] ([Intel XE#1406] / [Intel XE#1489]) -> [SKIP][91] ([Intel XE#1489]) +43 other tests skip
[90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-7/igt@kms_psr2_sf@psr2-plane-move-sf-dmg-area.html
[91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-1/igt@kms_psr2_sf@psr2-plane-move-sf-dmg-area.html
* igt@kms_psr2_su@frontbuffer-xrgb8888:
- shard-lnl: [SKIP][92] ([Intel XE#1128] / [Intel XE#1406]) -> [SKIP][93] ([Intel XE#1128]) +3 other tests skip
[92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-lnl-6/igt@kms_psr2_su@frontbuffer-xrgb8888.html
[93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-lnl-8/igt@kms_psr2_su@frontbuffer-xrgb8888.html
* igt@kms_psr2_su@page_flip-xrgb8888:
- shard-bmg: [SKIP][94] ([Intel XE#1406] / [Intel XE#2387]) -> [SKIP][95] ([Intel XE#2387]) +3 other tests skip
[94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-4/igt@kms_psr2_su@page_flip-xrgb8888.html
[95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-2/igt@kms_psr2_su@page_flip-xrgb8888.html
* igt@kms_psr@psr-primary-page-flip:
- shard-bmg: [SKIP][96] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) -> [SKIP][97] ([Intel XE#2234] / [Intel XE#2850]) +68 other tests skip
[96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-1/igt@kms_psr@psr-primary-page-flip.html
[97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-9/igt@kms_psr@psr-primary-page-flip.html
* igt@kms_psr@psr2-primary-render:
- shard-bmg: [SKIP][98] ([Intel XE#1406] / [Intel XE#2234]) -> [SKIP][99] ([Intel XE#2234])
[98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-9/igt@kms_psr@psr2-primary-render.html
[99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-5/igt@kms_psr@psr2-primary-render.html
* igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
- shard-bmg: [SKIP][100] ([Intel XE#1406] / [Intel XE#2414]) -> [SKIP][101] ([Intel XE#2414]) +1 other test skip
[100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440/shard-bmg-5/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
[101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/shard-bmg-3/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1128]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1128
[Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1439]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1439
[Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
[Intel XE#2142]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2142
[Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
[Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
[Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
[Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
[Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
[Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
[Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
[Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
[Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
[Intel XE#2325]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2325
[Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
[Intel XE#2352]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2352
[Intel XE#2387]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2387
[Intel XE#2393]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2393
[Intel XE#2414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2414
[Intel XE#2427]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2427
[Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
[Intel XE#2724]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2724
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
[Intel XE#3141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3141
[Intel XE#3278]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3278
[Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
[Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
[Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
[Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
[Intel XE#4422]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4422
[Intel XE#4608]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4608
[Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
[Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
[Intel XE#5354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5354
[Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
[Intel XE#5625]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5625
[Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
[Intel XE#6503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6503
[Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
[Intel XE#6665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6665
[Intel XE#6705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6705
[Intel XE#6747]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6747
[Intel XE#6813]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6813
[Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
[Intel XE#6953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6953
[Intel XE#6973]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6973
[Intel XE#7059]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7059
[Intel XE#7136]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7136
[Intel XE#7138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7138
[Intel XE#7165]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7165
[Intel XE#7178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7178
[Intel XE#7197]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7197
[Intel XE#7283]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7283
[Intel XE#7287]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7287
[Intel XE#7300]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7300
[Intel XE#7326]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7326
[Intel XE#7480]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7480
[Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
Build changes
-------------
* Linux: xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440 -> xe-pw-162167v4
IGT_8772: 8772
xe-4620-f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440: f0ec6252eb6d9a4f0cb5a437f5c21fec16d0a440
xe-pw-162167v4: 162167v4
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162167v4/index.html
^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache
2026-02-26 4:28 [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Matthew Brost
` (15 preceding siblings ...)
2026-02-26 8:59 ` ✗ Xe.CI.FULL: " Patchwork
@ 2026-02-26 13:43 ` Thomas Hellström
2026-02-26 19:36 ` Matthew Brost
16 siblings, 1 reply; 33+ messages in thread
From: Thomas Hellström @ 2026-02-26 13:43 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: stuart.summers, arvind.yadav, himal.prasad.ghimiray,
francois.dugast
Hi, Matt.
On Wed, 2026-02-25 at 20:28 -0800, Matthew Brost wrote:
> Fine-grained fault locking provides immediate benefits: it allows page
> faults from the same VM to be processed in parallel (unless they target
> the same range) and enables a sane multi-threaded prefetch
> implementation. UMD prefetch benchmarks see 10% to 50% improvement in
> prefetch performance on BMG depending on PCIe bus speed.
>
> Once parallel fault processing is available, the pagefault queue can be
> unified into a single queue with multiple workers pulling faults to
> process. A single queue then allows a sensible pagefault cache to be
> implemented, so that multiple faults targeting the same region can be
> batched together and acknowledged in, ideally, a single pass. This
> saves CPU cycles during pagefault handling and improves overall
> throughput of the fault handler.
>
> Significant improvements in UMD pagefault benchmarks can be seen when
> utilizing this caching.
>
> v3:
> - Fix kunit build (CI)
> v4:
> - Actually fix kunit build (CI)
>
> Matt
>
> Matthew Brost (12):
> drm/xe: Fine grained page fault locking
> drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock
> drm/xe: Thread prefetch of SVM ranges
> drm/xe: Use a single page-fault queue with multiple workers
> drm/xe: Add num_pf_work modparam
> drm/xe: Engine class and instance into a u8
> drm/xe: Track pagefault worker runtime
> drm/xe: Chain page faults via queue-resident cache to avoid fault storms
> drm/xe: Add pagefault chaining stats
> drm/xe: Add debugfs pagefault_info
> drm/xe: batch CT pagefault acks with periodic flush
> drm/xe: Track parallel page fault activity in GT stats
>
> drivers/gpu/drm/drm_gpusvm.c | 2 +-
> drivers/gpu/drm/xe/xe_debugfs.c | 11 +
> drivers/gpu/drm/xe/xe_defaults.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 17 +-
> drivers/gpu/drm/xe/xe_device_types.h | 17 +-
> drivers/gpu/drm/xe/xe_gt_stats.c | 7 +
> drivers/gpu/drm/xe/xe_gt_stats_types.h | 7 +
> drivers/gpu/drm/xe/xe_guc_ct.c | 94 +++-
> drivers/gpu/drm/xe/xe_guc_ct.h | 35 +-
> drivers/gpu/drm/xe/xe_guc_pagefault.c | 35 +-
> drivers/gpu/drm/xe/xe_guc_types.h | 6 +
> drivers/gpu/drm/xe/xe_module.c | 4 +
> drivers/gpu/drm/xe/xe_module.h | 1 +
> drivers/gpu/drm/xe/xe_pagefault.c | 675 ++++++++++++++++++++----
> drivers/gpu/drm/xe/xe_pagefault.h | 74 +++
> drivers/gpu/drm/xe/xe_pagefault_types.h | 109 +++-
> drivers/gpu/drm/xe/xe_svm.c | 129 +++--
> drivers/gpu/drm/xe/xe_svm.h | 59 ++-
> drivers/gpu/drm/xe/xe_userptr.c | 20 +-
> drivers/gpu/drm/xe/xe_vm.c | 215 ++++++--
> drivers/gpu/drm/xe/xe_vm_types.h | 37 +-
> 21 files changed, 1309 insertions(+), 246 deletions(-)
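
As an aside, the single-queue/multi-worker scheme the cover letter
describes boils down to roughly the following (an illustrative sketch
only; every name below is invented here, and the series' actual code
will differ):

	struct xe_pf {
		struct list_head link;
		/* fault payload (ASID, address, ...) elided */
	};

	struct xe_pf_worker {
		struct work_struct work;
		struct xe_pf_queue *q;
	};

	struct xe_pf_queue {
		spinlock_t lock;
		struct list_head faults;	/* queued struct xe_pf entries */
		struct workqueue_struct *wq;	/* WQ_UNBOUND | WQ_HIGHPRI */
		struct xe_pf_worker workers[NUM_PF_WORKERS];
	};

	static void xe_pf_worker_func(struct work_struct *w)
	{
		struct xe_pf_worker *worker =
			container_of(w, struct xe_pf_worker, work);
		struct xe_pf_queue *q = worker->q;
		struct xe_pf *pf;

		for (;;) {
			spin_lock(&q->lock);
			pf = list_first_entry_or_null(&q->faults,
						      struct xe_pf, link);
			if (pf)
				list_del(&pf->link);
			spin_unlock(&q->lock);

			if (!pf)
				return;

			xe_pf_handle(pf);	/* resolve fault, ack GuC */
		}
	}

	/* Producer side: append the fault, kick a worker. */
	spin_lock(&q->lock);
	list_add_tail(&pf->link, &q->faults);
	spin_unlock(&q->lock);
	/* next_worker(): round-robin pick of a work item, elided */
	queue_work(q->wq, &q->workers[next_worker(q)].work);

A queue-resident cache then sits naturally in front of xe_pf_handle():
a worker can scan the list for other pending faults on the same region
and ack them all in one pass.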
Before I get to reviewing this, some suggestions from Claude:
Confirmed regressions (3 commits with issues):

c664c1b91090 — Fine grained page fault locking

- Reference leak in vm_bind_ioctl_ops_create() (xe_vm.c).
  xe_svm_range_find_or_insert() was changed to take a reference, but two
  paths never put it: (1) when xe_svm_range_validate() returns true →
  goto check_next_range, and (2) when xa_alloc() fails → goto
  unwind_prefetch_ops. The validate path is hit on every prefetch of an
  already-populated range, so the refcount grows without bound.
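
Spelled out, the two leaky paths look roughly like this (a minimal
sketch following the series' names; the put helper, written here as
xe_svm_range_put(), and the exact argument lists are assumptions, not
the literal xe_vm.c code):

	svm_range = xe_svm_range_find_or_insert(vm, addr, vma, &ctx);
	if (IS_ERR(svm_range)) {
		err = PTR_ERR(svm_range);
		goto unwind_prefetch_ops;
	}
	/* find_or_insert() now returns with a reference held */

	if (xe_svm_range_validate(vm, svm_range, tile_mask)) {
		xe_svm_range_put(svm_range);	/* (1) missing in v4 */
		goto check_next_range;
	}

	err = xa_alloc(&op->prefetch_range.range, &i, svm_range,
		       xa_limit_32b, GFP_KERNEL);
	if (err) {
		xe_svm_range_put(svm_range);	/* (2) missing in v4 */
		goto unwind_prefetch_ops;
	}

Presumably the success path transfers the reference to the xarray
entry, which is why only these two exits need an explicit put.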
80012f80c75f — Chain page faults

- Commit message typos only: "samr ASID" → "same ASID", "IRQ pathd" →
  "IRQ paths". No code issues.

569104fb76ed — batch CT pagefault acks with periodic flush

- Off-by-one in the flush period: guc_ack_fault_begin() initialises
  pagefault_ack_counter to PERIOD - 2 = 14, but the comment says the
  first flush should happen at ack #2. With counter = 14 the first
  flush fires at ack #3 (the counter hits 16, and 16 & 15 == 0). Fix:
  = XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1. (A worked sketch of the counter
  arithmetic follows this list.)
- Commit message typo: "Assistent-by" → "Assisted-by".
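
For the record, the counter arithmetic spelled out, assuming the ack
path tests the counter before incrementing it (the only reading
consistent with the numbers above; the function shape here is
illustrative, not the series' code):

	#define XE_GUC_PAGEFAULT_FLUSH_PERIOD	16	/* power of two */

	static u32 pagefault_ack_counter;

	static void guc_ack_fault_begin(void)
	{
		/*
		 * v4: PERIOD - 2 == 14 -> checks see 14, 15, 16,
		 * so the first flush fires at ack #3.
		 * Fixed: PERIOD - 1 == 15 -> checks see 15, 16,
		 * so the first flush fires at ack #2, as the comment says.
		 */
		pagefault_ack_counter = XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1;
	}

	static bool guc_ack_fault(void)
	{
		bool flush = !(pagefault_ack_counter &
			       (XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1));

		pagefault_ack_counter++;
		return flush;	/* caller flushes the CT ring when true */
	}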
/Thomas
^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache
2026-02-26 13:43 ` [PATCH v4 00/12] Fine grained fault locking, threaded prefetch, storm cache Thomas Hellström
@ 2026-02-26 19:36 ` Matthew Brost
0 siblings, 0 replies; 33+ messages in thread
From: Matthew Brost @ 2026-02-26 19:36 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, stuart.summers, arvind.yadav, himal.prasad.ghimiray,
francois.dugast
On Thu, Feb 26, 2026 at 02:43:29PM +0100, Thomas Hellström wrote:
> Hi, Matt.
>
> On Wed, 2026-02-25 at 20:28 -0800, Matthew Brost wrote:
> > Fine-grained fault locking provides immediate benefits: it allows
> > page
> > faults from the same VM to be processed in parallel (unless they
> > target
> > the same range) and enables a sane multi-threaded prefetch
> > implementation. UMD prefetch benchmarks see 10% to 50% improvement in
> > prefetch performance on BMG depending on PCIe bus speed.
> >
> > Once parallel fault processing is available, the pagefault queue can
> > be
> > unified into a single queue with multiple workers pulling faults to
> > process. A single queue then allows a sensible pagefault cache to be
> > implemented, so that multiple faults targeting the same region can be
> > batched together and acknowledged in, ideally, a single pass. This
> > saves
> > CPU cycles during pagefault handling and improves overall throughput
> > of
> > the fault handler.
> >
> > Significant improvements in UMD pagefault benchmarks can be seen when
> > utilizing this caching.
> >
> > v3:
> > - Fix kunit build (CI)
> > v4:
> > - Actually fix kunit build (CI)
> >
> > Matt
> >
> > Matthew Brost (12):
> > drm/xe: Fine grained page fault locking
> > drm/xe: Allow prefetch-only VM bind IOCTLs to use VM read lock
> > drm/xe: Thread prefetch of SVM ranges
> > drm/xe: Use a single page-fault queue with multiple workers
> > drm/xe: Add num_pf_work modparam
> > drm/xe: Engine class and instance into a u8
> > drm/xe: Track pagefault worker runtime
> > drm/xe: Chain page faults via queue-resident cache to avoid fault
> > storms
> > drm/xe: Add pagefault chaining stats
> > drm/xe: Add debugfs pagefault_info
> > drm/xe: batch CT pagefault acks with periodic flush
> > drm/xe: Track parallel page fault activity in GT stats
> >
> > drivers/gpu/drm/drm_gpusvm.c | 2 +-
> > drivers/gpu/drm/xe/xe_debugfs.c | 11 +
> > drivers/gpu/drm/xe/xe_defaults.h | 1 +
> > drivers/gpu/drm/xe/xe_device.c | 17 +-
> > drivers/gpu/drm/xe/xe_device_types.h | 17 +-
> > drivers/gpu/drm/xe/xe_gt_stats.c | 7 +
> > drivers/gpu/drm/xe/xe_gt_stats_types.h | 7 +
> > drivers/gpu/drm/xe/xe_guc_ct.c | 94 +++-
> > drivers/gpu/drm/xe/xe_guc_ct.h | 35 +-
> > drivers/gpu/drm/xe/xe_guc_pagefault.c | 35 +-
> > drivers/gpu/drm/xe/xe_guc_types.h | 6 +
> > drivers/gpu/drm/xe/xe_module.c | 4 +
> > drivers/gpu/drm/xe/xe_module.h | 1 +
> > drivers/gpu/drm/xe/xe_pagefault.c | 675 ++++++++++++++++++++--
> > --
> > drivers/gpu/drm/xe/xe_pagefault.h | 74 +++
> > drivers/gpu/drm/xe/xe_pagefault_types.h | 109 +++-
> > drivers/gpu/drm/xe/xe_svm.c | 129 +++--
> > drivers/gpu/drm/xe/xe_svm.h | 59 ++-
> > drivers/gpu/drm/xe/xe_userptr.c | 20 +-
> > drivers/gpu/drm/xe/xe_vm.c | 215 ++++++--
> > drivers/gpu/drm/xe/xe_vm_types.h | 37 +-
> > 21 files changed, 1309 insertions(+), 246 deletions(-)
>
> Before I get to reviewing this, some suggestions from Claude:
>
> Confirmed regressions (3 commits with issues):
>
> c664c1b91090 — Fine grained page fault locking
>
> - Reference leak in vm_bind_ioctl_ops_create() (xe_vm.c).
>   xe_svm_range_find_or_insert() was changed to take a reference, but
>   two paths never put it: (1) when xe_svm_range_validate() returns
>   true → goto check_next_range, and (2) when xa_alloc() fails → goto
>   unwind_prefetch_ops. The validate path is hit on every prefetch of
>   an already-populated range, so the refcount grows without bound.
>
Indeed. Will fix.
> 80012f80c75f — Chain page faults
>
> - Commit message typos only: "samr ASID" → "same ASID", "IRQ pathd" →
>   "IRQ paths". No code issues.
>
Yes.
> 569104fb76ed — batch CT pagefault acks with periodic flush
>
> - Off-by-one in the flush period: guc_ack_fault_begin() initialises
>   pagefault_ack_counter to PERIOD - 2 = 14, but the comment says the
>   first flush should happen at ack #2. With counter = 14 the first
>   flush fires at ack #3 (the counter hits 16, and 16 & 15 == 0). Fix:
>   = XE_GUC_PAGEFAULT_FLUSH_PERIOD - 1.

This is pretty good; I caught this one myself after posting.

I'm convinced everyone should use Claude as a spot check before posting.

> - Commit message typo: "Assistent-by" → "Assisted-by".

Yes.
Matt
>
> /Thomas
^ permalink raw reply [flat|nested] 33+ messages in thread