Intel-XE Archive on lore.kernel.org
* [PATCH 0/1] Fix serialization on burst of unbinds
@ 2025-10-17 16:52 Matthew Brost
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Matthew Brost @ 2025-10-17 16:52 UTC (permalink / raw)
  To: intel-xe; +Cc: carlos.santa, thomas.hellstrom

This series attempts to resolve part of [1]; the patch commit message
explains the details of the solution.

Other options have been discussed, including not exporting TLB
invalidation fences to user space, or using TLB invalidation fences to
enforce ordering on the bind queue rather than simply attaching TLB
invalidations to dma-resv, which blocks reclaim. I have implemented the
latter approach, and it appears to have the same effect as this patch.
However, I cannot confidently reason that it will not violate queue
fence ordering rules. If desired, that alternative can also be posted
for discussion.

Matt

[1] https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047

Matthew Brost (1):
  drm/xe: Avoid serializing unbind jobs on prior TLB invalidations

 drivers/gpu/drm/xe/xe_exec.c          |  3 +-
 drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
 drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
 drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
 drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
 drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
 drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
 8 files changed, 98 insertions(+), 8 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
@ 2025-10-17 16:52 ` Matthew Brost
  2025-10-21 17:55   ` Summers, Stuart
                     ` (2 more replies)
  2025-10-17 18:36 ` ✓ CI.KUnit: success for Fix serialization on burst of unbinds Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Matthew Brost @ 2025-10-17 16:52 UTC (permalink / raw)
  To: intel-xe; +Cc: carlos.santa, thomas.hellstrom

When a burst of unbind jobs is issued, a dependency chain can form
between the TLB invalidation of a previous unbind job and the current
one. This serializes the burst: each job waits unnecessarily on the
prior TLB invalidation and executes on the GPU when it does not need
to, slowing the unbind burst by up to 4x.

To break this chain, mask the last bind queue dependency when the last
fence's dma-fence context matches the TLB invalidation context. This
allows unbinds and TLB invalidations to be fully pipelined while
preserving correct dma-fence signaling semantics.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_exec.c          |  3 +-
 drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
 drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
 drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
 drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
 drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
 drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
 8 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 0dc27476832b..6034cfc8be06 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto err_put_job;
 
 	if (!xe_vm_in_lr_mode(vm)) {
-		err = xe_sched_job_last_fence_add_dep(job, vm);
+		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
+						      NO_MASK_DEP);
 		if (err)
 			goto err_put_job;
 
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 90cbc95f8e2e..d6f69d9bccba 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -25,6 +25,7 @@
 #include "xe_migrate.h"
 #include "xe_pm.h"
 #include "xe_ring_ops_types.h"
+#include "xe_sched_job.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
 #include "xe_pxp.h"
@@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
  * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
  * @q: The exec queue
  * @vm: The VM the engine does a bind or exec for
+ * @mask_ctx0: Mask dma-fence context0
+ * @mask_ctx1: Mask dma-fence context1
+ *
+ * Test last fence dependency of queue, skipping masked dma fence contexts.
  *
  * Returns:
- * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
+ * -ETIME if there exists an unsignalled and unmasked last fence dependency,
+ * zero otherwise.
  */
-int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
+int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
+				      u64 mask_ctx0, u64 mask_ctx1)
 {
 	struct dma_fence *fence;
 	int err = 0;
@@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
 	if (fence) {
 		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
 			0 : -ETIME;
+
+		if (err == -ETIME) {
+			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
+							 mask_ctx1))
+				err = 0;
+		}
+
 		dma_fence_put(fence);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index a4dfbe858bda..99a35b22a46c 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
 void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
 				  struct dma_fence *fence);
 int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
-				      struct xe_vm *vm);
+				      struct xe_vm *vm, u64 mask_ctx0,
+				      u64 mask_ctx1);
 void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
 
 int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index d22fd1ccc0ba..bba9ae559f57 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
 	}
 
 	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
+		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
+
+		if (ijob)
+			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
+		if (mjob)
+			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
+
 		if (job)
-			err = xe_sched_job_last_fence_add_dep(job, vm);
+			err = xe_sched_job_last_fence_add_dep(job, vm,
+							      mask_ctx0,
+							      mask_ctx1);
 		else
-			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
+			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
+								vm, mask_ctx0,
+								mask_ctx1);
 	}
 
 	for (i = 0; job && !err && i < vops->num_syncs; i++)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index d21bf8f26964..7cbdd87904c6 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -6,6 +6,7 @@
 #include "xe_sched_job.h"
 
 #include <uapi/drm/xe_drm.h>
+#include <linux/dma-fence-array.h>
 #include <linux/dma-fence-chain.h>
 #include <linux/slab.h>
 
@@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
 	xe_sched_job_put(job);
 }
 
+/**
+ * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
+ * @fence: The dma-fence to check
+ * @mask_ctx0: First context to compare against the fence's context
+ * @mask_ctx1: Second context to compare against the fence's context
+ *
+ * This function checks whether the context of the given dma-fence matches
+ * either of the provided mask contexts. If a match is found, the dependency
+ * represented by the fence can be skipped. If the fence is a dma-fence-array,
+ * its individual fences are unwound and checked.
+ *
+ * Return: true if the fence can be masked (i.e., skipped), false otherwise.
+ */
+bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
+				  u64 mask_ctx1)
+{
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence *__fence;
+		int index;
+
+		dma_fence_array_for_each(__fence, index, fence)
+			if (__fence->context == mask_ctx0 ||
+			    __fence->context == mask_ctx1)
+				return true;
+	} else if (fence->context == mask_ctx0 ||
+		   fence->context == mask_ctx1) {
+		return true;
+	}
+
+	return false;
+}
+
 /**
  * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
  * @job:job to add the last fence dependency to
  * @vm: virtual memory job belongs to
+ * @mask_ctx0: Mask dma-fence context0
+ * @mask_ctx1: Mask dma-fence context1
+ *
+ * Add last fence dependency to job, skipping masked dma fence contexts.
  *
  * Returns:
  * 0 on success, or an error on failing to expand the array.
  */
-int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
+int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
+				    u64 mask_ctx0, u64 mask_ctx1)
 {
 	struct dma_fence *fence;
 
 	fence = xe_exec_queue_last_fence_get(job->q, vm);
+	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
+		dma_fence_put(fence);
+		return 0;
+	}
 
 	return drm_sched_job_add_dependency(&job->drm, fence);
 }
diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
index 3dc72c5c1f13..81d8e848e605 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.h
+++ b/drivers/gpu/drm/xe/xe_sched_job.h
@@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
 void xe_sched_job_arm(struct xe_sched_job *job);
 void xe_sched_job_push(struct xe_sched_job *job);
 
-int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
+int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
+				    u64 mask_ctx0, u64 mask_ctx1);
 void xe_sched_job_init_user_fence(struct xe_sched_job *job,
 				  struct xe_sync_entry *sync);
 
@@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
 int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
 			  enum dma_resv_usage usage);
 
+#define NO_MASK_DEP	(~0x0ull)
+bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
+				  u64 mask_ctx1);
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
index 492def04a559..f2fe7f9fbb22 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
@@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
 	u64 start;
 	/** @end: End address to invalidate */
 	u64 end;
+	/** @fence_context: Fence context for job */
+	u64 fence_context;
 	/** @asid: Address space ID to invalidate */
 	u32 asid;
 	/** @fence_armed: Fence has been armed */
@@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
 	job->asid = asid;
 	job->fence_armed = false;
 	job->dep.ops = &dep_job_ops;
+	job->fence_context = entity->fence_context + 1;
 	kref_init(&job->refcount);
 	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
 
@@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
 	if (!IS_ERR_OR_NULL(job))
 		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
 }
+
+/**
+ * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
+ * @job: TLB invalidation job object
+ *
+ * Return: TLB invalidation job fence context
+ */
+u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
+{
+	return job->fence_context;
+}
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
index e63edcb26b50..2576165c2228 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
@@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
 
 void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
 
+u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
+
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* ✓ CI.KUnit: success for Fix serialization on burst of unbinds
  2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
@ 2025-10-17 18:36 ` Patchwork
  2025-10-17 19:16 ` ✓ Xe.CI.BAT: " Patchwork
  2025-10-18 18:20 ` ✗ Xe.CI.Full: failure " Patchwork
  3 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-17 18:36 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix serialization on burst of unbinds
URL   : https://patchwork.freedesktop.org/series/156144/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[18:34:56] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:35:00] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[18:35:31] Starting KUnit Kernel (1/1)...
[18:35:31] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:35:31] ================== guc_buf (11 subtests) ===================
[18:35:31] [PASSED] test_smallest
[18:35:31] [PASSED] test_largest
[18:35:31] [PASSED] test_granular
[18:35:31] [PASSED] test_unique
[18:35:31] [PASSED] test_overlap
[18:35:31] [PASSED] test_reusable
[18:35:31] [PASSED] test_too_big
[18:35:31] [PASSED] test_flush
[18:35:31] [PASSED] test_lookup
[18:35:31] [PASSED] test_data
[18:35:31] [PASSED] test_class
[18:35:31] ===================== [PASSED] guc_buf =====================
[18:35:31] =================== guc_dbm (7 subtests) ===================
[18:35:31] [PASSED] test_empty
[18:35:31] [PASSED] test_default
[18:35:31] ======================== test_size  ========================
[18:35:31] [PASSED] 4
[18:35:31] [PASSED] 8
[18:35:31] [PASSED] 32
[18:35:31] [PASSED] 256
[18:35:31] ==================== [PASSED] test_size ====================
[18:35:31] ======================= test_reuse  ========================
[18:35:31] [PASSED] 4
[18:35:31] [PASSED] 8
[18:35:31] [PASSED] 32
[18:35:31] [PASSED] 256
[18:35:31] =================== [PASSED] test_reuse ====================
[18:35:31] =================== test_range_overlap  ====================
[18:35:31] [PASSED] 4
[18:35:31] [PASSED] 8
[18:35:31] [PASSED] 32
[18:35:31] [PASSED] 256
[18:35:31] =============== [PASSED] test_range_overlap ================
[18:35:31] =================== test_range_compact  ====================
[18:35:31] [PASSED] 4
[18:35:31] [PASSED] 8
[18:35:31] [PASSED] 32
[18:35:31] [PASSED] 256
[18:35:31] =============== [PASSED] test_range_compact ================
[18:35:31] ==================== test_range_spare  =====================
[18:35:31] [PASSED] 4
[18:35:31] [PASSED] 8
[18:35:31] [PASSED] 32
[18:35:31] [PASSED] 256
[18:35:31] ================ [PASSED] test_range_spare =================
[18:35:31] ===================== [PASSED] guc_dbm =====================
[18:35:31] =================== guc_idm (6 subtests) ===================
[18:35:31] [PASSED] bad_init
[18:35:31] [PASSED] no_init
[18:35:31] [PASSED] init_fini
[18:35:31] [PASSED] check_used
[18:35:31] [PASSED] check_quota
[18:35:31] [PASSED] check_all
[18:35:31] ===================== [PASSED] guc_idm =====================
[18:35:31] ================== no_relay (3 subtests) ===================
[18:35:31] [PASSED] xe_drops_guc2pf_if_not_ready
[18:35:31] [PASSED] xe_drops_guc2vf_if_not_ready
[18:35:31] [PASSED] xe_rejects_send_if_not_ready
[18:35:31] ==================== [PASSED] no_relay =====================
[18:35:31] ================== pf_relay (14 subtests) ==================
[18:35:31] [PASSED] pf_rejects_guc2pf_too_short
[18:35:31] [PASSED] pf_rejects_guc2pf_too_long
[18:35:31] [PASSED] pf_rejects_guc2pf_no_payload
[18:35:31] [PASSED] pf_fails_no_payload
[18:35:31] [PASSED] pf_fails_bad_origin
[18:35:31] [PASSED] pf_fails_bad_type
[18:35:31] [PASSED] pf_txn_reports_error
[18:35:31] [PASSED] pf_txn_sends_pf2guc
[18:35:31] [PASSED] pf_sends_pf2guc
[18:35:31] [SKIPPED] pf_loopback_nop
[18:35:31] [SKIPPED] pf_loopback_echo
[18:35:31] [SKIPPED] pf_loopback_fail
[18:35:31] [SKIPPED] pf_loopback_busy
[18:35:31] [SKIPPED] pf_loopback_retry
[18:35:31] ==================== [PASSED] pf_relay =====================
[18:35:31] ================== vf_relay (3 subtests) ===================
[18:35:31] [PASSED] vf_rejects_guc2vf_too_short
[18:35:31] [PASSED] vf_rejects_guc2vf_too_long
[18:35:31] [PASSED] vf_rejects_guc2vf_no_payload
[18:35:31] ==================== [PASSED] vf_relay =====================
[18:35:31] ===================== lmtt (1 subtest) =====================
[18:35:31] ======================== test_ops  =========================
[18:35:31] [PASSED] 2-level
[18:35:31] [PASSED] multi-level
[18:35:31] ==================== [PASSED] test_ops =====================
[18:35:31] ====================== [PASSED] lmtt =======================
[18:35:31] ================= pf_service (11 subtests) =================
[18:35:31] [PASSED] pf_negotiate_any
[18:35:31] [PASSED] pf_negotiate_base_match
[18:35:31] [PASSED] pf_negotiate_base_newer
[18:35:31] [PASSED] pf_negotiate_base_next
[18:35:31] [SKIPPED] pf_negotiate_base_older
[18:35:31] [PASSED] pf_negotiate_base_prev
[18:35:31] [PASSED] pf_negotiate_latest_match
[18:35:31] [PASSED] pf_negotiate_latest_newer
[18:35:31] [PASSED] pf_negotiate_latest_next
[18:35:31] [SKIPPED] pf_negotiate_latest_older
[18:35:31] [SKIPPED] pf_negotiate_latest_prev
[18:35:31] =================== [PASSED] pf_service ====================
[18:35:31] ================= xe_guc_g2g (2 subtests) ==================
[18:35:31] ============== xe_live_guc_g2g_kunit_default  ==============
[18:35:31] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[18:35:31] ============== xe_live_guc_g2g_kunit_allmem  ===============
[18:35:31] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[18:35:31] =================== [SKIPPED] xe_guc_g2g ===================
[18:35:31] =================== xe_mocs (2 subtests) ===================
[18:35:31] ================ xe_live_mocs_kernel_kunit  ================
[18:35:31] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[18:35:31] ================ xe_live_mocs_reset_kunit  =================
[18:35:31] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[18:35:31] ==================== [SKIPPED] xe_mocs =====================
[18:35:31] ================= xe_migrate (2 subtests) ==================
[18:35:31] ================= xe_migrate_sanity_kunit  =================
[18:35:31] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[18:35:31] ================== xe_validate_ccs_kunit  ==================
[18:35:31] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[18:35:31] =================== [SKIPPED] xe_migrate ===================
[18:35:31] ================== xe_dma_buf (1 subtest) ==================
[18:35:31] ==================== xe_dma_buf_kunit  =====================
[18:35:31] ================ [SKIPPED] xe_dma_buf_kunit ================
[18:35:31] =================== [SKIPPED] xe_dma_buf ===================
[18:35:31] ================= xe_bo_shrink (1 subtest) =================
[18:35:31] =================== xe_bo_shrink_kunit  ====================
[18:35:31] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[18:35:31] ================== [SKIPPED] xe_bo_shrink ==================
[18:35:31] ==================== xe_bo (2 subtests) ====================
[18:35:31] ================== xe_ccs_migrate_kunit  ===================
[18:35:31] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[18:35:31] ==================== xe_bo_evict_kunit  ====================
[18:35:31] =============== [SKIPPED] xe_bo_evict_kunit ================
[18:35:31] ===================== [SKIPPED] xe_bo ======================
[18:35:31] ==================== args (11 subtests) ====================
[18:35:31] [PASSED] count_args_test
[18:35:31] [PASSED] call_args_example
[18:35:31] [PASSED] call_args_test
[18:35:31] [PASSED] drop_first_arg_example
[18:35:31] [PASSED] drop_first_arg_test
[18:35:31] [PASSED] first_arg_example
[18:35:31] [PASSED] first_arg_test
[18:35:31] [PASSED] last_arg_example
[18:35:31] [PASSED] last_arg_test
[18:35:31] [PASSED] pick_arg_example
[18:35:31] [PASSED] sep_comma_example
[18:35:31] ====================== [PASSED] args =======================
[18:35:31] =================== xe_pci (3 subtests) ====================
[18:35:31] ==================== check_graphics_ip  ====================
[18:35:31] [PASSED] 12.00 Xe_LP
[18:35:31] [PASSED] 12.10 Xe_LP+
[18:35:31] [PASSED] 12.55 Xe_HPG
[18:35:31] [PASSED] 12.60 Xe_HPC
[18:35:31] [PASSED] 12.70 Xe_LPG
[18:35:31] [PASSED] 12.71 Xe_LPG
[18:35:31] [PASSED] 12.74 Xe_LPG+
[18:35:31] [PASSED] 20.01 Xe2_HPG
[18:35:31] [PASSED] 20.02 Xe2_HPG
[18:35:31] [PASSED] 20.04 Xe2_LPG
[18:35:31] [PASSED] 30.00 Xe3_LPG
[18:35:31] [PASSED] 30.01 Xe3_LPG
[18:35:31] [PASSED] 30.03 Xe3_LPG
[18:35:31] ================ [PASSED] check_graphics_ip ================
[18:35:31] ===================== check_media_ip  ======================
[18:35:31] [PASSED] 12.00 Xe_M
[18:35:31] [PASSED] 12.55 Xe_HPM
[18:35:31] [PASSED] 13.00 Xe_LPM+
[18:35:31] [PASSED] 13.01 Xe2_HPM
[18:35:31] [PASSED] 20.00 Xe2_LPM
[18:35:31] [PASSED] 30.00 Xe3_LPM
[18:35:31] [PASSED] 30.02 Xe3_LPM
[18:35:31] ================= [PASSED] check_media_ip ==================
[18:35:31] ================= check_platform_gt_count  =================
[18:35:31] [PASSED] 0x9A60 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A68 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A70 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A40 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A49 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A59 (TIGERLAKE)
[18:35:31] [PASSED] 0x9A78 (TIGERLAKE)
[18:35:31] [PASSED] 0x9AC0 (TIGERLAKE)
[18:35:31] [PASSED] 0x9AC9 (TIGERLAKE)
[18:35:31] [PASSED] 0x9AD9 (TIGERLAKE)
[18:35:31] [PASSED] 0x9AF8 (TIGERLAKE)
[18:35:31] [PASSED] 0x4C80 (ROCKETLAKE)
[18:35:31] [PASSED] 0x4C8A (ROCKETLAKE)
[18:35:31] [PASSED] 0x4C8B (ROCKETLAKE)
[18:35:31] [PASSED] 0x4C8C (ROCKETLAKE)
[18:35:31] [PASSED] 0x4C90 (ROCKETLAKE)
[18:35:31] [PASSED] 0x4C9A (ROCKETLAKE)
[18:35:31] [PASSED] 0x4680 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4682 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4688 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x468A (ALDERLAKE_S)
[18:35:31] [PASSED] 0x468B (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4690 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4692 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4693 (ALDERLAKE_S)
[18:35:31] [PASSED] 0x46A0 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46A1 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46A2 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46A3 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46A6 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46A8 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46AA (ALDERLAKE_P)
[18:35:31] [PASSED] 0x462A (ALDERLAKE_P)
[18:35:31] [PASSED] 0x4626 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x4628 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46B0 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46B1 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46B2 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46B3 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46C0 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46C1 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46C2 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46C3 (ALDERLAKE_P)
[18:35:31] [PASSED] 0x46D0 (ALDERLAKE_N)
[18:35:31] [PASSED] 0x46D1 (ALDERLAKE_N)
[18:35:31] [PASSED] 0x46D2 (ALDERLAKE_N)
[18:35:31] [PASSED] 0x46D3 (ALDERLAKE_N)
[18:35:31] [PASSED] 0x46D4 (ALDERLAKE_N)
[18:35:31] [PASSED] 0xA721 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7A1 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7A9 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7AC (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7AD (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA720 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7A0 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7A8 (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7AA (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA7AB (ALDERLAKE_P)
[18:35:31] [PASSED] 0xA780 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA781 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA782 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA783 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA788 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA789 (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA78A (ALDERLAKE_S)
[18:35:31] [PASSED] 0xA78B (ALDERLAKE_S)
[18:35:31] [PASSED] 0x4905 (DG1)
[18:35:31] [PASSED] 0x4906 (DG1)
[18:35:31] [PASSED] 0x4907 (DG1)
[18:35:31] [PASSED] 0x4908 (DG1)
[18:35:31] [PASSED] 0x4909 (DG1)
[18:35:31] [PASSED] 0x56C0 (DG2)
[18:35:31] [PASSED] 0x56C2 (DG2)
[18:35:31] [PASSED] 0x56C1 (DG2)
[18:35:31] [PASSED] 0x7D51 (METEORLAKE)
[18:35:31] [PASSED] 0x7DD1 (METEORLAKE)
[18:35:31] [PASSED] 0x7D41 (METEORLAKE)
[18:35:31] [PASSED] 0x7D67 (METEORLAKE)
[18:35:31] [PASSED] 0xB640 (METEORLAKE)
[18:35:31] [PASSED] 0x56A0 (DG2)
[18:35:31] [PASSED] 0x56A1 (DG2)
[18:35:31] [PASSED] 0x56A2 (DG2)
[18:35:31] [PASSED] 0x56BE (DG2)
[18:35:31] [PASSED] 0x56BF (DG2)
[18:35:31] [PASSED] 0x5690 (DG2)
[18:35:31] [PASSED] 0x5691 (DG2)
[18:35:31] [PASSED] 0x5692 (DG2)
[18:35:31] [PASSED] 0x56A5 (DG2)
[18:35:31] [PASSED] 0x56A6 (DG2)
[18:35:31] [PASSED] 0x56B0 (DG2)
[18:35:31] [PASSED] 0x56B1 (DG2)
[18:35:31] [PASSED] 0x56BA (DG2)
[18:35:31] [PASSED] 0x56BB (DG2)
[18:35:31] [PASSED] 0x56BC (DG2)
[18:35:31] [PASSED] 0x56BD (DG2)
[18:35:31] [PASSED] 0x5693 (DG2)
[18:35:31] [PASSED] 0x5694 (DG2)
[18:35:31] [PASSED] 0x5695 (DG2)
[18:35:31] [PASSED] 0x56A3 (DG2)
[18:35:31] [PASSED] 0x56A4 (DG2)
[18:35:31] [PASSED] 0x56B2 (DG2)
[18:35:31] [PASSED] 0x56B3 (DG2)
[18:35:31] [PASSED] 0x5696 (DG2)
[18:35:31] [PASSED] 0x5697 (DG2)
[18:35:31] [PASSED] 0xB69 (PVC)
[18:35:31] [PASSED] 0xB6E (PVC)
[18:35:31] [PASSED] 0xBD4 (PVC)
[18:35:31] [PASSED] 0xBD5 (PVC)
[18:35:31] [PASSED] 0xBD6 (PVC)
[18:35:31] [PASSED] 0xBD7 (PVC)
[18:35:31] [PASSED] 0xBD8 (PVC)
[18:35:31] [PASSED] 0xBD9 (PVC)
[18:35:31] [PASSED] 0xBDA (PVC)
[18:35:31] [PASSED] 0xBDB (PVC)
[18:35:31] [PASSED] 0xBE0 (PVC)
[18:35:31] [PASSED] 0xBE1 (PVC)
[18:35:31] [PASSED] 0xBE5 (PVC)
[18:35:31] [PASSED] 0x7D40 (METEORLAKE)
[18:35:31] [PASSED] 0x7D45 (METEORLAKE)
[18:35:31] [PASSED] 0x7D55 (METEORLAKE)
[18:35:31] [PASSED] 0x7D60 (METEORLAKE)
[18:35:31] [PASSED] 0x7DD5 (METEORLAKE)
[18:35:31] [PASSED] 0x6420 (LUNARLAKE)
[18:35:31] [PASSED] 0x64A0 (LUNARLAKE)
[18:35:31] [PASSED] 0x64B0 (LUNARLAKE)
[18:35:31] [PASSED] 0xE202 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE209 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE20B (BATTLEMAGE)
[18:35:31] [PASSED] 0xE20C (BATTLEMAGE)
[18:35:31] [PASSED] 0xE20D (BATTLEMAGE)
[18:35:31] [PASSED] 0xE210 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE211 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE212 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE216 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE220 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE221 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE222 (BATTLEMAGE)
[18:35:31] [PASSED] 0xE223 (BATTLEMAGE)
[18:35:31] [PASSED] 0xB080 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB081 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB082 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB083 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB084 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB085 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB086 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB087 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB08F (PANTHERLAKE)
[18:35:31] [PASSED] 0xB090 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB0A0 (PANTHERLAKE)
[18:35:31] [PASSED] 0xB0B0 (PANTHERLAKE)
[18:35:31] [PASSED] 0xFD80 (PANTHERLAKE)
[18:35:31] [PASSED] 0xFD81 (PANTHERLAKE)
[18:35:31] ============= [PASSED] check_platform_gt_count =============
[18:35:31] ===================== [PASSED] xe_pci ======================
[18:35:31] =================== xe_rtp (2 subtests) ====================
[18:35:31] =============== xe_rtp_process_to_sr_tests  ================
[18:35:31] [PASSED] coalesce-same-reg
[18:35:31] [PASSED] no-match-no-add
[18:35:31] [PASSED] match-or
[18:35:31] [PASSED] match-or-xfail
[18:35:31] [PASSED] no-match-no-add-multiple-rules
[18:35:31] [PASSED] two-regs-two-entries
[18:35:31] [PASSED] clr-one-set-other
[18:35:31] [PASSED] set-field
[18:35:31] [PASSED] conflict-duplicate
[18:35:31] [PASSED] conflict-not-disjoint
[18:35:31] [PASSED] conflict-reg-type
[18:35:31] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[18:35:31] ================== xe_rtp_process_tests  ===================
[18:35:31] [PASSED] active1
[18:35:31] [PASSED] active2
[18:35:31] [PASSED] active-inactive
[18:35:31] [PASSED] inactive-active
[18:35:31] [PASSED] inactive-1st_or_active-inactive
[18:35:31] [PASSED] inactive-2nd_or_active-inactive
[18:35:31] [PASSED] inactive-last_or_active-inactive
[18:35:31] [PASSED] inactive-no_or_active-inactive
[18:35:31] ============== [PASSED] xe_rtp_process_tests ===============
[18:35:31] ===================== [PASSED] xe_rtp ======================
[18:35:31] ==================== xe_wa (1 subtest) =====================
[18:35:31] ======================== xe_wa_gt  =========================
[18:35:31] [PASSED] TIGERLAKE B0
[18:35:31] [PASSED] DG1 A0
[18:35:31] [PASSED] DG1 B0
[18:35:31] [PASSED] ALDERLAKE_S A0
[18:35:31] [PASSED] ALDERLAKE_S B0
[18:35:31] [PASSED] ALDERLAKE_S C0
[18:35:31] [PASSED] ALDERLAKE_S D0
[18:35:31] [PASSED] ALDERLAKE_P A0
[18:35:31] [PASSED] ALDERLAKE_P B0
[18:35:31] [PASSED] ALDERLAKE_P C0
[18:35:31] [PASSED] ALDERLAKE_S RPLS D0
[18:35:31] [PASSED] ALDERLAKE_P RPLU E0
[18:35:31] [PASSED] DG2 G10 C0
[18:35:31] [PASSED] DG2 G11 B1
[18:35:31] [PASSED] DG2 G12 A1
[18:35:31] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[18:35:31] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[18:35:31] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[18:35:31] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[18:35:31] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[18:35:31] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[18:35:31] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[18:35:31] ==================== [PASSED] xe_wa_gt =====================
[18:35:31] ====================== [PASSED] xe_wa ======================
[18:35:31] ============================================================
[18:35:31] Testing complete. Ran 306 tests: passed: 288, skipped: 18
[18:35:31] Elapsed time: 35.161s total, 4.289s configuring, 30.505s building, 0.321s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[18:35:31] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:35:33] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[18:35:58] Starting KUnit Kernel (1/1)...
[18:35:58] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:35:58] ============ drm_test_pick_cmdline (2 subtests) ============
[18:35:58] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[18:35:58] =============== drm_test_pick_cmdline_named  ===============
[18:35:58] [PASSED] NTSC
[18:35:58] [PASSED] NTSC-J
[18:35:58] [PASSED] PAL
[18:35:58] [PASSED] PAL-M
[18:35:58] =========== [PASSED] drm_test_pick_cmdline_named ===========
[18:35:58] ============== [PASSED] drm_test_pick_cmdline ==============
[18:35:58] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[18:35:58] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[18:35:58] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[18:35:58] =========== drm_validate_clone_mode (2 subtests) ===========
[18:35:58] ============== drm_test_check_in_clone_mode  ===============
[18:35:58] [PASSED] in_clone_mode
[18:35:58] [PASSED] not_in_clone_mode
[18:35:58] ========== [PASSED] drm_test_check_in_clone_mode ===========
[18:35:58] =============== drm_test_check_valid_clones  ===============
[18:35:58] [PASSED] not_in_clone_mode
[18:35:58] [PASSED] valid_clone
[18:35:58] [PASSED] invalid_clone
[18:35:58] =========== [PASSED] drm_test_check_valid_clones ===========
[18:35:58] ============= [PASSED] drm_validate_clone_mode =============
[18:35:58] ============= drm_validate_modeset (1 subtest) =============
[18:35:58] [PASSED] drm_test_check_connector_changed_modeset
[18:35:58] ============== [PASSED] drm_validate_modeset ===============
[18:35:58] ====== drm_test_bridge_get_current_state (2 subtests) ======
[18:35:58] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[18:35:58] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[18:35:58] ======== [PASSED] drm_test_bridge_get_current_state ========
[18:35:58] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[18:35:58] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[18:35:58] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[18:35:58] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[18:35:58] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[18:35:58] ============== drm_bridge_alloc (2 subtests) ===============
[18:35:58] [PASSED] drm_test_drm_bridge_alloc_basic
[18:35:58] [PASSED] drm_test_drm_bridge_alloc_get_put
[18:35:58] ================ [PASSED] drm_bridge_alloc =================
[18:35:58] ================== drm_buddy (8 subtests) ==================
[18:35:58] [PASSED] drm_test_buddy_alloc_limit
[18:35:58] [PASSED] drm_test_buddy_alloc_optimistic
[18:35:58] [PASSED] drm_test_buddy_alloc_pessimistic
[18:35:58] [PASSED] drm_test_buddy_alloc_pathological
[18:35:58] [PASSED] drm_test_buddy_alloc_contiguous
[18:35:58] [PASSED] drm_test_buddy_alloc_clear
[18:35:58] [PASSED] drm_test_buddy_alloc_range_bias
[18:35:58] [PASSED] drm_test_buddy_fragmentation_performance
[18:35:58] ==================== [PASSED] drm_buddy ====================
[18:35:58] ============= drm_cmdline_parser (40 subtests) =============
[18:35:58] [PASSED] drm_test_cmdline_force_d_only
[18:35:58] [PASSED] drm_test_cmdline_force_D_only_dvi
[18:35:58] [PASSED] drm_test_cmdline_force_D_only_hdmi
[18:35:58] [PASSED] drm_test_cmdline_force_D_only_not_digital
[18:35:58] [PASSED] drm_test_cmdline_force_e_only
[18:35:58] [PASSED] drm_test_cmdline_res
[18:35:58] [PASSED] drm_test_cmdline_res_vesa
[18:35:58] [PASSED] drm_test_cmdline_res_vesa_rblank
[18:35:58] [PASSED] drm_test_cmdline_res_rblank
[18:35:58] [PASSED] drm_test_cmdline_res_bpp
[18:35:58] [PASSED] drm_test_cmdline_res_refresh
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[18:35:58] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[18:35:58] [PASSED] drm_test_cmdline_res_margins_force_on
[18:35:58] [PASSED] drm_test_cmdline_res_vesa_margins
[18:35:58] [PASSED] drm_test_cmdline_name
[18:35:58] [PASSED] drm_test_cmdline_name_bpp
[18:35:58] [PASSED] drm_test_cmdline_name_option
[18:35:58] [PASSED] drm_test_cmdline_name_bpp_option
[18:35:58] [PASSED] drm_test_cmdline_rotate_0
[18:35:58] [PASSED] drm_test_cmdline_rotate_90
[18:35:58] [PASSED] drm_test_cmdline_rotate_180
[18:35:58] [PASSED] drm_test_cmdline_rotate_270
[18:35:58] [PASSED] drm_test_cmdline_hmirror
[18:35:58] [PASSED] drm_test_cmdline_vmirror
[18:35:58] [PASSED] drm_test_cmdline_margin_options
[18:35:58] [PASSED] drm_test_cmdline_multiple_options
[18:35:58] [PASSED] drm_test_cmdline_bpp_extra_and_option
[18:35:58] [PASSED] drm_test_cmdline_extra_and_option
[18:35:58] [PASSED] drm_test_cmdline_freestanding_options
[18:35:58] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[18:35:58] [PASSED] drm_test_cmdline_panel_orientation
[18:35:58] ================ drm_test_cmdline_invalid  =================
[18:35:58] [PASSED] margin_only
[18:35:58] [PASSED] interlace_only
[18:35:58] [PASSED] res_missing_x
[18:35:58] [PASSED] res_missing_y
[18:35:58] [PASSED] res_bad_y
[18:35:58] [PASSED] res_missing_y_bpp
[18:35:58] [PASSED] res_bad_bpp
[18:35:58] [PASSED] res_bad_refresh
[18:35:58] [PASSED] res_bpp_refresh_force_on_off
[18:35:58] [PASSED] res_invalid_mode
[18:35:58] [PASSED] res_bpp_wrong_place_mode
[18:35:58] [PASSED] name_bpp_refresh
[18:35:58] [PASSED] name_refresh
[18:35:58] [PASSED] name_refresh_wrong_mode
[18:35:58] [PASSED] name_refresh_invalid_mode
[18:35:58] [PASSED] rotate_multiple
[18:35:58] [PASSED] rotate_invalid_val
[18:35:58] [PASSED] rotate_truncated
[18:35:58] [PASSED] invalid_option
[18:35:58] [PASSED] invalid_tv_option
[18:35:58] [PASSED] truncated_tv_option
[18:35:58] ============ [PASSED] drm_test_cmdline_invalid =============
[18:35:58] =============== drm_test_cmdline_tv_options  ===============
[18:35:58] [PASSED] NTSC
[18:35:58] [PASSED] NTSC_443
[18:35:58] [PASSED] NTSC_J
[18:35:58] [PASSED] PAL
[18:35:58] [PASSED] PAL_M
[18:35:58] [PASSED] PAL_N
[18:35:58] [PASSED] SECAM
[18:35:58] [PASSED] MONO_525
[18:35:58] [PASSED] MONO_625
[18:35:58] =========== [PASSED] drm_test_cmdline_tv_options ===========
[18:35:58] =============== [PASSED] drm_cmdline_parser ================
[18:35:58] ========== drmm_connector_hdmi_init (20 subtests) ==========
[18:35:58] [PASSED] drm_test_connector_hdmi_init_valid
[18:35:58] [PASSED] drm_test_connector_hdmi_init_bpc_8
[18:35:58] [PASSED] drm_test_connector_hdmi_init_bpc_10
[18:35:58] [PASSED] drm_test_connector_hdmi_init_bpc_12
[18:35:58] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[18:35:58] [PASSED] drm_test_connector_hdmi_init_bpc_null
[18:35:58] [PASSED] drm_test_connector_hdmi_init_formats_empty
[18:35:58] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[18:35:58] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[18:35:58] [PASSED] supported_formats=0x9 yuv420_allowed=1
[18:35:58] [PASSED] supported_formats=0x9 yuv420_allowed=0
[18:35:58] [PASSED] supported_formats=0x3 yuv420_allowed=1
[18:35:58] [PASSED] supported_formats=0x3 yuv420_allowed=0
[18:35:58] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[18:35:58] [PASSED] drm_test_connector_hdmi_init_null_ddc
[18:35:58] [PASSED] drm_test_connector_hdmi_init_null_product
[18:35:58] [PASSED] drm_test_connector_hdmi_init_null_vendor
[18:35:58] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[18:35:58] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[18:35:58] [PASSED] drm_test_connector_hdmi_init_product_valid
[18:35:58] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[18:35:58] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[18:35:58] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[18:35:58] ========= drm_test_connector_hdmi_init_type_valid  =========
[18:35:58] [PASSED] HDMI-A
[18:35:58] [PASSED] HDMI-B
[18:35:58] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[18:35:58] ======== drm_test_connector_hdmi_init_type_invalid  ========
[18:35:58] [PASSED] Unknown
[18:35:58] [PASSED] VGA
[18:35:58] [PASSED] DVI-I
[18:35:58] [PASSED] DVI-D
[18:35:58] [PASSED] DVI-A
[18:35:58] [PASSED] Composite
[18:35:58] [PASSED] SVIDEO
[18:35:58] [PASSED] LVDS
[18:35:58] [PASSED] Component
[18:35:58] [PASSED] DIN
[18:35:58] [PASSED] DP
[18:35:58] [PASSED] TV
[18:35:58] [PASSED] eDP
[18:35:58] [PASSED] Virtual
[18:35:58] [PASSED] DSI
[18:35:58] [PASSED] DPI
[18:35:58] [PASSED] Writeback
[18:35:58] [PASSED] SPI
[18:35:58] [PASSED] USB
[18:35:58] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[18:35:58] ============ [PASSED] drmm_connector_hdmi_init =============
[18:35:58] ============= drmm_connector_init (3 subtests) =============
[18:35:58] [PASSED] drm_test_drmm_connector_init
[18:35:58] [PASSED] drm_test_drmm_connector_init_null_ddc
[18:35:58] ========= drm_test_drmm_connector_init_type_valid  =========
[18:35:58] [PASSED] Unknown
[18:35:58] [PASSED] VGA
[18:35:58] [PASSED] DVI-I
[18:35:58] [PASSED] DVI-D
[18:35:58] [PASSED] DVI-A
[18:35:58] [PASSED] Composite
[18:35:58] [PASSED] SVIDEO
[18:35:58] [PASSED] LVDS
[18:35:58] [PASSED] Component
[18:35:58] [PASSED] DIN
[18:35:58] [PASSED] DP
[18:35:58] [PASSED] HDMI-A
[18:35:58] [PASSED] HDMI-B
[18:35:58] [PASSED] TV
[18:35:58] [PASSED] eDP
[18:35:58] [PASSED] Virtual
[18:35:58] [PASSED] DSI
[18:35:58] [PASSED] DPI
[18:35:58] [PASSED] Writeback
[18:35:58] [PASSED] SPI
[18:35:58] [PASSED] USB
[18:35:58] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[18:35:58] =============== [PASSED] drmm_connector_init ===============
[18:35:58] ========= drm_connector_dynamic_init (6 subtests) ==========
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_init
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_init_properties
[18:35:58] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[18:35:58] [PASSED] Unknown
[18:35:58] [PASSED] VGA
[18:35:58] [PASSED] DVI-I
[18:35:58] [PASSED] DVI-D
[18:35:58] [PASSED] DVI-A
[18:35:58] [PASSED] Composite
[18:35:58] [PASSED] SVIDEO
[18:35:58] [PASSED] LVDS
[18:35:58] [PASSED] Component
[18:35:58] [PASSED] DIN
[18:35:58] [PASSED] DP
[18:35:58] [PASSED] HDMI-A
[18:35:58] [PASSED] HDMI-B
[18:35:58] [PASSED] TV
[18:35:58] [PASSED] eDP
[18:35:58] [PASSED] Virtual
[18:35:58] [PASSED] DSI
[18:35:58] [PASSED] DPI
[18:35:58] [PASSED] Writeback
[18:35:58] [PASSED] SPI
[18:35:58] [PASSED] USB
[18:35:58] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[18:35:58] ======== drm_test_drm_connector_dynamic_init_name  =========
[18:35:58] [PASSED] Unknown
[18:35:58] [PASSED] VGA
[18:35:58] [PASSED] DVI-I
[18:35:58] [PASSED] DVI-D
[18:35:58] [PASSED] DVI-A
[18:35:58] [PASSED] Composite
[18:35:58] [PASSED] SVIDEO
[18:35:58] [PASSED] LVDS
[18:35:58] [PASSED] Component
[18:35:58] [PASSED] DIN
[18:35:58] [PASSED] DP
[18:35:58] [PASSED] HDMI-A
[18:35:58] [PASSED] HDMI-B
[18:35:58] [PASSED] TV
[18:35:58] [PASSED] eDP
[18:35:58] [PASSED] Virtual
[18:35:58] [PASSED] DSI
[18:35:58] [PASSED] DPI
[18:35:58] [PASSED] Writeback
[18:35:58] [PASSED] SPI
[18:35:58] [PASSED] USB
[18:35:58] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[18:35:58] =========== [PASSED] drm_connector_dynamic_init ============
[18:35:58] ==== drm_connector_dynamic_register_early (4 subtests) =====
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[18:35:58] ====== [PASSED] drm_connector_dynamic_register_early =======
[18:35:58] ======= drm_connector_dynamic_register (7 subtests) ========
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[18:35:58] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[18:35:58] ========= [PASSED] drm_connector_dynamic_register ==========
[18:35:58] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[18:35:58] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[18:35:58] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[18:35:58] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[18:35:58] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[18:35:58] ========== drm_test_get_tv_mode_from_name_valid  ===========
[18:35:58] [PASSED] NTSC
[18:35:58] [PASSED] NTSC-443
[18:35:58] [PASSED] NTSC-J
[18:35:58] [PASSED] PAL
[18:35:58] [PASSED] PAL-M
[18:35:58] [PASSED] PAL-N
[18:35:58] [PASSED] SECAM
[18:35:58] [PASSED] Mono
[18:35:58] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[18:35:58] [PASSED] drm_test_get_tv_mode_from_name_truncated
[18:35:58] ============ [PASSED] drm_get_tv_mode_from_name ============
[18:35:58] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[18:35:58] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[18:35:58] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[18:35:58] [PASSED] VIC 96
[18:35:58] [PASSED] VIC 97
[18:35:58] [PASSED] VIC 101
[18:35:58] [PASSED] VIC 102
[18:35:58] [PASSED] VIC 106
[18:35:58] [PASSED] VIC 107
[18:35:58] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[18:35:58] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[18:35:58] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[18:35:58] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[18:35:58] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[18:35:58] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[18:35:58] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[18:35:58] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[18:35:58] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[18:35:58] [PASSED] Automatic
[18:35:58] [PASSED] Full
[18:35:58] [PASSED] Limited 16:235
[18:35:58] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[18:35:58] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[18:35:58] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[18:35:58] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[18:35:58] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[18:35:58] [PASSED] RGB
[18:35:58] [PASSED] YUV 4:2:0
[18:35:58] [PASSED] YUV 4:2:2
[18:35:58] [PASSED] YUV 4:4:4
[18:35:58] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[18:35:58] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[18:35:58] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[18:35:58] ============= drm_damage_helper (21 subtests) ==============
[18:35:58] [PASSED] drm_test_damage_iter_no_damage
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_src_moved
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_not_visible
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[18:35:58] [PASSED] drm_test_damage_iter_no_damage_no_fb
[18:35:58] [PASSED] drm_test_damage_iter_simple_damage
[18:35:58] [PASSED] drm_test_damage_iter_single_damage
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_outside_src
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_src_moved
[18:35:58] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[18:35:58] [PASSED] drm_test_damage_iter_damage
[18:35:58] [PASSED] drm_test_damage_iter_damage_one_intersect
[18:35:58] [PASSED] drm_test_damage_iter_damage_one_outside
[18:35:58] [PASSED] drm_test_damage_iter_damage_src_moved
[18:35:58] [PASSED] drm_test_damage_iter_damage_not_visible
[18:35:58] ================ [PASSED] drm_damage_helper ================
[18:35:58] ============== drm_dp_mst_helper (3 subtests) ==============
[18:35:58] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[18:35:58] [PASSED] Clock 154000 BPP 30 DSC disabled
[18:35:58] [PASSED] Clock 234000 BPP 30 DSC disabled
[18:35:58] [PASSED] Clock 297000 BPP 24 DSC disabled
[18:35:58] [PASSED] Clock 332880 BPP 24 DSC enabled
[18:35:58] [PASSED] Clock 324540 BPP 24 DSC enabled
[18:35:58] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[18:35:58] ============== drm_test_dp_mst_calc_pbn_div  ===============
[18:35:58] [PASSED] Link rate 2000000 lane count 4
[18:35:58] [PASSED] Link rate 2000000 lane count 2
[18:35:58] [PASSED] Link rate 2000000 lane count 1
[18:35:58] [PASSED] Link rate 1350000 lane count 4
[18:35:58] [PASSED] Link rate 1350000 lane count 2
[18:35:58] [PASSED] Link rate 1350000 lane count 1
[18:35:58] [PASSED] Link rate 1000000 lane count 4
[18:35:58] [PASSED] Link rate 1000000 lane count 2
[18:35:58] [PASSED] Link rate 1000000 lane count 1
[18:35:58] [PASSED] Link rate 810000 lane count 4
[18:35:58] [PASSED] Link rate 810000 lane count 2
[18:35:58] [PASSED] Link rate 810000 lane count 1
[18:35:58] [PASSED] Link rate 540000 lane count 4
[18:35:58] [PASSED] Link rate 540000 lane count 2
[18:35:58] [PASSED] Link rate 540000 lane count 1
[18:35:58] [PASSED] Link rate 270000 lane count 4
[18:35:58] [PASSED] Link rate 270000 lane count 2
[18:35:58] [PASSED] Link rate 270000 lane count 1
[18:35:58] [PASSED] Link rate 162000 lane count 4
[18:35:58] [PASSED] Link rate 162000 lane count 2
[18:35:58] [PASSED] Link rate 162000 lane count 1
[18:35:58] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[18:35:58] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[18:35:58] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[18:35:58] [PASSED] DP_POWER_UP_PHY with port number
[18:35:58] [PASSED] DP_POWER_DOWN_PHY with port number
[18:35:58] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[18:35:58] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[18:35:58] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[18:35:58] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[18:35:58] [PASSED] DP_QUERY_PAYLOAD with port number
[18:35:58] [PASSED] DP_QUERY_PAYLOAD with VCPI
[18:35:58] [PASSED] DP_REMOTE_DPCD_READ with port number
[18:35:58] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[18:35:58] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[18:35:58] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[18:35:58] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[18:35:58] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[18:35:58] [PASSED] DP_REMOTE_I2C_READ with port number
[18:35:58] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[18:35:58] [PASSED] DP_REMOTE_I2C_READ with transactions array
[18:35:58] [PASSED] DP_REMOTE_I2C_WRITE with port number
[18:35:58] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[18:35:58] [PASSED] DP_REMOTE_I2C_WRITE with data array
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[18:35:58] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[18:35:58] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[18:35:58] ================ [PASSED] drm_dp_mst_helper ================
[18:35:58] ================== drm_exec (7 subtests) ===================
[18:35:58] [PASSED] sanitycheck
[18:35:58] [PASSED] test_lock
[18:35:58] [PASSED] test_lock_unlock
[18:35:58] [PASSED] test_duplicates
[18:35:58] [PASSED] test_prepare
[18:35:58] [PASSED] test_prepare_array
[18:35:58] [PASSED] test_multiple_loops
[18:35:58] ==================== [PASSED] drm_exec =====================
[18:35:58] =========== drm_format_helper_test (17 subtests) ===========
[18:35:58] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[18:35:58] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[18:35:58] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[18:35:58] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[18:35:58] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[18:35:58] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[18:35:58] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[18:35:58] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[18:35:58] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[18:35:58] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[18:35:58] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[18:35:58] ============== drm_test_fb_xrgb8888_to_mono  ===============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[18:35:58] ==================== drm_test_fb_swab  =====================
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ================ [PASSED] drm_test_fb_swab =================
[18:35:58] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[18:35:58] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[18:35:58] [PASSED] single_pixel_source_buffer
[18:35:58] [PASSED] single_pixel_clip_rectangle
[18:35:58] [PASSED] well_known_colors
[18:35:58] [PASSED] destination_pitch
[18:35:58] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[18:35:58] ================= drm_test_fb_clip_offset  =================
[18:35:58] [PASSED] pass through
[18:35:58] [PASSED] horizontal offset
[18:35:58] [PASSED] vertical offset
[18:35:58] [PASSED] horizontal and vertical offset
[18:35:58] [PASSED] horizontal offset (custom pitch)
[18:35:58] [PASSED] vertical offset (custom pitch)
[18:35:58] [PASSED] horizontal and vertical offset (custom pitch)
[18:35:58] ============= [PASSED] drm_test_fb_clip_offset =============
[18:35:58] =================== drm_test_fb_memcpy  ====================
[18:35:58] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[18:35:58] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[18:35:58] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[18:35:58] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[18:35:58] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[18:35:58] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[18:35:58] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[18:35:58] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[18:35:58] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[18:35:58] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[18:35:58] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[18:35:58] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[18:35:58] =============== [PASSED] drm_test_fb_memcpy ================
[18:35:58] ============= [PASSED] drm_format_helper_test ==============
[18:35:58] ================= drm_format (18 subtests) =================
[18:35:58] [PASSED] drm_test_format_block_width_invalid
[18:35:58] [PASSED] drm_test_format_block_width_one_plane
[18:35:58] [PASSED] drm_test_format_block_width_two_plane
[18:35:58] [PASSED] drm_test_format_block_width_three_plane
[18:35:58] [PASSED] drm_test_format_block_width_tiled
[18:35:58] [PASSED] drm_test_format_block_height_invalid
[18:35:58] [PASSED] drm_test_format_block_height_one_plane
[18:35:58] [PASSED] drm_test_format_block_height_two_plane
[18:35:58] [PASSED] drm_test_format_block_height_three_plane
[18:35:58] [PASSED] drm_test_format_block_height_tiled
[18:35:58] [PASSED] drm_test_format_min_pitch_invalid
[18:35:58] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[18:35:58] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[18:35:58] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[18:35:58] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[18:35:58] [PASSED] drm_test_format_min_pitch_two_plane
[18:35:58] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[18:35:58] [PASSED] drm_test_format_min_pitch_tiled
[18:35:58] =================== [PASSED] drm_format ====================
[18:35:58] ============== drm_framebuffer (10 subtests) ===============
[18:35:58] ========== drm_test_framebuffer_check_src_coords  ==========
[18:35:58] [PASSED] Success: source fits into fb
[18:35:58] [PASSED] Fail: overflowing fb with x-axis coordinate
[18:35:58] [PASSED] Fail: overflowing fb with y-axis coordinate
[18:35:58] [PASSED] Fail: overflowing fb with source width
[18:35:58] [PASSED] Fail: overflowing fb with source height
[18:35:58] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[18:35:58] [PASSED] drm_test_framebuffer_cleanup
[18:35:58] =============== drm_test_framebuffer_create  ===============
[18:35:58] [PASSED] ABGR8888 normal sizes
[18:35:58] [PASSED] ABGR8888 max sizes
[18:35:58] [PASSED] ABGR8888 pitch greater than min required
[18:35:58] [PASSED] ABGR8888 pitch less than min required
[18:35:58] [PASSED] ABGR8888 Invalid width
[18:35:58] [PASSED] ABGR8888 Invalid buffer handle
[18:35:58] [PASSED] No pixel format
[18:35:58] [PASSED] ABGR8888 Width 0
[18:35:58] [PASSED] ABGR8888 Height 0
[18:35:58] [PASSED] ABGR8888 Out of bound height * pitch combination
[18:35:58] [PASSED] ABGR8888 Large buffer offset
[18:35:58] [PASSED] ABGR8888 Buffer offset for inexistent plane
[18:35:58] [PASSED] ABGR8888 Invalid flag
[18:35:58] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[18:35:58] [PASSED] ABGR8888 Valid buffer modifier
[18:35:58] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[18:35:58] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] NV12 Normal sizes
[18:35:58] [PASSED] NV12 Max sizes
[18:35:58] [PASSED] NV12 Invalid pitch
[18:35:58] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[18:35:58] [PASSED] NV12 different  modifier per-plane
[18:35:58] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[18:35:58] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] NV12 Modifier for inexistent plane
[18:35:58] [PASSED] NV12 Handle for inexistent plane
[18:35:58] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[18:35:58] [PASSED] YVU420 Normal sizes
[18:35:58] [PASSED] YVU420 Max sizes
[18:35:58] [PASSED] YVU420 Invalid pitch
[18:35:58] [PASSED] YVU420 Different pitches
[18:35:58] [PASSED] YVU420 Different buffer offsets/pitches
[18:35:58] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[18:35:58] [PASSED] YVU420 Valid modifier
[18:35:58] [PASSED] YVU420 Different modifiers per plane
[18:35:58] [PASSED] YVU420 Modifier for inexistent plane
[18:35:58] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[18:35:58] [PASSED] X0L2 Normal sizes
[18:35:58] [PASSED] X0L2 Max sizes
[18:35:58] [PASSED] X0L2 Invalid pitch
[18:35:58] [PASSED] X0L2 Pitch greater than minimum required
[18:35:58] [PASSED] X0L2 Handle for inexistent plane
[18:35:58] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[18:35:58] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[18:35:58] [PASSED] X0L2 Valid modifier
[18:35:58] [PASSED] X0L2 Modifier for inexistent plane
[18:35:58] =========== [PASSED] drm_test_framebuffer_create ===========
[18:35:58] [PASSED] drm_test_framebuffer_free
[18:35:58] [PASSED] drm_test_framebuffer_init
[18:35:58] [PASSED] drm_test_framebuffer_init_bad_format
[18:35:58] [PASSED] drm_test_framebuffer_init_dev_mismatch
[18:35:58] [PASSED] drm_test_framebuffer_lookup
[18:35:58] [PASSED] drm_test_framebuffer_lookup_inexistent
[18:35:58] [PASSED] drm_test_framebuffer_modifiers_not_supported
[18:35:58] ================= [PASSED] drm_framebuffer =================
[18:35:58] ================ drm_gem_shmem (8 subtests) ================
[18:35:58] [PASSED] drm_gem_shmem_test_obj_create
[18:35:58] [PASSED] drm_gem_shmem_test_obj_create_private
[18:35:58] [PASSED] drm_gem_shmem_test_pin_pages
[18:35:58] [PASSED] drm_gem_shmem_test_vmap
[18:35:58] [PASSED] drm_gem_shmem_test_get_pages_sgt
[18:35:58] [PASSED] drm_gem_shmem_test_get_sg_table
[18:35:58] [PASSED] drm_gem_shmem_test_madvise
[18:35:58] [PASSED] drm_gem_shmem_test_purge
[18:35:58] ================== [PASSED] drm_gem_shmem ==================
[18:35:58] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[18:35:58] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[18:35:58] [PASSED] Automatic
[18:35:58] [PASSED] Full
[18:35:58] [PASSED] Limited 16:235
[18:35:58] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[18:35:58] [PASSED] drm_test_check_disable_connector
[18:35:58] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[18:35:58] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[18:35:58] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[18:35:58] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[18:35:58] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[18:35:58] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[18:35:58] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[18:35:58] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[18:35:58] [PASSED] drm_test_check_output_bpc_dvi
[18:35:58] [PASSED] drm_test_check_output_bpc_format_vic_1
[18:35:58] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[18:35:58] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[18:35:58] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[18:35:58] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[18:35:58] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[18:35:58] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[18:35:58] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[18:35:58] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[18:35:58] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[18:35:58] [PASSED] drm_test_check_broadcast_rgb_value
[18:35:58] [PASSED] drm_test_check_bpc_8_value
[18:35:58] [PASSED] drm_test_check_bpc_10_value
[18:35:58] [PASSED] drm_test_check_bpc_12_value
[18:35:58] [PASSED] drm_test_check_format_value
[18:35:58] [PASSED] drm_test_check_tmds_char_value
[18:35:58] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[18:35:58] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[18:35:58] [PASSED] drm_test_check_mode_valid
[18:35:58] [PASSED] drm_test_check_mode_valid_reject
[18:35:58] [PASSED] drm_test_check_mode_valid_reject_rate
[18:35:58] [PASSED] drm_test_check_mode_valid_reject_max_clock
[18:35:58] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[18:35:58] ================= drm_managed (2 subtests) =================
[18:35:58] [PASSED] drm_test_managed_release_action
[18:35:58] [PASSED] drm_test_managed_run_action
[18:35:58] =================== [PASSED] drm_managed ===================
[18:35:58] =================== drm_mm (6 subtests) ====================
[18:35:58] [PASSED] drm_test_mm_init
[18:35:58] [PASSED] drm_test_mm_debug
[18:35:58] [PASSED] drm_test_mm_align32
[18:35:58] [PASSED] drm_test_mm_align64
[18:35:58] [PASSED] drm_test_mm_lowest
[18:35:58] [PASSED] drm_test_mm_highest
[18:35:58] ===================== [PASSED] drm_mm ======================
[18:35:58] ============= drm_modes_analog_tv (5 subtests) =============
[18:35:58] [PASSED] drm_test_modes_analog_tv_mono_576i
[18:35:58] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[18:35:58] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[18:35:58] [PASSED] drm_test_modes_analog_tv_pal_576i
[18:35:58] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[18:35:58] =============== [PASSED] drm_modes_analog_tv ===============
[18:35:58] ============== drm_plane_helper (2 subtests) ===============
[18:35:58] =============== drm_test_check_plane_state  ================
[18:35:58] [PASSED] clipping_simple
[18:35:58] [PASSED] clipping_rotate_reflect
[18:35:58] [PASSED] positioning_simple
[18:35:58] [PASSED] upscaling
[18:35:58] [PASSED] downscaling
[18:35:58] [PASSED] rounding1
[18:35:58] [PASSED] rounding2
[18:35:58] [PASSED] rounding3
[18:35:58] [PASSED] rounding4
[18:35:58] =========== [PASSED] drm_test_check_plane_state ============
[18:35:58] =========== drm_test_check_invalid_plane_state  ============
[18:35:58] [PASSED] positioning_invalid
[18:35:58] [PASSED] upscaling_invalid
[18:35:58] [PASSED] downscaling_invalid
[18:35:58] ======= [PASSED] drm_test_check_invalid_plane_state ========
[18:35:58] ================ [PASSED] drm_plane_helper =================
[18:35:58] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[18:35:58] ====== drm_test_connector_helper_tv_get_modes_check  =======
[18:35:58] [PASSED] None
[18:35:58] [PASSED] PAL
[18:35:58] [PASSED] NTSC
[18:35:58] [PASSED] Both, NTSC Default
[18:35:58] [PASSED] Both, PAL Default
[18:35:58] [PASSED] Both, NTSC Default, with PAL on command-line
[18:35:58] [PASSED] Both, PAL Default, with NTSC on command-line
[18:35:58] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[18:35:58] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[18:35:58] ================== drm_rect (9 subtests) ===================
[18:35:58] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[18:35:58] [PASSED] drm_test_rect_clip_scaled_not_clipped
[18:35:58] [PASSED] drm_test_rect_clip_scaled_clipped
[18:35:58] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[18:35:58] ================= drm_test_rect_intersect  =================
[18:35:58] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[18:35:58] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[18:35:58] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[18:35:58] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[18:35:58] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[18:35:58] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[18:35:58] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[18:35:58] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[18:35:58] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[18:35:58] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[18:35:58] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[18:35:58] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[18:35:58] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[18:35:58] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[18:35:58] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[18:35:58] ============= [PASSED] drm_test_rect_intersect =============
[18:35:58] ================ drm_test_rect_calc_hscale  ================
[18:35:58] [PASSED] normal use
[18:35:58] [PASSED] out of max range
[18:35:58] [PASSED] out of min range
[18:35:58] [PASSED] zero dst
[18:35:58] [PASSED] negative src
[18:35:58] [PASSED] negative dst
[18:35:58] ============ [PASSED] drm_test_rect_calc_hscale ============
[18:35:58] ================ drm_test_rect_calc_vscale  ================
[18:35:58] [PASSED] normal use
[18:35:58] [PASSED] out of max range
[18:35:58] [PASSED] out of min range
[18:35:58] [PASSED] zero dst
[18:35:58] [PASSED] negative src
[18:35:58] [PASSED] negative dst
[18:35:58] ============ [PASSED] drm_test_rect_calc_vscale ============
[18:35:58] ================== drm_test_rect_rotate  ===================
[18:35:58] [PASSED] reflect-x
[18:35:58] [PASSED] reflect-y
[18:35:58] [PASSED] rotate-0
[18:35:58] [PASSED] rotate-90
[18:35:58] [PASSED] rotate-180
[18:35:58] [PASSED] rotate-270
[18:35:58] ============== [PASSED] drm_test_rect_rotate ===============
[18:35:58] ================ drm_test_rect_rotate_inv  =================
[18:35:58] [PASSED] reflect-x
[18:35:58] [PASSED] reflect-y
[18:35:58] [PASSED] rotate-0
[18:35:58] [PASSED] rotate-90
[18:35:58] [PASSED] rotate-180
[18:35:58] [PASSED] rotate-270
[18:35:58] ============ [PASSED] drm_test_rect_rotate_inv =============
[18:35:58] ==================== [PASSED] drm_rect =====================
[18:35:58] ============ drm_sysfb_modeset_test (1 subtest) ============
[18:35:58] ============ drm_test_sysfb_build_fourcc_list  =============
[18:35:58] [PASSED] no native formats
[18:35:58] [PASSED] XRGB8888 as native format
[18:35:58] [PASSED] remove duplicates
[18:35:58] [PASSED] convert alpha formats
[18:35:58] [PASSED] random formats
[18:35:58] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[18:35:58] ============= [PASSED] drm_sysfb_modeset_test ==============
[18:35:58] ============================================================
[18:35:58] Testing complete. Ran 622 tests: passed: 622
[18:35:58] Elapsed time: 26.945s total, 1.723s configuring, 24.799s building, 0.393s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[18:35:58] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:36:00] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[18:36:10] Starting KUnit Kernel (1/1)...
[18:36:10] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:36:10] ================= ttm_device (5 subtests) ==================
[18:36:10] [PASSED] ttm_device_init_basic
[18:36:10] [PASSED] ttm_device_init_multiple
[18:36:10] [PASSED] ttm_device_fini_basic
[18:36:10] [PASSED] ttm_device_init_no_vma_man
[18:36:10] ================== ttm_device_init_pools  ==================
[18:36:10] [PASSED] No DMA allocations, no DMA32 required
[18:36:10] [PASSED] DMA allocations, DMA32 required
[18:36:10] [PASSED] No DMA allocations, DMA32 required
[18:36:10] [PASSED] DMA allocations, no DMA32 required
[18:36:10] ============== [PASSED] ttm_device_init_pools ==============
[18:36:10] =================== [PASSED] ttm_device ====================
[18:36:10] ================== ttm_pool (8 subtests) ===================
[18:36:10] ================== ttm_pool_alloc_basic  ===================
[18:36:10] [PASSED] One page
[18:36:10] [PASSED] More than one page
[18:36:10] [PASSED] Above the allocation limit
[18:36:10] [PASSED] One page, with coherent DMA mappings enabled
[18:36:10] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[18:36:10] ============== [PASSED] ttm_pool_alloc_basic ===============
[18:36:10] ============== ttm_pool_alloc_basic_dma_addr  ==============
[18:36:10] [PASSED] One page
[18:36:10] [PASSED] More than one page
[18:36:10] [PASSED] Above the allocation limit
[18:36:10] [PASSED] One page, with coherent DMA mappings enabled
[18:36:10] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[18:36:10] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[18:36:10] [PASSED] ttm_pool_alloc_order_caching_match
[18:36:10] [PASSED] ttm_pool_alloc_caching_mismatch
[18:36:10] [PASSED] ttm_pool_alloc_order_mismatch
[18:36:10] [PASSED] ttm_pool_free_dma_alloc
[18:36:10] [PASSED] ttm_pool_free_no_dma_alloc
[18:36:10] [PASSED] ttm_pool_fini_basic
[18:36:10] ==================== [PASSED] ttm_pool =====================
[18:36:10] ================ ttm_resource (8 subtests) =================
[18:36:10] ================= ttm_resource_init_basic  =================
[18:36:10] [PASSED] Init resource in TTM_PL_SYSTEM
[18:36:10] [PASSED] Init resource in TTM_PL_VRAM
[18:36:10] [PASSED] Init resource in a private placement
[18:36:10] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[18:36:10] ============= [PASSED] ttm_resource_init_basic =============
[18:36:10] [PASSED] ttm_resource_init_pinned
[18:36:10] [PASSED] ttm_resource_fini_basic
[18:36:10] [PASSED] ttm_resource_manager_init_basic
[18:36:10] [PASSED] ttm_resource_manager_usage_basic
[18:36:10] [PASSED] ttm_resource_manager_set_used_basic
[18:36:10] [PASSED] ttm_sys_man_alloc_basic
[18:36:10] [PASSED] ttm_sys_man_free_basic
[18:36:10] ================== [PASSED] ttm_resource ===================
[18:36:10] =================== ttm_tt (15 subtests) ===================
[18:36:10] ==================== ttm_tt_init_basic  ====================
[18:36:10] [PASSED] Page-aligned size
[18:36:10] [PASSED] Extra pages requested
[18:36:10] ================ [PASSED] ttm_tt_init_basic ================
[18:36:10] [PASSED] ttm_tt_init_misaligned
[18:36:10] [PASSED] ttm_tt_fini_basic
[18:36:10] [PASSED] ttm_tt_fini_sg
[18:36:10] [PASSED] ttm_tt_fini_shmem
[18:36:10] [PASSED] ttm_tt_create_basic
[18:36:10] [PASSED] ttm_tt_create_invalid_bo_type
[18:36:10] [PASSED] ttm_tt_create_ttm_exists
[18:36:10] [PASSED] ttm_tt_create_failed
[18:36:10] [PASSED] ttm_tt_destroy_basic
[18:36:10] [PASSED] ttm_tt_populate_null_ttm
[18:36:10] [PASSED] ttm_tt_populate_populated_ttm
[18:36:10] [PASSED] ttm_tt_unpopulate_basic
[18:36:10] [PASSED] ttm_tt_unpopulate_empty_ttm
[18:36:10] [PASSED] ttm_tt_swapin_basic
[18:36:10] ===================== [PASSED] ttm_tt ======================
[18:36:10] =================== ttm_bo (14 subtests) ===================
[18:36:10] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[18:36:10] [PASSED] Cannot be interrupted and sleeps
[18:36:10] [PASSED] Cannot be interrupted, locks straight away
[18:36:10] [PASSED] Can be interrupted, sleeps
[18:36:10] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[18:36:10] [PASSED] ttm_bo_reserve_locked_no_sleep
[18:36:10] [PASSED] ttm_bo_reserve_no_wait_ticket
[18:36:10] [PASSED] ttm_bo_reserve_double_resv
[18:36:10] [PASSED] ttm_bo_reserve_interrupted
[18:36:10] [PASSED] ttm_bo_reserve_deadlock
[18:36:10] [PASSED] ttm_bo_unreserve_basic
[18:36:10] [PASSED] ttm_bo_unreserve_pinned
[18:36:10] [PASSED] ttm_bo_unreserve_bulk
[18:36:10] [PASSED] ttm_bo_fini_basic
[18:36:10] [PASSED] ttm_bo_fini_shared_resv
[18:36:10] [PASSED] ttm_bo_pin_basic
[18:36:10] [PASSED] ttm_bo_pin_unpin_resource
[18:36:10] [PASSED] ttm_bo_multiple_pin_one_unpin
[18:36:10] ===================== [PASSED] ttm_bo ======================
[18:36:10] ============== ttm_bo_validate (21 subtests) ===============
[18:36:10] ============== ttm_bo_init_reserved_sys_man  ===============
[18:36:10] [PASSED] Buffer object for userspace
[18:36:10] [PASSED] Kernel buffer object
[18:36:10] [PASSED] Shared buffer object
[18:36:10] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[18:36:10] ============== ttm_bo_init_reserved_mock_man  ==============
[18:36:10] [PASSED] Buffer object for userspace
[18:36:10] [PASSED] Kernel buffer object
[18:36:10] [PASSED] Shared buffer object
[18:36:10] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[18:36:10] [PASSED] ttm_bo_init_reserved_resv
[18:36:10] ================== ttm_bo_validate_basic  ==================
[18:36:10] [PASSED] Buffer object for userspace
[18:36:10] [PASSED] Kernel buffer object
[18:36:10] [PASSED] Shared buffer object
[18:36:10] ============== [PASSED] ttm_bo_validate_basic ==============
[18:36:10] [PASSED] ttm_bo_validate_invalid_placement
[18:36:10] ============= ttm_bo_validate_same_placement  ==============
[18:36:10] [PASSED] System manager
[18:36:10] [PASSED] VRAM manager
[18:36:10] ========= [PASSED] ttm_bo_validate_same_placement ==========
[18:36:10] [PASSED] ttm_bo_validate_failed_alloc
[18:36:10] [PASSED] ttm_bo_validate_pinned
[18:36:10] [PASSED] ttm_bo_validate_busy_placement
[18:36:10] ================ ttm_bo_validate_multihop  =================
[18:36:10] [PASSED] Buffer object for userspace
[18:36:10] [PASSED] Kernel buffer object
[18:36:10] [PASSED] Shared buffer object
[18:36:10] ============ [PASSED] ttm_bo_validate_multihop =============
[18:36:10] ========== ttm_bo_validate_no_placement_signaled  ==========
[18:36:10] [PASSED] Buffer object in system domain, no page vector
[18:36:10] [PASSED] Buffer object in system domain with an existing page vector
[18:36:10] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[18:36:10] ======== ttm_bo_validate_no_placement_not_signaled  ========
[18:36:10] [PASSED] Buffer object for userspace
[18:36:10] [PASSED] Kernel buffer object
[18:36:10] [PASSED] Shared buffer object
[18:36:10] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[18:36:10] [PASSED] ttm_bo_validate_move_fence_signaled
[18:36:10] ========= ttm_bo_validate_move_fence_not_signaled  =========
[18:36:10] [PASSED] Waits for GPU
[18:36:10] [PASSED] Tries to lock straight away
[18:36:10] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[18:36:10] [PASSED] ttm_bo_validate_happy_evict
[18:36:10] [PASSED] ttm_bo_validate_all_pinned_evict
[18:36:10] [PASSED] ttm_bo_validate_allowed_only_evict
[18:36:10] [PASSED] ttm_bo_validate_deleted_evict
[18:36:10] [PASSED] ttm_bo_validate_busy_domain_evict
[18:36:10] [PASSED] ttm_bo_validate_evict_gutting
[18:36:10] [PASSED] ttm_bo_validate_recrusive_evict
[18:36:10] ================= [PASSED] ttm_bo_validate =================
[18:36:10] ============================================================
[18:36:10] Testing complete. Ran 101 tests: passed: 101
[18:36:10] Elapsed time: 11.511s total, 1.741s configuring, 9.553s building, 0.181s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply	[flat|nested] 15+ messages in thread

* ✓ Xe.CI.BAT: success for Fix serialization on burst of unbinds
  2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
  2025-10-17 18:36 ` ✓ CI.KUnit: success for Fix serialization on burst of unbinds Patchwork
@ 2025-10-17 19:16 ` Patchwork
  2025-10-18 18:20 ` ✗ Xe.CI.Full: failure " Patchwork
  3 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-17 19:16 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

== Series Details ==

Series: Fix serialization on burst of unbinds
URL   : https://patchwork.freedesktop.org/series/156144/
State : success

== Summary ==

CI Bug Log - changes from xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f_BAT -> xe-pw-156144v1_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Changes
-------

  No changes found


Build changes
-------------

  * Linux: xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f -> xe-pw-156144v1

  IGT_8592: b3d809d537febc23792ab8d0eb6d13cf80d626c8 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f: b5e5976a35cb0e8e45aea836b42ecccf22df803f
  xe-pw-156144v1: 156144v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/index.html

[-- Attachment #2: Type: text/html, Size: 1499 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* ✗ Xe.CI.Full: failure for Fix serialization on burst of unbinds
  2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
                   ` (2 preceding siblings ...)
  2025-10-17 19:16 ` ✓ Xe.CI.BAT: " Patchwork
@ 2025-10-18 18:20 ` Patchwork
  3 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-18 18:20 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 41027 bytes --]

== Series Details ==

Series: Fix serialization on burst of unbinds
URL   : https://patchwork.freedesktop.org/series/156144/
State : failure

== Summary ==

CI Bug Log - changes from xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f_FULL -> xe-pw-156144v1_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-156144v1_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-156144v1_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-156144v1_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_color@deep-color:
    - shard-bmg:          [PASS][1] -> [INCOMPLETE][2] +1 other test incomplete
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-2/igt@kms_color@deep-color.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-8/igt@kms_color@deep-color.html

  * igt@xe_evict_ccs@evict-overcommit-standalone-nofree-samefd:
    - shard-bmg:          [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-1/igt@xe_evict_ccs@evict-overcommit-standalone-nofree-samefd.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@xe_evict_ccs@evict-overcommit-standalone-nofree-samefd.html

  
Known issues
------------

  Here are the changes found in xe-pw-156144v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_big_fb@4-tiled-64bpp-rotate-90:
    - shard-adlp:         NOTRUN -> [SKIP][5] ([Intel XE#1124]) +4 other tests skip
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_big_fb@4-tiled-64bpp-rotate-90.html

  * igt@kms_big_fb@linear-32bpp-rotate-270:
    - shard-adlp:         NOTRUN -> [SKIP][6] ([Intel XE#316]) +2 other tests skip
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_big_fb@linear-32bpp-rotate-270.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-adlp:         NOTRUN -> [DMESG-FAIL][7] ([Intel XE#4543]) +3 other tests dmesg-fail
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-32bpp-rotate-90:
    - shard-dg2-set2:     NOTRUN -> [SKIP][8] ([Intel XE#1124])
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-434/igt@kms_big_fb@yf-tiled-32bpp-rotate-90.html

  * igt@kms_bw@linear-tiling-1-displays-2560x1440p:
    - shard-adlp:         NOTRUN -> [SKIP][9] ([Intel XE#367]) +2 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_bw@linear-tiling-1-displays-2560x1440p.html

  * igt@kms_ccs@crc-primary-basic-4-tiled-lnl-ccs@pipe-b-dp-2:
    - shard-bmg:          NOTRUN -> [SKIP][10] ([Intel XE#2652] / [Intel XE#787]) +3 other tests skip
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_ccs@crc-primary-basic-4-tiled-lnl-ccs@pipe-b-dp-2.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:
    - shard-bmg:          [PASS][11] -> [INCOMPLETE][12] ([Intel XE#3862]) +1 other test incomplete
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-5/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6:
    - shard-dg2-set2:     [PASS][13] -> [INCOMPLETE][14] ([Intel XE#3862]) +1 other test incomplete
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-463/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-466/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html

  * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-1:
    - shard-adlp:         NOTRUN -> [SKIP][15] ([Intel XE#787]) +23 other tests skip
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-1.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs@pipe-c-dp-4:
    - shard-dg2-set2:     NOTRUN -> [INCOMPLETE][16] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4522]) +1 other test incomplete
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs@pipe-c-dp-4.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc:
    - shard-adlp:         NOTRUN -> [SKIP][17] ([Intel XE#455] / [Intel XE#787]) +15 other tests skip
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html
    - shard-dg2-set2:     [PASS][18] -> [INCOMPLETE][19] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4345] / [Intel XE#4522])
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-436/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6:
    - shard-dg2-set2:     [PASS][20] -> [INCOMPLETE][21] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4522])
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-436/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html

  * igt@kms_chamelium_color@ctm-max:
    - shard-adlp:         NOTRUN -> [SKIP][22] ([Intel XE#306])
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_chamelium_color@ctm-max.html

  * igt@kms_chamelium_hpd@dp-hpd-enable-disable-mode:
    - shard-adlp:         NOTRUN -> [SKIP][23] ([Intel XE#373]) +3 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_chamelium_hpd@dp-hpd-enable-disable-mode.html

  * igt@kms_content_protection@atomic-dpms@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [FAIL][24] ([Intel XE#1178])
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_content_protection@atomic-dpms@pipe-a-dp-2.html

  * igt@kms_content_protection@dp-mst-type-0:
    - shard-adlp:         NOTRUN -> [SKIP][25] ([Intel XE#307])
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_content_protection@dp-mst-type-0.html

  * igt@kms_cursor_crc@cursor-onscreen-32x32:
    - shard-adlp:         NOTRUN -> [SKIP][26] ([Intel XE#455]) +11 other tests skip
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_cursor_crc@cursor-onscreen-32x32.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions:
    - shard-bmg:          [PASS][27] -> [SKIP][28] ([Intel XE#2291]) +1 other test skip
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-4/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions.html
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-legacy:
    - shard-adlp:         NOTRUN -> [SKIP][29] ([Intel XE#309]) +3 other tests skip
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_cursor_legacy@cursora-vs-flipb-legacy.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic:
    - shard-bmg:          [PASS][30] -> [FAIL][31] ([Intel XE#1475])
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-7/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html

  * igt@kms_feature_discovery@display-3x:
    - shard-adlp:         NOTRUN -> [SKIP][32] ([Intel XE#703])
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_feature_discovery@display-3x.html

  * igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-hdmi-a6-dp4:
    - shard-dg2-set2:     [PASS][33] -> [FAIL][34] ([Intel XE#5408]) +1 other test fail
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-433/igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-hdmi-a6-dp4.html
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-432/igt@kms_flip@2x-flip-vs-blocking-wf-vblank@ab-hdmi-a6-dp4.html

  * igt@kms_flip@2x-flip-vs-wf_vblank:
    - shard-bmg:          [PASS][35] -> [SKIP][36] ([Intel XE#2316]) +2 other tests skip
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-4/igt@kms_flip@2x-flip-vs-wf_vblank.html
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_flip@2x-flip-vs-wf_vblank.html

  * igt@kms_flip@2x-single-buffer-flip-vs-dpms-off-vs-modeset:
    - shard-adlp:         NOTRUN -> [SKIP][37] ([Intel XE#310]) +3 other tests skip
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_flip@2x-single-buffer-flip-vs-dpms-off-vs-modeset.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@d-dp4:
    - shard-dg2-set2:     [PASS][38] -> [FAIL][39] ([Intel XE#301] / [Intel XE#3149]) +1 other test fail
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-434/igt@kms_flip@flip-vs-expired-vblank-interruptible@d-dp4.html
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-464/igt@kms_flip@flip-vs-expired-vblank-interruptible@d-dp4.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-bmg:          [PASS][40] -> [INCOMPLETE][41] ([Intel XE#2049] / [Intel XE#2597]) +3 other tests incomplete
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-5/igt@kms_flip@flip-vs-suspend-interruptible.html
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_flip@plain-flip-interruptible@b-hdmi-a1:
    - shard-adlp:         [PASS][42] -> [DMESG-WARN][43] ([Intel XE#4543]) +2 other tests dmesg-warn
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-adlp-2/igt@kms_flip@plain-flip-interruptible@b-hdmi-a1.html
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-1/igt@kms_flip@plain-flip-interruptible@b-hdmi-a1.html

  * igt@kms_flip@wf_vblank-ts-check-interruptible@a-edp1:
    - shard-lnl:          [PASS][44] -> [FAIL][45] ([Intel XE#5416]) +1 other test fail
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-lnl-4/igt@kms_flip@wf_vblank-ts-check-interruptible@a-edp1.html
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-lnl-4/igt@kms_flip@wf_vblank-ts-check-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@drrs-1p-primscrn-cur-indfb-draw-blt:
    - shard-adlp:         NOTRUN -> [SKIP][46] ([Intel XE#651]) +5 other tests skip
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_frontbuffer_tracking@drrs-1p-primscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt:
    - shard-adlp:         NOTRUN -> [SKIP][47] ([Intel XE#656]) +17 other tests skip
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-indfb-draw-mmap-wc:
    - shard-dg2-set2:     NOTRUN -> [SKIP][48] ([Intel XE#653])
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-render:
    - shard-adlp:         NOTRUN -> [SKIP][49] ([Intel XE#653]) +7 other tests skip
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-render.html

  * igt@kms_frontbuffer_tracking@plane-fbc-rte:
    - shard-adlp:         NOTRUN -> [SKIP][50] ([Intel XE#1158])
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_frontbuffer_tracking@plane-fbc-rte.html

  * igt@kms_hdr@invalid-hdr:
    - shard-bmg:          [PASS][51] -> [SKIP][52] ([Intel XE#1503])
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-8/igt@kms_hdr@invalid-hdr.html
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-1/igt@kms_hdr@invalid-hdr.html

  * igt@kms_joiner@switch-modeset-ultra-joiner-big-joiner:
    - shard-adlp:         NOTRUN -> [SKIP][53] ([Intel XE#2925])
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_joiner@switch-modeset-ultra-joiner-big-joiner.html

  * igt@kms_plane_multiple@2x-tiling-x:
    - shard-bmg:          [PASS][54] -> [SKIP][55] ([Intel XE#4596])
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@kms_plane_multiple@2x-tiling-x.html
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_plane_multiple@2x-tiling-x.html

  * igt@kms_plane_multiple@tiling-yf:
    - shard-adlp:         NOTRUN -> [SKIP][56] ([Intel XE#5020])
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_plane_multiple@tiling-yf.html

  * igt@kms_pm_rpm@modeset-non-lpsp:
    - shard-adlp:         NOTRUN -> [SKIP][57] ([Intel XE#836])
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_pm_rpm@modeset-non-lpsp.html

  * igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf:
    - shard-adlp:         NOTRUN -> [SKIP][58] ([Intel XE#1406] / [Intel XE#1489]) +4 other tests skip
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf.html

  * igt@kms_psr2_su@page_flip-xrgb8888:
    - shard-adlp:         NOTRUN -> [SKIP][59] ([Intel XE#1122] / [Intel XE#1406] / [Intel XE#5580])
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_psr2_su@page_flip-xrgb8888.html

  * igt@kms_psr@fbc-psr-cursor-plane-move:
    - shard-adlp:         NOTRUN -> [SKIP][60] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929]) +4 other tests skip
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_psr@fbc-psr-cursor-plane-move.html

  * igt@kms_psr@pr-primary-render:
    - shard-dg2-set2:     NOTRUN -> [SKIP][61] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929])
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-434/igt@kms_psr@pr-primary-render.html

  * igt@kms_vrr@cmrr:
    - shard-adlp:         NOTRUN -> [SKIP][62] ([Intel XE#2168])
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_vrr@cmrr.html

  * igt@xe_compute_preempt@compute-preempt:
    - shard-adlp:         NOTRUN -> [SKIP][63] ([Intel XE#6360])
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@xe_compute_preempt@compute-preempt.html

  * igt@xe_copy_basic@mem-set-linear-0x369:
    - shard-adlp:         NOTRUN -> [SKIP][64] ([Intel XE#1126]) +1 other test skip
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_copy_basic@mem-set-linear-0x369.html

  * igt@xe_eudebug@discovery-empty-clients:
    - shard-dg2-set2:     NOTRUN -> [SKIP][65] ([Intel XE#4837])
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@xe_eudebug@discovery-empty-clients.html

  * igt@xe_eudebug_online@writes-caching-sram-bb-vram-target-vram:
    - shard-adlp:         NOTRUN -> [SKIP][66] ([Intel XE#4837] / [Intel XE#5565]) +5 other tests skip
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_eudebug_online@writes-caching-sram-bb-vram-target-vram.html

  * igt@xe_evict@evict-beng-cm-threads-large:
    - shard-adlp:         NOTRUN -> [SKIP][67] ([Intel XE#261]) +2 other tests skip
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_evict@evict-beng-cm-threads-large.html

  * igt@xe_evict@evict-large-cm:
    - shard-adlp:         NOTRUN -> [SKIP][68] ([Intel XE#261] / [Intel XE#5564])
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_evict@evict-large-cm.html

  * igt@xe_evict@evict-small-multi-vm-cm:
    - shard-adlp:         NOTRUN -> [SKIP][69] ([Intel XE#261] / [Intel XE#688])
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@xe_evict@evict-small-multi-vm-cm.html

  * igt@xe_exec_basic@multigpu-once-basic-defer-bind:
    - shard-adlp:         NOTRUN -> [SKIP][70] ([Intel XE#1392] / [Intel XE#5575]) +3 other tests skip
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_exec_basic@multigpu-once-basic-defer-bind.html

  * igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind:
    - shard-dg2-set2:     [PASS][71] -> [SKIP][72] ([Intel XE#1392])
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-434/igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind.html
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-464/igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind.html

  * igt@xe_exec_fault_mode@many-execqueues-bindexecqueue:
    - shard-adlp:         NOTRUN -> [SKIP][73] ([Intel XE#288] / [Intel XE#5561]) +13 other tests skip
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue.html

  * igt@xe_exec_fault_mode@once-rebind-prefetch:
    - shard-dg2-set2:     NOTRUN -> [SKIP][74] ([Intel XE#288])
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@xe_exec_fault_mode@once-rebind-prefetch.html

  * igt@xe_exec_system_allocator@many-stride-mmap-new-huge:
    - shard-adlp:         NOTRUN -> [SKIP][75] ([Intel XE#4915]) +118 other tests skip
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@xe_exec_system_allocator@many-stride-mmap-new-huge.html

  * igt@xe_exec_system_allocator@threads-many-large-execqueues-mmap:
    - shard-dg2-set2:     NOTRUN -> [SKIP][76] ([Intel XE#4915]) +14 other tests skip
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-434/igt@xe_exec_system_allocator@threads-many-large-execqueues-mmap.html

  * igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv:
    - shard-dg2-set2:     [PASS][77] -> [DMESG-WARN][78] ([Intel XE#5893])
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-434/igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv.html
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-464/igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv.html

  * igt@xe_oa@create-destroy-userspace-config:
    - shard-adlp:         NOTRUN -> [SKIP][79] ([Intel XE#3573]) +1 other test skip
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@xe_oa@create-destroy-userspace-config.html

  * igt@xe_pm@s2idle-multiple-execs:
    - shard-dg2-set2:     [PASS][80] -> [INCOMPLETE][81] ([Intel XE#4504])
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-434/igt@xe_pm@s2idle-multiple-execs.html
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-464/igt@xe_pm@s2idle-multiple-execs.html

  * igt@xe_pxp@pxp-termination-key-update-post-termination-irq:
    - shard-adlp:         NOTRUN -> [SKIP][82] ([Intel XE#4733] / [Intel XE#5594]) +1 other test skip
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@xe_pxp@pxp-termination-key-update-post-termination-irq.html

  
#### Possible fixes ####

  * igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs@pipe-d-dp-4:
    - shard-dg2-set2:     [INCOMPLETE][83] ([Intel XE#3862]) -> [PASS][84] +1 other test pass
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-435/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs@pipe-d-dp-4.html
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs@pipe-d-dp-4.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-hdmi-a-6:
    - shard-dg2-set2:     [INCOMPLETE][85] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#6168]) -> [PASS][86] +1 other test pass
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-436/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-hdmi-a-6.html
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-466/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-hdmi-a-6.html

  * igt@kms_cursor_crc@cursor-sliding-256x256:
    - shard-adlp:         [DMESG-WARN][87] ([Intel XE#2953] / [Intel XE#4173]) -> [PASS][88] +7 other tests pass
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-adlp-9/igt@kms_cursor_crc@cursor-sliding-256x256.html
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-2/igt@kms_cursor_crc@cursor-sliding-256x256.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
    - shard-bmg:          [SKIP][89] ([Intel XE#2291]) -> [PASS][90] +2 other tests pass
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-4/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@single-bo:
    - shard-bmg:          [DMESG-WARN][91] ([Intel XE#5354]) -> [PASS][92] +2 other tests pass
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-1/igt@kms_cursor_legacy@single-bo.html
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_cursor_legacy@single-bo.html

  * igt@kms_dp_linktrain_fallback@dp-fallback:
    - shard-bmg:          [SKIP][93] ([Intel XE#4294]) -> [PASS][94]
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_dp_linktrain_fallback@dp-fallback.html
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-4/igt@kms_dp_linktrain_fallback@dp-fallback.html

  * igt@kms_flip@2x-plain-flip-fb-recreate:
    - shard-bmg:          [SKIP][95] ([Intel XE#2316]) -> [PASS][96] +3 other tests pass
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_flip@2x-plain-flip-fb-recreate.html
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-7/igt@kms_flip@2x-plain-flip-fb-recreate.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-lnl:          [FAIL][97] ([Intel XE#301]) -> [PASS][98] +1 other test pass
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-lnl-2/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-adlp:         [DMESG-WARN][99] ([Intel XE#4543]) -> [PASS][100] +2 other tests pass
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-adlp-1/igt@kms_flip@flip-vs-suspend-interruptible.html
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-6/igt@kms_flip@flip-vs-suspend-interruptible.html
    - shard-dg2-set2:     [INCOMPLETE][101] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][102] +1 other test pass
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-464/igt@kms_flip@flip-vs-suspend-interruptible.html
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-434/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_flip@wf_vblank-ts-check-interruptible@c-hdmi-a1:
    - shard-adlp:         [FAIL][103] ([Intel XE#5408]) -> [PASS][104] +1 other test pass
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-adlp-6/igt@kms_flip@wf_vblank-ts-check-interruptible@c-hdmi-a1.html
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-9/igt@kms_flip@wf_vblank-ts-check-interruptible@c-hdmi-a1.html

  * igt@kms_hdr@static-toggle-suspend:
    - shard-bmg:          [SKIP][105] ([Intel XE#1503]) -> [PASS][106] +1 other test pass
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_hdr@static-toggle-suspend.html
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-7/igt@kms_hdr@static-toggle-suspend.html

  * igt@xe_create@create-execqueues-leak:
    - shard-adlp:         [FAIL][107] -> [PASS][108]
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-adlp-2/igt@xe_create@create-execqueues-leak.html
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-adlp-8/igt@xe_create@create-execqueues-leak.html

  
#### Warnings ####

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs:
    - shard-dg2-set2:     [INCOMPLETE][109] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#4345] / [Intel XE#6168]) -> [INCOMPLETE][110] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4345] / [Intel XE#4522]) +1 other test incomplete
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-466/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs.html
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-bmg:          [SKIP][111] ([Intel XE#2341]) -> [FAIL][112] ([Intel XE#1178])
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_content_protection@atomic-dpms.html
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@uevent:
    - shard-bmg:          [FAIL][113] ([Intel XE#1188]) -> [SKIP][114] ([Intel XE#2341])
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@kms_content_protection@uevent.html
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_content_protection@uevent.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt:
    - shard-bmg:          [SKIP][115] ([Intel XE#2311]) -> [SKIP][116] ([Intel XE#2312]) +11 other tests skip
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt.html
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-shrfb-msflip-blt:
    - shard-bmg:          [SKIP][117] ([Intel XE#2312]) -> [SKIP][118] ([Intel XE#2311]) +9 other tests skip
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-shrfb-msflip-blt.html
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-shrfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
    - shard-bmg:          [SKIP][119] ([Intel XE#2312]) -> [SKIP][120] ([Intel XE#5390]) +4 other tests skip
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][121] ([Intel XE#5390]) -> [SKIP][122] ([Intel XE#2312]) +4 other tests skip
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc.html
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-draw-blt:
    - shard-bmg:          [SKIP][123] ([Intel XE#2312]) -> [SKIP][124] ([Intel XE#2313]) +10 other tests skip
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-draw-blt.html
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-indfb-draw-render:
    - shard-bmg:          [SKIP][125] ([Intel XE#2313]) -> [SKIP][126] ([Intel XE#2312]) +11 other tests skip
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-indfb-draw-render.html
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-indfb-draw-render.html

  * igt@kms_hdr@brightness-with-hdr:
    - shard-bmg:          [SKIP][127] ([Intel XE#3374] / [Intel XE#3544]) -> [SKIP][128] ([Intel XE#3544])
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-3/igt@kms_hdr@brightness-with-hdr.html
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-1/igt@kms_hdr@brightness-with-hdr.html

  * igt@kms_tiled_display@basic-test-pattern:
    - shard-dg2-set2:     [SKIP][129] ([Intel XE#362]) -> [FAIL][130] ([Intel XE#1729])
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-463/igt@kms_tiled_display@basic-test-pattern.html
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-435/igt@kms_tiled_display@basic-test-pattern.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][131] ([Intel XE#2426]) -> [SKIP][132] ([Intel XE#2509])
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
    - shard-dg2-set2:     [SKIP][133] ([Intel XE#1500]) -> [SKIP][134] ([Intel XE#362])
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-dg2-435/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-dg2-463/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv:
    - shard-bmg:          [ABORT][135] ([Intel XE#4917] / [Intel XE#5466] / [Intel XE#5530]) -> [ABORT][136] ([Intel XE#5466] / [Intel XE#5530])
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f/shard-bmg-7/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/shard-bmg-6/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [Intel XE#1122]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1122
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1126]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1126
  [Intel XE#1158]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1158
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1188]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1188
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1475]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1475
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1500]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1500
  [Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
  [Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
  [Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2168]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2168
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
  [Intel XE#2705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2705
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#2925]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2925
  [Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#307]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/307
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#310]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/310
  [Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
  [Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
  [Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
  [Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
  [Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
  [Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
  [Intel XE#362]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/362
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#3862]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3862
  [Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
  [Intel XE#4212]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4212
  [Intel XE#4294]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4294
  [Intel XE#4345]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4345
  [Intel XE#4504]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4504
  [Intel XE#4522]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4522
  [Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
  [Intel XE#4596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4596
  [Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
  [Intel XE#4917]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4917
  [Intel XE#5020]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5020
  [Intel XE#5354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5354
  [Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
  [Intel XE#5408]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5408
  [Intel XE#5416]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5416
  [Intel XE#5466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5466
  [Intel XE#5530]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5530
  [Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
  [Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
  [Intel XE#5565]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5565
  [Intel XE#5575]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5575
  [Intel XE#5580]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5580
  [Intel XE#5594]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5594
  [Intel XE#5893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5893
  [Intel XE#6032]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6032
  [Intel XE#6168]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6168
  [Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
  [Intel XE#6360]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6360
  [Intel XE#6377]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6377
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#703]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/703
  [Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
  [Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
  [Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929


Build changes
-------------

  * Linux: xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f -> xe-pw-156144v1

  IGT_8592: b3d809d537febc23792ab8d0eb6d13cf80d626c8 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-3941-b5e5976a35cb0e8e45aea836b42ecccf22df803f: b5e5976a35cb0e8e45aea836b42ecccf22df803f
  xe-pw-156144v1: 156144v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156144v1/index.html

[-- Attachment #2: Type: text/html, Size: 48182 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
@ 2025-10-21 17:55   ` Summers, Stuart
  2025-10-21 20:36     ` Matthew Brost
  2025-10-22  8:00   ` Tvrtko Ursulin
  2025-10-23 12:28   ` Thomas Hellström
  2 siblings, 1 reply; 15+ messages in thread
From: Summers, Stuart @ 2025-10-21 17:55 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Brost,  Matthew
  Cc: Santa, Carlos, thomas.hellstrom@linux.intel.com

On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> When a burst of unbind jobs is issued, a dependency chain can form
> between the TLB invalidation of a previous unbind job and the current
> one. This leads to undesirable serialization, causing current jobs to
> wait unnecessarily for prior TLB invalidations, execute on the GPU when
> not needed, and significantly slow down the unbind burst, resulting in
> up to a 4× slowdown.
> 
> To break this chain, mask the last bind queue dependency if the last
> fence's DMA context matches the TLB invalidation context. This allows
> full pipelining of unbinds and TLB invalidations while preserving
> correct dma-fence signaling semantics.
> 
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
>  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
>  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
>  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
>  drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
>  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
>  8 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 0dc27476832b..6034cfc8be06 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>                 goto err_put_job;
>  
>         if (!xe_vm_in_lr_mode(vm)) {
> -               err = xe_sched_job_last_fence_add_dep(job, vm);
> +               err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> +                                                     NO_MASK_DEP);
>                 if (err)
>                         goto err_put_job;
>  
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 90cbc95f8e2e..d6f69d9bccba 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -25,6 +25,7 @@
>  #include "xe_migrate.h"
>  #include "xe_pm.h"
>  #include "xe_ring_ops_types.h"
> +#include "xe_sched_job.h"
>  #include "xe_trace.h"
>  #include "xe_vm.h"
>  #include "xe_pxp.h"
> @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct
> xe_exec_queue *q, struct xe_vm *vm,
>   * xe_exec_queue_last_fence_test_dep - Test last fence dependency of
> queue
>   * @q: The exec queue
>   * @vm: The VM the engine does a bind or exec for
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Test last fence dependency of queue, skipping masked dma fence
> contexts.
>   *
>   * Returns:
> - * -ETIME if there exists an unsignalled last fence dependency, zero
> otherwise.
> + * -ETIME if there exists an unsignalled and unmasked last fence
> dependency,
> + * zero otherwise.
>   */
> -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> struct xe_vm *vm)
> +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> struct xe_vm *vm,
> +                                     u64 mask_ctx0, u64 mask_ctx1)
>  {
>         struct dma_fence *fence;
>         int err = 0;
> @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct
> xe_exec_queue *q, struct xe_vm *vm)
>         if (fence) {
>                 err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence-
> >flags) ?
>                         0 : -ETIME;
> +
> +               if (err == -ETIME) {
> +                       if (xe_sched_job_mask_dependency(fence,
> mask_ctx0,
> +                                                        mask_ctx1))
> +                               err = 0;
> +               }
> +
>                 dma_fence_put(fence);
>         }
>  
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..99a35b22a46c 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -85,7 +85,8 @@ struct dma_fence
> *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
>  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct
> xe_vm *vm,
>                                   struct dma_fence *fence);
>  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> -                                     struct xe_vm *vm);
> +                                     struct xe_vm *vm, u64
> mask_ctx0,
> +                                     u64 mask_ctx1);
>  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
>  
>  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void
> *scratch);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index d22fd1ccc0ba..bba9ae559f57 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct
> xe_sched_job *job,
>         }
>  
>         if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> +               u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> +
> +               if (ijob)
> +                       mask_ctx0 =
> xe_tlb_inval_job_fence_context(ijob);
> +               if (mjob)
> +                       mask_ctx1 =
> xe_tlb_inval_job_fence_context(mjob);

Can we rename these ictx and mctx for consistency?

Also, do we really need both of these here? Shouldn't we always have
the primary GT inval (ictx) and so just need to check the one? My
understanding is the reason being that we might be adding either one of
these as the last fence so we need both checks. But in that case would
it be better to check against all dependencies or even just the last
two? Wouldn't that also help if multiple apps are trying to free at
once here so we have interleaved unbind dependencies?

> +
>                 if (job)
> -                       err = xe_sched_job_last_fence_add_dep(job,
> vm);
> +                       err = xe_sched_job_last_fence_add_dep(job,
> vm,
> +                                                            
> mask_ctx0,
> +                                                            
> mask_ctx1);
>                 else
> -                       err =
> xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> +                       err =
> xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> +                                                               vm,
> mask_ctx0,
> +                                                               mask_
> ctx1);
>         }
>  
>         for (i = 0; job && !err && i < vops->num_syncs; i++)
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> b/drivers/gpu/drm/xe/xe_sched_job.c
> index d21bf8f26964..7cbdd87904c6 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -6,6 +6,7 @@
>  #include "xe_sched_job.h"
>  
>  #include <uapi/drm/xe_drm.h>
> +#include <linux/dma-fence-array.h>
>  #include <linux/dma-fence-chain.h>
>  #include <linux/slab.h>
>  
> @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job
> *job)
>         xe_sched_job_put(job);
>  }
>  
> +/**
> + * xe_sched_job_mask_dependency() - Determine if a dma-fence
> dependency can be masked
> + * @fence: The dma-fence to check
> + * @mask_ctx0: First context to compare against the fence's context
> + * @mask_ctx1: Second context to compare against the fence's context
> + *
> + * This function checks whether the context of the given dma-fence
> matches
> + * either of the provided mask contexts. If a match is found, the
> dependency
> + * represented by the fence can be skipped. If the fence is a dma-
> fence-array,
> + * its individual fences are unwound and checked.
> + *
> + * Return: true if the fence can be masked (i.e., skipped), false
> otherwise.
> + */
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> mask_ctx0,
> +                                 u64 mask_ctx1)
> +{
> +       if (dma_fence_is_array(fence)) {
> +               struct dma_fence *__fence;
> +               int index;
> +
> +               dma_fence_array_for_each(__fence, index, fence)
> +                       if (__fence->context == mask_ctx0 ||
> +                           __fence->context == mask_ctx1)
> +                               return true;
> +       } else if (fence->context == mask_ctx0 ||
> +                  fence->context == mask_ctx1) {
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
>  /**
>   * xe_sched_job_last_fence_add_dep - Add last fence dependency to
> job
>   * @job:job to add the last fence dependency to
>   * @vm: virtual memory job belongs to
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Add last fence dependency to job, skipping masked dma fence
> contexts.
>   *
>   * Returns:
>   * 0 on success, or an error on failing to expand the array.
>   */
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> xe_vm *vm)
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> xe_vm *vm,
> +                                   u64 mask_ctx0, u64 mask_ctx1)
>  {
>         struct dma_fence *fence;
>  
>         fence = xe_exec_queue_last_fence_get(job->q, vm);
> +       if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> mask_ctx1)) {
> +               dma_fence_put(fence);
> +               return 0;
> +       }
>  
>         return drm_sched_job_add_dependency(&job->drm, fence);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h
> b/drivers/gpu/drm/xe/xe_sched_job.h
> index 3dc72c5c1f13..81d8e848e605 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job
> *job);
>  void xe_sched_job_arm(struct xe_sched_job *job);
>  void xe_sched_job_push(struct xe_sched_job *job);
>  
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> xe_vm *vm);
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> xe_vm *vm,
> +                                   u64 mask_ctx0, u64 mask_ctx1);
>  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
>                                   struct xe_sync_entry *sync);
>  
> @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct
> xe_sched_job_snapshot *snapshot, struct
>  int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv
> *resv,
>                           enum dma_resv_usage usage);
>  
> +#define NO_MASK_DEP    (~0x0ull)
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> mask_ctx0,
> +                                 u64 mask_ctx1);
> +
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> index 492def04a559..f2fe7f9fbb22 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
>         u64 start;
>         /** @end: End address to invalidate */
>         u64 end;
> +       /** @fence_context: Fence context for job */
> +       u64 fence_context;
>         /** @asid: Address space ID to invalidate */
>         u32 asid;
>         /** @fence_armed: Fence has been armed */
> @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q,
> struct xe_tlb_inval *tlb_inval,
>         job->asid = asid;
>         job->fence_armed = false;
>         job->dep.ops = &dep_job_ops;

This means the "finished" context per the entity definition right? Can
you either add a note here or change the job->fence_context name to
reflect that? Or otherwise why is this adding the +1 here?

Thanks,
Stuart

> +       job->fence_context = entity->fence_context + 1;
>         kref_init(&job->refcount);
>         xe_exec_queue_get(q);   /* Pairs with put in
> xe_tlb_inval_job_destroy */
>  
> @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct
> xe_tlb_inval_job *job)
>         if (!IS_ERR_OR_NULL(job))
>                 kref_put(&job->refcount, xe_tlb_inval_job_destroy);
>  }
> +
> +/**
> + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence
> context
> + * @job: TLB invalidation job object
> + *
> + * Return: TLB invalidation job fence context
> + */
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> +{
> +       return job->fence_context;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> index e63edcb26b50..2576165c2228 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job
> *job);
>  
>  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
>  
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> +
>  #endif


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-21 17:55   ` Summers, Stuart
@ 2025-10-21 20:36     ` Matthew Brost
  2025-10-21 20:43       ` Summers, Stuart
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Brost @ 2025-10-21 20:36 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, Santa,  Carlos,
	thomas.hellstrom@linux.intel.com

On Tue, Oct 21, 2025 at 11:55:56AM -0600, Summers, Stuart wrote:
> On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> > When a burst of unbind jobs is issued, a dependency chain can form
> > between the TLB invalidation of a previous unbind job and the current
> > one. This leads to undesirable serialization, causing current jobs to
> > wait unnecessarily for prior TLB invalidations, execute on the GPU
> > when
> > not needed, and significantly slow down the unbind burst—resulting in
> > up
> > to a 4× slowdown.
> > 
> > To break this chain, mask the last bind queue dependency if the last
> > fence's DMA context matches the TLB invalidation context. This allows
> > full pipelining of unbinds and TLB invalidations while preserving
> > correct dma-fence signaling semantics.
> > 
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> >  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> >  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> >  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> >  drivers/gpu/drm/xe/xe_sched_job.c     | 44
> > ++++++++++++++++++++++++++-
> >  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> >  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> >  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> >  8 files changed, 98 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > b/drivers/gpu/drm/xe/xe_exec.c
> > index 0dc27476832b..6034cfc8be06 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void
> > *data, struct drm_file *file)
> >                 goto err_put_job;
> >  
> >         if (!xe_vm_in_lr_mode(vm)) {
> > -               err = xe_sched_job_last_fence_add_dep(job, vm);
> > +               err = xe_sched_job_last_fence_add_dep(job, vm,
> > NO_MASK_DEP,
> > +                                                     NO_MASK_DEP);
> >                 if (err)
> >                         goto err_put_job;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index 90cbc95f8e2e..d6f69d9bccba 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -25,6 +25,7 @@
> >  #include "xe_migrate.h"
> >  #include "xe_pm.h"
> >  #include "xe_ring_ops_types.h"
> > +#include "xe_sched_job.h"
> >  #include "xe_trace.h"
> >  #include "xe_vm.h"
> >  #include "xe_pxp.h"
> > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct
> > xe_exec_queue *q, struct xe_vm *vm,
> >   * xe_exec_queue_last_fence_test_dep - Test last fence dependency of
> > queue
> >   * @q: The exec queue
> >   * @vm: The VM the engine does a bind or exec for
> > + * @mask_ctx0: Mask dma-fence context0
> > + * @mask_ctx1: Mask dma-fence context1
> > + *
> > + * Test last fence dependency of queue, skipping masked dma fence
> > contexts.
> >   *
> >   * Returns:
> > - * -ETIME if there exists an unsignalled last fence dependency, zero
> > otherwise.
> > + * -ETIME if there exists an unsignalled and unmasked last fence
> > dependency,
> > + * zero otherwise.
> >   */
> > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > struct xe_vm *vm)
> > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > struct xe_vm *vm,
> > +                                     u64 mask_ctx0, u64 mask_ctx1)
> >  {
> >         struct dma_fence *fence;
> >         int err = 0;
> > @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct
> > xe_exec_queue *q, struct xe_vm *vm)
> >         if (fence) {
> >                 err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence-
> > >flags) ?
> >                         0 : -ETIME;
> > +
> > +               if (err == -ETIME) {
> > +                       if (xe_sched_job_mask_dependency(fence,
> > mask_ctx0,
> > +                                                        mask_ctx1))
> > +                               err = 0;
> > +               }
> > +
> >                 dma_fence_put(fence);
> >         }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> > b/drivers/gpu/drm/xe/xe_exec_queue.h
> > index a4dfbe858bda..99a35b22a46c 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > @@ -85,7 +85,8 @@ struct dma_fence
> > *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> >  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct
> > xe_vm *vm,
> >                                   struct dma_fence *fence);
> >  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > -                                     struct xe_vm *vm);
> > +                                     struct xe_vm *vm, u64
> > mask_ctx0,
> > +                                     u64 mask_ctx1);
> >  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> >  
> >  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void
> > *scratch);
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index d22fd1ccc0ba..bba9ae559f57 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct
> > xe_sched_job *job,
> >         }
> >  
> >         if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> > +               u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> > +
> > +               if (ijob)
> > +                       mask_ctx0 =
> > xe_tlb_inval_job_fence_context(ijob);
> > +               if (mjob)
> > +                       mask_ctx1 =
> > xe_tlb_inval_job_fence_context(mjob);
> 
> Can we rename these ictx and mctx for consistency?
> 

Yes.

> Also, do we really need both of these here? Shouldn't we always have
> the primary GT inval (ictx) and so just need to check the one? My

With the code as written, we'd only need to check the primary GT, but
Matt R eventually wants the driver to be able to boot without the
primary GT. There is a bit of work to be done to get there, but I
didn't want to make it worse in this patch.

> understanding is the reason being that we might be adding either one of
> these as the last fence so we need both checks. But in that case would
> it be better to check against all dependencies or even just the last
> two? Wouldn't that also help if multiple apps are trying to free at
> once here so we have interleaved unbind dependencies?
> 

Depending on how the bind is set up, we may check further dependencies
in dma-resv - see all the other checks in this function. This covers
the last-queue-dependency case only, which is at least sufficient to
help with the ChromeOS case where this is triggered by a burst of user
unbinds.

We might still have an issue with a burst of SVM unbinds where this can
serialize, though; fixing that would likely need some DRM scheduler
changes fairly similar to this patch. I can think on that one in a
follow-up in a later series. Also, I should probably switch SVM unbinds
over to draining the entire garbage collector list and issuing a single
unbind job too.

> > +
> >                 if (job)
> > -                       err = xe_sched_job_last_fence_add_dep(job,
> > vm);
> > +                       err = xe_sched_job_last_fence_add_dep(job,
> > vm,
> > +                                                            
> > mask_ctx0,
> > +                                                            
> > mask_ctx1);
> >                 else
> > -                       err =
> > xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > +                       err =
> > xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > +                                                               vm,
> > mask_ctx0,
> > +                                                               mask_
> > ctx1);
> >         }
> >  
> >         for (i = 0; job && !err && i < vops->num_syncs; i++)
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > b/drivers/gpu/drm/xe/xe_sched_job.c
> > index d21bf8f26964..7cbdd87904c6 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -6,6 +6,7 @@
> >  #include "xe_sched_job.h"
> >  
> >  #include <uapi/drm/xe_drm.h>
> > +#include <linux/dma-fence-array.h>
> >  #include <linux/dma-fence-chain.h>
> >  #include <linux/slab.h>
> >  
> > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job
> > *job)
> >         xe_sched_job_put(job);
> >  }
> >  
> > +/**
> > + * xe_sched_job_mask_dependency() - Determine if a dma-fence
> > dependency can be masked
> > + * @fence: The dma-fence to check
> > + * @mask_ctx0: First context to compare against the fence's context
> > + * @mask_ctx1: Second context to compare against the fence's context
> > + *
> > + * This function checks whether the context of the given dma-fence
> > matches
> > + * either of the provided mask contexts. If a match is found, the
> > dependency
> > + * represented by the fence can be skipped. If the fence is a dma-
> > fence-array,
> > + * its individual fences are unwound and checked.
> > + *
> > + * Return: true if the fence can be masked (i.e., skipped), false
> > otherwise.
> > + */
> > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > mask_ctx0,
> > +                                 u64 mask_ctx1)
> > +{
> > +       if (dma_fence_is_array(fence)) {
> > +               struct dma_fence *__fence;
> > +               int index;
> > +
> > +               dma_fence_array_for_each(__fence, index, fence)
> > +                       if (__fence->context == mask_ctx0 ||
> > +                           __fence->context == mask_ctx1)
> > +                               return true;
> > +       } else if (fence->context == mask_ctx0 ||
> > +                  fence->context == mask_ctx1) {
> > +               return true;
> > +       }
> > +
> > +       return false;
> > +}
> > +
> >  /**
> >   * xe_sched_job_last_fence_add_dep - Add last fence dependency to
> > job
> >   * @job:job to add the last fence dependency to
> >   * @vm: virtual memory job belongs to
> > + * @mask_ctx0: Mask dma-fence context0
> > + * @mask_ctx1: Mask dma-fence context1
> > + *
> > + * Add last fence dependency to job, skipping masked dma fence
> > contexts.
> >   *
> >   * Returns:
> >   * 0 on success, or an error on failing to expand the array.
> >   */
> > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> > xe_vm *vm)
> > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> > xe_vm *vm,
> > +                                   u64 mask_ctx0, u64 mask_ctx1)
> >  {
> >         struct dma_fence *fence;
> >  
> >         fence = xe_exec_queue_last_fence_get(job->q, vm);
> > +       if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > mask_ctx1)) {
> > +               dma_fence_put(fence);
> > +               return 0;
> > +       }
> >  
> >         return drm_sched_job_add_dependency(&job->drm, fence);
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h
> > b/drivers/gpu/drm/xe/xe_sched_job.h
> > index 3dc72c5c1f13..81d8e848e605 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job
> > *job);
> >  void xe_sched_job_arm(struct xe_sched_job *job);
> >  void xe_sched_job_push(struct xe_sched_job *job);
> >  
> > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> > xe_vm *vm);
> > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct
> > xe_vm *vm,
> > +                                   u64 mask_ctx0, u64 mask_ctx1);
> >  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> >                                   struct xe_sync_entry *sync);
> >  
> > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct
> > xe_sched_job_snapshot *snapshot, struct
> >  int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv
> > *resv,
> >                           enum dma_resv_usage usage);
> >  
> > +#define NO_MASK_DEP    (~0x0ull)
> > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > mask_ctx0,
> > +                                 u64 mask_ctx1);
> > +
> >  #endif
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > index 492def04a559..f2fe7f9fbb22 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> >         u64 start;
> >         /** @end: End address to invalidate */
> >         u64 end;
> > +       /** @fence_context: Fence context for job */
> > +       u64 fence_context;
> >         /** @asid: Address space ID to invalidate */
> >         u32 asid;
> >         /** @fence_armed: Fence has been armed */
> > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q,
> > struct xe_tlb_inval *tlb_inval,
> >         job->asid = asid;
> >         job->fence_armed = false;
> >         job->dep.ops = &dep_job_ops;
> 
> This means the "finished" context per the entity definition right? Can
> you either add a note here or change the job->fence_context name to
> reflect that? Or otherwise why is this adding the +1 here?
> 

The scheduled context is entity->fence_context; the finished context is
entity->fence_context + 1 - this is in the DRM scheduler documentation.
I can add a comment here for now, and roll a better fix - a DRM
scheduler helper to fish out the finished context - into this series
[1]. DRM scheduler stuff moves slowly, so the latter may take a while,
and I didn't want to block a fix on that.

Matt

[1] https://patchwork.freedesktop.org/series/155314/

> Thanks,
> Stuart
> 
> > +       job->fence_context = entity->fence_context + 1;
> >         kref_init(&job->refcount);
> >         xe_exec_queue_get(q);   /* Pairs with put in
> > xe_tlb_inval_job_destroy */
> >  
> > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct
> > xe_tlb_inval_job *job)
> >         if (!IS_ERR_OR_NULL(job))
> >                 kref_put(&job->refcount, xe_tlb_inval_job_destroy);
> >  }
> > +
> > +/**
> > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence
> > context
> > + * @job: TLB invalidation job object
> > + *
> > + * Return: TLB invalidation job fence context
> > + */
> > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > +{
> > +       return job->fence_context;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > index e63edcb26b50..2576165c2228 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job
> > *job);
> >  
> >  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> >  
> > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> > +
> >  #endif
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-21 20:36     ` Matthew Brost
@ 2025-10-21 20:43       ` Summers, Stuart
  2025-10-21 20:50         ` Matthew Brost
  0 siblings, 1 reply; 15+ messages in thread
From: Summers, Stuart @ 2025-10-21 20:43 UTC (permalink / raw)
  To: Brost, Matthew
  Cc: intel-xe@lists.freedesktop.org, Santa,  Carlos,
	thomas.hellstrom@linux.intel.com

On Tue, 2025-10-21 at 13:36 -0700, Matthew Brost wrote:
> On Tue, Oct 21, 2025 at 11:55:56AM -0600, Summers, Stuart wrote:
> > On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> > > When a burst of unbind jobs is issued, a dependency chain can
> > > form
> > > between the TLB invalidation of a previous unbind job and the
> > > current
> > > one. This leads to undesirable serialization, causing current
> > > jobs to
> > > wait unnecessarily for prior TLB invalidations, execute on the
> > > GPU
> > > when
> > > not needed, and significantly slow down the unbind
> > > burst—resulting in
> > > up
> > > to a 4× slowdown.
> > > 
> > > To break this chain, mask the last bind queue dependency if the
> > > last
> > > fence's DMA context matches the TLB invalidation context. This
> > > allows
> > > full pipelining of unbinds and TLB invalidations while preserving
> > > correct dma-fence signaling semantics.
> > > 
> > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> > >  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> > >  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> > >  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> > >  drivers/gpu/drm/xe/xe_sched_job.c     | 44
> > > ++++++++++++++++++++++++++-
> > >  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> > >  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> > >  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> > >  8 files changed, 98 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > > b/drivers/gpu/drm/xe/xe_exec.c
> > > index 0dc27476832b..6034cfc8be06 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev,
> > > void
> > > *data, struct drm_file *file)
> > >                 goto err_put_job;
> > >  
> > >         if (!xe_vm_in_lr_mode(vm)) {
> > > -               err = xe_sched_job_last_fence_add_dep(job, vm);
> > > +               err = xe_sched_job_last_fence_add_dep(job, vm,
> > > NO_MASK_DEP,
> > > +                                                    
> > > NO_MASK_DEP);
> > >                 if (err)
> > >                         goto err_put_job;
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index 90cbc95f8e2e..d6f69d9bccba 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -25,6 +25,7 @@
> > >  #include "xe_migrate.h"
> > >  #include "xe_pm.h"
> > >  #include "xe_ring_ops_types.h"
> > > +#include "xe_sched_job.h"
> > >  #include "xe_trace.h"
> > >  #include "xe_vm.h"
> > >  #include "xe_pxp.h"
> > > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct
> > > xe_exec_queue *q, struct xe_vm *vm,
> > >   * xe_exec_queue_last_fence_test_dep - Test last fence
> > > dependency of
> > > queue
> > >   * @q: The exec queue
> > >   * @vm: The VM the engine does a bind or exec for
> > > + * @mask_ctx0: Mask dma-fence context0
> > > + * @mask_ctx1: Mask dma-fence context1
> > > + *
> > > + * Test last fence dependency of queue, skipping masked dma
> > > fence
> > > contexts.
> > >   *
> > >   * Returns:
> > > - * -ETIME if there exists an unsignalled last fence dependency,
> > > zero
> > > otherwise.
> > > + * -ETIME if there exists an unsignalled and unmasked last fence
> > > dependency,
> > > + * zero otherwise.
> > >   */
> > > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > struct xe_vm *vm)
> > > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > struct xe_vm *vm,
> > > +                                     u64 mask_ctx0, u64
> > > mask_ctx1)
> > >  {
> > >         struct dma_fence *fence;
> > >         int err = 0;
> > > @@ -1119,6 +1126,13 @@ int
> > > xe_exec_queue_last_fence_test_dep(struct
> > > xe_exec_queue *q, struct xe_vm *vm)
> > >         if (fence) {
> > >                 err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > &fence-
> > > > flags) ?
> > >                         0 : -ETIME;
> > > +
> > > +               if (err == -ETIME) {
> > > +                       if (xe_sched_job_mask_dependency(fence,
> > > mask_ctx0,
> > > +                                                       
> > > mask_ctx1))
> > > +                               err = 0;
> > > +               }
> > > +
> > >                 dma_fence_put(fence);
> > >         }
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > index a4dfbe858bda..99a35b22a46c 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > @@ -85,7 +85,8 @@ struct dma_fence
> > > *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> > >  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e,
> > > struct
> > > xe_vm *vm,
> > >                                   struct dma_fence *fence);
> > >  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > -                                     struct xe_vm *vm);
> > > +                                     struct xe_vm *vm, u64
> > > mask_ctx0,
> > > +                                     u64 mask_ctx1);
> > >  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > >  
> > >  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q,
> > > void
> > > *scratch);
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > b/drivers/gpu/drm/xe/xe_pt.c
> > > index d22fd1ccc0ba..bba9ae559f57 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct
> > > xe_sched_job *job,
> > >         }
> > >  
> > >         if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL))
> > > {
> > > +               u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 =
> > > NO_MASK_DEP;
> > > +
> > > +               if (ijob)
> > > +                       mask_ctx0 =
> > > xe_tlb_inval_job_fence_context(ijob);
> > > +               if (mjob)
> > > +                       mask_ctx1 =
> > > xe_tlb_inval_job_fence_context(mjob);
> > 
> > Can we rename these ictx and mctx for consistency?
> > 
> 
> Yes.
> 
> > Also, do we really need both of these here? Shouldn't we always
> > have
> > the primary GT inval (ictx) and so just need to check the one? My
> 
> As the code is written, we'd only need to check the primary GT, but
> Matt R eventually wants the driver to be able to boot without the
> primary GT. There is a bit of work to be done there, but I didn't want
> to make it worse in this patch.

Yeah, makes sense; I was just thinking of how to make this a little
simpler. I don't really like that we're looking at both contexts here
when we really just need one of them. But we're also doing this in
other parts of the driver (like the primary/media GT TLB invals this is
based on), so maybe it's no problem. It would just be good to at least
note this; otherwise it isn't super clear at a glance why we need two
contexts here.

I think at least having those name changes (ictx and mctx) would help
here.

> 
> > understanding is the reason being that we might be adding either
> > one of
> > these as the last fence so we need both checks. But in that case
> > would
> > it be better to check against all dependencies or even just the
> > last
> > two? Wouldn't that also help if multiple apps are trying to free at
> > once here so we have interleaved unbind dependencies?
> > 
> 
> Depending on how the bind is set up, we may check further dependencies
> in dma-resv - see all the other checks in this function. This covers
> the case of the last queue dependency only, which is at least
> sufficient to help with the ChromeOS case, where this is triggered by
> a burst of user unbinds.
> 
> We might still have an issue with a burst of SVM unbinds where this
> can serialize, though; that would likely need some DRM scheduler
> changes fairly similar to this patch. I can maybe think on that one in
> a follow-up in a later series. I probably should also switch SVM
> unbinds over to drain the entire garbage collector list and issue a
> single unbind job.

Makes sense to me, but I'm also fine having that in a follow-up patch.

> 
> > > +
> > >                 if (job)
> > > -                       err =
> > > xe_sched_job_last_fence_add_dep(job,
> > > vm);
> > > +                       err =
> > > xe_sched_job_last_fence_add_dep(job,
> > > vm,
> > > +                                                            
> > > mask_ctx0,
> > > +                                                            
> > > mask_ctx1);
> > >                 else
> > > -                       err =
> > > xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > > +                       err =
> > > xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > > +                                                               v
> > > m,
> > > mask_ctx0,
> > > +                                                               m
> > > ask_
> > > ctx1);
> > >         }
> > >  
> > >         for (i = 0; job && !err && i < vops->num_syncs; i++)
> > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > > b/drivers/gpu/drm/xe/xe_sched_job.c
> > > index d21bf8f26964..7cbdd87904c6 100644
> > > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > > @@ -6,6 +6,7 @@
> > >  #include "xe_sched_job.h"
> > >  
> > >  #include <uapi/drm/xe_drm.h>
> > > +#include <linux/dma-fence-array.h>
> > >  #include <linux/dma-fence-chain.h>
> > >  #include <linux/slab.h>
> > >  
> > > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job
> > > *job)
> > >         xe_sched_job_put(job);
> > >  }
> > >  
> > > +/**
> > > + * xe_sched_job_mask_dependency() - Determine if a dma-fence
> > > dependency can be masked
> > > + * @fence: The dma-fence to check
> > > + * @mask_ctx0: First context to compare against the fence's
> > > context
> > > + * @mask_ctx1: Second context to compare against the fence's
> > > context
> > > + *
> > > + * This function checks whether the context of the given dma-
> > > fence
> > > matches
> > > + * either of the provided mask contexts. If a match is found,
> > > the
> > > dependency
> > > + * represented by the fence can be skipped. If the fence is a
> > > dma-
> > > fence-array,
> > > + * its individual fences are unwound and checked.
> > > + *
> > > + * Return: true if the fence can be masked (i.e., skipped),
> > > false
> > > otherwise.
> > > + */
> > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > > mask_ctx0,
> > > +                                 u64 mask_ctx1)
> > > +{
> > > +       if (dma_fence_is_array(fence)) {
> > > +               struct dma_fence *__fence;
> > > +               int index;
> > > +
> > > +               dma_fence_array_for_each(__fence, index, fence)
> > > +                       if (__fence->context == mask_ctx0 ||
> > > +                           __fence->context == mask_ctx1)
> > > +                               return true;
> > > +       } else if (fence->context == mask_ctx0 ||
> > > +                  fence->context == mask_ctx1) {
> > > +               return true;
> > > +       }
> > > +
> > > +       return false;
> > > +}
> > > +
> > >  /**
> > >   * xe_sched_job_last_fence_add_dep - Add last fence dependency
> > > to
> > > job
> > >   * @job:job to add the last fence dependency to
> > >   * @vm: virtual memory job belongs to
> > > + * @mask_ctx0: Mask dma-fence context0
> > > + * @mask_ctx1: Mask dma-fence context1
> > > + *
> > > + * Add last fence dependency to job, skipping masked dma fence
> > > contexts.
> > >   *
> > >   * Returns:
> > >   * 0 on success, or an error on failing to expand the array.
> > >   */
> > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > struct
> > > xe_vm *vm)
> > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > struct
> > > xe_vm *vm,
> > > +                                   u64 mask_ctx0, u64 mask_ctx1)
> > >  {
> > >         struct dma_fence *fence;
> > >  
> > >         fence = xe_exec_queue_last_fence_get(job->q, vm);
> > > +       if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > > mask_ctx1)) {
> > > +               dma_fence_put(fence);
> > > +               return 0;
> > > +       }
> > >  
> > >         return drm_sched_job_add_dependency(&job->drm, fence);
> > >  }
> > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h
> > > b/drivers/gpu/drm/xe/xe_sched_job.h
> > > index 3dc72c5c1f13..81d8e848e605 100644
> > > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job
> > > *job);
> > >  void xe_sched_job_arm(struct xe_sched_job *job);
> > >  void xe_sched_job_push(struct xe_sched_job *job);
> > >  
> > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > struct
> > > xe_vm *vm);
> > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > struct
> > > xe_vm *vm,
> > > +                                   u64 mask_ctx0, u64
> > > mask_ctx1);
> > >  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> > >                                   struct xe_sync_entry *sync);
> > >  
> > > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct
> > > xe_sched_job_snapshot *snapshot, struct
> > >  int xe_sched_job_add_deps(struct xe_sched_job *job, struct
> > > dma_resv
> > > *resv,
> > >                           enum dma_resv_usage usage);
> > >  
> > > +#define NO_MASK_DEP    (~0x0ull)
> > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > > mask_ctx0,
> > > +                                 u64 mask_ctx1);
> > > +
> > >  #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > index 492def04a559..f2fe7f9fbb22 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> > >         u64 start;
> > >         /** @end: End address to invalidate */
> > >         u64 end;
> > > +       /** @fence_context: Fence context for job */
> > > +       u64 fence_context;
> > >         /** @asid: Address space ID to invalidate */
> > >         u32 asid;
> > >         /** @fence_armed: Fence has been armed */
> > > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue
> > > *q,
> > > struct xe_tlb_inval *tlb_inval,
> > >         job->asid = asid;
> > >         job->fence_armed = false;
> > >         job->dep.ops = &dep_job_ops;
> > 
> > This means the "finished" context per the entity definition right?
> > Can
> > you either add a note here or change the job->fence_context name to
> > reflect that? Or otherwise why is this adding the +1 here?
> > 
> 
> The scheduled context is entity->fence_context; the finished context
> is entity->fence_context + 1 - this is in the DRM scheduler docs. I
> can add a comment for this now, and roll a better fix (a DRM scheduler
> helper to fish out the finished context) into this series [1]. DRM
> scheduler stuff moves slowly, so the latter may take a minute, and I
> didn't want to block a fix on that.

Can you add some quick documentation there to that effect? It's just
nice not to have to go back and forth to the entity documentation;
right now this is only implied.

Also thanks for the link to that other series, I'll check that out too.

Thanks,
Stuart

> 
> Matt
> 
> [1] https://patchwork.freedesktop.org/series/155314/
> 
> > Thanks,
> > Stuart
> > 
> > > +       job->fence_context = entity->fence_context + 1;
> > >         kref_init(&job->refcount);
> > >         xe_exec_queue_get(q);   /* Pairs with put in
> > > xe_tlb_inval_job_destroy */
> > >  
> > > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct
> > > xe_tlb_inval_job *job)
> > >         if (!IS_ERR_OR_NULL(job))
> > >                 kref_put(&job->refcount,
> > > xe_tlb_inval_job_destroy);
> > >  }
> > > +
> > > +/**
> > > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence
> > > context
> > > + * @job: TLB invalidation job object
> > > + *
> > > + * Return: TLB invalidation job fence context
> > > + */
> > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > > +{
> > > +       return job->fence_context;
> > > +}
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > index e63edcb26b50..2576165c2228 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct
> > > xe_tlb_inval_job
> > > *job);
> > >  
> > >  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > >  
> > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job
> > > *job);
> > > +
> > >  #endif
> > 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-21 20:43       ` Summers, Stuart
@ 2025-10-21 20:50         ` Matthew Brost
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2025-10-21 20:50 UTC (permalink / raw)
  To: Summers, Stuart
  Cc: intel-xe@lists.freedesktop.org, Santa,  Carlos,
	thomas.hellstrom@linux.intel.com

On Tue, Oct 21, 2025 at 02:43:07PM -0600, Summers, Stuart wrote:
> On Tue, 2025-10-21 at 13:36 -0700, Matthew Brost wrote:
> > On Tue, Oct 21, 2025 at 11:55:56AM -0600, Summers, Stuart wrote:
> > > On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> > > > When a burst of unbind jobs is issued, a dependency chain can
> > > > form
> > > > between the TLB invalidation of a previous unbind job and the
> > > > current
> > > > one. This leads to undesirable serialization, causing current
> > > > jobs to
> > > > wait unnecessarily for prior TLB invalidations, execute on the
> > > > GPU
> > > > when
> > > > not needed, and significantly slow down the unbind
> > > > burst—resulting in
> > > > up
> > > > to a 4× slowdown.
> > > > 
> > > > To break this chain, mask the last bind queue dependency if the
> > > > last
> > > > fence's DMA context matches the TLB invalidation context. This
> > > > allows
> > > > full pipelining of unbinds and TLB invalidations while preserving
> > > > correct dma-fence signaling semantics.
> > > > 
> > > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> > > >  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> > > >  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> > > >  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> > > >  drivers/gpu/drm/xe/xe_sched_job.c     | 44
> > > > ++++++++++++++++++++++++++-
> > > >  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> > > >  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> > > >  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> > > >  8 files changed, 98 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c
> > > > b/drivers/gpu/drm/xe/xe_exec.c
> > > > index 0dc27476832b..6034cfc8be06 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev,
> > > > void
> > > > *data, struct drm_file *file)
> > > >                 goto err_put_job;
> > > >  
> > > >         if (!xe_vm_in_lr_mode(vm)) {
> > > > -               err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > +               err = xe_sched_job_last_fence_add_dep(job, vm,
> > > > NO_MASK_DEP,
> > > > +                                                    
> > > > NO_MASK_DEP);
> > > >                 if (err)
> > > >                         goto err_put_job;
> > > >  
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index 90cbc95f8e2e..d6f69d9bccba 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include "xe_migrate.h"
> > > >  #include "xe_pm.h"
> > > >  #include "xe_ring_ops_types.h"
> > > > +#include "xe_sched_job.h"
> > > >  #include "xe_trace.h"
> > > >  #include "xe_vm.h"
> > > >  #include "xe_pxp.h"
> > > > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct
> > > > xe_exec_queue *q, struct xe_vm *vm,
> > > >   * xe_exec_queue_last_fence_test_dep - Test last fence
> > > > dependency of
> > > > queue
> > > >   * @q: The exec queue
> > > >   * @vm: The VM the engine does a bind or exec for
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Test last fence dependency of queue, skipping masked dma
> > > > fence
> > > > contexts.
> > > >   *
> > > >   * Returns:
> > > > - * -ETIME if there exists an unsignalled last fence dependency,
> > > > zero
> > > > otherwise.
> > > > + * -ETIME if there exists an unsignalled and unmasked last fence
> > > > dependency,
> > > > + * zero otherwise.
> > > >   */
> > > > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > struct xe_vm *vm)
> > > > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > struct xe_vm *vm,
> > > > +                                     u64 mask_ctx0, u64
> > > > mask_ctx1)
> > > >  {
> > > >         struct dma_fence *fence;
> > > >         int err = 0;
> > > > @@ -1119,6 +1126,13 @@ int
> > > > xe_exec_queue_last_fence_test_dep(struct
> > > > xe_exec_queue *q, struct xe_vm *vm)
> > > >         if (fence) {
> > > >                 err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > &fence-
> > > > > flags) ?
> > > >                         0 : -ETIME;
> > > > +
> > > > +               if (err == -ETIME) {
> > > > +                       if (xe_sched_job_mask_dependency(fence,
> > > > mask_ctx0,
> > > > +                                                       
> > > > mask_ctx1))
> > > > +                               err = 0;
> > > > +               }
> > > > +
> > > >                 dma_fence_put(fence);
> > > >         }
> > > >  
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > index a4dfbe858bda..99a35b22a46c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > @@ -85,7 +85,8 @@ struct dma_fence
> > > > *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> > > >  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e,
> > > > struct
> > > > xe_vm *vm,
> > > >                                   struct dma_fence *fence);
> > > >  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > -                                     struct xe_vm *vm);
> > > > +                                     struct xe_vm *vm, u64
> > > > mask_ctx0,
> > > > +                                     u64 mask_ctx1);
> > > >  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > > >  
> > > >  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q,
> > > > void
> > > > *scratch);
> > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c
> > > > b/drivers/gpu/drm/xe/xe_pt.c
> > > > index d22fd1ccc0ba..bba9ae559f57 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct
> > > > xe_sched_job *job,
> > > >         }
> > > >  
> > > >         if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL))
> > > > {
> > > > +               u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 =
> > > > NO_MASK_DEP;
> > > > +
> > > > +               if (ijob)
> > > > +                       mask_ctx0 =
> > > > xe_tlb_inval_job_fence_context(ijob);
> > > > +               if (mjob)
> > > > +                       mask_ctx1 =
> > > > xe_tlb_inval_job_fence_context(mjob);
> > > 
> > > Can we rename these ictx and mctx for consistency?
> > > 
> > 
> > Yes.
> > 
> > > Also, do we really need both of these here? Shouldn't we always
> > > have
> > > the primary GT inval (ictx) and so just need to check the one? My
> > 
> > As the code is written, we'd only need to check the primary GT, but
> > Matt R eventually wants the driver to be able to boot without the
> > primary GT. There is a bit of work to be done there, but I didn't
> > want to make it worse in this patch.
> 
> Yeah, makes sense; I was just thinking of how to make this a little
> simpler. I don't really like that we're looking at both contexts here
> when we really just need one of them. But we're also doing this in
> other parts of the driver (like the primary/media GT TLB invals this
> is based on), so maybe it's no problem. It would just be good to at
> least note this; otherwise it isn't super clear at a glance why we
> need two contexts here.
> 
> I think at least having those name changes (ictx and mctx) would help
> here.
> 

Yes, will add a comment too.

> > 
> > > understanding is the reason being that we might be adding either
> > > one of
> > > these as the last fence so we need both checks. But in that case
> > > would
> > > it be better to check against all dependencies or even just the
> > > last
> > > two? Wouldn't that also help if multiple apps are trying to free at
> > > once here so we have interleaved unbind dependencies?
> > > 
> > 
> > Depending on how the bind is set up, we may check further
> > dependencies in dma-resv - see all the other checks in this
> > function. This covers the case of the last queue dependency only,
> > which is at least sufficient to help with the ChromeOS case, where
> > this is triggered by a burst of user unbinds.
> > 
> > We might still have an issue with a burst of SVM unbinds where this
> > can serialize, though; that would likely need some DRM scheduler
> > changes fairly similar to this patch. I can maybe think on that one
> > in a follow-up in a later series. I probably should also switch SVM
> > unbinds over to drain the entire garbage collector list and issue a
> > single unbind job.
> 
> Makes sense to me, but I'm also fine having that in a follow-up patch.
> 
> > 
> > > > +
> > > >                 if (job)
> > > > -                       err =
> > > > xe_sched_job_last_fence_add_dep(job,
> > > > vm);
> > > > +                       err =
> > > > xe_sched_job_last_fence_add_dep(job,
> > > > vm,
> > > > +                                                            
> > > > mask_ctx0,
> > > > +                                                            
> > > > mask_ctx1);
> > > >                 else
> > > > -                       err =
> > > > xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > > > +                       err =
> > > > xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > > > +                                                               v
> > > > m,
> > > > mask_ctx0,
> > > > +                                                               m
> > > > ask_
> > > > ctx1);
> > > >         }
> > > >  
> > > >         for (i = 0; job && !err && i < vops->num_syncs; i++)
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> > > > b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > index d21bf8f26964..7cbdd87904c6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > @@ -6,6 +6,7 @@
> > > >  #include "xe_sched_job.h"
> > > >  
> > > >  #include <uapi/drm/xe_drm.h>
> > > > +#include <linux/dma-fence-array.h>
> > > >  #include <linux/dma-fence-chain.h>
> > > >  #include <linux/slab.h>
> > > >  
> > > > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job
> > > > *job)
> > > >         xe_sched_job_put(job);
> > > >  }
> > > >  
> > > > +/**
> > > > + * xe_sched_job_mask_dependency() - Determine if a dma-fence
> > > > dependency can be masked
> > > > + * @fence: The dma-fence to check
> > > > + * @mask_ctx0: First context to compare against the fence's
> > > > context
> > > > + * @mask_ctx1: Second context to compare against the fence's
> > > > context
> > > > + *
> > > > + * This function checks whether the context of the given dma-
> > > > fence
> > > > matches
> > > > + * either of the provided mask contexts. If a match is found,
> > > > the
> > > > dependency
> > > > + * represented by the fence can be skipped. If the fence is a
> > > > dma-
> > > > fence-array,
> > > > + * its individual fences are unwound and checked.
> > > > + *
> > > > + * Return: true if the fence can be masked (i.e., skipped),
> > > > false
> > > > otherwise.
> > > > + */
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > > > mask_ctx0,
> > > > +                                 u64 mask_ctx1)
> > > > +{
> > > > +       if (dma_fence_is_array(fence)) {
> > > > +               struct dma_fence *__fence;
> > > > +               int index;
> > > > +
> > > > +               dma_fence_array_for_each(__fence, index, fence)
> > > > +                       if (__fence->context == mask_ctx0 ||
> > > > +                           __fence->context == mask_ctx1)
> > > > +                               return true;
> > > > +       } else if (fence->context == mask_ctx0 ||
> > > > +                  fence->context == mask_ctx1) {
> > > > +               return true;
> > > > +       }
> > > > +
> > > > +       return false;
> > > > +}
> > > > +
> > > >  /**
> > > >   * xe_sched_job_last_fence_add_dep - Add last fence dependency
> > > > to
> > > > job
> > > >   * @job:job to add the last fence dependency to
> > > >   * @vm: virtual memory job belongs to
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Add last fence dependency to job, skipping masked dma fence
> > > > contexts.
> > > >   *
> > > >   * Returns:
> > > >   * 0 on success, or an error on failing to expand the array.
> > > >   */
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > > struct
> > > > xe_vm *vm)
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > > struct
> > > > xe_vm *vm,
> > > > +                                   u64 mask_ctx0, u64 mask_ctx1)
> > > >  {
> > > >         struct dma_fence *fence;
> > > >  
> > > >         fence = xe_exec_queue_last_fence_get(job->q, vm);
> > > > +       if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > > > mask_ctx1)) {
> > > > +               dma_fence_put(fence);
> > > > +               return 0;
> > > > +       }
> > > >  
> > > >         return drm_sched_job_add_dependency(&job->drm, fence);
> > > >  }
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h
> > > > b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > index 3dc72c5c1f13..81d8e848e605 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job
> > > > *job);
> > > >  void xe_sched_job_arm(struct xe_sched_job *job);
> > > >  void xe_sched_job_push(struct xe_sched_job *job);
> > > >  
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > > struct
> > > > xe_vm *vm);
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job,
> > > > struct
> > > > xe_vm *vm,
> > > > +                                   u64 mask_ctx0, u64
> > > > mask_ctx1);
> > > >  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> > > >                                   struct xe_sync_entry *sync);
> > > >  
> > > > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct
> > > > xe_sched_job_snapshot *snapshot, struct
> > > >  int xe_sched_job_add_deps(struct xe_sched_job *job, struct
> > > > dma_resv
> > > > *resv,
> > > >                           enum dma_resv_usage usage);
> > > >  
> > > > +#define NO_MASK_DEP    (~0x0ull)
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> > > > mask_ctx0,
> > > > +                                 u64 mask_ctx1);
> > > > +
> > > >  #endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > index 492def04a559..f2fe7f9fbb22 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> > > >         u64 start;
> > > >         /** @end: End address to invalidate */
> > > >         u64 end;
> > > > +       /** @fence_context: Fence context for job */
> > > > +       u64 fence_context;
> > > >         /** @asid: Address space ID to invalidate */
> > > >         u32 asid;
> > > >         /** @fence_armed: Fence has been armed */
> > > > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue
> > > > *q,
> > > > struct xe_tlb_inval *tlb_inval,
> > > >         job->asid = asid;
> > > >         job->fence_armed = false;
> > > >         job->dep.ops = &dep_job_ops;
> > > 
> > > This means the "finished" context per the entity definition right?
> > > Can
> > > you either add a note here or change the job->fence_context name to
> > > reflect that? Or otherwise why is this adding the +1 here?
> > > 
> > 
> > The scheduled context is entity->fence_context; the finished
> > context is entity->fence_context + 1 - this is in the DRM scheduler
> > docs. I can add a comment for this now, and roll a better fix (a DRM
> > scheduler helper to fish out the finished context) into this series
> > [1]. DRM scheduler stuff moves slowly, so the latter may take a
> > minute, and I didn't want to block a fix on that.
> 
> Can you add some quick documentation there to that effect? It's just
> nice not to have to go back and forth to the entity documentation;
> right now this is only implied.
> 

Will add a comment.

Matt

> Also thanks for the link to that other series, I'll check that out too.
> 
> Thanks,
> Stuart
> 
> > 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/series/155314/
> > 
> > > Thanks,
> > > Stuart
> > > 
> > > > +       job->fence_context = entity->fence_context + 1;
> > > >         kref_init(&job->refcount);
> > > >         xe_exec_queue_get(q);   /* Pairs with put in
> > > > xe_tlb_inval_job_destroy */
> > > >  
> > > > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct
> > > > xe_tlb_inval_job *job)
> > > >         if (!IS_ERR_OR_NULL(job))
> > > >                 kref_put(&job->refcount,
> > > > xe_tlb_inval_job_destroy);
> > > >  }
> > > > +
> > > > +/**
> > > > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence
> > > > context
> > > > + * @job: TLB invalidation job object
> > > > + *
> > > > + * Return: TLB invalidation job fence context
> > > > + */
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > > > +{
> > > > +       return job->fence_context;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > index e63edcb26b50..2576165c2228 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct
> > > > xe_tlb_inval_job
> > > > *job);
> > > >  
> > > >  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > > >  
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job
> > > > *job);
> > > > +
> > > >  #endif
> > > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
  2025-10-21 17:55   ` Summers, Stuart
@ 2025-10-22  8:00   ` Tvrtko Ursulin
  2025-10-22 15:10     ` Matthew Brost
  2025-10-23 12:28   ` Thomas Hellström
  2 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2025-10-22  8:00 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: carlos.santa, thomas.hellstrom, Philipp Stanner


On 17/10/2025 17:52, Matthew Brost wrote:
> When a burst of unbind jobs is issued, a dependency chain can form
> between the TLB invalidation of a previous unbind job and the current
> one. This leads to undesirable serialization, causing current jobs to
> wait unnecessarily for prior TLB invalidations, execute on the GPU when
> not needed, and significantly slow down the unbind burst—resulting in up
> to a 4× slowdown.
> 
> To break this chain, mask the last bind queue dependency if the last
> fence's DMA context matches the TLB invalidation context. This allows
> full pipelining of unbinds and TLB invalidations while preserving
> correct dma-fence signaling semantics.
> 
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_exec.c          |  3 +-
>   drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
>   drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
>   drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
>   drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
>   drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
>   drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
>   8 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 0dc27476832b..6034cfc8be06 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   		goto err_put_job;
>   
>   	if (!xe_vm_in_lr_mode(vm)) {
> -		err = xe_sched_job_last_fence_add_dep(job, vm);
> +		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> +						      NO_MASK_DEP);
>   		if (err)
>   			goto err_put_job;
>   
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 90cbc95f8e2e..d6f69d9bccba 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -25,6 +25,7 @@
>   #include "xe_migrate.h"
>   #include "xe_pm.h"
>   #include "xe_ring_ops_types.h"
> +#include "xe_sched_job.h"
>   #include "xe_trace.h"
>   #include "xe_vm.h"
>   #include "xe_pxp.h"
> @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
>    * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
>    * @q: The exec queue
>    * @vm: The VM the engine does a bind or exec for
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Test last fence dependency of queue, skipping masked dma fence contexts.
>    *
>    * Returns:
> - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> + * zero otherwise.
>    */
> -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> +				      u64 mask_ctx0, u64 mask_ctx1)
>   {
>   	struct dma_fence *fence;
>   	int err = 0;
> @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
>   	if (fence) {
>   		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
>   			0 : -ETIME;
> +
> +		if (err == -ETIME) {
> +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> +							 mask_ctx1))
> +				err = 0;
> +		}
> +
>   		dma_fence_put(fence);
>   	}
>   
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..99a35b22a46c 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
>   void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
>   				  struct dma_fence *fence);
>   int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> -				      struct xe_vm *vm);
> +				      struct xe_vm *vm, u64 mask_ctx0,
> +				      u64 mask_ctx1);
>   void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
>   
>   int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index d22fd1ccc0ba..bba9ae559f57 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
>   	}
>   
>   	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> +
> +		if (ijob)
> +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> +		if (mjob)
> +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> +
>   		if (job)
> -			err = xe_sched_job_last_fence_add_dep(job, vm);
> +			err = xe_sched_job_last_fence_add_dep(job, vm,
> +							      mask_ctx0,
> +							      mask_ctx1);
>   		else
> -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> +								vm, mask_ctx0,
> +								mask_ctx1);
>   	}
>   
>   	for (i = 0; job && !err && i < vops->num_syncs; i++)
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index d21bf8f26964..7cbdd87904c6 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -6,6 +6,7 @@
>   #include "xe_sched_job.h"
>   
>   #include <uapi/drm/xe_drm.h>
> +#include <linux/dma-fence-array.h>
>   #include <linux/dma-fence-chain.h>
>   #include <linux/slab.h>
>   
> @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
>   	xe_sched_job_put(job);
>   }
>   
> +/**
> + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> + * @fence: The dma-fence to check
> + * @mask_ctx0: First context to compare against the fence's context
> + * @mask_ctx1: Second context to compare against the fence's context
> + *
> + * This function checks whether the context of the given dma-fence matches
> + * either of the provided mask contexts. If a match is found, the dependency
> + * represented by the fence can be skipped. If the fence is a dma-fence-array,
> + * its individual fences are unwound and checked.
> + *
> + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
> + */
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> +				  u64 mask_ctx1)
> +{
> +	if (dma_fence_is_array(fence)) {
> +		struct dma_fence *__fence;
> +		int index;
> +
> +		dma_fence_array_for_each(__fence, index, fence)
> +			if (__fence->context == mask_ctx0 ||
> +			    __fence->context == mask_ctx1)
> +				return true;
> +	} else if (fence->context == mask_ctx0 ||
> +		   fence->context == mask_ctx1) {
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>   /**
>    * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
>    * @job:job to add the last fence dependency to
>    * @vm: virtual memory job belongs to
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Add last fence dependency to job, skipping masked dma fence contexts.
>    *
>    * Returns:
>    * 0 on success, or an error on failing to expand the array.
>    */
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> +				    u64 mask_ctx0, u64 mask_ctx1)
>   {
>   	struct dma_fence *fence;
>   
>   	fence = xe_exec_queue_last_fence_get(job->q, vm);
> +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> +		dma_fence_put(fence);
> +		return 0;
> +	}
>   
>   	return drm_sched_job_add_dependency(&job->drm, fence);
>   }
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> index 3dc72c5c1f13..81d8e848e605 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
>   void xe_sched_job_arm(struct xe_sched_job *job);
>   void xe_sched_job_push(struct xe_sched_job *job);
>   
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> +				    u64 mask_ctx0, u64 mask_ctx1);
>   void xe_sched_job_init_user_fence(struct xe_sched_job *job,
>   				  struct xe_sync_entry *sync);
>   
> @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
>   int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
>   			  enum dma_resv_usage usage);
>   
> +#define NO_MASK_DEP	(~0x0ull)
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> +				  u64 mask_ctx1);
> +
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> index 492def04a559..f2fe7f9fbb22 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
>   	u64 start;
>   	/** @end: End address to invalidate */
>   	u64 end;
> +	/** @fence_context: Fence context for job */
> +	u64 fence_context;
>   	/** @asid: Address space ID to invalidate */
>   	u32 asid;
>   	/** @fence_armed: Fence has been armed */
> @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
>   	job->asid = asid;
>   	job->fence_armed = false;
>   	job->dep.ops = &dep_job_ops;
> +	job->fence_context = entity->fence_context + 1;

As a side note, hardcoding the assumption on how the scheduler allocates 
contexts is not great given recent efforts to make drivers know less 
about the scheduler internals.

But what I really wanted to ask is, having only glanced at the patch 
briefly, could the xe performance problem here also be solved by 
unwrapping the container fences at the DRM scheduler dependency tracking 
level?

I am asking because amdgpu recently posted a patch to unwrap in their 
code for potentially similar performance reasons, and if xe now wants 
something similar, or even the same, it is an interesting question where 
to do it.

Also, I have a patch (not sure if I posted it so far) which unwraps in 
drm_sched_job_add_dependency() and converts the dependency xarray to an 
unwrapped dma-fence-array. The initial idea there was to wake the 
scheduler worker only once, after all deps have signaled, but now that 
two drivers seem to be unwrapping fences, maybe there is a case to be 
made for doing it in the core.

Regards,

Tvrtko

>   	kref_init(&job->refcount);
>   	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
>   
> @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
>   	if (!IS_ERR_OR_NULL(job))
>   		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
>   }
> +
> +/**
> + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> + * @job: TLB invalidation job object
> + *
> + * Return: TLB invalidation job fence context
> + */
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> +{
> +	return job->fence_context;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> index e63edcb26b50..2576165c2228 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
>   
>   void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
>   
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> +
>   #endif


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-22  8:00   ` Tvrtko Ursulin
@ 2025-10-22 15:10     ` Matthew Brost
  2025-10-23 12:46       ` Tvrtko Ursulin
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Brost @ 2025-10-22 15:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, carlos.santa, thomas.hellstrom, Philipp Stanner

On Wed, Oct 22, 2025 at 09:00:47AM +0100, Tvrtko Ursulin wrote:
> 
> On 17/10/2025 17:52, Matthew Brost wrote:
> > When a burst of unbind jobs is issued, a dependency chain can form
> > between the TLB invalidation of a previous unbind job and the current
> > one. This leads to undesirable serialization, causing current jobs to
> > wait unnecessarily for prior TLB invalidations, execute on the GPU when
> > not needed, and significantly slow down the unbind burst—resulting in up
> > to a 4× slowdown.
> > 
> > To break this chain, mask the last bind queue dependency if the last
> > fence's DMA context matches the TLB invalidation context. This allows
> > full pipelining of unbinds and TLB invalidations while preserving
> > correct dma-fence signaling semantics.
> > 
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> >   drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> >   drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> >   drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> >   drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
> >   drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> >   drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> >   drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> >   8 files changed, 98 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index 0dc27476832b..6034cfc8be06 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		goto err_put_job;
> >   	if (!xe_vm_in_lr_mode(vm)) {
> > -		err = xe_sched_job_last_fence_add_dep(job, vm);
> > +		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> > +						      NO_MASK_DEP);
> >   		if (err)
> >   			goto err_put_job;
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index 90cbc95f8e2e..d6f69d9bccba 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -25,6 +25,7 @@
> >   #include "xe_migrate.h"
> >   #include "xe_pm.h"
> >   #include "xe_ring_ops_types.h"
> > +#include "xe_sched_job.h"
> >   #include "xe_trace.h"
> >   #include "xe_vm.h"
> >   #include "xe_pxp.h"
> > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
> >    * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
> >    * @q: The exec queue
> >    * @vm: The VM the engine does a bind or exec for
> > + * @mask_ctx0: Mask dma-fence context0
> > + * @mask_ctx1: Mask dma-fence context1
> > + *
> > + * Test last fence dependency of queue, skipping masked dma fence contexts.
> >    *
> >    * Returns:
> > - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> > + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> > + * zero otherwise.
> >    */
> > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> > +				      u64 mask_ctx0, u64 mask_ctx1)
> >   {
> >   	struct dma_fence *fence;
> >   	int err = 0;
> > @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> >   	if (fence) {
> >   		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
> >   			0 : -ETIME;
> > +
> > +		if (err == -ETIME) {
> > +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > +							 mask_ctx1))
> > +				err = 0;
> > +		}
> > +
> >   		dma_fence_put(fence);
> >   	}
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> > index a4dfbe858bda..99a35b22a46c 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> >   void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
> >   				  struct dma_fence *fence);
> >   int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > -				      struct xe_vm *vm);
> > +				      struct xe_vm *vm, u64 mask_ctx0,
> > +				      u64 mask_ctx1);
> >   void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> >   int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index d22fd1ccc0ba..bba9ae559f57 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> >   	}
> >   	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> > +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> > +
> > +		if (ijob)
> > +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> > +		if (mjob)
> > +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> > +
> >   		if (job)
> > -			err = xe_sched_job_last_fence_add_dep(job, vm);
> > +			err = xe_sched_job_last_fence_add_dep(job, vm,
> > +							      mask_ctx0,
> > +							      mask_ctx1);
> >   		else
> > -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > +								vm, mask_ctx0,
> > +								mask_ctx1);
> >   	}
> >   	for (i = 0; job && !err && i < vops->num_syncs; i++)
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > index d21bf8f26964..7cbdd87904c6 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -6,6 +6,7 @@
> >   #include "xe_sched_job.h"
> >   #include <uapi/drm/xe_drm.h>
> > +#include <linux/dma-fence-array.h>
> >   #include <linux/dma-fence-chain.h>
> >   #include <linux/slab.h>
> > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
> >   	xe_sched_job_put(job);
> >   }
> > +/**
> > + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> > + * @fence: The dma-fence to check
> > + * @mask_ctx0: First context to compare against the fence's context
> > + * @mask_ctx1: Second context to compare against the fence's context
> > + *
> > + * This function checks whether the context of the given dma-fence matches
> > + * either of the provided mask contexts. If a match is found, the dependency
> > + * represented by the fence can be skipped. If the fence is a dma-fence-array,
> > + * its individual fences are unwound and checked.
> > + *
> > + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
> > + */
> > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > +				  u64 mask_ctx1)
> > +{
> > +	if (dma_fence_is_array(fence)) {
> > +		struct dma_fence *__fence;
> > +		int index;
> > +
> > +		dma_fence_array_for_each(__fence, index, fence)
> > +			if (__fence->context == mask_ctx0 ||
> > +			    __fence->context == mask_ctx1)
> > +				return true;
> > +	} else if (fence->context == mask_ctx0 ||
> > +		   fence->context == mask_ctx1) {
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> >   /**
> >    * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
> >    * @job:job to add the last fence dependency to
> >    * @vm: virtual memory job belongs to
> > + * @mask_ctx0: Mask dma-fence context0
> > + * @mask_ctx1: Mask dma-fence context1
> > + *
> > + * Add last fence dependency to job, skipping masked dma fence contexts.
> >    *
> >    * Returns:
> >    * 0 on success, or an error on failing to expand the array.
> >    */
> > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > +				    u64 mask_ctx0, u64 mask_ctx1)
> >   {
> >   	struct dma_fence *fence;
> >   	fence = xe_exec_queue_last_fence_get(job->q, vm);
> > +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> > +		dma_fence_put(fence);
> > +		return 0;
> > +	}
> >   	return drm_sched_job_add_dependency(&job->drm, fence);
> >   }
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> > index 3dc72c5c1f13..81d8e848e605 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
> >   void xe_sched_job_arm(struct xe_sched_job *job);
> >   void xe_sched_job_push(struct xe_sched_job *job);
> > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > +				    u64 mask_ctx0, u64 mask_ctx1);
> >   void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> >   				  struct xe_sync_entry *sync);
> > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
> >   int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
> >   			  enum dma_resv_usage usage);
> > +#define NO_MASK_DEP	(~0x0ull)
> > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > +				  u64 mask_ctx1);
> > +
> >   #endif
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > index 492def04a559..f2fe7f9fbb22 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> >   	u64 start;
> >   	/** @end: End address to invalidate */
> >   	u64 end;
> > +	/** @fence_context: Fence context for job */
> > +	u64 fence_context;
> >   	/** @asid: Address space ID to invalidate */
> >   	u32 asid;
> >   	/** @fence_armed: Fence has been armed */
> > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
> >   	job->asid = asid;
> >   	job->fence_armed = false;
> >   	job->dep.ops = &dep_job_ops;
> > +	job->fence_context = entity->fence_context + 1;
> 
> As a side note, hardcoding the assumption on how the scheduler allocates
> contexts is not great given recent efforts to make drivers know less about
> the scheduler internals.
> 

Yes, we should probably have a helper here — maybe
drm_sched_job_finished_context?

I was planning to roll this change into [1], but that series hasn’t
gained much traction, and fixing this is a fairly high-priority issue
for customers.

This is documented in the DRM scheduler kernel docs:
entity->fence_context + 1 is the job's finished context.

[1] https://patchwork.freedesktop.org/series/155314/

> But what I really wanted to ask is, having only glanced at the patch briefly,
> could the xe performance problem here also be solved by unwrapping the
> container fences at the DRM scheduler dependency tracking level?
> 

This is primarily about preventing TLB fences — which originate from a
different context than the bind queue but are still ordered on the queue
— from becoming dependencies. The process involves two passes: in the
first pass, we detect dependencies. If none are found, we immediately
complete the bind via the CPU. If dependencies are present, we defer the
bind to the GPU.

> I am asking because amdgpu recently posted a patch to unwrap in their code
> for potentially similar performance reasons, and if xe now wants something
> similar, or even the same, it is an interesting question where to do it.
> 
> Also, I have a patch (not sure if I posted it so far) which unwraps in
> drm_sched_job_add_dependency() and converts the dependency xarray to an
> unwrapped dma-fence-array. The initial idea there was to wake the scheduler
> worker only once, after all deps have signaled, but now that two drivers
> seem to be unwrapping fences, maybe there is a case to be made for doing it
> in the core.
> 

I don't think this is the same problem as the one above, but it's an
interesting idea in general. CC me if you post this one.

Matt

> Regards,
> 
> Tvrtko
> 
> >   	kref_init(&job->refcount);
> >   	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
> > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
> >   	if (!IS_ERR_OR_NULL(job))
> >   		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
> >   }
> > +
> > +/**
> > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> > + * @job: TLB invalidation job object
> > + *
> > + * Return: TLB invalidation job fence context
> > + */
> > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > +{
> > +	return job->fence_context;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > index e63edcb26b50..2576165c2228 100644
> > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
> >   void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> > +
> >   #endif
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
  2025-10-21 17:55   ` Summers, Stuart
  2025-10-22  8:00   ` Tvrtko Ursulin
@ 2025-10-23 12:28   ` Thomas Hellström
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström @ 2025-10-23 12:28 UTC (permalink / raw)
  To: Matthew Brost, intel-xe; +Cc: carlos.santa

Hi Matt

On Fri, 2025-10-17 at 09:52 -0700, Matthew Brost wrote:
> When a burst of unbind jobs is issued, a dependency chain can form
> between the TLB invalidation of a previous unbind job and the current
> one. This leads to undesirable serialization, causing current jobs to
> wait unnecessarily for prior TLB invalidations, execute on the GPU when
> not needed, and significantly slow down the unbind burst—resulting in up
> to a 4× slowdown.
> 
> To break this chain, mask the last bind queue dependency if the last
> fence's DMA context matches the TLB invalidation context. This allows
> full pipelining of unbinds and TLB invalidations while preserving
> correct dma-fence signaling semantics.
> 
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Some comments below.

Apart from that, it looks like this stems from us always combining the
exec_queue fence and TLB fence into an out_fence and then using that as
the exec_queue_last_fence.

But IMO the exec_queue last fence should always be a single gpu_job
fence (with the exception of no-bind jobs), so that we store the gpu_job
fence as the last fence and *then* combine it with any TLB fence as the
out fence.


> ---
>  drivers/gpu/drm/xe/xe_exec.c          |  3 +-
>  drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
>  drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
>  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
>  drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
>  drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
>  drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
>  8 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> index 0dc27476832b..6034cfc8be06 100644
> --- a/drivers/gpu/drm/xe/xe_exec.c
> +++ b/drivers/gpu/drm/xe/xe_exec.c
> @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  		goto err_put_job;
>  
>  	if (!xe_vm_in_lr_mode(vm)) {
> -		err = xe_sched_job_last_fence_add_dep(job, vm);
> +		err = xe_sched_job_last_fence_add_dep(job, vm,
> NO_MASK_DEP,
> +						      NO_MASK_DEP);
>  		if (err)
>  			goto err_put_job;
>  
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 90cbc95f8e2e..d6f69d9bccba 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -25,6 +25,7 @@
>  #include "xe_migrate.h"
>  #include "xe_pm.h"
>  #include "xe_ring_ops_types.h"
> +#include "xe_sched_job.h"
>  #include "xe_trace.h"
>  #include "xe_vm.h"
>  #include "xe_pxp.h"
> @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
>   * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
>   * @q: The exec queue
>   * @vm: The VM the engine does a bind or exec for
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Test last fence dependency of queue, skipping masked dma fence contexts.
>   *
>   * Returns:
> - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> + * zero otherwise.
>   */
> -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> +				      u64 mask_ctx0, u64 mask_ctx1)
>  {
>  	struct dma_fence *fence;
>  	int err = 0;
> @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
>  	if (fence) {
>  		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
>  			0 : -ETIME;
> +
> +		if (err == -ETIME) {
> +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> +							 mask_ctx1))
> +				err = 0;
> +		}
> +
>  		dma_fence_put(fence);
>  	}
>  
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..99a35b22a46c 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
>  void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
>  				  struct dma_fence *fence);
>  int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> -				      struct xe_vm *vm);
> +				      struct xe_vm *vm, u64 mask_ctx0,
> +				      u64 mask_ctx1);
>  void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
>  
>  int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index d22fd1ccc0ba..bba9ae559f57 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
>  	}
>  
>  	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> +
> +		if (ijob)
> +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> +		if (mjob)
> +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> +
>  		if (job)
> -			err = xe_sched_job_last_fence_add_dep(job, vm);
> +			err = xe_sched_job_last_fence_add_dep(job, vm,
> +							      mask_ctx0,
> +							      mask_ctx1);
>  		else
> -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> +								vm, mask_ctx0,
> +								mask_ctx1);
>  	}
>  
>  	for (i = 0; job && !err && i < vops->num_syncs; i++)
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> index d21bf8f26964..7cbdd87904c6 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -6,6 +6,7 @@
>  #include "xe_sched_job.h"
>  
>  #include <uapi/drm/xe_drm.h>
> +#include <linux/dma-fence-array.h>
>  #include <linux/dma-fence-chain.h>
>  #include <linux/slab.h>
>  
> @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
>  	xe_sched_job_put(job);
>  }
>  
> +/**
> + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> + * @fence: The dma-fence to check
> + * @mask_ctx0: First context to compare against the fence's context
> + * @mask_ctx1: Second context to compare against the fence's context
> + *
> + * This function checks whether the context of the given dma-fence
> matches
> + * either of the provided mask contexts. If a match is found, the
> dependency
> + * represented by the fence can be skipped. If the fence is a dma-
> fence-array,
> + * its individual fences are unwound and checked.
> + *
> + * Return: true if the fence can be masked (i.e., skipped), false
> otherwise.
> + */
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64
> mask_ctx0,
> +				  u64 mask_ctx1)
> +{
> +	if (dma_fence_is_array(fence)) {
> +		struct dma_fence *__fence;
> +		int index;
> +
> +		dma_fence_array_for_each(__fence, index, fence)
> +			if (__fence->context == mask_ctx0 ||
> +			    __fence->context == mask_ctx1)
> +				return true;

What if there are other fences in the array that don't match the
contexts? Don't we lose them?

On a different side-note, when looking at the code it looks like the
last fence could in theory be an array_fence[tiles] of
array_fence[gts]. I don't think we have such HW yet, but IIRC dma_fence
container rules do not allow that.

Thanks,
Thomas



> +	} else if (fence->context == mask_ctx0 ||
> +		   fence->context == mask_ctx1) {
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
>   * @job:job to add the last fence dependency to
>   * @vm: virtual memory job belongs to
> + * @mask_ctx0: Mask dma-fence context0
> + * @mask_ctx1: Mask dma-fence context1
> + *
> + * Add last fence dependency to job, skipping masked dma fence contexts.
>   *
>   * Returns:
>   * 0 on success, or an error on failing to expand the array.
>   */
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> +				    u64 mask_ctx0, u64 mask_ctx1)
>  {
>  	struct dma_fence *fence;
>  
>  	fence = xe_exec_queue_last_fence_get(job->q, vm);
> +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> +		dma_fence_put(fence);
> +		return 0;
> +	}
>  
>  	return drm_sched_job_add_dependency(&job->drm, fence);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> index 3dc72c5c1f13..81d8e848e605 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
>  void xe_sched_job_arm(struct xe_sched_job *job);
>  void xe_sched_job_push(struct xe_sched_job *job);
>  
> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> +				    u64 mask_ctx0, u64 mask_ctx1);
>  void xe_sched_job_init_user_fence(struct xe_sched_job *job,
>  				  struct xe_sync_entry *sync);
>  
> @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
>  int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
>  			  enum dma_resv_usage usage);
>  
> +#define NO_MASK_DEP	(~0x0ull)
> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> +				  u64 mask_ctx1);
> +
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> index 492def04a559..f2fe7f9fbb22 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
>  	u64 start;
>  	/** @end: End address to invalidate */
>  	u64 end;
> +	/** @fence_context: Fence context for job */
> +	u64 fence_context;
>  	/** @asid: Address space ID to invalidate */
>  	u32 asid;
>  	/** @fence_armed: Fence has been armed */
> @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
>  	job->asid = asid;
>  	job->fence_armed = false;
>  	job->dep.ops = &dep_job_ops;
> +	job->fence_context = entity->fence_context + 1;
>  	kref_init(&job->refcount);
>  	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
>  
> @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
>  	if (!IS_ERR_OR_NULL(job))
>  		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
>  }
> +
> +/**
> + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> + * @job: TLB invalidation job object
> + *
> + * Return: TLB invalidation job fence context
> + */
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> +{
> +	return job->fence_context;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> index e63edcb26b50..2576165c2228 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
>  
>  void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
>  
> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> +
>  #endif


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-22 15:10     ` Matthew Brost
@ 2025-10-23 12:46       ` Tvrtko Ursulin
  2025-10-23 18:55         ` Matthew Brost
  0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2025-10-23 12:46 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe, carlos.santa, thomas.hellstrom, Philipp Stanner


On 22/10/2025 16:10, Matthew Brost wrote:
> On Wed, Oct 22, 2025 at 09:00:47AM +0100, Tvrtko Ursulin wrote:
>>
>> On 17/10/2025 17:52, Matthew Brost wrote:
>>> When a burst of unbind jobs is issued, a dependency chain can form
>>> between the TLB invalidation of a previous unbind job and the current
>>> one. This leads to undesirable serialization, causing current jobs to
>>> wait unnecessarily for prior TLB invalidations, execute on the GPU when
>>> not needed, and significantly slow down the unbind burst—resulting in up
>>> to a 4× slowdown.
>>>
>>> To break this chain, mask the last bind queue dependency if the last
>>> fence's DMA context matches the TLB invalidation context. This allows
>>> full pipelining of unbinds and TLB invalidations while preserving
>>> correct dma-fence signaling semantics.
>>>
>>> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_exec.c          |  3 +-
>>>    drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
>>>    drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
>>>    drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
>>>    drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
>>>    drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
>>>    drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
>>>    drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
>>>    8 files changed, 98 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
>>> index 0dc27476832b..6034cfc8be06 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec.c
>>> @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>    		goto err_put_job;
>>>    	if (!xe_vm_in_lr_mode(vm)) {
>>> -		err = xe_sched_job_last_fence_add_dep(job, vm);
>>> +		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
>>> +						      NO_MASK_DEP);
>>>    		if (err)
>>>    			goto err_put_job;
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> index 90cbc95f8e2e..d6f69d9bccba 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>> @@ -25,6 +25,7 @@
>>>    #include "xe_migrate.h"
>>>    #include "xe_pm.h"
>>>    #include "xe_ring_ops_types.h"
>>> +#include "xe_sched_job.h"
>>>    #include "xe_trace.h"
>>>    #include "xe_vm.h"
>>>    #include "xe_pxp.h"
>>> @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
>>>     * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
>>>     * @q: The exec queue
>>>     * @vm: The VM the engine does a bind or exec for
>>> + * @mask_ctx0: Mask dma-fence context0
>>> + * @mask_ctx1: Mask dma-fence context1
>>> + *
>>> + * Test last fence dependency of queue, skipping masked dma fence contexts.
>>>     *
>>>     * Returns:
>>> - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
>>> + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
>>> + * zero otherwise.
>>>     */
>>> -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
>>> +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
>>> +				      u64 mask_ctx0, u64 mask_ctx1)
>>>    {
>>>    	struct dma_fence *fence;
>>>    	int err = 0;
>>> @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
>>>    	if (fence) {
>>>    		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
>>>    			0 : -ETIME;
>>> +
>>> +		if (err == -ETIME) {
>>> +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
>>> +							 mask_ctx1))
>>> +				err = 0;
>>> +		}
>>> +
>>>    		dma_fence_put(fence);
>>>    	}
>>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>>> index a4dfbe858bda..99a35b22a46c 100644
>>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>>> @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
>>>    void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
>>>    				  struct dma_fence *fence);
>>>    int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
>>> -				      struct xe_vm *vm);
>>> +				      struct xe_vm *vm, u64 mask_ctx0,
>>> +				      u64 mask_ctx1);
>>>    void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
>>>    int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
>>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>>> index d22fd1ccc0ba..bba9ae559f57 100644
>>> --- a/drivers/gpu/drm/xe/xe_pt.c
>>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>>> @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
>>>    	}
>>>    	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
>>> +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
>>> +
>>> +		if (ijob)
>>> +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
>>> +		if (mjob)
>>> +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
>>> +
>>>    		if (job)
>>> -			err = xe_sched_job_last_fence_add_dep(job, vm);
>>> +			err = xe_sched_job_last_fence_add_dep(job, vm,
>>> +							      mask_ctx0,
>>> +							      mask_ctx1);
>>>    		else
>>> -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
>>> +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
>>> +								vm, mask_ctx0,
>>> +								mask_ctx1);
>>>    	}
>>>    	for (i = 0; job && !err && i < vops->num_syncs; i++)
>>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>>> index d21bf8f26964..7cbdd87904c6 100644
>>> --- a/drivers/gpu/drm/xe/xe_sched_job.c
>>> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
>>> @@ -6,6 +6,7 @@
>>>    #include "xe_sched_job.h"
>>>    #include <uapi/drm/xe_drm.h>
>>> +#include <linux/dma-fence-array.h>
>>>    #include <linux/dma-fence-chain.h>
>>>    #include <linux/slab.h>
>>> @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
>>>    	xe_sched_job_put(job);
>>>    }
>>> +/**
>>> + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
>>> + * @fence: The dma-fence to check
>>> + * @mask_ctx0: First context to compare against the fence's context
>>> + * @mask_ctx1: Second context to compare against the fence's context
>>> + *
>>> + * This function checks whether the context of the given dma-fence matches
>>> + * either of the provided mask contexts. If a match is found, the dependency
>>> + * represented by the fence can be skipped. If the fence is a dma-fence-array,
>>> + * its individual fences are unwound and checked.
>>> + *
>>> + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
>>> + */
>>> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
>>> +				  u64 mask_ctx1)
>>> +{
>>> +	if (dma_fence_is_array(fence)) {
>>> +		struct dma_fence *__fence;
>>> +		int index;
>>> +
>>> +		dma_fence_array_for_each(__fence, index, fence)
>>> +			if (__fence->context == mask_ctx0 ||
>>> +			    __fence->context == mask_ctx1)
>>> +				return true;
>>> +	} else if (fence->context == mask_ctx0 ||
>>> +		   fence->context == mask_ctx1) {
>>> +		return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>>    /**
>>>     * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
>>>     * @job:job to add the last fence dependency to
>>>     * @vm: virtual memory job belongs to
>>> + * @mask_ctx0: Mask dma-fence context0
>>> + * @mask_ctx1: Mask dma-fence context1
>>> + *
>>> + * Add last fence dependency to job, skipping masked dma fence contexts.
>>>     *
>>>     * Returns:
>>>     * 0 on success, or an error on failing to expand the array.
>>>     */
>>> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
>>> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
>>> +				    u64 mask_ctx0, u64 mask_ctx1)
>>>    {
>>>    	struct dma_fence *fence;
>>>    	fence = xe_exec_queue_last_fence_get(job->q, vm);
>>> +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
>>> +		dma_fence_put(fence);
>>> +		return 0;
>>> +	}
>>>    	return drm_sched_job_add_dependency(&job->drm, fence);
>>>    }
>>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
>>> index 3dc72c5c1f13..81d8e848e605 100644
>>> --- a/drivers/gpu/drm/xe/xe_sched_job.h
>>> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
>>> @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
>>>    void xe_sched_job_arm(struct xe_sched_job *job);
>>>    void xe_sched_job_push(struct xe_sched_job *job);
>>> -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
>>> +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
>>> +				    u64 mask_ctx0, u64 mask_ctx1);
>>>    void xe_sched_job_init_user_fence(struct xe_sched_job *job,
>>>    				  struct xe_sync_entry *sync);
>>> @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
>>>    int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
>>>    			  enum dma_resv_usage usage);
>>> +#define NO_MASK_DEP	(~0x0ull)
>>> +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
>>> +				  u64 mask_ctx1);
>>> +
>>>    #endif
>>> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
>>> index 492def04a559..f2fe7f9fbb22 100644
>>> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
>>> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
>>> @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
>>>    	u64 start;
>>>    	/** @end: End address to invalidate */
>>>    	u64 end;
>>> +	/** @fence_context: Fence context for job */
>>> +	u64 fence_context;
>>>    	/** @asid: Address space ID to invalidate */
>>>    	u32 asid;
>>>    	/** @fence_armed: Fence has been armed */
>>> @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
>>>    	job->asid = asid;
>>>    	job->fence_armed = false;
>>>    	job->dep.ops = &dep_job_ops;
>>> +	job->fence_context = entity->fence_context + 1;
>>
>> As a side note, hardcoding the assumption on how scheduler allocates
>> contexts is not great given recent efforts to make drivers know less of the
>> scheduler internals.
>>
> 
> Yes, we should probably have a helper here — maybe
> drm_sched_job_finished_context?
> 
> I was planning to roll this change into [1], but that series hasn’t
> gained much traction, and fixing this is a fairly high-priority issue
> for customers.
> 
> This is documented in the DRM scheduler kernel docs:
> entity->fence_context + 1 is the job's finished context.
> 
> [1] https://patchwork.freedesktop.org/series/155314/
> 
>> But what I really wanted to ask is, having only glanced the patch briefly,
>> could xe performance problem here also be solved by unwrapping the container
>> fences at the DRM scheduler dependency tracking level?
>>
> 
> This is primarily about preventing TLB fences — which originate from a
> different context than the bind queue but are still ordered on the queue
> — from becoming dependencies. The process involves two passes: in the
> first pass, we detect dependencies. If none are found, we immediately
> complete the bind via the CPU. If dependencies are present, we defer the
> bind to the GPU.

Interesting, I saw fence unwrapping and context number checking and 
thought it was maybe the same problem. I do not understand what xe is 
doing well enough to comment strongly, but it does raise the question of 
whether there could be a more elegant solution (i.e. not a hack).

Could the two entities be shared and would that solve the problem? I 
mean the TLB invalidation and the bind queue entities, do they need to 
be separate if the assumption and guarantee is to execute in order?

>> I am asking because amdgpu recently posted a patch to unwrap in their code
>> for potentially similar performance reasons, and if now xe wants something
>> similar, or even the same, it is an interesting question where to do it.
>>
>> Also, I have a patch (not sure if I posted it so far) which unwraps in
>> drm_sched_job_add_dependency() and converts the dependency xarray to
>> unwrapped dma-fence-array. Initial idea there was to allow scheduler worker
>> to only be woken up once, once all deps are signaled, but now if two drivers
>> seems to be unwrapping fences maybe there is a case to be made for doing it
>> in the core.
>>
> 
> I don't think this is the same problem as the one above, but it's an
> interesting idea in general. CC me if you post this one.

Okay, but since it sounds like it would not help here, it will not be a 
priority to clean it up and send, so it might be a while.

Regards,

Tvrtko

> 
> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>>    	kref_init(&job->refcount);
>>>    	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
>>> @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
>>>    	if (!IS_ERR_OR_NULL(job))
>>>    		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
>>>    }
>>> +
>>> +/**
>>> + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
>>> + * @job: TLB invalidation job object
>>> + *
>>> + * Return: TLB invalidation job fence context
>>> + */
>>> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
>>> +{
>>> +	return job->fence_context;
>>> +}
>>> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
>>> index e63edcb26b50..2576165c2228 100644
>>> --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
>>> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
>>> @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
>>>    void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
>>> +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
>>> +
>>>    #endif
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-23 12:46       ` Tvrtko Ursulin
@ 2025-10-23 18:55         ` Matthew Brost
  2025-10-23 19:27           ` Matthew Brost
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Brost @ 2025-10-23 18:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, carlos.santa, thomas.hellstrom, Philipp Stanner

On Thu, Oct 23, 2025 at 01:46:26PM +0100, Tvrtko Ursulin wrote:
> 
> On 22/10/2025 16:10, Matthew Brost wrote:
> > On Wed, Oct 22, 2025 at 09:00:47AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 17/10/2025 17:52, Matthew Brost wrote:
> > > > When a burst of unbind jobs is issued, a dependency chain can form
> > > > between the TLB invalidation of a previous unbind job and the current
> > > > one. This leads to undesirable serialization, causing current jobs to
> > > > wait unnecessarily for prior TLB invalidations, execute on the GPU when
> > > > not needed, and significantly slow down the unbind burst—resulting in up
> > > > to a 4× slowdown.
> > > > 
> > > > To break this chain, mask the last bind queue dependency if the last
> > > > fence's DMA context matches the TLB invalidation context. This allows
> > > > full pipelining of unbinds and TLB invalidations while preserving
> > > > correct dma-fence signaling semantics.
> > > > 
> > > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> > > >    drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> > > >    drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> > > >    drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> > > >    drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
> > > >    drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> > > >    drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> > > >    drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> > > >    8 files changed, 98 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > > index 0dc27476832b..6034cfc8be06 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > >    		goto err_put_job;
> > > >    	if (!xe_vm_in_lr_mode(vm)) {
> > > > -		err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > +		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> > > > +						      NO_MASK_DEP);
> > > >    		if (err)
> > > >    			goto err_put_job;
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > index 90cbc95f8e2e..d6f69d9bccba 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > @@ -25,6 +25,7 @@
> > > >    #include "xe_migrate.h"
> > > >    #include "xe_pm.h"
> > > >    #include "xe_ring_ops_types.h"
> > > > +#include "xe_sched_job.h"
> > > >    #include "xe_trace.h"
> > > >    #include "xe_vm.h"
> > > >    #include "xe_pxp.h"
> > > > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
> > > >     * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
> > > >     * @q: The exec queue
> > > >     * @vm: The VM the engine does a bind or exec for
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Test last fence dependency of queue, skipping masked dma fence contexts.
> > > >     *
> > > >     * Returns:
> > > > - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> > > > + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> > > > + * zero otherwise.
> > > >     */
> > > > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> > > > +				      u64 mask_ctx0, u64 mask_ctx1)
> > > >    {
> > > >    	struct dma_fence *fence;
> > > >    	int err = 0;
> > > > @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > >    	if (fence) {
> > > >    		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
> > > >    			0 : -ETIME;
> > > > +
> > > > +		if (err == -ETIME) {
> > > > +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > > > +							 mask_ctx1))
> > > > +				err = 0;
> > > > +		}
> > > > +
> > > >    		dma_fence_put(fence);
> > > >    	}
> > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > index a4dfbe858bda..99a35b22a46c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> > > >    void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
> > > >    				  struct dma_fence *fence);
> > > >    int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > -				      struct xe_vm *vm);
> > > > +				      struct xe_vm *vm, u64 mask_ctx0,
> > > > +				      u64 mask_ctx1);
> > > >    void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > > >    int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > > > index d22fd1ccc0ba..bba9ae559f57 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > > >    	}
> > > >    	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> > > > +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> > > > +
> > > > +		if (ijob)
> > > > +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> > > > +		if (mjob)
> > > > +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> > > > +
> > > >    		if (job)
> > > > -			err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > +			err = xe_sched_job_last_fence_add_dep(job, vm,
> > > > +							      mask_ctx0,
> > > > +							      mask_ctx1);
> > > >    		else
> > > > -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > > > +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > > > +								vm, mask_ctx0,
> > > > +								mask_ctx1);
> > > >    	}
> > > >    	for (i = 0; job && !err && i < vops->num_syncs; i++)
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > index d21bf8f26964..7cbdd87904c6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > @@ -6,6 +6,7 @@
> > > >    #include "xe_sched_job.h"
> > > >    #include <uapi/drm/xe_drm.h>
> > > > +#include <linux/dma-fence-array.h>
> > > >    #include <linux/dma-fence-chain.h>
> > > >    #include <linux/slab.h>
> > > > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
> > > >    	xe_sched_job_put(job);
> > > >    }
> > > > +/**
> > > > + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> > > > + * @fence: The dma-fence to check
> > > > + * @mask_ctx0: First context to compare against the fence's context
> > > > + * @mask_ctx1: Second context to compare against the fence's context
> > > > + *
> > > > + * This function checks whether the context of the given dma-fence matches
> > > > + * either of the provided mask contexts. If a match is found, the dependency
> > > > + * represented by the fence can be skipped. If the fence is a dma-fence-array,
> > > > + * its individual fences are unwound and checked.
> > > > + *
> > > > + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
> > > > + */
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > +				  u64 mask_ctx1)
> > > > +{
> > > > +	if (dma_fence_is_array(fence)) {
> > > > +		struct dma_fence *__fence;
> > > > +		int index;
> > > > +
> > > > +		dma_fence_array_for_each(__fence, index, fence)
> > > > +			if (__fence->context == mask_ctx0 ||
> > > > +			    __fence->context == mask_ctx1)
> > > > +				return true;
> > > > +	} else if (fence->context == mask_ctx0 ||
> > > > +		   fence->context == mask_ctx1) {
> > > > +		return true;
> > > > +	}
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > >    /**
> > > >     * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
> > > >     * @job:job to add the last fence dependency to
> > > >     * @vm: virtual memory job belongs to
> > > > + * @mask_ctx0: Mask dma-fence context0
> > > > + * @mask_ctx1: Mask dma-fence context1
> > > > + *
> > > > + * Add last fence dependency to job, skipping masked dma fence contexts.
> > > >     *
> > > >     * Returns:
> > > >     * 0 on success, or an error on failing to expand the array.
> > > >     */
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > +				    u64 mask_ctx0, u64 mask_ctx1)
> > > >    {
> > > >    	struct dma_fence *fence;
> > > >    	fence = xe_exec_queue_last_fence_get(job->q, vm);
> > > > +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> > > > +		dma_fence_put(fence);
> > > > +		return 0;
> > > > +	}
> > > >    	return drm_sched_job_add_dependency(&job->drm, fence);
> > > >    }
> > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > index 3dc72c5c1f13..81d8e848e605 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
> > > >    void xe_sched_job_arm(struct xe_sched_job *job);
> > > >    void xe_sched_job_push(struct xe_sched_job *job);
> > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > +				    u64 mask_ctx0, u64 mask_ctx1);
> > > >    void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> > > >    				  struct xe_sync_entry *sync);
> > > > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
> > > >    int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
> > > >    			  enum dma_resv_usage usage);
> > > > +#define NO_MASK_DEP	(~0x0ull)
> > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > +				  u64 mask_ctx1);
> > > > +
> > > >    #endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > index 492def04a559..f2fe7f9fbb22 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> > > >    	u64 start;
> > > >    	/** @end: End address to invalidate */
> > > >    	u64 end;
> > > > +	/** @fence_context: Fence context for job */
> > > > +	u64 fence_context;
> > > >    	/** @asid: Address space ID to invalidate */
> > > >    	u32 asid;
> > > >    	/** @fence_armed: Fence has been armed */
> > > > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
> > > >    	job->asid = asid;
> > > >    	job->fence_armed = false;
> > > >    	job->dep.ops = &dep_job_ops;
> > > > +	job->fence_context = entity->fence_context + 1;
> > > 
> > > As a side note, hardcoding the assumption on how scheduler allocates
> > > contexts is not great given recent efforts to make drivers know less of the
> > > scheduler internals.
> > > 
> > 
> > Yes, we should probably have a helper here — maybe
> > drm_sched_job_finished_context?
> > 
> > I was planning to roll this change into [1], but that series hasn’t
> > gained much traction, and fixing this is a fairly high-priority issue
> > for customers.
> > 
> > This is documented in the DRM scheduler kernel docs:
> > entity->fence_context + 1 is the job's finished context.
> > 
> > [1] https://patchwork.freedesktop.org/series/155314/
> > 
> > > But what I really wanted to ask is, having only glanced the patch briefly,
> > > could xe performance problem here also be solved by unwrapping the container
> > > fences at the DRM scheduler dependency tracking level?
> > > 
> > 
> > This is primarily about preventing TLB fences — which originate from a
> > different context than the bind queue but are still ordered on the queue
> > — from becoming dependencies. The process involves two passes: in the
> > first pass, we detect dependencies. If none are found, we immediately
> > complete the bind via the CPU. If dependencies are present, we defer the
> > bind to the GPU.
> 
> Interesting, I saw fence unwrapping and context number checking and thought
> it was maybe the same problem. I do not fully understand what xe is doing
> well enough to comment strongly, but it does raise a question on whether
> there could be a more elegant solution (i.e., not a hack).
> 
> Could the two entities be shared and would that solve the problem? I mean
> the TLB invalidation and the bind queue entities, do they need to be
> separate if the assumption and guarantee is to execute in order?
> 

Sharing dma-fence context would be great, but we have three scheduler
instances here — one for the bind queue and two for TLB invalidations,
one per GuC instance. The bind job feeds into the two TLB invalidations
as dependencies. The two TLB invalidations themselves are not ordered
with respect to each other, and the overall operation signals only when
both TLB invalidations have signaled.

This gets more complicated when a subsequent bind is issued without an
invalidation — it needs to wait on the prior invalidations to ensure
that fences sent to user space from the queue don’t signal out of order.

If the subsequent bind does issue an invalidation, then we don’t need to
wait — and that’s what this patch is (partially) fixing (e.g., a burst
of unbinds, which is the issue you previously raised with Chrome
switching tabs).
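To make the masking rule concrete, here is a small userspace Python model
(illustrative only — names and numbers are invented, the real logic is the C
in the patch): the bind queue's last fence is skipped as a dependency when
its dma-fence context matches one of the invalidation contexts the new job
will itself issue, unwrapping a dma-fence-array as the patch does.

```python
# Userspace sketch (not kernel code) of xe_sched_job_mask_dependency()
# and xe_exec_queue_last_fence_test_dep(). Context numbers are made up.

NO_MASK_DEP = (1 << 64) - 1  # mirrors NO_MASK_DEP (~0x0ull)
ETIME = 62

class Fence:
    def __init__(self, context, signaled=False, fences=None):
        self.context = context        # dma-fence context number
        self.signaled = signaled
        self.fences = fences or []    # non-empty => dma-fence-array

def mask_dependency(fence, mask_ctx0, mask_ctx1):
    """True if the dependency can be skipped (masked)."""
    if fence.fences:  # unwrap the dma-fence-array
        return any(f.context in (mask_ctx0, mask_ctx1)
                   for f in fence.fences)
    return fence.context in (mask_ctx0, mask_ctx1)

def last_fence_test_dep(last_fence, mask_ctx0, mask_ctx1):
    """0 if the bind can complete immediately, -ETIME if it must wait."""
    if last_fence is None or last_fence.signaled:
        return 0
    if mask_dependency(last_fence, mask_ctx0, mask_ctx1):
        return 0
    return -ETIME

# Burst of unbinds: the previous unbind left an unsignaled array of two
# TLB invalidation fences (one per GuC, contexts 100 and 200).
prev = Fence(context=0, fences=[Fence(100), Fence(200)])

# The next unbind issues its own invalidations on those same contexts,
# so the prior invalidations need not serialize it: fully pipelined.
print(last_fence_test_dep(prev, 100, 200))                   # 0
# An exec job masks nothing and still has to wait.
print(last_fence_test_dep(prev, NO_MASK_DEP, NO_MASK_DEP))   # -62
```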

I’d love to find an elegant solution, but I’m just not seeing one right
now. I also wouldn’t call this a hack — getting dependency tracking
right in a complex driver is, frankly, just really hard. We’re still
working on getting everything correct.

Matt

> > > I am asking because amdgpu recently posted a patch to unwrap in their code
> > > for potentially similar performance reasons, and if now xe wants something
> > > similar, or even the same, it is an interesting question where to do it.
> > > 
> > > Also, I have a patch (not sure if I posted it so far) which unwraps in
> > > drm_sched_job_add_dependency() and converts the dependency xarray to an
> > > unwrapped dma-fence-array. The initial idea there was to wake the
> > > scheduler worker only once, after all deps have signaled, but now that
> > > two drivers seem to be unwrapping fences maybe there is a case to be made for doing it
> > > 
> > 
> > I don't think this is the same problem as the one above, but it's an
> > interesting idea in general. CC me if you post this one.
> 
> Okay, but since it sounds like it would not help here, cleaning it up and
> sending will not be a priority, so it might be a while.
> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Matt
> > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > >    	kref_init(&job->refcount);
> > > >    	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
> > > > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
> > > >    	if (!IS_ERR_OR_NULL(job))
> > > >    		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
> > > >    }
> > > > +
> > > > +/**
> > > > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> > > > + * @job: TLB invalidation job object
> > > > + *
> > > > + * Return: TLB invalidation job fence context
> > > > + */
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > > > +{
> > > > +	return job->fence_context;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > index e63edcb26b50..2576165c2228 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
> > > >    void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> > > > +
> > > >    #endif
> > > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations
  2025-10-23 18:55         ` Matthew Brost
@ 2025-10-23 19:27           ` Matthew Brost
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2025-10-23 19:27 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, carlos.santa, thomas.hellstrom, Philipp Stanner

On Thu, Oct 23, 2025 at 11:55:45AM -0700, Matthew Brost wrote:
> On Thu, Oct 23, 2025 at 01:46:26PM +0100, Tvrtko Ursulin wrote:
> > 
> > On 22/10/2025 16:10, Matthew Brost wrote:
> > > On Wed, Oct 22, 2025 at 09:00:47AM +0100, Tvrtko Ursulin wrote:
> > > > 
> > > > On 17/10/2025 17:52, Matthew Brost wrote:
> > > > > When a burst of unbind jobs is issued, a dependency chain can form
> > > > > between the TLB invalidation of a previous unbind job and the current
> > > > > one. This leads to undesirable serialization, causing current jobs to
> > > > > wait unnecessarily for prior TLB invalidations, execute on the GPU when
> > > > > not needed, and significantly slow down the unbind burst—resulting in up
> > > > > to a 4× slowdown.
> > > > > 
> > > > > To break this chain, mask the last bind queue dependency if the last
> > > > > fence's DMA context matches the TLB invalidation context. This allows
> > > > > full pipelining of unbinds and TLB invalidations while preserving
> > > > > correct dma-fence signaling semantics.
> > > > > 
> > > > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/xe/xe_exec.c          |  3 +-
> > > > >    drivers/gpu/drm/xe/xe_exec_queue.c    | 18 +++++++++--
> > > > >    drivers/gpu/drm/xe/xe_exec_queue.h    |  3 +-
> > > > >    drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++--
> > > > >    drivers/gpu/drm/xe/xe_sched_job.c     | 44 ++++++++++++++++++++++++++-
> > > > >    drivers/gpu/drm/xe/xe_sched_job.h     |  7 ++++-
> > > > >    drivers/gpu/drm/xe/xe_tlb_inval_job.c | 14 +++++++++
> > > > >    drivers/gpu/drm/xe/xe_tlb_inval_job.h |  2 ++
> > > > >    8 files changed, 98 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > > > > index 0dc27476832b..6034cfc8be06 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > > > > @@ -294,7 +294,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > > >    		goto err_put_job;
> > > > >    	if (!xe_vm_in_lr_mode(vm)) {
> > > > > -		err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > > +		err = xe_sched_job_last_fence_add_dep(job, vm, NO_MASK_DEP,
> > > > > +						      NO_MASK_DEP);
> > > > >    		if (err)
> > > > >    			goto err_put_job;
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > index 90cbc95f8e2e..d6f69d9bccba 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > @@ -25,6 +25,7 @@
> > > > >    #include "xe_migrate.h"
> > > > >    #include "xe_pm.h"
> > > > >    #include "xe_ring_ops_types.h"
> > > > > +#include "xe_sched_job.h"
> > > > >    #include "xe_trace.h"
> > > > >    #include "xe_vm.h"
> > > > >    #include "xe_pxp.h"
> > > > > @@ -1106,11 +1107,17 @@ void xe_exec_queue_last_fence_set(struct xe_exec_queue *q, struct xe_vm *vm,
> > > > >     * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
> > > > >     * @q: The exec queue
> > > > >     * @vm: The VM the engine does a bind or exec for
> > > > > + * @mask_ctx0: Mask dma-fence context0
> > > > > + * @mask_ctx1: Mask dma-fence context1
> > > > > + *
> > > > > + * Test last fence dependency of queue, skipping masked dma fence contexts.
> > > > >     *
> > > > >     * Returns:
> > > > > - * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
> > > > > + * -ETIME if there exists an unsignalled and unmasked last fence dependency,
> > > > > + * zero otherwise.
> > > > >     */
> > > > > -int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > > > +int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm,
> > > > > +				      u64 mask_ctx0, u64 mask_ctx1)
> > > > >    {
> > > > >    	struct dma_fence *fence;
> > > > >    	int err = 0;
> > > > > @@ -1119,6 +1126,13 @@ int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
> > > > >    	if (fence) {
> > > > >    		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
> > > > >    			0 : -ETIME;
> > > > > +
> > > > > +		if (err == -ETIME) {
> > > > > +			if (xe_sched_job_mask_dependency(fence, mask_ctx0,
> > > > > +							 mask_ctx1))
> > > > > +				err = 0;
> > > > > +		}
> > > > > +
> > > > >    		dma_fence_put(fence);
> > > > >    	}
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > > index a4dfbe858bda..99a35b22a46c 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> > > > > @@ -85,7 +85,8 @@ struct dma_fence *xe_exec_queue_last_fence_get_for_resume(struct xe_exec_queue *
> > > > >    void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
> > > > >    				  struct dma_fence *fence);
> > > > >    int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
> > > > > -				      struct xe_vm *vm);
> > > > > +				      struct xe_vm *vm, u64 mask_ctx0,
> > > > > +				      u64 mask_ctx1);
> > > > >    void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);
> > > > >    int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch);
> > > > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > > > > index d22fd1ccc0ba..bba9ae559f57 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > > > @@ -1341,10 +1341,21 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
> > > > >    	}
> > > > >    	if (!(pt_update_ops->q->flags & EXEC_QUEUE_FLAG_KERNEL)) {
> > > > > +		u64 mask_ctx0 = NO_MASK_DEP, mask_ctx1 = NO_MASK_DEP;
> > > > > +
> > > > > +		if (ijob)
> > > > > +			mask_ctx0 = xe_tlb_inval_job_fence_context(ijob);
> > > > > +		if (mjob)
> > > > > +			mask_ctx1 = xe_tlb_inval_job_fence_context(mjob);
> > > > > +
> > > > >    		if (job)
> > > > > -			err = xe_sched_job_last_fence_add_dep(job, vm);
> > > > > +			err = xe_sched_job_last_fence_add_dep(job, vm,
> > > > > +							      mask_ctx0,
> > > > > +							      mask_ctx1);
> > > > >    		else
> > > > > -			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm);
> > > > > +			err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q,
> > > > > +								vm, mask_ctx0,
> > > > > +								mask_ctx1);
> > > > >    	}
> > > > >    	for (i = 0; job && !err && i < vops->num_syncs; i++)
> > > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > > index d21bf8f26964..7cbdd87904c6 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > > > > @@ -6,6 +6,7 @@
> > > > >    #include "xe_sched_job.h"
> > > > >    #include <uapi/drm/xe_drm.h>
> > > > > +#include <linux/dma-fence-array.h>
> > > > >    #include <linux/dma-fence-chain.h>
> > > > >    #include <linux/slab.h>
> > > > > @@ -295,19 +296,60 @@ void xe_sched_job_push(struct xe_sched_job *job)
> > > > >    	xe_sched_job_put(job);
> > > > >    }
> > > > > +/**
> > > > > + * xe_sched_job_mask_dependency() - Determine if a dma-fence dependency can be masked
> > > > > + * @fence: The dma-fence to check
> > > > > + * @mask_ctx0: First context to compare against the fence's context
> > > > > + * @mask_ctx1: Second context to compare against the fence's context
> > > > > + *
> > > > > + * This function checks whether the context of the given dma-fence matches
> > > > > + * either of the provided mask contexts. If a match is found, the dependency
> > > > > + * represented by the fence can be skipped. If the fence is a dma-fence-array,
> > > > > + * its individual fences are unwound and checked.
> > > > > + *
> > > > > + * Return: true if the fence can be masked (i.e., skipped), false otherwise.
> > > > > + */
> > > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > > +				  u64 mask_ctx1)
> > > > > +{
> > > > > +	if (dma_fence_is_array(fence)) {
> > > > > +		struct dma_fence *__fence;
> > > > > +		int index;
> > > > > +
> > > > > +		dma_fence_array_for_each(__fence, index, fence)
> > > > > +			if (__fence->context == mask_ctx0 ||
> > > > > +			    __fence->context == mask_ctx1)
> > > > > +				return true;
> > > > > +	} else if (fence->context == mask_ctx0 ||
> > > > > +		   fence->context == mask_ctx1) {
> > > > > +		return true;
> > > > > +	}
> > > > > +
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > >    /**
> > > > >     * xe_sched_job_last_fence_add_dep - Add last fence dependency to job
> > > > >     * @job:job to add the last fence dependency to
> > > > >     * @vm: virtual memory job belongs to
> > > > > + * @mask_ctx0: Mask dma-fence context0
> > > > > + * @mask_ctx1: Mask dma-fence context1
> > > > > + *
> > > > > + * Add last fence dependency to job, skipping masked dma fence contexts.
> > > > >     *
> > > > >     * Returns:
> > > > >     * 0 on success, or an error on failing to expand the array.
> > > > >     */
> > > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
> > > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > > +				    u64 mask_ctx0, u64 mask_ctx1)
> > > > >    {
> > > > >    	struct dma_fence *fence;
> > > > >    	fence = xe_exec_queue_last_fence_get(job->q, vm);
> > > > > +	if (xe_sched_job_mask_dependency(fence, mask_ctx0, mask_ctx1)) {
> > > > > +		dma_fence_put(fence);
> > > > > +		return 0;
> > > > > +	}
> > > > >    	return drm_sched_job_add_dependency(&job->drm, fence);
> > > > >    }
> > > > > diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > > index 3dc72c5c1f13..81d8e848e605 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_sched_job.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> > > > > @@ -58,7 +58,8 @@ bool xe_sched_job_completed(struct xe_sched_job *job);
> > > > >    void xe_sched_job_arm(struct xe_sched_job *job);
> > > > >    void xe_sched_job_push(struct xe_sched_job *job);
> > > > > -int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm);
> > > > > +int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm,
> > > > > +				    u64 mask_ctx0, u64 mask_ctx1);
> > > > >    void xe_sched_job_init_user_fence(struct xe_sched_job *job,
> > > > >    				  struct xe_sync_entry *sync);
> > > > > @@ -93,4 +94,8 @@ void xe_sched_job_snapshot_print(struct xe_sched_job_snapshot *snapshot, struct
> > > > >    int xe_sched_job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
> > > > >    			  enum dma_resv_usage usage);
> > > > > +#define NO_MASK_DEP	(~0x0ull)
> > > > > +bool xe_sched_job_mask_dependency(struct dma_fence *fence, u64 mask_ctx0,
> > > > > +				  u64 mask_ctx1);
> > > > > +
> > > > >    #endif
> > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.c b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > > index 492def04a559..f2fe7f9fbb22 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.c
> > > > > @@ -32,6 +32,8 @@ struct xe_tlb_inval_job {
> > > > >    	u64 start;
> > > > >    	/** @end: End address to invalidate */
> > > > >    	u64 end;
> > > > > +	/** @fence_context: Fence context for job */
> > > > > +	u64 fence_context;
> > > > >    	/** @asid: Address space ID to invalidate */
> > > > >    	u32 asid;
> > > > >    	/** @fence_armed: Fence has been armed */
> > > > > @@ -101,6 +103,7 @@ xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval,
> > > > >    	job->asid = asid;
> > > > >    	job->fence_armed = false;
> > > > >    	job->dep.ops = &dep_job_ops;
> > > > > +	job->fence_context = entity->fence_context + 1;
> > > > 
> > > > As a side note, hardcoding the assumption on how scheduler allocates
> > > > contexts is not great given recent efforts to make drivers know less of the
> > > > scheduler internals.
> > > > 
> > > 
> > > Yes, we should probably have a helper here — maybe
> > > drm_sched_job_finished_context?
> > > 
> > > I was planning to roll this change into [1], but that series hasn’t
> > > gained much traction, and fixing this is a fairly high-priority issue
> > > for customers.
> > > 
> > > This is documented in the DRM scheduler kernel docs:
> > > entity->fence_context + 1 is the job's finished context.
> > > 
> > > [1] https://patchwork.freedesktop.org/series/155314/
> > > 
> > > > But what I really wanted to ask is, having only glanced the patch briefly,
> > > > could xe performance problem here also be solved by unwrapping the container
> > > > fences at the DRM scheduler dependency tracking level?
> > > > 
> > > 
> > > This is primarily about preventing TLB fences — which originate from a
> > > different context than the bind queue but are still ordered on the queue
> > > — from becoming dependencies. The process involves two passes: in the
> > > first pass, we detect dependencies. If none are found, we immediately
> > > complete the bind via the CPU. If dependencies are present, we defer the
> > > bind to the GPU.
> > 
> > Interesting, I saw fence unwrapping and context number checking and thought
> > it was maybe the same problem. I do not fully understand what xe is doing
> > well enough to comment strongly, but it does raise a question on whether
> > there could be a more elegant solution (i.e., not a hack).
> > 
> > Could the two entities be shared and would that solve the problem? I mean
> > the TLB invalidation and the bind queue entities, do they need to be
> > separate if the assumption and guarantee is to execute in order?
> > 
> 
> Sharing dma-fence context would be great, but we have three scheduler
> instances here — one for the bind queue and two for TLB invalidations,
> one per GuC instance. The bind job feeds into the two TLB invalidations
> as dependencies. The two TLB invalidations themselves are not ordered
> with respect to each other, and the overall operation signals only when
> both TLB invalidations have signaled.
> 
> This gets more complicated when a subsequent bind is issued without an
> invalidation — it needs to wait on the prior invalidations to ensure
> that fences sent to user space from the queue don’t signal out of order.
> 
> If the subsequent bind does issue an invalidation, then we don’t need to
> wait — and that’s what this patch is (partially) fixing (e.g., a burst
> of unbinds, which is the issue you previously raised with Chrome
> switching tabs).
> 
> I’d love to find an elegant solution, but I’m just not seeing one right

Okay, I take this back. I think I have an elegant solution, but it will
take time for Xe to get there.

Currently, when a bind job is needed (i.e., it has dependencies), the
final part of the bind process runs on the GPU, programming leaves or
pruning new parts in the page table structure. GPU jobs are inherently
asynchronous, so subsequent TLB invalidations must be scheduled to run
afterward.

I have patches that, for various reasons, make it advantageous to run
the bind job on the CPU. Once that happens, the operation becomes
synchronous, allowing TLB invalidations to be issued immediately from
run_job, returning a dma-fence-array or dma-fence-chain of all issued
TLB invalidations. We’d only need a single queue (scheduler instance /
entity) for bind jobs and invalidations.

We’d still need to block somewhere to cover the case of a job with
invalidation followed by a job without invalidation, but that seems
workable.
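Very roughly, the single-queue idea could be sketched like this (plain
Python, nothing here is actual Xe code — the class and method names are
invented for illustration): with a synchronous CPU bind inside run_job, the
invalidations are issued right there and the job's finished fence is just
the set of invalidation fences, all ordered on one entity, so no
cross-context dependency masking is needed.

```python
# Sketch (not kernel code) of one queue handling both binds and TLB
# invalidations. seqno stands in for submission order on the entity.

class InvalFence:
    def __init__(self, guc_id, seqno):
        self.guc_id = guc_id
        self.seqno = seqno
        self.signaled = False

class SingleBindQueue:
    """One scheduler entity for binds and their invalidations."""
    def __init__(self):
        self.seqno = 0
        self.pending = []   # per-job fence sets, in submission order

    def run_job(self, needs_inval):
        self.seqno += 1
        # CPU bind is synchronous: page tables are updated here, so the
        # invalidations can be issued immediately rather than scheduled
        # behind an async GPU job.
        fences = []
        if needs_inval:
            # One invalidation per GuC; in-kernel this would be a
            # dma-fence-array or dma-fence-chain as the job's fence.
            fences = [InvalFence(guc, self.seqno) for guc in (0, 1)]
        self.pending.append(fences)
        return fences

q = SingleBindQueue()
a = q.run_job(needs_inval=True)    # unbind 1: two invalidations issued
b = q.run_job(needs_inval=True)    # unbind 2: pipelined, no waiting
c = q.run_job(needs_inval=False)   # later bind, still ordered in-queue
print(len(a), len(b), len(c))      # 2 2 0
```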

As I said, this won’t happen overnight. The CPU bind patches have been
around for a couple of years, and the last time I revived them, PM
references were leaking—so I need to track that down. But in the long
term, a single queue for binds and invalidations seems like the ultimate
goal.

In the short term, this is something we need to fix—it's a clear issue
in our driver.

Matt

> now. I also wouldn’t call this a hack — getting dependency tracking
> right in a complex driver is, frankly, just really hard. We’re still
> working on getting everything correct.
> 
> Matt
> 
> > > > I am asking because amdgpu recently posted a patch to unwrap in their code
> > > > for potentially similar performance reasons, and if now xe wants something
> > > > similar, or even the same, it is an interesting question where to do it.
> > > > 
> > > > Also, I have a patch (not sure if I posted it so far) which unwraps in
> > > > drm_sched_job_add_dependency() and converts the dependency xarray to an
> > > > unwrapped dma-fence-array. The initial idea there was to wake the
> > > > scheduler worker only once, after all deps have signaled, but now that
> > > > two drivers seem to be unwrapping fences maybe there is a case to be made for doing it
> > > > 
> > > 
> > > I don't think this is the same problem as the one above, but it's an
> > > interesting idea in general. CC me if you post this one.
> > 
> > Okay, but since it sounds like it would not help here, cleaning it up and
> > sending will not be a priority, so it might be a while.
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > 
> > > Matt
> > > 
> > > > Regards,
> > > > 
> > > > Tvrtko
> > > > 
> > > > >    	kref_init(&job->refcount);
> > > > >    	xe_exec_queue_get(q);	/* Pairs with put in xe_tlb_inval_job_destroy */
> > > > > @@ -266,3 +269,14 @@ void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job)
> > > > >    	if (!IS_ERR_OR_NULL(job))
> > > > >    		kref_put(&job->refcount, xe_tlb_inval_job_destroy);
> > > > >    }
> > > > > +
> > > > > +/**
> > > > > + * xe_tlb_inval_job_fence_context() - TLB invalidation job fence context
> > > > > + * @job: TLB invalidation job object
> > > > > + *
> > > > > + * Return: TLB invalidation job fence context
> > > > > + */
> > > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job)
> > > > > +{
> > > > > +	return job->fence_context;
> > > > > +}
> > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_job.h b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > > index e63edcb26b50..2576165c2228 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_job.h
> > > > > @@ -30,4 +30,6 @@ void xe_tlb_inval_job_get(struct xe_tlb_inval_job *job);
> > > > >    void xe_tlb_inval_job_put(struct xe_tlb_inval_job *job);
> > > > > +u64 xe_tlb_inval_job_fence_context(struct xe_tlb_inval_job *job);
> > > > > +
> > > > >    #endif
> > > > 
> > 


end of thread, other threads:[~2025-10-23 19:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-17 16:52 [PATCH 0/1] Fix serialization on burst of unbinds Matthew Brost
2025-10-17 16:52 ` [PATCH 1/1] drm/xe: Avoid serializing unbind jobs on prior TLB invalidations Matthew Brost
2025-10-21 17:55   ` Summers, Stuart
2025-10-21 20:36     ` Matthew Brost
2025-10-21 20:43       ` Summers, Stuart
2025-10-21 20:50         ` Matthew Brost
2025-10-22  8:00   ` Tvrtko Ursulin
2025-10-22 15:10     ` Matthew Brost
2025-10-23 12:46       ` Tvrtko Ursulin
2025-10-23 18:55         ` Matthew Brost
2025-10-23 19:27           ` Matthew Brost
2025-10-23 12:28   ` Thomas Hellström
2025-10-17 18:36 ` ✓ CI.KUnit: success for Fix serialization on burst of unbinds Patchwork
2025-10-17 19:16 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-18 18:20 ` ✗ Xe.CI.Full: failure " Patchwork
