* [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe
@ 2025-12-01 18:39 Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
` (13 more replies)
0 siblings, 14 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
At XDC, we discussed that drivers should avoid accessing DRM scheduler
internals, misusing DRM scheduler locks, and adopt a well-defined
pending job list iterator. This series proposes the necessary changes to
the DRM scheduler to bring Xe in line with that agreement and updates Xe
to use the new DRM scheduler API.
While here, clean up LR queue handling and simplify the GuC state machine
in Xe. Also rework LRC timestamp sampling to avoid a scheduling toggle.
v2:
- Fix checkpatch / naming issues
v3:
- Only allow pending job list iterator to be called on stopped schedulers
- Clean up LR queue handling / fix a few miscellaneous Xe scheduler issues
v4:
- Address Niranjana's feedback
- Add patch to avoid toggling scheduler state in the TDR
v5:
- Rebase
- Fixup LRC timeout check (Umesh)
v6:
- Fix VF bugs (Testing)
v7:
- Disable timestamp WA on VF
Matt
Matthew Brost (9):
drm/sched: Add several job helpers to avoid drivers touching scheduler
state
drm/sched: Add pending job list iterator
drm/xe: Add dedicated message lock
drm/xe: Stop abusing DRM scheduler internals
drm/xe: Only toggle scheduling in TDR if GuC is running
drm/xe: Do not deregister queues in TDR
drm/xe: Remove special casing for LR queues in submission
drm/xe: Disable timestamp WA on VFs
drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
drivers/gpu/drm/scheduler/sched_main.c | 4 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 9 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 37 +-
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 +
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
drivers/gpu/drm/xe/xe_guc_submit.c | 362 +++----------------
drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 -
drivers/gpu/drm/xe/xe_hw_fence.c | 16 -
drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
drivers/gpu/drm/xe/xe_lrc.c | 45 ++-
drivers/gpu/drm/xe/xe_lrc.h | 3 +-
drivers/gpu/drm/xe/xe_ring_ops.c | 25 +-
drivers/gpu/drm/xe/xe_sched_job.c | 1 +
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
drivers/gpu/drm/xe/xe_trace.h | 5 -
include/drm/gpu_scheduler.h | 82 +++++
16 files changed, 211 insertions(+), 397 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-03 8:56 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
` (12 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Add helpers to check whether the scheduler is stopped and whether a job is
signaled. These are expected to be used driver-side in recovery and debug
flows.
v4:
- Reorder patch to first in series (Niranjana)
- Also check parent fence for signaling (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1d4f1b822e7b..cf40c18ab433 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
*/
static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->pause_submit))
+ if (!drm_sched_is_stopped(sched))
queue_work(sched->submit_wq, &sched->work_run_job);
}
@@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
*/
static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->pause_submit))
+ if (!drm_sched_is_stopped(sched))
queue_work(sched->submit_wq, &sched->work_free_job);
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fb88301b3c45..385bf34e76fe 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list);
+/* Inlines */
+
+/**
+ * drm_sched_is_stopped() - DRM scheduler is stopped
+ * @sched: DRM scheduler
+ *
+ * Return: True if sched is stopped, False otherwise
+ */
+static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
+{
+ return READ_ONCE(sched->pause_submit);
+}
+
+/**
+ * drm_sched_job_is_signaled() - DRM scheduler job is signaled
+ * @job: DRM scheduler job
+ *
+ * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
+ * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
+ * finished fence (software fence) are checked to determine signaling state.
+ *
+ * Return: True if job is signaled, False otherwise
+ */
+static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
+{
+ struct drm_sched_fence *s_fence = job->s_fence;
+
+ WARN_ON(!drm_sched_is_stopped(job->sched));
+ return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
+ dma_fence_is_signaled(&s_fence->finished);
+}
+
#endif
--
2.34.1
* [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-03 9:07 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
` (11 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Stop open coding pending job list walks in drivers. Add a pending job list
iterator which safely walks the DRM scheduler's list, asserting the DRM
scheduler is stopped.
v2:
- Fix checkpatch (CI)
v3:
- Drop locked version (Christian)
v4:
- Reorder patch (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 385bf34e76fe..9d228513d06c 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
dma_fence_is_signaled(&s_fence->finished);
}
+/**
+ * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
+ * @sched: DRM scheduler associated with pending job iterator
+ */
+struct drm_sched_pending_job_iter {
+ struct drm_gpu_scheduler *sched;
+};
+
+/* Drivers should never call this directly */
+static inline struct drm_sched_pending_job_iter
+__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
+{
+ struct drm_sched_pending_job_iter iter = {
+ .sched = sched,
+ };
+
+ WARN_ON(!drm_sched_is_stopped(sched));
+ return iter;
+}
+
+/* Drivers should never call this directly */
+static inline void
+__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
+{
+ WARN_ON(!drm_sched_is_stopped(iter.sched));
+}
+
+DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
+ __drm_sched_pending_job_iter_end(_T),
+ __drm_sched_pending_job_iter_begin(__sched),
+ struct drm_gpu_scheduler *__sched);
+static inline void *
+class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
+{ return _T; }
+#define class_drm_sched_pending_job_iter_is_conditional false
+
+/**
+ * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
+ * @__job: Current pending job being iterated over
+ * @__sched: DRM scheduler to iterate over pending jobs
+ * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
+ *
+ * Iterator for each pending job in scheduler, filtering on an entity, and
+ * enforcing scheduler is fully stopped
+ */
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
+
#endif
--
2.34.1
* [PATCH v7 3/9] drm/xe: Add dedicated message lock
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-03 9:38 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
` (10 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Stop abusing the DRM scheduler's job list lock for messages; add a dedicated
message lock.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 5 +++--
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 4 ++--
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 ++
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
index f91e06d03511..f4f23317191f 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
@@ -77,6 +77,7 @@ int xe_sched_init(struct xe_gpu_scheduler *sched,
};
sched->ops = xe_ops;
+ spin_lock_init(&sched->msg_lock);
INIT_LIST_HEAD(&sched->msgs);
INIT_WORK(&sched->work_process_msg, xe_sched_process_msg_work);
@@ -117,7 +118,7 @@ void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- lockdep_assert_held(&sched->base.job_list_lock);
+ lockdep_assert_held(&sched->msg_lock);
list_add_tail(&msg->link, &sched->msgs);
xe_sched_process_msg_queue(sched);
@@ -131,7 +132,7 @@ void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- lockdep_assert_held(&sched->base.job_list_lock);
+ lockdep_assert_held(&sched->msg_lock);
list_add(&msg->link, &sched->msgs);
xe_sched_process_msg_queue(sched);
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
index c7a77a3a9681..dceb2cd0ee5b 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
@@ -33,12 +33,12 @@ void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
static inline void xe_sched_msg_lock(struct xe_gpu_scheduler *sched)
{
- spin_lock(&sched->base.job_list_lock);
+ spin_lock(&sched->msg_lock);
}
static inline void xe_sched_msg_unlock(struct xe_gpu_scheduler *sched)
{
- spin_unlock(&sched->base.job_list_lock);
+ spin_unlock(&sched->msg_lock);
}
static inline void xe_sched_stop(struct xe_gpu_scheduler *sched)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
index 6731b13da8bb..63d9bf92583c 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
@@ -47,6 +47,8 @@ struct xe_gpu_scheduler {
const struct xe_sched_backend_ops *ops;
/** @msgs: list of messages to be processed in @work_process_msg */
struct list_head msgs;
+ /** @msg_lock: Message lock */
+ spinlock_t msg_lock;
/** @work_process_msg: processes messages */
struct work_struct work_process_msg;
};
--
2.34.1
* [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (2 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-03 10:56 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
` (9 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Use new pending job list iterator and new helper functions in Xe to
avoid reaching into DRM scheduler internals.
Part of this change involves removing pending jobs debug information
from debugfs and devcoredump. As agreed, the pending job list should
only be accessed when the scheduler is stopped. However, it's not
straightforward to determine whether the scheduler is stopped from the
shared debugfs/devcoredump code path. Additionally, the pending job list
provides little useful information, as pending jobs can be inferred from
seqnos and ring head/tail positions. Therefore, this debug information
is being removed.
v4:
- Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 33 ++--------
drivers/gpu/drm/xe/xe_guc_submit.c | 81 ++++++------------------
drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
6 files changed, 27 insertions(+), 120 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
index f4f23317191f..9c8004d5dd91 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
@@ -7,7 +7,7 @@
static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->base.pause_submit))
+ if (!drm_sched_is_stopped(&sched->base))
queue_work(sched->base.submit_wq, &sched->work_process_msg);
}
@@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
container_of(w, struct xe_gpu_scheduler, work_process_msg);
struct xe_sched_msg *msg;
- if (READ_ONCE(sched->base.pause_submit))
+ if (drm_sched_is_stopped(&sched->base))
return;
msg = xe_sched_get_msg(sched);
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
index dceb2cd0ee5b..664c2db56af3 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
@@ -56,12 +56,9 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
struct drm_sched_job *s_job;
bool restore_replay = false;
- list_for_each_entry(s_job, &sched->base.pending_list, list) {
- struct drm_sched_fence *s_fence = s_job->s_fence;
- struct dma_fence *hw_fence = s_fence->parent;
-
+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
restore_replay |= to_xe_sched_job(s_job)->restore_replay;
- if (restore_replay || (hw_fence && !dma_fence_is_signaled(hw_fence)))
+ if (restore_replay || !drm_sched_job_is_signaled(s_job))
sched->base.ops->run_job(s_job);
}
}
@@ -72,14 +69,6 @@ xe_sched_invalidate_job(struct xe_sched_job *job, int threshold)
return drm_sched_invalidate_job(&job->drm, threshold);
}
-static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
- struct xe_sched_job *job)
-{
- spin_lock(&sched->base.job_list_lock);
- list_add(&job->drm.list, &sched->base.pending_list);
- spin_unlock(&sched->base.job_list_lock);
-}
-
/**
* xe_sched_first_pending_job() - Find first pending job which is unsignaled
* @sched: Xe GPU scheduler
@@ -89,21 +78,13 @@ static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
static inline
struct xe_sched_job *xe_sched_first_pending_job(struct xe_gpu_scheduler *sched)
{
- struct xe_sched_job *job, *r_job = NULL;
-
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(job, &sched->base.pending_list, drm.list) {
- struct drm_sched_fence *s_fence = job->drm.s_fence;
- struct dma_fence *hw_fence = s_fence->parent;
+ struct drm_sched_job *job;
- if (hw_fence && !dma_fence_is_signaled(hw_fence)) {
- r_job = job;
- break;
- }
- }
- spin_unlock(&sched->base.job_list_lock);
+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
+ if (!drm_sched_job_is_signaled(job))
+ return to_xe_sched_job(job);
- return r_job;
+ return NULL;
}
static inline int
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 3ca2558c8c96..c8027ccaec81 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1032,7 +1032,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
struct xe_exec_queue *q = ge->q;
struct xe_guc *guc = exec_queue_to_guc(q);
struct xe_gpu_scheduler *sched = &ge->sched;
- struct xe_sched_job *job;
+ struct drm_sched_job *job;
bool wedged = false;
xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
@@ -1091,16 +1091,10 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
- xe_hw_fence_irq_stop(q->fence_irq);
+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
+ xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
xe_sched_submission_start(sched);
-
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(job, &sched->base.pending_list, drm.list)
- xe_sched_job_set_error(job, -ECANCELED);
- spin_unlock(&sched->base.job_list_lock);
-
- xe_hw_fence_irq_start(q->fence_irq);
}
#define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
@@ -1219,7 +1213,7 @@ static enum drm_gpu_sched_stat
guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
{
struct xe_sched_job *job = to_xe_sched_job(drm_job);
- struct xe_sched_job *tmp_job;
+ struct drm_sched_job *tmp_job;
struct xe_exec_queue *q = job->q;
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_guc *guc = exec_queue_to_guc(q);
@@ -1227,7 +1221,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
struct xe_device *xe = guc_to_xe(guc);
int err = -ETIME;
pid_t pid = -1;
- int i = 0;
bool wedged = false, skip_timeout_check;
xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
@@ -1392,28 +1385,19 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
__deregister_exec_queue(guc, q);
}
- /* Stop fence signaling */
- xe_hw_fence_irq_stop(q->fence_irq);
+ /* Mark all outstanding jobs as bad, thus completing them */
+ xe_sched_job_set_error(job, err);
+ drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
+ xe_sched_job_set_error(to_xe_sched_job(tmp_job), -ECANCELED);
- /*
- * Fence state now stable, stop / start scheduler which cleans up any
- * fences that are complete
- */
- xe_sched_add_pending_job(sched, job);
xe_sched_submission_start(sched);
-
xe_guc_exec_queue_trigger_cleanup(q);
- /* Mark all outstanding jobs as bad, thus completing them */
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(tmp_job, &sched->base.pending_list, drm.list)
- xe_sched_job_set_error(tmp_job, !i++ ? err : -ECANCELED);
- spin_unlock(&sched->base.job_list_lock);
-
- /* Start fence signaling */
- xe_hw_fence_irq_start(q->fence_irq);
-
- return DRM_GPU_SCHED_STAT_RESET;
+ /*
+ * We want the job added back to the pending list so it gets freed; this
+ * is what DRM_GPU_SCHED_STAT_NO_HANG does.
+ */
+ return DRM_GPU_SCHED_STAT_NO_HANG;
sched_enable:
set_exec_queue_pending_tdr_exit(q);
@@ -2265,9 +2249,12 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
{
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_sched_job *job = NULL;
+ struct drm_sched_job *s_job;
bool restore_replay = false;
- list_for_each_entry(job, &sched->base.pending_list, drm.list) {
+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
+ job = to_xe_sched_job(s_job);
+
restore_replay |= job->restore_replay;
if (restore_replay) {
xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
@@ -2391,7 +2378,7 @@ void xe_guc_submit_unpause_vf(struct xe_guc *guc)
* created after resfix done.
*/
if (q->guc->id != index ||
- !READ_ONCE(q->guc->sched.base.pause_submit))
+ !drm_sched_is_stopped(&q->guc->sched.base))
continue;
guc_exec_queue_unpause(guc, q);
@@ -2813,30 +2800,6 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_capture(q, snapshot);
- spin_lock(&sched->base.job_list_lock);
- snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
- snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
- sizeof(struct pending_list_snapshot),
- GFP_ATOMIC);
-
- if (snapshot->pending_list) {
- struct xe_sched_job *job_iter;
-
- i = 0;
- list_for_each_entry(job_iter, &sched->base.pending_list, drm.list) {
- snapshot->pending_list[i].seqno =
- xe_sched_job_seqno(job_iter);
- snapshot->pending_list[i].fence =
- dma_fence_is_signaled(job_iter->fence) ? 1 : 0;
- snapshot->pending_list[i].finished =
- dma_fence_is_signaled(&job_iter->drm.s_fence->finished)
- ? 1 : 0;
- i++;
- }
- }
-
- spin_unlock(&sched->base.job_list_lock);
-
return snapshot;
}
@@ -2894,13 +2857,6 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_print(snapshot, p);
-
- for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
- i++)
- drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
- snapshot->pending_list[i].seqno,
- snapshot->pending_list[i].fence,
- snapshot->pending_list[i].finished);
}
/**
@@ -2923,7 +2879,6 @@ void xe_guc_exec_queue_snapshot_free(struct xe_guc_submit_exec_queue_snapshot *s
xe_lrc_snapshot_free(snapshot->lrc[i]);
kfree(snapshot->lrc);
}
- kfree(snapshot->pending_list);
kfree(snapshot);
}
diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
index dc7456c34583..0b08c79cf3b9 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
@@ -61,12 +61,6 @@ struct guc_submit_parallel_scratch {
u32 wq[WQ_SIZE / sizeof(u32)];
};
-struct pending_list_snapshot {
- u32 seqno;
- bool fence;
- bool finished;
-};
-
/**
* struct xe_guc_submit_exec_queue_snapshot - Snapshot for devcoredump
*/
@@ -134,11 +128,6 @@ struct xe_guc_submit_exec_queue_snapshot {
/** @wq: Workqueue Items */
u32 wq[WQ_SIZE / sizeof(u32)];
} parallel;
-
- /** @pending_list_size: Size of the pending list snapshot array */
- int pending_list_size;
- /** @pending_list: snapshot of the pending list info */
- struct pending_list_snapshot *pending_list;
};
#endif
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index b2a0c46dfcd4..e65dfcdfdbc5 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -110,22 +110,6 @@ void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq)
irq_work_queue(&irq->work);
}
-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq)
-{
- spin_lock_irq(&irq->lock);
- irq->enabled = false;
- spin_unlock_irq(&irq->lock);
-}
-
-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq)
-{
- spin_lock_irq(&irq->lock);
- irq->enabled = true;
- spin_unlock_irq(&irq->lock);
-
- irq_work_queue(&irq->work);
-}
-
void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
struct xe_hw_fence_irq *irq, const char *name)
{
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
index f13a1c4982c7..599492c13f80 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
@@ -17,8 +17,6 @@ void xe_hw_fence_module_exit(void);
void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq);
-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq);
-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq);
void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
struct xe_hw_fence_irq *irq, const char *name);
--
2.34.1
* [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (3 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR Matthew Brost
` (8 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
If the firmware is not running during TDR (e.g., when the driver is
unloading), there's no need to toggle scheduling in the GuC. In such
cases, skip this step.
v4:
- Bail on wait UC not running (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index c8027ccaec81..d432a39cfbfd 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1274,7 +1274,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (exec_queue_reset(q))
err = -EIO;
- if (!exec_queue_destroyed(q)) {
+ if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
/*
* Wait for any pending G2H to flush out before
* modifying state
@@ -1309,6 +1309,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
*/
smp_rmb();
ret = wait_event_timeout(guc->ct.wq,
+ !xe_uc_fw_is_running(&guc->fw) ||
!exec_queue_pending_disable(q) ||
xe_guc_read_stopped(guc) ||
vf_recovery(guc), HZ * 5);
--
2.34.1
* [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (4 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission Matthew Brost
` (7 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Deregistering queues in the TDR introduces unnecessary complexity,
requiring reference-counting techniques to function correctly,
particularly to prevent use-after-free (UAF) issues while a
deregistration initiated from the TDR is in progress.
All that's needed in the TDR is to kick the queue off the hardware,
which is achieved by disabling scheduling. Queue deregistration should
be handled in a single, well-defined point in the cleanup path, tied to
the queue's reference count.
v4:
- Explain why extra refs were needed prior to this patch (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 65 +++++-------------------------
1 file changed, 9 insertions(+), 56 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index d432a39cfbfd..622b3d92ba41 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -69,9 +69,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
#define EXEC_QUEUE_STATE_WEDGED (1 << 8)
#define EXEC_QUEUE_STATE_BANNED (1 << 9)
#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
-#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12)
-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13)
+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
+#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
static bool exec_queue_registered(struct xe_exec_queue *q)
{
@@ -218,21 +217,6 @@ static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
}
-static bool exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
-}
-
-static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
-}
-
-static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
-}
-
static bool exec_queue_pending_resume(struct xe_exec_queue *q)
{
return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
@@ -1190,25 +1174,6 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
}
-static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
-{
- u32 action[] = {
- XE_GUC_ACTION_DEREGISTER_CONTEXT,
- q->guc->id,
- };
-
- xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
- xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
-
- set_exec_queue_destroyed(q);
- trace_xe_exec_queue_deregister(q);
-
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
-}
-
static enum drm_gpu_sched_stat
guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
{
@@ -1224,6 +1189,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
bool wedged = false, skip_timeout_check;
xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
+ xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
/*
* TDR has fired before free job worker. Common if exec queue
@@ -1240,8 +1206,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
/* Must check all state after stopping scheduler */
skip_timeout_check = exec_queue_reset(q) ||
- exec_queue_killed_or_banned_or_wedged(q) ||
- exec_queue_destroyed(q);
+ exec_queue_killed_or_banned_or_wedged(q);
/*
* If devcoredump not captured and GuC capture for the job is not ready
@@ -1268,13 +1233,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
/* Engine state now stable, disable scheduling to check timestamp */
- if (!wedged && exec_queue_registered(q)) {
+ if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
int ret;
if (exec_queue_reset(q))
err = -EIO;
- if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
+ if (xe_uc_fw_is_running(&guc->fw)) {
/*
* Wait for any pending G2H to flush out before
* modifying state
@@ -1324,8 +1289,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
xe_devcoredump(q, job,
"Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
q->guc->id, ret, xe_guc_read_stopped(guc));
- set_exec_queue_extra_ref(q);
- xe_exec_queue_get(q); /* GT reset owns this */
set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
xe_sched_tdr_queue_imm(sched);
@@ -1378,13 +1341,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
}
}
- /* Finish cleaning up exec queue via deregister */
set_exec_queue_banned(q);
- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
- set_exec_queue_extra_ref(q);
- xe_exec_queue_get(q);
- __deregister_exec_queue(guc, q);
- }
/* Mark all outstanding jobs as bad, thus completing them */
xe_sched_job_set_error(job, err);
@@ -1928,7 +1885,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
/* Clean up lost G2H + reset engine state */
if (exec_queue_registered(q)) {
- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
+ if (xe_exec_queue_is_lr(q))
xe_exec_queue_put(q);
else if (exec_queue_destroyed(q))
__guc_exec_queue_destroy(guc, q);
@@ -2062,11 +2019,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
clear_exec_queue_destroyed(q);
- if (exec_queue_extra_ref(q))
- xe_exec_queue_put(q);
- else
- q->guc->needs_cleanup = true;
- clear_exec_queue_extra_ref(q);
+ q->guc->needs_cleanup = true;
xe_gt_dbg(guc_to_gt(guc), "Replay CLEANUP - guc_id=%d",
q->guc->id);
}
@@ -2533,7 +2486,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
clear_exec_queue_registered(q);
- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
+ if (xe_exec_queue_is_lr(q))
xe_exec_queue_put(q);
else
__guc_exec_queue_destroy(guc, q);
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (5 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
` (6 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
Now that LR jobs are tracked by the DRM scheduler, there's no longer a
need to special-case LR queues. This change removes all LR
queue-specific handling, including dedicated TDR logic, reference
counting schemes, and other related mechanisms.
v4:
- Remove xe_exec_queue_lr_cleanup tracepoint (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
drivers/gpu/drm/xe/xe_guc_submit.c | 132 ++-----------------
drivers/gpu/drm/xe/xe_trace.h | 5 -
3 files changed, 11 insertions(+), 128 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
index a3b034e4b205..fd0915ed8eb1 100644
--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
@@ -33,8 +33,6 @@ struct xe_guc_exec_queue {
*/
#define MAX_STATIC_MSG_TYPE 3
struct xe_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
- /** @lr_tdr: long running TDR worker */
- struct work_struct lr_tdr;
/** @destroy_async: do final destroy async from this worker */
struct work_struct destroy_async;
/** @resume_time: time of last resume */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 622b3d92ba41..8190f2afbaed 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -674,14 +674,6 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
}
- /*
- * We must keep a reference for LR engines if engine is registered with
- * the GuC as jobs signal immediately and can't destroy an engine if the
- * GuC has a reference to it.
- */
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_get(q);
-
set_exec_queue_registered(q);
trace_xe_exec_queue_register(q);
if (xe_exec_queue_is_parallel(q))
@@ -854,7 +846,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
struct xe_sched_job *job = to_xe_sched_job(drm_job);
struct xe_exec_queue *q = job->q;
struct xe_guc *guc = exec_queue_to_guc(q);
- bool lr = xe_exec_queue_is_lr(q), killed_or_banned_or_wedged =
+ bool killed_or_banned_or_wedged =
exec_queue_killed_or_banned_or_wedged(q);
xe_gt_assert(guc_to_gt(guc), !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
@@ -871,15 +863,6 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
job->restore_replay = false;
}
- /*
- * We don't care about job-fence ordering in LR VMs because these fences
- * are never exported; they are used solely to keep jobs on the pending
- * list. Once a queue enters an error state, there's no need to track
- * them.
- */
- if (killed_or_banned_or_wedged && lr)
- xe_sched_job_set_error(job, -ECANCELED);
-
return job->fence;
}
@@ -923,8 +906,7 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
xe_gt_warn(q->gt, "Pending enable/disable failed to respond\n");
xe_sched_submission_start(sched);
xe_gt_reset_async(q->gt);
- if (!xe_exec_queue_is_lr(q))
- xe_sched_tdr_queue_imm(sched);
+ xe_sched_tdr_queue_imm(sched);
return;
}
@@ -950,10 +932,7 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
/** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
wake_up_all(&xe->ufence_wq);
- if (xe_exec_queue_is_lr(q))
- queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
- else
- xe_sched_tdr_queue_imm(&q->guc->sched);
+ xe_sched_tdr_queue_imm(&q->guc->sched);
}
/**
@@ -1009,78 +988,6 @@ static bool guc_submit_hint_wedged(struct xe_guc *guc)
return true;
}
-static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
-{
- struct xe_guc_exec_queue *ge =
- container_of(w, struct xe_guc_exec_queue, lr_tdr);
- struct xe_exec_queue *q = ge->q;
- struct xe_guc *guc = exec_queue_to_guc(q);
- struct xe_gpu_scheduler *sched = &ge->sched;
- struct drm_sched_job *job;
- bool wedged = false;
-
- xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
-
- if (vf_recovery(guc))
- return;
-
- trace_xe_exec_queue_lr_cleanup(q);
-
- if (!exec_queue_killed(q))
- wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
-
- /* Kill the run_job / process_msg entry points */
- xe_sched_submission_stop(sched);
-
- /*
- * Engine state now mostly stable, disable scheduling / deregister if
- * needed. This cleanup routine might be called multiple times, where
- * the actual async engine deregister drops the final engine ref.
- * Calling disable_scheduling_deregister will mark the engine as
- * destroyed and fire off the CT requests to disable scheduling /
- * deregister, which we only want to do once. We also don't want to mark
- * the engine as pending_disable again as this may race with the
- * xe_guc_deregister_done_handler() which treats it as an unexpected
- * state.
- */
- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
- struct xe_guc *guc = exec_queue_to_guc(q);
- int ret;
-
- set_exec_queue_banned(q);
- disable_scheduling_deregister(guc, q);
-
- /*
- * Must wait for scheduling to be disabled before signalling
- * any fences, if GT broken the GT reset code should signal us.
- */
- ret = wait_event_timeout(guc->ct.wq,
- !exec_queue_pending_disable(q) ||
- xe_guc_read_stopped(guc) ||
- vf_recovery(guc), HZ * 5);
- if (vf_recovery(guc))
- return;
-
- if (!ret) {
- xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n",
- q->guc->id);
- xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
- q->guc->id);
- xe_sched_submission_start(sched);
- xe_gt_reset_async(q->gt);
- return;
- }
- }
-
- if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
- xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
-
- drm_sched_for_each_pending_job(job, &sched->base, NULL)
- xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
-
- xe_sched_submission_start(sched);
-}
-
#define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
@@ -1150,8 +1057,7 @@ static void enable_scheduling(struct xe_exec_queue *q)
xe_gt_warn(guc_to_gt(guc), "Schedule enable failed to respond");
set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
- if (!xe_exec_queue_is_lr(q))
- xe_sched_tdr_queue_imm(&q->guc->sched);
+ xe_sched_tdr_queue_imm(&q->guc->sched);
}
}
@@ -1188,7 +1094,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
pid_t pid = -1;
bool wedged = false, skip_timeout_check;
- xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
/*
@@ -1208,6 +1113,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
skip_timeout_check = exec_queue_reset(q) ||
exec_queue_killed_or_banned_or_wedged(q);
+ /* LR jobs can only get here if queue has been killed or hit an error */
+ if (xe_exec_queue_is_lr(q))
+ xe_gt_assert(guc_to_gt(guc), skip_timeout_check);
+
/*
* If devcoredump not captured and GuC capture for the job is not ready
* do manual capture first and decide later if we need to use it
@@ -1397,8 +1306,6 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
guard(xe_pm_runtime)(guc_to_xe(guc));
trace_xe_exec_queue_destroy(q);
- if (xe_exec_queue_is_lr(q))
- cancel_work_sync(&ge->lr_tdr);
/* Confirm no work left behind accessing device structures */
cancel_delayed_work_sync(&ge->sched.base.work_tdr);
@@ -1629,9 +1536,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
if (err)
goto err_sched;
- if (xe_exec_queue_is_lr(q))
- INIT_WORK(&q->guc->lr_tdr, xe_guc_exec_queue_lr_cleanup);
-
mutex_lock(&guc->submission_state.lock);
err = alloc_guc_id(guc, q);
@@ -1885,9 +1789,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
/* Clean up lost G2H + reset engine state */
if (exec_queue_registered(q)) {
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
- else if (exec_queue_destroyed(q))
+ if (exec_queue_destroyed(q))
__guc_exec_queue_destroy(guc, q);
}
if (q->guc->suspend_pending) {
@@ -1917,9 +1819,6 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_sched_job_ban(job);
ban = true;
}
- } else if (xe_exec_queue_is_lr(q) &&
- !xe_lrc_ring_is_idle(q->lrc[0])) {
- ban = true;
}
if (ban) {
@@ -2002,8 +1901,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (pending_enable && !pending_resume &&
!exec_queue_pending_tdr_exit(q)) {
clear_exec_queue_registered(q);
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
q->guc->id);
}
@@ -2072,10 +1969,7 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
/* Stop scheduling + flush any DRM scheduler operations */
xe_sched_submission_stop(sched);
- if (xe_exec_queue_is_lr(q))
- cancel_work_sync(&q->guc->lr_tdr);
- else
- cancel_delayed_work_sync(&sched->base.work_tdr);
+ cancel_delayed_work_sync(&sched->base.work_tdr);
guc_exec_queue_revert_pending_state_change(guc, q);
@@ -2485,11 +2379,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_exec_queue_deregister_done(q);
clear_exec_queue_registered(q);
-
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
- else
- __guc_exec_queue_destroy(guc, q);
+ __guc_exec_queue_destroy(guc, q);
}
int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 79a97b086cb2..cf2ef70fb7ce 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -182,11 +182,6 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_resubmit,
TP_ARGS(q)
);
-DEFINE_EVENT(xe_exec_queue, xe_exec_queue_lr_cleanup,
- TP_PROTO(struct xe_exec_queue *q),
- TP_ARGS(q)
-);
-
DECLARE_EVENT_CLASS(xe_sched_job,
TP_PROTO(struct xe_sched_job *job),
TP_ARGS(job),
--
2.34.1
* [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (6 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-02 6:42 ` Umesh Nerlige Ramappa
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
` (5 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
The timestamp WA does not work on a VF because it requires reading MMIO
registers, which are inaccessible on a VF. This timestamp WA confuses
LRC sampling on a VF during TDR, as the LRC timestamp would always read
as 1 for any active context. Disable the timestamp WA on VFs to avoid
this confusion.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_lrc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index a05060f75e7e..166353455f8f 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -1063,6 +1063,9 @@ static ssize_t setup_utilization_wa(struct xe_lrc *lrc,
{
u32 *cmd = batch;
+ if (IS_SRIOV_VF(gt_to_xe(lrc->gt)))
+ return 0;
+
if (xe_gt_WARN_ON(lrc->gt, max_len < 12))
return -ENOSPC;
--
2.34.1
* [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (7 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
@ 2025-12-01 18:39 ` Matthew Brost
2025-12-02 7:31 ` Umesh Nerlige Ramappa
2025-12-02 0:53 ` ✗ CI.checkpatch: warning for Fix DRM scheduler layering violations in Xe (rev8) Patchwork
` (4 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-01 18:39 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel
We now have proper infrastructure to accurately check the LRC timestamp
without toggling the scheduling state for non-VFs. For VFs, it is still
possible to get an inaccurate view if the context is on hardware. We
guard against free-running contexts on VFs by banning jobs whose
timestamps are not moving. In addition, VFs have a timeslice quantum
that naturally triggers context switches when more than one VF is
running, thus updating the LRC timestamp.
For multi-queue, it is desirable to avoid scheduling toggling in the TDR
because this scheduling state is shared among many queues. Furthermore,
this change simplifies the GuC state machine. The trade-off for VF cases
seems worthwhile.
v5:
- Add xe_lrc_timestamp helper (Umesh)
v6:
- Reduce number of tries on stuck timestamp (VF testing)
- Convert job timestamp save to a memory copy (VF testing)
v7:
- Save ctx timestamp to LRC when start VF job (VF testing)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 97 ++++++-------------------
drivers/gpu/drm/xe/xe_lrc.c | 42 +++++++----
drivers/gpu/drm/xe/xe_lrc.h | 3 +-
drivers/gpu/drm/xe/xe_ring_ops.c | 25 +++++--
drivers/gpu/drm/xe/xe_sched_job.c | 1 +
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
6 files changed, 76 insertions(+), 94 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 8190f2afbaed..dc4bf3126450 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
#define EXEC_QUEUE_STATE_KILLED (1 << 7)
#define EXEC_QUEUE_STATE_WEDGED (1 << 8)
#define EXEC_QUEUE_STATE_BANNED (1 << 9)
-#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
static bool exec_queue_registered(struct xe_exec_queue *q)
{
@@ -202,21 +200,6 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state);
}
-static bool exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_CHECK_TIMEOUT;
-}
-
-static void set_exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
-}
-
-static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
-}
-
static bool exec_queue_pending_resume(struct xe_exec_queue *q)
{
return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
@@ -232,21 +215,6 @@ static void clear_exec_queue_pending_resume(struct xe_exec_queue *q)
atomic_and(~EXEC_QUEUE_STATE_PENDING_RESUME, &q->guc->state);
}
-static bool exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_TDR_EXIT;
-}
-
-static void set_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
-}
-
-static void clear_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
-}
-
static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
{
return (atomic_read(&q->guc->state) &
@@ -1006,7 +974,16 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
return xe_sched_invalidate_job(job, 2);
}
- ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
+ ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));
+ if (ctx_timestamp == job->sample_timestamp) {
+ xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
+ xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
+ q->guc->id);
+
+ return xe_sched_invalidate_job(job, 0);
+ }
+
+ job->sample_timestamp = ctx_timestamp;
ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
/*
@@ -1132,16 +1109,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
}
/*
- * XXX: Sampling timeout doesn't work in wedged mode as we have to
- * modify scheduling state to read timestamp. We could read the
- * timestamp from a register to accumulate current running time but this
- * doesn't work for SRIOV. For now assuming timeouts in wedged mode are
- * genuine timeouts.
+ * Check if job is actually timed out, if so restart job execution and TDR
*/
+ if (!skip_timeout_check && !check_timeout(q, job))
+ goto rearm;
+
if (!exec_queue_killed(q))
wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
- /* Engine state now stable, disable scheduling to check timestamp */
+ set_exec_queue_banned(q);
+
+ /* Kick job / queue off hardware */
if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
int ret;
@@ -1163,13 +1141,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (!ret || xe_guc_read_stopped(guc))
goto trigger_reset;
- /*
- * Flag communicates to G2H handler that schedule
- * disable originated from a timeout check. The G2H then
- * avoid triggering cleanup or deregistering the exec
- * queue.
- */
- set_exec_queue_check_timeout(q);
disable_scheduling(q, skip_timeout_check);
}
@@ -1198,22 +1169,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
xe_devcoredump(q, job,
"Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
q->guc->id, ret, xe_guc_read_stopped(guc));
- set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
xe_sched_tdr_queue_imm(sched);
goto rearm;
}
}
- /*
- * Check if job is actually timed out, if so restart job execution and TDR
- */
- if (!wedged && !skip_timeout_check && !check_timeout(q, job) &&
- !exec_queue_reset(q) && exec_queue_registered(q)) {
- clear_exec_queue_check_timeout(q);
- goto sched_enable;
- }
-
if (q->vm && q->vm->xef) {
process_name = q->vm->xef->process_name;
pid = q->vm->xef->pid;
@@ -1244,14 +1205,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
(q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
if (!xe_sched_invalidate_job(job, 2)) {
- clear_exec_queue_check_timeout(q);
xe_gt_reset_async(q->gt);
goto rearm;
}
}
- set_exec_queue_banned(q);
-
/* Mark all outstanding jobs as bad, thus completing them */
xe_sched_job_set_error(job, err);
drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
@@ -1266,9 +1224,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
*/
return DRM_GPU_SCHED_STAT_NO_HANG;
-sched_enable:
- set_exec_queue_pending_tdr_exit(q);
- enable_scheduling(q);
rearm:
/*
* XXX: Ideally want to adjust timeout based on current execution time
@@ -1898,8 +1853,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
q->guc->id);
}
- if (pending_enable && !pending_resume &&
- !exec_queue_pending_tdr_exit(q)) {
+ if (pending_enable && !pending_resume) {
clear_exec_queue_registered(q);
xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
q->guc->id);
@@ -1908,7 +1862,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (pending_enable) {
clear_exec_queue_enabled(q);
clear_exec_queue_pending_resume(q);
- clear_exec_queue_pending_tdr_exit(q);
clear_exec_queue_pending_enable(q);
xe_gt_dbg(guc_to_gt(guc), "Replay ENABLE - guc_id=%d",
q->guc->id);
@@ -1934,7 +1887,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (!pending_enable)
set_exec_queue_enabled(q);
clear_exec_queue_pending_disable(q);
- clear_exec_queue_check_timeout(q);
xe_gt_dbg(guc_to_gt(guc), "Replay DISABLE - guc_id=%d",
q->guc->id);
}
@@ -2308,13 +2260,10 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
q->guc->resume_time = ktime_get();
clear_exec_queue_pending_resume(q);
- clear_exec_queue_pending_tdr_exit(q);
clear_exec_queue_pending_enable(q);
smp_wmb();
wake_up_all(&guc->ct.wq);
} else {
- bool check_timeout = exec_queue_check_timeout(q);
-
xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
@@ -2322,11 +2271,11 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
suspend_fence_signal(q);
clear_exec_queue_pending_disable(q);
} else {
- if (exec_queue_banned(q) || check_timeout) {
+ if (exec_queue_banned(q)) {
smp_wmb();
wake_up_all(&guc->ct.wq);
}
- if (!check_timeout && exec_queue_destroyed(q)) {
+ if (exec_queue_destroyed(q)) {
/*
* Make sure to clear the pending_disable only
* after sampling the destroyed state. We want
@@ -2436,7 +2385,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
* guc_exec_queue_timedout_job.
*/
set_exec_queue_reset(q);
- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
+ if (!exec_queue_banned(q))
xe_guc_exec_queue_trigger_cleanup(q);
return 0;
@@ -2517,7 +2466,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
/* Treat the same as engine reset */
set_exec_queue_reset(q);
- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
+ if (!exec_queue_banned(q))
xe_guc_exec_queue_trigger_cleanup(q);
return 0;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 166353455f8f..38b0c536f6fb 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -852,7 +852,7 @@ u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc)
*
* Returns: ctx timestamp value
*/
-u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
+static u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
{
struct xe_device *xe = lrc_to_xe(lrc);
struct iosys_map map;
@@ -2380,35 +2380,31 @@ static int get_ctx_timestamp(struct xe_lrc *lrc, u32 engine_id, u64 *reg_ctx_ts)
}
/**
- * xe_lrc_update_timestamp() - Update ctx timestamp
+ * xe_lrc_timestamp() - Current ctx timestamp
* @lrc: Pointer to the lrc.
- * @old_ts: Old timestamp value
*
- * Populate @old_ts current saved ctx timestamp, read new ctx timestamp and
- * update saved value. With support for active contexts, the calculation may be
- * slightly racy, so follow a read-again logic to ensure that the context is
- * still active before returning the right timestamp.
+ * Return latest ctx timestamp. With support for active contexts, the
+ * calculation may be slightly racy, so follow a read-again logic to ensure that
+ * the context is still active before returning the right timestamp.
*
* Returns: New ctx timestamp value
*/
-u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
+u64 xe_lrc_timestamp(struct xe_lrc *lrc)
{
- u64 lrc_ts, reg_ts;
+ u64 lrc_ts, reg_ts, new_ts;
u32 engine_id;
- *old_ts = lrc->ctx_timestamp;
-
lrc_ts = xe_lrc_ctx_timestamp(lrc);
/* CTX_TIMESTAMP mmio read is invalid on VF, so return the LRC value */
if (IS_SRIOV_VF(lrc_to_xe(lrc))) {
- lrc->ctx_timestamp = lrc_ts;
+ new_ts = lrc_ts;
goto done;
}
if (lrc_ts == CONTEXT_ACTIVE) {
engine_id = xe_lrc_engine_id(lrc);
if (!get_ctx_timestamp(lrc, engine_id, &reg_ts))
- lrc->ctx_timestamp = reg_ts;
+ new_ts = reg_ts;
/* read lrc again to ensure context is still active */
lrc_ts = xe_lrc_ctx_timestamp(lrc);
@@ -2419,9 +2415,27 @@ u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
* be a separate if condition.
*/
if (lrc_ts != CONTEXT_ACTIVE)
- lrc->ctx_timestamp = lrc_ts;
+ new_ts = lrc_ts;
done:
+ return new_ts;
+}
+
+/**
+ * xe_lrc_update_timestamp() - Update ctx timestamp
+ * @lrc: Pointer to the lrc.
+ * @old_ts: Old timestamp value
+ *
+ * Populate @old_ts with the currently saved ctx timestamp, read the new ctx
+ * timestamp and update the saved value.
+ *
+ * Returns: New ctx timestamp value
+ */
+u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
+{
+ *old_ts = lrc->ctx_timestamp;
+ lrc->ctx_timestamp = xe_lrc_timestamp(lrc);
+
trace_xe_lrc_update_timestamp(lrc, *old_ts);
return lrc->ctx_timestamp;
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index a32472b92242..93c1234e2706 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -142,7 +142,6 @@ void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot);
u32 xe_lrc_ctx_timestamp_ggtt_addr(struct xe_lrc *lrc);
u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc);
-u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc);
u32 xe_lrc_ctx_job_timestamp_ggtt_addr(struct xe_lrc *lrc);
u32 xe_lrc_ctx_job_timestamp(struct xe_lrc *lrc);
int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
@@ -162,4 +161,6 @@ int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe
*/
u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts);
+u64 xe_lrc_timestamp(struct xe_lrc *lrc);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index ac0c6dcffe15..3dacfc2da75c 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -233,13 +233,26 @@ static u32 get_ppgtt_flag(struct xe_sched_job *job)
return 0;
}
-static int emit_copy_timestamp(struct xe_lrc *lrc, u32 *dw, int i)
+static int emit_copy_timestamp(struct xe_device *xe, struct xe_lrc *lrc,
+ u32 *dw, int i)
{
dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT | MI_SRM_ADD_CS_OFFSET;
dw[i++] = RING_CTX_TIMESTAMP(0).addr;
dw[i++] = xe_lrc_ctx_job_timestamp_ggtt_addr(lrc);
dw[i++] = 0;
+ /*
+ * Ensure CTX timestamp >= Job timestamp during VF sampling to avoid
+ * arithmetic wraparound in TDR.
+ */
+ if (IS_SRIOV_VF(xe)) {
+ dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT |
+ MI_SRM_ADD_CS_OFFSET;
+ dw[i++] = RING_CTX_TIMESTAMP(0).addr;
+ dw[i++] = xe_lrc_ctx_timestamp_ggtt_addr(lrc);
+ dw[i++] = 0;
+ }
+
return i;
}
@@ -253,7 +266,7 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
*head = lrc->ring.tail;
- i = emit_copy_timestamp(lrc, dw, i);
+ i = emit_copy_timestamp(gt_to_xe(gt), lrc, dw, i);
if (job->ring_ops_flush_tlb) {
dw[i++] = preparser_disable(true);
@@ -308,7 +321,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
*head = lrc->ring.tail;
- i = emit_copy_timestamp(lrc, dw, i);
+ i = emit_copy_timestamp(xe, lrc, dw, i);
dw[i++] = preparser_disable(true);
@@ -362,7 +375,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
*head = lrc->ring.tail;
- i = emit_copy_timestamp(lrc, dw, i);
+ i = emit_copy_timestamp(xe, lrc, dw, i);
dw[i++] = preparser_disable(true);
if (lacks_render)
@@ -406,12 +419,14 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
struct xe_lrc *lrc, u32 *head,
u32 seqno)
{
+ struct xe_gt *gt = job->q->gt;
+ struct xe_device *xe = gt_to_xe(gt);
u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
u32 dw[MAX_JOB_SIZE_DW], i = 0;
*head = lrc->ring.tail;
- i = emit_copy_timestamp(lrc, dw, i);
+ i = emit_copy_timestamp(xe, lrc, dw, i);
i = emit_store_imm_ggtt(saddr, seqno, dw, i);
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index cb674a322113..39aec7f6d86d 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -110,6 +110,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
return ERR_PTR(-ENOMEM);
job->q = q;
+ job->sample_timestamp = U64_MAX;
kref_init(&job->refcount);
xe_exec_queue_get(job->q);
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index 7c4c54fe920a..13c2970e81a8 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -59,6 +59,8 @@ struct xe_sched_job {
u32 lrc_seqno;
/** @migrate_flush_flags: Additional flush flags for migration jobs */
u32 migrate_flush_flags;
+ /** @sample_timestamp: Sampling of job timestamp in TDR */
+ u64 sample_timestamp;
/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
bool ring_ops_flush_tlb;
/** @ggtt: mapped in ggtt. */
--
2.34.1
* ✗ CI.checkpatch: warning for Fix DRM scheduler layering violations in Xe (rev8)
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (8 preceding siblings ...)
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
@ 2025-12-02 0:53 ` Patchwork
2025-12-02 0:55 ` ✓ CI.KUnit: success " Patchwork
` (3 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2025-12-02 0:53 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Fix DRM scheduler layering violations in Xe (rev8)
URL : https://patchwork.freedesktop.org/series/155314/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
2de9a3901bc28757c7906b454717b64e2a214021
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 41b39105ae1e7d76fa56826b12ebc1ed35a09419
Author: Matthew Brost <matthew.brost@intel.com>
Date: Mon Dec 1 10:39:54 2025 -0800
drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
We now have proper infrastructure to accurately check the LRC timestamp
without toggling the scheduling state for non-VFs. For VFs, it is still
possible to get an inaccurate view if the context is on hardware. We
guard against free-running contexts on VFs by banning jobs whose
timestamps are not moving. In addition, VFs have a timeslice quantum
that naturally triggers context switches when more than one VF is
running, thus updating the LRC timestamp.
For multi-queue, it is desirable to avoid scheduling toggling in the TDR
because this scheduling state is shared among many queues. Furthermore,
this change simplifies the GuC state machine. The trade-off for VF cases
seems worthwhile.
v5:
- Add xe_lrc_timestamp helper (Umesh)
v6:
- Reduce number of tries on stuck timestamp (VF testing)
- Convert job timestamp save to a memory copy (VF testing)
v7:
- Save ctx timestamp to LRC when starting a VF job (VF testing)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch 639f325d8cbdc690de963db2fe5840444ac7ea65 drm-intel
9a0a091575cd drm/sched: Add several job helpers to avoid drivers touching scheduler state
a8edba9328cf drm/sched: Add pending job list iterator
-:73: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#73: FILE: include/drm/gpu_scheduler.h:778:
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
BUT SEE:
do {} while (0) advice is over-stated in a few situations:
The more obvious case is macros, like MODULE_PARM_DESC, invoked at
file-scope, where C disallows code (it must be in functions). See
$exceptions if you have one to add by name.
More troublesome is declarative macros used at top of new scope,
like DECLARE_PER_CPU. These might just compile with a do-while-0
wrapper, but would be incorrect. Most of these are handled by
detecting struct,union,etc declaration primitives in $exceptions.
There are also macros called inside an if (block), which "return" an
expression. These cannot do-while, and need a ({}) wrapper.
Enjoy this qualification while we work to improve our heuristics.
-:73: CHECK:MACRO_ARG_REUSE: Macro argument reuse '__job' - possible side-effects?
#73: FILE: include/drm/gpu_scheduler.h:778:
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
-:73: CHECK:MACRO_ARG_REUSE: Macro argument reuse '__sched' - possible side-effects?
#73: FILE: include/drm/gpu_scheduler.h:778:
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
-:73: CHECK:MACRO_ARG_REUSE: Macro argument reuse '__entity' - possible side-effects?
#73: FILE: include/drm/gpu_scheduler.h:778:
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
total: 1 errors, 0 warnings, 3 checks, 54 lines checked
ba24d3ac008e drm/xe: Add dedicated message lock
628b33612410 drm/xe: Stop abusing DRM scheduler internals
7aac3fbb7683 drm/xe: Only toggle scheduling in TDR if GuC is running
7183fe3bbdd1 drm/xe: Do not deregister queues in TDR
acfab21aaaf9 drm/xe: Remove special casing for LR queues in submission
e170c9ec71a7 drm/xe: Disable timestamp WA on VFs
41b39105ae1e drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
^ permalink raw reply [flat|nested] 31+ messages in thread
* ✓ CI.KUnit: success for Fix DRM scheduler layering violations in Xe (rev8)
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (9 preceding siblings ...)
2025-12-02 0:53 ` ✗ CI.checkpatch: warning for Fix DRM scheduler layering violations in Xe (rev8) Patchwork
@ 2025-12-02 0:55 ` Patchwork
2025-12-02 2:05 ` ✓ Xe.CI.BAT: " Patchwork
` (2 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2025-12-02 0:55 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
== Series Details ==
Series: Fix DRM scheduler layering violations in Xe (rev8)
URL : https://patchwork.freedesktop.org/series/155314/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[00:53:44] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:53:48] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:54:19] Starting KUnit Kernel (1/1)...
[00:54:19] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:54:19] ================== guc_buf (11 subtests) ===================
[00:54:19] [PASSED] test_smallest
[00:54:19] [PASSED] test_largest
[00:54:19] [PASSED] test_granular
[00:54:19] [PASSED] test_unique
[00:54:19] [PASSED] test_overlap
[00:54:19] [PASSED] test_reusable
[00:54:19] [PASSED] test_too_big
[00:54:19] [PASSED] test_flush
[00:54:19] [PASSED] test_lookup
[00:54:19] [PASSED] test_data
[00:54:19] [PASSED] test_class
[00:54:19] ===================== [PASSED] guc_buf =====================
[00:54:19] =================== guc_dbm (7 subtests) ===================
[00:54:19] [PASSED] test_empty
[00:54:19] [PASSED] test_default
[00:54:19] ======================== test_size ========================
[00:54:19] [PASSED] 4
[00:54:19] [PASSED] 8
[00:54:19] [PASSED] 32
[00:54:19] [PASSED] 256
[00:54:19] ==================== [PASSED] test_size ====================
[00:54:19] ======================= test_reuse ========================
[00:54:19] [PASSED] 4
[00:54:19] [PASSED] 8
[00:54:19] [PASSED] 32
[00:54:19] [PASSED] 256
[00:54:19] =================== [PASSED] test_reuse ====================
[00:54:19] =================== test_range_overlap ====================
[00:54:19] [PASSED] 4
[00:54:19] [PASSED] 8
[00:54:19] [PASSED] 32
[00:54:19] [PASSED] 256
[00:54:19] =============== [PASSED] test_range_overlap ================
[00:54:19] =================== test_range_compact ====================
[00:54:19] [PASSED] 4
[00:54:19] [PASSED] 8
[00:54:19] [PASSED] 32
[00:54:19] [PASSED] 256
[00:54:19] =============== [PASSED] test_range_compact ================
[00:54:19] ==================== test_range_spare =====================
[00:54:19] [PASSED] 4
[00:54:19] [PASSED] 8
[00:54:19] [PASSED] 32
[00:54:19] [PASSED] 256
[00:54:19] ================ [PASSED] test_range_spare =================
[00:54:19] ===================== [PASSED] guc_dbm =====================
[00:54:19] =================== guc_idm (6 subtests) ===================
[00:54:19] [PASSED] bad_init
[00:54:19] [PASSED] no_init
[00:54:19] [PASSED] init_fini
[00:54:19] [PASSED] check_used
[00:54:19] [PASSED] check_quota
[00:54:19] [PASSED] check_all
[00:54:19] ===================== [PASSED] guc_idm =====================
[00:54:19] ================== no_relay (3 subtests) ===================
[00:54:19] [PASSED] xe_drops_guc2pf_if_not_ready
[00:54:19] [PASSED] xe_drops_guc2vf_if_not_ready
[00:54:19] [PASSED] xe_rejects_send_if_not_ready
[00:54:19] ==================== [PASSED] no_relay =====================
[00:54:19] ================== pf_relay (14 subtests) ==================
[00:54:19] [PASSED] pf_rejects_guc2pf_too_short
[00:54:19] [PASSED] pf_rejects_guc2pf_too_long
[00:54:19] [PASSED] pf_rejects_guc2pf_no_payload
[00:54:19] [PASSED] pf_fails_no_payload
[00:54:19] [PASSED] pf_fails_bad_origin
[00:54:19] [PASSED] pf_fails_bad_type
[00:54:19] [PASSED] pf_txn_reports_error
[00:54:19] [PASSED] pf_txn_sends_pf2guc
[00:54:19] [PASSED] pf_sends_pf2guc
[00:54:19] [SKIPPED] pf_loopback_nop
[00:54:19] [SKIPPED] pf_loopback_echo
[00:54:19] [SKIPPED] pf_loopback_fail
[00:54:19] [SKIPPED] pf_loopback_busy
[00:54:19] [SKIPPED] pf_loopback_retry
[00:54:19] ==================== [PASSED] pf_relay =====================
[00:54:19] ================== vf_relay (3 subtests) ===================
[00:54:19] [PASSED] vf_rejects_guc2vf_too_short
[00:54:19] [PASSED] vf_rejects_guc2vf_too_long
[00:54:19] [PASSED] vf_rejects_guc2vf_no_payload
[00:54:19] ==================== [PASSED] vf_relay =====================
[00:54:19] ================ pf_gt_config (6 subtests) =================
[00:54:19] [PASSED] fair_contexts_1vf
[00:54:19] [PASSED] fair_doorbells_1vf
[00:54:19] [PASSED] fair_ggtt_1vf
[00:54:19] ====================== fair_contexts ======================
[00:54:19] [PASSED] 1 VF
[00:54:19] [PASSED] 2 VFs
[00:54:19] [PASSED] 3 VFs
[00:54:19] [PASSED] 4 VFs
[00:54:19] [PASSED] 5 VFs
[00:54:19] [PASSED] 6 VFs
[00:54:19] [PASSED] 7 VFs
[00:54:19] [PASSED] 8 VFs
[00:54:19] [PASSED] 9 VFs
[00:54:19] [PASSED] 10 VFs
[00:54:19] [PASSED] 11 VFs
[00:54:19] [PASSED] 12 VFs
[00:54:19] [PASSED] 13 VFs
[00:54:19] [PASSED] 14 VFs
[00:54:19] [PASSED] 15 VFs
[00:54:19] [PASSED] 16 VFs
[00:54:19] [PASSED] 17 VFs
[00:54:19] [PASSED] 18 VFs
[00:54:19] [PASSED] 19 VFs
[00:54:19] [PASSED] 20 VFs
[00:54:19] [PASSED] 21 VFs
[00:54:19] [PASSED] 22 VFs
[00:54:19] [PASSED] 23 VFs
[00:54:19] [PASSED] 24 VFs
[00:54:19] [PASSED] 25 VFs
[00:54:19] [PASSED] 26 VFs
[00:54:19] [PASSED] 27 VFs
[00:54:19] [PASSED] 28 VFs
[00:54:19] [PASSED] 29 VFs
[00:54:19] [PASSED] 30 VFs
[00:54:19] [PASSED] 31 VFs
[00:54:19] [PASSED] 32 VFs
[00:54:19] [PASSED] 33 VFs
[00:54:19] [PASSED] 34 VFs
[00:54:19] [PASSED] 35 VFs
[00:54:19] [PASSED] 36 VFs
[00:54:19] [PASSED] 37 VFs
[00:54:19] [PASSED] 38 VFs
[00:54:19] [PASSED] 39 VFs
[00:54:19] [PASSED] 40 VFs
[00:54:19] [PASSED] 41 VFs
[00:54:19] [PASSED] 42 VFs
[00:54:19] [PASSED] 43 VFs
[00:54:19] [PASSED] 44 VFs
[00:54:19] [PASSED] 45 VFs
[00:54:19] [PASSED] 46 VFs
[00:54:19] [PASSED] 47 VFs
[00:54:19] [PASSED] 48 VFs
[00:54:19] [PASSED] 49 VFs
[00:54:19] [PASSED] 50 VFs
[00:54:19] [PASSED] 51 VFs
[00:54:19] [PASSED] 52 VFs
[00:54:19] [PASSED] 53 VFs
[00:54:19] [PASSED] 54 VFs
[00:54:19] [PASSED] 55 VFs
[00:54:19] [PASSED] 56 VFs
[00:54:19] [PASSED] 57 VFs
[00:54:19] [PASSED] 58 VFs
[00:54:19] [PASSED] 59 VFs
[00:54:19] [PASSED] 60 VFs
[00:54:19] [PASSED] 61 VFs
[00:54:19] [PASSED] 62 VFs
[00:54:19] [PASSED] 63 VFs
[00:54:19] ================== [PASSED] fair_contexts ==================
[00:54:19] ===================== fair_doorbells ======================
[00:54:19] [PASSED] 1 VF
[00:54:19] [PASSED] 2 VFs
[00:54:19] [PASSED] 3 VFs
[00:54:19] [PASSED] 4 VFs
[00:54:19] [PASSED] 5 VFs
[00:54:19] [PASSED] 6 VFs
[00:54:19] [PASSED] 7 VFs
[00:54:19] [PASSED] 8 VFs
[00:54:19] [PASSED] 9 VFs
[00:54:19] [PASSED] 10 VFs
[00:54:19] [PASSED] 11 VFs
[00:54:19] [PASSED] 12 VFs
[00:54:19] [PASSED] 13 VFs
[00:54:19] [PASSED] 14 VFs
[00:54:19] [PASSED] 15 VFs
[00:54:19] [PASSED] 16 VFs
[00:54:19] [PASSED] 17 VFs
[00:54:19] [PASSED] 18 VFs
[00:54:19] [PASSED] 19 VFs
[00:54:19] [PASSED] 20 VFs
[00:54:19] [PASSED] 21 VFs
[00:54:19] [PASSED] 22 VFs
[00:54:19] [PASSED] 23 VFs
[00:54:19] [PASSED] 24 VFs
[00:54:19] [PASSED] 25 VFs
[00:54:19] [PASSED] 26 VFs
[00:54:19] [PASSED] 27 VFs
[00:54:19] [PASSED] 28 VFs
[00:54:19] [PASSED] 29 VFs
[00:54:19] [PASSED] 30 VFs
[00:54:19] [PASSED] 31 VFs
[00:54:19] [PASSED] 32 VFs
[00:54:19] [PASSED] 33 VFs
[00:54:19] [PASSED] 34 VFs
[00:54:19] [PASSED] 35 VFs
[00:54:19] [PASSED] 36 VFs
[00:54:19] [PASSED] 37 VFs
[00:54:19] [PASSED] 38 VFs
[00:54:19] [PASSED] 39 VFs
[00:54:19] [PASSED] 40 VFs
[00:54:19] [PASSED] 41 VFs
[00:54:19] [PASSED] 42 VFs
[00:54:19] [PASSED] 43 VFs
[00:54:19] [PASSED] 44 VFs
[00:54:19] [PASSED] 45 VFs
[00:54:19] [PASSED] 46 VFs
[00:54:19] [PASSED] 47 VFs
[00:54:19] [PASSED] 48 VFs
[00:54:19] [PASSED] 49 VFs
[00:54:19] [PASSED] 50 VFs
[00:54:19] [PASSED] 51 VFs
[00:54:19] [PASSED] 52 VFs
[00:54:19] [PASSED] 53 VFs
[00:54:19] [PASSED] 54 VFs
[00:54:19] [PASSED] 55 VFs
[00:54:19] [PASSED] 56 VFs
[00:54:19] [PASSED] 57 VFs
[00:54:19] [PASSED] 58 VFs
[00:54:19] [PASSED] 59 VFs
[00:54:19] [PASSED] 60 VFs
[00:54:19] [PASSED] 61 VFs
[00:54:19] [PASSED] 62 VFs
[00:54:19] [PASSED] 63 VFs
[00:54:19] ================= [PASSED] fair_doorbells ==================
[00:54:19] ======================== fair_ggtt ========================
[00:54:19] [PASSED] 1 VF
[00:54:19] [PASSED] 2 VFs
[00:54:19] [PASSED] 3 VFs
[00:54:19] [PASSED] 4 VFs
[00:54:19] [PASSED] 5 VFs
[00:54:19] [PASSED] 6 VFs
[00:54:19] [PASSED] 7 VFs
[00:54:19] [PASSED] 8 VFs
[00:54:19] [PASSED] 9 VFs
[00:54:19] [PASSED] 10 VFs
[00:54:19] [PASSED] 11 VFs
[00:54:19] [PASSED] 12 VFs
[00:54:19] [PASSED] 13 VFs
[00:54:19] [PASSED] 14 VFs
[00:54:19] [PASSED] 15 VFs
[00:54:19] [PASSED] 16 VFs
[00:54:19] [PASSED] 17 VFs
[00:54:19] [PASSED] 18 VFs
[00:54:19] [PASSED] 19 VFs
[00:54:19] [PASSED] 20 VFs
[00:54:19] [PASSED] 21 VFs
[00:54:19] [PASSED] 22 VFs
[00:54:19] [PASSED] 23 VFs
[00:54:19] [PASSED] 24 VFs
[00:54:19] [PASSED] 25 VFs
[00:54:19] [PASSED] 26 VFs
[00:54:19] [PASSED] 27 VFs
[00:54:19] [PASSED] 28 VFs
[00:54:19] [PASSED] 29 VFs
[00:54:19] [PASSED] 30 VFs
[00:54:19] [PASSED] 31 VFs
[00:54:19] [PASSED] 32 VFs
[00:54:19] [PASSED] 33 VFs
[00:54:19] [PASSED] 34 VFs
[00:54:19] [PASSED] 35 VFs
[00:54:19] [PASSED] 36 VFs
[00:54:19] [PASSED] 37 VFs
[00:54:19] [PASSED] 38 VFs
[00:54:19] [PASSED] 39 VFs
[00:54:19] [PASSED] 40 VFs
[00:54:19] [PASSED] 41 VFs
[00:54:19] [PASSED] 42 VFs
[00:54:19] [PASSED] 43 VFs
[00:54:19] [PASSED] 44 VFs
[00:54:19] [PASSED] 45 VFs
[00:54:19] [PASSED] 46 VFs
[00:54:19] [PASSED] 47 VFs
[00:54:19] [PASSED] 48 VFs
[00:54:19] [PASSED] 49 VFs
[00:54:19] [PASSED] 50 VFs
[00:54:19] [PASSED] 51 VFs
[00:54:19] [PASSED] 52 VFs
[00:54:19] [PASSED] 53 VFs
[00:54:19] [PASSED] 54 VFs
[00:54:19] [PASSED] 55 VFs
[00:54:19] [PASSED] 56 VFs
[00:54:19] [PASSED] 57 VFs
[00:54:19] [PASSED] 58 VFs
[00:54:19] [PASSED] 59 VFs
[00:54:19] [PASSED] 60 VFs
[00:54:19] [PASSED] 61 VFs
[00:54:19] [PASSED] 62 VFs
[00:54:19] [PASSED] 63 VFs
[00:54:19] ==================== [PASSED] fair_ggtt ====================
[00:54:19] ================== [PASSED] pf_gt_config ===================
[00:54:19] ===================== lmtt (1 subtest) =====================
[00:54:19] ======================== test_ops =========================
[00:54:19] [PASSED] 2-level
[00:54:19] [PASSED] multi-level
[00:54:19] ==================== [PASSED] test_ops =====================
[00:54:19] ====================== [PASSED] lmtt =======================
[00:54:19] ================= pf_service (11 subtests) =================
[00:54:19] [PASSED] pf_negotiate_any
[00:54:19] [PASSED] pf_negotiate_base_match
[00:54:19] [PASSED] pf_negotiate_base_newer
[00:54:19] [PASSED] pf_negotiate_base_next
[00:54:19] [SKIPPED] pf_negotiate_base_older
[00:54:19] [PASSED] pf_negotiate_base_prev
[00:54:19] [PASSED] pf_negotiate_latest_match
[00:54:19] [PASSED] pf_negotiate_latest_newer
[00:54:19] [PASSED] pf_negotiate_latest_next
[00:54:19] [SKIPPED] pf_negotiate_latest_older
[00:54:19] [SKIPPED] pf_negotiate_latest_prev
[00:54:19] =================== [PASSED] pf_service ====================
[00:54:19] ================= xe_guc_g2g (2 subtests) ==================
[00:54:19] ============== xe_live_guc_g2g_kunit_default ==============
[00:54:19] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[00:54:19] ============== xe_live_guc_g2g_kunit_allmem ===============
[00:54:19] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[00:54:19] =================== [SKIPPED] xe_guc_g2g ===================
[00:54:19] =================== xe_mocs (2 subtests) ===================
[00:54:19] ================ xe_live_mocs_kernel_kunit ================
[00:54:19] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[00:54:19] ================ xe_live_mocs_reset_kunit =================
[00:54:19] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[00:54:19] ==================== [SKIPPED] xe_mocs =====================
[00:54:19] ================= xe_migrate (2 subtests) ==================
[00:54:19] ================= xe_migrate_sanity_kunit =================
[00:54:19] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[00:54:19] ================== xe_validate_ccs_kunit ==================
[00:54:19] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[00:54:19] =================== [SKIPPED] xe_migrate ===================
[00:54:19] ================== xe_dma_buf (1 subtest) ==================
[00:54:19] ==================== xe_dma_buf_kunit =====================
[00:54:19] ================ [SKIPPED] xe_dma_buf_kunit ================
[00:54:19] =================== [SKIPPED] xe_dma_buf ===================
[00:54:19] ================= xe_bo_shrink (1 subtest) =================
[00:54:19] =================== xe_bo_shrink_kunit ====================
[00:54:19] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[00:54:19] ================== [SKIPPED] xe_bo_shrink ==================
[00:54:19] ==================== xe_bo (2 subtests) ====================
[00:54:19] ================== xe_ccs_migrate_kunit ===================
[00:54:19] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[00:54:19] ==================== xe_bo_evict_kunit ====================
[00:54:19] =============== [SKIPPED] xe_bo_evict_kunit ================
[00:54:19] ===================== [SKIPPED] xe_bo ======================
[00:54:19] ==================== args (11 subtests) ====================
[00:54:19] [PASSED] count_args_test
[00:54:19] [PASSED] call_args_example
[00:54:19] [PASSED] call_args_test
[00:54:19] [PASSED] drop_first_arg_example
[00:54:19] [PASSED] drop_first_arg_test
[00:54:19] [PASSED] first_arg_example
[00:54:19] [PASSED] first_arg_test
[00:54:19] [PASSED] last_arg_example
[00:54:19] [PASSED] last_arg_test
[00:54:19] [PASSED] pick_arg_example
[00:54:19] [PASSED] sep_comma_example
[00:54:19] ====================== [PASSED] args =======================
[00:54:19] =================== xe_pci (3 subtests) ====================
[00:54:19] ==================== check_graphics_ip ====================
[00:54:19] [PASSED] 12.00 Xe_LP
[00:54:19] [PASSED] 12.10 Xe_LP+
[00:54:19] [PASSED] 12.55 Xe_HPG
[00:54:19] [PASSED] 12.60 Xe_HPC
[00:54:19] [PASSED] 12.70 Xe_LPG
[00:54:19] [PASSED] 12.71 Xe_LPG
[00:54:19] [PASSED] 12.74 Xe_LPG+
[00:54:19] [PASSED] 20.01 Xe2_HPG
[00:54:19] [PASSED] 20.02 Xe2_HPG
[00:54:19] [PASSED] 20.04 Xe2_LPG
[00:54:19] [PASSED] 30.00 Xe3_LPG
[00:54:19] [PASSED] 30.01 Xe3_LPG
[00:54:19] [PASSED] 30.03 Xe3_LPG
[00:54:19] [PASSED] 30.04 Xe3_LPG
[00:54:19] [PASSED] 30.05 Xe3_LPG
[00:54:19] [PASSED] 35.11 Xe3p_XPC
[00:54:19] ================ [PASSED] check_graphics_ip ================
[00:54:19] ===================== check_media_ip ======================
[00:54:19] [PASSED] 12.00 Xe_M
[00:54:19] [PASSED] 12.55 Xe_HPM
[00:54:19] [PASSED] 13.00 Xe_LPM+
[00:54:19] [PASSED] 13.01 Xe2_HPM
[00:54:19] [PASSED] 20.00 Xe2_LPM
[00:54:19] [PASSED] 30.00 Xe3_LPM
[00:54:19] [PASSED] 30.02 Xe3_LPM
[00:54:19] [PASSED] 35.00 Xe3p_LPM
[00:54:19] [PASSED] 35.03 Xe3p_HPM
[00:54:19] ================= [PASSED] check_media_ip ==================
[00:54:19] =================== check_platform_desc ===================
[00:54:19] [PASSED] 0x9A60 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A68 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A70 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A40 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A49 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A59 (TIGERLAKE)
[00:54:19] [PASSED] 0x9A78 (TIGERLAKE)
[00:54:19] [PASSED] 0x9AC0 (TIGERLAKE)
[00:54:19] [PASSED] 0x9AC9 (TIGERLAKE)
[00:54:19] [PASSED] 0x9AD9 (TIGERLAKE)
[00:54:19] [PASSED] 0x9AF8 (TIGERLAKE)
[00:54:19] [PASSED] 0x4C80 (ROCKETLAKE)
[00:54:19] [PASSED] 0x4C8A (ROCKETLAKE)
[00:54:19] [PASSED] 0x4C8B (ROCKETLAKE)
[00:54:19] [PASSED] 0x4C8C (ROCKETLAKE)
[00:54:19] [PASSED] 0x4C90 (ROCKETLAKE)
[00:54:19] [PASSED] 0x4C9A (ROCKETLAKE)
[00:54:19] [PASSED] 0x4680 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4682 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4688 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x468A (ALDERLAKE_S)
[00:54:19] [PASSED] 0x468B (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4690 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4692 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4693 (ALDERLAKE_S)
[00:54:19] [PASSED] 0x46A0 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46A1 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46A2 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46A3 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46A6 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46A8 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46AA (ALDERLAKE_P)
[00:54:19] [PASSED] 0x462A (ALDERLAKE_P)
[00:54:19] [PASSED] 0x4626 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x4628 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46B0 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46B1 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46B2 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46B3 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46C0 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46C1 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46C2 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46C3 (ALDERLAKE_P)
[00:54:19] [PASSED] 0x46D0 (ALDERLAKE_N)
[00:54:19] [PASSED] 0x46D1 (ALDERLAKE_N)
[00:54:19] [PASSED] 0x46D2 (ALDERLAKE_N)
[00:54:19] [PASSED] 0x46D3 (ALDERLAKE_N)
[00:54:19] [PASSED] 0x46D4 (ALDERLAKE_N)
[00:54:19] [PASSED] 0xA721 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7A1 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7A9 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7AC (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7AD (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA720 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7A0 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7A8 (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7AA (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA7AB (ALDERLAKE_P)
[00:54:19] [PASSED] 0xA780 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA781 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA782 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA783 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA788 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA789 (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA78A (ALDERLAKE_S)
[00:54:19] [PASSED] 0xA78B (ALDERLAKE_S)
[00:54:19] [PASSED] 0x4905 (DG1)
[00:54:19] [PASSED] 0x4906 (DG1)
[00:54:19] [PASSED] 0x4907 (DG1)
[00:54:19] [PASSED] 0x4908 (DG1)
[00:54:19] [PASSED] 0x4909 (DG1)
[00:54:19] [PASSED] 0x56C0 (DG2)
[00:54:19] [PASSED] 0x56C2 (DG2)
[00:54:19] [PASSED] 0x56C1 (DG2)
[00:54:19] [PASSED] 0x7D51 (METEORLAKE)
[00:54:19] [PASSED] 0x7DD1 (METEORLAKE)
[00:54:19] [PASSED] 0x7D41 (METEORLAKE)
[00:54:19] [PASSED] 0x7D67 (METEORLAKE)
[00:54:19] [PASSED] 0xB640 (METEORLAKE)
[00:54:19] [PASSED] 0x56A0 (DG2)
[00:54:19] [PASSED] 0x56A1 (DG2)
[00:54:19] [PASSED] 0x56A2 (DG2)
[00:54:19] [PASSED] 0x56BE (DG2)
[00:54:19] [PASSED] 0x56BF (DG2)
[00:54:19] [PASSED] 0x5690 (DG2)
[00:54:19] [PASSED] 0x5691 (DG2)
[00:54:19] [PASSED] 0x5692 (DG2)
[00:54:19] [PASSED] 0x56A5 (DG2)
[00:54:19] [PASSED] 0x56A6 (DG2)
[00:54:19] [PASSED] 0x56B0 (DG2)
[00:54:19] [PASSED] 0x56B1 (DG2)
[00:54:19] [PASSED] 0x56BA (DG2)
[00:54:19] [PASSED] 0x56BB (DG2)
[00:54:19] [PASSED] 0x56BC (DG2)
[00:54:19] [PASSED] 0x56BD (DG2)
[00:54:19] [PASSED] 0x5693 (DG2)
[00:54:19] [PASSED] 0x5694 (DG2)
[00:54:19] [PASSED] 0x5695 (DG2)
[00:54:19] [PASSED] 0x56A3 (DG2)
[00:54:19] [PASSED] 0x56A4 (DG2)
[00:54:19] [PASSED] 0x56B2 (DG2)
[00:54:19] [PASSED] 0x56B3 (DG2)
[00:54:19] [PASSED] 0x5696 (DG2)
[00:54:19] [PASSED] 0x5697 (DG2)
[00:54:19] [PASSED] 0xB69 (PVC)
[00:54:19] [PASSED] 0xB6E (PVC)
[00:54:19] [PASSED] 0xBD4 (PVC)
[00:54:19] [PASSED] 0xBD5 (PVC)
[00:54:19] [PASSED] 0xBD6 (PVC)
[00:54:19] [PASSED] 0xBD7 (PVC)
[00:54:19] [PASSED] 0xBD8 (PVC)
[00:54:19] [PASSED] 0xBD9 (PVC)
[00:54:19] [PASSED] 0xBDA (PVC)
[00:54:19] [PASSED] 0xBDB (PVC)
[00:54:19] [PASSED] 0xBE0 (PVC)
[00:54:19] [PASSED] 0xBE1 (PVC)
[00:54:19] [PASSED] 0xBE5 (PVC)
[00:54:19] [PASSED] 0x7D40 (METEORLAKE)
[00:54:19] [PASSED] 0x7D45 (METEORLAKE)
[00:54:19] [PASSED] 0x7D55 (METEORLAKE)
[00:54:19] [PASSED] 0x7D60 (METEORLAKE)
[00:54:19] [PASSED] 0x7DD5 (METEORLAKE)
[00:54:19] [PASSED] 0x6420 (LUNARLAKE)
[00:54:19] [PASSED] 0x64A0 (LUNARLAKE)
[00:54:19] [PASSED] 0x64B0 (LUNARLAKE)
[00:54:19] [PASSED] 0xE202 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE209 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE20B (BATTLEMAGE)
[00:54:19] [PASSED] 0xE20C (BATTLEMAGE)
[00:54:19] [PASSED] 0xE20D (BATTLEMAGE)
[00:54:19] [PASSED] 0xE210 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE211 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE212 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE216 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE220 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE221 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE222 (BATTLEMAGE)
[00:54:19] [PASSED] 0xE223 (BATTLEMAGE)
[00:54:19] [PASSED] 0xB080 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB081 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB082 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB083 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB084 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB085 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB086 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB087 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB08F (PANTHERLAKE)
[00:54:19] [PASSED] 0xB090 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB0A0 (PANTHERLAKE)
[00:54:19] [PASSED] 0xB0B0 (PANTHERLAKE)
[00:54:19] [PASSED] 0xD740 (NOVALAKE_S)
[00:54:19] [PASSED] 0xD741 (NOVALAKE_S)
[00:54:19] [PASSED] 0xD742 (NOVALAKE_S)
[00:54:19] [PASSED] 0xD743 (NOVALAKE_S)
[00:54:19] [PASSED] 0xD744 (NOVALAKE_S)
[00:54:19] [PASSED] 0xD745 (NOVALAKE_S)
[00:54:19] [PASSED] 0x674C (CRESCENTISLAND)
[00:54:19] [PASSED] 0xFD80 (PANTHERLAKE)
[00:54:19] [PASSED] 0xFD81 (PANTHERLAKE)
[00:54:19] =============== [PASSED] check_platform_desc ===============
[00:54:19] ===================== [PASSED] xe_pci ======================
[00:54:19] =================== xe_rtp (2 subtests) ====================
[00:54:19] =============== xe_rtp_process_to_sr_tests ================
[00:54:19] [PASSED] coalesce-same-reg
[00:54:19] [PASSED] no-match-no-add
[00:54:19] [PASSED] match-or
[00:54:19] [PASSED] match-or-xfail
[00:54:19] [PASSED] no-match-no-add-multiple-rules
[00:54:19] [PASSED] two-regs-two-entries
[00:54:19] [PASSED] clr-one-set-other
[00:54:19] [PASSED] set-field
[00:54:19] [PASSED] conflict-duplicate
[00:54:19] [PASSED] conflict-not-disjoint
[00:54:19] [PASSED] conflict-reg-type
[00:54:19] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[00:54:19] ================== xe_rtp_process_tests ===================
[00:54:19] [PASSED] active1
[00:54:19] [PASSED] active2
[00:54:19] [PASSED] active-inactive
[00:54:19] [PASSED] inactive-active
[00:54:19] [PASSED] inactive-1st_or_active-inactive
[00:54:19] [PASSED] inactive-2nd_or_active-inactive
[00:54:19] [PASSED] inactive-last_or_active-inactive
[00:54:19] [PASSED] inactive-no_or_active-inactive
[00:54:19] ============== [PASSED] xe_rtp_process_tests ===============
[00:54:19] ===================== [PASSED] xe_rtp ======================
[00:54:19] ==================== xe_wa (1 subtest) =====================
[00:54:19] ======================== xe_wa_gt =========================
[00:54:19] [PASSED] TIGERLAKE B0
[00:54:19] [PASSED] DG1 A0
[00:54:19] [PASSED] DG1 B0
[00:54:19] [PASSED] ALDERLAKE_S A0
[00:54:19] [PASSED] ALDERLAKE_S B0
[00:54:19] [PASSED] ALDERLAKE_S C0
[00:54:19] [PASSED] ALDERLAKE_S D0
[00:54:19] [PASSED] ALDERLAKE_P A0
[00:54:19] [PASSED] ALDERLAKE_P B0
[00:54:19] [PASSED] ALDERLAKE_P C0
[00:54:19] [PASSED] ALDERLAKE_S RPLS D0
[00:54:19] [PASSED] ALDERLAKE_P RPLU E0
[00:54:19] [PASSED] DG2 G10 C0
[00:54:19] [PASSED] DG2 G11 B1
[00:54:19] [PASSED] DG2 G12 A1
[00:54:19] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[00:54:19] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[00:54:19] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[00:54:19] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[00:54:19] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[00:54:19] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[00:54:19] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[00:54:19] ==================== [PASSED] xe_wa_gt =====================
[00:54:19] ====================== [PASSED] xe_wa ======================
[00:54:19] ============================================================
[00:54:19] Testing complete. Ran 510 tests: passed: 492, skipped: 18
[00:54:20] Elapsed time: 35.834s total, 4.205s configuring, 31.163s building, 0.456s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[00:54:20] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:54:21] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:54:46] Starting KUnit Kernel (1/1)...
[00:54:46] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:54:46] ============ drm_test_pick_cmdline (2 subtests) ============
[00:54:46] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[00:54:46] =============== drm_test_pick_cmdline_named ===============
[00:54:46] [PASSED] NTSC
[00:54:46] [PASSED] NTSC-J
[00:54:46] [PASSED] PAL
[00:54:46] [PASSED] PAL-M
[00:54:46] =========== [PASSED] drm_test_pick_cmdline_named ===========
[00:54:46] ============== [PASSED] drm_test_pick_cmdline ==============
[00:54:46] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[00:54:46] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[00:54:46] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[00:54:46] =========== drm_validate_clone_mode (2 subtests) ===========
[00:54:46] ============== drm_test_check_in_clone_mode ===============
[00:54:46] [PASSED] in_clone_mode
[00:54:46] [PASSED] not_in_clone_mode
[00:54:46] ========== [PASSED] drm_test_check_in_clone_mode ===========
[00:54:46] =============== drm_test_check_valid_clones ===============
[00:54:46] [PASSED] not_in_clone_mode
[00:54:46] [PASSED] valid_clone
[00:54:46] [PASSED] invalid_clone
[00:54:46] =========== [PASSED] drm_test_check_valid_clones ===========
[00:54:46] ============= [PASSED] drm_validate_clone_mode =============
[00:54:46] ============= drm_validate_modeset (1 subtest) =============
[00:54:46] [PASSED] drm_test_check_connector_changed_modeset
[00:54:46] ============== [PASSED] drm_validate_modeset ===============
[00:54:46] ====== drm_test_bridge_get_current_state (2 subtests) ======
[00:54:46] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[00:54:46] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[00:54:46] ======== [PASSED] drm_test_bridge_get_current_state ========
[00:54:46] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[00:54:46] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[00:54:46] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[00:54:46] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[00:54:46] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[00:54:46] ============== drm_bridge_alloc (2 subtests) ===============
[00:54:46] [PASSED] drm_test_drm_bridge_alloc_basic
[00:54:46] [PASSED] drm_test_drm_bridge_alloc_get_put
[00:54:46] ================ [PASSED] drm_bridge_alloc =================
[00:54:46] ================== drm_buddy (8 subtests) ==================
[00:54:46] [PASSED] drm_test_buddy_alloc_limit
[00:54:46] [PASSED] drm_test_buddy_alloc_optimistic
[00:54:46] [PASSED] drm_test_buddy_alloc_pessimistic
[00:54:46] [PASSED] drm_test_buddy_alloc_pathological
[00:54:46] [PASSED] drm_test_buddy_alloc_contiguous
[00:54:46] [PASSED] drm_test_buddy_alloc_clear
[00:54:47] [PASSED] drm_test_buddy_alloc_range_bias
[00:54:47] [PASSED] drm_test_buddy_fragmentation_performance
[00:54:47] ==================== [PASSED] drm_buddy ====================
[00:54:47] ============= drm_cmdline_parser (40 subtests) =============
[00:54:47] [PASSED] drm_test_cmdline_force_d_only
[00:54:47] [PASSED] drm_test_cmdline_force_D_only_dvi
[00:54:47] [PASSED] drm_test_cmdline_force_D_only_hdmi
[00:54:47] [PASSED] drm_test_cmdline_force_D_only_not_digital
[00:54:47] [PASSED] drm_test_cmdline_force_e_only
[00:54:47] [PASSED] drm_test_cmdline_res
[00:54:47] [PASSED] drm_test_cmdline_res_vesa
[00:54:47] [PASSED] drm_test_cmdline_res_vesa_rblank
[00:54:47] [PASSED] drm_test_cmdline_res_rblank
[00:54:47] [PASSED] drm_test_cmdline_res_bpp
[00:54:47] [PASSED] drm_test_cmdline_res_refresh
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[00:54:47] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[00:54:47] [PASSED] drm_test_cmdline_res_margins_force_on
[00:54:47] [PASSED] drm_test_cmdline_res_vesa_margins
[00:54:47] [PASSED] drm_test_cmdline_name
[00:54:47] [PASSED] drm_test_cmdline_name_bpp
[00:54:47] [PASSED] drm_test_cmdline_name_option
[00:54:47] [PASSED] drm_test_cmdline_name_bpp_option
[00:54:47] [PASSED] drm_test_cmdline_rotate_0
[00:54:47] [PASSED] drm_test_cmdline_rotate_90
[00:54:47] [PASSED] drm_test_cmdline_rotate_180
[00:54:47] [PASSED] drm_test_cmdline_rotate_270
[00:54:47] [PASSED] drm_test_cmdline_hmirror
[00:54:47] [PASSED] drm_test_cmdline_vmirror
[00:54:47] [PASSED] drm_test_cmdline_margin_options
[00:54:47] [PASSED] drm_test_cmdline_multiple_options
[00:54:47] [PASSED] drm_test_cmdline_bpp_extra_and_option
[00:54:47] [PASSED] drm_test_cmdline_extra_and_option
[00:54:47] [PASSED] drm_test_cmdline_freestanding_options
[00:54:47] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[00:54:47] [PASSED] drm_test_cmdline_panel_orientation
[00:54:47] ================ drm_test_cmdline_invalid =================
[00:54:47] [PASSED] margin_only
[00:54:47] [PASSED] interlace_only
[00:54:47] [PASSED] res_missing_x
[00:54:47] [PASSED] res_missing_y
[00:54:47] [PASSED] res_bad_y
[00:54:47] [PASSED] res_missing_y_bpp
[00:54:47] [PASSED] res_bad_bpp
[00:54:47] [PASSED] res_bad_refresh
[00:54:47] [PASSED] res_bpp_refresh_force_on_off
[00:54:47] [PASSED] res_invalid_mode
[00:54:47] [PASSED] res_bpp_wrong_place_mode
[00:54:47] [PASSED] name_bpp_refresh
[00:54:47] [PASSED] name_refresh
[00:54:47] [PASSED] name_refresh_wrong_mode
[00:54:47] [PASSED] name_refresh_invalid_mode
[00:54:47] [PASSED] rotate_multiple
[00:54:47] [PASSED] rotate_invalid_val
[00:54:47] [PASSED] rotate_truncated
[00:54:47] [PASSED] invalid_option
[00:54:47] [PASSED] invalid_tv_option
[00:54:47] [PASSED] truncated_tv_option
[00:54:47] ============ [PASSED] drm_test_cmdline_invalid =============
[00:54:47] =============== drm_test_cmdline_tv_options ===============
[00:54:47] [PASSED] NTSC
[00:54:47] [PASSED] NTSC_443
[00:54:47] [PASSED] NTSC_J
[00:54:47] [PASSED] PAL
[00:54:47] [PASSED] PAL_M
[00:54:47] [PASSED] PAL_N
[00:54:47] [PASSED] SECAM
[00:54:47] [PASSED] MONO_525
[00:54:47] [PASSED] MONO_625
[00:54:47] =========== [PASSED] drm_test_cmdline_tv_options ===========
[00:54:47] =============== [PASSED] drm_cmdline_parser ================
[00:54:47] ========== drmm_connector_hdmi_init (20 subtests) ==========
[00:54:47] [PASSED] drm_test_connector_hdmi_init_valid
[00:54:47] [PASSED] drm_test_connector_hdmi_init_bpc_8
[00:54:47] [PASSED] drm_test_connector_hdmi_init_bpc_10
[00:54:47] [PASSED] drm_test_connector_hdmi_init_bpc_12
[00:54:47] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[00:54:47] [PASSED] drm_test_connector_hdmi_init_bpc_null
[00:54:47] [PASSED] drm_test_connector_hdmi_init_formats_empty
[00:54:47] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[00:54:47] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[00:54:47] [PASSED] supported_formats=0x9 yuv420_allowed=1
[00:54:47] [PASSED] supported_formats=0x9 yuv420_allowed=0
[00:54:47] [PASSED] supported_formats=0x3 yuv420_allowed=1
[00:54:47] [PASSED] supported_formats=0x3 yuv420_allowed=0
[00:54:47] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[00:54:47] [PASSED] drm_test_connector_hdmi_init_null_ddc
[00:54:47] [PASSED] drm_test_connector_hdmi_init_null_product
[00:54:47] [PASSED] drm_test_connector_hdmi_init_null_vendor
[00:54:47] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[00:54:47] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[00:54:47] [PASSED] drm_test_connector_hdmi_init_product_valid
[00:54:47] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[00:54:47] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[00:54:47] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[00:54:47] ========= drm_test_connector_hdmi_init_type_valid =========
[00:54:47] [PASSED] HDMI-A
[00:54:47] [PASSED] HDMI-B
[00:54:47] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[00:54:47] ======== drm_test_connector_hdmi_init_type_invalid ========
[00:54:47] [PASSED] Unknown
[00:54:47] [PASSED] VGA
[00:54:47] [PASSED] DVI-I
[00:54:47] [PASSED] DVI-D
[00:54:47] [PASSED] DVI-A
[00:54:47] [PASSED] Composite
[00:54:47] [PASSED] SVIDEO
[00:54:47] [PASSED] LVDS
[00:54:47] [PASSED] Component
[00:54:47] [PASSED] DIN
[00:54:47] [PASSED] DP
[00:54:47] [PASSED] TV
[00:54:47] [PASSED] eDP
[00:54:47] [PASSED] Virtual
[00:54:47] [PASSED] DSI
[00:54:47] [PASSED] DPI
[00:54:47] [PASSED] Writeback
[00:54:47] [PASSED] SPI
[00:54:47] [PASSED] USB
[00:54:47] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[00:54:47] ============ [PASSED] drmm_connector_hdmi_init =============
[00:54:47] ============= drmm_connector_init (3 subtests) =============
[00:54:47] [PASSED] drm_test_drmm_connector_init
[00:54:47] [PASSED] drm_test_drmm_connector_init_null_ddc
[00:54:47] ========= drm_test_drmm_connector_init_type_valid =========
[00:54:47] [PASSED] Unknown
[00:54:47] [PASSED] VGA
[00:54:47] [PASSED] DVI-I
[00:54:47] [PASSED] DVI-D
[00:54:47] [PASSED] DVI-A
[00:54:47] [PASSED] Composite
[00:54:47] [PASSED] SVIDEO
[00:54:47] [PASSED] LVDS
[00:54:47] [PASSED] Component
[00:54:47] [PASSED] DIN
[00:54:47] [PASSED] DP
[00:54:47] [PASSED] HDMI-A
[00:54:47] [PASSED] HDMI-B
[00:54:47] [PASSED] TV
[00:54:47] [PASSED] eDP
[00:54:47] [PASSED] Virtual
[00:54:47] [PASSED] DSI
[00:54:47] [PASSED] DPI
[00:54:47] [PASSED] Writeback
[00:54:47] [PASSED] SPI
[00:54:47] [PASSED] USB
[00:54:47] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[00:54:47] =============== [PASSED] drmm_connector_init ===============
[00:54:47] ========= drm_connector_dynamic_init (6 subtests) ==========
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_init
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_init_properties
[00:54:47] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[00:54:47] [PASSED] Unknown
[00:54:47] [PASSED] VGA
[00:54:47] [PASSED] DVI-I
[00:54:47] [PASSED] DVI-D
[00:54:47] [PASSED] DVI-A
[00:54:47] [PASSED] Composite
[00:54:47] [PASSED] SVIDEO
[00:54:47] [PASSED] LVDS
[00:54:47] [PASSED] Component
[00:54:47] [PASSED] DIN
[00:54:47] [PASSED] DP
[00:54:47] [PASSED] HDMI-A
[00:54:47] [PASSED] HDMI-B
[00:54:47] [PASSED] TV
[00:54:47] [PASSED] eDP
[00:54:47] [PASSED] Virtual
[00:54:47] [PASSED] DSI
[00:54:47] [PASSED] DPI
[00:54:47] [PASSED] Writeback
[00:54:47] [PASSED] SPI
[00:54:47] [PASSED] USB
[00:54:47] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[00:54:47] ======== drm_test_drm_connector_dynamic_init_name =========
[00:54:47] [PASSED] Unknown
[00:54:47] [PASSED] VGA
[00:54:47] [PASSED] DVI-I
[00:54:47] [PASSED] DVI-D
[00:54:47] [PASSED] DVI-A
[00:54:47] [PASSED] Composite
[00:54:47] [PASSED] SVIDEO
[00:54:47] [PASSED] LVDS
[00:54:47] [PASSED] Component
[00:54:47] [PASSED] DIN
[00:54:47] [PASSED] DP
[00:54:47] [PASSED] HDMI-A
[00:54:47] [PASSED] HDMI-B
[00:54:47] [PASSED] TV
[00:54:47] [PASSED] eDP
[00:54:47] [PASSED] Virtual
[00:54:47] [PASSED] DSI
[00:54:47] [PASSED] DPI
[00:54:47] [PASSED] Writeback
[00:54:47] [PASSED] SPI
[00:54:47] [PASSED] USB
[00:54:47] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[00:54:47] =========== [PASSED] drm_connector_dynamic_init ============
[00:54:47] ==== drm_connector_dynamic_register_early (4 subtests) =====
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[00:54:47] ====== [PASSED] drm_connector_dynamic_register_early =======
[00:54:47] ======= drm_connector_dynamic_register (7 subtests) ========
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[00:54:47] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[00:54:47] ========= [PASSED] drm_connector_dynamic_register ==========
[00:54:47] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[00:54:47] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[00:54:47] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[00:54:47] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[00:54:47] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[00:54:47] ========== drm_test_get_tv_mode_from_name_valid ===========
[00:54:47] [PASSED] NTSC
[00:54:47] [PASSED] NTSC-443
[00:54:47] [PASSED] NTSC-J
[00:54:47] [PASSED] PAL
[00:54:47] [PASSED] PAL-M
[00:54:47] [PASSED] PAL-N
[00:54:47] [PASSED] SECAM
[00:54:47] [PASSED] Mono
[00:54:47] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[00:54:47] [PASSED] drm_test_get_tv_mode_from_name_truncated
[00:54:47] ============ [PASSED] drm_get_tv_mode_from_name ============
[00:54:47] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[00:54:47] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[00:54:47] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[00:54:47] [PASSED] VIC 96
[00:54:47] [PASSED] VIC 97
[00:54:47] [PASSED] VIC 101
[00:54:47] [PASSED] VIC 102
[00:54:47] [PASSED] VIC 106
[00:54:47] [PASSED] VIC 107
[00:54:47] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[00:54:47] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[00:54:47] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[00:54:47] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[00:54:47] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[00:54:47] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[00:54:47] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[00:54:47] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[00:54:47] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[00:54:47] [PASSED] Automatic
[00:54:47] [PASSED] Full
[00:54:47] [PASSED] Limited 16:235
[00:54:47] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[00:54:47] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[00:54:47] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[00:54:47] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[00:54:47] === drm_test_drm_hdmi_connector_get_output_format_name ====
[00:54:47] [PASSED] RGB
[00:54:47] [PASSED] YUV 4:2:0
[00:54:47] [PASSED] YUV 4:2:2
[00:54:47] [PASSED] YUV 4:4:4
[00:54:47] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[00:54:47] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[00:54:47] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[00:54:47] ============= drm_damage_helper (21 subtests) ==============
[00:54:47] [PASSED] drm_test_damage_iter_no_damage
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_src_moved
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_not_visible
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[00:54:47] [PASSED] drm_test_damage_iter_no_damage_no_fb
[00:54:47] [PASSED] drm_test_damage_iter_simple_damage
[00:54:47] [PASSED] drm_test_damage_iter_single_damage
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_outside_src
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_src_moved
[00:54:47] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[00:54:47] [PASSED] drm_test_damage_iter_damage
[00:54:47] [PASSED] drm_test_damage_iter_damage_one_intersect
[00:54:47] [PASSED] drm_test_damage_iter_damage_one_outside
[00:54:47] [PASSED] drm_test_damage_iter_damage_src_moved
[00:54:47] [PASSED] drm_test_damage_iter_damage_not_visible
[00:54:47] ================ [PASSED] drm_damage_helper ================
[00:54:47] ============== drm_dp_mst_helper (3 subtests) ==============
[00:54:47] ============== drm_test_dp_mst_calc_pbn_mode ==============
[00:54:47] [PASSED] Clock 154000 BPP 30 DSC disabled
[00:54:47] [PASSED] Clock 234000 BPP 30 DSC disabled
[00:54:47] [PASSED] Clock 297000 BPP 24 DSC disabled
[00:54:47] [PASSED] Clock 332880 BPP 24 DSC enabled
[00:54:47] [PASSED] Clock 324540 BPP 24 DSC enabled
[00:54:47] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[00:54:47] ============== drm_test_dp_mst_calc_pbn_div ===============
[00:54:47] [PASSED] Link rate 2000000 lane count 4
[00:54:47] [PASSED] Link rate 2000000 lane count 2
[00:54:47] [PASSED] Link rate 2000000 lane count 1
[00:54:47] [PASSED] Link rate 1350000 lane count 4
[00:54:47] [PASSED] Link rate 1350000 lane count 2
[00:54:47] [PASSED] Link rate 1350000 lane count 1
[00:54:47] [PASSED] Link rate 1000000 lane count 4
[00:54:47] [PASSED] Link rate 1000000 lane count 2
[00:54:47] [PASSED] Link rate 1000000 lane count 1
[00:54:47] [PASSED] Link rate 810000 lane count 4
[00:54:47] [PASSED] Link rate 810000 lane count 2
[00:54:47] [PASSED] Link rate 810000 lane count 1
[00:54:47] [PASSED] Link rate 540000 lane count 4
[00:54:47] [PASSED] Link rate 540000 lane count 2
[00:54:47] [PASSED] Link rate 540000 lane count 1
[00:54:47] [PASSED] Link rate 270000 lane count 4
[00:54:47] [PASSED] Link rate 270000 lane count 2
[00:54:47] [PASSED] Link rate 270000 lane count 1
[00:54:47] [PASSED] Link rate 162000 lane count 4
[00:54:47] [PASSED] Link rate 162000 lane count 2
[00:54:47] [PASSED] Link rate 162000 lane count 1
[00:54:47] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[00:54:47] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[00:54:47] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[00:54:47] [PASSED] DP_POWER_UP_PHY with port number
[00:54:47] [PASSED] DP_POWER_DOWN_PHY with port number
[00:54:47] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[00:54:47] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[00:54:47] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[00:54:47] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[00:54:47] [PASSED] DP_QUERY_PAYLOAD with port number
[00:54:47] [PASSED] DP_QUERY_PAYLOAD with VCPI
[00:54:47] [PASSED] DP_REMOTE_DPCD_READ with port number
[00:54:47] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[00:54:47] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[00:54:47] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[00:54:47] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[00:54:47] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[00:54:47] [PASSED] DP_REMOTE_I2C_READ with port number
[00:54:47] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[00:54:47] [PASSED] DP_REMOTE_I2C_READ with transactions array
[00:54:47] [PASSED] DP_REMOTE_I2C_WRITE with port number
[00:54:47] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[00:54:47] [PASSED] DP_REMOTE_I2C_WRITE with data array
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[00:54:47] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[00:54:47] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[00:54:47] ================ [PASSED] drm_dp_mst_helper ================
[00:54:47] ================== drm_exec (7 subtests) ===================
[00:54:47] [PASSED] sanitycheck
[00:54:47] [PASSED] test_lock
[00:54:47] [PASSED] test_lock_unlock
[00:54:47] [PASSED] test_duplicates
[00:54:47] [PASSED] test_prepare
[00:54:47] [PASSED] test_prepare_array
[00:54:47] [PASSED] test_multiple_loops
[00:54:47] ==================== [PASSED] drm_exec =====================
[00:54:47] =========== drm_format_helper_test (17 subtests) ===========
[00:54:47] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[00:54:47] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[00:54:47] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[00:54:47] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[00:54:47] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[00:54:47] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[00:54:47] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[00:54:47] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[00:54:47] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[00:54:47] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[00:54:47] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[00:54:47] ============== drm_test_fb_xrgb8888_to_mono ===============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[00:54:47] ==================== drm_test_fb_swab =====================
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ================ [PASSED] drm_test_fb_swab =================
[00:54:47] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[00:54:47] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[00:54:47] [PASSED] single_pixel_source_buffer
[00:54:47] [PASSED] single_pixel_clip_rectangle
[00:54:47] [PASSED] well_known_colors
[00:54:47] [PASSED] destination_pitch
[00:54:47] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[00:54:47] ================= drm_test_fb_clip_offset =================
[00:54:47] [PASSED] pass through
[00:54:47] [PASSED] horizontal offset
[00:54:47] [PASSED] vertical offset
[00:54:47] [PASSED] horizontal and vertical offset
[00:54:47] [PASSED] horizontal offset (custom pitch)
[00:54:47] [PASSED] vertical offset (custom pitch)
[00:54:47] [PASSED] horizontal and vertical offset (custom pitch)
[00:54:47] ============= [PASSED] drm_test_fb_clip_offset =============
[00:54:47] =================== drm_test_fb_memcpy ====================
[00:54:47] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[00:54:47] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[00:54:47] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[00:54:47] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[00:54:47] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[00:54:47] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[00:54:47] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[00:54:47] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[00:54:47] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[00:54:47] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[00:54:47] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[00:54:47] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[00:54:47] =============== [PASSED] drm_test_fb_memcpy ================
[00:54:47] ============= [PASSED] drm_format_helper_test ==============
[00:54:47] ================= drm_format (18 subtests) =================
[00:54:47] [PASSED] drm_test_format_block_width_invalid
[00:54:47] [PASSED] drm_test_format_block_width_one_plane
[00:54:47] [PASSED] drm_test_format_block_width_two_plane
[00:54:47] [PASSED] drm_test_format_block_width_three_plane
[00:54:47] [PASSED] drm_test_format_block_width_tiled
[00:54:47] [PASSED] drm_test_format_block_height_invalid
[00:54:47] [PASSED] drm_test_format_block_height_one_plane
[00:54:47] [PASSED] drm_test_format_block_height_two_plane
[00:54:47] [PASSED] drm_test_format_block_height_three_plane
[00:54:47] [PASSED] drm_test_format_block_height_tiled
[00:54:47] [PASSED] drm_test_format_min_pitch_invalid
[00:54:47] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[00:54:47] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[00:54:47] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[00:54:47] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[00:54:47] [PASSED] drm_test_format_min_pitch_two_plane
[00:54:47] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[00:54:47] [PASSED] drm_test_format_min_pitch_tiled
[00:54:47] =================== [PASSED] drm_format ====================
[00:54:47] ============== drm_framebuffer (10 subtests) ===============
[00:54:47] ========== drm_test_framebuffer_check_src_coords ==========
[00:54:47] [PASSED] Success: source fits into fb
[00:54:47] [PASSED] Fail: overflowing fb with x-axis coordinate
[00:54:47] [PASSED] Fail: overflowing fb with y-axis coordinate
[00:54:47] [PASSED] Fail: overflowing fb with source width
[00:54:47] [PASSED] Fail: overflowing fb with source height
[00:54:47] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[00:54:47] [PASSED] drm_test_framebuffer_cleanup
[00:54:47] =============== drm_test_framebuffer_create ===============
[00:54:47] [PASSED] ABGR8888 normal sizes
[00:54:47] [PASSED] ABGR8888 max sizes
[00:54:47] [PASSED] ABGR8888 pitch greater than min required
[00:54:47] [PASSED] ABGR8888 pitch less than min required
[00:54:47] [PASSED] ABGR8888 Invalid width
[00:54:47] [PASSED] ABGR8888 Invalid buffer handle
[00:54:47] [PASSED] No pixel format
[00:54:47] [PASSED] ABGR8888 Width 0
[00:54:47] [PASSED] ABGR8888 Height 0
[00:54:47] [PASSED] ABGR8888 Out of bound height * pitch combination
[00:54:47] [PASSED] ABGR8888 Large buffer offset
[00:54:47] [PASSED] ABGR8888 Buffer offset for inexistent plane
[00:54:47] [PASSED] ABGR8888 Invalid flag
[00:54:47] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[00:54:47] [PASSED] ABGR8888 Valid buffer modifier
[00:54:47] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[00:54:47] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] NV12 Normal sizes
[00:54:47] [PASSED] NV12 Max sizes
[00:54:47] [PASSED] NV12 Invalid pitch
[00:54:47] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[00:54:47] [PASSED] NV12 different modifier per-plane
[00:54:47] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[00:54:47] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] NV12 Modifier for inexistent plane
[00:54:47] [PASSED] NV12 Handle for inexistent plane
[00:54:47] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[00:54:47] [PASSED] YVU420 Normal sizes
[00:54:47] [PASSED] YVU420 Max sizes
[00:54:47] [PASSED] YVU420 Invalid pitch
[00:54:47] [PASSED] YVU420 Different pitches
[00:54:47] [PASSED] YVU420 Different buffer offsets/pitches
[00:54:47] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[00:54:47] [PASSED] YVU420 Valid modifier
[00:54:47] [PASSED] YVU420 Different modifiers per plane
[00:54:47] [PASSED] YVU420 Modifier for inexistent plane
[00:54:47] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[00:54:47] [PASSED] X0L2 Normal sizes
[00:54:47] [PASSED] X0L2 Max sizes
[00:54:47] [PASSED] X0L2 Invalid pitch
[00:54:47] [PASSED] X0L2 Pitch greater than minimum required
[00:54:47] [PASSED] X0L2 Handle for inexistent plane
[00:54:47] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[00:54:47] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[00:54:47] [PASSED] X0L2 Valid modifier
[00:54:47] [PASSED] X0L2 Modifier for inexistent plane
[00:54:47] =========== [PASSED] drm_test_framebuffer_create ===========
[00:54:47] [PASSED] drm_test_framebuffer_free
[00:54:47] [PASSED] drm_test_framebuffer_init
[00:54:47] [PASSED] drm_test_framebuffer_init_bad_format
[00:54:47] [PASSED] drm_test_framebuffer_init_dev_mismatch
[00:54:47] [PASSED] drm_test_framebuffer_lookup
[00:54:47] [PASSED] drm_test_framebuffer_lookup_inexistent
[00:54:47] [PASSED] drm_test_framebuffer_modifiers_not_supported
[00:54:47] ================= [PASSED] drm_framebuffer =================
[00:54:47] ================ drm_gem_shmem (8 subtests) ================
[00:54:47] [PASSED] drm_gem_shmem_test_obj_create
[00:54:47] [PASSED] drm_gem_shmem_test_obj_create_private
[00:54:47] [PASSED] drm_gem_shmem_test_pin_pages
[00:54:47] [PASSED] drm_gem_shmem_test_vmap
[00:54:47] [PASSED] drm_gem_shmem_test_get_pages_sgt
[00:54:47] [PASSED] drm_gem_shmem_test_get_sg_table
[00:54:47] [PASSED] drm_gem_shmem_test_madvise
[00:54:47] [PASSED] drm_gem_shmem_test_purge
[00:54:47] ================== [PASSED] drm_gem_shmem ==================
[00:54:47] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[00:54:47] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[00:54:47] [PASSED] Automatic
[00:54:47] [PASSED] Full
[00:54:47] [PASSED] Limited 16:235
[00:54:47] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[00:54:47] [PASSED] drm_test_check_disable_connector
[00:54:47] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[00:54:47] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[00:54:47] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[00:54:47] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[00:54:47] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[00:54:47] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[00:54:47] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[00:54:47] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[00:54:47] [PASSED] drm_test_check_output_bpc_dvi
[00:54:47] [PASSED] drm_test_check_output_bpc_format_vic_1
[00:54:47] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[00:54:47] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[00:54:47] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[00:54:47] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[00:54:47] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[00:54:47] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[00:54:47] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[00:54:47] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[00:54:47] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[00:54:47] [PASSED] drm_test_check_broadcast_rgb_value
[00:54:47] [PASSED] drm_test_check_bpc_8_value
[00:54:47] [PASSED] drm_test_check_bpc_10_value
[00:54:47] [PASSED] drm_test_check_bpc_12_value
[00:54:47] [PASSED] drm_test_check_format_value
[00:54:47] [PASSED] drm_test_check_tmds_char_value
[00:54:47] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[00:54:47] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[00:54:47] [PASSED] drm_test_check_mode_valid
[00:54:47] [PASSED] drm_test_check_mode_valid_reject
[00:54:47] [PASSED] drm_test_check_mode_valid_reject_rate
[00:54:47] [PASSED] drm_test_check_mode_valid_reject_max_clock
[00:54:47] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[00:54:47] ================= drm_managed (2 subtests) =================
[00:54:47] [PASSED] drm_test_managed_release_action
[00:54:47] [PASSED] drm_test_managed_run_action
[00:54:47] =================== [PASSED] drm_managed ===================
[00:54:47] =================== drm_mm (6 subtests) ====================
[00:54:47] [PASSED] drm_test_mm_init
[00:54:47] [PASSED] drm_test_mm_debug
[00:54:47] [PASSED] drm_test_mm_align32
[00:54:47] [PASSED] drm_test_mm_align64
[00:54:47] [PASSED] drm_test_mm_lowest
[00:54:47] [PASSED] drm_test_mm_highest
[00:54:47] ===================== [PASSED] drm_mm ======================
[00:54:47] ============= drm_modes_analog_tv (5 subtests) =============
[00:54:47] [PASSED] drm_test_modes_analog_tv_mono_576i
[00:54:47] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[00:54:47] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[00:54:47] [PASSED] drm_test_modes_analog_tv_pal_576i
[00:54:47] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[00:54:47] =============== [PASSED] drm_modes_analog_tv ===============
[00:54:47] ============== drm_plane_helper (2 subtests) ===============
[00:54:47] =============== drm_test_check_plane_state ================
[00:54:47] [PASSED] clipping_simple
[00:54:47] [PASSED] clipping_rotate_reflect
[00:54:47] [PASSED] positioning_simple
[00:54:47] [PASSED] upscaling
[00:54:47] [PASSED] downscaling
[00:54:47] [PASSED] rounding1
[00:54:47] [PASSED] rounding2
[00:54:47] [PASSED] rounding3
[00:54:47] [PASSED] rounding4
[00:54:47] =========== [PASSED] drm_test_check_plane_state ============
[00:54:47] =========== drm_test_check_invalid_plane_state ============
[00:54:47] [PASSED] positioning_invalid
[00:54:47] [PASSED] upscaling_invalid
[00:54:47] [PASSED] downscaling_invalid
[00:54:47] ======= [PASSED] drm_test_check_invalid_plane_state ========
[00:54:47] ================ [PASSED] drm_plane_helper =================
[00:54:47] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[00:54:47] ====== drm_test_connector_helper_tv_get_modes_check =======
[00:54:47] [PASSED] None
[00:54:47] [PASSED] PAL
[00:54:47] [PASSED] NTSC
[00:54:47] [PASSED] Both, NTSC Default
[00:54:47] [PASSED] Both, PAL Default
[00:54:47] [PASSED] Both, NTSC Default, with PAL on command-line
[00:54:47] [PASSED] Both, PAL Default, with NTSC on command-line
[00:54:47] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[00:54:47] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[00:54:47] ================== drm_rect (9 subtests) ===================
[00:54:47] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[00:54:47] [PASSED] drm_test_rect_clip_scaled_not_clipped
[00:54:47] [PASSED] drm_test_rect_clip_scaled_clipped
[00:54:47] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[00:54:47] ================= drm_test_rect_intersect =================
[00:54:47] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[00:54:47] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[00:54:47] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[00:54:47] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[00:54:47] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[00:54:47] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[00:54:47] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[00:54:47] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[00:54:47] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[00:54:47] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[00:54:47] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[00:54:47] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[00:54:47] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[00:54:47] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[00:54:47] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[00:54:47] ============= [PASSED] drm_test_rect_intersect =============
[00:54:47] ================ drm_test_rect_calc_hscale ================
[00:54:47] [PASSED] normal use
[00:54:47] [PASSED] out of max range
[00:54:47] [PASSED] out of min range
[00:54:47] [PASSED] zero dst
[00:54:47] [PASSED] negative src
[00:54:47] [PASSED] negative dst
[00:54:47] ============ [PASSED] drm_test_rect_calc_hscale ============
[00:54:47] ================ drm_test_rect_calc_vscale ================
[00:54:47] [PASSED] normal use
[00:54:47] [PASSED] out of max range
[00:54:47] [PASSED] out of min range
[00:54:47] [PASSED] zero dst
[00:54:47] [PASSED] negative src
[00:54:47] [PASSED] negative dst
[00:54:47] ============ [PASSED] drm_test_rect_calc_vscale ============
[00:54:47] ================== drm_test_rect_rotate ===================
[00:54:47] [PASSED] reflect-x
[00:54:47] [PASSED] reflect-y
[00:54:47] [PASSED] rotate-0
[00:54:47] [PASSED] rotate-90
[00:54:47] [PASSED] rotate-180
[00:54:47] [PASSED] rotate-270
[00:54:47] ============== [PASSED] drm_test_rect_rotate ===============
[00:54:47] ================ drm_test_rect_rotate_inv =================
[00:54:47] [PASSED] reflect-x
[00:54:47] [PASSED] reflect-y
[00:54:47] [PASSED] rotate-0
[00:54:47] [PASSED] rotate-90
[00:54:47] [PASSED] rotate-180
[00:54:47] [PASSED] rotate-270
[00:54:47] ============ [PASSED] drm_test_rect_rotate_inv =============
[00:54:47] ==================== [PASSED] drm_rect =====================
[00:54:47] ============ drm_sysfb_modeset_test (1 subtest) ============
[00:54:47] ============ drm_test_sysfb_build_fourcc_list =============
[00:54:47] [PASSED] no native formats
[00:54:47] [PASSED] XRGB8888 as native format
[00:54:47] [PASSED] remove duplicates
[00:54:47] [PASSED] convert alpha formats
[00:54:47] [PASSED] random formats
[00:54:47] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[00:54:47] ============= [PASSED] drm_sysfb_modeset_test ==============
[00:54:47] ================== drm_fixp (2 subtests) ===================
[00:54:47] [PASSED] drm_test_int2fixp
[00:54:47] [PASSED] drm_test_sm2fixp
[00:54:47] ==================== [PASSED] drm_fixp =====================
[00:54:47] ============================================================
[00:54:47] Testing complete. Ran 624 tests: passed: 624
[00:54:47] Elapsed time: 27.080s total, 1.632s configuring, 25.032s building, 0.413s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[00:54:47] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[00:54:48] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[00:54:58] Starting KUnit Kernel (1/1)...
[00:54:58] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:54:58] ================= ttm_device (5 subtests) ==================
[00:54:58] [PASSED] ttm_device_init_basic
[00:54:58] [PASSED] ttm_device_init_multiple
[00:54:58] [PASSED] ttm_device_fini_basic
[00:54:58] [PASSED] ttm_device_init_no_vma_man
[00:54:58] ================== ttm_device_init_pools ==================
[00:54:58] [PASSED] No DMA allocations, no DMA32 required
[00:54:58] [PASSED] DMA allocations, DMA32 required
[00:54:58] [PASSED] No DMA allocations, DMA32 required
[00:54:58] [PASSED] DMA allocations, no DMA32 required
[00:54:58] ============== [PASSED] ttm_device_init_pools ==============
[00:54:58] =================== [PASSED] ttm_device ====================
[00:54:58] ================== ttm_pool (8 subtests) ===================
[00:54:58] ================== ttm_pool_alloc_basic ===================
[00:54:58] [PASSED] One page
[00:54:58] [PASSED] More than one page
[00:54:58] [PASSED] Above the allocation limit
[00:54:58] [PASSED] One page, with coherent DMA mappings enabled
[00:54:58] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[00:54:58] ============== [PASSED] ttm_pool_alloc_basic ===============
[00:54:58] ============== ttm_pool_alloc_basic_dma_addr ==============
[00:54:58] [PASSED] One page
[00:54:58] [PASSED] More than one page
[00:54:58] [PASSED] Above the allocation limit
[00:54:58] [PASSED] One page, with coherent DMA mappings enabled
[00:54:58] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[00:54:58] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[00:54:58] [PASSED] ttm_pool_alloc_order_caching_match
[00:54:58] [PASSED] ttm_pool_alloc_caching_mismatch
[00:54:58] [PASSED] ttm_pool_alloc_order_mismatch
[00:54:58] [PASSED] ttm_pool_free_dma_alloc
[00:54:58] [PASSED] ttm_pool_free_no_dma_alloc
[00:54:58] [PASSED] ttm_pool_fini_basic
[00:54:58] ==================== [PASSED] ttm_pool =====================
[00:54:58] ================ ttm_resource (8 subtests) =================
[00:54:58] ================= ttm_resource_init_basic =================
[00:54:58] [PASSED] Init resource in TTM_PL_SYSTEM
[00:54:58] [PASSED] Init resource in TTM_PL_VRAM
[00:54:58] [PASSED] Init resource in a private placement
[00:54:58] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[00:54:58] ============= [PASSED] ttm_resource_init_basic =============
[00:54:58] [PASSED] ttm_resource_init_pinned
[00:54:58] [PASSED] ttm_resource_fini_basic
[00:54:58] [PASSED] ttm_resource_manager_init_basic
[00:54:58] [PASSED] ttm_resource_manager_usage_basic
[00:54:58] [PASSED] ttm_resource_manager_set_used_basic
[00:54:58] [PASSED] ttm_sys_man_alloc_basic
[00:54:58] [PASSED] ttm_sys_man_free_basic
[00:54:58] ================== [PASSED] ttm_resource ===================
[00:54:58] =================== ttm_tt (15 subtests) ===================
[00:54:58] ==================== ttm_tt_init_basic ====================
[00:54:58] [PASSED] Page-aligned size
[00:54:58] [PASSED] Extra pages requested
[00:54:58] ================ [PASSED] ttm_tt_init_basic ================
[00:54:58] [PASSED] ttm_tt_init_misaligned
[00:54:58] [PASSED] ttm_tt_fini_basic
[00:54:58] [PASSED] ttm_tt_fini_sg
[00:54:58] [PASSED] ttm_tt_fini_shmem
[00:54:58] [PASSED] ttm_tt_create_basic
[00:54:58] [PASSED] ttm_tt_create_invalid_bo_type
[00:54:58] [PASSED] ttm_tt_create_ttm_exists
[00:54:58] [PASSED] ttm_tt_create_failed
[00:54:58] [PASSED] ttm_tt_destroy_basic
[00:54:58] [PASSED] ttm_tt_populate_null_ttm
[00:54:58] [PASSED] ttm_tt_populate_populated_ttm
[00:54:58] [PASSED] ttm_tt_unpopulate_basic
[00:54:58] [PASSED] ttm_tt_unpopulate_empty_ttm
[00:54:58] [PASSED] ttm_tt_swapin_basic
[00:54:58] ===================== [PASSED] ttm_tt ======================
[00:54:58] =================== ttm_bo (14 subtests) ===================
[00:54:58] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[00:54:58] [PASSED] Cannot be interrupted and sleeps
[00:54:58] [PASSED] Cannot be interrupted, locks straight away
[00:54:58] [PASSED] Can be interrupted, sleeps
[00:54:58] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[00:54:58] [PASSED] ttm_bo_reserve_locked_no_sleep
[00:54:58] [PASSED] ttm_bo_reserve_no_wait_ticket
[00:54:58] [PASSED] ttm_bo_reserve_double_resv
[00:54:58] [PASSED] ttm_bo_reserve_interrupted
[00:54:58] [PASSED] ttm_bo_reserve_deadlock
[00:54:58] [PASSED] ttm_bo_unreserve_basic
[00:54:58] [PASSED] ttm_bo_unreserve_pinned
[00:54:58] [PASSED] ttm_bo_unreserve_bulk
[00:54:58] [PASSED] ttm_bo_fini_basic
[00:54:58] [PASSED] ttm_bo_fini_shared_resv
[00:54:58] [PASSED] ttm_bo_pin_basic
[00:54:58] [PASSED] ttm_bo_pin_unpin_resource
[00:54:58] [PASSED] ttm_bo_multiple_pin_one_unpin
[00:54:58] ===================== [PASSED] ttm_bo ======================
[00:54:58] ============== ttm_bo_validate (21 subtests) ===============
[00:54:58] ============== ttm_bo_init_reserved_sys_man ===============
[00:54:58] [PASSED] Buffer object for userspace
[00:54:58] [PASSED] Kernel buffer object
[00:54:58] [PASSED] Shared buffer object
[00:54:58] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[00:54:58] ============== ttm_bo_init_reserved_mock_man ==============
[00:54:58] [PASSED] Buffer object for userspace
[00:54:58] [PASSED] Kernel buffer object
[00:54:58] [PASSED] Shared buffer object
[00:54:58] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[00:54:58] [PASSED] ttm_bo_init_reserved_resv
[00:54:58] ================== ttm_bo_validate_basic ==================
[00:54:58] [PASSED] Buffer object for userspace
[00:54:58] [PASSED] Kernel buffer object
[00:54:58] [PASSED] Shared buffer object
[00:54:58] ============== [PASSED] ttm_bo_validate_basic ==============
[00:54:58] [PASSED] ttm_bo_validate_invalid_placement
[00:54:58] ============= ttm_bo_validate_same_placement ==============
[00:54:58] [PASSED] System manager
[00:54:58] [PASSED] VRAM manager
[00:54:58] ========= [PASSED] ttm_bo_validate_same_placement ==========
[00:54:58] [PASSED] ttm_bo_validate_failed_alloc
[00:54:58] [PASSED] ttm_bo_validate_pinned
[00:54:58] [PASSED] ttm_bo_validate_busy_placement
[00:54:58] ================ ttm_bo_validate_multihop =================
[00:54:58] [PASSED] Buffer object for userspace
[00:54:58] [PASSED] Kernel buffer object
[00:54:58] [PASSED] Shared buffer object
[00:54:58] ============ [PASSED] ttm_bo_validate_multihop =============
[00:54:58] ========== ttm_bo_validate_no_placement_signaled ==========
[00:54:58] [PASSED] Buffer object in system domain, no page vector
[00:54:58] [PASSED] Buffer object in system domain with an existing page vector
[00:54:58] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[00:54:58] ======== ttm_bo_validate_no_placement_not_signaled ========
[00:54:58] [PASSED] Buffer object for userspace
[00:54:58] [PASSED] Kernel buffer object
[00:54:58] [PASSED] Shared buffer object
[00:54:58] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[00:54:58] [PASSED] ttm_bo_validate_move_fence_signaled
[00:54:58] ========= ttm_bo_validate_move_fence_not_signaled =========
[00:54:58] [PASSED] Waits for GPU
[00:54:58] [PASSED] Tries to lock straight away
[00:54:58] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[00:54:58] [PASSED] ttm_bo_validate_happy_evict
[00:54:58] [PASSED] ttm_bo_validate_all_pinned_evict
[00:54:58] [PASSED] ttm_bo_validate_allowed_only_evict
[00:54:58] [PASSED] ttm_bo_validate_deleted_evict
[00:54:58] [PASSED] ttm_bo_validate_busy_domain_evict
[00:54:58] [PASSED] ttm_bo_validate_evict_gutting
[00:54:58] [PASSED] ttm_bo_validate_recrusive_evict
[00:54:58] ================= [PASSED] ttm_bo_validate =================
[00:54:58] ============================================================
[00:54:58] Testing complete. Ran 101 tests: passed: 101
[00:54:58] Elapsed time: 11.228s total, 1.692s configuring, 9.319s building, 0.185s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 31+ messages in thread
* ✓ Xe.CI.BAT: success for Fix DRM scheduler layering violations in Xe (rev8)
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (10 preceding siblings ...)
2025-12-02 0:55 ` ✓ CI.KUnit: success " Patchwork
@ 2025-12-02 2:05 ` Patchwork
2025-12-02 5:18 ` ✓ Xe.CI.Full: " Patchwork
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
13 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2025-12-02 2:05 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]
== Series Details ==
Series: Fix DRM scheduler layering violations in Xe (rev8)
URL : https://patchwork.freedesktop.org/series/155314/
State : success
== Summary ==
CI Bug Log - changes from xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65_BAT -> xe-pw-155314v8_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (12 -> 12)
------------------------------
No changes in participating hosts
Known issues
------------
Here are the changes found in xe-pw-155314v8_BAT that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@xe_waitfence@reltime:
- bat-dg2-oem2: [PASS][1] -> [FAIL][2] ([Intel XE#6520])
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/bat-dg2-oem2/igt@xe_waitfence@reltime.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/bat-dg2-oem2/igt@xe_waitfence@reltime.html
[Intel XE#6520]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6520
Build changes
-------------
* Linux: xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65 -> xe-pw-155314v8
IGT_8647: 8647
xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65: 639f325d8cbdc690de963db2fe5840444ac7ea65
xe-pw-155314v8: 155314v8
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/index.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* ✓ Xe.CI.Full: success for Fix DRM scheduler layering violations in Xe (rev8)
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (11 preceding siblings ...)
2025-12-02 2:05 ` ✓ Xe.CI.BAT: " Patchwork
@ 2025-12-02 5:18 ` Patchwork
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
13 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2025-12-02 5:18 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 49538 bytes --]
== Series Details ==
Series: Fix DRM scheduler layering violations in Xe (rev8)
URL : https://patchwork.freedesktop.org/series/155314/
State : success
== Summary ==
CI Bug Log - changes from xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65_FULL -> xe-pw-155314v8_FULL
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (4 -> 4)
------------------------------
No changes in participating hosts
Known issues
------------
Here are the changes found in xe-pw-155314v8_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_async_flips@async-flip-with-page-flip-events-tiled@pipe-b-edp-1-x:
- shard-lnl: NOTRUN -> [FAIL][1] ([Intel XE#6676]) +6 other tests fail
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_async_flips@async-flip-with-page-flip-events-tiled@pipe-b-edp-1-x.html
* igt@kms_big_fb@x-tiled-32bpp-rotate-270:
- shard-dg2-set2: NOTRUN -> [SKIP][2] ([Intel XE#316]) +1 other test skip
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_big_fb@x-tiled-32bpp-rotate-270.html
* igt@kms_big_fb@y-tiled-32bpp-rotate-90:
- shard-bmg: NOTRUN -> [SKIP][3] ([Intel XE#1124]) +2 other tests skip
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html
* igt@kms_big_fb@y-tiled-64bpp-rotate-90:
- shard-dg2-set2: NOTRUN -> [SKIP][4] ([Intel XE#1124]) +3 other tests skip
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_big_fb@y-tiled-64bpp-rotate-90.html
* igt@kms_big_fb@yf-tiled-32bpp-rotate-180:
- shard-adlp: NOTRUN -> [SKIP][5] ([Intel XE#1124]) +1 other test skip
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_big_fb@yf-tiled-32bpp-rotate-180.html
* igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip:
- shard-lnl: NOTRUN -> [SKIP][6] ([Intel XE#1124]) +2 other tests skip
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip.html
* igt@kms_bw@connected-linear-tiling-3-displays-2560x1440p:
- shard-dg2-set2: NOTRUN -> [SKIP][7] ([Intel XE#2191])
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_bw@connected-linear-tiling-3-displays-2560x1440p.html
* igt@kms_bw@linear-tiling-1-displays-2560x1440p:
- shard-bmg: NOTRUN -> [SKIP][8] ([Intel XE#367])
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_bw@linear-tiling-1-displays-2560x1440p.html
* igt@kms_ccs@bad-pixel-format-y-tiled-gen12-rc-ccs-cc:
- shard-lnl: NOTRUN -> [SKIP][9] ([Intel XE#2887])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_ccs@bad-pixel-format-y-tiled-gen12-rc-ccs-cc.html
* igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs-cc:
- shard-bmg: NOTRUN -> [SKIP][10] ([Intel XE#2887]) +1 other test skip
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_ccs@ccs-on-another-bo-4-tiled-mtl-rc-ccs-cc.html
* igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc:
- shard-adlp: NOTRUN -> [SKIP][11] ([Intel XE#455] / [Intel XE#787]) +1 other test skip
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc.html
* igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-1:
- shard-adlp: NOTRUN -> [SKIP][12] ([Intel XE#787]) +2 other tests skip
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-1.html
* igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs:
- shard-dg2-set2: NOTRUN -> [SKIP][13] ([Intel XE#2907])
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs.html
* igt@kms_ccs@crc-sprite-planes-basic-y-tiled-gen12-rc-ccs@pipe-d-dp-4:
- shard-dg2-set2: NOTRUN -> [SKIP][14] ([Intel XE#455] / [Intel XE#787]) +5 other tests skip
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_ccs@crc-sprite-planes-basic-y-tiled-gen12-rc-ccs@pipe-d-dp-4.html
* igt@kms_ccs@random-ccs-data-4-tiled-mtl-mc-ccs@pipe-b-hdmi-a-6:
- shard-dg2-set2: NOTRUN -> [SKIP][15] ([Intel XE#787]) +20 other tests skip
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_ccs@random-ccs-data-4-tiled-mtl-mc-ccs@pipe-b-hdmi-a-6.html
* igt@kms_chamelium_color@ctm-blue-to-red:
- shard-dg2-set2: NOTRUN -> [SKIP][16] ([Intel XE#306])
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_chamelium_color@ctm-blue-to-red.html
* igt@kms_chamelium_frames@hdmi-crc-single:
- shard-dg2-set2: NOTRUN -> [SKIP][17] ([Intel XE#373]) +1 other test skip
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_chamelium_frames@hdmi-crc-single.html
* igt@kms_chamelium_hpd@hdmi-hpd-with-enabled-mode:
- shard-bmg: NOTRUN -> [SKIP][18] ([Intel XE#2252]) +1 other test skip
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_chamelium_hpd@hdmi-hpd-with-enabled-mode.html
* igt@kms_chamelium_hpd@vga-hpd-without-ddc:
- shard-adlp: NOTRUN -> [SKIP][19] ([Intel XE#373])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_chamelium_hpd@vga-hpd-without-ddc.html
* igt@kms_colorop@plane-xr24-xr24-ctm_3x4_bt709_dec_enc:
- shard-bmg: NOTRUN -> [SKIP][20] ([Intel XE#6704]) +1 other test skip
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_colorop@plane-xr24-xr24-ctm_3x4_bt709_dec_enc.html
* igt@kms_colorop@plane-xr30-xr30-bt2020_inv_oetf:
- shard-dg2-set2: NOTRUN -> [SKIP][21] ([Intel XE#6704])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_colorop@plane-xr30-xr30-bt2020_inv_oetf.html
* igt@kms_content_protection@legacy:
- shard-dg2-set2: NOTRUN -> [FAIL][22] ([Intel XE#1178]) +3 other tests fail
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_content_protection@legacy.html
* igt@kms_content_protection@uevent:
- shard-bmg: NOTRUN -> [FAIL][23] ([Intel XE#6707]) +1 other test fail
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_content_protection@uevent.html
* igt@kms_cursor_crc@cursor-onscreen-32x32:
- shard-bmg: NOTRUN -> [SKIP][24] ([Intel XE#2320])
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_cursor_crc@cursor-onscreen-32x32.html
* igt@kms_cursor_crc@cursor-random-32x10:
- shard-lnl: NOTRUN -> [SKIP][25] ([Intel XE#1424])
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_cursor_crc@cursor-random-32x10.html
* igt@kms_cursor_crc@cursor-sliding-512x512:
- shard-dg2-set2: NOTRUN -> [SKIP][26] ([Intel XE#308])
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_cursor_crc@cursor-sliding-512x512.html
* igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions:
- shard-lnl: NOTRUN -> [SKIP][27] ([Intel XE#309])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions.html
* igt@kms_cursor_legacy@cursorb-vs-flipa-atomic:
- shard-bmg: [PASS][28] -> [SKIP][29] ([Intel XE#2291])
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic.html
* igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size:
- shard-lnl: NOTRUN -> [SKIP][30] ([Intel XE#323])
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html
* igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-6:
- shard-dg2-set2: NOTRUN -> [SKIP][31] ([Intel XE#4494] / [i915#3804])
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-6.html
* igt@kms_dsc@dsc-basic:
- shard-bmg: NOTRUN -> [SKIP][32] ([Intel XE#2244])
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_dsc@dsc-basic.html
* igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-different-formats:
- shard-dg2-set2: NOTRUN -> [SKIP][33] ([Intel XE#4422])
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-different-formats.html
* igt@kms_fbcon_fbt@fbc-suspend:
- shard-adlp: NOTRUN -> [ABORT][34] ([Intel XE#6675])
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_fbcon_fbt@fbc-suspend.html
* igt@kms_feature_discovery@display-2x:
- shard-adlp: NOTRUN -> [SKIP][35] ([Intel XE#702])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_feature_discovery@display-2x.html
* igt@kms_flip@2x-flip-vs-suspend-interruptible@ab-dp2-hdmi-a3:
- shard-bmg: NOTRUN -> [ABORT][36] ([Intel XE#6675]) +5 other tests abort
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-5/igt@kms_flip@2x-flip-vs-suspend-interruptible@ab-dp2-hdmi-a3.html
* igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
- shard-bmg: [PASS][37] -> [SKIP][38] ([Intel XE#2316]) +6 other tests skip
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
* igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-32bpp-4tiledg2rcccs-upscaling@pipe-a-valid-mode:
- shard-bmg: NOTRUN -> [SKIP][39] ([Intel XE#2293]) +2 other tests skip
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-32bpp-4tiledg2rcccs-upscaling@pipe-a-valid-mode.html
* igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling:
- shard-dg2-set2: NOTRUN -> [SKIP][40] ([Intel XE#455]) +6 other tests skip
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling.html
* igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling:
- shard-bmg: NOTRUN -> [SKIP][41] ([Intel XE#2293] / [Intel XE#2380]) +2 other tests skip
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling.html
* igt@kms_frontbuffer_tracking@drrs-1p-primscrn-shrfb-msflip-blt:
- shard-lnl: NOTRUN -> [SKIP][42] ([Intel XE#651]) +1 other test skip
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_frontbuffer_tracking@drrs-1p-primscrn-shrfb-msflip-blt.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-shrfb-draw-mmap-wc:
- shard-adlp: NOTRUN -> [SKIP][43] ([Intel XE#656]) +2 other tests skip
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-shrfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt:
- shard-bmg: NOTRUN -> [SKIP][44] ([Intel XE#4141]) +2 other tests skip
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt.html
* igt@kms_frontbuffer_tracking@fbcdrrs-1p-offscreen-pri-shrfb-draw-render:
- shard-lnl: NOTRUN -> [SKIP][45] ([Intel XE#6312])
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_frontbuffer_tracking@fbcdrrs-1p-offscreen-pri-shrfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-render:
- shard-dg2-set2: NOTRUN -> [SKIP][46] ([Intel XE#651]) +7 other tests skip
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbcdrrs-2p-pri-indfb-multidraw:
- shard-bmg: NOTRUN -> [SKIP][47] ([Intel XE#2312]) +1 other test skip
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcdrrs-2p-pri-indfb-multidraw.html
* igt@kms_frontbuffer_tracking@fbcdrrs-rgb101010-draw-render:
- shard-bmg: NOTRUN -> [SKIP][48] ([Intel XE#2311]) +3 other tests skip
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcdrrs-rgb101010-draw-render.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-move:
- shard-lnl: NOTRUN -> [SKIP][49] ([Intel XE#656])
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-cur-indfb-move.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-shrfb-msflip-blt:
- shard-dg2-set2: NOTRUN -> [SKIP][50] ([Intel XE#653]) +7 other tests skip
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-shrfb-msflip-blt.html
* igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt:
- shard-bmg: NOTRUN -> [SKIP][51] ([Intel XE#2313]) +3 other tests skip
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-plflip-blt.html
* igt@kms_frontbuffer_tracking@psr-rgb565-draw-render:
- shard-adlp: NOTRUN -> [SKIP][52] ([Intel XE#653])
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_frontbuffer_tracking@psr-rgb565-draw-render.html
* igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1:
- shard-lnl: [PASS][53] -> [ABORT][54] ([Intel XE#6675])
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-lnl-3/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1.html
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-2/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1.html
* igt@kms_plane_scaling@2x-scaler-multi-pipe:
- shard-bmg: [PASS][55] -> [SKIP][56] ([Intel XE#2571])
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@kms_plane_scaling@2x-scaler-multi-pipe.html
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@kms_plane_scaling@2x-scaler-multi-pipe.html
* igt@kms_pm_dc@dc5-psr:
- shard-adlp: NOTRUN -> [SKIP][57] ([Intel XE#1129])
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_pm_dc@dc5-psr.html
* igt@kms_pm_dc@dc6-dpms:
- shard-dg2-set2: NOTRUN -> [SKIP][58] ([Intel XE#908])
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_pm_dc@dc6-dpms.html
* igt@kms_pm_rpm@system-suspend-idle:
- shard-dg2-set2: [PASS][59] -> [ABORT][60] ([Intel XE#6675])
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-dg2-436/igt@kms_pm_rpm@system-suspend-idle.html
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-466/igt@kms_pm_rpm@system-suspend-idle.html
* igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf:
- shard-dg2-set2: NOTRUN -> [SKIP][61] ([Intel XE#1406] / [Intel XE#1489]) +2 other tests skip
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf.html
* igt@kms_psr2_sf@fbc-psr2-cursor-plane-update-sf:
- shard-adlp: NOTRUN -> [SKIP][62] ([Intel XE#1406] / [Intel XE#1489])
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_psr2_sf@fbc-psr2-cursor-plane-update-sf.html
* igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf:
- shard-lnl: NOTRUN -> [SKIP][63] ([Intel XE#1406] / [Intel XE#2893] / [Intel XE#4608])
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf.html
* igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf@pipe-a-edp-1:
- shard-lnl: NOTRUN -> [SKIP][64] ([Intel XE#1406] / [Intel XE#4608]) +1 other test skip
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf@pipe-a-edp-1.html
* igt@kms_psr@fbc-pr-basic:
- shard-bmg: NOTRUN -> [SKIP][65] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) +1 other test skip
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_psr@fbc-pr-basic.html
* igt@kms_psr@fbc-pr-sprite-plane-move:
- shard-dg2-set2: NOTRUN -> [SKIP][66] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929]) +2 other tests skip
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_psr@fbc-pr-sprite-plane-move.html
* igt@kms_psr@pr-primary-blt:
- shard-lnl: NOTRUN -> [SKIP][67] ([Intel XE#1406])
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_psr@pr-primary-blt.html
* igt@kms_rotation_crc@primary-4-tiled-reflect-x-180:
- shard-adlp: NOTRUN -> [SKIP][68] ([Intel XE#1127])
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_rotation_crc@primary-4-tiled-reflect-x-180.html
* igt@kms_rotation_crc@primary-y-tiled-reflect-x-90:
- shard-dg2-set2: NOTRUN -> [SKIP][69] ([Intel XE#3414])
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@kms_rotation_crc@primary-y-tiled-reflect-x-90.html
* igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180:
- shard-bmg: NOTRUN -> [SKIP][70] ([Intel XE#2330])
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180.html
* igt@kms_setmode@basic@pipe-a-hdmi-a-3:
- shard-bmg: [PASS][71] -> [FAIL][72] ([Intel XE#6361]) +1 other test fail
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-5/igt@kms_setmode@basic@pipe-a-hdmi-a-3.html
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@kms_setmode@basic@pipe-a-hdmi-a-3.html
* igt@kms_setmode@clone-exclusive-crtc:
- shard-lnl: NOTRUN -> [SKIP][73] ([Intel XE#1435])
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_setmode@clone-exclusive-crtc.html
* igt@kms_tiled_display@basic-test-pattern:
- shard-adlp: NOTRUN -> [SKIP][74] ([Intel XE#362])
[74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@kms_tiled_display@basic-test-pattern.html
* igt@kms_vrr@seamless-rr-switch-vrr:
- shard-lnl: NOTRUN -> [SKIP][75] ([Intel XE#1499])
[75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@kms_vrr@seamless-rr-switch-vrr.html
* igt@testdisplay:
- shard-bmg: [PASS][76] -> [ABORT][77] ([Intel XE#6740])
[76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@testdisplay.html
[77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@testdisplay.html
* igt@xe_ccs@block-copy-compressed-inc-dimension:
- shard-adlp: NOTRUN -> [SKIP][78] ([Intel XE#455] / [Intel XE#488] / [Intel XE#5607])
[78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_ccs@block-copy-compressed-inc-dimension.html
* igt@xe_compute_preempt@compute-preempt:
- shard-dg2-set2: NOTRUN -> [SKIP][79] ([Intel XE#6360]) +1 other test skip
[79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@xe_compute_preempt@compute-preempt.html
* igt@xe_copy_basic@mem-copy-linear-0x3fff:
- shard-dg2-set2: NOTRUN -> [SKIP][80] ([Intel XE#1123])
[80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_copy_basic@mem-copy-linear-0x3fff.html
* igt@xe_create@multigpu-create-massive-size:
- shard-dg2-set2: NOTRUN -> [SKIP][81] ([Intel XE#944])
[81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@xe_create@multigpu-create-massive-size.html
* igt@xe_eudebug@discovery-empty:
- shard-dg2-set2: NOTRUN -> [SKIP][82] ([Intel XE#4837]) +2 other tests skip
[82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_eudebug@discovery-empty.html
* igt@xe_eudebug_online@interrupt-other-debuggable:
- shard-lnl: NOTRUN -> [SKIP][83] ([Intel XE#4837])
[83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_eudebug_online@interrupt-other-debuggable.html
* igt@xe_eudebug_online@set-breakpoint:
- shard-bmg: NOTRUN -> [SKIP][84] ([Intel XE#4837]) +2 other tests skip
[84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_eudebug_online@set-breakpoint.html
* igt@xe_eudebug_online@tdctl-parameters:
- shard-adlp: NOTRUN -> [SKIP][85] ([Intel XE#4837] / [Intel XE#5565]) +1 other test skip
[85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_eudebug_online@tdctl-parameters.html
* igt@xe_evict@evict-beng-small-external-cm:
- shard-lnl: NOTRUN -> [SKIP][86] ([Intel XE#688])
[86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_evict@evict-beng-small-external-cm.html
* igt@xe_evict@evict-small-external-cm:
- shard-adlp: NOTRUN -> [SKIP][87] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688])
[87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_evict@evict-small-external-cm.html
* igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-rebind:
- shard-lnl: NOTRUN -> [SKIP][88] ([Intel XE#1392])
[88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-rebind.html
* igt@xe_exec_basic@multigpu-no-exec-userptr-rebind:
- shard-bmg: NOTRUN -> [SKIP][89] ([Intel XE#2322]) +1 other test skip
[89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@xe_exec_basic@multigpu-no-exec-userptr-rebind.html
* igt@xe_exec_fault_mode@many-userptr-invalidate-imm:
- shard-dg2-set2: NOTRUN -> [SKIP][90] ([Intel XE#288]) +5 other tests skip
[90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_exec_fault_mode@many-userptr-invalidate-imm.html
* igt@xe_exec_fault_mode@once-userptr-prefetch:
- shard-adlp: NOTRUN -> [SKIP][91] ([Intel XE#288] / [Intel XE#5561]) +2 other tests skip
[91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_exec_fault_mode@once-userptr-prefetch.html
* igt@xe_exec_system_allocator@fault-threads-same-page-benchmark:
- shard-dg2-set2: NOTRUN -> [SKIP][92] ([Intel XE#4915]) +84 other tests skip
[92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_exec_system_allocator@fault-threads-same-page-benchmark.html
* igt@xe_exec_system_allocator@madvise-no-range-invalidate-same-attr:
- shard-lnl: NOTRUN -> [WARN][93] ([Intel XE#5786])
[93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_exec_system_allocator@madvise-no-range-invalidate-same-attr.html
* igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset:
- shard-lnl: NOTRUN -> [SKIP][94] ([Intel XE#4943]) +4 other tests skip
[94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset.html
* igt@xe_exec_system_allocator@threads-shared-vm-many-execqueues-malloc-fork-read-after:
- shard-adlp: NOTRUN -> [SKIP][95] ([Intel XE#4915]) +23 other tests skip
[95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_exec_system_allocator@threads-shared-vm-many-execqueues-malloc-fork-read-after.html
* igt@xe_exec_system_allocator@threads-shared-vm-many-stride-mmap-free-huge:
- shard-bmg: NOTRUN -> [SKIP][96] ([Intel XE#4943]) +3 other tests skip
[96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_exec_system_allocator@threads-shared-vm-many-stride-mmap-free-huge.html
* igt@xe_fault_injection@exec-queue-create-fail-xe_pxp_exec_queue_add:
- shard-dg2-set2: NOTRUN -> [SKIP][97] ([Intel XE#6281])
[97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@xe_fault_injection@exec-queue-create-fail-xe_pxp_exec_queue_add.html
* igt@xe_huc_copy@huc_copy:
- shard-dg2-set2: NOTRUN -> [SKIP][98] ([Intel XE#255])
[98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_huc_copy@huc_copy.html
* igt@xe_module_load@load:
- shard-bmg: ([PASS][99], [PASS][100], [PASS][101], [PASS][102], [PASS][103], [PASS][104], [PASS][105], [PASS][106], [PASS][107], [PASS][108], [PASS][109], [PASS][110], [PASS][111], [PASS][112], [PASS][113], [PASS][114], [PASS][115], [PASS][116], [PASS][117], [PASS][118], [PASS][119], [PASS][120], [PASS][121], [PASS][122]) -> ([PASS][123], [PASS][124], [PASS][125], [SKIP][126], [PASS][127], [PASS][128], [PASS][129], [PASS][130], [PASS][131], [PASS][132], [PASS][133], [PASS][134], [PASS][135], [PASS][136], [PASS][137], [PASS][138], [PASS][139], [PASS][140], [PASS][141], [PASS][142], [PASS][143], [PASS][144], [PASS][145], [PASS][146], [PASS][147], [PASS][148]) ([Intel XE#2457])
[99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@xe_module_load@load.html
[100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@xe_module_load@load.html
[101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@xe_module_load@load.html
[102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-1/igt@xe_module_load@load.html
[103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@xe_module_load@load.html
[104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-3/igt@xe_module_load@load.html
[105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-3/igt@xe_module_load@load.html
[106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-5/igt@xe_module_load@load.html
[107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-5/igt@xe_module_load@load.html
[108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-5/igt@xe_module_load@load.html
[109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-5/igt@xe_module_load@load.html
[110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-1/igt@xe_module_load@load.html
[111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-7/igt@xe_module_load@load.html
[112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-7/igt@xe_module_load@load.html
[113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@xe_module_load@load.html
[114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@xe_module_load@load.html
[115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@xe_module_load@load.html
[116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@xe_module_load@load.html
[117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-7/igt@xe_module_load@load.html
[118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@xe_module_load@load.html
[119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@xe_module_load@load.html
[120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-3/igt@xe_module_load@load.html
[121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@xe_module_load@load.html
[122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@xe_module_load@load.html
[123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-4/igt@xe_module_load@load.html
[124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-5/igt@xe_module_load@load.html
[125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@xe_module_load@load.html
[126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_module_load@load.html
[127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-4/igt@xe_module_load@load.html
[128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-5/igt@xe_module_load@load.html
[129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@xe_module_load@load.html
[130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@xe_module_load@load.html
[131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_module_load@load.html
[132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_module_load@load.html
[133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-8/igt@xe_module_load@load.html
[134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@xe_module_load@load.html
[135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@xe_module_load@load.html
[136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-5/igt@xe_module_load@load.html
[137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@xe_module_load@load.html
[138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@xe_module_load@load.html
[139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@xe_module_load@load.html
[140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-4/igt@xe_module_load@load.html
[141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_module_load@load.html
[142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@xe_module_load@load.html
[143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@xe_module_load@load.html
[144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@xe_module_load@load.html
[145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@xe_module_load@load.html
[146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-8/igt@xe_module_load@load.html
[147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-8/igt@xe_module_load@load.html
[148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@xe_module_load@load.html
* igt@xe_oa@polling-small-buf:
- shard-dg2-set2: NOTRUN -> [SKIP][149] ([Intel XE#3573]) +1 other test skip
[149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_oa@polling-small-buf.html
* igt@xe_pat@pat-index-xelpg:
- shard-dg2-set2: NOTRUN -> [SKIP][150] ([Intel XE#979])
[150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@xe_pat@pat-index-xelpg.html
* igt@xe_pm@d3hot-i2c:
- shard-lnl: NOTRUN -> [SKIP][151] ([Intel XE#5742])
[151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_pm@d3hot-i2c.html
* igt@xe_pm@s2idle-basic-exec:
- shard-adlp: [PASS][152] -> [ABORT][153] ([Intel XE#6675]) +7 other tests abort
[152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-adlp-6/igt@xe_pm@s2idle-basic-exec.html
[153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-9/igt@xe_pm@s2idle-basic-exec.html
* igt@xe_pm@s2idle-vm-bind-prefetch:
- shard-bmg: [PASS][154] -> [ABORT][155] ([Intel XE#6675])
[154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-1/igt@xe_pm@s2idle-vm-bind-prefetch.html
[155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-3/igt@xe_pm@s2idle-vm-bind-prefetch.html
* igt@xe_pm@s3-d3cold-basic-exec:
- shard-bmg: NOTRUN -> [SKIP][156] ([Intel XE#2284])
[156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@xe_pm@s3-d3cold-basic-exec.html
* igt@xe_pm@s4-basic:
- shard-dg2-set2: NOTRUN -> [ABORT][157] ([Intel XE#6675]) +1 other test abort
[157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-432/igt@xe_pm@s4-basic.html
* igt@xe_pmu@engine-activity-accuracy-90:
- shard-lnl: [PASS][158] -> [FAIL][159] ([Intel XE#6251]) +2 other tests fail
[158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-lnl-2/igt@xe_pmu@engine-activity-accuracy-90.html
[159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_pmu@engine-activity-accuracy-90.html
* igt@xe_query@multigpu-query-mem-usage:
- shard-lnl: NOTRUN -> [SKIP][160] ([Intel XE#944])
[160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_query@multigpu-query-mem-usage.html
* igt@xe_spin_batch@spin-mem-copy:
- shard-adlp: NOTRUN -> [SKIP][161] ([Intel XE#4821])
[161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_spin_batch@spin-mem-copy.html
* igt@xe_sriov_flr@flr-each-isolation:
- shard-lnl: NOTRUN -> [SKIP][162] ([Intel XE#3342])
[162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_sriov_flr@flr-each-isolation.html
#### Possible fixes ####
* igt@kms_flip@2x-flip-vs-dpms:
- shard-bmg: [SKIP][163] ([Intel XE#2316]) -> [PASS][164] +1 other test pass
[163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@kms_flip@2x-flip-vs-dpms.html
[164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@kms_flip@2x-flip-vs-dpms.html
* igt@kms_pipe_crc_basic@read-crc-frame-sequence:
- shard-bmg: [ABORT][165] ([Intel XE#1727]) -> [PASS][166] +1 other test pass
[165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@kms_pipe_crc_basic@read-crc-frame-sequence.html
[166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_pipe_crc_basic@read-crc-frame-sequence.html
* igt@kms_pm_rpm@system-suspend-modeset:
- shard-dg2-set2: [ABORT][167] ([Intel XE#6675]) -> [PASS][168] +2 other tests pass
[167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-dg2-466/igt@kms_pm_rpm@system-suspend-modeset.html
[168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-dg2-436/igt@kms_pm_rpm@system-suspend-modeset.html
* igt@xe_pm@s2idle-basic:
- shard-lnl: [ABORT][169] ([Intel XE#6675]) -> [PASS][170]
[169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-lnl-2/igt@xe_pm@s2idle-basic.html
[170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-8/igt@xe_pm@s2idle-basic.html
* igt@xe_pm@s2idle-mocs:
- shard-adlp: [ABORT][171] ([Intel XE#6675]) -> [PASS][172] +1 other test pass
[171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-adlp-9/igt@xe_pm@s2idle-mocs.html
[172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-adlp-6/igt@xe_pm@s2idle-mocs.html
#### Warnings ####
* igt@kms_content_protection@srm:
- shard-bmg: [FAIL][173] ([Intel XE#1178]) -> [SKIP][174] ([Intel XE#2341])
[173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@kms_content_protection@srm.html
[174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_content_protection@srm.html
* igt@kms_flip@2x-flip-vs-suspend-interruptible:
- shard-bmg: [SKIP][175] ([Intel XE#2316]) -> [ABORT][176] ([Intel XE#6675])
[175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@kms_flip@2x-flip-vs-suspend-interruptible.html
[176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-5/igt@kms_flip@2x-flip-vs-suspend-interruptible.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc:
- shard-bmg: [SKIP][177] ([Intel XE#2312]) -> [SKIP][178] ([Intel XE#2311]) +3 other tests skip
[177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html
[178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-7/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-pri-indfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render:
- shard-bmg: [SKIP][179] ([Intel XE#2311]) -> [SKIP][180] ([Intel XE#2312]) +8 other tests skip
[179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html
[180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render:
- shard-bmg: [SKIP][181] ([Intel XE#2312]) -> [SKIP][182] ([Intel XE#4141])
[181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render.html
[182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-render:
- shard-bmg: [SKIP][183] ([Intel XE#4141]) -> [SKIP][184] ([Intel XE#2312]) +3 other tests skip
[183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-render.html
[184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff:
- shard-bmg: [SKIP][185] ([Intel XE#2312]) -> [SKIP][186] ([Intel XE#2313]) +3 other tests skip
[185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-2/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff.html
[186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff.html
* igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen:
- shard-bmg: [SKIP][187] ([Intel XE#2313]) -> [SKIP][188] ([Intel XE#2312]) +9 other tests skip
[187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-bmg-4/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen.html
[188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen.html
* igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-edp-1:
- shard-lnl: [ABORT][189] ([Intel XE#6675]) -> [INCOMPLETE][190] ([Intel XE#6718])
[189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65/shard-lnl-3/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-edp-1.html
[190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/shard-lnl-2/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-b-edp-1.html
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
[Intel XE#1123]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1123
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1127]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1127
[Intel XE#1129]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1129
[Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
[Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
[Intel XE#1435]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1435
[Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
[Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
[Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
[Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
[Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
[Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
[Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
[Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
[Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
[Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
[Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
[Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
[Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
[Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
[Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
[Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
[Intel XE#2330]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2330
[Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
[Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
[Intel XE#2457]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2457
[Intel XE#255]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/255
[Intel XE#2571]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2571
[Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
[Intel XE#2907]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2907
[Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
[Intel XE#308]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/308
[Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
[Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
[Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
[Intel XE#3342]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3342
[Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
[Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
[Intel XE#362]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/362
[Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
[Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
[Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
[Intel XE#4422]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4422
[Intel XE#4494]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4494
Build changes
-------------
* Linux: xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65 -> xe-pw-155314v8
IGT_8647: 8647
xe-4178-639f325d8cbdc690de963db2fe5840444ac7ea65: 639f325d8cbdc690de963db2fe5840444ac7ea65
xe-pw-155314v8: 155314v8
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-155314v8/index.html
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
@ 2025-12-02 6:42 ` Umesh Nerlige Ramappa
0 siblings, 0 replies; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2025-12-02 6:42 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, dri-devel
On Mon, Dec 01, 2025 at 10:39:53AM -0800, Matthew Brost wrote:
>The timestamp WA does not work on a VF because it requires reading MMIO
>registers, which are inaccessible on a VF. This timestamp WA confuses
>LRC sampling on a VF during TDR, as the LRC timestamp would always read
>as 1 for any active context. Disable the timestamp WA on VFs to avoid
>this confusion.
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Thanks for fixing this.
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
I think this needs a Fixes tag.
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
This patch may not backport cleanly since the functions are renamed in a
later refactor. It may need to be sent to stable manually, or needs the
maintainers' input.
Regards,
Umesh
>---
> drivers/gpu/drm/xe/xe_lrc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>index a05060f75e7e..166353455f8f 100644
>--- a/drivers/gpu/drm/xe/xe_lrc.c
>+++ b/drivers/gpu/drm/xe/xe_lrc.c
>@@ -1063,6 +1063,9 @@ static ssize_t setup_utilization_wa(struct xe_lrc *lrc,
> {
> u32 *cmd = batch;
>
>+ if (IS_SRIOV_VF(gt_to_xe(lrc->gt)))
>+ return 0;
>+
> if (xe_gt_WARN_ON(lrc->gt, max_len < 12))
> return -ENOSPC;
>
>--
>2.34.1
>
* Re: [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
@ 2025-12-02 7:31 ` Umesh Nerlige Ramappa
2025-12-02 15:14 ` Matthew Brost
0 siblings, 1 reply; 31+ messages in thread
From: Umesh Nerlige Ramappa @ 2025-12-02 7:31 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe, dri-devel
On Mon, Dec 01, 2025 at 10:39:54AM -0800, Matthew Brost wrote:
>We now have proper infrastructure to accurately check the LRC timestamp
>without toggling the scheduling state for non-VFs. For VFs, it is still
>possible to get an inaccurate view if the context is on hardware. We
>guard against free-running contexts on VFs by banning jobs whose
>timestamps are not moving. In addition, VFs have a timeslice quantum
>that naturally triggers context switches when more than one VF is
>running, thus updating the LRC timestamp.
I guess some workloads are configured to just keep running on VFs
without switching. I am assuming they are classified as Long Running and
are not affected by TDR. If so, the timeouts should still work
reasonably on a VF considering switching is usually in milliseconds and
timeouts are much larger.
>
>For multi-queue, it is desirable to avoid scheduling toggling in the TDR
>because this scheduling state is shared among many queues. Furthermore,
>this change simplifies the GuC state machine. The trade-off for VF cases
>seems worthwhile.
>
>v5:
> - Add xe_lrc_timestamp helper (Umesh)
>v6:
> - Reduce number of tries on stuck timestamp (VF testing)
> - Convert job timestamp save to a memory copy (VF testing)
>v7:
> - Save ctx timestamp to LRC when starting a VF job (VF testing)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_submit.c | 97 ++++++-------------------
> drivers/gpu/drm/xe/xe_lrc.c | 42 +++++++----
> drivers/gpu/drm/xe/xe_lrc.h | 3 +-
> drivers/gpu/drm/xe/xe_ring_ops.c | 25 +++++--
> drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> 6 files changed, 76 insertions(+), 94 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index 8190f2afbaed..dc4bf3126450 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> #define EXEC_QUEUE_STATE_KILLED (1 << 7)
> #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> #define EXEC_QUEUE_STATE_BANNED (1 << 9)
>-#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
>-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
>-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
>+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
>
> static bool exec_queue_registered(struct xe_exec_queue *q)
> {
>@@ -202,21 +200,6 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
> atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state);
> }
>
>-static bool exec_queue_check_timeout(struct xe_exec_queue *q)
>-{
>- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_CHECK_TIMEOUT;
>-}
>-
>-static void set_exec_queue_check_timeout(struct xe_exec_queue *q)
>-{
>- atomic_or(EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
>-}
>-
>-static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
>-{
>- atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
>-}
>-
> static bool exec_queue_pending_resume(struct xe_exec_queue *q)
> {
> return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
>@@ -232,21 +215,6 @@ static void clear_exec_queue_pending_resume(struct xe_exec_queue *q)
> atomic_and(~EXEC_QUEUE_STATE_PENDING_RESUME, &q->guc->state);
> }
>
>-static bool exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
>-{
>- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_TDR_EXIT;
>-}
>-
>-static void set_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
>-{
>- atomic_or(EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
>-}
>-
>-static void clear_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
>-{
>- atomic_and(~EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
>-}
>-
> static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
> {
> return (atomic_read(&q->guc->state) &
>@@ -1006,7 +974,16 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> return xe_sched_invalidate_job(job, 2);
> }
>
>- ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
>+ ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));
>+ if (ctx_timestamp == job->sample_timestamp) {
>+ xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
>+ xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
>+ q->guc->id);
>+
>+ return xe_sched_invalidate_job(job, 0);
>+ }
>+
>+ job->sample_timestamp = ctx_timestamp;
> ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
>
> /*
>@@ -1132,16 +1109,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> }
>
> /*
>- * XXX: Sampling timeout doesn't work in wedged mode as we have to
>- * modify scheduling state to read timestamp. We could read the
>- * timestamp from a register to accumulate current running time but this
>- * doesn't work for SRIOV. For now assuming timeouts in wedged mode are
>- * genuine timeouts.
>+ * Check if job is actually timed out, if so restart job execution and TDR
> */
>+ if (!skip_timeout_check && !check_timeout(q, job))
>+ goto rearm;
>+
> if (!exec_queue_killed(q))
> wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>
>- /* Engine state now stable, disable scheduling to check timestamp */
>+ set_exec_queue_banned(q);
>+
>+ /* Kick job / queue off hardware */
> if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
> int ret;
>
>@@ -1163,13 +1141,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> if (!ret || xe_guc_read_stopped(guc))
> goto trigger_reset;
>
>- /*
>- * Flag communicates to G2H handler that schedule
>- * disable originated from a timeout check. The G2H then
>- * avoid triggering cleanup or deregistering the exec
>- * queue.
>- */
>- set_exec_queue_check_timeout(q);
> disable_scheduling(q, skip_timeout_check);
> }
>
>@@ -1198,22 +1169,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> xe_devcoredump(q, job,
> "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
> q->guc->id, ret, xe_guc_read_stopped(guc));
>- set_exec_queue_banned(q);
> xe_gt_reset_async(q->gt);
> xe_sched_tdr_queue_imm(sched);
> goto rearm;
> }
> }
>
>- /*
>- * Check if job is actually timed out, if so restart job execution and TDR
>- */
>- if (!wedged && !skip_timeout_check && !check_timeout(q, job) &&
>- !exec_queue_reset(q) && exec_queue_registered(q)) {
>- clear_exec_queue_check_timeout(q);
>- goto sched_enable;
>- }
>-
> if (q->vm && q->vm->xef) {
> process_name = q->vm->xef->process_name;
> pid = q->vm->xef->pid;
>@@ -1244,14 +1205,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
> (q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
> if (!xe_sched_invalidate_job(job, 2)) {
>- clear_exec_queue_check_timeout(q);
> xe_gt_reset_async(q->gt);
> goto rearm;
> }
> }
>
>- set_exec_queue_banned(q);
>-
> /* Mark all outstanding jobs as bad, thus completing them */
> xe_sched_job_set_error(job, err);
> drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
>@@ -1266,9 +1224,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> */
> return DRM_GPU_SCHED_STAT_NO_HANG;
>
>-sched_enable:
>- set_exec_queue_pending_tdr_exit(q);
>- enable_scheduling(q);
> rearm:
> /*
> * XXX: Ideally want to adjust timeout based on current execution time
>@@ -1898,8 +1853,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> q->guc->id);
> }
>
>- if (pending_enable && !pending_resume &&
>- !exec_queue_pending_tdr_exit(q)) {
>+ if (pending_enable && !pending_resume) {
> clear_exec_queue_registered(q);
> xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
> q->guc->id);
>@@ -1908,7 +1862,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> if (pending_enable) {
> clear_exec_queue_enabled(q);
> clear_exec_queue_pending_resume(q);
>- clear_exec_queue_pending_tdr_exit(q);
> clear_exec_queue_pending_enable(q);
> xe_gt_dbg(guc_to_gt(guc), "Replay ENABLE - guc_id=%d",
> q->guc->id);
>@@ -1934,7 +1887,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> if (!pending_enable)
> set_exec_queue_enabled(q);
> clear_exec_queue_pending_disable(q);
>- clear_exec_queue_check_timeout(q);
> xe_gt_dbg(guc_to_gt(guc), "Replay DISABLE - guc_id=%d",
> q->guc->id);
> }
>@@ -2308,13 +2260,10 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>
> q->guc->resume_time = ktime_get();
> clear_exec_queue_pending_resume(q);
>- clear_exec_queue_pending_tdr_exit(q);
> clear_exec_queue_pending_enable(q);
> smp_wmb();
> wake_up_all(&guc->ct.wq);
> } else {
>- bool check_timeout = exec_queue_check_timeout(q);
>-
> xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
> xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
>
>@@ -2322,11 +2271,11 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> suspend_fence_signal(q);
> clear_exec_queue_pending_disable(q);
> } else {
>- if (exec_queue_banned(q) || check_timeout) {
>+ if (exec_queue_banned(q)) {
> smp_wmb();
> wake_up_all(&guc->ct.wq);
> }
>- if (!check_timeout && exec_queue_destroyed(q)) {
>+ if (exec_queue_destroyed(q)) {
> /*
> * Make sure to clear the pending_disable only
> * after sampling the destroyed state. We want
>@@ -2436,7 +2385,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> * guc_exec_queue_timedout_job.
> */
> set_exec_queue_reset(q);
>- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
>+ if (!exec_queue_banned(q))
> xe_guc_exec_queue_trigger_cleanup(q);
>
> return 0;
>@@ -2517,7 +2466,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>
> /* Treat the same as engine reset */
> set_exec_queue_reset(q);
>- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
>+ if (!exec_queue_banned(q))
> xe_guc_exec_queue_trigger_cleanup(q);
>
> return 0;
>diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>index 166353455f8f..38b0c536f6fb 100644
>--- a/drivers/gpu/drm/xe/xe_lrc.c
>+++ b/drivers/gpu/drm/xe/xe_lrc.c
>@@ -852,7 +852,7 @@ u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc)
> *
> * Returns: ctx timestamp value
> */
>-u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
>+static u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
> {
> struct xe_device *xe = lrc_to_xe(lrc);
> struct iosys_map map;
>@@ -2380,35 +2380,31 @@ static int get_ctx_timestamp(struct xe_lrc *lrc, u32 engine_id, u64 *reg_ctx_ts)
> }
>
> /**
>- * xe_lrc_update_timestamp() - Update ctx timestamp
>+ * xe_lrc_timestamp() - Current ctx timestamp
> * @lrc: Pointer to the lrc.
>- * @old_ts: Old timestamp value
> *
>- * Populate @old_ts current saved ctx timestamp, read new ctx timestamp and
>- * update saved value. With support for active contexts, the calculation may be
>- * slightly racy, so follow a read-again logic to ensure that the context is
>- * still active before returning the right timestamp.
>+ * Return latest ctx timestamp. With support for active contexts, the
>+ * calculation may be slightly racy, so follow a read-again logic to ensure that
>+ * the context is still active before returning the right timestamp.
> *
> * Returns: New ctx timestamp value
> */
>-u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
>+u64 xe_lrc_timestamp(struct xe_lrc *lrc)
> {
>- u64 lrc_ts, reg_ts;
>+ u64 lrc_ts, reg_ts, new_ts;
> u32 engine_id;
>
>- *old_ts = lrc->ctx_timestamp;
>-
> lrc_ts = xe_lrc_ctx_timestamp(lrc);
> /* CTX_TIMESTAMP mmio read is invalid on VF, so return the LRC value */
> if (IS_SRIOV_VF(lrc_to_xe(lrc))) {
>- lrc->ctx_timestamp = lrc_ts;
>+ new_ts = lrc_ts;
> goto done;
> }
>
> if (lrc_ts == CONTEXT_ACTIVE) {
> engine_id = xe_lrc_engine_id(lrc);
> if (!get_ctx_timestamp(lrc, engine_id, &reg_ts))
>- lrc->ctx_timestamp = reg_ts;
>+ new_ts = reg_ts;
>
> /* read lrc again to ensure context is still active */
> lrc_ts = xe_lrc_ctx_timestamp(lrc);
>@@ -2419,9 +2415,27 @@ u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
> * be a separate if condition.
> */
> if (lrc_ts != CONTEXT_ACTIVE)
>- lrc->ctx_timestamp = lrc_ts;
>+ new_ts = lrc_ts;
>
> done:
>+ return new_ts;
>+}
>+
>+/**
>+ * xe_lrc_update_timestamp() - Update ctx timestamp
>+ * @lrc: Pointer to the lrc.
>+ * @old_ts: Old timestamp value
>+ *
>+ * Populate @old_ts current saved ctx timestamp, read new ctx timestamp and
>+ * update saved value.
>+ *
>+ * Returns: New ctx timestamp value
>+ */
>+u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
>+{
>+ *old_ts = lrc->ctx_timestamp;
>+ lrc->ctx_timestamp = xe_lrc_timestamp(lrc);
>+
> trace_xe_lrc_update_timestamp(lrc, *old_ts);
>
> return lrc->ctx_timestamp;
>diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>index a32472b92242..93c1234e2706 100644
>--- a/drivers/gpu/drm/xe/xe_lrc.h
>+++ b/drivers/gpu/drm/xe/xe_lrc.h
>@@ -142,7 +142,6 @@ void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot);
>
> u32 xe_lrc_ctx_timestamp_ggtt_addr(struct xe_lrc *lrc);
> u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc);
>-u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc);
> u32 xe_lrc_ctx_job_timestamp_ggtt_addr(struct xe_lrc *lrc);
> u32 xe_lrc_ctx_job_timestamp(struct xe_lrc *lrc);
> int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>@@ -162,4 +161,6 @@ int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe
> */
> u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts);
>
>+u64 xe_lrc_timestamp(struct xe_lrc *lrc);
>+
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
>index ac0c6dcffe15..3dacfc2da75c 100644
>--- a/drivers/gpu/drm/xe/xe_ring_ops.c
>+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>@@ -233,13 +233,26 @@ static u32 get_ppgtt_flag(struct xe_sched_job *job)
> return 0;
> }
>
>-static int emit_copy_timestamp(struct xe_lrc *lrc, u32 *dw, int i)
>+static int emit_copy_timestamp(struct xe_device *xe, struct xe_lrc *lrc,
>+ u32 *dw, int i)
> {
> dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT | MI_SRM_ADD_CS_OFFSET;
> dw[i++] = RING_CTX_TIMESTAMP(0).addr;
> dw[i++] = xe_lrc_ctx_job_timestamp_ggtt_addr(lrc);
> dw[i++] = 0;
>
>+ /*
>+ * Ensure CTX timestamp >= Job timestamp during VF sampling to avoid
>+ * arithmetic wraparound in TDR.
>+ */
>+ if (IS_SRIOV_VF(xe)) {
>+ dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT |
>+ MI_SRM_ADD_CS_OFFSET;
>+ dw[i++] = RING_CTX_TIMESTAMP(0).addr;
>+ dw[i++] = xe_lrc_ctx_timestamp_ggtt_addr(lrc);
>+ dw[i++] = 0;
>+ }
Is this change for a different issue OR is it the same issue that is
fixed in patch 8?
otherwise, LGTM,
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Thanks,
Umesh
>+
> return i;
> }
>
>@@ -253,7 +266,7 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
>
> *head = lrc->ring.tail;
>
>- i = emit_copy_timestamp(lrc, dw, i);
>+ i = emit_copy_timestamp(gt_to_xe(gt), lrc, dw, i);
>
> if (job->ring_ops_flush_tlb) {
> dw[i++] = preparser_disable(true);
>@@ -308,7 +321,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>
> *head = lrc->ring.tail;
>
>- i = emit_copy_timestamp(lrc, dw, i);
>+ i = emit_copy_timestamp(xe, lrc, dw, i);
>
> dw[i++] = preparser_disable(true);
>
>@@ -362,7 +375,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>
> *head = lrc->ring.tail;
>
>- i = emit_copy_timestamp(lrc, dw, i);
>+ i = emit_copy_timestamp(xe, lrc, dw, i);
>
> dw[i++] = preparser_disable(true);
> if (lacks_render)
>@@ -406,12 +419,14 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
> struct xe_lrc *lrc, u32 *head,
> u32 seqno)
> {
>+ struct xe_gt *gt = job->q->gt;
>+ struct xe_device *xe = gt_to_xe(gt);
> u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
> u32 dw[MAX_JOB_SIZE_DW], i = 0;
>
> *head = lrc->ring.tail;
>
>- i = emit_copy_timestamp(lrc, dw, i);
>+ i = emit_copy_timestamp(xe, lrc, dw, i);
>
> i = emit_store_imm_ggtt(saddr, seqno, dw, i);
>
>diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>index cb674a322113..39aec7f6d86d 100644
>--- a/drivers/gpu/drm/xe/xe_sched_job.c
>+++ b/drivers/gpu/drm/xe/xe_sched_job.c
>@@ -110,6 +110,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> return ERR_PTR(-ENOMEM);
>
> job->q = q;
>+ job->sample_timestamp = U64_MAX;
> kref_init(&job->refcount);
> xe_exec_queue_get(job->q);
>
>diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
>index 7c4c54fe920a..13c2970e81a8 100644
>--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
>+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
>@@ -59,6 +59,8 @@ struct xe_sched_job {
> u32 lrc_seqno;
> /** @migrate_flush_flags: Additional flush flags for migration jobs */
> u32 migrate_flush_flags;
>+ /** @sample_timestamp: Sampling of job timestamp in TDR */
>+ u64 sample_timestamp;
> /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> bool ring_ops_flush_tlb;
> /** @ggtt: mapped in ggtt. */
>--
>2.34.1
>
* Re: [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-12-02 7:31 ` Umesh Nerlige Ramappa
@ 2025-12-02 15:14 ` Matthew Brost
0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-02 15:14 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-xe, dri-devel
On Mon, Dec 01, 2025 at 11:31:57PM -0800, Umesh Nerlige Ramappa wrote:
> On Mon, Dec 01, 2025 at 10:39:54AM -0800, Matthew Brost wrote:
> > We now have proper infrastructure to accurately check the LRC timestamp
> > without toggling the scheduling state for non-VFs. For VFs, it is still
> > possible to get an inaccurate view if the context is on hardware. We
> > guard against free-running contexts on VFs by banning jobs whose
> > timestamps are not moving. In addition, VFs have a timeslice quantum
> > that naturally triggers context switches when more than one VF is
> > running, thus updating the LRC timestamp.
>
> I guess some workloads are configured to just keep running on VFs without
> switching. I am assuming they are classified as Long Running and are not
> affected by TDR. If so, the timeouts should still work reasonably on a VF
> considering switching is usually in milliseconds and timeouts are much
> larger.
>
> >
> > For multi-queue, it is desirable to avoid scheduling toggling in the TDR
> > because this scheduling state is shared among many queues. Furthermore,
> > this change simplifies the GuC state machine. The trade-off for VF cases
> > seems worthwhile.
> >
> > v5:
> > - Add xe_lrc_timestamp helper (Umesh)
> > v6:
> > - Reduce number of tries on stuck timestamp (VF testing)
> > - Convert job timestamp save to a memory copy (VF testing)
> > v7:
> > - Save ctx timestamp to LRC when starting a VF job (VF testing)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 97 ++++++-------------------
> > drivers/gpu/drm/xe/xe_lrc.c | 42 +++++++----
> > drivers/gpu/drm/xe/xe_lrc.h | 3 +-
> > drivers/gpu/drm/xe/xe_ring_ops.c | 25 +++++--
> > drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> > 6 files changed, 76 insertions(+), 94 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 8190f2afbaed..dc4bf3126450 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> > #define EXEC_QUEUE_STATE_KILLED (1 << 7)
> > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> > #define EXEC_QUEUE_STATE_BANNED (1 << 9)
> > -#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
> > -#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
> > -#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
> > +#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
> >
> > static bool exec_queue_registered(struct xe_exec_queue *q)
> > {
> > @@ -202,21 +200,6 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
> > atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state);
> > }
> >
> > -static bool exec_queue_check_timeout(struct xe_exec_queue *q)
> > -{
> > - return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_CHECK_TIMEOUT;
> > -}
> > -
> > -static void set_exec_queue_check_timeout(struct xe_exec_queue *q)
> > -{
> > - atomic_or(EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
> > -}
> > -
> > -static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
> > -{
> > - atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
> > -}
> > -
> > static bool exec_queue_pending_resume(struct xe_exec_queue *q)
> > {
> > return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
> > @@ -232,21 +215,6 @@ static void clear_exec_queue_pending_resume(struct xe_exec_queue *q)
> > atomic_and(~EXEC_QUEUE_STATE_PENDING_RESUME, &q->guc->state);
> > }
> >
> > -static bool exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
> > -{
> > - return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_TDR_EXIT;
> > -}
> > -
> > -static void set_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
> > -{
> > - atomic_or(EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
> > -}
> > -
> > -static void clear_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
> > -{
> > - atomic_and(~EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
> > -}
> > -
> > static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
> > {
> > return (atomic_read(&q->guc->state) &
> > @@ -1006,7 +974,16 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> > return xe_sched_invalidate_job(job, 2);
> > }
> >
> > - ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
> > + ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));
> > + if (ctx_timestamp == job->sample_timestamp) {
> > + xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> > + q->guc->id);
> > +
> > + return xe_sched_invalidate_job(job, 0);
> > + }
> > +
> > + job->sample_timestamp = ctx_timestamp;
> > ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
> >
> > /*
> > @@ -1132,16 +1109,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > }
> >
> > /*
> > - * XXX: Sampling timeout doesn't work in wedged mode as we have to
> > - * modify scheduling state to read timestamp. We could read the
> > - * timestamp from a register to accumulate current running time but this
> > - * doesn't work for SRIOV. For now assuming timeouts in wedged mode are
> > - * genuine timeouts.
> > + * Check if job is actually timed out, if so restart job execution and TDR
> > */
> > + if (!skip_timeout_check && !check_timeout(q, job))
> > + goto rearm;
> > +
> > if (!exec_queue_killed(q))
> > wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
> >
> > - /* Engine state now stable, disable scheduling to check timestamp */
> > + set_exec_queue_banned(q);
> > +
> > + /* Kick job / queue off hardware */
> > if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
> > int ret;
> >
> > @@ -1163,13 +1141,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > if (!ret || xe_guc_read_stopped(guc))
> > goto trigger_reset;
> >
> > - /*
> > - * Flag communicates to G2H handler that schedule
> > - * disable originated from a timeout check. The G2H then
> > - * avoid triggering cleanup or deregistering the exec
> > - * queue.
> > - */
> > - set_exec_queue_check_timeout(q);
> > disable_scheduling(q, skip_timeout_check);
> > }
> >
> > @@ -1198,22 +1169,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > xe_devcoredump(q, job,
> > "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
> > q->guc->id, ret, xe_guc_read_stopped(guc));
> > - set_exec_queue_banned(q);
> > xe_gt_reset_async(q->gt);
> > xe_sched_tdr_queue_imm(sched);
> > goto rearm;
> > }
> > }
> >
> > - /*
> > - * Check if job is actually timed out, if so restart job execution and TDR
> > - */
> > - if (!wedged && !skip_timeout_check && !check_timeout(q, job) &&
> > - !exec_queue_reset(q) && exec_queue_registered(q)) {
> > - clear_exec_queue_check_timeout(q);
> > - goto sched_enable;
> > - }
> > -
> > if (q->vm && q->vm->xef) {
> > process_name = q->vm->xef->process_name;
> > pid = q->vm->xef->pid;
> > @@ -1244,14 +1205,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
> > (q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
> > if (!xe_sched_invalidate_job(job, 2)) {
> > - clear_exec_queue_check_timeout(q);
> > xe_gt_reset_async(q->gt);
> > goto rearm;
> > }
> > }
> >
> > - set_exec_queue_banned(q);
> > -
> > /* Mark all outstanding jobs as bad, thus completing them */
> > xe_sched_job_set_error(job, err);
> > drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
> > @@ -1266,9 +1224,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > */
> > return DRM_GPU_SCHED_STAT_NO_HANG;
> >
> > -sched_enable:
> > - set_exec_queue_pending_tdr_exit(q);
> > - enable_scheduling(q);
> > rearm:
> > /*
> > * XXX: Ideally want to adjust timeout based on current execution time
> > @@ -1898,8 +1853,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> > q->guc->id);
> > }
> >
> > - if (pending_enable && !pending_resume &&
> > - !exec_queue_pending_tdr_exit(q)) {
> > + if (pending_enable && !pending_resume) {
> > clear_exec_queue_registered(q);
> > xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
> > q->guc->id);
> > @@ -1908,7 +1862,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> > if (pending_enable) {
> > clear_exec_queue_enabled(q);
> > clear_exec_queue_pending_resume(q);
> > - clear_exec_queue_pending_tdr_exit(q);
> > clear_exec_queue_pending_enable(q);
> > xe_gt_dbg(guc_to_gt(guc), "Replay ENABLE - guc_id=%d",
> > q->guc->id);
> > @@ -1934,7 +1887,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> > if (!pending_enable)
> > set_exec_queue_enabled(q);
> > clear_exec_queue_pending_disable(q);
> > - clear_exec_queue_check_timeout(q);
> > xe_gt_dbg(guc_to_gt(guc), "Replay DISABLE - guc_id=%d",
> > q->guc->id);
> > }
> > @@ -2308,13 +2260,10 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> >
> > q->guc->resume_time = ktime_get();
> > clear_exec_queue_pending_resume(q);
> > - clear_exec_queue_pending_tdr_exit(q);
> > clear_exec_queue_pending_enable(q);
> > smp_wmb();
> > wake_up_all(&guc->ct.wq);
> > } else {
> > - bool check_timeout = exec_queue_check_timeout(q);
> > -
> > xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
> > xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
> >
> > @@ -2322,11 +2271,11 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> > suspend_fence_signal(q);
> > clear_exec_queue_pending_disable(q);
> > } else {
> > - if (exec_queue_banned(q) || check_timeout) {
> > + if (exec_queue_banned(q)) {
> > smp_wmb();
> > wake_up_all(&guc->ct.wq);
> > }
> > - if (!check_timeout && exec_queue_destroyed(q)) {
> > + if (exec_queue_destroyed(q)) {
> > /*
> > * Make sure to clear the pending_disable only
> > * after sampling the destroyed state. We want
> > @@ -2436,7 +2385,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > * guc_exec_queue_timedout_job.
> > */
> > set_exec_queue_reset(q);
> > - if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
> > + if (!exec_queue_banned(q))
> > xe_guc_exec_queue_trigger_cleanup(q);
> >
> > return 0;
> > @@ -2517,7 +2466,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> >
> > /* Treat the same as engine reset */
> > set_exec_queue_reset(q);
> > - if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
> > + if (!exec_queue_banned(q))
> > xe_guc_exec_queue_trigger_cleanup(q);
> >
> > return 0;
> > diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> > index 166353455f8f..38b0c536f6fb 100644
> > --- a/drivers/gpu/drm/xe/xe_lrc.c
> > +++ b/drivers/gpu/drm/xe/xe_lrc.c
> > @@ -852,7 +852,7 @@ u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc)
> > *
> > * Returns: ctx timestamp value
> > */
> > -u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
> > +static u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc)
> > {
> > struct xe_device *xe = lrc_to_xe(lrc);
> > struct iosys_map map;
> > @@ -2380,35 +2380,31 @@ static int get_ctx_timestamp(struct xe_lrc *lrc, u32 engine_id, u64 *reg_ctx_ts)
> > }
> >
> > /**
> > - * xe_lrc_update_timestamp() - Update ctx timestamp
> > + * xe_lrc_timestamp() - Current ctx timestamp
> > * @lrc: Pointer to the lrc.
> > - * @old_ts: Old timestamp value
> > *
> > - * Populate @old_ts current saved ctx timestamp, read new ctx timestamp and
> > - * update saved value. With support for active contexts, the calculation may be
> > - * slightly racy, so follow a read-again logic to ensure that the context is
> > - * still active before returning the right timestamp.
> > + * Return latest ctx timestamp. With support for active contexts, the
> > + * calculation may be slightly racy, so follow a read-again logic to ensure that
> > + * the context is still active before returning the right timestamp.
> > *
> > * Returns: New ctx timestamp value
> > */
> > -u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
> > +u64 xe_lrc_timestamp(struct xe_lrc *lrc)
> > {
> > - u64 lrc_ts, reg_ts;
> > + u64 lrc_ts, reg_ts, new_ts;
> > u32 engine_id;
> >
> > - *old_ts = lrc->ctx_timestamp;
> > -
> > lrc_ts = xe_lrc_ctx_timestamp(lrc);
> > /* CTX_TIMESTAMP mmio read is invalid on VF, so return the LRC value */
> > if (IS_SRIOV_VF(lrc_to_xe(lrc))) {
> > - lrc->ctx_timestamp = lrc_ts;
> > + new_ts = lrc_ts;
> > goto done;
> > }
> >
> > if (lrc_ts == CONTEXT_ACTIVE) {
> > engine_id = xe_lrc_engine_id(lrc);
> > - if (!get_ctx_timestamp(lrc, engine_id, &reg_ts))
> > - lrc->ctx_timestamp = reg_ts;
> > + new_ts = reg_ts;
> >
> > /* read lrc again to ensure context is still active */
> > lrc_ts = xe_lrc_ctx_timestamp(lrc);
> > @@ -2419,9 +2415,27 @@ u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
> > * be a separate if condition.
> > */
> > if (lrc_ts != CONTEXT_ACTIVE)
> > - lrc->ctx_timestamp = lrc_ts;
> > + new_ts = lrc_ts;
> >
> > done:
> > + return new_ts;
> > +}
> > +
> > +/**
> > + * xe_lrc_update_timestamp() - Update ctx timestamp
> > + * @lrc: Pointer to the lrc.
> > + * @old_ts: Old timestamp value
> > + *
> > + * Populate @old_ts current saved ctx timestamp, read new ctx timestamp and
> > + * update saved value.
> > + *
> > + * Returns: New ctx timestamp value
> > + */
> > +u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
> > +{
> > + *old_ts = lrc->ctx_timestamp;
> > + lrc->ctx_timestamp = xe_lrc_timestamp(lrc);
> > +
> > trace_xe_lrc_update_timestamp(lrc, *old_ts);
> >
> > return lrc->ctx_timestamp;
> > diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
> > index a32472b92242..93c1234e2706 100644
> > --- a/drivers/gpu/drm/xe/xe_lrc.h
> > +++ b/drivers/gpu/drm/xe/xe_lrc.h
> > @@ -142,7 +142,6 @@ void xe_lrc_snapshot_free(struct xe_lrc_snapshot *snapshot);
> >
> > u32 xe_lrc_ctx_timestamp_ggtt_addr(struct xe_lrc *lrc);
> > u32 xe_lrc_ctx_timestamp_udw_ggtt_addr(struct xe_lrc *lrc);
> > -u64 xe_lrc_ctx_timestamp(struct xe_lrc *lrc);
> > u32 xe_lrc_ctx_job_timestamp_ggtt_addr(struct xe_lrc *lrc);
> > u32 xe_lrc_ctx_job_timestamp(struct xe_lrc *lrc);
> > int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
> > @@ -162,4 +161,6 @@ int xe_lrc_setup_wa_bb_with_scratch(struct xe_lrc *lrc, struct xe_hw_engine *hwe
> > */
> > u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts);
> >
> > +u64 xe_lrc_timestamp(struct xe_lrc *lrc);
> > +
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index ac0c6dcffe15..3dacfc2da75c 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -233,13 +233,26 @@ static u32 get_ppgtt_flag(struct xe_sched_job *job)
> > return 0;
> > }
> >
> > -static int emit_copy_timestamp(struct xe_lrc *lrc, u32 *dw, int i)
> > +static int emit_copy_timestamp(struct xe_device *xe, struct xe_lrc *lrc,
> > + u32 *dw, int i)
> > {
> > dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT | MI_SRM_ADD_CS_OFFSET;
> > dw[i++] = RING_CTX_TIMESTAMP(0).addr;
> > dw[i++] = xe_lrc_ctx_job_timestamp_ggtt_addr(lrc);
> > dw[i++] = 0;
> >
> > + /*
> > + * Ensure CTX timestamp >= Job timestamp during VF sampling to avoid
> > + * arithmetic wraparound in TDR.
> > + */
> > + if (IS_SRIOV_VF(xe)) {
> > + dw[i++] = MI_STORE_REGISTER_MEM | MI_SRM_USE_GGTT |
> > + MI_SRM_ADD_CS_OFFSET;
> > + dw[i++] = RING_CTX_TIMESTAMP(0).addr;
> > + dw[i++] = xe_lrc_ctx_timestamp_ggtt_addr(lrc);
> > + dw[i++] = 0;
> > + }
>
> Is this change for a different issue OR is it the same issue that is fixed
> in patch 8?
>
This is covering the case where the LRC timestamp is less than the job
timestamp. Consider the case where a context switches in with a
timestamp of 1 and, by the time the job timestamp is saved, the value is
2. The TDR would see this as a wrap and thus immediately time out the
job upon the first TDR fire, even though the job may have only switched
onto the hardware at the very end of the TDR period.
This code ensures the LRC timestamp is >= the job timestamp on the first
switch in, so it is needed in addition to the previous patch.
Matt
> otherwise, LGTM,
>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
> Thanks,
> Umesh
>
> > +
> > return i;
> > }
> >
> > @@ -253,7 +266,7 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
> >
> > *head = lrc->ring.tail;
> >
> > - i = emit_copy_timestamp(lrc, dw, i);
> > + i = emit_copy_timestamp(gt_to_xe(gt), lrc, dw, i);
> >
> > if (job->ring_ops_flush_tlb) {
> > dw[i++] = preparser_disable(true);
> > @@ -308,7 +321,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> >
> > *head = lrc->ring.tail;
> >
> > - i = emit_copy_timestamp(lrc, dw, i);
> > + i = emit_copy_timestamp(xe, lrc, dw, i);
> >
> > dw[i++] = preparser_disable(true);
> >
> > @@ -362,7 +375,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> >
> > *head = lrc->ring.tail;
> >
> > - i = emit_copy_timestamp(lrc, dw, i);
> > + i = emit_copy_timestamp(xe, lrc, dw, i);
> >
> > dw[i++] = preparser_disable(true);
> > if (lacks_render)
> > @@ -406,12 +419,14 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
> > struct xe_lrc *lrc, u32 *head,
> > u32 seqno)
> > {
> > + struct xe_gt *gt = job->q->gt;
> > + struct xe_device *xe = gt_to_xe(gt);
> > u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
> > u32 dw[MAX_JOB_SIZE_DW], i = 0;
> >
> > *head = lrc->ring.tail;
> >
> > - i = emit_copy_timestamp(lrc, dw, i);
> > + i = emit_copy_timestamp(xe, lrc, dw, i);
> >
> > i = emit_store_imm_ggtt(saddr, seqno, dw, i);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> > index cb674a322113..39aec7f6d86d 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job.c
> > +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> > @@ -110,6 +110,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> > return ERR_PTR(-ENOMEM);
> >
> > job->q = q;
> > + job->sample_timestamp = U64_MAX;
> > kref_init(&job->refcount);
> > xe_exec_queue_get(job->q);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index 7c4c54fe920a..13c2970e81a8 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -59,6 +59,8 @@ struct xe_sched_job {
> > u32 lrc_seqno;
> > /** @migrate_flush_flags: Additional flush flags for migration jobs */
> > u32 migrate_flush_flags;
> > + /** @sample_timestamp: Sampling of job timestamp in TDR */
> > + u64 sample_timestamp;
> > /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> > bool ring_ops_flush_tlb;
> > /** @ggtt: mapped in ggtt. */
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
` (12 preceding siblings ...)
2025-12-02 5:18 ` ✓ Xe.CI.Full: " Patchwork
@ 2025-12-03 1:23 ` Matthew Brost
2025-12-03 8:33 ` Philipp Stanner
13 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-03 1:23 UTC (permalink / raw)
To: intel-xe; +Cc: dri-devel, phasta, christian.koenig, dakr
On Mon, Dec 01, 2025 at 10:39:45AM -0800, Matthew Brost wrote:
Fellow DRM sched maintainers - going to merge the first two patches in
this series to drm-misc-next two days from now unless I hear an
objection.
Matt
> At XDC, we discussed that drivers should avoid accessing DRM scheduler
> internals, misusing DRM scheduler locks, and adopt a well-defined
> pending job list iterator. This series proposes the necessary changes to
> the DRM scheduler to bring Xe in line with that agreement and updates Xe
> to use the new DRM scheduler API.
>
> While here, cleanup LR queue handling and simplify GuC state machine in
> Xe too. Also rework LRC timestamp sampling to avoid scheduling toggle.
>
> v2:
> - Fix checkpatch / naming issues
> v3:
> - Only allow pending job list iterator to be called on stopped schedulers
> - Cleanup LR queue handling / fix a few miscellaneous Xe scheduler issues
> v4:
> - Address Niranjana's feedback
> - Add patch to avoid toggling scheduler state in the TDR
> v5:
> - Rebase
> - Fixup LRC timeout check (Umesh)
> v6:
> - Fix VF bugs (Testing)
> v7:
> - Disable timestamp WA on VF
>
> Matt
>
> Matthew Brost (9):
> drm/sched: Add several job helpers to avoid drivers touching scheduler
> state
> drm/sched: Add pending job list iterator
> drm/xe: Add dedicated message lock
> drm/xe: Stop abusing DRM scheduler internals
> drm/xe: Only toggle scheduling in TDR if GuC is running
> drm/xe: Do not deregister queues in TDR
> drm/xe: Remove special casing for LR queues in submission
> drm/xe: Disable timestamp WA on VFs
> drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
>
> drivers/gpu/drm/scheduler/sched_main.c | 4 +-
> drivers/gpu/drm/xe/xe_gpu_scheduler.c | 9 +-
> drivers/gpu/drm/xe/xe_gpu_scheduler.h | 37 +-
> drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 +
> drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
> drivers/gpu/drm/xe/xe_guc_submit.c | 362 +++----------------
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 -
> drivers/gpu/drm/xe/xe_hw_fence.c | 16 -
> drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> drivers/gpu/drm/xe/xe_lrc.c | 45 ++-
> drivers/gpu/drm/xe/xe_lrc.h | 3 +-
> drivers/gpu/drm/xe/xe_ring_ops.c | 25 +-
> drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> drivers/gpu/drm/xe/xe_trace.h | 5 -
> include/drm/gpu_scheduler.h | 82 +++++
> 16 files changed, 211 insertions(+), 397 deletions(-)
>
> --
> 2.34.1
>
* Re: [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
@ 2025-12-03 8:33 ` Philipp Stanner
0 siblings, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 8:33 UTC (permalink / raw)
To: Matthew Brost, intel-xe; +Cc: dri-devel, phasta, christian.koenig, dakr
On Tue, 2025-12-02 at 17:23 -0800, Matthew Brost wrote:
> On Mon, Dec 01, 2025 at 10:39:45AM -0800, Matthew Brost wrote:
>
> Fellow DRM sched maintainers - going to merge the first two patches in
> this series to drm-misc-next two days from now unless I hear an
> objection.
Fellow maintainer Matt, it seems that none of us has ever been on Cc
for this series or patch? I can't find it in my inbox. Did you forget
to add us for 7 revisions, or did you omit us on purpose?
I look occasionally at dri-devel, but you know how huge the list is.
P.
>
> Matt
>
> > At XDC, we discussed that drivers should avoid accessing DRM scheduler
> > internals, misusing DRM scheduler locks, and adopt a well-defined
> > pending job list iterator. This series proposes the necessary changes to
> > the DRM scheduler to bring Xe in line with that agreement and updates Xe
> > to use the new DRM scheduler API.
> >
> > While here, cleanup LR queue handling and simplify GuC state machine in
> > Xe too. Also rework LRC timestamp sampling to avoid scheduling toggle.
> >
> > v2:
> > - Fix checkpatch / naming issues
> > v3:
> > - Only allow pending job list iterator to be called on stopped schedulers
> > - Cleanup LR queue handling / fix a few miscellaneous Xe scheduler issues
> > v4:
> > - Address Niranjana's feedback
> > - Add patch to avoid toggling scheduler state in the TDR
> > v5:
> > - Rebase
> > - Fixup LRC timeout check (Umesh)
> > v6:
> > - Fix VF bugs (Testing)
> > v7:
> > - Disable timestamp WA on VF
> >
> > Matt
> >
> > Matthew Brost (9):
> > drm/sched: Add several job helpers to avoid drivers touching scheduler
> > state
> > drm/sched: Add pending job list iterator
> > drm/xe: Add dedicated message lock
> > drm/xe: Stop abusing DRM scheduler internals
> > drm/xe: Only toggle scheduling in TDR if GuC is running
> > drm/xe: Do not deregister queues in TDR
> > drm/xe: Remove special casing for LR queues in submission
> > drm/xe: Disable timestamp WA on VFs
> > drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
> >
> > drivers/gpu/drm/scheduler/sched_main.c | 4 +-
> > drivers/gpu/drm/xe/xe_gpu_scheduler.c | 9 +-
> > drivers/gpu/drm/xe/xe_gpu_scheduler.h | 37 +-
> > drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 +
> > drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
> > drivers/gpu/drm/xe/xe_guc_submit.c | 362 +++----------------
> > drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 -
> > drivers/gpu/drm/xe/xe_hw_fence.c | 16 -
> > drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> > drivers/gpu/drm/xe/xe_lrc.c | 45 ++-
> > drivers/gpu/drm/xe/xe_lrc.h | 3 +-
> > drivers/gpu/drm/xe/xe_ring_ops.c | 25 +-
> > drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> > drivers/gpu/drm/xe/xe_trace.h | 5 -
> > include/drm/gpu_scheduler.h | 82 +++++
> > 16 files changed, 211 insertions(+), 397 deletions(-)
> >
> > --
> > 2.34.1
> >
* Re: [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
@ 2025-12-03 8:56 ` Philipp Stanner
2025-12-03 21:10 ` Matthew Brost
0 siblings, 1 reply; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 8:56 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: dri-devel, dakr, Christian König, Alex Deucher
+Cc Christian, Alex, Danilo
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Add helpers to see if scheduler is stopped and a jobs signaled state.
> Expected to be used driver side on recovery and debug flows.
First, thanks for working on this.
This is a big and significant change because it moves towards ending
the 10-year practice of accessing internal locks etc. – I think this
should have a long(er) and detailed commit message aka "In the past
drivers used to … this must end because … to do so we need to provide
those new functions: …"
>
> v4:
> - Reorder patch to first in series (Niranjana)
> - Also check parent fence for signaling (Niranjana)
"We" mostly agreed of not adding changelogs to commit messages anymore
and either have them in the cover letter or in the patche's comment
section below ---
The commit changelog comments are not canonical in the kernel and don't
provide any value IMO.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
> 2 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 1d4f1b822e7b..cf40c18ab433 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> */
> static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> {
> - if (!READ_ONCE(sched->pause_submit))
> + if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_run_job);
> }
>
> @@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> */
> static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> {
> - if (!READ_ONCE(sched->pause_submit))
> + if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_free_job);
> }
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index fb88301b3c45..385bf34e76fe 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> struct drm_gpu_scheduler **sched_list,
> unsigned int num_sched_list);
>
> +/* Inlines */
Surplus comment, everyone immediately sees by the keyword that the
functions are inline.
But why do you want to provide them here instead of in sched_main.c in
the first place?
> +
> +/**
> + * drm_sched_is_stopped() - DRM is stopped
Well no, I doubt the entire DRM subsystem is stopped ;)
"Checks whether drm_sched is stopped"
> + * @sched: DRM scheduler
> + *
> + * Return: True if sched is stopped, False otherwise
> + */
> +static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
> +{
> + return READ_ONCE(sched->pause_submit);
I am by the way suspecting since a long time
> +}
> +
> +/**
> + * drm_sched_job_is_signaled() - DRM scheduler job is signaled
> + * @job: DRM scheduler job
> + *
> + * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
> + * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
> + * finished fence (software fence) are check to determine signaling state.
s/check/checked
I can roughly understand why you need the start/stop checkers for your
list iterator, but what is this function's purpose? The commit message
should explain that.
Do you need them in Xe? Do all drivers need them?
I think it's very cool that you provide this series and are working on
all that, but at XDC I think the important point was that we determined
that AMD and Intel basically do the same trick for GPU resets.
So our desire was not only to prevent folks from accessing the
scheduler's internals, but, ideally, also to provide a well-documented,
centralized, and canonical mechanism for doing GPU resets.
So I think this drm/sched code must be discussed with AMD and we should
see whether it would be sufficient for them, too. And if yes, we need
to properly document that new way of GPU resets and tell users what
those functions are for. The docstrings so far just highlight that
those functions exist and how they are used, but not *why* they exist.
> + *
> + * Return: True if job is signaled, False otherwise
True and False should be lower case I think. At least I've never seen
them upper case in docstrings so far?
P.
> + */
> +static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> +{
> + struct drm_sched_fence *s_fence = job->s_fence;
> +
> + WARN_ON(!drm_sched_is_stopped(job->sched));
> + return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
> + dma_fence_is_signaled(&s_fence->finished);
> +}
> +
> #endif
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
@ 2025-12-03 9:07 ` Philipp Stanner
2025-12-03 10:28 ` Philipp Stanner
2025-12-04 16:04 ` Alex Deucher
0 siblings, 2 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 9:07 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: dri-devel, Alex Deucher, Christian König, dakr
+Cc Alex, Christian, Danilo
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Stop open coding pending job list in drivers. Add pending job list
> iterator which safely walks DRM scheduler list asserting DRM scheduler
> is stopped.
>
> v2:
> - Fix checkpatch (CI)
> v3:
> - Drop locked version (Christian)
> v4:
> - Reorder patch (Niranjana)
Same with the changelog.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 385bf34e76fe..9d228513d06c 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> dma_fence_is_signaled(&s_fence->finished);
> }
>
> +/**
> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> + * @sched: DRM scheduler associated with pending job iterator
> + */
> +struct drm_sched_pending_job_iter {
> + struct drm_gpu_scheduler *sched;
> +};
> +
> +/* Drivers should never call this directly */
> +static inline struct drm_sched_pending_job_iter
> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> +{
> + struct drm_sched_pending_job_iter iter = {
> + .sched = sched,
> + };
> +
> + WARN_ON(!drm_sched_is_stopped(sched));
> + return iter;
> +}
> +
> +/* Drivers should never call this directly */
> +static inline void
> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> +{
> + WARN_ON(!drm_sched_is_stopped(iter.sched));
> +}
> +
> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> + __drm_sched_pending_job_iter_end(_T),
> + __drm_sched_pending_job_iter_begin(__sched),
> + struct drm_gpu_scheduler *__sched);
> +static inline void *
> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> +{ return _T; }
> +#define class_drm_sched_pending_job_iter_is_conditional false
> +
> +/**
> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> + * @__job: Current pending job being iterated over
> + * @__sched: DRM scheduler to iterate over pending jobs
> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> + *
> + * Iterator for each pending job in scheduler, filtering on an entity, and
> + * enforcing scheduler is fully stopped
> + */
> +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> + for_each_if(!(__entity) || (__job)->entity == (__entity))
> +
> #endif
See my comments in the first patch. The docu doesn't mention at all why
this new functionality exists and when and why users would be expected
to use it.
As far as I remember from XDC, both AMD and Intel overwrite a timed-out
job's buffer data in the rings on GPU reset. To do so, the driver needs
the timed-out job (passed through the timedout_job() callback) and then
needs all the pending non-broken jobs.
AFAICS your patch provides a generic iterator over the entire
pending_list. How is a driver then supposed to determine which are the
non-broken jobs (just asking, but that needs to be documented)?
Could it make sense to use a different iterator which only returns jobs
not belonging to the same context as the timed-out one?
Those are important questions that need to be addressed before merging
that.
And if this works canonically (i.e., for basically everyone), it needs
to be documented in drm_sched_resubmit_jobs() that this iterator is now
the canonical way of handling timeouts.
Moreover, btw, just yesterday I added an entry to the DRM todo list
which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
would have to be removed, too.
@AMD: Would the code Matthew provides work for you? Please give your
input. This is very important common infrastructure.
Philipp
* Re: [PATCH v7 3/9] drm/xe: Add dedicated message lock
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
@ 2025-12-03 9:38 ` Philipp Stanner
0 siblings, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 9:38 UTC (permalink / raw)
To: Matthew Brost, intel-xe; +Cc: dri-devel
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Stop abusing DRM scheduler job list lock for messages, add dedicated
> message lock.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Nice! I like it.
Acked-by: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/xe/xe_gpu_scheduler.c | 5 +++--
> drivers/gpu/drm/xe/xe_gpu_scheduler.h | 4 ++--
> drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 ++
> 3 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> index f91e06d03511..f4f23317191f 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> @@ -77,6 +77,7 @@ int xe_sched_init(struct xe_gpu_scheduler *sched,
> };
>
> sched->ops = xe_ops;
> + spin_lock_init(&sched->msg_lock);
> INIT_LIST_HEAD(&sched->msgs);
> INIT_WORK(&sched->work_process_msg, xe_sched_process_msg_work);
>
> @@ -117,7 +118,7 @@ void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
> void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
> struct xe_sched_msg *msg)
> {
> - lockdep_assert_held(&sched->base.job_list_lock);
> + lockdep_assert_held(&sched->msg_lock);
>
> list_add_tail(&msg->link, &sched->msgs);
> xe_sched_process_msg_queue(sched);
> @@ -131,7 +132,7 @@ void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
> void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
> struct xe_sched_msg *msg)
> {
> - lockdep_assert_held(&sched->base.job_list_lock);
> + lockdep_assert_held(&sched->msg_lock);
>
> list_add(&msg->link, &sched->msgs);
> xe_sched_process_msg_queue(sched);
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> index c7a77a3a9681..dceb2cd0ee5b 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> @@ -33,12 +33,12 @@ void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
>
> static inline void xe_sched_msg_lock(struct xe_gpu_scheduler *sched)
> {
> - spin_lock(&sched->base.job_list_lock);
> + spin_lock(&sched->msg_lock);
> }
>
> static inline void xe_sched_msg_unlock(struct xe_gpu_scheduler *sched)
> {
> - spin_unlock(&sched->base.job_list_lock);
> + spin_unlock(&sched->msg_lock);
> }
>
> static inline void xe_sched_stop(struct xe_gpu_scheduler *sched)
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
> index 6731b13da8bb..63d9bf92583c 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
> @@ -47,6 +47,8 @@ struct xe_gpu_scheduler {
> const struct xe_sched_backend_ops *ops;
> /** @msgs: list of messages to be processed in @work_process_msg */
> struct list_head msgs;
> + /** @msg_lock: Message lock */
> + spinlock_t msg_lock;
> /** @work_process_msg: processes messages */
> struct work_struct work_process_msg;
> };
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-03 9:07 ` Philipp Stanner
@ 2025-12-03 10:28 ` Philipp Stanner
2025-12-04 16:04 ` Alex Deucher
1 sibling, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 10:28 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: dri-devel, Alex Deucher, Christian König, dakr
On Wed, 2025-12-03 at 10:07 +0100, Philipp Stanner wrote:
> +Cc Alex, Christian, Danilo
>
>
> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> >
[…]
> > +
> > +/**
> > + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> > + * @__job: Current pending job being iterated over
> > + * @__sched: DRM scheduler to iterate over pending jobs
> > + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> > + *
> > + * Iterator for each pending job in scheduler, filtering on an entity, and
> > + * enforcing scheduler is fully stopped
> > + */
> > +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> > + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> > + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> > + for_each_if(!(__entity) || (__job)->entity == (__entity))
> > +
> > #endif
>
>
> See my comments in the first patch. The docu doesn't mention at all why
> this new functionality exists and when and why users would be expected
> to use it.
>
> As far as I remember from XDC, both AMD and Intel overwrite a timed-out
> job's buffer data in the rings on GPU reset. To do so, the driver needs
> the timed-out job (passed through the timedout_job() callback) and then
> needs all the pending non-broken jobs.
>
> AFAICS your patch provides a generic iterator over the entire
> pending_list. How is a driver then supposed to determine which are the
> non-broken jobs (just asking, but that needs to be documented)?
>
> Could it make sense to use a different iterator which only returns jobs
> not belonging to the same context as the timed-out one?
(forget about that comment, you do that with the entity-filter
obviously)
P.
* Re: [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
@ 2025-12-03 10:56 ` Philipp Stanner
2025-12-03 20:44 ` Matthew Brost
0 siblings, 1 reply; 31+ messages in thread
From: Philipp Stanner @ 2025-12-03 10:56 UTC (permalink / raw)
To: Matthew Brost, intel-xe
Cc: dri-devel, dakr, Alex Deucher, Christian König
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Use new pending job list iterator and new helper functions in Xe to
> avoid reaching into DRM scheduler internals.
Cool.
Obviously this is your driver, but some comments below which you might
want to take into account.
>
> Part of this change involves removing pending jobs debug information
> from debugfs and devcoredump. As agreed, the pending job list should
> only be accessed when the scheduler is stopped. However, it's not
> straightforward to determine whether the scheduler is stopped from the
> shared debugfs/devcoredump code path. Additionally, the pending job list
> provides little useful information, as pending jobs can be inferred from
> seqnos and ring head/tail positions. Therefore, this debug information
> is being removed.
This reads a bit like a contradiction to the first sentence.
>
> v4:
> - Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
Revision info for just one of 7 revisions?
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
> drivers/gpu/drm/xe/xe_gpu_scheduler.h | 33 ++--------
> drivers/gpu/drm/xe/xe_guc_submit.c | 81 ++++++------------------
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
> drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
> drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> 6 files changed, 27 insertions(+), 120 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> index f4f23317191f..9c8004d5dd91 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> @@ -7,7 +7,7 @@
>
> static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
> {
> - if (!READ_ONCE(sched->base.pause_submit))
> + if (!drm_sched_is_stopped(&sched->base))
> queue_work(sched->base.submit_wq, &sched->work_process_msg);
Sharing the submit_wq is legal. But next-level cleanness would be if
struct drm_gpu_scheduler's internal components wouldn't be touched.
That's kind of a luxury request, though.
> }
>
> @@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
> container_of(w, struct xe_gpu_scheduler, work_process_msg);
> struct xe_sched_msg *msg;
>
> - if (READ_ONCE(sched->base.pause_submit))
> + if (drm_sched_is_stopped(&sched->base))
> return;
>
> msg = xe_sched_get_msg(sched);
> diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> index dceb2cd0ee5b..664c2db56af3 100644
> --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> @@ -56,12 +56,9 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> struct drm_sched_job *s_job;
> bool restore_replay = false;
>
> - list_for_each_entry(s_job, &sched->base.pending_list, list) {
> - struct drm_sched_fence *s_fence = s_job->s_fence;
> - struct dma_fence *hw_fence = s_fence->parent;
> -
> + drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
> restore_replay |= to_xe_sched_job(s_job)->restore_replay;
> - if (restore_replay || (hw_fence && !dma_fence_is_signaled(hw_fence)))
> + if (restore_replay || !drm_sched_job_is_signaled(s_job))
So that's where this function is needed. You check whether that job in
the pending_list is signaled.
> sched->base.ops->run_job(s_job);
Aaaaaahm. So you invoke your own callback. But basically just to access
the function pointer I suppose?
Since this is effectively your drm_sched_resubmit_jobs(), it is
definitely desirable to provide a text book example of how to do resets
so that others can follow your usage.
Can't you replace ops->run_job() with a call to your functions where
you push the jobs to the ring, directly?
P.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals
2025-12-03 10:56 ` Philipp Stanner
@ 2025-12-03 20:44 ` Matthew Brost
2025-12-08 13:44 ` Philipp Stanner
0 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2025-12-03 20:44 UTC (permalink / raw)
To: Philipp Stanner
Cc: intel-xe, dri-devel, dakr, Alex Deucher, Christian König
On Wed, Dec 03, 2025 at 11:56:01AM +0100, Philipp Stanner wrote:
> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > Use new pending job list iterator and new helper functions in Xe to
> > avoid reaching into DRM scheduler internals.
>
> Cool.
>
> Obviously this is your driver, but some comments below which you might
> want to take into account.
>
> >
> > Part of this change involves removing pending jobs debug information
> > from debugfs and devcoredump. As agreed, the pending job list should
> > only be accessed when the scheduler is stopped. However, it's not
> > straightforward to determine whether the scheduler is stopped from the
> > shared debugfs/devcoredump code path. Additionally, the pending job list
> > provides little useful information, as pending jobs can be inferred from
> > seqnos and ring head/tail positions. Therefore, this debug information
> > is being removed.
>
> This reads a bit like a contradiction to the first sentence.
>
> >
> > v4:
> > - Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
>
> Revision info for just one of 7 revisions?
>
Only v4 changed.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
> > drivers/gpu/drm/xe/xe_gpu_scheduler.h | 33 ++--------
> > drivers/gpu/drm/xe/xe_guc_submit.c | 81 ++++++------------------
> > drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
> > drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
> > drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> > 6 files changed, 27 insertions(+), 120 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > index f4f23317191f..9c8004d5dd91 100644
> > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > @@ -7,7 +7,7 @@
> >
> > static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
> > {
> > - if (!READ_ONCE(sched->base.pause_submit))
> > + if (!drm_sched_is_stopped(&sched->base))
> > queue_work(sched->base.submit_wq, &sched->work_process_msg);
>
> Sharing the submit_wq is legal. But next-level cleanness would be if
> struct drm_gpu_scheduler's internal components wouldn't be touched.
> That's kind of a luxury request, though.
>
Yes, perhaps a helper to extract the submit_wq too.
> > }
> >
> > @@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
> > container_of(w, struct xe_gpu_scheduler, work_process_msg);
> > struct xe_sched_msg *msg;
> >
> > - if (READ_ONCE(sched->base.pause_submit))
> > + if (drm_sched_is_stopped(&sched->base))
> > return;
> >
> > msg = xe_sched_get_msg(sched);
> > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > index dceb2cd0ee5b..664c2db56af3 100644
> > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > @@ -56,12 +56,9 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> > struct drm_sched_job *s_job;
> > bool restore_replay = false;
> >
> > - list_for_each_entry(s_job, &sched->base.pending_list, list) {
> > - struct drm_sched_fence *s_fence = s_job->s_fence;
> > - struct dma_fence *hw_fence = s_fence->parent;
> > -
> > + drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
> > restore_replay |= to_xe_sched_job(s_job)->restore_replay;
> > - if (restore_replay || (hw_fence && !dma_fence_is_signaled(hw_fence)))
> > + if (restore_replay || !drm_sched_job_is_signaled(s_job))
>
> So that's where this function is needed. You check whether that job in
> the pending_list is signaled.
>
Yes, during GT reset flows (think a device-level reset) it is possible
we stop the scheduler in the window between a job signaling and
free_job being called. We want to avoid resubmission of jobs which have
signaled.
> > sched->base.ops->run_job(s_job);
>
> Aaaaaahm. So you invoke your own callback. But basically just to access
> the function pointer I suppose?
>
> Since this is effectively your drm_sched_resubmit_jobs(), it is
> definitely desirable to provide a text book example of how to do resets
> so that others can follow your usage.
>
Yes, but drm_sched_resubmit_jobs() does some nonsense with dma-fence
that I don’t need here. Honestly, I’m a little unsure what that is
actually doing. We also use this function during VF restore after
migration. This is a multi-step process that needs to operate on the
same set of jobs at each step of the restore. That’s what the
restore_replay variable represents: it marks a job at the very beginning
of the restore process, and each step along the way ensures execution
starts at that job. Technically, once we are at this point in a VF
restore, jobs can start signaling because the hardware is live. So some
of this really is vendor-specific.
> Can't you replace ops->run_job() with a call to your functions where
> you push the jobs to the ring, directly?
>
Yes, we could, but that function isn’t currently exported. Also, in
future products, we may assign a different run_job vfunc based on
hardware generation or queue type. So using a vfunc here makes sense as
a bit of future-proofing. Of course, we could also have a DRM
scheduler-level helper that invokes run_job for us.
Matt
>
> P.
>
* Re: [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state
2025-12-03 8:56 ` Philipp Stanner
@ 2025-12-03 21:10 ` Matthew Brost
0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-03 21:10 UTC (permalink / raw)
To: Philipp Stanner
Cc: intel-xe, dri-devel, dakr, Christian König, Alex Deucher
On Wed, Dec 03, 2025 at 09:56:45AM +0100, Philipp Stanner wrote:
> +Cc Christian, Alex, Danilo
>
>
> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > Add helpers to see if scheduler is stopped and a jobs signaled state.
> > Expected to be used driver side on recovery and debug flows.
>
> First, thanks for working on this.
>
> This is a big and significant change because it moves towards ending
> the 10-year practice of accessing internal locks etc. – I think this
> should have a long(er) and detailed commit message aka "In the past
> drivers used to … this must end because … to do so we need to provide
> those new functions: …"
>
Sure, let me add that.
> >
> > v4:
> > - Reorder patch to first in series (Niranjana)
> > - Also check parent fence for signaling (Niranjana)
>
> "We" mostly agreed of not adding changelogs to commit messages anymore
> and either have them in the cover letter or in the patche's comment
> section below ---
> The commit changelog comments are not canonical in the kernel and don't
> provide any value IMO.
>
In Xe we typically keep these, right or wrong? Also, if this is below
---, and I check out an mbox and apply it, then the next time I send the
patch the changelog is lost unless I add it back in. Maybe there is a
git am option so that doesn't happen? Anyway, I'll fix this up in the
next rev.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> > include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
> > 2 files changed, 34 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 1d4f1b822e7b..cf40c18ab433 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> > */
> > static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> > {
> > - if (!READ_ONCE(sched->pause_submit))
> > + if (!drm_sched_is_stopped(sched))
> > queue_work(sched->submit_wq, &sched->work_run_job);
> > }
> >
> > @@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> > */
> > static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> > {
> > - if (!READ_ONCE(sched->pause_submit))
> > + if (!drm_sched_is_stopped(sched))
> > queue_work(sched->submit_wq, &sched->work_free_job);
> > }
> >
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index fb88301b3c45..385bf34e76fe 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > struct drm_gpu_scheduler **sched_list,
> > unsigned int num_sched_list);
> >
> > +/* Inlines */
>
> Surplus comment, everyone immediately sees by the keyword that the
> functions are inline.
This file has surplus comments, so I just followed the style. See
'Scheduler operations' and 'Jobs' in this header. But I can remove it.
>
> But why do you want to provide them here instead of in sched_main.c in
> the first place?
They are small functions so I made them inlines, but I can move them to
sched_main.c if needed. The iterator in the following patch needs to be
in a header though.
>
>
> > +
> > +/**
> > + * drm_sched_is_stopped() - DRM is stopped
>
> Well no, I doubt the entire DRM subsystem is stopped ;)
>
> "Checks whether drm_sched is stopped"
>
Sure.
> > + * @sched: DRM scheduler
> > + *
> > + * Return: True if sched is stopped, False otherwise
> > + */
> > +static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
> > +{
> > + return READ_ONCE(sched->pause_submit);
>
> I am by the way suspecting since a long time
>
?
> > +}
> > +
> > +/**
> > + * drm_sched_job_is_signaled() - DRM scheduler job is signaled
> > + * @job: DRM scheduler job
> > + *
> > + * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
> > + * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
> > + * finished fence (software fence) are check to determine signaling state.
>
> s/check/checked
>
+1
> I can roughly understand why you need the start/stop checkers for your
> list iterator, but what is this function's purpose? The commit message
> should explain that.
>
Sure can adjust the commit message.
> Do you need them in Xe? Do all drivers need them?
>
I think the Xe question is answered in patch #4. Unsure about other
drivers.
> I think it's very cool that you provide this series and are working on
> all that, but at XDC I think the important point was that we determined
> that AMD and Intel basically do the same trick for GPU resets.
>
> So our desire was not only to prevent folks from accessing the
> scheduler's internals, but, ideally, also provide a well documented,
> centralized and canonical mechanisms to do GPU resets.
See my reply in patch #4. I believe GPU resets could largely be generic.
However, my driver’s VF migration restore use case also calls run_job
again, which is a vendor-specific flow. So I’d prefer to keep that part
on the driver side and just use the functions provided in the first two
patches of this series to avoid touching the internals of the scheduler.
Eventually, I might push some of the logic from my custom function into
my run_job callback, but at the moment the ROI on that is quite low—and
I’m not convinced this can be made completely generic.
>
> So I think this drm/sched code must be discussed with AMD and we should
> see whether it would be sufficient for them, too. And if yes, we need
> to properly document that new way of GPU resets and tell users what
> those functions are for. The docstrings so far just highlight that
> those functions exist and how they are used, but not *why* they exist.
>
Again, I really doubt that everything related to GPU resets and
resubmission can be made generic. This series is about providing the
interfaces to do these things and doing so safely (e.g., not walking the
pending job list while the scheduler is running, etc.).
> > + *
> > + * Return: True if job is signaled, False otherwise
>
> True and False should be lower case I think. At least I've never seen
> them upper case in docstrings so far?
>
That's typically how we type this in Xe but this is a bikeshed. Can
change if you like.
Matt
>
> P.
>
> > + */
> > +static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > +{
> > + struct drm_sched_fence *s_fence = job->s_fence;
> > +
> > + WARN_ON(!drm_sched_is_stopped(job->sched));
> > + return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
> > + dma_fence_is_signaled(&s_fence->finished);
> > +}
> > +
> > #endif
>
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-03 9:07 ` Philipp Stanner
2025-12-03 10:28 ` Philipp Stanner
@ 2025-12-04 16:04 ` Alex Deucher
2025-12-05 9:19 ` Christian König
1 sibling, 1 reply; 31+ messages in thread
From: Alex Deucher @ 2025-12-04 16:04 UTC (permalink / raw)
To: Philipp Stanner
Cc: Matthew Brost, intel-xe, dri-devel, Alex Deucher,
Christian König, dakr
On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
>
> +Cc Alex, Christian, Danilo
>
>
> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > Stop open coding pending job list in drivers. Add pending job list
> > iterator which safely walks DRM scheduler list asserting DRM scheduler
> > is stopped.
> >
> > v2:
> > - Fix checkpatch (CI)
> > v3:
> > - Drop locked version (Christian)
> > v4:
> > - Reorder patch (Niranjana)
>
> Same with the changelog.
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > ---
> > include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> > 1 file changed, 50 insertions(+)
> >
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 385bf34e76fe..9d228513d06c 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > dma_fence_is_signaled(&s_fence->finished);
> > }
> >
> > +/**
> > + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> > + * @sched: DRM scheduler associated with pending job iterator
> > + */
> > +struct drm_sched_pending_job_iter {
> > + struct drm_gpu_scheduler *sched;
> > +};
> > +
> > +/* Drivers should never call this directly */
> > +static inline struct drm_sched_pending_job_iter
> > +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> > +{
> > + struct drm_sched_pending_job_iter iter = {
> > + .sched = sched,
> > + };
> > +
> > + WARN_ON(!drm_sched_is_stopped(sched));
> > + return iter;
> > +}
> > +
> > +/* Drivers should never call this directly */
> > +static inline void
> > +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> > +{
> > + WARN_ON(!drm_sched_is_stopped(iter.sched));
> > +}
> > +
> > +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> > + __drm_sched_pending_job_iter_end(_T),
> > + __drm_sched_pending_job_iter_begin(__sched),
> > + struct drm_gpu_scheduler *__sched);
> > +static inline void *
> > +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> > +{ return _T; }
> > +#define class_drm_sched_pending_job_iter_is_conditional false
> > +
> > +/**
> > + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> > + * @__job: Current pending job being iterated over
> > + * @__sched: DRM scheduler to iterate over pending jobs
> > + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> > + *
> > + * Iterator for each pending job in scheduler, filtering on an entity, and
> > + * enforcing scheduler is fully stopped
> > + */
> > +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> > + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> > + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> > + for_each_if(!(__entity) || (__job)->entity == (__entity))
> > +
> > #endif
>
>
> See my comments in the first patch. The docu doesn't mention at all why
> this new functionality exists and when and why users would be expected
> to use it.
>
> As far as I remember from XDC, both AMD and Intel overwrite a timed out
> jobs buffer data in the rings on GPU reset. To do so, the driver needs
> the timedout job (passed through timedout_job() callback) and then
> needs all the pending non-broken jobs.
>
> AFAICS your patch provides a generic iterator over the entire
> pending_list. How is a driver then supposed to determine which are the
> non-broken jobs (just asking, but that needs to be documented)?
>
> Could it make sense to use a different iterator which only returns jobs
> of not belonging to the same context as the timedout-one?
>
> Those are important questions that need to be addressed before merging
> that.
>
> And if this works canonically (i.e., for basically everyone), it needs
> to be documented in drm_sched_resubmit_jobs() that this iterator is now
> the canonical way of handling timeouts.
>
> Moreover, btw, just yesterday I added an entry to the DRM todo list
> which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> would have to be removed, too.
>
>
> @AMD: Would the code Matthew provides work for you? Please give your
> input. This is very important common infrastructure.
I don't think drm_sched_resubmit_jobs() can work for us without major
rework. For our kernel queues, we have a single queue on which jobs
for different clients are scheduled. When we reset the queue, we lose
all jobs on the queue and have to re-emit the non-guilty ones. We do
this at the ring level, i.e., we save the packets directly from the
ring and then re-emit the packets for the non-guilty contexts to the
freshly reset ring. This avoids running run_job() again which would
issue new fences and race with memory management, etc.
I think the following would be workable:
1. driver job_timedout() callback flags the job as bad, resets the bad
queue, and calls drm_sched_resubmit_jobs()
2. drm_sched_resubmit_jobs() walks the pending list and calls
run_job() for every job
3. driver run_job() callback looks to see if we already ran this job
and uses the original fence rather than allocating a new one
4. driver run_job() callback checks to see if the job is guilty or
from the same context and if so, sets an error on the fences and
submits only the fence packet to the queue so that any follow-up jobs
will properly synchronize if they need to wait on the fence from the
bad job
5. driver run_job() callback will submit the full packet stream for
non-guilty contexts
I guess we could use the iterator and implement that logic in the
driver directly rather than using drm_sched_resubmit_jobs().
Alex
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-04 16:04 ` Alex Deucher
@ 2025-12-05 9:19 ` Christian König
2025-12-05 18:54 ` Matthew Brost
2025-12-08 13:33 ` Philipp Stanner
0 siblings, 2 replies; 31+ messages in thread
From: Christian König @ 2025-12-05 9:19 UTC (permalink / raw)
To: Alex Deucher, Philipp Stanner
Cc: Matthew Brost, intel-xe, dri-devel, Alex Deucher, dakr
On 12/4/25 17:04, Alex Deucher wrote:
> On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
>>
>> +Cc Alex, Christian, Danilo
>>
>>
>> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
>>> Stop open coding pending job list in drivers. Add pending job list
>>> iterator which safely walks DRM scheduler list asserting DRM scheduler
>>> is stopped.
>>>
>>> v2:
>>> - Fix checkpatch (CI)
>>> v3:
>>> - Drop locked version (Christian)
>>> v4:
>>> - Reorder patch (Niranjana)
>>
>> Same with the changelog.
>>
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>> ---
>>> include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 50 insertions(+)
>>>
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 385bf34e76fe..9d228513d06c 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
>>> dma_fence_is_signaled(&s_fence->finished);
>>> }
>>>
>>> +/**
>>> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
>>> + * @sched: DRM scheduler associated with pending job iterator
>>> + */
>>> +struct drm_sched_pending_job_iter {
>>> + struct drm_gpu_scheduler *sched;
>>> +};
>>> +
>>> +/* Drivers should never call this directly */
>>> +static inline struct drm_sched_pending_job_iter
>>> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
>>> +{
>>> + struct drm_sched_pending_job_iter iter = {
>>> + .sched = sched,
>>> + };
>>> +
>>> + WARN_ON(!drm_sched_is_stopped(sched));
>>> + return iter;
>>> +}
>>> +
>>> +/* Drivers should never call this directly */
>>> +static inline void
>>> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
>>> +{
>>> + WARN_ON(!drm_sched_is_stopped(iter.sched));
>>> +}
>>> +
>>> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
>>> + __drm_sched_pending_job_iter_end(_T),
>>> + __drm_sched_pending_job_iter_begin(__sched),
>>> + struct drm_gpu_scheduler *__sched);
>>> +static inline void *
>>> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
>>> +{ return _T; }
>>> +#define class_drm_sched_pending_job_iter_is_conditional false
>>> +
>>> +/**
>>> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
>>> + * @__job: Current pending job being iterated over
>>> + * @__sched: DRM scheduler to iterate over pending jobs
>>> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
>>> + *
>>> + * Iterator for each pending job in scheduler, filtering on an entity, and
>>> + * enforcing scheduler is fully stopped
>>> + */
>>> +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
>>> + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
>>> + list_for_each_entry((__job), &(__sched)->pending_list, list) \
>>> + for_each_if(!(__entity) || (__job)->entity == (__entity))
>>> +
>>> #endif
>>
>>
>> See my comments in the first patch. The docu doesn't mention at all why
>> this new functionality exists and when and why users would be expected
>> to use it.
>>
>> As far as I remember from XDC, both AMD and Intel overwrite a timed out
>> jobs buffer data in the rings on GPU reset. To do so, the driver needs
>> the timedout job (passed through timedout_job() callback) and then
>> needs all the pending non-broken jobs.
>>
>> AFAICS your patch provides a generic iterator over the entire
>> pending_list. How is a driver then supposed to determine which are the
>> non-broken jobs (just asking, but that needs to be documented)?
>>
>> Could it make sense to use a different iterator which only returns jobs
>> of not belonging to the same context as the timedout-one?
>>
>> Those are important questions that need to be addressed before merging
>> that.
>>
>> And if this works canonically (i.e., for basically everyone), it needs
>> to be documented in drm_sched_resubmit_jobs() that this iterator is now
>> the canonical way of handling timeouts.
>>
>> Moreover, btw, just yesterday I added an entry to the DRM todo list
>> which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
>> would have to be removed, too.
>>
>>
>> @AMD: Would the code Matthew provides work for you? Please give your
>> input. This is very important common infrastructure.
>
> I don't think drm_sched_resubmit_jobs() can work for us without major
> rework. For our kernel queues, we have a single queue on which jobs
> for different clients are scheduled. When we reset the queue, we lose
> all jobs on the queue and have to re-emit the non-guilty ones. We do
> this at the ring level, i.e., we save the packets directly from the
> ring and then re-emit the packets for the non-guilty contexts to the
> freshly reset ring. This avoids running run_job() again which would
> issue new fences and race with memory management, etc.
>
> I think the following would be workable:
> 1. driver job_timedout() callback flags the job as bad. resets the bad
> queue, and calls drm_sched_resubmit_jobs()
> 2. drm_sched_resubmit_jobs() walks the pending list and calls
> run_job() for every job
Calling run_job() multiple times was one of the worst ideas I have ever seen.
The problem here is that you need a transactional approach to the internal driver state which is modified by ->run_job().
So for example if you have:
->run_job(A)
->run_job(B)
->run_job(C)
If after a reset you find that you need to re-submit only job B, and A
& C are filtered out, then your driver state needs to get back to where
it was before job A ran.
> 2. driver run_job() callback looks to see if we already ran this job
> and uses the original fence rather than allocating a new one
Nope, the problem is *all* drivers *must* use the original fence. Otherwise you always run into trouble.
We should not promote a driver interface which makes it extremely easy to shoot down the whole system.
> 3. driver run_job() callback checks to see if the job is guilty or
> from the same context and if so, sets an error on the fences and
> submits only the fence packet to the queue so that any follow up jobs
> will properly synchronize if they need to wait on the fence from the
> bad job.
> 4. driver run_job() callback will submit the full packet stream for
> non-guilty contexts
>
> I guess we could use the iterator and implement that logic in the
> driver directly rather than using drm_sched_resubmit_jobs().
Yeah, exactly that's the way to go.
Christian.
>
> Alex
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-05 9:19 ` Christian König
@ 2025-12-05 18:54 ` Matthew Brost
2025-12-08 13:33 ` Philipp Stanner
1 sibling, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-12-05 18:54 UTC (permalink / raw)
To: Christian König
Cc: Alex Deucher, Philipp Stanner, intel-xe, dri-devel, Alex Deucher,
dakr
On Fri, Dec 05, 2025 at 10:19:32AM +0100, Christian König wrote:
> On 12/4/25 17:04, Alex Deucher wrote:
> > On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
> >>
> >> +Cc Alex, Christian, Danilo
> >>
> >>
> >> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> >>> Stop open coding pending job list in drivers. Add pending job list
> >>> iterator which safely walks DRM scheduler list asserting DRM scheduler
> >>> is stopped.
> >>>
> >>> v2:
> >>> - Fix checkpatch (CI)
> >>> v3:
> >>> - Drop locked version (Christian)
> >>> v4:
> >>> - Reorder patch (Niranjana)
> >>
> >> Same with the changelog.
> >>
> >>>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> >>> ---
> >>> include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 50 insertions(+)
> >>>
> >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>> index 385bf34e76fe..9d228513d06c 100644
> >>> --- a/include/drm/gpu_scheduler.h
> >>> +++ b/include/drm/gpu_scheduler.h
> >>> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> >>> dma_fence_is_signaled(&s_fence->finished);
> >>> }
> >>>
> >>> +/**
> >>> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> >>> + * @sched: DRM scheduler associated with pending job iterator
> >>> + */
> >>> +struct drm_sched_pending_job_iter {
> >>> + struct drm_gpu_scheduler *sched;
> >>> +};
> >>> +
> >>> +/* Drivers should never call this directly */
> >>> +static inline struct drm_sched_pending_job_iter
> >>> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> >>> +{
> >>> + struct drm_sched_pending_job_iter iter = {
> >>> + .sched = sched,
> >>> + };
> >>> +
> >>> + WARN_ON(!drm_sched_is_stopped(sched));
> >>> + return iter;
> >>> +}
> >>> +
> >>> +/* Drivers should never call this directly */
> >>> +static inline void
> >>> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> >>> +{
> >>> + WARN_ON(!drm_sched_is_stopped(iter.sched));
> >>> +}
> >>> +
> >>> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> >>> + __drm_sched_pending_job_iter_end(_T),
> >>> + __drm_sched_pending_job_iter_begin(__sched),
> >>> + struct drm_gpu_scheduler *__sched);
> >>> +static inline void *
> >>> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> >>> +{ return _T; }
> >>> +#define class_drm_sched_pending_job_iter_is_conditional false
> >>> +
> >>> +/**
> >>> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> >>> + * @__job: Current pending job being iterated over
> >>> + * @__sched: DRM scheduler to iterate over pending jobs
> >>> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> >>> + *
> >>> + * Iterator for each pending job in scheduler, filtering on an entity, and
> >>> + * enforcing scheduler is fully stopped
> >>> + */
> >>> +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> >>> + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> >>> + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> >>> + for_each_if(!(__entity) || (__job)->entity == (__entity))
> >>> +
> >>> #endif
> >>
> >>
> >> See my comments in the first patch. The docu doesn't mention at all why
> >> this new functionality exists and when and why users would be expected
> >> to use it.
> >>
> >> As far as I remember from XDC, both AMD and Intel overwrite a timed out
> >> jobs buffer data in the rings on GPU reset. To do so, the driver needs
> >> the timedout job (passed through timedout_job() callback) and then
> >> needs all the pending non-broken jobs.
> >>
> >> AFAICS your patch provides a generic iterator over the entire
> >> pending_list. How is a driver then supposed to determine which are the
> >> non-broken jobs (just asking, but that needs to be documented)?
> >>
> >> Could it make sense to use a different iterator which only returns jobs
> >> of not belonging to the same context as the timedout-one?
> >>
> >> Those are important questions that need to be addressed before merging
> >> that.
> >>
> >> And if this works canonically (i.e., for basically everyone), it needs
> >> to be documented in drm_sched_resubmit_jobs() that this iterator is now
> >> the canonical way of handling timeouts.
> >>
> >> Moreover, btw, just yesterday I added an entry to the DRM todo list
> >> which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> >> would have to be removed, too.
> >>
> >>
> >> @AMD: Would the code Matthew provides work for you? Please give your
> >> input. This is very important common infrastructure.
> >
> > I don't think drm_sched_resubmit_jobs() can work for us without major
> > rework. For our kernel queues, we have a single queue on which jobs
> > for different clients are scheduled. When we reset the queue, we lose
> > all jobs on the queue and have to re-emit the non-guilty ones. We do
> > this at the ring level, i.e., we save the packets directly from the
> > ring and then re-emit the packets for the non-guilty contexts to the
> > freshly reset ring. This avoids running run_job() again which would
> > issue new fences and race with memory management, etc.
> >
> > I think the following would be workable:
> > 1. driver job_timedout() callback flags the job as bad. resets the bad
> > queue, and calls drm_sched_resubmit_jobs()
> > 2. drm_sched_resubmit_jobs() walks the pending list and calls
> > run_job() for every job
>
> Calling run_job() multiple times was one of the worst ideas I have ever seen.
>
I'm not sure who this is directed at, but maybe dial back the intensity
a bit here. I really doubt this is one of the worst ideas you've ever
seen.
> The problem here is that you need a transactional approach to the internal driver state which is modified by ->run_job().
>
> So for example if you have:
> ->run_job(A)
> ->run_job(B)
> ->run_job(C)
>
> And after a reset you find that you need to re-submit only job B and A & C are filtered then that means that your driver state needs to get back before running job A.
>
This scenario isn’t possible in Xe due to the 1:1 relationship between
the scheduler and the entity: jobs execute serially on a single ring,
so if B needs to be resubmitted, so does C. I’m not sure how AMDGPU
works, but if this scenario can occur, either it’s a significant
problem or the scheduler is being misused, as AFAIK jobs submitted to a
scheduler instance should signal in order.
> > 2. driver run_job() callback looks to see if we already ran this job
> > and uses the original fence rather than allocating a new one
>
> Nope, the problem is *all* drivers *must* use the original fence. Otherwise you always run into trouble.
>
Isn’t that what Alex is saying: always use the original fence? I fully
agree with this; Xe does the same. That’s why drm_sched_resubmit_jobs()
is confusing: as coded, it assumes run_job() can return a different
fence than the original invocation did. This is part of the reason I
didn’t use this function in Xe; it looked scary.
> We should not promote a driver interface which makes it extremely easy to shoot down the whole system.
>
> > 3. driver run_job() callback checks to see if the job is guilty or
> > from the same context and if so, sets an error on the fences and
> > submits only the fence packet to the queue so that any follow up jobs
> > will properly synchronize if they need to wait on the fence from the
> > bad job.
> > 4. driver run_job() callback will submit the full packet stream for
> > non-guilty contexts
> >
> > I guess we could use the iterator and implement that logic in the
> > driver directly rather than using drm_sched_resubmit_jobs().
>
> Yeah, exactly that's the way to go.
>
I think we’re in agreement that this patch can be rebased, have the
comments here addressed, and then be merged?
Matt
> Christian.
>
> >
> > Alex
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
2025-12-05 9:19 ` Christian König
2025-12-05 18:54 ` Matthew Brost
@ 2025-12-08 13:33 ` Philipp Stanner
1 sibling, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-08 13:33 UTC (permalink / raw)
To: Christian König, Alex Deucher
Cc: Matthew Brost, intel-xe, dri-devel, Alex Deucher, dakr
On Fri, 2025-12-05 at 10:19 +0100, Christian König wrote:
> On 12/4/25 17:04, Alex Deucher wrote:
> > On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
> > >
> > > +Cc Alex, Christian, Danilo
> > >
> > >
> > > On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > > > Stop open coding pending job list in drivers. Add pending job list
> > > > iterator which safely walks DRM scheduler list asserting DRM scheduler
> > > > is stopped.
> > > >
> > > > v2:
> > > > - Fix checkpatch (CI)
> > > > v3:
> > > > - Drop locked version (Christian)
> > > > v4:
> > > > - Reorder patch (Niranjana)
> > >
> > > Same with the changelog.
> > >
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > ---
> > > > include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 50 insertions(+)
> > > >
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 385bf34e76fe..9d228513d06c 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > > > dma_fence_is_signaled(&s_fence->finished);
> > > > }
> > > >
> > > > +/**
> > > > + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> > > > + * @sched: DRM scheduler associated with pending job iterator
> > > > + */
> > > > +struct drm_sched_pending_job_iter {
> > > > + struct drm_gpu_scheduler *sched;
> > > > +};
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline struct drm_sched_pending_job_iter
> > > > +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> > > > +{
> > > > + struct drm_sched_pending_job_iter iter = {
> > > > + .sched = sched,
> > > > + };
> > > > +
> > > > + WARN_ON(!drm_sched_is_stopped(sched));
> > > > + return iter;
> > > > +}
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline void
> > > > +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> > > > +{
> > > > + WARN_ON(!drm_sched_is_stopped(iter.sched));
> > > > +}
> > > > +
> > > > +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> > > > + __drm_sched_pending_job_iter_end(_T),
> > > > + __drm_sched_pending_job_iter_begin(__sched),
> > > > + struct drm_gpu_scheduler *__sched);
> > > > +static inline void *
> > > > +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> > > > +{ return _T; }
> > > > +#define class_drm_sched_pending_job_iter_is_conditional false
> > > > +
> > > > +/**
> > > > + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> > > > + * @__job: Current pending job being iterated over
> > > > + * @__sched: DRM scheduler to iterate over pending jobs
> > > > + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> > > > + *
> > > > + * Iterator for each pending job in scheduler, filtering on an entity, and
> > > > + * enforcing scheduler is fully stopped
> > > > + */
> > > > +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> > > > + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> > > > + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> > > > + for_each_if(!(__entity) || (__job)->entity == (__entity))
> > > > +
> > > > #endif
> > >
> > >
> > > See my comments in the first patch. The docu doesn't mention at all why
> > > this new functionality exists and when and why users would be expected
> > > to use it.
> > >
> > > As far as I remember from XDC, both AMD and Intel overwrite a timed out
> > > jobs buffer data in the rings on GPU reset. To do so, the driver needs
> > > the timedout job (passed through timedout_job() callback) and then
> > > needs all the pending non-broken jobs.
> > >
> > > AFAICS your patch provides a generic iterator over the entire
> > > pending_list. How is a driver then supposed to determine which are the
> > > non-broken jobs (just asking, but that needs to be documented)?
> > >
> > > Could it make sense to use a different iterator which only returns jobs
> > > of not belonging to the same context as the timedout-one?
> > >
> > > Those are important questions that need to be addressed before merging
> > > that.
> > >
> > > And if this works canonically (i.e., for basically everyone), it needs
> > > to be documented in drm_sched_resubmit_jobs() that this iterator is now
> > > the canonical way of handling timeouts.
> > >
> > > Moreover, btw, just yesterday I added an entry to the DRM todo list
> > > which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> > > would have to be removed, too.
> > >
> > >
> > > @AMD: Would the code Matthew provides work for you? Please give your
> > > input. This is very important common infrastructure.
> >
> > I don't think drm_sched_resubmit_jobs() can work for us without major
> > rework. For our kernel queues, we have a single queue on which jobs
> > for different clients are scheduled. When we reset the queue, we lose
> > all jobs on the queue and have to re-emit the non-guilty ones. We do
> > this at the ring level, i.e., we save the packets directly from the
> > ring and then re-emit the packets for the non-guilty contexts to the
> > freshly reset ring. This avoids running run_job() again which would
> > issue new fences and race with memory management, etc.
> >
> > I think the following would be workable:
> > 1. driver job_timedout() callback flags the job as bad. resets the bad
> > queue, and calls drm_sched_resubmit_jobs()
> > 2. drm_sched_resubmit_jobs() walks the pending list and calls
> > run_job() for every job
>
> Calling run_job() multiple times was one of the worst ideas I have ever seen.
>
> The problem here is that you need a transactional approach to the internal driver state which is modified by ->run_job().
>
> So for example if you have:
> ->run_job(A)
> ->run_job(B)
> ->run_job(C)
>
> And after a reset you find that you need to re-submit only job B and A & C are filtered then that means that your driver state needs to get back before running job A.
>
> > 2. driver run_job() callback looks to see if we already ran this job
> > and uses the original fence rather than allocating a new one
>
> Nope, the problem is *all* drivers *must* use the original fence. Otherwise you always run into trouble.
>
> We should not promote a driver interface which makes it extremely easy to shoot down the whole system.
>
> > 3. driver run_job() callback checks to see if the job is guilty or
> > from the same context and if so, sets an error on the fences and
> > submits only the fence packet to the queue so that any follow up jobs
> > will properly synchronize if they need to wait on the fence from the
> > bad job.
> > 4. driver run_job() callback will submit the full packet stream for
> > non-guilty contexts
> >
> > I guess we could use the iterator and implement that logic in the
> > driver directly rather than using drm_sched_resubmit_jobs().
>
> Yeah, exactly that's the way to go.
Sorry, I guess my message was confusing.
I don't mean for anyone to use drm_sched_resubmit_jobs() at all. That
function is deprecated, and since there are still users, we can't
easily modify it.
I just mentioned it because its docstring should tell users what to do
*instead* of calling it. Currently, it's just marked as deprecated
without offering users an alternative.
Anyways.
So can you take a look at Matthew's iterator and see if that will work
for you?
If we'd end up with an at least mostly-generic solution for resubmits
that's in use in both Xe and amdgpu, that would be a huge leap forward.
P.
>
> Christian.
>
> >
> > Alex
>
* Re: [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals
2025-12-03 20:44 ` Matthew Brost
@ 2025-12-08 13:44 ` Philipp Stanner
0 siblings, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-12-08 13:44 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, dakr, Alex Deucher, Christian König
On Wed, 2025-12-03 at 12:44 -0800, Matthew Brost wrote:
> On Wed, Dec 03, 2025 at 11:56:01AM +0100, Philipp Stanner wrote:
> > On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > > Use new pending job list iterator and new helper functions in Xe to
> > > avoid reaching into DRM scheduler internals.
> > >
[…]
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
> > > drivers/gpu/drm/xe/xe_gpu_scheduler.h | 33 ++--------
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 81 ++++++------------------
> > > drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
> > > drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
> > > drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> > > 6 files changed, 27 insertions(+), 120 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > index f4f23317191f..9c8004d5dd91 100644
> > > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > @@ -7,7 +7,7 @@
> > >
> > > static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
> > > {
> > > - if (!READ_ONCE(sched->base.pause_submit))
> > > + if (!drm_sched_is_stopped(&sched->base))
> > > queue_work(sched->base.submit_wq, &sched->work_process_msg);
> >
> > Sharing the submit_wq is legal. But next-level cleanness would be if
> > struct drm_gpu_scheduler's internal components wouldn't be touched.
> > That's kind of a luxury request, though.
> >
>
> Yes, perhaps a helper to extract the submit_wq too.
Could work; but best would be if drivers stored their own pointer.
That's not always possible (Boris recently tried to do it for Panthor),
but often it is.
>
> > > }
> > >
> > > @@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
> > > container_of(w, struct xe_gpu_scheduler, work_process_msg);
> > > struct xe_sched_msg *msg;
> > >
> > > - if (READ_ONCE(sched->base.pause_submit))
> > > + if (drm_sched_is_stopped(&sched->base))
> > > return;
> > >
> > > msg = xe_sched_get_msg(sched);
> > > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > > index dceb2cd0ee5b..664c2db56af3 100644
> > > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > > @@ -56,12 +56,9 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> > > struct drm_sched_job *s_job;
> > > bool restore_replay = false;
> > >
> > > - list_for_each_entry(s_job, &sched->base.pending_list, list) {
> > > - struct drm_sched_fence *s_fence = s_job->s_fence;
> > > - struct dma_fence *hw_fence = s_fence->parent;
> > > -
> > > + drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
> > > restore_replay |= to_xe_sched_job(s_job)->restore_replay;
> > > - if (restore_replay || (hw_fence && !dma_fence_is_signaled(hw_fence)))
> > > + if (restore_replay || !drm_sched_job_is_signaled(s_job))
> >
> > So that's where this function is needed. You check whether that job in
> > the pending_list is signaled.
> >
>
> Yes, during GT reset flows (think a device-level reset) it is possible
> we stop the scheduler in the window after a job signals but before
> free_job() is called. We want to avoid resubmitting jobs which have
> already signaled.
I'm not so convinced then that the function should be called
drm_sched_job_is_signaled(). A job is also associated with
s_fence.finished.
Why can that race here, btw? Isn't it your driver that signals the
hardware fences? How and where? Interrupts?
>
> > > sched->base.ops->run_job(s_job);
> >
> > Aaaaaahm. So you invoke your own callback. But basically just to access
> > the function pointer I suppose?
> >
> > Since this is effectively your drm_sched_resubmit_jobs(), it is
> > definitely desirable to provide a text book example of how to do resets
> > so that others can follow your usage.
> >
>
> Yes, but drm_sched_resubmit_jobs() does some nonsense with dma-fence
> that I don’t need here. Honestly, I’m a little unsure what that is
> actually doing.
>
drm_sched_resubmit_jobs() shouldn't be used anymore; it's deprecated.
What I mean is that your code here effectively is the resubmission
code. So if you think it's the right way of doing resets, in harmony
with drm_sched, then it would be good if this code is as tidy as
possible and preferably even commented, so that we can point other
driver programmers to it as an example of idiomatic usage.
> We also use this function during VF restore after
> migration. This is a multi-step process that needs to operate on the
> same set of jobs at each step of the restore. That's what the
> restore_replay variable represents: it marks a job at the very
> beginning of the restore process, and each step along the way ensures
> execution starts at that job. Technically, once we're here in a VF
> restore, jobs can start signaling as the hardware is live. So some of
> this really is vendor-specific.
>
> > Can't you replace ops->run_job() with a call to your functions where
> > you push the jobs to the ring, directly?
> >
>
> Yes, we could, but that function isn’t currently exported. Also, in
> future products, we may assign a different run_job vfunc based on
> hardware generation or queue type. So using a vfunc here makes sense as
> a bit of future-proofing. Of course, we could also have a DRM
> scheduler-level helper that invokes run_job for us.
OK.
But same comment as above applies, having distinct pointers would be the cleanest thing.
P.
Thread overview: 31+ messages
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-12-03 8:56 ` Philipp Stanner
2025-12-03 21:10 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
2025-12-03 9:07 ` Philipp Stanner
2025-12-03 10:28 ` Philipp Stanner
2025-12-04 16:04 ` Alex Deucher
2025-12-05 9:19 ` Christian König
2025-12-05 18:54 ` Matthew Brost
2025-12-08 13:33 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
2025-12-03 9:38 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
2025-12-03 10:56 ` Philipp Stanner
2025-12-03 20:44 ` Matthew Brost
2025-12-08 13:44 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
2025-12-01 18:39 ` [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR Matthew Brost
2025-12-01 18:39 ` [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission Matthew Brost
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
2025-12-02 6:42 ` Umesh Nerlige Ramappa
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
2025-12-02 7:31 ` Umesh Nerlige Ramappa
2025-12-02 15:14 ` Matthew Brost
2025-12-02 0:53 ` ✗ CI.checkpatch: warning for Fix DRM scheduler layering violations in Xe (rev8) Patchwork
2025-12-02 0:55 ` ✓ CI.KUnit: success " Patchwork
2025-12-02 2:05 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-02 5:18 ` ✓ Xe.CI.Full: " Patchwork
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-03 8:33 ` Philipp Stanner