* [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe
@ 2025-11-19 22:40 Matthew Brost
2025-11-19 22:40 ` [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:40 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
At XDC, we agreed that drivers should stop accessing DRM scheduler
internals, stop misusing DRM scheduler locks, and adopt a well-defined
pending job list iterator. This series proposes the necessary changes to
the DRM scheduler to bring Xe in line with that agreement and updates Xe
to use the new DRM scheduler API.
While here, clean up LR queue handling and simplify the GuC state
machine in Xe too.
v2:
- Fix checkpatch / naming issues
v3:
- Only allow pending job list iterator to be called on stopped schedulers
- Clean up LR queue handling / fix a few miscellaneous Xe scheduler issues
v4:
- Address Niranjana's feedback
- Add patch to avoid toggling scheduler state in the TDR
Matt
Matthew Brost (8):
drm/sched: Add several job helpers to avoid drivers touching scheduler
state
drm/sched: Add pending job list iterator
drm/xe: Add dedicated message lock
drm/xe: Stop abusing DRM scheduler internals
drm/xe: Only toggle scheduling in TDR if GuC is running
drm/xe: Do not deregister queues in TDR
drm/xe: Remove special casing for LR queues in submission
drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
drivers/gpu/drm/scheduler/sched_main.c | 4 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 9 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 38 +-
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 +
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
drivers/gpu/drm/xe/xe_guc_submit.c | 362 +++----------------
drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 -
drivers/gpu/drm/xe/xe_hw_fence.c | 16 -
drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
drivers/gpu/drm/xe/xe_sched_job.c | 1 +
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
drivers/gpu/drm/xe/xe_trace.h | 5 -
include/drm/gpu_scheduler.h | 82 +++++
13 files changed, 157 insertions(+), 379 deletions(-)
--
2.34.1
* [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
@ 2025-11-19 22:40 ` Matthew Brost
2025-11-20 17:20 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 2/8] drm/sched: Add pending job list iterator Matthew Brost
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:40 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Add helpers to check whether a scheduler is stopped and whether a job is
in a signaled state. Expected to be used driver-side in recovery and
debug flows.
v4:
- Reorder patch to first in series (Niranjana)
- Also check parent fence for signaling (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1d4f1b822e7b..cf40c18ab433 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
*/
static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->pause_submit))
+ if (!drm_sched_is_stopped(sched))
queue_work(sched->submit_wq, &sched->work_run_job);
}
@@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
*/
static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->pause_submit))
+ if (!drm_sched_is_stopped(sched))
queue_work(sched->submit_wq, &sched->work_free_job);
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fb88301b3c45..385bf34e76fe 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
struct drm_gpu_scheduler **sched_list,
unsigned int num_sched_list);
+/* Inlines */
+
+/**
+ * drm_sched_is_stopped() - DRM scheduler is stopped
+ * @sched: DRM scheduler
+ *
+ * Return: True if sched is stopped, False otherwise
+ */
+static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
+{
+ return READ_ONCE(sched->pause_submit);
+}
+
+/**
+ * drm_sched_job_is_signaled() - DRM scheduler job is signaled
+ * @job: DRM scheduler job
+ *
+ * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
+ * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
+ * finished fence (software fence) are checked to determine signaling state.
+ *
+ * Return: True if job is signaled, False otherwise
+ */
+static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
+{
+ struct drm_sched_fence *s_fence = job->s_fence;
+
+ WARN_ON(!drm_sched_is_stopped(job->sched));
+ return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
+ dma_fence_is_signaled(&s_fence->finished);
+}
+
#endif
--
2.34.1
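For illustration, a minimal sketch of a driver-side recovery flow using
these helpers. my_driver_cancel_job() is hypothetical, and the
open-coded pending_list walk below is exactly what the next patch in
this series wraps in a proper iterator:

#include <drm/gpu_scheduler.h>

void my_driver_cancel_job(struct drm_sched_job *job);	/* hypothetical */

/* Illustrative sketch only: cancel unsignaled jobs during recovery. */
static void my_driver_cancel_unsignaled(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	/* Pause submission so job state is a stable snapshot */
	drm_sched_wqueue_stop(sched);

	list_for_each_entry(job, &sched->pending_list, list) {
		/*
		 * Safe here: drm_sched_job_is_signaled() WARNs if the
		 * scheduler is not stopped.
		 */
		if (!drm_sched_job_is_signaled(job))
			my_driver_cancel_job(job);
	}

	drm_sched_wqueue_start(sched);
}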
* [PATCH v4 2/8] drm/sched: Add pending job list iterator
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-11-19 22:40 ` [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 17:21 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 3/8] drm/xe: Add dedicated message lock Matthew Brost
` (5 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Stop open-coding pending job list walks in drivers. Add a pending job
list iterator which safely walks the DRM scheduler's pending list while
asserting the DRM scheduler is stopped.
v2:
- Fix checkpatch (CI)
v3:
- Drop locked version (Christian)
v4:
- Reorder patch (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 385bf34e76fe..9d228513d06c 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
dma_fence_is_signaled(&s_fence->finished);
}
+/**
+ * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
+ * @sched: DRM scheduler associated with pending job iterator
+ */
+struct drm_sched_pending_job_iter {
+ struct drm_gpu_scheduler *sched;
+};
+
+/* Drivers should never call this directly */
+static inline struct drm_sched_pending_job_iter
+__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
+{
+ struct drm_sched_pending_job_iter iter = {
+ .sched = sched,
+ };
+
+ WARN_ON(!drm_sched_is_stopped(sched));
+ return iter;
+}
+
+/* Drivers should never call this directly */
+static inline void
+__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
+{
+ WARN_ON(!drm_sched_is_stopped(iter.sched));
+}
+
+DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
+ __drm_sched_pending_job_iter_end(_T),
+ __drm_sched_pending_job_iter_begin(__sched),
+ struct drm_gpu_scheduler *__sched);
+static inline void *
+class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
+{ return _T; }
+#define class_drm_sched_pending_job_iter_is_conditional false
+
+/**
+ * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
+ * @__job: Current pending job being iterated over
+ * @__sched: DRM scheduler to iterate over pending jobs
+ * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
+ *
+ * Iterator for each pending job in the scheduler, filtering on an entity,
+ * and enforcing that the scheduler is fully stopped.
+ */
+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
+ for_each_if(!(__entity) || (__job)->entity == (__entity))
+
#endif
--
2.34.1
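For illustration, a usage sketch of the iterator in a driver teardown
path. my_driver_set_job_error() is hypothetical; the wqueue stop/start
calls are what satisfy the iterator's stopped-scheduler assertion:

#include <drm/gpu_scheduler.h>

void my_driver_set_job_error(struct drm_sched_job *job,
			     int err);	/* hypothetical */

/*
 * Illustrative sketch only: cancel one entity's pending jobs while the
 * scheduler is stopped.
 */
static void my_driver_cancel_entity_jobs(struct drm_gpu_scheduler *sched,
					 struct drm_sched_entity *entity)
{
	struct drm_sched_job *job;

	drm_sched_wqueue_stop(sched);

	/* Scoped guard asserts drm_sched_is_stopped() on entry and exit */
	drm_sched_for_each_pending_job(job, sched, entity)
		my_driver_set_job_error(job, -ECANCELED);

	drm_sched_wqueue_start(sched);
}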
* [PATCH v4 3/8] drm/xe: Add dedicated message lock
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-11-19 22:40 ` [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-11-19 22:41 ` [PATCH v4 2/8] drm/sched: Add pending job list iterator Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-19 22:41 ` [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
` (4 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Stop abusing the DRM scheduler's job list lock for messages; add a
dedicated message lock.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 5 +++--
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 4 ++--
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 ++
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
index f91e06d03511..f4f23317191f 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
@@ -77,6 +77,7 @@ int xe_sched_init(struct xe_gpu_scheduler *sched,
};
sched->ops = xe_ops;
+ spin_lock_init(&sched->msg_lock);
INIT_LIST_HEAD(&sched->msgs);
INIT_WORK(&sched->work_process_msg, xe_sched_process_msg_work);
@@ -117,7 +118,7 @@ void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- lockdep_assert_held(&sched->base.job_list_lock);
+ lockdep_assert_held(&sched->msg_lock);
list_add_tail(&msg->link, &sched->msgs);
xe_sched_process_msg_queue(sched);
@@ -131,7 +132,7 @@ void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
struct xe_sched_msg *msg)
{
- lockdep_assert_held(&sched->base.job_list_lock);
+ lockdep_assert_held(&sched->msg_lock);
list_add(&msg->link, &sched->msgs);
xe_sched_process_msg_queue(sched);
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
index 9955397aaaa9..b971b6b69419 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
@@ -33,12 +33,12 @@ void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
static inline void xe_sched_msg_lock(struct xe_gpu_scheduler *sched)
{
- spin_lock(&sched->base.job_list_lock);
+ spin_lock(&sched->msg_lock);
}
static inline void xe_sched_msg_unlock(struct xe_gpu_scheduler *sched)
{
- spin_unlock(&sched->base.job_list_lock);
+ spin_unlock(&sched->msg_lock);
}
static inline void xe_sched_stop(struct xe_gpu_scheduler *sched)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
index 6731b13da8bb..63d9bf92583c 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h
@@ -47,6 +47,8 @@ struct xe_gpu_scheduler {
const struct xe_sched_backend_ops *ops;
/** @msgs: list of messages to be processed in @work_process_msg */
struct list_head msgs;
+ /** @msg_lock: Message lock */
+ spinlock_t msg_lock;
/** @work_process_msg: processes messages */
struct work_struct work_process_msg;
};
--
2.34.1
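For illustration, the expected locking pattern with the new lock. This
is a sketch of what a caller such as xe_sched_add_msg() would presumably
do, not code taken verbatim from the driver:

#include "xe_gpu_scheduler.h"

/*
 * Illustrative sketch only: queue a message under the dedicated
 * msg_lock instead of the DRM scheduler's job_list_lock.
 */
static void my_queue_sched_msg(struct xe_gpu_scheduler *sched,
			       struct xe_sched_msg *msg)
{
	xe_sched_msg_lock(sched);		/* spin_lock(&sched->msg_lock) */
	xe_sched_add_msg_locked(sched, msg);	/* lockdep asserts msg_lock */
	xe_sched_msg_unlock(sched);
}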
* [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
` (2 preceding siblings ...)
2025-11-19 22:41 ` [PATCH v4 3/8] drm/xe: Add dedicated message lock Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 17:26 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
` (3 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Use the new pending job list iterator and helper functions in Xe to
avoid reaching into DRM scheduler internals.
Part of this change involves removing pending jobs debug information
from debugfs and devcoredump. As agreed, the pending job list should
only be accessed when the scheduler is stopped. However, it's not
straightforward to determine whether the scheduler is stopped from the
shared debugfs/devcoredump code path. Additionally, the pending job list
provides little useful information, as pending jobs can be inferred from
seqnos and ring head/tail positions. Therefore, this debug information
is being removed.
v4:
- Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 34 +++--------
drivers/gpu/drm/xe/xe_guc_submit.c | 78 +++++-------------------
drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
6 files changed, 24 insertions(+), 121 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
index f4f23317191f..9c8004d5dd91 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
@@ -7,7 +7,7 @@
static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
{
- if (!READ_ONCE(sched->base.pause_submit))
+ if (!drm_sched_is_stopped(&sched->base))
queue_work(sched->base.submit_wq, &sched->work_process_msg);
}
@@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
container_of(w, struct xe_gpu_scheduler, work_process_msg);
struct xe_sched_msg *msg;
- if (READ_ONCE(sched->base.pause_submit))
+ if (drm_sched_is_stopped(&sched->base))
return;
msg = xe_sched_get_msg(sched);
diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
index b971b6b69419..583372a78140 100644
--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
@@ -55,14 +55,10 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
{
struct drm_sched_job *s_job;
- list_for_each_entry(s_job, &sched->base.pending_list, list) {
- struct drm_sched_fence *s_fence = s_job->s_fence;
- struct dma_fence *hw_fence = s_fence->parent;
-
+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL)
if (to_xe_sched_job(s_job)->skip_emit ||
- (hw_fence && !dma_fence_is_signaled(hw_fence)))
+ !drm_sched_job_is_signaled(s_job))
sched->base.ops->run_job(s_job);
- }
}
static inline bool
@@ -71,14 +67,6 @@ xe_sched_invalidate_job(struct xe_sched_job *job, int threshold)
return drm_sched_invalidate_job(&job->drm, threshold);
}
-static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
- struct xe_sched_job *job)
-{
- spin_lock(&sched->base.job_list_lock);
- list_add(&job->drm.list, &sched->base.pending_list);
- spin_unlock(&sched->base.job_list_lock);
-}
-
/**
* xe_sched_first_pending_job() - Find first pending job which is unsignaled
* @sched: Xe GPU scheduler
@@ -88,21 +76,13 @@ static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
static inline
struct xe_sched_job *xe_sched_first_pending_job(struct xe_gpu_scheduler *sched)
{
- struct xe_sched_job *job, *r_job = NULL;
-
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(job, &sched->base.pending_list, drm.list) {
- struct drm_sched_fence *s_fence = job->drm.s_fence;
- struct dma_fence *hw_fence = s_fence->parent;
+ struct drm_sched_job *job;
- if (hw_fence && !dma_fence_is_signaled(hw_fence)) {
- r_job = job;
- break;
- }
- }
- spin_unlock(&sched->base.job_list_lock);
+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
+ if (!drm_sched_job_is_signaled(job))
+ return to_xe_sched_job(job);
- return r_job;
+ return NULL;
}
static inline int
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index d4ffdb71ef3d..3ee35d4873bc 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1032,7 +1032,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
struct xe_exec_queue *q = ge->q;
struct xe_guc *guc = exec_queue_to_guc(q);
struct xe_gpu_scheduler *sched = &ge->sched;
- struct xe_sched_job *job;
+ struct drm_sched_job *job;
bool wedged = false;
xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
@@ -1091,16 +1091,10 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
- xe_hw_fence_irq_stop(q->fence_irq);
+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
+ xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
xe_sched_submission_start(sched);
-
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(job, &sched->base.pending_list, drm.list)
- xe_sched_job_set_error(job, -ECANCELED);
- spin_unlock(&sched->base.job_list_lock);
-
- xe_hw_fence_irq_start(q->fence_irq);
}
#define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
@@ -1219,7 +1213,7 @@ static enum drm_gpu_sched_stat
guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
{
struct xe_sched_job *job = to_xe_sched_job(drm_job);
- struct xe_sched_job *tmp_job;
+ struct drm_sched_job *tmp_job;
struct xe_exec_queue *q = job->q;
struct xe_gpu_scheduler *sched = &q->guc->sched;
struct xe_guc *guc = exec_queue_to_guc(q);
@@ -1228,7 +1222,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
unsigned int fw_ref;
int err = -ETIME;
pid_t pid = -1;
- int i = 0;
bool wedged = false, skip_timeout_check;
xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
@@ -1395,28 +1388,19 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
__deregister_exec_queue(guc, q);
}
- /* Stop fence signaling */
- xe_hw_fence_irq_stop(q->fence_irq);
+ /* Mark all outstanding jobs as bad, thus completing them */
+ xe_sched_job_set_error(job, err);
+ drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
+ xe_sched_job_set_error(to_xe_sched_job(tmp_job), -ECANCELED);
- /*
- * Fence state now stable, stop / start scheduler which cleans up any
- * fences that are complete
- */
- xe_sched_add_pending_job(sched, job);
xe_sched_submission_start(sched);
-
xe_guc_exec_queue_trigger_cleanup(q);
- /* Mark all outstanding jobs as bad, thus completing them */
- spin_lock(&sched->base.job_list_lock);
- list_for_each_entry(tmp_job, &sched->base.pending_list, drm.list)
- xe_sched_job_set_error(tmp_job, !i++ ? err : -ECANCELED);
- spin_unlock(&sched->base.job_list_lock);
-
- /* Start fence signaling */
- xe_hw_fence_irq_start(q->fence_irq);
-
- return DRM_GPU_SCHED_STAT_RESET;
+ /*
+ * We want the job added back to the pending list so it gets freed; this
+ * is what DRM_GPU_SCHED_STAT_NO_HANG does.
+ */
+ return DRM_GPU_SCHED_STAT_NO_HANG;
sched_enable:
set_exec_queue_pending_tdr_exit(q);
@@ -2244,7 +2228,7 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
struct drm_sched_job *s_job;
struct xe_sched_job *job = NULL;
- list_for_each_entry(s_job, &sched->base.pending_list, list) {
+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
job = to_xe_sched_job(s_job);
xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
@@ -2349,7 +2333,7 @@ void xe_guc_submit_unpause(struct xe_guc *guc)
* created after resfix done.
*/
if (q->guc->id != index ||
- !READ_ONCE(q->guc->sched.base.pause_submit))
+ !drm_sched_is_stopped(&q->guc->sched.base))
continue;
guc_exec_queue_unpause(guc, q);
@@ -2771,30 +2755,6 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_capture(q, snapshot);
- spin_lock(&sched->base.job_list_lock);
- snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
- snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
- sizeof(struct pending_list_snapshot),
- GFP_ATOMIC);
-
- if (snapshot->pending_list) {
- struct xe_sched_job *job_iter;
-
- i = 0;
- list_for_each_entry(job_iter, &sched->base.pending_list, drm.list) {
- snapshot->pending_list[i].seqno =
- xe_sched_job_seqno(job_iter);
- snapshot->pending_list[i].fence =
- dma_fence_is_signaled(job_iter->fence) ? 1 : 0;
- snapshot->pending_list[i].finished =
- dma_fence_is_signaled(&job_iter->drm.s_fence->finished)
- ? 1 : 0;
- i++;
- }
- }
-
- spin_unlock(&sched->base.job_list_lock);
-
return snapshot;
}
@@ -2852,13 +2812,6 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_print(snapshot, p);
-
- for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
- i++)
- drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
- snapshot->pending_list[i].seqno,
- snapshot->pending_list[i].fence,
- snapshot->pending_list[i].finished);
}
/**
@@ -2881,7 +2834,6 @@ void xe_guc_exec_queue_snapshot_free(struct xe_guc_submit_exec_queue_snapshot *s
xe_lrc_snapshot_free(snapshot->lrc[i]);
kfree(snapshot->lrc);
}
- kfree(snapshot->pending_list);
kfree(snapshot);
}
diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
index dc7456c34583..0b08c79cf3b9 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
@@ -61,12 +61,6 @@ struct guc_submit_parallel_scratch {
u32 wq[WQ_SIZE / sizeof(u32)];
};
-struct pending_list_snapshot {
- u32 seqno;
- bool fence;
- bool finished;
-};
-
/**
* struct xe_guc_submit_exec_queue_snapshot - Snapshot for devcoredump
*/
@@ -134,11 +128,6 @@ struct xe_guc_submit_exec_queue_snapshot {
/** @wq: Workqueue Items */
u32 wq[WQ_SIZE / sizeof(u32)];
} parallel;
-
- /** @pending_list_size: Size of the pending list snapshot array */
- int pending_list_size;
- /** @pending_list: snapshot of the pending list info */
- struct pending_list_snapshot *pending_list;
};
#endif
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
index b2a0c46dfcd4..e65dfcdfdbc5 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.c
+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
@@ -110,22 +110,6 @@ void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq)
irq_work_queue(&irq->work);
}
-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq)
-{
- spin_lock_irq(&irq->lock);
- irq->enabled = false;
- spin_unlock_irq(&irq->lock);
-}
-
-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq)
-{
- spin_lock_irq(&irq->lock);
- irq->enabled = true;
- spin_unlock_irq(&irq->lock);
-
- irq_work_queue(&irq->work);
-}
-
void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
struct xe_hw_fence_irq *irq, const char *name)
{
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
index f13a1c4982c7..599492c13f80 100644
--- a/drivers/gpu/drm/xe/xe_hw_fence.h
+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
@@ -17,8 +17,6 @@ void xe_hw_fence_module_exit(void);
void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq);
void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq);
-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq);
-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq);
void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
struct xe_hw_fence_irq *irq, const char *name);
--
2.34.1
* [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
` (3 preceding siblings ...)
2025-11-19 22:41 ` [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 19:48 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR Matthew Brost
` (2 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
If the firmware is not running during TDR (e.g., when the driver is
unloading), there's no need to toggle scheduling in the GuC. In such
cases, skip this step.
v4:
- Bail on wait if UC is not running (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 3ee35d4873bc..648c9ea06749 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1277,7 +1277,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (exec_queue_reset(q))
err = -EIO;
- if (!exec_queue_destroyed(q)) {
+ if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
/*
* Wait for any pending G2H to flush out before
* modifying state
@@ -1312,6 +1312,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
*/
smp_rmb();
ret = wait_event_timeout(guc->ct.wq,
+ !xe_uc_fw_is_running(&guc->fw) ||
!exec_queue_pending_disable(q) ||
xe_guc_read_stopped(guc) ||
vf_recovery(guc), HZ * 5);
--
2.34.1
* [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
` (4 preceding siblings ...)
2025-11-19 22:41 ` [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 19:50 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission Matthew Brost
2025-11-19 22:41 ` [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Deregistering queues in the TDR introduces unnecessary complexity,
requiring reference-counting techniques to function correctly,
particularly to prevent use-after-free (UAF) issues while a
deregistration initiated from the TDR is in progress.
All that's needed in the TDR is to kick the queue off the hardware,
which is achieved by disabling scheduling. Queue deregistration should
be handled in a single, well-defined point in the cleanup path, tied to
the queue's reference count.
v4:
- Explain why extra refs were needed prior to this patch (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 65 +++++-------------------------
1 file changed, 9 insertions(+), 56 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 648c9ea06749..5de300b66767 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -69,9 +69,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
#define EXEC_QUEUE_STATE_WEDGED (1 << 8)
#define EXEC_QUEUE_STATE_BANNED (1 << 9)
#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
-#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12)
-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13)
+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
+#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
static bool exec_queue_registered(struct xe_exec_queue *q)
{
@@ -218,21 +217,6 @@ static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
}
-static bool exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
-}
-
-static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
-}
-
-static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
-}
-
static bool exec_queue_pending_resume(struct xe_exec_queue *q)
{
return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
@@ -1190,25 +1174,6 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
}
-static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
-{
- u32 action[] = {
- XE_GUC_ACTION_DEREGISTER_CONTEXT,
- q->guc->id,
- };
-
- xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
- xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
-
- set_exec_queue_destroyed(q);
- trace_xe_exec_queue_deregister(q);
-
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
-}
-
static enum drm_gpu_sched_stat
guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
{
@@ -1225,6 +1190,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
bool wedged = false, skip_timeout_check;
xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
+ xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
/*
* TDR has fired before free job worker. Common if exec queue
@@ -1241,8 +1207,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
/* Must check all state after stopping scheduler */
skip_timeout_check = exec_queue_reset(q) ||
- exec_queue_killed_or_banned_or_wedged(q) ||
- exec_queue_destroyed(q);
+ exec_queue_killed_or_banned_or_wedged(q);
/*
* If devcoredump not captured and GuC capture for the job is not ready
@@ -1271,13 +1236,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
/* Engine state now stable, disable scheduling to check timestamp */
- if (!wedged && exec_queue_registered(q)) {
+ if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
int ret;
if (exec_queue_reset(q))
err = -EIO;
- if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
+ if (xe_uc_fw_is_running(&guc->fw)) {
/*
* Wait for any pending G2H to flush out before
* modifying state
@@ -1327,8 +1292,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
xe_devcoredump(q, job,
"Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
q->guc->id, ret, xe_guc_read_stopped(guc));
- set_exec_queue_extra_ref(q);
- xe_exec_queue_get(q); /* GT reset owns this */
set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
xe_sched_tdr_queue_imm(sched);
@@ -1381,13 +1344,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
}
}
- /* Finish cleaning up exec queue via deregister */
set_exec_queue_banned(q);
- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
- set_exec_queue_extra_ref(q);
- xe_exec_queue_get(q);
- __deregister_exec_queue(guc, q);
- }
/* Mark all outstanding jobs as bad, thus completing them */
xe_sched_job_set_error(job, err);
@@ -1933,7 +1890,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
/* Clean up lost G2H + reset engine state */
if (exec_queue_registered(q)) {
- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
+ if (xe_exec_queue_is_lr(q))
xe_exec_queue_put(q);
else if (exec_queue_destroyed(q))
__guc_exec_queue_destroy(guc, q);
@@ -2067,11 +2024,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
clear_exec_queue_destroyed(q);
- if (exec_queue_extra_ref(q))
- xe_exec_queue_put(q);
- else
- q->guc->needs_cleanup = true;
- clear_exec_queue_extra_ref(q);
+ q->guc->needs_cleanup = true;
xe_gt_dbg(guc_to_gt(guc), "Replay CLEANUP - guc_id=%d",
q->guc->id);
}
@@ -2488,7 +2441,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
clear_exec_queue_registered(q);
- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
+ if (xe_exec_queue_is_lr(q))
xe_exec_queue_put(q);
else
__guc_exec_queue_destroy(guc, q);
--
2.34.1
* [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
` (5 preceding siblings ...)
2025-11-19 22:41 ` [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 18:53 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
Now that LR jobs are tracked by the DRM scheduler, there's no longer a
need to special-case LR queues. This change removes all LR
queue-specific handling, including dedicated TDR logic, reference
counting schemes, and other related mechanisms.
v4:
- Remove xe_exec_queue_lr_cleanup tracepoint (Niranjana)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
drivers/gpu/drm/xe/xe_guc_submit.c | 132 ++-----------------
drivers/gpu/drm/xe/xe_trace.h | 5 -
3 files changed, 11 insertions(+), 128 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
index a3b034e4b205..fd0915ed8eb1 100644
--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
@@ -33,8 +33,6 @@ struct xe_guc_exec_queue {
*/
#define MAX_STATIC_MSG_TYPE 3
struct xe_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
- /** @lr_tdr: long running TDR worker */
- struct work_struct lr_tdr;
/** @destroy_async: do final destroy async from this worker */
struct work_struct destroy_async;
/** @resume_time: time of last resume */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 5de300b66767..1f2afad1766e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -674,14 +674,6 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
}
- /*
- * We must keep a reference for LR engines if engine is registered with
- * the GuC as jobs signal immediately and can't destroy an engine if the
- * GuC has a reference to it.
- */
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_get(q);
-
set_exec_queue_registered(q);
trace_xe_exec_queue_register(q);
if (xe_exec_queue_is_parallel(q))
@@ -854,7 +846,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
struct xe_sched_job *job = to_xe_sched_job(drm_job);
struct xe_exec_queue *q = job->q;
struct xe_guc *guc = exec_queue_to_guc(q);
- bool lr = xe_exec_queue_is_lr(q), killed_or_banned_or_wedged =
+ bool killed_or_banned_or_wedged =
exec_queue_killed_or_banned_or_wedged(q);
xe_gt_assert(guc_to_gt(guc), !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
@@ -871,15 +863,6 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
job->skip_emit = false;
}
- /*
- * We don't care about job-fence ordering in LR VMs because these fences
- * are never exported; they are used solely to keep jobs on the pending
- * list. Once a queue enters an error state, there's no need to track
- * them.
- */
- if (killed_or_banned_or_wedged && lr)
- xe_sched_job_set_error(job, -ECANCELED);
-
return job->fence;
}
@@ -923,8 +906,7 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
xe_gt_warn(q->gt, "Pending enable/disable failed to respond\n");
xe_sched_submission_start(sched);
xe_gt_reset_async(q->gt);
- if (!xe_exec_queue_is_lr(q))
- xe_sched_tdr_queue_imm(sched);
+ xe_sched_tdr_queue_imm(sched);
return;
}
@@ -950,10 +932,7 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
/** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
wake_up_all(&xe->ufence_wq);
- if (xe_exec_queue_is_lr(q))
- queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
- else
- xe_sched_tdr_queue_imm(&q->guc->sched);
+ xe_sched_tdr_queue_imm(&q->guc->sched);
}
/**
@@ -1009,78 +988,6 @@ static bool guc_submit_hint_wedged(struct xe_guc *guc)
return true;
}
-static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
-{
- struct xe_guc_exec_queue *ge =
- container_of(w, struct xe_guc_exec_queue, lr_tdr);
- struct xe_exec_queue *q = ge->q;
- struct xe_guc *guc = exec_queue_to_guc(q);
- struct xe_gpu_scheduler *sched = &ge->sched;
- struct drm_sched_job *job;
- bool wedged = false;
-
- xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
-
- if (vf_recovery(guc))
- return;
-
- trace_xe_exec_queue_lr_cleanup(q);
-
- if (!exec_queue_killed(q))
- wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
-
- /* Kill the run_job / process_msg entry points */
- xe_sched_submission_stop(sched);
-
- /*
- * Engine state now mostly stable, disable scheduling / deregister if
- * needed. This cleanup routine might be called multiple times, where
- * the actual async engine deregister drops the final engine ref.
- * Calling disable_scheduling_deregister will mark the engine as
- * destroyed and fire off the CT requests to disable scheduling /
- * deregister, which we only want to do once. We also don't want to mark
- * the engine as pending_disable again as this may race with the
- * xe_guc_deregister_done_handler() which treats it as an unexpected
- * state.
- */
- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
- struct xe_guc *guc = exec_queue_to_guc(q);
- int ret;
-
- set_exec_queue_banned(q);
- disable_scheduling_deregister(guc, q);
-
- /*
- * Must wait for scheduling to be disabled before signalling
- * any fences, if GT broken the GT reset code should signal us.
- */
- ret = wait_event_timeout(guc->ct.wq,
- !exec_queue_pending_disable(q) ||
- xe_guc_read_stopped(guc) ||
- vf_recovery(guc), HZ * 5);
- if (vf_recovery(guc))
- return;
-
- if (!ret) {
- xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n",
- q->guc->id);
- xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
- q->guc->id);
- xe_sched_submission_start(sched);
- xe_gt_reset_async(q->gt);
- return;
- }
- }
-
- if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
- xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
-
- drm_sched_for_each_pending_job(job, &sched->base, NULL)
- xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
-
- xe_sched_submission_start(sched);
-}
-
#define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
@@ -1150,8 +1057,7 @@ static void enable_scheduling(struct xe_exec_queue *q)
xe_gt_warn(guc_to_gt(guc), "Schedule enable failed to respond");
set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
- if (!xe_exec_queue_is_lr(q))
- xe_sched_tdr_queue_imm(&q->guc->sched);
+ xe_sched_tdr_queue_imm(&q->guc->sched);
}
}
@@ -1189,7 +1095,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
pid_t pid = -1;
bool wedged = false, skip_timeout_check;
- xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
/*
@@ -1209,6 +1114,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
skip_timeout_check = exec_queue_reset(q) ||
exec_queue_killed_or_banned_or_wedged(q);
+ /* LR jobs can only get here if queue has been killed or hit an error */
+ if (xe_exec_queue_is_lr(q))
+ xe_gt_assert(guc_to_gt(guc), skip_timeout_check);
+
/*
* If devcoredump not captured and GuC capture for the job is not ready
* do manual capture first and decide later if we need to use it
@@ -1400,8 +1309,6 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
xe_pm_runtime_get(guc_to_xe(guc));
trace_xe_exec_queue_destroy(q);
- if (xe_exec_queue_is_lr(q))
- cancel_work_sync(&ge->lr_tdr);
/* Confirm no work left behind accessing device structures */
cancel_delayed_work_sync(&ge->sched.base.work_tdr);
@@ -1634,9 +1541,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
if (err)
goto err_sched;
- if (xe_exec_queue_is_lr(q))
- INIT_WORK(&q->guc->lr_tdr, xe_guc_exec_queue_lr_cleanup);
-
mutex_lock(&guc->submission_state.lock);
err = alloc_guc_id(guc, q);
@@ -1890,9 +1794,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
/* Clean up lost G2H + reset engine state */
if (exec_queue_registered(q)) {
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
- else if (exec_queue_destroyed(q))
+ if (exec_queue_destroyed(q))
__guc_exec_queue_destroy(guc, q);
}
if (q->guc->suspend_pending) {
@@ -1922,9 +1824,6 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_sched_job_ban(job);
ban = true;
}
- } else if (xe_exec_queue_is_lr(q) &&
- !xe_lrc_ring_is_idle(q->lrc[0])) {
- ban = true;
}
if (ban) {
@@ -2007,8 +1906,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (pending_enable && !pending_resume &&
!exec_queue_pending_tdr_exit(q)) {
clear_exec_queue_registered(q);
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
q->guc->id);
}
@@ -2065,10 +1962,7 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
/* Stop scheduling + flush any DRM scheduler operations */
xe_sched_submission_stop(sched);
- if (xe_exec_queue_is_lr(q))
- cancel_work_sync(&q->guc->lr_tdr);
- else
- cancel_delayed_work_sync(&sched->base.work_tdr);
+ cancel_delayed_work_sync(&sched->base.work_tdr);
guc_exec_queue_revert_pending_state_change(guc, q);
@@ -2440,11 +2334,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_exec_queue_deregister_done(q);
clear_exec_queue_registered(q);
-
- if (xe_exec_queue_is_lr(q))
- xe_exec_queue_put(q);
- else
- __guc_exec_queue_destroy(guc, q);
+ __guc_exec_queue_destroy(guc, q);
}
int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 79a97b086cb2..cf2ef70fb7ce 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -182,11 +182,6 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_resubmit,
TP_ARGS(q)
);
-DEFINE_EVENT(xe_exec_queue, xe_exec_queue_lr_cleanup,
- TP_PROTO(struct xe_exec_queue *q),
- TP_ARGS(q)
-);
-
DECLARE_EVENT_CLASS(xe_sched_job,
TP_PROTO(struct xe_sched_job *job),
TP_ARGS(job),
--
2.34.1
* [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
` (6 preceding siblings ...)
2025-11-19 22:41 ` [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission Matthew Brost
@ 2025-11-19 22:41 ` Matthew Brost
2025-11-20 20:33 ` Umesh Nerlige Ramappa
7 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-19 22:41 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: niranjana.vishwanathapura, umesh.nerlige.ramappa,
christian.koenig, pstanner, dakr
We now have proper infrastructure to accurately check the LRC timestamp
without toggling the scheduling state for non-VFs. For VFs, it is still
possible to get an inaccurate view if the context is on hardware. We
guard against free-running contexts on VFs by banning jobs whose
timestamps are not moving. In addition, VFs have a timeslice quantum
that naturally triggers context switches when more than one VF is
running, thus updating the LRC timestamp.
For multi-queue, it is desirable to avoid toggling scheduling in the TDR
because this scheduling state is shared among many queues. Furthermore,
this change simplifies the GuC state machine. The trade-off for VF cases
seems worthwhile.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 100 ++++++------------------
drivers/gpu/drm/xe/xe_sched_job.c | 1 +
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
3 files changed, 28 insertions(+), 75 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 1f2afad1766e..7404716e979f 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
#define EXEC_QUEUE_STATE_KILLED (1 << 7)
#define EXEC_QUEUE_STATE_WEDGED (1 << 8)
#define EXEC_QUEUE_STATE_BANNED (1 << 9)
-#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
static bool exec_queue_registered(struct xe_exec_queue *q)
{
@@ -202,21 +200,6 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state);
}
-static bool exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_CHECK_TIMEOUT;
-}
-
-static void set_exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
-}
-
-static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
-}
-
static bool exec_queue_pending_resume(struct xe_exec_queue *q)
{
return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
@@ -232,21 +215,6 @@ static void clear_exec_queue_pending_resume(struct xe_exec_queue *q)
atomic_and(~EXEC_QUEUE_STATE_PENDING_RESUME, &q->guc->state);
}
-static bool exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_TDR_EXIT;
-}
-
-static void set_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- atomic_or(EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
-}
-
-static void clear_exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
-{
- atomic_and(~EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state);
-}
-
static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
{
return (atomic_read(&q->guc->state) &
@@ -996,7 +964,7 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
u32 ctx_timestamp, ctx_job_timestamp;
u32 timeout_ms = q->sched_props.job_timeout_ms;
u32 diff;
- u64 running_time_ms;
+ u64 running_time_ms, old_timestamp;
if (!xe_sched_job_started(job)) {
xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, not started",
@@ -1006,7 +974,17 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
return xe_sched_invalidate_job(job, 2);
}
- ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
+ ctx_timestamp = lower_32_bits(xe_lrc_update_timestamp(q->lrc[0],
+ &old_timestamp));
+ if (ctx_timestamp == job->sample_timestamp) {
+ xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
+ xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
+ q->guc->id);
+
+ return xe_sched_invalidate_job(job, 2);
+ }
+
+ job->sample_timestamp = ctx_timestamp;
ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
/*
@@ -1135,16 +1113,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
}
/*
- * XXX: Sampling timeout doesn't work in wedged mode as we have to
- * modify scheduling state to read timestamp. We could read the
- * timestamp from a register to accumulate current running time but this
- * doesn't work for SRIOV. For now assuming timeouts in wedged mode are
- * genuine timeouts.
+ * Check if job is actually timed out, if so restart job execution and TDR
*/
+ if (!skip_timeout_check && !check_timeout(q, job))
+ goto rearm;
+
if (!exec_queue_killed(q))
wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
- /* Engine state now stable, disable scheduling to check timestamp */
+ set_exec_queue_banned(q);
+
+ /* Kick job / queue off hardware */
if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
int ret;
@@ -1166,13 +1145,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (!ret || xe_guc_read_stopped(guc))
goto trigger_reset;
- /*
- * Flag communicates to G2H handler that schedule
- * disable originated from a timeout check. The G2H then
- * avoid triggering cleanup or deregistering the exec
- * queue.
- */
- set_exec_queue_check_timeout(q);
disable_scheduling(q, skip_timeout_check);
}
@@ -1201,22 +1173,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
xe_devcoredump(q, job,
"Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
q->guc->id, ret, xe_guc_read_stopped(guc));
- set_exec_queue_banned(q);
xe_gt_reset_async(q->gt);
xe_sched_tdr_queue_imm(sched);
goto rearm;
}
}
- /*
- * Check if job is actually timed out, if so restart job execution and TDR
- */
- if (!wedged && !skip_timeout_check && !check_timeout(q, job) &&
- !exec_queue_reset(q) && exec_queue_registered(q)) {
- clear_exec_queue_check_timeout(q);
- goto sched_enable;
- }
-
if (q->vm && q->vm->xef) {
process_name = q->vm->xef->process_name;
pid = q->vm->xef->pid;
@@ -1247,14 +1209,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
(q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
if (!xe_sched_invalidate_job(job, 2)) {
- clear_exec_queue_check_timeout(q);
xe_gt_reset_async(q->gt);
goto rearm;
}
}
- set_exec_queue_banned(q);
-
/* Mark all outstanding jobs as bad, thus completing them */
xe_sched_job_set_error(job, err);
drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
@@ -1269,9 +1228,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
*/
return DRM_GPU_SCHED_STAT_NO_HANG;
-sched_enable:
- set_exec_queue_pending_tdr_exit(q);
- enable_scheduling(q);
rearm:
/*
* XXX: Ideally want to adjust timeout based on current execution time
@@ -1903,8 +1859,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
q->guc->id);
}
- if (pending_enable && !pending_resume &&
- !exec_queue_pending_tdr_exit(q)) {
+ if (pending_enable && !pending_resume) {
clear_exec_queue_registered(q);
xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
q->guc->id);
@@ -1913,7 +1868,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (pending_enable) {
clear_exec_queue_enabled(q);
clear_exec_queue_pending_resume(q);
- clear_exec_queue_pending_tdr_exit(q);
clear_exec_queue_pending_enable(q);
xe_gt_dbg(guc_to_gt(guc), "Replay ENABLE - guc_id=%d",
q->guc->id);
@@ -1939,7 +1893,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
if (!pending_enable)
set_exec_queue_enabled(q);
clear_exec_queue_pending_disable(q);
- clear_exec_queue_check_timeout(q);
xe_gt_dbg(guc_to_gt(guc), "Replay DISABLE - guc_id=%d",
q->guc->id);
}
@@ -2263,13 +2216,10 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
q->guc->resume_time = ktime_get();
clear_exec_queue_pending_resume(q);
- clear_exec_queue_pending_tdr_exit(q);
clear_exec_queue_pending_enable(q);
smp_wmb();
wake_up_all(&guc->ct.wq);
} else {
- bool check_timeout = exec_queue_check_timeout(q);
-
xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
@@ -2277,11 +2227,11 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
suspend_fence_signal(q);
clear_exec_queue_pending_disable(q);
} else {
- if (exec_queue_banned(q) || check_timeout) {
+ if (exec_queue_banned(q)) {
smp_wmb();
wake_up_all(&guc->ct.wq);
}
- if (!check_timeout && exec_queue_destroyed(q)) {
+ if (exec_queue_destroyed(q)) {
/*
* Make sure to clear the pending_disable only
* after sampling the destroyed state. We want
@@ -2391,7 +2341,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
* guc_exec_queue_timedout_job.
*/
set_exec_queue_reset(q);
- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
+ if (!exec_queue_banned(q))
xe_guc_exec_queue_trigger_cleanup(q);
return 0;
@@ -2472,7 +2422,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
/* Treat the same as engine reset */
set_exec_queue_reset(q);
- if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
+ if (!exec_queue_banned(q))
xe_guc_exec_queue_trigger_cleanup(q);
return 0;
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index cb674a322113..39aec7f6d86d 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -110,6 +110,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
return ERR_PTR(-ENOMEM);
job->q = q;
+ job->sample_timestamp = U64_MAX;
kref_init(&job->refcount);
xe_exec_queue_get(job->q);
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index d26612abb4ca..ad5eee8a8cdb 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -59,6 +59,8 @@ struct xe_sched_job {
u32 lrc_seqno;
/** @migrate_flush_flags: Additional flush flags for migration jobs */
u32 migrate_flush_flags;
+ /** @sample_timestamp: Sampling of job timestamp in TDR */
+ u64 sample_timestamp;
/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
bool ring_ops_flush_tlb;
/** @ggtt: mapped in ggtt. */
--
2.34.1
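For illustration, the stuck-timestamp heuristic reduces to the following
sketch. The prev_sample pointer stands in for the job->sample_timestamp
field added above; seeding it with U64_MAX, as the patch does, means the
first TDR pass can never report a stuck timestamp:

#include <linux/types.h>

/*
 * Illustrative sketch only: each TDR pass samples the context
 * timestamp; if it matches the previous sample, the context made no
 * forward progress on hardware and the job is treated as hung.
 */
static bool my_timestamp_stuck(u64 *prev_sample, u64 ctx_timestamp)
{
	bool stuck = ctx_timestamp == *prev_sample;

	*prev_sample = ctx_timestamp;	/* remember for the next TDR pass */
	return stuck;
}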
* Re: [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state
2025-11-19 22:40 ` [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
@ 2025-11-20 17:20 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 17:20 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:40:59PM -0800, Matthew Brost wrote:
>Add helpers to check whether a scheduler is stopped and whether a job is
>in a signaled state. Expected to be used driver-side in recovery and
>debug flows.
>
>v4:
> - Reorder patch to first in series (Niranjana)
> - Also check parent fence for signaling (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>---
> drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
> 2 files changed, 34 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>index 1d4f1b822e7b..cf40c18ab433 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> */
> static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> {
>- if (!READ_ONCE(sched->pause_submit))
>+ if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_run_job);
> }
>
>@@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> */
> static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> {
>- if (!READ_ONCE(sched->pause_submit))
>+ if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_free_job);
> }
>
>diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>index fb88301b3c45..385bf34e76fe 100644
>--- a/include/drm/gpu_scheduler.h
>+++ b/include/drm/gpu_scheduler.h
>@@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> struct drm_gpu_scheduler **sched_list,
> unsigned int num_sched_list);
>
>+/* Inlines */
>+
>+/**
>+ * drm_sched_is_stopped() - Check whether the DRM scheduler is stopped
>+ * @sched: DRM scheduler
>+ *
>+ * Return: True if sched is stopped, False otherwise
>+ */
>+static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
>+{
>+ return READ_ONCE(sched->pause_submit);
>+}
>+
>+/**
>+ * drm_sched_job_is_signaled() - DRM scheduler job is signaled
>+ * @job: DRM scheduler job
>+ *
>+ * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
>+ * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
>+ * finished fence (software fence) are checked to determine signaling state.
>+ *
>+ * Return: True if job is signaled, False otherwise
>+ */
>+static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
>+{
>+ struct drm_sched_fence *s_fence = job->s_fence;
>+
>+ WARN_ON(!drm_sched_is_stopped(job->sched));
>+ return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
>+ dma_fence_is_signaled(&s_fence->finished);
>+}
>+
> #endif
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 2/8] drm/sched: Add pending job list iterator
2025-11-19 22:41 ` [PATCH v4 2/8] drm/sched: Add pending job list iterator Matthew Brost
@ 2025-11-20 17:21 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 17:21 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:00PM -0800, Matthew Brost wrote:
>Stop open-coding the pending job list walk in drivers. Add a pending
>job list iterator which safely walks the DRM scheduler's list while
>asserting the DRM scheduler is stopped.
>
>v2:
> - Fix checkpatch (CI)
>v3:
> - Drop locked version (Christian)
>v4:
> - Reorder patch (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
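One note for future readers: the intended driver-side pattern looks
roughly like the sketch below. This assumes drm_sched_wqueue_stop() /
drm_sched_wqueue_start() remain the entry points that toggle
pause_submit; the bookkeeping is illustrative only.

        struct drm_sched_job *job;
        unsigned int unsignaled = 0;

        drm_sched_wqueue_stop(sched);   /* iterator asserts drm_sched_is_stopped() */
        drm_sched_for_each_pending_job(job, sched, NULL)
                if (!drm_sched_job_is_signaled(job))
                        unsignaled++;   /* e.g. feed into recovery/debug decisions */
        drm_sched_wqueue_start(sched);  /* resume submission */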
>---
> include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
>diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>index 385bf34e76fe..9d228513d06c 100644
>--- a/include/drm/gpu_scheduler.h
>+++ b/include/drm/gpu_scheduler.h
>@@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> dma_fence_is_signaled(&s_fence->finished);
> }
>
>+/**
>+ * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
>+ * @sched: DRM scheduler associated with pending job iterator
>+ */
>+struct drm_sched_pending_job_iter {
>+ struct drm_gpu_scheduler *sched;
>+};
>+
>+/* Drivers should never call this directly */
>+static inline struct drm_sched_pending_job_iter
>+__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
>+{
>+ struct drm_sched_pending_job_iter iter = {
>+ .sched = sched,
>+ };
>+
>+ WARN_ON(!drm_sched_is_stopped(sched));
>+ return iter;
>+}
>+
>+/* Drivers should never call this directly */
>+static inline void
>+__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
>+{
>+ WARN_ON(!drm_sched_is_stopped(iter.sched));
>+}
>+
>+DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
>+ __drm_sched_pending_job_iter_end(_T),
>+ __drm_sched_pending_job_iter_begin(__sched),
>+ struct drm_gpu_scheduler *__sched);
>+static inline void *
>+class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
>+{ return _T; }
>+#define class_drm_sched_pending_job_iter_is_conditional false
>+
>+/**
>+ * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
>+ * @__job: Current pending job being iterated over
>+ * @__sched: DRM scheduler to iterate over pending jobs
>+ * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
>+ *
>+ * Iterator for each pending job in scheduler, filtering on an entity, and
>+ * enforcing that the scheduler is fully stopped
>+ */
>+#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
>+ scoped_guard(drm_sched_pending_job_iter, (__sched)) \
>+ list_for_each_entry((__job), &(__sched)->pending_list, list) \
>+ for_each_if(!(__entity) || (__job)->entity == (__entity))
>+
> #endif
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals
2025-11-19 22:41 ` [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
@ 2025-11-20 17:26 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 17:26 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:02PM -0800, Matthew Brost wrote:
>Use new pending job list iterator and new helper functions in Xe to
>avoid reaching into DRM scheduler internals.
>
>Part of this change involves removing pending jobs debug information
>from debugfs and devcoredump. As agreed, the pending job list should
>only be accessed when the scheduler is stopped. However, it's not
>straightforward to determine whether the scheduler is stopped from the
>shared debugfs/devcoredump code path. Additionally, the pending job list
>provides little useful information, as pending jobs can be inferred from
>seqnos and ring head/tail positions. Therefore, this debug information
>is being removed.
>
>v4:
> - Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>---
> drivers/gpu/drm/xe/xe_gpu_scheduler.c | 4 +-
> drivers/gpu/drm/xe/xe_gpu_scheduler.h | 34 +++--------
> drivers/gpu/drm/xe/xe_guc_submit.c | 78 +++++-------------------
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 11 ----
> drivers/gpu/drm/xe/xe_hw_fence.c | 16 -----
> drivers/gpu/drm/xe/xe_hw_fence.h | 2 -
> 6 files changed, 24 insertions(+), 121 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
>index f4f23317191f..9c8004d5dd91 100644
>--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
>+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
>@@ -7,7 +7,7 @@
>
> static void xe_sched_process_msg_queue(struct xe_gpu_scheduler *sched)
> {
>- if (!READ_ONCE(sched->base.pause_submit))
>+ if (!drm_sched_is_stopped(&sched->base))
> queue_work(sched->base.submit_wq, &sched->work_process_msg);
> }
>
>@@ -43,7 +43,7 @@ static void xe_sched_process_msg_work(struct work_struct *w)
> container_of(w, struct xe_gpu_scheduler, work_process_msg);
> struct xe_sched_msg *msg;
>
>- if (READ_ONCE(sched->base.pause_submit))
>+ if (drm_sched_is_stopped(&sched->base))
> return;
>
> msg = xe_sched_get_msg(sched);
>diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
>index b971b6b69419..583372a78140 100644
>--- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
>+++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
>@@ -55,14 +55,10 @@ static inline void xe_sched_resubmit_jobs(struct xe_gpu_scheduler *sched)
> {
> struct drm_sched_job *s_job;
>
>- list_for_each_entry(s_job, &sched->base.pending_list, list) {
>- struct drm_sched_fence *s_fence = s_job->s_fence;
>- struct dma_fence *hw_fence = s_fence->parent;
>-
>+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL)
> if (to_xe_sched_job(s_job)->skip_emit ||
>- (hw_fence && !dma_fence_is_signaled(hw_fence)))
>+ !drm_sched_job_is_signaled(s_job))
> sched->base.ops->run_job(s_job);
>- }
> }
>
> static inline bool
>@@ -71,14 +67,6 @@ xe_sched_invalidate_job(struct xe_sched_job *job, int threshold)
> return drm_sched_invalidate_job(&job->drm, threshold);
> }
>
>-static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
>- struct xe_sched_job *job)
>-{
>- spin_lock(&sched->base.job_list_lock);
>- list_add(&job->drm.list, &sched->base.pending_list);
>- spin_unlock(&sched->base.job_list_lock);
>-}
>-
> /**
> * xe_sched_first_pending_job() - Find first pending job which is unsignaled
> * @sched: Xe GPU scheduler
>@@ -88,21 +76,13 @@ static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched,
> static inline
> struct xe_sched_job *xe_sched_first_pending_job(struct xe_gpu_scheduler *sched)
> {
>- struct xe_sched_job *job, *r_job = NULL;
>-
>- spin_lock(&sched->base.job_list_lock);
>- list_for_each_entry(job, &sched->base.pending_list, drm.list) {
>- struct drm_sched_fence *s_fence = job->drm.s_fence;
>- struct dma_fence *hw_fence = s_fence->parent;
>+ struct drm_sched_job *job;
>
>- if (hw_fence && !dma_fence_is_signaled(hw_fence)) {
>- r_job = job;
>- break;
>- }
>- }
>- spin_unlock(&sched->base.job_list_lock);
>+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
>+ if (!drm_sched_job_is_signaled(job))
>+ return to_xe_sched_job(job);
>
>- return r_job;
>+ return NULL;
> }
>
> static inline int
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index d4ffdb71ef3d..3ee35d4873bc 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -1032,7 +1032,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> struct xe_exec_queue *q = ge->q;
> struct xe_guc *guc = exec_queue_to_guc(q);
> struct xe_gpu_scheduler *sched = &ge->sched;
>- struct xe_sched_job *job;
>+ struct drm_sched_job *job;
> bool wedged = false;
>
> xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
>@@ -1091,16 +1091,10 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
> xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
>
>- xe_hw_fence_irq_stop(q->fence_irq);
>+ drm_sched_for_each_pending_job(job, &sched->base, NULL)
>+ xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
>
> xe_sched_submission_start(sched);
>-
>- spin_lock(&sched->base.job_list_lock);
>- list_for_each_entry(job, &sched->base.pending_list, drm.list)
>- xe_sched_job_set_error(job, -ECANCELED);
>- spin_unlock(&sched->base.job_list_lock);
>-
>- xe_hw_fence_irq_start(q->fence_irq);
> }
>
> #define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
>@@ -1219,7 +1213,7 @@ static enum drm_gpu_sched_stat
> guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> {
> struct xe_sched_job *job = to_xe_sched_job(drm_job);
>- struct xe_sched_job *tmp_job;
>+ struct drm_sched_job *tmp_job;
> struct xe_exec_queue *q = job->q;
> struct xe_gpu_scheduler *sched = &q->guc->sched;
> struct xe_guc *guc = exec_queue_to_guc(q);
>@@ -1228,7 +1222,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> unsigned int fw_ref;
> int err = -ETIME;
> pid_t pid = -1;
>- int i = 0;
> bool wedged = false, skip_timeout_check;
>
> xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
>@@ -1395,28 +1388,19 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> __deregister_exec_queue(guc, q);
> }
>
>- /* Stop fence signaling */
>- xe_hw_fence_irq_stop(q->fence_irq);
>+ /* Mark all outstanding jobs as bad, thus completing them */
>+ xe_sched_job_set_error(job, err);
>+ drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
>+ xe_sched_job_set_error(to_xe_sched_job(tmp_job), -ECANCELED);
>
>- /*
>- * Fence state now stable, stop / start scheduler which cleans up any
>- * fences that are complete
>- */
>- xe_sched_add_pending_job(sched, job);
> xe_sched_submission_start(sched);
>-
> xe_guc_exec_queue_trigger_cleanup(q);
>
>- /* Mark all outstanding jobs as bad, thus completing them */
>- spin_lock(&sched->base.job_list_lock);
>- list_for_each_entry(tmp_job, &sched->base.pending_list, drm.list)
>- xe_sched_job_set_error(tmp_job, !i++ ? err : -ECANCELED);
>- spin_unlock(&sched->base.job_list_lock);
>-
>- /* Start fence signaling */
>- xe_hw_fence_irq_start(q->fence_irq);
>-
>- return DRM_GPU_SCHED_STAT_RESET;
>+ /*
>+ * We want the job added back to the pending list so it gets freed; this
>+ * is what DRM_GPU_SCHED_STAT_NO_HANG does.
>+ */
>+ return DRM_GPU_SCHED_STAT_NO_HANG;
>
> sched_enable:
> set_exec_queue_pending_tdr_exit(q);
>@@ -2244,7 +2228,7 @@ static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
> struct drm_sched_job *s_job;
> struct xe_sched_job *job = NULL;
>
>- list_for_each_entry(s_job, &sched->base.pending_list, list) {
>+ drm_sched_for_each_pending_job(s_job, &sched->base, NULL) {
> job = to_xe_sched_job(s_job);
>
> xe_gt_dbg(guc_to_gt(guc), "Replay JOB - guc_id=%d, seqno=%d",
>@@ -2349,7 +2333,7 @@ void xe_guc_submit_unpause(struct xe_guc *guc)
> * created after resfix done.
> */
> if (q->guc->id != index ||
>- !READ_ONCE(q->guc->sched.base.pause_submit))
>+ !drm_sched_is_stopped(&q->guc->sched.base))
> continue;
>
> guc_exec_queue_unpause(guc, q);
>@@ -2771,30 +2755,6 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
> if (snapshot->parallel_execution)
> guc_exec_queue_wq_snapshot_capture(q, snapshot);
>
>- spin_lock(&sched->base.job_list_lock);
>- snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
>- snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
>- sizeof(struct pending_list_snapshot),
>- GFP_ATOMIC);
>-
>- if (snapshot->pending_list) {
>- struct xe_sched_job *job_iter;
>-
>- i = 0;
>- list_for_each_entry(job_iter, &sched->base.pending_list, drm.list) {
>- snapshot->pending_list[i].seqno =
>- xe_sched_job_seqno(job_iter);
>- snapshot->pending_list[i].fence =
>- dma_fence_is_signaled(job_iter->fence) ? 1 : 0;
>- snapshot->pending_list[i].finished =
>- dma_fence_is_signaled(&job_iter->drm.s_fence->finished)
>- ? 1 : 0;
>- i++;
>- }
>- }
>-
>- spin_unlock(&sched->base.job_list_lock);
>-
> return snapshot;
> }
>
>@@ -2852,13 +2812,6 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
>
> if (snapshot->parallel_execution)
> guc_exec_queue_wq_snapshot_print(snapshot, p);
>-
>- for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
>- i++)
>- drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
>- snapshot->pending_list[i].seqno,
>- snapshot->pending_list[i].fence,
>- snapshot->pending_list[i].finished);
> }
>
> /**
>@@ -2881,7 +2834,6 @@ void xe_guc_exec_queue_snapshot_free(struct xe_guc_submit_exec_queue_snapshot *s
> xe_lrc_snapshot_free(snapshot->lrc[i]);
> kfree(snapshot->lrc);
> }
>- kfree(snapshot->pending_list);
> kfree(snapshot);
> }
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
>index dc7456c34583..0b08c79cf3b9 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
>+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
>@@ -61,12 +61,6 @@ struct guc_submit_parallel_scratch {
> u32 wq[WQ_SIZE / sizeof(u32)];
> };
>
>-struct pending_list_snapshot {
>- u32 seqno;
>- bool fence;
>- bool finished;
>-};
>-
> /**
> * struct xe_guc_submit_exec_queue_snapshot - Snapshot for devcoredump
> */
>@@ -134,11 +128,6 @@ struct xe_guc_submit_exec_queue_snapshot {
> /** @wq: Workqueue Items */
> u32 wq[WQ_SIZE / sizeof(u32)];
> } parallel;
>-
>- /** @pending_list_size: Size of the pending list snapshot array */
>- int pending_list_size;
>- /** @pending_list: snapshot of the pending list info */
>- struct pending_list_snapshot *pending_list;
> };
>
> #endif
>diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c
>index b2a0c46dfcd4..e65dfcdfdbc5 100644
>--- a/drivers/gpu/drm/xe/xe_hw_fence.c
>+++ b/drivers/gpu/drm/xe/xe_hw_fence.c
>@@ -110,22 +110,6 @@ void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq)
> irq_work_queue(&irq->work);
> }
>
>-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq)
>-{
>- spin_lock_irq(&irq->lock);
>- irq->enabled = false;
>- spin_unlock_irq(&irq->lock);
>-}
>-
>-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq)
>-{
>- spin_lock_irq(&irq->lock);
>- irq->enabled = true;
>- spin_unlock_irq(&irq->lock);
>-
>- irq_work_queue(&irq->work);
>-}
>-
> void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
> struct xe_hw_fence_irq *irq, const char *name)
> {
>diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h
>index f13a1c4982c7..599492c13f80 100644
>--- a/drivers/gpu/drm/xe/xe_hw_fence.h
>+++ b/drivers/gpu/drm/xe/xe_hw_fence.h
>@@ -17,8 +17,6 @@ void xe_hw_fence_module_exit(void);
> void xe_hw_fence_irq_init(struct xe_hw_fence_irq *irq);
> void xe_hw_fence_irq_finish(struct xe_hw_fence_irq *irq);
> void xe_hw_fence_irq_run(struct xe_hw_fence_irq *irq);
>-void xe_hw_fence_irq_stop(struct xe_hw_fence_irq *irq);
>-void xe_hw_fence_irq_start(struct xe_hw_fence_irq *irq);
>
> void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt,
> struct xe_hw_fence_irq *irq, const char *name);
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission
2025-11-19 22:41 ` [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission Matthew Brost
@ 2025-11-20 18:53 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 18:53 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:05PM -0800, Matthew Brost wrote:
>Now that LR jobs are tracked by the DRM scheduler, there's no longer a
>need to special-case LR queues. This change removes all LR
>queue-specific handling, including dedicated TDR logic, reference
>counting schemes, and other related mechanisms.
>
>v4:
> - Remove xe_exec_queue_lr_cleanup tracepoint (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
LGTM.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 -
> drivers/gpu/drm/xe/xe_guc_submit.c | 132 ++-----------------
> drivers/gpu/drm/xe/xe_trace.h | 5 -
> 3 files changed, 11 insertions(+), 128 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
>index a3b034e4b205..fd0915ed8eb1 100644
>--- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
>+++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
>@@ -33,8 +33,6 @@ struct xe_guc_exec_queue {
> */
> #define MAX_STATIC_MSG_TYPE 3
> struct xe_sched_msg static_msgs[MAX_STATIC_MSG_TYPE];
>- /** @lr_tdr: long running TDR worker */
>- struct work_struct lr_tdr;
> /** @destroy_async: do final destroy async from this worker */
> struct work_struct destroy_async;
> /** @resume_time: time of last resume */
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index 5de300b66767..1f2afad1766e 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -674,14 +674,6 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> parallel_write(xe, map, wq_desc.wq_status, WQ_STATUS_ACTIVE);
> }
>
>- /*
>- * We must keep a reference for LR engines if engine is registered with
>- * the GuC as jobs signal immediately and can't destroy an engine if the
>- * GuC has a reference to it.
>- */
>- if (xe_exec_queue_is_lr(q))
>- xe_exec_queue_get(q);
>-
> set_exec_queue_registered(q);
> trace_xe_exec_queue_register(q);
> if (xe_exec_queue_is_parallel(q))
>@@ -854,7 +846,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> struct xe_sched_job *job = to_xe_sched_job(drm_job);
> struct xe_exec_queue *q = job->q;
> struct xe_guc *guc = exec_queue_to_guc(q);
>- bool lr = xe_exec_queue_is_lr(q), killed_or_banned_or_wedged =
>+ bool killed_or_banned_or_wedged =
> exec_queue_killed_or_banned_or_wedged(q);
>
> xe_gt_assert(guc_to_gt(guc), !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
>@@ -871,15 +863,6 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> job->skip_emit = false;
> }
>
>- /*
>- * We don't care about job-fence ordering in LR VMs because these fences
>- * are never exported; they are used solely to keep jobs on the pending
>- * list. Once a queue enters an error state, there's no need to track
>- * them.
>- */
>- if (killed_or_banned_or_wedged && lr)
>- xe_sched_job_set_error(job, -ECANCELED);
>-
> return job->fence;
> }
>
>@@ -923,8 +906,7 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> xe_gt_warn(q->gt, "Pending enable/disable failed to respond\n");
> xe_sched_submission_start(sched);
> xe_gt_reset_async(q->gt);
>- if (!xe_exec_queue_is_lr(q))
>- xe_sched_tdr_queue_imm(sched);
>+ xe_sched_tdr_queue_imm(sched);
> return;
> }
>
>@@ -950,10 +932,7 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
> wake_up_all(&xe->ufence_wq);
>
>- if (xe_exec_queue_is_lr(q))
>- queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
>- else
>- xe_sched_tdr_queue_imm(&q->guc->sched);
>+ xe_sched_tdr_queue_imm(&q->guc->sched);
> }
>
> /**
>@@ -1009,78 +988,6 @@ static bool guc_submit_hint_wedged(struct xe_guc *guc)
> return true;
> }
>
>-static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>-{
>- struct xe_guc_exec_queue *ge =
>- container_of(w, struct xe_guc_exec_queue, lr_tdr);
>- struct xe_exec_queue *q = ge->q;
>- struct xe_guc *guc = exec_queue_to_guc(q);
>- struct xe_gpu_scheduler *sched = &ge->sched;
>- struct drm_sched_job *job;
>- bool wedged = false;
>-
>- xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q));
>-
>- if (vf_recovery(guc))
>- return;
>-
>- trace_xe_exec_queue_lr_cleanup(q);
>-
>- if (!exec_queue_killed(q))
>- wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>-
>- /* Kill the run_job / process_msg entry points */
>- xe_sched_submission_stop(sched);
>-
>- /*
>- * Engine state now mostly stable, disable scheduling / deregister if
>- * needed. This cleanup routine might be called multiple times, where
>- * the actual async engine deregister drops the final engine ref.
>- * Calling disable_scheduling_deregister will mark the engine as
>- * destroyed and fire off the CT requests to disable scheduling /
>- * deregister, which we only want to do once. We also don't want to mark
>- * the engine as pending_disable again as this may race with the
>- * xe_guc_deregister_done_handler() which treats it as an unexpected
>- * state.
>- */
>- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
>- struct xe_guc *guc = exec_queue_to_guc(q);
>- int ret;
>-
>- set_exec_queue_banned(q);
>- disable_scheduling_deregister(guc, q);
>-
>- /*
>- * Must wait for scheduling to be disabled before signalling
>- * any fences, if GT broken the GT reset code should signal us.
>- */
>- ret = wait_event_timeout(guc->ct.wq,
>- !exec_queue_pending_disable(q) ||
>- xe_guc_read_stopped(guc) ||
>- vf_recovery(guc), HZ * 5);
>- if (vf_recovery(guc))
>- return;
>-
>- if (!ret) {
>- xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n",
>- q->guc->id);
>- xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
>- q->guc->id);
>- xe_sched_submission_start(sched);
>- xe_gt_reset_async(q->gt);
>- return;
>- }
>- }
>-
>- if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
>- xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
>-
>- drm_sched_for_each_pending_job(job, &sched->base, NULL)
>- xe_sched_job_set_error(to_xe_sched_job(job), -ECANCELED);
>-
>- xe_sched_submission_start(sched);
>-}
>-
> #define ADJUST_FIVE_PERCENT(__t) mul_u64_u32_div(__t, 105, 100)
>
> static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
>@@ -1150,8 +1057,7 @@ static void enable_scheduling(struct xe_exec_queue *q)
> xe_gt_warn(guc_to_gt(guc), "Schedule enable failed to respond");
> set_exec_queue_banned(q);
> xe_gt_reset_async(q->gt);
>- if (!xe_exec_queue_is_lr(q))
>- xe_sched_tdr_queue_imm(&q->guc->sched);
>+ xe_sched_tdr_queue_imm(&q->guc->sched);
> }
> }
>
>@@ -1189,7 +1095,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> pid_t pid = -1;
> bool wedged = false, skip_timeout_check;
>
>- xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
> xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
>
> /*
>@@ -1209,6 +1114,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> skip_timeout_check = exec_queue_reset(q) ||
> exec_queue_killed_or_banned_or_wedged(q);
>
>+ /* LR jobs can only get here if queue has been killed or hit an error */
>+ if (xe_exec_queue_is_lr(q))
>+ xe_gt_assert(guc_to_gt(guc), skip_timeout_check);
>+
> /*
> * If devcoredump not captured and GuC capture for the job is not ready
> * do manual capture first and decide later if we need to use it
>@@ -1400,8 +1309,6 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
> xe_pm_runtime_get(guc_to_xe(guc));
> trace_xe_exec_queue_destroy(q);
>
>- if (xe_exec_queue_is_lr(q))
>- cancel_work_sync(&ge->lr_tdr);
> /* Confirm no work left behind accessing device structures */
> cancel_delayed_work_sync(&ge->sched.base.work_tdr);
>
>@@ -1634,9 +1541,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> if (err)
> goto err_sched;
>
>- if (xe_exec_queue_is_lr(q))
>- INIT_WORK(&q->guc->lr_tdr, xe_guc_exec_queue_lr_cleanup);
>-
> mutex_lock(&guc->submission_state.lock);
>
> err = alloc_guc_id(guc, q);
>@@ -1890,9 +1794,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
>
> /* Clean up lost G2H + reset engine state */
> if (exec_queue_registered(q)) {
>- if (xe_exec_queue_is_lr(q))
>- xe_exec_queue_put(q);
>- else if (exec_queue_destroyed(q))
>+ if (exec_queue_destroyed(q))
> __guc_exec_queue_destroy(guc, q);
> }
> if (q->guc->suspend_pending) {
>@@ -1922,9 +1824,6 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
> trace_xe_sched_job_ban(job);
> ban = true;
> }
>- } else if (xe_exec_queue_is_lr(q) &&
>- !xe_lrc_ring_is_idle(q->lrc[0])) {
>- ban = true;
> }
>
> if (ban) {
>@@ -2007,8 +1906,6 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> if (pending_enable && !pending_resume &&
> !exec_queue_pending_tdr_exit(q)) {
> clear_exec_queue_registered(q);
>- if (xe_exec_queue_is_lr(q))
>- xe_exec_queue_put(q);
> xe_gt_dbg(guc_to_gt(guc), "Replay REGISTER - guc_id=%d",
> q->guc->id);
> }
>@@ -2065,10 +1962,7 @@ static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
>
> /* Stop scheduling + flush any DRM scheduler operations */
> xe_sched_submission_stop(sched);
>- if (xe_exec_queue_is_lr(q))
>- cancel_work_sync(&q->guc->lr_tdr);
>- else
>- cancel_delayed_work_sync(&sched->base.work_tdr);
>+ cancel_delayed_work_sync(&sched->base.work_tdr);
>
> guc_exec_queue_revert_pending_state_change(guc, q);
>
>@@ -2440,11 +2334,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
> trace_xe_exec_queue_deregister_done(q);
>
> clear_exec_queue_registered(q);
>-
>- if (xe_exec_queue_is_lr(q))
>- xe_exec_queue_put(q);
>- else
>- __guc_exec_queue_destroy(guc, q);
>+ __guc_exec_queue_destroy(guc, q);
> }
>
> int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>index 79a97b086cb2..cf2ef70fb7ce 100644
>--- a/drivers/gpu/drm/xe/xe_trace.h
>+++ b/drivers/gpu/drm/xe/xe_trace.h
>@@ -182,11 +182,6 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_resubmit,
> TP_ARGS(q)
> );
>
>-DEFINE_EVENT(xe_exec_queue, xe_exec_queue_lr_cleanup,
>- TP_PROTO(struct xe_exec_queue *q),
>- TP_ARGS(q)
>-);
>-
> DECLARE_EVENT_CLASS(xe_sched_job,
> TP_PROTO(struct xe_sched_job *job),
> TP_ARGS(job),
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running
2025-11-19 22:41 ` [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
@ 2025-11-20 19:48 ` Niranjana Vishwanathapura
2025-11-21 22:06 ` Matthew Brost
0 siblings, 1 reply; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 19:48 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:03PM -0800, Matthew Brost wrote:
>If the firmware is not running during TDR (e.g., when the driver is
>unloading), there's no need to toggle scheduling in the GuC. In such
>cases, skip this step.
>
>v4:
> - Bail on wait UC not running (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index 3ee35d4873bc..648c9ea06749 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -1277,7 +1277,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> if (exec_queue_reset(q))
> err = -EIO;
>
>- if (!exec_queue_destroyed(q)) {
>+ if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
> /*
> * Wait for any pending G2H to flush out before
> * modifying state
>@@ -1312,6 +1312,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> */
> smp_rmb();
> ret = wait_event_timeout(guc->ct.wq,
>+ !xe_uc_fw_is_running(&guc->fw) ||
> !exec_queue_pending_disable(q) ||
> xe_guc_read_stopped(guc) ||
> vf_recovery(guc), HZ * 5);
What if the wait exits because of '!xe_uc_fw_is_running(&guc->fw)'?
It is not clear where the control goes in that case, based on all the
'if' checks that follow this wait. Should there be a specific check
for '!xe_uc_fw_is_running(&guc->fw)' following the wait here?
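Something along these lines, purely as an illustration (the label and
its exact placement are hypothetical):

        ret = wait_event_timeout(guc->ct.wq,
                                 !xe_uc_fw_is_running(&guc->fw) ||
                                 !exec_queue_pending_disable(q) ||
                                 xe_guc_read_stopped(guc) ||
                                 vf_recovery(guc), HZ * 5);
        if (!xe_uc_fw_is_running(&guc->fw)) {
                /* FW gone (e.g. unbind): skip the remaining GuC handshakes */
                set_exec_queue_banned(q);
                goto trigger_cleanup;   /* hypothetical label */
        }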
Niranjana
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR
2025-11-19 22:41 ` [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR Matthew Brost
@ 2025-11-20 19:50 ` Niranjana Vishwanathapura
2025-11-21 21:25 ` Matthew Brost
0 siblings, 1 reply; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-20 19:50 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:04PM -0800, Matthew Brost wrote:
>Deregistering queues in the TDR introduces unnecessary complexity,
>requiring reference-counting techniques to function correctly,
>particularly to prevent use-after-free (UAF) issues while a
>deregistration initiated from the TDR is in progress.
>
>All that's needed in the TDR is to kick the queue off the hardware,
>which is achieved by disabling scheduling. Queue deregistration should
>be handled in a single, well-defined point in the cleanup path, tied to
>the queue's reference count.
>
>v4:
> - Explain why extra ref were needed prior to this patch (Niranjana)
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_submit.c | 65 +++++-------------------------
> 1 file changed, 9 insertions(+), 56 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index 648c9ea06749..5de300b66767 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -69,9 +69,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> #define EXEC_QUEUE_STATE_BANNED (1 << 9)
> #define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
>-#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
>-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12)
>-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13)
>+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
>+#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
>
> static bool exec_queue_registered(struct xe_exec_queue *q)
> {
>@@ -218,21 +217,6 @@ static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
> atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
> }
>
>-static bool exec_queue_extra_ref(struct xe_exec_queue *q)
>-{
>- return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
>-}
>-
>-static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
>-{
>- atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
>-}
>-
>-static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
>-{
>- atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
>-}
>-
> static bool exec_queue_pending_resume(struct xe_exec_queue *q)
> {
> return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
>@@ -1190,25 +1174,6 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
> G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> }
>
>-static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>-{
>- u32 action[] = {
>- XE_GUC_ACTION_DEREGISTER_CONTEXT,
>- q->guc->id,
>- };
>-
>- xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
>- xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
>- xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>-
>- set_exec_queue_destroyed(q);
>- trace_xe_exec_queue_deregister(q);
>-
>- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>- G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>-}
>-
> static enum drm_gpu_sched_stat
> guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> {
>@@ -1225,6 +1190,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> bool wedged = false, skip_timeout_check;
>
> xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
>+ xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
Is it always guaranteed? What if we get here because the TDR is triggered
by some error notification from the GuC and, before we get here, the
exec_queue gets destroyed in the CLEANUP message handler? I am not
sure we can be sure this is race-proof.
Niranjana
>
> /*
> * TDR has fired before free job worker. Common if exec queue
>@@ -1241,8 +1207,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>
> /* Must check all state after stopping scheduler */
> skip_timeout_check = exec_queue_reset(q) ||
>- exec_queue_killed_or_banned_or_wedged(q) ||
>- exec_queue_destroyed(q);
>+ exec_queue_killed_or_banned_or_wedged(q);
>
> /*
> * If devcoredump not captured and GuC capture for the job is not ready
>@@ -1271,13 +1236,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>
> /* Engine state now stable, disable scheduling to check timestamp */
>- if (!wedged && exec_queue_registered(q)) {
>+ if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
> int ret;
>
> if (exec_queue_reset(q))
> err = -EIO;
>
>- if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
>+ if (xe_uc_fw_is_running(&guc->fw)) {
> /*
> * Wait for any pending G2H to flush out before
> * modifying state
>@@ -1327,8 +1292,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> xe_devcoredump(q, job,
> "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
> q->guc->id, ret, xe_guc_read_stopped(guc));
>- set_exec_queue_extra_ref(q);
>- xe_exec_queue_get(q); /* GT reset owns this */
> set_exec_queue_banned(q);
> xe_gt_reset_async(q->gt);
> xe_sched_tdr_queue_imm(sched);
>@@ -1381,13 +1344,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> }
> }
>
>- /* Finish cleaning up exec queue via deregister */
> set_exec_queue_banned(q);
>- if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
>- set_exec_queue_extra_ref(q);
>- xe_exec_queue_get(q);
>- __deregister_exec_queue(guc, q);
>- }
>
> /* Mark all outstanding jobs as bad, thus completing them */
> xe_sched_job_set_error(job, err);
>@@ -1933,7 +1890,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
>
> /* Clean up lost G2H + reset engine state */
> if (exec_queue_registered(q)) {
>- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
>+ if (xe_exec_queue_is_lr(q))
> xe_exec_queue_put(q);
> else if (exec_queue_destroyed(q))
> __guc_exec_queue_destroy(guc, q);
>@@ -2067,11 +2024,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
>
> if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
> clear_exec_queue_destroyed(q);
>- if (exec_queue_extra_ref(q))
>- xe_exec_queue_put(q);
>- else
>- q->guc->needs_cleanup = true;
>- clear_exec_queue_extra_ref(q);
>+ q->guc->needs_cleanup = true;
> xe_gt_dbg(guc_to_gt(guc), "Replay CLEANUP - guc_id=%d",
> q->guc->id);
> }
>@@ -2488,7 +2441,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
>
> clear_exec_queue_registered(q);
>
>- if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
>+ if (xe_exec_queue_is_lr(q))
> xe_exec_queue_put(q);
> else
> __guc_exec_queue_destroy(guc, q);
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-11-19 22:41 ` [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
@ 2025-11-20 20:33 ` Umesh Nerlige Ramappa
2025-11-21 21:33 ` Matthew Brost
0 siblings, 1 reply; 22+ messages in thread
From: Umesh Nerlige Ramappa @ 2025-11-20 20:33 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, niranjana.vishwanathapura, christian.koenig,
pstanner, dakr
On Wed, Nov 19, 2025 at 02:41:06PM -0800, Matthew Brost wrote:
>We now have proper infrastructure to accurately check the LRC timestamp
>without toggling the scheduling state for non-VFs. For VFs, it is still
>possible to get an inaccurate view if the context is on hardware. We
>guard against free-running contexts on VFs by banning jobs whose
>timestamps are not moving. In addition, VFs have a timeslice quantum
>that naturally triggers context switches when more than one VF is
>running, thus updating the LRC timestamp.
>
>For multi-queue, it is desirable to avoid scheduling toggling in the TDR
>because this scheduling state is shared among many queues. Furthermore,
>this change simplifies the GuC state machine. The trade-off for VF cases
>seems worthwhile.
>
>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_submit.c | 100 ++++++------------------
> drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> 3 files changed, 28 insertions(+), 75 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>index 1f2afad1766e..7404716e979f 100644
>--- a/drivers/gpu/drm/xe/xe_guc_submit.c
>+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>@@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> #define EXEC_QUEUE_STATE_KILLED (1 << 7)
> #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> #define EXEC_QUEUE_STATE_BANNED (1 << 9)
>-#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
>-#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
>-#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
>+#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
>
... snip ...
> static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue
> *q)
> {
> return (atomic_read(&q->guc->state) &
>@@ -996,7 +964,7 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> u32 ctx_timestamp, ctx_job_timestamp;
> u32 timeout_ms = q->sched_props.job_timeout_ms;
> u32 diff;
>- u64 running_time_ms;
>+ u64 running_time_ms, old_timestamp;
>
> if (!xe_sched_job_started(job)) {
> xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, not started",
>@@ -1006,7 +974,17 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> return xe_sched_invalidate_job(job, 2);
> }
>
>- ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
>+ ctx_timestamp = lower_32_bits(xe_lrc_update_timestamp(q->lrc[0],
>+ &old_timestamp));
Reg: xe_lrc_update_timestamp()
The way context utilization uses this helper is to accumulate the
'new - old' values each time the function is called. In the example
below, context utilization will lose some ticks.
Example:
1. This code calls xe_lrc_update_timestamp() to sample the timestamp for
TDR purposes. Say context/job is running, then the lrc->ctx_timestamp is
updated (moved forward).
2. The context utilization code calls xe_lrc_update_timestamp(). Within
this helper
- old_ts is sampled as lrc->ctx_timestamp
- new_ts is calculated based on whether the job/context is active
- lrc->ctx_timestamp is updated to the new value.
The result is that we lose one chunk of utilization because of the
previous call from the TDR path. I think some refactor would be needed
to fix that.
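Schematically, with the helper used as in the patch (T0/T1 are
illustrative tick values):

        u64 old_ts, new_ts;

        /* (1) TDR samples; lrc->ctx_timestamp moves forward, T0 -> T1 */
        new_ts = xe_lrc_update_timestamp(lrc, &old_ts); /* old_ts == T0 */

        /* (2) utilization samples later; old_ts is now T1, not T0 */
        new_ts = xe_lrc_update_timestamp(lrc, &old_ts); /* old_ts == T1 */
        run_ticks += new_ts - old_ts;   /* the T0..T1 chunk is never accounted */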
The other comment you already mentioned offline is locking, which I
think we should add to protect lrc->ctx_timestamp. I don't know if a
refactor will avoid the lock though.
Thanks,
Umesh
>+ if (ctx_timestamp == job->sample_timestamp) {
>+ xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
>+ xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
>+ q->guc->id);
>+
>+ return xe_sched_invalidate_job(job, 2);
>+ }
>+
>+ job->sample_timestamp = ctx_timestamp;
> ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
>
> /*
>@@ -1135,16 +1113,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> }
>
... snip ...
>diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
>b/drivers/gpu/drm/xe/xe_sched_job_types.h
>index d26612abb4ca..ad5eee8a8cdb 100644
>--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
>+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
>@@ -59,6 +59,8 @@ struct xe_sched_job {
> u32 lrc_seqno;
> /** @migrate_flush_flags: Additional flush flags for migration jobs */
> u32 migrate_flush_flags;
>+ /** @sample_timestamp: Sampling of job timestamp in TDR */
>+ u64 sample_timestamp;
> /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> bool ring_ops_flush_tlb;
> /** @ggtt: mapped in ggtt. */
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR
2025-11-20 19:50 ` Niranjana Vishwanathapura
@ 2025-11-21 21:25 ` Matthew Brost
2025-11-21 23:51 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-21 21:25 UTC (permalink / raw)
To: Niranjana Vishwanathapura
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Thu, Nov 20, 2025 at 11:50:32AM -0800, Niranjana Vishwanathapura wrote:
> On Wed, Nov 19, 2025 at 02:41:04PM -0800, Matthew Brost wrote:
> > Deregistering queues in the TDR introduces unnecessary complexity,
> > requiring reference-counting techniques to function correctly,
> > particularly to prevent use-after-free (UAF) issues while a
> > deregistration initiated from the TDR is in progress.
> >
> > All that's needed in the TDR is to kick the queue off the hardware,
> > which is achieved by disabling scheduling. Queue deregistration should
> > be handled in a single, well-defined point in the cleanup path, tied to
> > the queue's reference count.
> >
> > v4:
> > - Explain why extra ref were needed prior to this patch (Niranjana)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 65 +++++-------------------------
> > 1 file changed, 9 insertions(+), 56 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 648c9ea06749..5de300b66767 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -69,9 +69,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> > #define EXEC_QUEUE_STATE_BANNED (1 << 9)
> > #define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
> > -#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
> > -#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12)
> > -#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13)
> > +#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
> > +#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
> >
> > static bool exec_queue_registered(struct xe_exec_queue *q)
> > {
> > @@ -218,21 +217,6 @@ static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
> > atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
> > }
> >
> > -static bool exec_queue_extra_ref(struct xe_exec_queue *q)
> > -{
> > - return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
> > -}
> > -
> > -static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
> > -{
> > - atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
> > -}
> > -
> > -static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
> > -{
> > - atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
> > -}
> > -
> > static bool exec_queue_pending_resume(struct xe_exec_queue *q)
> > {
> > return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
> > @@ -1190,25 +1174,6 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
> > G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> > }
> >
> > -static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> > -{
> > - u32 action[] = {
> > - XE_GUC_ACTION_DEREGISTER_CONTEXT,
> > - q->guc->id,
> > - };
> > -
> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
> > - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
> > -
> > - set_exec_queue_destroyed(q);
> > - trace_xe_exec_queue_deregister(q);
> > -
> > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> > -}
> > -
> > static enum drm_gpu_sched_stat
> > guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > {
> > @@ -1225,6 +1190,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > bool wedged = false, skip_timeout_check;
> >
> > xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
> > + xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
>
> Is it always guaranteed? What if we get here because TDR is triggered
> by some error notification from the GuC and befor we get here, the
> exec_queue gets destroyed in the CLEANUP message handler? I am not
> sure we can we be sure here that it will be race proof.
>
Jobs hold a reference to the queue. We have a job here, and the CLEANUP
message (which sets destroyed) is tied to the reference count. So if
this assert pops, we have a problem. I use asserts in GuC submission to
ensure the state machine (which is fairly complicated) works as
designed; this would be one of those cases.
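Schematically, the lifetime rule the assert encodes (a sketch of the
invariant, not actual driver code):

        q = xe_exec_queue_get(q);       /* ref taken when a job is created */
        /*
         * The CLEANUP message (which sets destroyed) is driven by the
         * final queue put, so a live job implies the queue cannot have
         * been destroyed yet:
         */
        xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));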
Matt
> Niranjana
>
> >
> > /*
> > * TDR has fired before free job worker. Common if exec queue
> > @@ -1241,8 +1207,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> >
> > /* Must check all state after stopping scheduler */
> > skip_timeout_check = exec_queue_reset(q) ||
> > - exec_queue_killed_or_banned_or_wedged(q) ||
> > - exec_queue_destroyed(q);
> > + exec_queue_killed_or_banned_or_wedged(q);
> >
> > /*
> > * If devcoredump not captured and GuC capture for the job is not ready
> > @@ -1271,13 +1236,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
> >
> > /* Engine state now stable, disable scheduling to check timestamp */
> > - if (!wedged && exec_queue_registered(q)) {
> > + if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
> > int ret;
> >
> > if (exec_queue_reset(q))
> > err = -EIO;
> >
> > - if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
> > + if (xe_uc_fw_is_running(&guc->fw)) {
> > /*
> > * Wait for any pending G2H to flush out before
> > * modifying state
> > @@ -1327,8 +1292,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > xe_devcoredump(q, job,
> > "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
> > q->guc->id, ret, xe_guc_read_stopped(guc));
> > - set_exec_queue_extra_ref(q);
> > - xe_exec_queue_get(q); /* GT reset owns this */
> > set_exec_queue_banned(q);
> > xe_gt_reset_async(q->gt);
> > xe_sched_tdr_queue_imm(sched);
> > @@ -1381,13 +1344,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > }
> > }
> >
> > - /* Finish cleaning up exec queue via deregister */
> > set_exec_queue_banned(q);
> > - if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
> > - set_exec_queue_extra_ref(q);
> > - xe_exec_queue_get(q);
> > - __deregister_exec_queue(guc, q);
> > - }
> >
> > /* Mark all outstanding jobs as bad, thus completing them */
> > xe_sched_job_set_error(job, err);
> > @@ -1933,7 +1890,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
> >
> > /* Clean up lost G2H + reset engine state */
> > if (exec_queue_registered(q)) {
> > - if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
> > + if (xe_exec_queue_is_lr(q))
> > xe_exec_queue_put(q);
> > else if (exec_queue_destroyed(q))
> > __guc_exec_queue_destroy(guc, q);
> > @@ -2067,11 +2024,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
> >
> > if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
> > clear_exec_queue_destroyed(q);
> > - if (exec_queue_extra_ref(q))
> > - xe_exec_queue_put(q);
> > - else
> > - q->guc->needs_cleanup = true;
> > - clear_exec_queue_extra_ref(q);
> > + q->guc->needs_cleanup = true;
> > xe_gt_dbg(guc_to_gt(guc), "Replay CLEANUP - guc_id=%d",
> > q->guc->id);
> > }
> > @@ -2488,7 +2441,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
> >
> > clear_exec_queue_registered(q);
> >
> > - if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
> > + if (xe_exec_queue_is_lr(q))
> > xe_exec_queue_put(q);
> > else
> > __guc_exec_queue_destroy(guc, q);
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-11-20 20:33 ` Umesh Nerlige Ramappa
@ 2025-11-21 21:33 ` Matthew Brost
2025-11-25 1:06 ` Umesh Nerlige Ramappa
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-21 21:33 UTC (permalink / raw)
To: Umesh Nerlige Ramappa
Cc: intel-xe, dri-devel, niranjana.vishwanathapura, christian.koenig,
pstanner, dakr
On Thu, Nov 20, 2025 at 12:33:12PM -0800, Umesh Nerlige Ramappa wrote:
> On Wed, Nov 19, 2025 at 02:41:06PM -0800, Matthew Brost wrote:
> > We now have proper infrastructure to accurately check the LRC timestamp
> > without toggling the scheduling state for non-VFs. For VFs, it is still
> > possible to get an inaccurate view if the context is on hardware. We
> > guard against free-running contexts on VFs by banning jobs whose
> > timestamps are not moving. In addition, VFs have a timeslice quantum
> > that naturally triggers context switches when more than one VF is
> > running, thus updating the LRC timestamp.
> >
> > For multi-queue, it is desirable to avoid scheduling toggling in the TDR
> > because this scheduling state is shared among many queues. Furthermore,
> > this change simplifies the GuC state machine. The trade-off for VF cases
> > seems worthwhile.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 100 ++++++------------------
> > drivers/gpu/drm/xe/xe_sched_job.c | 1 +
> > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
> > 3 files changed, 28 insertions(+), 75 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 1f2afad1766e..7404716e979f 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
> > #define EXEC_QUEUE_STATE_KILLED (1 << 7)
> > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
> > #define EXEC_QUEUE_STATE_BANNED (1 << 9)
> > -#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
> > -#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
> > -#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
> > +#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
> >
>
> ... snip ...
>
> > static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue
> > *q)
> > {
> > return (atomic_read(&q->guc->state) &
> > @@ -996,7 +964,7 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> > u32 ctx_timestamp, ctx_job_timestamp;
> > u32 timeout_ms = q->sched_props.job_timeout_ms;
> > u32 diff;
> > - u64 running_time_ms;
> > + u64 running_time_ms, old_timestamp;
> >
> > if (!xe_sched_job_started(job)) {
> > xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, not started",
> > @@ -1006,7 +974,17 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
> > return xe_sched_invalidate_job(job, 2);
> > }
> >
> > - ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
> > + ctx_timestamp = lower_32_bits(xe_lrc_update_timestamp(q->lrc[0],
> > + &old_timestamp));
>
> Reg: xe_lrc_update_timestamp()
>
> The way context utilization is using this helper is to accumulate the 'new -
> old' values each time this function is called. In the below example, context
> utilization will loose some ticks.
>
> Example:
>
> 1. This code calls xe_lrc_update_timestamp() to sample the timestamp for TDR
> purposes. Say context/job is running, then the lrc->ctx_timestamp is updated
> (moved forward).
>
> 2. The context utilization code calls xe_lrc_update_timestamp(). Within this
> helper
> - old_ts is sampled as lrc->ctx_timestamp
> - new_ts is calculated based on whether the job/context is active
> - lrc->ctx_timestamp is updated to the new value.
>
> The result is that we lost one chunk of utilization because of the previous
> call from the TDR path. I think some refactor would be needed to fix that.
>
> The other comment you already mentioned offline is locking, which I think we
> should add to protect lrc->ctx_timestamp. I don't know if a refactor will
> avoid the lock though.
>
I agree with your analysis here - thanks for the help.
How about we extract the following code from
xe_exec_queue_update_run_ticks() into a helper that also returns the
current timestamp and is protected by a queue spin lock:
new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
q->xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
It's harmless if the TDR also updates run_ticks when it samples the LRC
timestamp, right? Also, the helper would just skip run_ticks if q->xef is
NULL.
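A rough sketch of what I have in mind; the function name, the
run_ticks_lock field, and the locking granularity are assumptions, not
a final API:

        static u64 xe_exec_queue_update_run_ticks_locked(struct xe_exec_queue *q)
        {
                struct xe_lrc *lrc = q->lrc[0];
                u64 old_ts, new_ts;

                lockdep_assert_held(&q->run_ticks_lock); /* assumed per-queue lock */

                new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
                if (q->xef) /* kernel-internal queues have no xef */
                        q->xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;

                return new_ts;  /* the TDR can use this as its timestamp sample */
        }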
Matt
> Thanks,
> Umesh
>
> > + if (ctx_timestamp == job->sample_timestamp) {
> > + xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
> > + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
> > + q->guc->id);
> > +
> > + return xe_sched_invalidate_job(job, 2);
> > + }
> > +
> > + job->sample_timestamp = ctx_timestamp;
> > ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
> >
> > /*
> > @@ -1135,16 +1113,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > }
> >
>
>
> ... snip ...
>
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index d26612abb4ca..ad5eee8a8cdb 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -59,6 +59,8 @@ struct xe_sched_job {
> > u32 lrc_seqno;
> > /** @migrate_flush_flags: Additional flush flags for migration jobs */
> > u32 migrate_flush_flags;
> >+ /** @sample_timestamp: Sampling of job timestamp in TDR */
> > + u64 sample_timestamp;
> > /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
> > bool ring_ops_flush_tlb;
> > /** @ggtt: mapped in ggtt. */
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running
2025-11-20 19:48 ` Niranjana Vishwanathapura
@ 2025-11-21 22:06 ` Matthew Brost
2025-11-21 23:52 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Brost @ 2025-11-21 22:06 UTC (permalink / raw)
To: Niranjana Vishwanathapura
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Thu, Nov 20, 2025 at 11:48:13AM -0800, Niranjana Vishwanathapura wrote:
> On Wed, Nov 19, 2025 at 02:41:03PM -0800, Matthew Brost wrote:
> > If the firmware is not running during TDR (e.g., when the driver is
> > unloading), there's no need to toggle scheduling in the GuC. In such
> > cases, skip this step.
> >
> > v4:
> > - Bail on wait UC not running (Niranjana)
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 3ee35d4873bc..648c9ea06749 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1277,7 +1277,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > if (exec_queue_reset(q))
> > err = -EIO;
> >
> > - if (!exec_queue_destroyed(q)) {
> > + if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
> > /*
> > * Wait for any pending G2H to flush out before
> > * modifying state
> > @@ -1312,6 +1312,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> > */
> > smp_rmb();
> > ret = wait_event_timeout(guc->ct.wq,
> > + !xe_uc_fw_is_running(&guc->fw) ||
> > !exec_queue_pending_disable(q) ||
> > xe_guc_read_stopped(guc) ||
> > vf_recovery(guc), HZ * 5);
>
> What if the wait exits because of '!xe_uc_fw_is_running(&guc->fw)'?
> It is not clear where the control goes in that case based on all the
> 'if' checks that follow this wait. Should there be a specific check
> for '!xe_uc_fw_is_running(&guc->fw)' following the wait here?
>
The return will be zero and we should tear down the queue. Also, I
believe this covers the case where the driver is unbinding and the
schedule-disable CT send blew up with a warning. I think the logic
works as is, or at least I don't see a problem.
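
i.e., the tail that runs in that case is roughly this (simplified
sketch, lifted from the hunk visible in patch 6/8 of this series):

	/*
	 * Simplified sketch: once the wait completes and no reset path is
	 * taken, the TDR falls through to banning the queue and completing
	 * its outstanding jobs with an error.
	 */
	set_exec_queue_banned(q);

	/* Mark all outstanding jobs as bad, thus completing them */
	xe_sched_job_set_error(job, err);
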
Matt
> Niranjana
>
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR
2025-11-21 21:25 ` Matthew Brost
@ 2025-11-21 23:51 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-21 23:51 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Fri, Nov 21, 2025 at 01:25:58PM -0800, Matthew Brost wrote:
>On Thu, Nov 20, 2025 at 11:50:32AM -0800, Niranjana Vishwanathapura wrote:
>> On Wed, Nov 19, 2025 at 02:41:04PM -0800, Matthew Brost wrote:
>> > Deregistering queues in the TDR introduces unnecessary complexity,
>> > requiring reference-counting techniques to function correctly,
>> > particularly to prevent use-after-free (UAF) issues while a
>> > deregistration initiated from the TDR is in progress.
>> >
>> > All that's needed in the TDR is to kick the queue off the hardware,
>> > which is achieved by disabling scheduling. Queue deregistration should
>> > be handled in a single, well-defined point in the cleanup path, tied to
>> > the queue's reference count.
>> >
>> > v4:
>> > - Explain why extra ref were needed prior to this patch (Niranjana)
>> >
>> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_guc_submit.c | 65 +++++-------------------------
>> > 1 file changed, 9 insertions(+), 56 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > index 648c9ea06749..5de300b66767 100644
>> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > @@ -69,9 +69,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
>> > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
>> > #define EXEC_QUEUE_STATE_BANNED (1 << 9)
>> > #define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
>> > -#define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11)
>> > -#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12)
>> > -#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13)
>> > +#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
>> > +#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
>> >
>> > static bool exec_queue_registered(struct xe_exec_queue *q)
>> > {
>> > @@ -218,21 +217,6 @@ static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
>> > atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
>> > }
>> >
>> > -static bool exec_queue_extra_ref(struct xe_exec_queue *q)
>> > -{
>> > - return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
>> > -}
>> > -
>> > -static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
>> > -{
>> > - atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
>> > -}
>> > -
>> > -static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
>> > -{
>> > - atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
>> > -}
>> > -
>> > static bool exec_queue_pending_resume(struct xe_exec_queue *q)
>> > {
>> > return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
>> > @@ -1190,25 +1174,6 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
>> > G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> > }
>> >
>> > -static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> > -{
>> > - u32 action[] = {
>> > - XE_GUC_ACTION_DEREGISTER_CONTEXT,
>> > - q->guc->id,
>> > - };
>> > -
>> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
>> > - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
>> > - xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>> > -
>> > - set_exec_queue_destroyed(q);
>> > - trace_xe_exec_queue_deregister(q);
>> > -
>> > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> > -}
>> > -
>> > static enum drm_gpu_sched_stat
>> > guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > {
>> > @@ -1225,6 +1190,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > bool wedged = false, skip_timeout_check;
>> >
>> > xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_lr(q));
>> > + xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
>>
>> Is it always guaranteed? What if we get here because TDR is triggered
>> by some error notification from the GuC and, before we get here, the
>> exec_queue gets destroyed in the CLEANUP message handler? I am not
>> sure we can be sure here that it will be race-proof.
>>
>
>Jobs hold a reference to the queue. We have a job here, and the CLEANUP
>message (which sets destroyed) is tied to the reference count. So if
>this pops, we have a problem. I use asserts in GuC submission to ensure
>the state machine (which is fairly complicated) works as designed; this
>would be one of those cases.
>
Ok, sounds good.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Matt
>
>> Niranjana
>>
>> >
>> > /*
>> > * TDR has fired before free job worker. Common if exec queue
>> > @@ -1241,8 +1207,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> >
>> > /* Must check all state after stopping scheduler */
>> > skip_timeout_check = exec_queue_reset(q) ||
>> > - exec_queue_killed_or_banned_or_wedged(q) ||
>> > - exec_queue_destroyed(q);
>> > + exec_queue_killed_or_banned_or_wedged(q);
>> >
>> > /*
>> > * If devcoredump not captured and GuC capture for the job is not ready
>> > @@ -1271,13 +1236,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>> >
>> > /* Engine state now stable, disable scheduling to check timestamp */
>> > - if (!wedged && exec_queue_registered(q)) {
>> > + if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) {
>> > int ret;
>> >
>> > if (exec_queue_reset(q))
>> > err = -EIO;
>> >
>> > - if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
>> > + if (xe_uc_fw_is_running(&guc->fw)) {
>> > /*
>> > * Wait for any pending G2H to flush out before
>> > * modifying state
>> > @@ -1327,8 +1292,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > xe_devcoredump(q, job,
>> > "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d",
>> > q->guc->id, ret, xe_guc_read_stopped(guc));
>> > - set_exec_queue_extra_ref(q);
>> > - xe_exec_queue_get(q); /* GT reset owns this */
>> > set_exec_queue_banned(q);
>> > xe_gt_reset_async(q->gt);
>> > xe_sched_tdr_queue_imm(sched);
>> > @@ -1381,13 +1344,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > }
>> > }
>> >
>> > - /* Finish cleaning up exec queue via deregister */
>> > set_exec_queue_banned(q);
>> > - if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
>> > - set_exec_queue_extra_ref(q);
>> > - xe_exec_queue_get(q);
>> > - __deregister_exec_queue(guc, q);
>> > - }
>> >
>> > /* Mark all outstanding jobs as bad, thus completing them */
>> > xe_sched_job_set_error(job, err);
>> > @@ -1933,7 +1890,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
>> >
>> > /* Clean up lost G2H + reset engine state */
>> > if (exec_queue_registered(q)) {
>> > - if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
>> > + if (xe_exec_queue_is_lr(q))
>> > xe_exec_queue_put(q);
>> > else if (exec_queue_destroyed(q))
>> > __guc_exec_queue_destroy(guc, q);
>> > @@ -2067,11 +2024,7 @@ static void guc_exec_queue_revert_pending_state_change(struct xe_guc *guc,
>> >
>> > if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
>> > clear_exec_queue_destroyed(q);
>> > - if (exec_queue_extra_ref(q))
>> > - xe_exec_queue_put(q);
>> > - else
>> > - q->guc->needs_cleanup = true;
>> > - clear_exec_queue_extra_ref(q);
>> > + q->guc->needs_cleanup = true;
>> > xe_gt_dbg(guc_to_gt(guc), "Replay CLEANUP - guc_id=%d",
>> > q->guc->id);
>> > }
>> > @@ -2488,7 +2441,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
>> >
>> > clear_exec_queue_registered(q);
>> >
>> > - if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
>> > + if (xe_exec_queue_is_lr(q))
>> > xe_exec_queue_put(q);
>> > else
>> > __guc_exec_queue_destroy(guc, q);
>> > --
>> > 2.34.1
>> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running
2025-11-21 22:06 ` Matthew Brost
@ 2025-11-21 23:52 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 22+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-21 23:52 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, umesh.nerlige.ramappa, christian.koenig,
pstanner, dakr
On Fri, Nov 21, 2025 at 02:06:41PM -0800, Matthew Brost wrote:
>On Thu, Nov 20, 2025 at 11:48:13AM -0800, Niranjana Vishwanathapura wrote:
>> On Wed, Nov 19, 2025 at 02:41:03PM -0800, Matthew Brost wrote:
>> > If the firmware is not running during TDR (e.g., when the driver is
>> > unloading), there's no need to toggle scheduling in the GuC. In such
>> > cases, skip this step.
>> >
>> > v4:
>> > - Bail on wait UC not running (Niranjana)
>> >
>> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
>> > 1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > index 3ee35d4873bc..648c9ea06749 100644
>> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > @@ -1277,7 +1277,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > if (exec_queue_reset(q))
>> > err = -EIO;
>> >
>> > - if (!exec_queue_destroyed(q)) {
>> > + if (!exec_queue_destroyed(q) && xe_uc_fw_is_running(&guc->fw)) {
>> > /*
>> > * Wait for any pending G2H to flush out before
>> > * modifying state
>> > @@ -1312,6 +1312,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > */
>> > smp_rmb();
>> > ret = wait_event_timeout(guc->ct.wq,
>> > + !xe_uc_fw_is_running(&guc->fw) ||
>> > !exec_queue_pending_disable(q) ||
>> > xe_guc_read_stopped(guc) ||
>> > vf_recovery(guc), HZ * 5);
>>
>> What if the wait exits because of '!xe_uc_fw_is_running(&guc->fw)'?
>> It is not clear where the control goes in that case based on all the
>> 'if' checks that follow this wait. Should there be a specific check
>> for '!xe_uc_fw_is_running(&guc->fw)' following the wait here?
>>
>
>The return will be zero and we should tear down the queue. Also, I
>believe this covers the case where the driver is unbinding and the
>schedule-disable CT send blew up with a warning. I think the logic
>works as is, or at least I don't see a problem.
>
Ok, sounds good.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Matt
>
>> Niranjana
>>
>> > --
>> > 2.34.1
>> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
2025-11-21 21:33 ` Matthew Brost
@ 2025-11-25 1:06 ` Umesh Nerlige Ramappa
0 siblings, 0 replies; 22+ messages in thread
From: Umesh Nerlige Ramappa @ 2025-11-25 1:06 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, niranjana.vishwanathapura, christian.koenig,
pstanner, dakr
On Fri, Nov 21, 2025 at 01:33:06PM -0800, Matthew Brost wrote:
>On Thu, Nov 20, 2025 at 12:33:12PM -0800, Umesh Nerlige Ramappa wrote:
>> On Wed, Nov 19, 2025 at 02:41:06PM -0800, Matthew Brost wrote:
>> > We now have proper infrastructure to accurately check the LRC timestamp
>> > without toggling the scheduling state for non-VFs. For VFs, it is still
>> > possible to get an inaccurate view if the context is on hardware. We
>> > guard against free-running contexts on VFs by banning jobs whose
>> > timestamps are not moving. In addition, VFs have a timeslice quantum
>> > that naturally triggers context switches when more than one VF is
>> > running, thus updating the LRC timestamp.
>> >
>> > For multi-queue, it is desirable to avoid scheduling toggling in the TDR
>> > because this scheduling state is shared among many queues. Furthermore,
>> > this change simplifies the GuC state machine. The trade-off for VF cases
>> > seems worthwhile.
>> >
>> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_guc_submit.c | 100 ++++++------------------
>> > drivers/gpu/drm/xe/xe_sched_job.c | 1 +
>> > drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
>> > 3 files changed, 28 insertions(+), 75 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > index 1f2afad1766e..7404716e979f 100644
>> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > @@ -68,9 +68,7 @@ exec_queue_to_guc(struct xe_exec_queue *q)
>> > #define EXEC_QUEUE_STATE_KILLED (1 << 7)
>> > #define EXEC_QUEUE_STATE_WEDGED (1 << 8)
>> > #define EXEC_QUEUE_STATE_BANNED (1 << 9)
>> > -#define EXEC_QUEUE_STATE_CHECK_TIMEOUT (1 << 10)
>> > -#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 11)
>> > -#define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 12)
>> > +#define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 10)
>> >
>>
>> ... snip ...
>>
>> > static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue
>> > *q)
>> > {
>> > return (atomic_read(&q->guc->state) &
>> > @@ -996,7 +964,7 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
>> > u32 ctx_timestamp, ctx_job_timestamp;
>> > u32 timeout_ms = q->sched_props.job_timeout_ms;
>> > u32 diff;
>> > - u64 running_time_ms;
>> > + u64 running_time_ms, old_timestamp;
>> >
>> > if (!xe_sched_job_started(job)) {
>> > xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, not started",
>> > @@ -1006,7 +974,17 @@ static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
>> > return xe_sched_invalidate_job(job, 2);
>> > }
>> >
>> > - ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(q->lrc[0]));
>> > + ctx_timestamp = lower_32_bits(xe_lrc_update_timestamp(q->lrc[0],
>> > + &old_timestamp));
>>
>> Reg: xe_lrc_update_timestamp()
>>
>> The way context utilization is using this helper is to accumulate the 'new -
>> old' values each time this function is called. In the below example, context
>> utilization will lose some ticks.
>>
>> Example:
>>
>> 1. This code calls xe_lrc_update_timestamp() to sample the timestamp for TDR
>> purposes. Say context/job is running, then the lrc->ctx_timestamp is updated
>> (moved forward).
>>
>> 2. The context utilization code calls xe_lrc_update_timestamp(). Within this
>> helper
>> - old_ts is sampled as lrc->ctx_timestamp
>> - new_ts is calculated based on whether the job/context is active
>> - lrc->ctx_timestamp is updated to the new value.
>>
>> The result is that we lost one chunk of utilization because of the previous
>> call from the TDR path. I think some refactor would be needed to fix that.
>>
>> The other comment you already mentioned offline is locking, which I think we
>> should add to protect lrc->ctx_timestamp. I don't know if a refactor will
>> avoid the lock though.
>>
>
>I agree with your analysis here - thanks for the help.
>
>How about this - we extract the following code from
>xe_exec_queue_update_run_ticks into a helper that also returns the
>current timestamp and is protected by a queue spin lock:
>
>	new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
>	q->xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
I was thinking of something like the below:
/**
 * xe_lrc_timestamp() - Current ctx timestamp
 * @lrc: Pointer to the lrc.
 *
 * Return the latest ctx timestamp.
 *
 * Returns: New ctx timestamp value
 */
u64 xe_lrc_timestamp(struct xe_lrc *lrc)
{
	u64 lrc_ts, reg_ts, new_ts;
	u32 engine_id;

	lrc_ts = xe_lrc_ctx_timestamp(lrc);
	/* CTX_TIMESTAMP mmio read is invalid on VF, so return the LRC value */
	if (IS_SRIOV_VF(lrc_to_xe(lrc))) {
		new_ts = lrc_ts;
		goto done;
	}

	if (lrc_ts == CONTEXT_ACTIVE) {
		engine_id = xe_lrc_engine_id(lrc);
		if (!get_ctx_timestamp(lrc, engine_id, &reg_ts))
			new_ts = reg_ts;

		/* read lrc again to ensure context is still active */
		lrc_ts = xe_lrc_ctx_timestamp(lrc);
	}

	/*
	 * If context switched out, just use the lrc_ts. Note that this needs
	 * to be a separate if condition.
	 */
	if (lrc_ts != CONTEXT_ACTIVE)
		new_ts = lrc_ts;

done:
	return new_ts;
}

/**
 * xe_lrc_update_timestamp() - Update ctx timestamp
 * @lrc: Pointer to the lrc.
 * @old_ts: Old timestamp value
 *
 * Populate @old_ts with the current saved ctx timestamp, read the new ctx
 * timestamp and update the saved value. With support for active contexts,
 * the calculation may be slightly racy, so follow a read-again logic to
 * ensure that the context is still active before returning the right
 * timestamp.
 *
 * Returns: New ctx timestamp value
 */
u64 xe_lrc_update_timestamp(struct xe_lrc *lrc, u64 *old_ts)
{
	*old_ts = lrc->ctx_timestamp;

	lrc->ctx_timestamp = xe_lrc_timestamp(lrc);

	trace_xe_lrc_update_timestamp(lrc, *old_ts);

	return lrc->ctx_timestamp;
}
TDR logic could just use xe_lrc_timestamp() since it does not care about
old_ts anyway (see the sketch below). We could avoid the lock since
xe_lrc_update_timestamp() is the only place where lrc->ctx_timestamp
gets updated.
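
With that, the TDR hunk from the patch would look something like this
(sketch only, reusing the names from the hunk quoted above):

	/* sketch: sample via the read-only helper, no scheduling toggle */
	ctx_timestamp = lower_32_bits(xe_lrc_timestamp(q->lrc[0]));

	if (ctx_timestamp == job->sample_timestamp) {
		xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
			   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
			   q->guc->id);

		return xe_sched_invalidate_job(job, 2);
	}

	job->sample_timestamp = ctx_timestamp;
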
Thanks,
Umesh
>
>It's harmless if the TDR also updates run_ticks when it samples the LRC
>timestamp, right? Also, the helper just skips run_ticks if q->xef is
>NULL.
>
>Matt
>
>> Thanks,
>> Umesh
>>
>> > + if (ctx_timestamp == job->sample_timestamp) {
>> > + xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, timestamp stuck",
>> > + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
>> > + q->guc->id);
>> > +
>> > + return xe_sched_invalidate_job(job, 2);
>> > + }
>> > +
>> > + job->sample_timestamp = ctx_timestamp;
>> > ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
>> >
>> > /*
>> > @@ -1135,16 +1113,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> > }
>> >
>>
>>
>> ... snip ...
>>
>> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h
>> > b/drivers/gpu/drm/xe/xe_sched_job_types.h
>> > index d26612abb4ca..ad5eee8a8cdb 100644
>> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
>> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
>> > @@ -59,6 +59,8 @@ struct xe_sched_job {
>> > u32 lrc_seqno;
>> > /** @migrate_flush_flags: Additional flush flags for migration jobs */
>> > u32 migrate_flush_flags;
>> > +	/** @sample_timestamp: Sampling of job timestamp in TDR */
>> > + u64 sample_timestamp;
>> > /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
>> > bool ring_ops_flush_tlb;
>> > /** @ggtt: mapped in ggtt. */
>> > --
>> > 2.34.1
>> >
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-11-25 1:06 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-19 22:40 [PATCH v4 0/8] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-11-19 22:40 ` [PATCH v4 1/8] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-11-20 17:20 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 2/8] drm/sched: Add pending job list iterator Matthew Brost
2025-11-20 17:21 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 3/8] drm/xe: Add dedicated message lock Matthew Brost
2025-11-19 22:41 ` [PATCH v4 4/8] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
2025-11-20 17:26 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 5/8] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
2025-11-20 19:48 ` Niranjana Vishwanathapura
2025-11-21 22:06 ` Matthew Brost
2025-11-21 23:52 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 6/8] drm/xe: Do not deregister queues in TDR Matthew Brost
2025-11-20 19:50 ` Niranjana Vishwanathapura
2025-11-21 21:25 ` Matthew Brost
2025-11-21 23:51 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 7/8] drm/xe: Remove special casing for LR queues in submission Matthew Brost
2025-11-20 18:53 ` Niranjana Vishwanathapura
2025-11-19 22:41 ` [PATCH v4 8/8] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
2025-11-20 20:33 ` Umesh Nerlige Ramappa
2025-11-21 21:33 ` Matthew Brost
2025-11-25 1:06 ` Umesh Nerlige Ramappa