Intel-XE Archive on lore.kernel.org
* [RFC PATCH 1/3] drm/xe: skip banning kernel migration queue on TDR timeout
@ 2026-06-03 12:06 Sanjay Yadav
  2026-06-03 12:06 ` [RFC PATCH 2/3] drm/sched: fix drm_sched_tdr_queue_imm to not corrupt timeout value Sanjay Yadav
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sanjay Yadav @ 2026-06-03 12:06 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, rodrigo.vivi, nirmoy.das, umesh.nerlige.ramappa,
	thomas.hellstrom, matthew.brost, niranjana.vishwanathapura,
	thomas.hellstrom, fei.yang, himal.prasad.ghimiray,
	matthew.d.roper, maarten.lankhorst, joonas.lahtinen, matthew.auld

guc_exec_queue_timedout_job() unconditionally bans the queue once a
job times out. For the kernel migration queue this is fatal: once the
queue is banned, no page table migrations can complete and the GPU is
effectively dead until the driver is reloaded.

Submission is already stopped and the timed-out job is already errored
out, so banning is not needed for correctness. GT reset handles the
actual hardware recovery. Skip banning for kernel queues so they remain
available after reset.

Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Assisted-by: Claude:claude-opus-4.6
Suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index ab501513d806..e6ad57cbbf0e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1543,7 +1543,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	if (!exec_queue_killed(q))
 		wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
 
-	set_exec_queue_banned(q);
+	if (!(q->flags & EXEC_QUEUE_FLAG_KERNEL))
+		set_exec_queue_banned(q);
 
 	/* Kick job / queue off hardware */
 	if (!wedged && (exec_queue_enabled(primary) ||
-- 
2.52.0
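
For readers who want to see the effect of the changed condition in isolation, here is a minimal user-space sketch. It is not driver code: `struct mock_exec_queue`, `MOCK_QUEUE_FLAG_KERNEL`, `mock_handle_timeout()` and `mock_can_submit()` are hypothetical stand-ins, the flag's bit value is arbitrary, and the assumption that a banned queue accepts no further work is a simplification of the behaviour the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for EXEC_QUEUE_FLAG_KERNEL; the bit value is arbitrary. */
#define MOCK_QUEUE_FLAG_KERNEL (1u << 0)

/* Hypothetical, heavily reduced model of an exec queue. */
struct mock_exec_queue {
	unsigned int flags;
	bool banned;
};

/* Mirrors the patched condition: only non-kernel queues are banned on timeout. */
static void mock_handle_timeout(struct mock_exec_queue *q)
{
	if (!(q->flags & MOCK_QUEUE_FLAG_KERNEL))
		q->banned = true;
}

/* Assumption for this model: a banned queue accepts no further work. */
static bool mock_can_submit(const struct mock_exec_queue *q)
{
	return !q->banned;
}
```

In this model, a queue created with `flags = 0` is banned after `mock_handle_timeout()` and rejects later submissions, while a queue carrying `MOCK_QUEUE_FLAG_KERNEL` stays usable, which is the property the patch aims to preserve for the migration queue across a GT reset.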



Thread overview: 8+ messages
2026-06-03 12:06 [RFC PATCH 1/3] drm/xe: skip banning kernel migration queue on TDR timeout Sanjay Yadav
2026-06-03 12:06 ` [RFC PATCH 2/3] drm/sched: fix drm_sched_tdr_queue_imm to not corrupt timeout value Sanjay Yadav
2026-06-03 13:47   ` Rodrigo Vivi
2026-06-03 12:06 ` [RFC PATCH 3/3] drm/xe: don't cancel other pending jobs on kernel migration queue timeout Sanjay Yadav
2026-06-03 12:21 ` ✓ CI.KUnit: success for series starting with [RFC,1/3] drm/xe: skip banning kernel migration queue on TDR timeout Patchwork
2026-06-03 12:42 ` [RFC PATCH 1/3] " Matthew Auld
2026-06-03 13:52   ` Rodrigo Vivi
2026-06-03 15:13     ` Hellstrom, Thomas
