Intel-XE Archive on lore.kernel.org
* [PATCH] drm/xe/guc: Destroy LR exec queue directly if GuC is not running
@ 2025-10-14  3:36 Shuicheng Lin
  2025-10-14  4:28 ` ✗ CI.checkpatch: warning for " Patchwork
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Shuicheng Lin @ 2025-10-14  3:36 UTC (permalink / raw)
  To: intel-xe; +Cc: Shuicheng Lin, Matthew Brost

During long-running (LR) exec queue cleanup, if the GuC firmware is
not running, the driver cannot communicate with the GuC to deregister
the exec queue. In that case, destroy the exec queue directly instead
of attempting deregistration.

This prevents the schedule-disable failure and the GuC ID resource
leak shown in the dmesg log below:
"
[   50.242564] pci 0000:03:00.0: [drm] GT0: Schedule disable failed to respond, guc_id=2
[   50.242568] ------------[ cut here ]------------
[   50.242584] pci 0000:03:00.0: [drm] Assertion `ret` failed!
...
[   50.244942] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[   50.244970] pci 0000:03:00.0: [drm] GT0:     total 65535
[   50.245002] pci 0000:03:00.0: [drm] GT0:     used 1
[   50.245032] pci 0000:03:00.0: [drm] GT0:     range 2..2 (1)
"

Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
---
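For illustration, the control flow this patch changes can be modelled
outside the kernel. The standalone C sketch below is not driver code:
every model_* name is hypothetical, the structs are stand-ins for the
real xe types, and the firmware timeout is reduced to a boolean. It
only shows why skipping deregistration when the firmware is down
avoids leaving a GuC ID marked as used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; these are NOT the real xe types. */
struct model_guc {
	bool fw_running;	/* models xe_uc_fw_is_running(&guc->fw) */
	int ids_in_use;		/* models the "used" count of the GuC ID manager */
};

struct model_queue {
	bool registered;
	bool destroyed;
	bool banned;
};

/* Models __guc_exec_queue_destroy(): release the ID without asking firmware. */
static void model_destroy(struct model_guc *guc, struct model_queue *q)
{
	assert(guc->ids_in_use > 0);
	q->destroyed = true;
	q->registered = false;
	guc->ids_in_use--;
}

/*
 * Models disable_scheduling_deregister(): in this sketch the request only
 * completes when the firmware is running; otherwise nothing is released.
 */
static bool model_deregister(struct model_guc *guc, struct model_queue *q)
{
	if (!guc->fw_running)
		return false;	/* stands in for "failed to respond" */
	model_destroy(guc, q);
	return true;
}

/*
 * Models the patched branch of the LR cleanup worker. Returns true when the
 * queue ends up cleaned. apply_fix selects the behaviour with the patch.
 */
static bool model_lr_cleanup(struct model_guc *guc, struct model_queue *q,
			     bool apply_fix)
{
	if (!q->registered || q->destroyed)
		return true;	/* nothing to do */

	q->banned = true;

	/* With the patch: firmware is down, so destroy directly. */
	if (apply_fix && !guc->fw_running) {
		model_destroy(guc, q);
		return true;
	}

	return model_deregister(guc, q);
}
```

In this model, calling model_lr_cleanup() with fw_running == false and
apply_fix == false leaves ids_in_use unchanged, which corresponds to
the "GUC ID manager unclean" report above; with apply_fix == true the
ID is released.
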
 drivers/gpu/drm/xe/xe_guc_submit.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 0ef67d3523a7..d2dfbdc82920 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -47,6 +47,8 @@
 #include "xe_uc_fw.h"
 #include "xe_vm.h"
 
+static void __guc_exec_queue_destroy(struct xe_guc *guc, struct xe_exec_queue *q);
+
 static struct xe_guc *
 exec_queue_to_guc(struct xe_exec_queue *q)
 {
@@ -1060,10 +1062,15 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 	 * state.
 	 */
 	if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
-		struct xe_guc *guc = exec_queue_to_guc(q);
 		int ret;
 
 		set_exec_queue_banned(q);
+		/* If GuC is not running, just destroy the exec queue as we can't communicate with it */
+		if (!xe_uc_fw_is_running(&guc->fw)) {
+			__guc_exec_queue_destroy(guc, q);
+			goto skip_deregister;
+		}
+
 		disable_scheduling_deregister(guc, q);
 
 		/*
@@ -1088,6 +1095,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 		}
 	}
 
+skip_deregister:
 	if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
 		xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id);
 
-- 
2.49.0



end of thread (last message: 2025-10-16 14:51 UTC)

Thread overview: 13+ messages
2025-10-14  3:36 [PATCH] drm/xe/guc: Destroy LR exec queue directly if GuC is not running Shuicheng Lin
2025-10-14  4:28 ` ✗ CI.checkpatch: warning for " Patchwork
2025-10-14  4:29 ` ✓ CI.KUnit: success " Patchwork
2025-10-14  5:08 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-14 12:53 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-15 17:07 ` [PATCH] " Lin, Shuicheng
2025-10-15 18:01   ` Summers, Stuart
2025-10-15 18:11     ` Lin, Shuicheng
2025-10-15 18:15       ` Summers, Stuart
2025-10-15 18:02 ` Daniele Ceraolo Spurio
2025-10-15 21:07   ` Lin, Shuicheng
2025-10-16  3:24 ` Matthew Brost
2025-10-16 14:51   ` Lin, Shuicheng
