From: Stuart Summers <stuart.summers@intel.com>
Cc: intel-xe@lists.freedesktop.org, matthew.brost@intel.com,
Stuart Summers <stuart.summers@intel.com>
Subject: [PATCH 6/6] drm/xe: Clean up GuC software state after a wedge
Date: Tue, 14 Oct 2025 18:09:27 +0000 [thread overview]
Message-ID: <20251014180927.105077-7-stuart.summers@intel.com> (raw)
In-Reply-To: <20251014180927.105077-1-stuart.summers@intel.com>
Comments also added to the code, but in the event of
a wedge or a hardware failure while communication with
GuC is outstanding (e.g. during a schedule disable or
context deregistration), the driver doesn't automatically
reset the software state as it would in a typical GT reset
since we are trying to save the state for debug. However
once the user unbinds the driver we still need to go through
and clean everything up for these exec queues so we don't
leak memory on the DRM side (e.g. LRC or LRC BO).
Add a kick start to the DRM scheduler to handle any outstanding
messages on hold during the wedge and go through the GuC stop
flow to simulate that reset on teardown.
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 5ec1e4a83d68..0bbae336c722 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -276,6 +276,14 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
struct xe_gt *gt = guc_to_gt(guc);
int ret;
+ /*
+ * If GuC stopped responding during deregistration
+ * some queues can be left in a bad state. Ensure
+ * these are all cleaned up by going through the
+ * GuC software reset flow.
+ */
+ xe_guc_stop(guc);
+
ret = wait_event_timeout(guc->submission_state.fini_wq,
xa_empty(&guc->submission_state.exec_queue_lookup),
HZ * 5);
@@ -295,6 +303,13 @@ static void guc_submit_wedged_fini(void *arg)
mutex_lock(&guc->submission_state.lock);
xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
+ /*
+ * Kick start the scheduler since some messages
+ * might have been added while the scheduler was
+ * stopped during a wedge event.
+ */
+ xe_sched_submission_start(&q->guc->sched);
+
if (exec_queue_wedged(q)) {
mutex_unlock(&guc->submission_state.lock);
xe_exec_queue_put(q);
--
2.34.1
next prev parent reply other threads:[~2025-10-14 18:09 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-14 18:09 [PATCH 0/6] Fix a couple of wedge corner-case memory leaks Stuart Summers
2025-10-14 18:09 ` [PATCH 1/6] drm/xe: Add additional trace points for LRCs Stuart Summers
2025-10-14 18:09 ` [PATCH 2/6] drm/xe: Add a trace point for VM close Stuart Summers
2025-10-14 18:09 ` [PATCH 3/6] drm/xe: Add the BO pointer info to the BO trace Stuart Summers
2025-10-14 18:09 ` [PATCH 4/6] drm/xe: Add new exec queue trace points Stuart Summers
2025-10-14 18:09 ` [PATCH 5/6] drm/xe: Correct migration VM teardown order Stuart Summers
2025-10-15 0:59 ` Matthew Brost
2025-10-15 17:55 ` Summers, Stuart
2025-10-14 18:09 ` Stuart Summers [this message]
2025-10-15 19:45 ` [PATCH 6/6] drm/xe: Clean up GuC software state after a wedge Summers, Stuart
2025-10-16 3:42 ` Matthew Brost
2025-10-16 17:50 ` Summers, Stuart
2025-10-16 22:10 ` Matthew Brost
2025-10-17 14:48 ` Summers, Stuart
2025-10-14 18:15 ` ✗ CI.checkpatch: warning for Fix a couple of wedge corner-case memory leaks (rev4) Patchwork
2025-10-14 18:16 ` ✓ CI.KUnit: success " Patchwork
2025-10-14 19:13 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-14 21:14 ` Summers, Stuart
2025-10-15 5:18 ` ✗ Xe.CI.Full: " Patchwork
-- strict thread matches above, loose matches on Subject: below --
2025-10-27 18:04 [PATCH 0/6] Fix a couple of wedge corner-case memory leaks Stuart Summers
2025-10-27 18:04 ` [PATCH 6/6] drm/xe: Clean up GuC software state after a wedge Stuart Summers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251014180927.105077-7-stuart.summers@intel.com \
--to=stuart.summers@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox