From: Stuart Summers <stuart.summers@intel.com>
Cc: intel-xe@lists.freedesktop.org,
Stuart Summers <stuart.summers@intel.com>
Subject: [PATCH 7/7] drm/xe: Check for GuC responses on disabling scheduling
Date: Thu, 2 Oct 2025 23:04:44 +0000
Message-ID: <20251002230444.313505-8-stuart.summers@intel.com>
In-Reply-To: <20251002230444.313505-1-stuart.summers@intel.com>
In the event the GuC becomes unresponsive during a scheduling
disable event, we still want the driver to be able to recover.
This patch follows the same methodology we already have in place
for TLB invalidation requests, where we send a request to the GuC
and wait for the invalidation-done response. If the response
doesn't come back in time, we at least print a message
indicating that the invalidation failed.
In this case, we send the schedule disable and expect the GuC
to respond with a schedule-done response. The KMD then catches
that response and in turn sends a context deregistration
request. So in the event the GuC becomes unresponsive after we
send the schedule disable, we actually have two G2H responses
that have been reserved but never received.
To handle this, make sure the pending disable flag on the
exec queue gets cleared (i.e. we received that response from
the GuC). If it isn't cleared within a reasonable amount of
time, assume the GuC is dead: ban the exec queue, queue up a
GT reset, and manually call the schedule-done handler. The
schedule-done handler, in turn, checks whether the context has
been banned. If so, it manually calls the deregistration-done
handler to ensure all resources tied to that exec queue get
cleaned up properly. Without this, if the device becomes
wedged after an exec queue has been created, the attached
resources like the LRC will not get freed properly, resulting
in a memory leak.
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 45b72bebfc63..a177d87c8524 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -939,6 +939,9 @@ int xe_guc_read_stopped(struct xe_guc *guc)
GUC_CONTEXT_##enable_disable, \
}
+static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
+ u32 runnable_state);
+
static void disable_scheduling_deregister(struct xe_guc *guc,
struct xe_exec_queue *q)
{
@@ -974,6 +977,17 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
+
+ ret = wait_event_timeout(guc->ct.wq,
+ !exec_queue_pending_disable(q) ||
+ xe_guc_read_stopped(guc),
+ HZ * 5);
+ if (!ret || xe_guc_read_stopped(guc)) {
+ xe_gt_warn(guc_to_gt(guc), "Schedule disable failed to respond");
+ set_exec_queue_banned(q);
+ handle_sched_done(guc, q, 0);
+ xe_gt_reset_async(q->gt);
+ }
}
static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
@@ -2117,6 +2131,8 @@ g2h_exec_queue_lookup(struct xe_guc *guc, u32 guc_id)
return q;
}
+static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
+
static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
{
u32 action[] = {
@@ -2131,7 +2147,12 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_exec_queue_deregister(q);
- xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
+ if (exec_queue_banned(q)) {
+ handle_deregister_done(guc, q);
+ } else {
+ xe_guc_ct_send_g2h_handler(&guc->ct, action,
+ ARRAY_SIZE(action));
+ }
}
static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
--
2.34.1
2025-10-02 23:04 [PATCH 0/7] Fix a couple of wedge corner-case memory leaks Stuart Summers
2025-10-02 23:04 ` [PATCH 1/7] drm/xe: Add additional trace points for LRCs Stuart Summers
2025-10-02 23:04 ` [PATCH 2/7] drm/xe: Add a trace point for VM close Stuart Summers
2025-10-02 23:04 ` [PATCH 3/7] drm/xe: Add the BO pointer info to the BO trace Stuart Summers
2025-10-02 23:04 ` [PATCH 4/7] drm/xe: Add new exec queue trace points Stuart Summers
2025-10-02 23:04 ` [PATCH 5/7] drm/xe: Handle missing migration VM on VM creation Stuart Summers
2025-10-02 23:34 ` Lin, Shuicheng
2025-10-03 6:56 ` Matthew Brost
2025-10-03 14:33 ` Summers, Stuart
2025-10-02 23:04 ` [PATCH 6/7] drm/xe: Don't send a CLEANUP message on sched pause Stuart Summers
2025-10-03 18:50 ` Matthew Brost
2025-10-03 18:53 ` Summers, Stuart
2025-10-02 23:04 ` Stuart Summers [this message]
2025-10-03 18:54 ` [PATCH 7/7] drm/xe: Check for GuC responses on disabling scheduling Matthew Brost
2025-10-03 18:58 ` Summers, Stuart
2025-10-03 19:38 ` Matthew Brost
2025-10-03 19:42 ` Summers, Stuart
2025-10-03 19:49 ` Matthew Brost
2025-10-03 19:53 ` Summers, Stuart
2025-10-02 23:11 ` ✗ CI.checkpatch: warning for Fix a couple of wedge corner-case memory leaks Patchwork
2025-10-02 23:12 ` ✓ CI.KUnit: success " Patchwork
2025-10-02 23:58 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-03 2:16 ` ✗ Xe.CI.Full: " Patchwork
2025-10-03 14:38 ` Summers, Stuart
-- strict thread matches above, loose matches on Subject: below --
2025-10-13 16:24 [PATCH 0/7] " Stuart Summers
2025-10-13 16:25 ` [PATCH 7/7] drm/xe: Check for GuC responses on disabling scheduling Stuart Summers
2025-10-13 22:31 [PATCH 0/7] Fix a couple of wedge corner-case memory leaks Stuart Summers
2025-10-13 22:31 ` [PATCH 7/7] drm/xe: Check for GuC responses on disabling scheduling Stuart Summers
2025-10-14 2:09 ` Matthew Brost
2025-10-14 3:10 ` Summers, Stuart