From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
Date: Fri, 27 Mar 2026 09:19:51 -0400
Message-ID: <acaD9w2mi8I5mcrD@intel.com>
In-Reply-To: <20260326210116.202585-3-matthew.brost@intel.com>
On Thu, Mar 26, 2026 at 02:01:16PM -0700, Matthew Brost wrote:
> xe_guc_submit_wedge() runs in the DMA-fence signaling path, where
> GFP_KERNEL memory allocations are not permitted. However, registering
> guc_submit_wedged_fini via devm_add_action_or_reset() triggers such an
> allocation.
>
> Avoid this by moving the logic from guc_submit_wedged_fini() into
> guc_submit_fini(), where wedged exec queue references are dropped during
> normal teardown.
Interesting, and easier than I had imagined.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
> Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 33 ++++++++----------------------
> 1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index a145234f662b..10556156eaad 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -259,24 +259,12 @@ static void guc_submit_sw_fini(struct drm_device *drm, void *arg)
>  }
>
>  static void guc_submit_fini(void *arg)
> -{
> -	struct xe_guc *guc = arg;
> -
> -	/* Forcefully kill any remaining exec queues */
> -	xe_guc_ct_stop(&guc->ct);
> -	guc_submit_reset_prepare(guc);
> -	xe_guc_softreset(guc);
> -	xe_guc_submit_stop(guc);
> -	xe_uc_fw_sanitize(&guc->fw);
> -	xe_guc_submit_pause_abort(guc);
> -}
> -
> -static void guc_submit_wedged_fini(void *arg)
>  {
>  	struct xe_guc *guc = arg;
>  	struct xe_exec_queue *q;
>  	unsigned long index;
>
> +	/* Drop any wedged queue refs */
>  	mutex_lock(&guc->submission_state.lock);
>  	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>  		if (exec_queue_wedged(q)) {
> @@ -286,6 +274,14 @@ static void guc_submit_wedged_fini(void *arg)
>  		}
>  	}
>  	mutex_unlock(&guc->submission_state.lock);
> +
> +	/* Forcefully kill any remaining exec queues */
> +	xe_guc_ct_stop(&guc->ct);
> +	guc_submit_reset_prepare(guc);
> +	xe_guc_softreset(guc);
> +	xe_guc_submit_stop(guc);
> +	xe_uc_fw_sanitize(&guc->fw);
> +	xe_guc_submit_pause_abort(guc);
>  }
>
>  static const struct xe_exec_queue_ops guc_exec_queue_ops;
> @@ -1320,10 +1316,8 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>  void xe_guc_submit_wedge(struct xe_guc *guc)
>  {
>  	struct xe_device *xe = guc_to_xe(guc);
> -	struct xe_gt *gt = guc_to_gt(guc);
>  	struct xe_exec_queue *q;
>  	unsigned long index;
> -	int err;
>
>  	xe_gt_assert(guc_to_gt(guc), guc_to_xe(guc)->wedged.mode);
>
> @@ -1335,15 +1329,6 @@ void xe_guc_submit_wedge(struct xe_guc *guc)
>  		return;
>
>  	if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET) {
> -		err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev,
> -					       guc_submit_wedged_fini, guc);
> -		if (err) {
> -			xe_gt_err(gt, "Failed to register clean-up on wedged.mode=%s; "
> -				  "Although device is wedged.\n",
> -				  xe_wedged_mode_to_string(XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET));
> -			return;
> -		}
> -
>  		mutex_lock(&guc->submission_state.lock);
>  		xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
>  			if (xe_exec_queue_get_unless_zero(q))
> --
> 2.34.1
>
Thread overview:
2026-03-26 21:01 [PATCH 0/2] Wedged memory reclaim fixes Matthew Brost
2026-03-26 21:01 ` [PATCH 1/2] drm/xe: Avoid memory allocations in xe_device_declare_wedged() Matthew Brost
2026-03-27 13:15 ` Rodrigo Vivi
2026-03-26 21:01 ` [PATCH 2/2] drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge() Matthew Brost
2026-03-27 13:19 ` Rodrigo Vivi [this message]
2026-03-26 21:11 ` ✓ CI.KUnit: success for Wedged memory reclaim fixes Patchwork
2026-03-26 22:01 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-27 14:45 ` ✓ Xe.CI.FULL: " Patchwork