From: Matthew Brost <matthew.brost@intel.com>
To: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <daniele.ceraolospurio@intel.com>
Subject: Re: [PATCH v3] drm/xe/uc: Add stop on hardware initialization error
Date: Tue, 28 Oct 2025 12:57:02 -0700
Message-ID: <aQEgDjfMVDYELrWJ@lstrano-desk.jf.intel.com>
In-Reply-To: <20251028153820.3139977-1-zhanjun.dong@intel.com>

On Tue, Oct 28, 2025 at 11:38:20AM -0400, Zhanjun Dong wrote:
> On hardware init failure, the hardware might no longer respond; add a GuC
> stop to clean up exec_queue items.
> 
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5466
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5530
> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
> ---
> v3: Switch to xe_guc_stop
> v2: Switch to xe_guc_ct_stop
> ---
>  drivers/gpu/drm/xe/xe_uc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> index 465bda355443..00ca5883e006 100644
> --- a/drivers/gpu/drm/xe/xe_uc.c
> +++ b/drivers/gpu/drm/xe/xe_uc.c
> @@ -173,6 +173,7 @@ static int vf_uc_load_hw(struct xe_uc *uc)
>  	return 0;
>  
>  err_out:
> +	xe_guc_stop(&uc->guc);

If exec queues are destroyed later, after the submission backend has been
stopped, the final put on the queue may be lost, leaving dangling memory
when the driver load is aborted or the driver is unloaded.

I think you'll need to call xe_guc_submit_pause_abort somewhere to
ensure the final put cleanup messages are processed by the queues. Maybe
we add this call in guc_submit_fini before wait_event_timeout?
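
To make the concern concrete, here is a toy userspace model of the
ordering problem. It is not xe code: every toy_* name is made up for
illustration, and it assumes (as described above) that destroying a queue
defers the final put to a message the submission backend must process. It
only shows why a stopped backend can leave that put undone and how an
explicit flush before teardown would recover it; what
xe_guc_submit_pause_abort does internally is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the xe driver. */
struct toy_queue {
	int refcount;         /* references still held on the queue */
	bool cleanup_pending; /* a "final put" message waits for the backend */
	bool freed;           /* set once the final put has actually run */
};

struct toy_backend {
	bool stopped;         /* a stopped backend no longer processes messages */
	struct toy_queue *queues[4];
	size_t nr_queues;
};

/* Drop one reference; the last drop marks the queue as freed. */
static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0)
		q->freed = true;
}

/* Destroying a queue only queues a cleanup message; the put happens later. */
static void toy_queue_destroy(struct toy_queue *q)
{
	q->cleanup_pending = true;
}

/* Run every pending cleanup message, performing the deferred put. */
static void toy_run_pending(struct toy_backend *b)
{
	for (size_t i = 0; i < b->nr_queues; i++) {
		struct toy_queue *q = b->queues[i];

		if (q->cleanup_pending) {
			q->cleanup_pending = false;
			toy_queue_put(q);
		}
	}
}

/* Normal message processing: does nothing once the backend is stopped. */
static void toy_backend_process(struct toy_backend *b)
{
	if (!b->stopped)
		toy_run_pending(b);
}

/* Hypothetical teardown helper: flush pending cleanups even when stopped. */
static void toy_backend_flush_cleanup(struct toy_backend *b)
{
	toy_run_pending(b);
}

/* Count queues whose final put has not run yet. */
static size_t toy_backend_leaked(const struct toy_backend *b)
{
	size_t n = 0;

	for (size_t i = 0; i < b->nr_queues; i++)
		if (!b->queues[i]->freed)
			n++;
	return n;
}
```

With one queue holding a single reference, stopping the backend and then
calling toy_queue_destroy() leaves toy_backend_leaked() at 1 after
toy_backend_process(); calling toy_backend_flush_cleanup() before teardown
brings it back to 0.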

Matt

>  	xe_guc_sanitize(&uc->guc);
>  	return err;
>  }
> @@ -228,6 +229,7 @@ int xe_uc_load_hw(struct xe_uc *uc)
>  	return 0;
>  
>  err_out:
> +	xe_guc_stop(&uc->guc);
>  	xe_guc_sanitize(&uc->guc);
>  	return ret;
>  }
> -- 
> 2.34.1
> 
