Intel-XE Archive on lore.kernel.org
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Zhanjun Dong <zhanjun.dong@intel.com>, <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v4 1/1] drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
Date: Thu, 19 Feb 2026 16:33:23 -0800
Message-ID: <8c9cbab0-2315-469a-84c0-033806ead0a2@intel.com>
In-Reply-To: <20260219173128.2414504-1-zhanjun.dong@intel.com>



On 2/19/2026 9:31 AM, Zhanjun Dong wrote:
> xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
> xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
> possible that the HW forcewake access hasn't been initialized yet and so
> we hit errors when the cleanup code tries to write GSC register. To
> avoid that, split the cleanup in 2 functions so that the HW cleanup is
> only called if the HW setup was completed successfully.
>
> Since the HW cleanup (interrupt disabling) is now removed from
> xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
> must be updated to disable interrupts before returning.
>
> Fixes: ff6cd29b690b ("drm/xe: Cleanup unwind of gt initialization")
> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
> v4:
> - Replace devm-managed cleanup action for xe_gsc_proxy_stop() with a
>    manual flag-based approach using a 'started' flag. This avoids a race
>    condition where module unload could start while the async GSC proxy
>    initialization is still in progress, potentially causing the devm
>    cleanup to be called at the wrong time.
> - Set gsc->proxy.started = true at the end of xe_gsc_proxy_start() when
>    initialization completes successfully.
> - Check gsc->proxy.started in xe_gsc_proxy_remove() to conditionally
>    call xe_gsc_proxy_stop() only if the proxy was actually started.
>
> v3:
> - Move xe_gsc_wait_for_worker_completion() to xe_gsc_proxy_stop() after
>    disabling interrupts, since the worker shouldn't be queued anymore
>    after interrupts are disabled.
> - Update commit message to clarify that the error handling changes in
>    xe_gsc_proxy_start() are necessary due to the cleanup refactoring,
>    not a separate fix.
>
> v2:
> - Split cleanup into two functions: xe_gsc_proxy_remove() for SW cleanup
>    and xe_gsc_proxy_stop() for HW cleanup that requires forcewake access.
> - Add error handling in xe_gsc_proxy_start to disable interrupts on
>    early error exits.
> ---
>   drivers/gpu/drm/xe/xe_gsc_proxy.c | 42 +++++++++++++++++++++++++------
>   drivers/gpu/drm/xe/xe_gsc_types.h |  2 ++
>   2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c
> index 42438b21f235..afafe8b41e65 100644
> --- a/drivers/gpu/drm/xe/xe_gsc_proxy.c
> +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c
> @@ -435,15 +435,11 @@ static int proxy_channel_alloc(struct xe_gsc *gsc)
>   	return 0;
>   }
>   
> -static void xe_gsc_proxy_remove(void *arg)
> +static void xe_gsc_proxy_stop(struct xe_gsc *gsc)
>   {
> -	struct xe_gsc *gsc = arg;
>   	struct xe_gt *gt = gsc_to_gt(gsc);
>   	struct xe_device *xe = gt_to_xe(gt);
>   
> -	if (!gsc->proxy.component_added)
> -		return;
> -
>   	/* disable HECI2 IRQs */
>   	scoped_guard(xe_pm_runtime, xe) {
>   		CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GSC);
> @@ -455,6 +451,29 @@ static void xe_gsc_proxy_remove(void *arg)
>   	}
>   
>   	xe_gsc_wait_for_worker_completion(gsc);
> +	gsc->proxy.started = false;
> +}
> +
> +static void xe_gsc_proxy_remove(void *arg)
> +{
> +	struct xe_gsc *gsc = arg;
> +	struct xe_gt *gt = gsc_to_gt(gsc);
> +	struct xe_device *xe = gt_to_xe(gt);
> +
> +	if (!gsc->proxy.component_added)
> +		return;
> +
> +	/*
> +	 * GSC proxy init is an async process that can be ongoing during

"proxy init" is ambiguous here, because the xe_gsc_proxy_init() function 
is called synchronously. Maybe use "proxy start", because that is the 
part that is called asynchronously.

> +	 * Xe module load/unload. Using devm managed action to register
> +	 * xe_gsc_proxy_stop could cause issues if Xe module unload has
> +	 * already started when the action is registered, potentially leading
> +	 * to the cleanup being called at the wrong time. The 'started' flag
> +	 * is used to avoid this race condition by ensuring we only stop the
> +	 * proxy if it was actually started.

This last sentence about the "started" flag is a bit confusing, because 
it doesn't directly avoid a race condition. I'd replace it with 
something like:
"Therefore, instead of registering a separate devm action to undo what 
is done in proxy start, we call it from here, but only if the start has 
completed successfully (tracked with the 'started' flag)."

with those changes:

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele

> +	 */
> +	if (gsc->proxy.started)
> +		xe_gsc_proxy_stop(gsc);
>   
>   	component_del(xe->drm.dev, &xe_gsc_proxy_component_ops);
>   	gsc->proxy.component_added = false;
> @@ -510,6 +529,7 @@ int xe_gsc_proxy_init(struct xe_gsc *gsc)
>    */
>   int xe_gsc_proxy_start(struct xe_gsc *gsc)
>   {
> +	struct xe_gt *gt = gsc_to_gt(gsc);
>   	int err;
>   
>   	/* enable the proxy interrupt in the GSC shim layer */
> @@ -521,12 +541,18 @@ int xe_gsc_proxy_start(struct xe_gsc *gsc)
>   	 */
>   	err = xe_gsc_proxy_request_handler(gsc);
>   	if (err)
> -		return err;
> +		goto err_irq_disable;
>   
>   	if (!xe_gsc_proxy_init_done(gsc)) {
> -		xe_gt_err(gsc_to_gt(gsc), "GSC FW reports proxy init not completed\n");
> -		return -EIO;
> +		xe_gt_err(gt, "GSC FW reports proxy init not completed\n");
> +		err = -EIO;
> +		goto err_irq_disable;
>   	}
>   
> +	gsc->proxy.started = true;
>   	return 0;
> +
> +err_irq_disable:
> +	gsc_proxy_irq_toggle(gsc, false);
> +	return err;
>   }
> diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h
> index 97c056656df0..5aaa2a75861f 100644
> --- a/drivers/gpu/drm/xe/xe_gsc_types.h
> +++ b/drivers/gpu/drm/xe/xe_gsc_types.h
> @@ -58,6 +58,8 @@ struct xe_gsc {
>   		struct mutex mutex;
>   		/** @proxy.component_added: whether the component has been added */
>   		bool component_added;
> +		/** @proxy.started: whether the proxy has been started */
> +		bool started;
>   		/** @proxy.bo: object to store message to and from the GSC */
>   		struct xe_bo *bo;
>   		/** @proxy.to_gsc: map of the memory used to send messages to the GSC */
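
For readers following the thread, the flag-based teardown discussed above can be sketched outside the driver as a small standalone C model. This is only an illustration under stated assumptions: `struct proxy`, `proxy_init()`, `proxy_start()`, `proxy_stop()`, `proxy_remove()`, the `fw_ok` parameter and the `hw_stop_calls` counter are all hypothetical stand-ins, not the real xe code, and the value -5 merely stands in for -EIO. It shows the two points raised in the patch: the start path disables interrupts itself on its error exits, and the remove path undoes the start step only if the 'started' flag says the start completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proxy state; not the real struct xe_gsc. */
struct proxy {
	bool component_added;	/* set by the synchronous init step */
	bool started;		/* set only when the async start step succeeds */
	bool irq_enabled;	/* models the HW interrupt enable */
	int hw_stop_calls;	/* counts HW teardown calls, for illustration */
};

/* Synchronous SW setup; the real driver registers the remove action here. */
static void proxy_init(struct proxy *p)
{
	p->component_added = true;
}

/* Asynchronous HW setup; fw_ok models whether the firmware handshake works. */
static int proxy_start(struct proxy *p, bool fw_ok)
{
	int err;

	p->irq_enabled = true;

	if (!fw_ok) {
		err = -5;	/* stands in for -EIO */
		goto err_irq_disable;
	}

	p->started = true;
	return 0;

err_irq_disable:
	/* remove() no longer touches the HW, so unwind it here */
	p->irq_enabled = false;
	return err;
}

/* HW teardown; must only run after a successful start. */
static void proxy_stop(struct proxy *p)
{
	assert(p->started);
	p->irq_enabled = false;
	p->hw_stop_calls++;
	p->started = false;
}

/* SW teardown; calls the HW teardown only if start completed. */
static void proxy_remove(struct proxy *p)
{
	if (!p->component_added)
		return;

	if (p->started)
		proxy_stop(p);

	p->component_added = false;
}
```

In this model, calling proxy_remove() after a failed or never-run proxy_start() leaves hw_stop_calls at zero, which corresponds to the case the patch is fixing: no HW access during cleanup when the HW setup never completed.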


Thread overview: 5+ messages
2026-02-19 17:31 [PATCH v4 1/1] drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure Zhanjun Dong
2026-02-19 17:38 ` ✓ CI.KUnit: success for series starting with [v4,1/1] " Patchwork
2026-02-20  0:33 ` Daniele Ceraolo Spurio [this message]
2026-02-20  8:33 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-20 12:18 ` ✗ Xe.CI.FULL: failure " Patchwork
