From: Matthew Brost <matthew.brost@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>,
"Dong, Zhanjun" <zhanjun.dong@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Summers, Stuart" <stuart.summers@intel.com>
Subject: Re: [PATCH v8] drm/xe/uc: Disable GuC communication on hardware initialization error.
Date: Mon, 7 Jul 2025 11:30:07 -0700 [thread overview]
Message-ID: <aGwSLzlUP1YTJ5bP@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <3a784641-2436-4ead-88e2-88ef33fd39ac@intel.com>
On Mon, Jul 07, 2025 at 07:36:29PM +0200, Michal Wajdeczko wrote:
>
>
> On 07.07.2025 15:56, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Dong, Zhanjun <zhanjun.dong@intel.com>
> > Sent: Thursday, July 3, 2025 2:39 PM
> > To: intel-xe@lists.freedesktop.org
> > Cc: Dong, Zhanjun <zhanjun.dong@intel.com>; Wajdeczko, Michal <Michal.Wajdeczko@intel.com>; Summers, Stuart <stuart.summers@intel.com>; Cavitt, Jonathan <jonathan.cavitt@intel.com>
> > Subject: [PATCH v8] drm/xe/uc: Disable GuC communication on hardware initialization error.
> >>
> >> Disable GuC communication on Xe micro controller hardware initialization
> >> error.
> >>
> >> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
> >> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917
> >>
> >> ---
> >> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Stuart Summers <stuart.summers@intel.com>
> >> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
> >>
> >> Change list:
> >> v8: Fix kernel-doc style
> >> Add error handling in vf_guc_min_load_for_hwconfig
> >> v7: Add kernel-doc for xe_guc_disable_communication
> >> Unset submission_state.enabled as well
> >> v6: Skip disable ct on xe_guc_enable_communication error
> >> v5: Set wedge is excessive action, revert back to disable ct
> >> v4: Fix typo and add new line
> >> v3: v2 CI re-run
> >> v2: Remove unnecessary jump to err-out
> >> Drop disable ct, switch to set wedge
> >> ---
> >> drivers/gpu/drm/xe/xe_guc.c | 20 ++++++++++++++++++--
> >> drivers/gpu/drm/xe/xe_guc.h | 1 +
> >> drivers/gpu/drm/xe/xe_uc.c | 19 ++++++++++++++-----
> >> 3 files changed, 33 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >> index 8573957facae..6643a2cb898b 100644
> >> --- a/drivers/gpu/drm/xe/xe_guc.c
> >> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >> @@ -1219,13 +1219,17 @@ static int vf_guc_min_load_for_hwconfig(struct xe_guc *guc)
> >>
> >> ret = xe_gt_sriov_vf_connect(gt);
> >> if (ret)
> >> - return ret;
> >> + goto err_out;
> >>
> >> ret = xe_gt_sriov_vf_query_runtime(gt);
> >> if (ret)
> >> - return ret;
> >> + goto err_out;
> >>
> >> return 0;
> >> +
> >> +err_out:
> >> + xe_guc_disable_communication(guc);
> >> + return ret;
> >> }
> >>
> >> /**
> >> @@ -1337,6 +1341,18 @@ int xe_guc_enable_communication(struct xe_guc *guc)
> >> return 0;
> >> }
> >>
> >> +/**
> >> + * xe_guc_disable_communication() - Disable GuC communication
> >> + * @guc: The GuC object
> >> + *
> >> + * This function will disable the GuC communication.
> >> + */
> >> +void xe_guc_disable_communication(struct xe_guc *guc)
> >> +{
> >> + guc->submission_state.enabled = false;
> >> + xe_guc_ct_disable(&guc->ct);
> >
> > Didn't this used to just be a wrapper for xe_guc_ct_disable?
> >
> > ... Well, I suppose needing to set the submission state enabled to false
> > would also be a rather important step, so it's probably good that was added.
>
> hmm, while it might be a required step for shutdown, putting it here
> looks like a random choice, it's unbalanced what "enable_communication"
> did and also looks like a violation of the layering, as guc_submit code
> has its own files and we shouldn't touch internals from the outside
>
> + @Matt
>
In general, the use of guc->submission_state.enabled in the existing
driver doesn't appear correct. It's accessed in multiple files without
any clear locking or layering rules. This falls into the category of "I
wrote this early in Xe without much thought or due to some ignorance,
and we never cleaned it up"—and there's quite a bit of that floating
around in Xe. So I agree with Michal—we shouldn't make this worse.
Rather than calling xe_guc_disable_communication here, maybe call
xe_guc_sanitize instead? It should achieve what you're aiming for
without further complicating the handling of
guc->submission_state.enabled.
Matt
> >
> > Everything else is LGTM, so
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > -Jonathan Cavitt
> >
> >> +}
> >> +
> >> int xe_guc_suspend(struct xe_guc *guc)
> >> {
> >> struct xe_gt *gt = guc_to_gt(guc);
> >> diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> >> index 22cf019a11bf..20823b821f7d 100644
> >> --- a/drivers/gpu/drm/xe/xe_guc.h
> >> +++ b/drivers/gpu/drm/xe/xe_guc.h
> >> @@ -34,6 +34,7 @@ int xe_guc_reset(struct xe_guc *guc);
> >> int xe_guc_upload(struct xe_guc *guc);
> >> int xe_guc_min_load_for_hwconfig(struct xe_guc *guc);
> >> int xe_guc_enable_communication(struct xe_guc *guc);
> >> +void xe_guc_disable_communication(struct xe_guc *guc);
> >> int xe_guc_opt_in_features_enable(struct xe_guc *guc);
> >> int xe_guc_suspend(struct xe_guc *guc);
> >> void xe_guc_notify(struct xe_guc *guc);
> >> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> >> index 6431ba3a2c53..1012fe84b379 100644
> >> --- a/drivers/gpu/drm/xe/xe_uc.c
> >> +++ b/drivers/gpu/drm/xe/xe_uc.c
> >> @@ -13,6 +13,7 @@
> >> #include "xe_gt_printk.h"
> >> #include "xe_gt_sriov_vf.h"
> >> #include "xe_guc.h"
> >> +#include "xe_guc_ct.h"
> >> #include "xe_guc_pc.h"
> >> #include "xe_guc_engine_activity.h"
> >> #include "xe_huc.h"
> >> @@ -158,7 +159,7 @@ static int vf_uc_load_hw(struct xe_uc *uc)
> >>
> >> err = xe_gt_sriov_vf_connect(uc_to_gt(uc));
> >> if (err)
> >> - return err;
> >> + goto err_out;
> >>
> >> uc->guc.submission_state.enabled = true;
> >>
> >> @@ -168,9 +169,13 @@ static int vf_uc_load_hw(struct xe_uc *uc)
> >>
> >> err = xe_gt_record_default_lrcs(uc_to_gt(uc));
> >> if (err)
> >> - return err;
> >> + goto err_out;
> >>
> >> return 0;
> >> +
> >> +err_out:
> >> + xe_guc_disable_communication(&uc->guc);
> >> + return err;
> >> }
> >>
> >> /*
> >> @@ -202,15 +207,15 @@ int xe_uc_load_hw(struct xe_uc *uc)
> >>
> >> ret = xe_gt_record_default_lrcs(uc_to_gt(uc));
> >> if (ret)
> >> - return ret;
> >> + goto err_out;
> >>
> >> ret = xe_guc_post_load_init(&uc->guc);
> >> if (ret)
> >> - return ret;
> >> + goto err_out;
> >>
> >> ret = xe_guc_pc_start(&uc->guc.pc);
> >> if (ret)
> >> - return ret;
> >> + goto err_out;
> >>
> >> xe_guc_engine_activity_enable_stats(&uc->guc);
> >>
> >> @@ -222,6 +227,10 @@ int xe_uc_load_hw(struct xe_uc *uc)
> >> xe_gsc_load_start(&uc->gsc);
> >>
> >> return 0;
> >> +
> >> +err_out:
> >> + xe_guc_disable_communication(&uc->guc);
> >> + return ret;
> >> }
> >>
> >> int xe_uc_reset_prepare(struct xe_uc *uc)
> >> --
> >> 2.34.1
> >>
> >>
>
next prev parent reply other threads:[~2025-07-07 18:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-03 21:38 [PATCH v8] drm/xe/uc: Disable GuC communication on hardware initialization error Zhanjun Dong
2025-07-03 22:07 ` ✓ CI.KUnit: success for drm/xe/uc: Disable GuC communication on hardware initialization error. (rev6) Patchwork
2025-07-03 22:44 ` ✓ Xe.CI.BAT: " Patchwork
2025-07-05 16:06 ` ✗ Xe.CI.Full: failure " Patchwork
2025-07-07 15:51 ` Dong, Zhanjun
2025-07-07 13:56 ` [PATCH v8] drm/xe/uc: Disable GuC communication on hardware initialization error Cavitt, Jonathan
2025-07-07 17:36 ` Michal Wajdeczko
2025-07-07 18:30 ` Matthew Brost [this message]
2025-07-07 18:34 ` Cavitt, Jonathan
2025-07-07 21:05 ` Dong, Zhanjun
2025-07-07 18:28 ` Dong, Zhanjun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aGwSLzlUP1YTJ5bP@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jonathan.cavitt@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=stuart.summers@intel.com \
--cc=zhanjun.dong@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox