From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: <intel-xe@lists.freedesktop.org>,
Lucas De Marchi <lucas.demarchi@intel.com>,
<himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH 2/4] drm/xe: declare wedged upon GuC load failure
Date: Tue, 16 Apr 2024 15:05:46 -0400
Message-ID: <Zh7MCpZGUz6z3dZm@intel.com>
In-Reply-To: <20240409221507.1076471-2-rodrigo.vivi@intel.com>
On Tue, Apr 09, 2024 at 06:15:05PM -0400, Rodrigo Vivi wrote:
> Let's block the device upon any GuC load failure.
> But let's continue with the probe so guc logs can be read
> from the debugfs.
>
> v2: - s/wedged/busted
> - do not block probe or we lose guc_logs in debugfs (Matt)
>
> v3: - s/busted/wedged
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc.c | 42 ++++++++++++++++---------------------
> 1 file changed, 18 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 240e7a4bbff1..f1c3e338301d 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -451,7 +451,7 @@ static int guc_xfer_rsa(struct xe_guc *guc)
> return 0;
> }
>
> -static int guc_wait_ucode(struct xe_guc *guc)
> +static void guc_wait_ucode(struct xe_guc *guc)
> {
> struct xe_gt *gt = guc_to_gt(guc);
> u32 status;
> @@ -479,30 +479,26 @@ static int guc_wait_ucode(struct xe_guc *guc)
> 200000, &status, false);
>
> if (ret) {
> - xe_gt_info(gt, "GuC load failed: status = 0x%08X\n", status);
> - xe_gt_info(gt, "GuC status: Reset = %u, BootROM = %#X, UKernel = %#X, MIA = %#X, Auth = %#X\n",
> - REG_FIELD_GET(GS_MIA_IN_RESET, status),
> - REG_FIELD_GET(GS_BOOTROM_MASK, status),
> - REG_FIELD_GET(GS_UKERNEL_MASK, status),
> - REG_FIELD_GET(GS_MIA_MASK, status),
> - REG_FIELD_GET(GS_AUTH_STATUS_MASK, status));
> -
> - if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> - xe_gt_info(gt, "GuC firmware signature verification failed\n");
> - ret = -ENOEXEC;
> - }
> + xe_gt_err(gt, "GuC load failed: status = 0x%08X\n", status);
> + xe_gt_err(gt, "GuC status: Reset = %u, BootROM = %#X, UKernel = %#X, MIA = %#X, Auth = %#X\n",
> + REG_FIELD_GET(GS_MIA_IN_RESET, status),
> + REG_FIELD_GET(GS_BOOTROM_MASK, status),
> + REG_FIELD_GET(GS_UKERNEL_MASK, status),
> + REG_FIELD_GET(GS_MIA_MASK, status),
> + REG_FIELD_GET(GS_AUTH_STATUS_MASK, status));
> +
> + if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED)
> + xe_gt_err(gt, "GuC firmware signature verification failed\n");
>
> if (REG_FIELD_GET(GS_UKERNEL_MASK, status) ==
> - XE_GUC_LOAD_STATUS_EXCEPTION) {
> - xe_gt_info(gt, "GuC firmware exception. EIP: %#x\n",
> - xe_mmio_read32(gt, SOFT_SCRATCH(13)));
> - ret = -ENXIO;
> - }
> + XE_GUC_LOAD_STATUS_EXCEPTION)
> + xe_gt_err(gt, "GuC firmware exception. EIP: %#x\n",
> + xe_mmio_read32(gt, SOFT_SCRATCH(13)));
> +
> + xe_device_declare_wedged(gt_to_xe(gt));
> } else {
> xe_gt_dbg(gt, "GuC successfully loaded\n");
> }
> -
> - return ret;
> }
>
> static int __xe_guc_upload(struct xe_guc *guc)
> @@ -532,16 +528,14 @@ static int __xe_guc_upload(struct xe_guc *guc)
> goto out;
>
> /* Wait for authentication */
> - ret = guc_wait_ucode(guc);
> - if (ret)
> - goto out;
> + guc_wait_ucode(guc);
>
> xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_RUNNING);
> return 0;
>
> out:
> xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOAD_FAIL);
> - return 0 /* FIXME: ret, don't want to stop load currently */;
> + return ret;

Lucas, thanks for the review. Just to let you know, I'm removing
this chunk from this patch. Himal noticed it and warned me that it
would change the behavior of other failure paths that this patch
does not touch or cover: if the GuC load fails in guc_xfer_rsa()
or xe_uc_fw_upload(), we previously did not abort the probe, but
with this change we would.

So let's drop this change from this patch for now, so we can move
ahead with the rest of the series. Then, on top of it, we can see
whether to declare wedged for those other failure paths too and
make this function return void.

Agree?

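A minimal userspace sketch of the control-flow concern Himal raised. Everything here is hypothetical: fake_xfer_rsa(), fake_fw_upload(), fake_guc_upload() and fake_probe_aborts() only model the error path of __xe_guc_upload(), and the errno values are arbitrary. The sketch shows that the old "return 0" at the out: label swallows every earlier failure, while "return ret" turns those same failures into probe aborts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the two earlier upload steps named in the
 * reply (guc_xfer_rsa() and xe_uc_fw_upload()). NOT the real xe code.
 */
enum fake_failure { FAIL_NONE, FAIL_XFER_RSA, FAIL_FW_UPLOAD };

static int fake_xfer_rsa(enum fake_failure f)
{
	return f == FAIL_XFER_RSA ? -EIO : 0;
}

static int fake_fw_upload(enum fake_failure f)
{
	return f == FAIL_FW_UPLOAD ? -ETIMEDOUT : 0;
}

/*
 * Models the error path of __xe_guc_upload():
 *   propagate == false: old behaviour, return 0 with the FIXME
 *   propagate == true:  the dropped chunk, return ret
 */
static int fake_guc_upload(enum fake_failure f, bool propagate)
{
	int ret;

	ret = fake_xfer_rsa(f);
	if (ret)
		goto out;

	ret = fake_fw_upload(f);
	if (ret)
		goto out;

	return 0;

out:
	/* The real code marks the firmware LOAD_FAIL here. */
	return propagate ? ret : 0;
}

/* A non-zero upload result aborts this (hypothetical) probe. */
static bool fake_probe_aborts(enum fake_failure f, bool propagate)
{
	int ret = fake_guc_upload(f, propagate);

	assert(ret <= 0); /* errors are negative errno values */
	return ret != 0;
}
```

With propagate == false, a guc_xfer_rsa-style failure leaves probe running. With propagate == true, the same failure aborts it, which is exactly the behavior change for paths this patch does not otherwise cover.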
> }
>
> /**
> --
> 2.44.0
>
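A similarly hedged sketch of the approach in the quoted patch itself. A GuC load failure no longer fails probe. Instead it marks the device wedged, probe continues so the GuC log can still be read from debugfs, and later work is refused. The fake_* names, the logs_available flag and the -ECANCELED return value are assumptions for illustration only, not the actual xe_device_declare_wedged() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; the real struct xe_device differs. */
struct fake_device {
	atomic_bool wedged;
	bool logs_available; /* stands in for the GuC log debugfs entry */
};

static void fake_declare_wedged(struct fake_device *dev)
{
	atomic_store(&dev->wedged, true);
}

static bool fake_device_wedged(struct fake_device *dev)
{
	return atomic_load(&dev->wedged);
}

/* Like the patched guc_wait_ucode(): returns void, wedges on failure. */
static void fake_wait_ucode(struct fake_device *dev, bool load_ok)
{
	if (!load_ok)
		fake_declare_wedged(dev);
}

static int fake_probe(struct fake_device *dev, bool load_ok)
{
	assert(dev);
	atomic_init(&dev->wedged, false);
	dev->logs_available = false;

	/* A load failure is not returned as an error; it only wedges. */
	fake_wait_ucode(dev, load_ok);

	/* Probe continues, so the (hypothetical) log interface is registered. */
	dev->logs_available = true;
	return 0;
}

/* Hypothetical submission entry point: refuses work once wedged. */
static int fake_submit(struct fake_device *dev)
{
	if (fake_device_wedged(dev))
		return -ECANCELED;
	return 0;
}
```

The key design point is that the failure is recorded as device state instead of an error return. That is why guc_wait_ucode() can become void without losing the debugfs logs.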