Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Riana Tauro <riana.tauro@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
	intel-xe <intel-xe@lists.freedesktop.org>
Cc: Francois Dugast <francois.dugast@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [PATCH v4 1/3] drm/xe: Move survivability back to xe
Date: Mon, 17 Mar 2025 10:54:40 +0530	[thread overview]
Message-ID: <23861c48-b702-42d1-95c9-9960b8fc0916@intel.com> (raw)
In-Reply-To: <20250313-fix-survivability-v4-1-5e90efdede99@intel.com>



On 3/14/2025 2:10 AM, Lucas De Marchi wrote:
> Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
> moved the survivability handling to be done entirely in the xe_pci
> layer. However there are some issues with that approach:
> 
> 1) Survivability mode needs at least the mmio initialized, otherwise it
>     can't really read a register to decide if it should enter that state
> 2) SR-IOV mode should be initialized, otherwise it's not possible to
>     check if it's VF
> 
> Besides, as pointed by Riana the check for
> xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's
> not a bool return.
> 
> Fix that by moving the initialization to be entirely in the xe_device
> layer, with the correct dependencies handled: only after mmio and sriov
> initialization, and not triggering it on error from
> wait_for_lmem_ready(). This restores the trigger behavior before that
> commit. The xe_pci layer now only checks for "is it enabled?",
> like it's doing in xe_pci_suspend()/xe_pci_remove(), etc.
> 
> Cc: Riana Tauro <riana.tauro@intel.com>
> Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Looks good to me
Reviewed-by: Riana Tauro <riana.tauro@intel.com>

> ---
> Changes in v4:
> - Swap the returns for err and save_err (Riana)
> - Reword commit message to detail where the behavior before commit
>    d40f275d96e8 is being restored (Rodrigo)
> 
> Changes in v3:
> - Stop using -ENOTRECOVERABLE as it may be confusing wrt
>    critical/non-critical failure. Returning 0 is sufficient
> - s/xe_survivability_mode_capable/xe_survivability_mode_requested/
>    as it's not only about it being capable, but also about the type of
>    error, firmware signaling it, etc.
> ---
>   drivers/gpu/drm/xe/xe_device.c             | 17 +++++++++++++++--
>   drivers/gpu/drm/xe/xe_pci.c                | 16 +++++++---------
>   drivers/gpu/drm/xe/xe_survivability_mode.c | 17 +++++++++++------
>   drivers/gpu/drm/xe/xe_survivability_mode.h |  1 -
>   4 files changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index b2f656b2a563e..0e8805f93468d 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -53,6 +53,7 @@
>   #include "xe_pxp.h"
>   #include "xe_query.h"
>   #include "xe_shrinker.h"
> +#include "xe_survivability_mode.h"
>   #include "xe_sriov.h"
>   #include "xe_tile.h"
>   #include "xe_ttm_stolen_mgr.h"
> @@ -711,8 +712,20 @@ int xe_device_probe_early(struct xe_device *xe)
>   	sriov_update_device_info(xe);
>   
>   	err = xe_pcode_probe_early(xe);
> -	if (err)
> -		return err;
> +	if (err) {
> +		int save_err = err;
> +
> +		/*
> +		 * Try to leave device in survivability mode if device is
> +		 * possible, but still return the previous error for error
> +		 * propagation
> +		 */
> +		err = xe_survivability_mode_enable(xe);
> +		if (err)
> +			return err;
> +
> +		return save_err;
> +	}
>   
>   	err = wait_for_lmem_ready(xe);
>   	if (err)
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 4d982a5a4ffd9..6fea3091e2348 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -808,16 +808,14 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   		return err;
>   
>   	err = xe_device_probe_early(xe);
> -
> -	/*
> -	 * In Boot Survivability mode, no drm card is exposed and driver is
> -	 * loaded with bare minimum to allow for firmware to be flashed through
> -	 * mei. If early probe fails, check if survivability mode is flagged by
> -	 * HW to be enabled. In that case enable it and return success.
> -	 */
>   	if (err) {
> -		if (xe_survivability_mode_required(xe) &&
> -		    xe_survivability_mode_enable(xe))
> +		/*
> +		 * In Boot Survivability mode, no drm card is exposed and driver
> +		 * is loaded with bare minimum to allow for firmware to be
> +		 * flashed through mei. If early probe failed, but it managed to
> +		 * enable survivability mode, return success.
> +		 */
> +		if (xe_survivability_mode_is_enabled(xe))
>   			return 0;
>   
>   		return err;
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> index d939ce70e6fa8..0d57ad1596edf 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> @@ -178,15 +178,16 @@ bool xe_survivability_mode_is_enabled(struct xe_device *xe)
>   	return xe->survivability.mode;
>   }
>   
> -/**
> - * xe_survivability_mode_required - checks if survivability mode is required
> +/*
> + * xe_survivability_mode_requested - check if it's possible to enable
> + * survivability mode and requested by HW
>    * @xe: xe device instance
>    *
> - * This function reads the boot status from Pcode
> + * This function reads the boot status from Pcode.
>    *
> - * Return: true if boot status indicates failure, false otherwise
> + * Return: true if boot status indicates failure, false otherwise.
>    */
> -bool xe_survivability_mode_required(struct xe_device *xe)
> +static bool xe_survivability_mode_requested(struct xe_device *xe)
>   {
>   	struct xe_survivability *survivability = &xe->survivability;
>   	struct xe_mmio *mmio = xe_root_tile_mmio(xe);
> @@ -208,7 +209,8 @@ bool xe_survivability_mode_required(struct xe_device *xe)
>    *
>    * Initialize survivability information and enable survivability mode
>    *
> - * Return: 0 for success, negative error code otherwise.
> + * Return: 0 if survivability mode is enabled or not requested; negative error
> + * code otherwise.
>    */
>   int xe_survivability_mode_enable(struct xe_device *xe)
>   {
> @@ -216,6 +218,9 @@ int xe_survivability_mode_enable(struct xe_device *xe)
>   	struct xe_survivability_info *info;
>   	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>   
> +	if (!xe_survivability_mode_requested(xe))
> +		return 0;
> +
>   	survivability->size = MAX_SCRATCH_MMIO;
>   
>   	info = devm_kcalloc(xe->drm.dev, survivability->size, sizeof(*info),
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.h b/drivers/gpu/drm/xe/xe_survivability_mode.h
> index f4df5f9025ce8..d7e64885570d4 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.h
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.h
> @@ -12,6 +12,5 @@ struct xe_device;
>   
>   int xe_survivability_mode_enable(struct xe_device *xe);
>   bool xe_survivability_mode_is_enabled(struct xe_device *xe);
> -bool xe_survivability_mode_required(struct xe_device *xe);
>   
>   #endif /* _XE_SURVIVABILITY_MODE_H_ */
> 


  reply	other threads:[~2025-03-17  5:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-13 20:40 [PATCH v4 0/3] drm/xe: Fix survivability Lucas De Marchi
2025-03-13 20:40 ` [PATCH v4 1/3] drm/xe: Move survivability back to xe Lucas De Marchi
2025-03-17  5:24   ` Riana Tauro [this message]
2025-03-13 20:41 ` [PATCH v4 2/3] drm/xe: Set survivability mode before heci init Lucas De Marchi
2025-03-13 20:41 ` [PATCH v4 3/3] drm/xe: Allow to inject error in early probe Lucas De Marchi
2025-03-13 20:48 ` ✓ CI.Patch_applied: success for drm/xe: Fix survivability (rev4) Patchwork
2025-03-13 20:48 ` ✓ CI.checkpatch: " Patchwork
2025-03-13 20:49 ` ✓ CI.KUnit: " Patchwork
2025-03-13 21:06 ` ✓ CI.Build: " Patchwork
2025-03-13 21:08 ` ✓ CI.Hooks: " Patchwork
2025-03-13 21:10 ` ✓ CI.checksparse: " Patchwork
2025-03-14 12:07 ` ✓ Xe.CI.BAT: " Patchwork
2025-03-14 12:47 ` ✓ CI.Patch_applied: success for drm/xe: Fix survivability (rev6) Patchwork
2025-03-14 12:48 ` ✓ CI.checkpatch: " Patchwork
2025-03-14 12:49 ` ✓ CI.KUnit: " Patchwork
2025-03-14 13:19 ` ✓ CI.Build: " Patchwork
2025-03-14 13:21 ` ✓ CI.Hooks: " Patchwork
2025-03-14 13:23 ` ✓ CI.checksparse: " Patchwork
2025-03-14 13:44 ` ✓ Xe.CI.BAT: " Patchwork
2025-03-14 20:45 ` ✓ Xe.CI.Full: " Patchwork
2025-03-18 15:50 ` [PATCH v4 0/3] drm/xe: Fix survivability Raag Jadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23861c48-b702-42d1-95c9-9960b8fc0916@intel.com \
    --to=riana.tauro@intel.com \
    --cc=francois.dugast@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox