Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Lukasz Laguna <lukasz.laguna@intel.com>, intel-xe@lists.freedesktop.org
Cc: rodrigo.vivi@intel.com, matthew.brost@intel.com
Subject: Re: [PATCH v5 2/4] drm/xe: Don't update wedged mode in case of an error
Date: Wed, 14 May 2025 22:28:55 +0200	[thread overview]
Message-ID: <a1b5822b-8a4d-45d5-8708-aa3d0f56a640@intel.com> (raw)
In-Reply-To: <20250514101124.12437-3-lukasz.laguna@intel.com>



On 14.05.2025 12:11, Lukasz Laguna wrote:
> Update driver's internal wedged.mode state only in case of a success to
> avoid inconsistent state.
> 
> Fixes: 6b8ef44cc0a9 ("drm/xe: Introduce the wedged_mode debugfs")
> Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_debugfs.c | 8 ++++----
>  drivers/gpu/drm/xe/xe_guc_ads.c | 7 ++++---
>  drivers/gpu/drm/xe/xe_guc_ads.h | 3 ++-
>  3 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 5dd0ad3f9bae..72515fc638b1 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -170,19 +170,19 @@ static ssize_t wedged_mode_set(struct file *f, const char __user *ubuf,
>  	if (xe->wedged.mode == wedged_mode)
>  		return size;
>  
> -	xe->wedged.mode = wedged_mode;
> -
>  	xe_pm_runtime_get(xe);
>  	for_each_gt(gt, xe, id) {
> -		ret = xe_guc_ads_scheduler_policy_toggle_reset(&gt->uc.guc.ads);
> +		ret = xe_guc_ads_scheduler_policy_toggle_reset(&gt->uc.guc.ads, wedged_mode);
>  		if (ret) {
> -			xe_gt_err(gt, "Failed to update GuC ADS scheduler policy. GuC may still cause engine reset even with wedged_mode=2\n");
> +			xe_gt_err(gt, "Failed to update GuC ADS scheduler policy\n");

maybe also print the error as: %pe

>  			xe_pm_runtime_put(xe);

hmm, so now in case of multiple GTs, when it fails on second or later
GuC/GT, we truly end with an "inconsistent state"

shouldn't we at least try to reset policy on those GuC where we were
able to change it before we failed?

and if we fails at such recovery we still could have inconsistent state,
so maybe new enum (or flag) is still needed ?

>  			return -EIO;
>  		}
>  	}
>  	xe_pm_runtime_put(xe);
>  
> +	xe->wedged.mode = wedged_mode;
> +
>  	return size;
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> index 413711bd3f58..99aec2e8378b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> @@ -997,12 +997,13 @@ static int guc_ads_action_update_policies(struct xe_guc_ads *ads, u32 policy_off
>  /**
>   * xe_guc_ads_scheduler_policy_toggle_reset - Toggle reset policy
>   * @ads: Additional data structures object
> + * @mode: The wedged mode
>   *
> - * This function update the GuC's engine reset policy based on wedged.mode.
> + * This function update the GuC's engine reset policy based on wedged mode.
>   *
>   * Return: 0 on success, and negative error code otherwise.
>   */
> -int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads)
> +int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads, enum xe_wedged_mode mode)

maybe we don't need full xe_wedged_mode but only simple bool for the
reset policy? this function is about "toggle reset policy" after all

>  {
>  	struct xe_device *xe = ads_to_xe(ads);
>  	struct xe_gt *gt = ads_to_gt(ads);
> @@ -1018,7 +1019,7 @@ int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads)
>  	policies->dpc_promote_time = ads_blob_read(ads, policies.dpc_promote_time);
>  	policies->max_num_work_items = ads_blob_read(ads, policies.max_num_work_items);
>  	policies->is_valid = 1;
> -	if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG)
> +	if (mode == XE_WEDGED_MODE_UPON_ANY_HANG)
>  		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
>  	else
>  		policies->global_flags &= ~GLOBAL_POLICY_DISABLE_ENGINE_RESET;
> diff --git a/drivers/gpu/drm/xe/xe_guc_ads.h b/drivers/gpu/drm/xe/xe_guc_ads.h
> index 2e6674c760ff..641d9c8375fc 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ads.h
> +++ b/drivers/gpu/drm/xe/xe_guc_ads.h
> @@ -7,12 +7,13 @@
>  #define _XE_GUC_ADS_H_
>  
>  struct xe_guc_ads;
> +enum xe_wedged_mode;
>  
>  int xe_guc_ads_init(struct xe_guc_ads *ads);
>  int xe_guc_ads_init_post_hwconfig(struct xe_guc_ads *ads);
>  void xe_guc_ads_populate(struct xe_guc_ads *ads);
>  void xe_guc_ads_populate_minimal(struct xe_guc_ads *ads);
>  void xe_guc_ads_populate_post_load(struct xe_guc_ads *ads);
> -int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads);
> +int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads, enum xe_wedged_mode mode);
>  
>  #endif


  reply	other threads:[~2025-05-14 20:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-14 10:11 [PATCH v5 0/4] drm/xe: Improve wedged mode handling Lukasz Laguna
2025-05-14 10:11 ` [PATCH v5 1/4] drm/xe: Validate wedged_mode parameter and define enum for modes Lukasz Laguna
2025-05-14 20:18   ` Michal Wajdeczko
2025-05-14 10:11 ` [PATCH v5 2/4] drm/xe: Don't update wedged mode in case of an error Lukasz Laguna
2025-05-14 20:28   ` Michal Wajdeczko [this message]
2025-05-14 20:43   ` Michal Wajdeczko
2025-05-14 10:11 ` [PATCH v5 3/4] drm/xe/vf: Disallow setting wedged_mode=2 Lukasz Laguna
2025-05-14 20:34   ` Michal Wajdeczko
2025-05-14 10:11 ` [PATCH v5 4/4] drm/xe/pf: Allow wedged_mode=2 only in debug config Lukasz Laguna
2025-05-14 20:39   ` Michal Wajdeczko
2025-05-14 22:10 ` ✓ CI.Patch_applied: success for drm/xe: Improve wedged mode handling (rev5) Patchwork
2025-05-14 22:10 ` ✓ CI.checkpatch: " Patchwork
2025-05-14 22:12 ` ✓ CI.KUnit: " Patchwork
2025-05-14 22:22 ` ✓ CI.Build: " Patchwork
2025-05-14 22:24 ` ✗ CI.Hooks: failure " Patchwork
2025-05-14 22:26 ` ✓ CI.checksparse: success " Patchwork
2025-05-14 22:59 ` ✓ Xe.CI.BAT: " Patchwork
2025-05-15  7:36 ` ✗ Xe.CI.Full: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1b5822b-8a4d-45d5-8708-aa3d0f56a640@intel.com \
    --to=michal.wajdeczko@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lukasz.laguna@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox