From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v10 1/4] drm/xe: Validate wedged_mode parameter and define enum for modes
Date: Tue, 25 Nov 2025 13:08:19 -0500 [thread overview]
Message-ID: <aSXwk4cFBmdYhoo-@intel.com> (raw)
In-Reply-To: <20251125135422.11244-2-lukasz.laguna@intel.com>
On Tue, Nov 25, 2025 at 02:54:19PM +0100, Lukasz Laguna wrote:
> Check correctness of the wedged_mode parameter input to ensure only
> supported values are accepted. Additionally, replace magic numbers with
> a clearly defined enum.
>
> Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
> ---
> v10:
> - define enum outside of the xe_device struct,
> - fix description of module parameter (Michal).
> ---
> drivers/gpu/drm/xe/xe_debugfs.c | 5 +--
> drivers/gpu/drm/xe/xe_device.c | 46 ++++++++++++++++++++++++++--
> drivers/gpu/drm/xe/xe_device.h | 2 ++
> drivers/gpu/drm/xe/xe_device_types.h | 19 +++++++++++-
> drivers/gpu/drm/xe/xe_guc_ads.c | 4 +--
> drivers/gpu/drm/xe/xe_guc_capture.c | 9 +++++-
> drivers/gpu/drm/xe/xe_guc_submit.c | 7 +++--
> drivers/gpu/drm/xe/xe_module.c | 10 +++---
> drivers/gpu/drm/xe/xe_module.h | 2 +-
> 9 files changed, 88 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 1d5a2a43a9d7..e7cdfb1cda8c 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -265,8 +265,9 @@ static ssize_t wedged_mode_set(struct file *f, const char __user *ubuf,
> if (ret)
> return ret;
>
> - if (wedged_mode > 2)
> - return -EINVAL;
> + ret = xe_device_validate_wedged_mode(xe, wedged_mode);
> + if (ret)
> + return ret;
>
> if (xe->wedged.mode == wedged_mode)
> return size;
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
Please, while at it, please update its doc block:
* DOC: Xe Device Wedging
> index 1197f914ef77..f6216c4f83f9 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -760,7 +760,10 @@ int xe_device_probe_early(struct xe_device *xe)
> if (err)
> return err;
>
> - xe->wedged.mode = xe_modparam.wedged_mode;
> + xe->wedged.mode = xe_device_validate_wedged_mode(xe, xe_modparam.wedged_mode) ?
> + XE_WEDGED_MODE_DEFAULT : xe_modparam.wedged_mode;
> + drm_dbg(&xe->drm, "wedged_mode: setting mode (%u) %s\n",
> + xe->wedged.mode, xe_wedged_mode_to_string(xe->wedged.mode));
>
> err = xe_device_vram_alloc(xe);
> if (err)
> @@ -1250,7 +1253,7 @@ void xe_device_declare_wedged(struct xe_device *xe)
> struct xe_gt *gt;
> u8 id;
>
> - if (xe->wedged.mode == 0) {
> + if (xe->wedged.mode == XE_WEDGED_MODE_NEVER) {
> drm_dbg(&xe->drm, "Wedged mode is forcibly disabled\n");
> return;
> }
> @@ -1284,3 +1287,42 @@ void xe_device_declare_wedged(struct xe_device *xe)
> drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
> }
> }
> +
> +/**
> + * xe_device_validate_wedged_mode - Check if given mode is supported
> + * @xe: the &xe_device
> + * @mode: requested mode to validate
> + *
> + * Check whether the provided wedged mode is supported.
> + *
> + * Return: 0 if mode is supported, error code otherwise.
> + */
> +int xe_device_validate_wedged_mode(struct xe_device *xe, unsigned int mode)
> +{
> + if (mode > XE_WEDGED_MODE_UPON_ANY_HANG) {
> + drm_dbg(&xe->drm, "wedged_mode: invalid value (%u)\n", mode);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * xe_wedged_mode_to_string - Convert enum value to string.
> + * @mode: the &xe_wedged_mode to convert
> + *
> + * Returns: wedged mode as a user friendly string.
> + */
> +const char *xe_wedged_mode_to_string(enum xe_wedged_mode mode)
> +{
> + switch (mode) {
> + case XE_WEDGED_MODE_NEVER:
> + return "never";
> + case XE_WEDGED_MODE_UPON_CRITICAL_ERROR:
> + return "upon-critical-error";
> + case XE_WEDGED_MODE_UPON_ANY_HANG:
> + return "upon-any-hang";
> + default:
> + return "<invalid>";
> + }
> +}
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 32cc6323b7f6..2fffa88f91b0 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -189,6 +189,8 @@ static inline bool xe_device_wedged(struct xe_device *xe)
>
> void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
> void xe_device_declare_wedged(struct xe_device *xe);
> +int xe_device_validate_wedged_mode(struct xe_device *xe, unsigned int mode);
> +const char *xe_wedged_mode_to_string(enum xe_wedged_mode mode);
>
> struct xe_file *xe_file_get(struct xe_file *xef);
> void xe_file_put(struct xe_file *xef);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 6ce3247d1bd8..81bf4a70b888 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -43,6 +43,23 @@ struct xe_pat_ops;
> struct xe_pxp;
> struct xe_vram_region;
>
> +/**
> + * enum xe_wedged_mode - possible wedged modes
> + * @XE_WEDGED_MODE_NEVER: Device will never be declared wedged.
> + * @XE_WEDGED_MODE_UPON_CRITICAL_ERROR: Device will be declared wedged only in
> + * case of critical errors. This is the default mode.
> + * @XE_WEDGED_MODE_UPON_ANY_HANG: Device will be declared wedged on any hang.
> + * In this mode, engine resets are disabled.
> + */
> +enum xe_wedged_mode {
> + XE_WEDGED_MODE_NEVER = 0,
> + XE_WEDGED_MODE_UPON_CRITICAL_ERROR = 1,
> + XE_WEDGED_MODE_UPON_ANY_HANG = 2,
> +};
> +
> +#define XE_WEDGED_MODE_DEFAULT XE_WEDGED_MODE_UPON_CRITICAL_ERROR
> +#define XE_WEDGED_MODE_DEFAULT_STR "upon-critical-error"
> +
> #define XE_BO_INVALID_OFFSET LONG_MAX
>
> #define GRAPHICS_VER(xe) ((xe)->info.graphics_verx100 / 100)
> @@ -583,7 +600,7 @@ struct xe_device {
> /** @wedged.flag: Xe device faced a critical error and is now blocked. */
> atomic_t flag;
> /** @wedged.mode: Mode controlled by kernel parameter and debugfs */
> - int mode;
> + enum xe_wedged_mode mode;
> /** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
> unsigned long method;
> } wedged;
> diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> index bcb85a1bf26d..466f192b6054 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> @@ -451,7 +451,7 @@ static void guc_policies_init(struct xe_guc_ads *ads)
> ads_blob_write(ads, policies.max_num_work_items,
> GLOBAL_POLICY_MAX_NUM_WI);
>
> - if (xe->wedged.mode == 2)
> + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG)
> global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
This mode=2 end-go is not to actually wedge 'upon any hang', but actually
to avoid resetting the GPU at any hang... like do not try to recover.
I wonder if we should and could make this explicit in the name somehow..
perhaps upon_any_hang_no_reset_attempt or upon_any_hang_no_recover_attempt...
but no strong opinion in here... just some random thoughts to make this be
more explicit.
>
> ads_blob_write(ads, policies.global_flags, global_flags);
> @@ -1004,7 +1004,7 @@ int xe_guc_ads_scheduler_policy_toggle_reset(struct xe_guc_ads *ads)
> policies->dpc_promote_time = ads_blob_read(ads, policies.dpc_promote_time);
> policies->max_num_work_items = ads_blob_read(ads, policies.max_num_work_items);
> policies->is_valid = 1;
> - if (xe->wedged.mode == 2)
> + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG)
> policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
> else
> policies->global_flags &= ~GLOBAL_POLICY_DISABLE_ENGINE_RESET;
> diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c
> index 0c1fbe97b8bf..45ba0c5308ba 100644
> --- a/drivers/gpu/drm/xe/xe_guc_capture.c
> +++ b/drivers/gpu/drm/xe/xe_guc_capture.c
> @@ -1889,7 +1889,14 @@ xe_guc_capture_get_matching_and_lock(struct xe_exec_queue *q)
> return NULL;
>
> xe = gt_to_xe(q->gt);
> - if (xe->wedged.mode >= 2 || !xe_device_uc_enabled(xe) || IS_SRIOV_VF(xe))
> +
> + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG)
> + return NULL;
> +
> + if (!xe_device_uc_enabled(xe))
> + return NULL;
> +
> + if (IS_SRIOV_VF(xe))
> return NULL;
>
> ss = &xe->devcoredump.snapshot;
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 7e0882074a99..4a93a2ff061f 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -998,8 +998,9 @@ void xe_guc_submit_wedge(struct xe_guc *guc)
> err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev,
> guc_submit_wedged_fini, guc);
> if (err) {
> - xe_gt_err(gt, "Failed to register clean-up on wedged.mode=2; "
> - "Although device is wedged.\n");
> + xe_gt_err(gt, "Failed to register clean-up in wedged.mode=%s; "
> + "Although device is wedged.\n",
> + xe_wedged_mode_to_string(XE_WEDGED_MODE_UPON_ANY_HANG));
> return;
> }
>
> @@ -1014,7 +1015,7 @@ static bool guc_submit_hint_wedged(struct xe_guc *guc)
> {
> struct xe_device *xe = guc_to_xe(guc);
>
> - if (xe->wedged.mode != 2)
> + if (xe->wedged.mode != XE_WEDGED_MODE_UPON_ANY_HANG)
> return false;
>
> if (xe_device_wedged(xe))
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index d08338fc3bc1..cefd00eb7379 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -10,6 +10,7 @@
>
> #include <drm/drm_module.h>
>
> +#include "xe_device.h"
> #include "xe_drv.h"
> #include "xe_configfs.h"
> #include "xe_hw_fence.h"
> @@ -29,7 +30,8 @@
> #define DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE
> #define DEFAULT_MAX_VFS ~0
> #define DEFAULT_MAX_VFS_STR "unlimited"
> -#define DEFAULT_WEDGED_MODE 1
> +#define DEFAULT_WEDGED_MODE XE_WEDGED_MODE_DEFAULT
> +#define DEFAULT_WEDGED_MODE_STR XE_WEDGED_MODE_DEFAULT_STR
> #define DEFAULT_SVM_NOTIFIER_SIZE 512
>
> struct xe_modparam xe_modparam = {
> @@ -88,10 +90,10 @@ MODULE_PARM_DESC(max_vfs,
> "[default=" DEFAULT_MAX_VFS_STR "])");
> #endif
>
> -module_param_named_unsafe(wedged_mode, xe_modparam.wedged_mode, int, 0600);
> +module_param_named_unsafe(wedged_mode, xe_modparam.wedged_mode, uint, 0600);
> MODULE_PARM_DESC(wedged_mode,
> - "Module's default policy for the wedged mode (0=never, 1=upon-critical-errors, 2=upon-any-hang "
> - "[default=" __stringify(DEFAULT_WEDGED_MODE) "])");
> + "Module's default policy for the wedged mode (0=never, 1=upon-critical-error, 2=upon-any-hang "
> + "[default=" DEFAULT_WEDGED_MODE_STR "])");
>
> static int xe_check_nomodeset(void)
> {
> diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> index 5a3bfea8b7b4..1c75f38ca393 100644
> --- a/drivers/gpu/drm/xe/xe_module.h
> +++ b/drivers/gpu/drm/xe/xe_module.h
> @@ -21,7 +21,7 @@ struct xe_modparam {
> #ifdef CONFIG_PCI_IOV
> unsigned int max_vfs;
> #endif
> - int wedged_mode;
> + unsigned int wedged_mode;
> u32 svm_notifier_size;
> };
>
> --
> 2.40.0
>
next prev parent reply other threads:[~2025-11-25 18:08 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-25 13:54 [PATCH v10 0/4] drm/xe: Improve wedged mode handling Lukasz Laguna
2025-11-25 13:54 ` [PATCH v10 1/4] drm/xe: Validate wedged_mode parameter and define enum for modes Lukasz Laguna
2025-11-25 18:08 ` Rodrigo Vivi [this message]
2025-11-26 14:37 ` Laguna, Lukasz
2025-11-25 13:54 ` [PATCH v10 2/4] drm/xe: Don't update wedged mode in case of an error Lukasz Laguna
2025-11-25 18:13 ` Rodrigo Vivi
2025-11-26 14:42 ` Laguna, Lukasz
2025-11-25 13:54 ` [PATCH v10 3/4] drm/xe/vf: Disallow setting wedged mode to upon-any-hang Lukasz Laguna
2025-11-25 13:54 ` [PATCH v10 4/4] drm/xe/pf: Allow upon-any-hang wedged mode only in debug config Lukasz Laguna
2025-11-25 21:28 ` ✓ CI.KUnit: success for drm/xe: Improve wedged mode handling (rev10) Patchwork
2025-11-25 22:29 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-26 1:32 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aSXwk4cFBmdYhoo-@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lukasz.laguna@intel.com \
--cc=michal.wajdeczko@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.