From: Raag Jadav <raag.jadav@intel.com>
To: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Cc: intel-xe@lists.freedesktop.org, rodrigo.vivi@intel.com,
matthew.brost@intel.com, anshuman.gupta@intel.com,
badal.nilawar@intel.com, riana.tauro@intel.com,
karthik.poosa@intel.com, sk.anirban@intel.com
Subject: Re: [PATCH v4] drm/xe/xe_survivability: Fix runtime survivability error handling
Date: Tue, 21 Apr 2026 09:57:45 +0200
Message-ID: <aect-TzkCHpt17uK@black.igk.intel.com>
In-Reply-To: <20260420020025.882006-2-mallesh.koujalagi@intel.com>

On Mon, Apr 20, 2026 at 07:30:26AM +0530, Mallesh Koujalagi wrote:
> xe_survivability_mode_runtime_enable() returns an int, but its caller
> csc_hw_error_work() cannot take any meaningful recovery action on
> failure. The function already handles all internal errors via dev_err()

dev_err() doesn't really handle any errors; it just logs them.

> and proceeds to enable survivability mode regardless of sysfs creation
> failure.

This looks more like a refactoring than a fix for any real issue, so I'm not
sure we should include a Fixes tag here. It's also probably worth updating
both the subject and the commit message to describe the change accordingly.
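
To illustrate the logging-versus-handling distinction, here is a minimal
userspace sketch. It is not the xe code: create_optional_interface(),
log_error(), enable_mode_best_effort() and both counters are hypothetical
names made up for the example, and fprintf() merely stands in for dev_err().

```c
#include <stdio.h>

/* Hypothetical counters so the behaviour can be observed from a caller. */
int errors_logged;
int mode_enabled;

/* Hypothetical stand-in for a step that may fail (e.g. sysfs creation). */
int create_optional_interface(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Stand-in for dev_err(): it only records and prints the message. */
void log_error(const char *msg)
{
	errors_logged++;
	fprintf(stderr, "error: %s\n", msg);
}

/*
 * Best-effort enable: a failure of the optional step is logged but not
 * handled, and the mode is enabled regardless. With nothing for a caller
 * to act on, a void return type describes this behaviour accurately.
 */
void enable_mode_best_effort(int should_fail)
{
	if (create_optional_interface(should_fail))
		log_error("failed to create optional interface");

	mode_enabled = 1;
}
```

Calling enable_mode_best_effort(1) leaves mode_enabled set and bumps
errors_logged once: the failure is reported, but nothing recovers from it.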

Raag

> Change the return type to void and drop unnecessary error handling
> in csc_hw_error_work().
>
> v2:
> - Return is not require after the sysfs creation fail. (Rodrigo/Riana)
> - Change int to void return type. (Rodrigo)
> - Remove extra message from csc_hw_error_work().
>
> v3:
> - Remove ret variable. (Raag)
>
> v4:
> - Drop ret variable from other part of code.
>
> Fixes: a2ca0633a0fe ("drm/xe/xe_survivability: Add support for Runtime survivability mode")
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_hw_error.c | 5 +----
> drivers/gpu/drm/xe/xe_survivability_mode.c | 14 ++++----------
> drivers/gpu/drm/xe/xe_survivability_mode.h | 2 +-
> 3 files changed, 6 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 2a31b430570e..64d2260e761b 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> @@ -169,11 +169,8 @@ static void csc_hw_error_work(struct work_struct *work)
>  {
>  	struct xe_tile *tile = container_of(work, typeof(*tile), csc_hw_error_work);
>  	struct xe_device *xe = tile_to_xe(tile);
> -	int ret;
>
> -	ret = xe_survivability_mode_runtime_enable(xe);
> -	if (ret)
> -		drm_err(&xe->drm, "Failed to enable runtime survivability mode\n");
> +	xe_survivability_mode_runtime_enable(xe);
>  }
>
>  static void csc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> index db64cac39c94..427afd144f3a 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> @@ -396,25 +396,21 @@ bool xe_survivability_mode_is_requested(struct xe_device *xe)
>   * Runtime survivability mode is enabled when certain errors cause the device to be
>   * in non-recoverable state. The device is declared wedged with the appropriate
>   * recovery method and survivability mode sysfs exposed to userspace
> - *
> - * Return: 0 if runtime survivability mode is enabled, negative error code otherwise.
>   */
> -int xe_survivability_mode_runtime_enable(struct xe_device *xe)
> +void xe_survivability_mode_runtime_enable(struct xe_device *xe)
>  {
>  	struct xe_survivability *survivability = &xe->survivability;
>  	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> -	int ret;
>
>  	if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || xe->info.platform < XE_BATTLEMAGE) {
>  		dev_err(&pdev->dev, "Runtime Survivability Mode not supported\n");
> -		return -EINVAL;
> +		return;
>  	}
>
>  	populate_survivability_info(xe);
>
> -	ret = create_survivability_sysfs(pdev);
> -	if (ret)
> -		dev_err(&pdev->dev, "Failed to create survivability mode sysfs\n");
> +	if (create_survivability_sysfs(pdev))
> +		dev_err(&pdev->dev, "Failed to create survivability sysfs\n");
>
>  	survivability->type = XE_SURVIVABILITY_TYPE_RUNTIME;
>  	dev_err(&pdev->dev, "Runtime Survivability mode enabled\n");
> @@ -422,8 +418,6 @@ int xe_survivability_mode_runtime_enable(struct xe_device *xe)
>  	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_VENDOR);
>  	xe_device_declare_wedged(xe);
>  	dev_err(&pdev->dev, "Firmware flash required, Please refer to the userspace documentation for more details!\n");
> -
> -	return 0;
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.h b/drivers/gpu/drm/xe/xe_survivability_mode.h
> index 1cc94226aa82..cd040e4d18bb 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.h
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.h
> @@ -11,7 +11,7 @@
>  struct xe_device;
>
>  int xe_survivability_mode_boot_enable(struct xe_device *xe);
> -int xe_survivability_mode_runtime_enable(struct xe_device *xe);
> +void xe_survivability_mode_runtime_enable(struct xe_device *xe);
>  bool xe_survivability_mode_is_boot_enabled(struct xe_device *xe);
>  bool xe_survivability_mode_is_requested(struct xe_device *xe);
>
> --
> 2.34.1
>