All of lore.kernel.org
 help / color / mirror / Atom feed
From: Raag Jadav <raag.jadav@intel.com>
To: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	rodrigo.vivi@intel.com, andrealmeid@igalia.com,
	christian.koenig@amd.com, airlied@gmail.com,
	simona.vetter@ffwll.ch, mripard@kernel.org,
	anshuman.gupta@intel.com, badal.nilawar@intel.com,
	riana.tauro@intel.com, karthik.poosa@intel.com,
	sk.anirban@intel.com
Subject: Re: [PATCH v4 4/4] drm/xe: Handle PUNIT errors by requesting cold-reset recovery
Date: Tue, 21 Apr 2026 11:11:25 +0200	[thread overview]
Message-ID: <aec_PYDi46m6RmfY@black.igk.intel.com> (raw)
In-Reply-To: <20260413133013.560239-10-mallesh.koujalagi@intel.com>

On Mon, Apr 13, 2026 at 07:00:18PM +0530, Mallesh Koujalagi wrote:
> When PUNIT (power management unit) errors are detected that persist across
> warm resets, mark the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET
> and notify userspace that a complete device power cycle is required to
> restore normal operation.
> 
> v3:
> - Use PUNIT instead of PMU. (Riana)
> - Use consistent wordingi.
> - Remove log. (Raag)
> 
> v4:
> - Make function static. (Raag)
> 
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ras.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 437811845c01..653be0c67864 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -5,6 +5,7 @@
>  
>  #include "xe_assert.h"
>  #include "xe_device_types.h"
> +#include "xe_device.h"
>  #include "xe_printk.h"
>  #include "xe_ras.h"
>  #include "xe_ras_types.h"
> @@ -93,6 +94,22 @@ static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
>  	return XE_RAS_RECOVERY_ACTION_RECOVERED;
>  }
>  
> +/**
> + * xe_punit_error_handler - Handler for Punit errors requiring cold reset
> + * @xe: device instance
> + *
> + * Handles Punit errors that affect the device and cannot be recovered
> + * through driver reload, PCIe reset, etc.
> + *
> + * Marks the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET method
> + * and notifies userspace that a device cold reset is required.
> + */

I don't believe we need kdoc for static function.

> +static void xe_punit_error_handler(struct xe_device *xe)

There're also some discussions around not using xe_ prefix for static
functions but ofcourse it's a personal preference.

Raag

> +{
> +	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> +	xe_device_declare_wedged(xe);
> +}
> +
>  static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
>  							      struct xe_ras_error_array *arr)
>  {
> @@ -132,7 +149,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
>  			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
>  			       severity_to_str(xe, common.severity),
>  			       ieh_error->global_error_status);
> -			/** TODO: Add PUNIT error handling */
> +			xe_punit_error_handler(xe);
>  			return XE_RAS_RECOVERY_ACTION_DISCONNECT;
>  		}
>  	}
> -- 
> 2.34.1
> 

  reply	other threads:[~2026-04-21  9:11 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-13 13:30 [PATCH v4 0/4] Introduce cold reset recovery method Mallesh Koujalagi
2026-04-13 13:30 ` [PATCH v4 1/4] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-04-13 13:30 ` [PATCH v4 2/4] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-04-21  8:19   ` Raag Jadav
2026-04-13 13:30 ` [PATCH v4 3/4] drm/doc: Document " Mallesh Koujalagi
2026-04-21  8:30   ` Raag Jadav
2026-05-12  8:29     ` Mallesh, Koujalagi
2026-04-13 13:30 ` [PATCH v4 4/4] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-04-21  9:11   ` Raag Jadav [this message]
2026-04-13 17:43 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev3) Patchwork
2026-04-13 17:44 ` ✓ CI.KUnit: success " Patchwork
2026-04-13 18:45 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-13 21:22 ` ✗ Xe.CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aec_PYDi46m6RmfY@black.igk.intel.com \
    --to=raag.jadav@intel.com \
    --cc=airlied@gmail.com \
    --cc=andrealmeid@igalia.com \
    --cc=anshuman.gupta@intel.com \
    --cc=badal.nilawar@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=karthik.poosa@intel.com \
    --cc=mallesh.koujalagi@intel.com \
    --cc=mripard@kernel.org \
    --cc=riana.tauro@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=simona.vetter@ffwll.ch \
    --cc=sk.anirban@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.