Intel-XE Archive on lore.kernel.org
From: Raag Jadav <raag.jadav@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org, matthew.brost@intel.com,
	riana.tauro@intel.com, michal.wajdeczko@intel.com,
	matthew.d.roper@intel.com, lukasz.laguna@intel.com
Subject: Re: [PATCH v1] drm/xe: Send unknown recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET
Date: Thu, 12 Feb 2026 06:28:34 +0100
Message-ID: <aY1lApm0_ZDMoyK5@black.igk.intel.com>
In-Reply-To: <aYzAYqWrDkuS1Ro2@intel.com>

On Wed, Feb 11, 2026 at 12:46:10PM -0500, Rodrigo Vivi wrote:
> On Fri, Feb 06, 2026 at 07:32:08AM +0100, Raag Jadav wrote:
> > On Thu, Feb 05, 2026 at 05:54:29PM -0500, Rodrigo Vivi wrote:
> > > On Thu, Feb 05, 2026 at 04:48:35PM +0530, Raag Jadav wrote:
> > > > XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
> > > > so wedge the device without any recovery method (unknown) and have it
> > > > available to the user for debugging.
> > > > 
> > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_device.c | 9 ++++++++-
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > index b1241fa4c3d6..815f0b0c9dfd 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > @@ -1326,8 +1326,15 @@ void xe_device_declare_wedged(struct xe_device *xe)
> > > >  		xe_gt_declare_wedged(gt);
> > > >  
> > > >  	if (xe_device_wedged(xe)) {
> > > > +		/*
> > > > +		 * XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
> > > > +		 * so wedge the device without any recovery method and have it available
> > > > +		 * to the user for debugging.
> > > 
> > > agree....
> > > 
> > > > +		 */
> > > > +		if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)
> > > > +			xe_device_set_wedged_method(xe, 0);
> > > 
> > > but why not using the already defined:
> > > 
> > > #define DRM_WEDGE_RECOVERY_NONE    BIT(0)  /* optional telemetry collection */
> > 
> > We originally added this for the AMD use case, and it doesn't, strictly
> > speaking, mean 'wedged'.
> > 
> > Documentation/gpu/drm-uapi.rst +441
> > 
> > "The only exception to this is ``WEDGED=none``, which signifies that the device
> > was temporarily 'wedged' at some point but was recovered from driver context
> > using device specific methods like reset."
> 
> Well, so, why not to change that to a more generic meaning then?!
> 
> 'none' should mean, no recovery help is needed. go away user space.
> regardless if it is temporary or permanent...

A few things:

1. I doubt Christian will allow it, since they've built a lot of
infrastructure around it.

2. "Debugging" != "go away userspace" IMO, since we ultimately do need the
recovery; it just won't be automated.

3. I had debug cases in mind at the time and have already kept a provision
for them.

Documentation/gpu/drm-uapi.rst +533

"Consumers can also choose to have the device available for debugging or
telemetry collection and base their recovery decision on the findings.
This is useful especially when the driver is unsure about recovery or
method is unknown."
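
To illustrate the distinction between the three cases above, here is a
minimal userspace sketch of how a consumer could map the value of the
WEDGED= uevent to a policy. The enum, the function name and the policy
itself are hypothetical and not part of any kernel or library API; the
value strings ("none", "unknown", "rebind,bus-reset") are assumed to be
what the DRM core reports for the corresponding recovery methods.

```c
#include <string.h>

/* Hypothetical consumer-side policy; names are illustrative only. */
enum wedge_action {
	WEDGE_ACTION_IGNORE,       /* "none": driver already recovered on its own */
	WEDGE_ACTION_DEBUG,        /* "unknown": keep the device around for inspection */
	WEDGE_ACTION_AUTO_RECOVER  /* one or more named recovery methods were listed */
};

/*
 * Map the value part of a WEDGED=<value> uevent to an action.
 * A missing or empty value is treated like "unknown" here; that is an
 * assumption of this sketch, not documented behaviour.
 */
static enum wedge_action wedge_action_from_uevent(const char *value)
{
	if (!value || !*value || strcmp(value, "unknown") == 0)
		return WEDGE_ACTION_DEBUG;
	if (strcmp(value, "none") == 0)
		return WEDGE_ACTION_IGNORE;
	/* Anything else is taken as a comma-separated list of methods. */
	return WEDGE_ACTION_AUTO_RECOVER;
}
```

With this mapping, "unknown" leaves the recovery decision to whoever is
debugging the hang, while "none" tells the consumer nothing is needed.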

Raag

> > > >  		/* If no wedge recovery method is set, use default */
> > > > -		if (!xe->wedged.method)
> > > > +		else if (!xe->wedged.method)
> > > >  			xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
> > > >  						    DRM_WEDGE_RECOVERY_BUS_RESET);
> > > >  
> > > > -- 
> > > > 2.43.0
> > > > 


Thread overview: 10+ messages
2026-02-05 11:18 [PATCH v1] drm/xe: Send unknown recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
2026-02-05 12:43 ` ✓ CI.KUnit: success for " Patchwork
2026-02-05 13:22 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-05 22:54 ` [PATCH v1] " Rodrigo Vivi
2026-02-06  6:32   ` Raag Jadav
2026-02-11 17:46     ` Rodrigo Vivi
2026-02-12  5:28       ` Raag Jadav [this message]
2026-02-12 14:01         ` Rodrigo Vivi
2026-02-16 10:30           ` Raag Jadav
2026-02-06  9:53 ` ✗ Xe.CI.FULL: failure for " Patchwork
