From: Raag Jadav <raag.jadav@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>,
intel-xe@lists.freedesktop.org, matthew.brost@intel.com,
riana.tauro@intel.com, michal.wajdeczko@intel.com,
matthew.d.roper@intel.com, lukasz.laguna@intel.com
Subject: Re: [PATCH v1] drm/xe: Send unknown recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET
Date: Mon, 16 Feb 2026 11:30:17 +0100
Message-ID: <aZLxubuW7UrLaP1D@black.igk.intel.com>
In-Reply-To: <aY3dVn4pSgw-wNZD@intel.com>
On Thu, Feb 12, 2026 at 09:01:58AM -0500, Rodrigo Vivi wrote:
> On Thu, Feb 12, 2026 at 06:28:34AM +0100, Raag Jadav wrote:
> > On Wed, Feb 11, 2026 at 12:46:10PM -0500, Rodrigo Vivi wrote:
> > > On Fri, Feb 06, 2026 at 07:32:08AM +0100, Raag Jadav wrote:
> > > > On Thu, Feb 05, 2026 at 05:54:29PM -0500, Rodrigo Vivi wrote:
> > > > > On Thu, Feb 05, 2026 at 04:48:35PM +0530, Raag Jadav wrote:
> > > > > > XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
> > > > > > so wedge the device without any recovery method (unknown) and have it
> > > > > > available to the user for debugging.
> > > > > >
> > > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_device.c | 9 ++++++++-
> > > > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > > > index b1241fa4c3d6..815f0b0c9dfd 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > > @@ -1326,8 +1326,15 @@ void xe_device_declare_wedged(struct xe_device *xe)
> > > > > > xe_gt_declare_wedged(gt);
> > > > > >
> > > > > > if (xe_device_wedged(xe)) {
> > > > > > + /*
> > > > > > + * XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
> > > > > > + * so wedge the device without any recovery method and have it available
> > > > > > + * to the user for debugging.
> > > > >
> > > > > agree....
> > > > >
> > > > > > + */
> > > > > > + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)
> > > > > > + xe_device_set_wedged_method(xe, 0);
> > > > >
> > > > > but why not use the already defined:
> > > > >
> > > > > #define DRM_WEDGE_RECOVERY_NONE BIT(0) /* optional telemetry collection */
> > > >
> > > > We originally added this for the AMD use case, and strictly speaking
> > > > it doesn't mean 'wedged'.
> > > >
> > > > Documentation/gpu/drm-uapi.rst +441
> > > >
> > > > "The only exception to this is ``WEDGED=none``, which signifies that the device
> > > > was temporarily 'wedged' at some point but was recovered from driver context
> > > > using device specific methods like reset."
> > >
> > > Well, so, why not change that to a more generic meaning then?!
> > >
> > > 'none' should mean, no recovery help is needed. go away user space.
> > > regardless if it is temporary or permanent...
> >
> > A few things,
> >
> > 1. I'm doubtful if Christian will allow it since they've built a lot of
> > infrastructure around it.
>
> there's only one way to know that...
>
> Cc: Christian König <christian.koenig@amd.com>

What do you think, Chris? Any objections?

Raag

> > 2. "Debugging" != "go away userspace" IMO since we ultimately do need the
> > recovery, it just won't be automated.
>
> exactly in my view: 'none' = 'no automation needed'
>
> much easier and more meaningfully aligned than 'unknown'
>
> >
> > 3. I had debug cases in mind at the time and have already kept a provision
> > for them.
> >
> > Documentation/gpu/drm-uapi.rst +533
> >
> > "Consumers can also choose to have the device available for debugging or
> > telemetry collection and base their recovery decision on the findings.
> > This is useful especially when the driver is unsure about recovery or
> > method is unknown."
>
> Okay, so perhaps we need to update that. Because in my view, the driver
> knows, and is pretty sure, that no automated recovery should take place
> in this case.
>
> > > > > > /* If no wedge recovery method is set, use default */
> > > > > > - if (!xe->wedged.method)
> > > > > > + else if (!xe->wedged.method)
> > > > > > xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
> > > > > > DRM_WEDGE_RECOVERY_BUS_RESET);
> > > > > >
> > > > > > --
> > > > > > 2.43.0
> > > > > >
Thread overview: 10+ messages
2026-02-05 11:18 [PATCH v1] drm/xe: Send unknown recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
2026-02-05 12:43 ` ✓ CI.KUnit: success for " Patchwork
2026-02-05 13:22 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-05 22:54 ` [PATCH v1] " Rodrigo Vivi
2026-02-06 6:32 ` Raag Jadav
2026-02-11 17:46 ` Rodrigo Vivi
2026-02-12 5:28 ` Raag Jadav
2026-02-12 14:01 ` Rodrigo Vivi
2026-02-16 10:30 ` Raag Jadav [this message]
2026-02-06 9:53 ` ✗ Xe.CI.FULL: failure for " Patchwork