Intel-GFX Archive on lore.kernel.org
From: Raag Jadav <raag.jadav@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: airlied@gmail.com, daniel@ffwll.ch, lucas.demarchi@intel.com,
	thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com,
	jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
	tursulin@ursulin.net, intel-xe@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	himal.prasad.ghimiray@intel.com, francois.dugast@intel.com,
	aravind.iddamsetty@linux.intel.com, anshuman.gupta@intel.com
Subject: Re: [PATCH v4 1/3] drm: Introduce device wedged event
Date: Tue, 10 Sep 2024 18:49:00 +0300
Message-ID: <ZuBqbFA8_d0khPCY@black.fi.intel.com>
In-Reply-To: <20240909215323.GC5774@mdroper-desk1.amr.corp.intel.com>

On Mon, Sep 09, 2024 at 02:53:23PM -0700, Matt Roper wrote:
> On Fri, Sep 06, 2024 at 03:12:23PM +0530, Raag Jadav wrote:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hanged/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is in unrecoverable state
> > and requires userspace intervention for recovery.
> > 
> > Purpose of this implementation is to be vendor agnostic. Userspace
> > consumers (sysadmin) can define udev rules to parse this event and
> > take respective action to recover the device.
> > 
> > Consumer expectations:
> > ----------------------
> > 1) Unbind driver
> > 2) Reset bus device
> > 3) Re-bind driver
> > 
> > v4: s/drm_dev_wedged/drm_dev_wedged_event
> >     Use drm_info() (Jani)
> >     Kernel doc adjustment (Aravind)
> > 
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> >  drivers/gpu/drm/drm_drv.c | 20 ++++++++++++++++++++
> >  include/drm/drm_drv.h     |  1 +
> >  2 files changed, 21 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 93543071a500..cca5d8295eb7 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -499,6 +499,26 @@ void drm_dev_unplug(struct drm_device *dev)
> >  }
> >  EXPORT_SYMBOL(drm_dev_unplug);
> >  
> > +/**
> > + * drm_dev_wedged_event - generate a device wedged uevent
> > + * @dev: DRM device
> > + *
> > + * This generates a device wedged uevent for the DRM device specified by @dev,
> > + * on the basis of which, userspace may take respective action to recover the
> > + * device. Currently we only set WEDGED=1 in the uevent environment, but this
> > + * can be expanded in the future.
> 
> Just to clarify, is "wedged" intended to always mean "the entire device
> is unusable" or are there cases where it would also get sent if only
> part of the device is in a bad state?  For example, using i915/Xe
> terminology, maybe the GT is dead but display is still working.  Or one
> GT is dead, but another is still alive.

The idea is to provide drivers a way to recover through userspace intervention.
It is up to the drivers to decide when they see the need for recovery and how
they want to recover.

> Basically, is this event intended as a signal that userspace should stop
> trying to do _anything_ with the device, or just that the device has
> degraded functionality in some way (and maybe userspace can still do
> something useful if it's lucky)?  It would be good to clarify that in
> the docs here in case different drivers have different ideas about how
> this is expected to work.

And hence the open discussion. Improvements are welcome :)

Raag
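
Editorial sketch (not part of the patch): the kernel-doc quoted above says the
uevent environment currently carries WEDGED=1. Assuming a userspace consumer
reads raw kernel uevents, whose payload is a sequence of NUL-terminated
"KEY=value" strings, a check for that key might look like the following. The
function name uevent_is_wedged() is hypothetical, and whether the event means
"whole device" or "partially degraded" is exactly the question left open in
this thread, so the caller's reaction is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a uevent payload contains the
 * exact field "WEDGED=1". The payload is assumed to be a sequence of
 * NUL-terminated "KEY=value" strings occupying len bytes in total.
 */
bool uevent_is_wedged(const char *buf, size_t len)
{
	static const char key[] = "WEDGED=1";
	const size_t key_len = sizeof(key) - 1;
	size_t pos = 0;

	while (pos < len) {
		const char *field = buf + pos;
		const char *end = memchr(field, '\0', len - pos);
		/* An unterminated last field runs to the end of the buffer. */
		size_t n = end ? (size_t)(end - field) : len - pos;

		/* Require a full-length match so "WEDGED=10" is not accepted. */
		if (n == key_len && memcmp(field, key, key_len) == 0)
			return true;

		pos += n + 1; /* skip the field and its NUL terminator */
	}
	return false;
}
```

A consumer would call this on each received payload and, on a match, start
whatever recovery policy it has chosen, for example the unbind, bus reset and
re-bind sequence listed in the commit message.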


Thread overview: 18+ messages
2024-09-06  9:42 [PATCH v4 0/3] Introduce DRM device wedged event Raag Jadav
2024-09-06  9:42 ` [PATCH v4 1/3] drm: Introduce " Raag Jadav
2024-09-07 11:38   ` Asahi Lina
2024-09-07 15:07     ` Lucas De Marchi
2024-09-08 14:08       ` Asahi Lina
2024-09-09 20:01         ` Lucas De Marchi
2024-09-10 15:53           ` Raag Jadav
2024-09-10 16:06             ` Lucas De Marchi
2024-09-09 20:43         ` Rodrigo Vivi
2024-09-09 21:53   ` Matt Roper
2024-09-10 15:49     ` Raag Jadav [this message]
2024-09-24  9:37   ` Simona Vetter
2024-09-06  9:42 ` [PATCH v4 2/3] drm/xe: Use " Raag Jadav
2024-09-06  9:42 ` [PATCH v4 3/3] drm/i915: " Raag Jadav
2024-09-06 10:52 ` ✗ Fi.CI.CHECKPATCH: warning for Introduce DRM device wedged event (rev2) Patchwork
2024-09-06 10:52 ` ✗ Fi.CI.SPARSE: " Patchwork
2024-09-06 10:59 ` ✓ Fi.CI.BAT: success " Patchwork
2024-09-10  8:53 ` ✗ Fi.CI.IGT: failure " Patchwork
