From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: "André Almeida" <andrealmeid@igalia.com>
Cc: "Raag Jadav" <raag.jadav@intel.com>,
intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com,
simona@ffwll.ch, intel-gfx@lists.freedesktop.org,
joonas.lahtinen@linux.intel.com, dri-devel@lists.freedesktop.org,
himal.prasad.ghimiray@intel.com, lucas.demarchi@intel.com,
tursulin@ursulin.net, francois.dugast@intel.com,
jani.nikula@linux.intel.com, airlied@gmail.com,
aravind.iddamsetty@linux.intel.com, anshuman.gupta@intel.com,
andi.shyti@linux.intel.com, matthew.d.roper@intel.com,
andriy.shevchenko@linux.intel.com, lina@asahilina.net,
kernel-dev@igalia.com, "Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH v7 1/5] drm: Introduce device wedged event
Date: Fri, 18 Oct 2024 10:56:38 -0400
Message-ID: <ZxJ3DJWY9Lsc9Mn4@intel.com>
In-Reply-To: <ed8cb1e9-df05-44a7-9088-90b3ee8dce85@igalia.com>

On Thu, Oct 17, 2024 at 04:16:09PM -0300, André Almeida wrote:
> Hi Raag,
>
> Em 30/09/2024 04:38, Raag Jadav escreveu:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hung/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is no longer operating as
> > expected even after a hardware reset and has become unrecoverable from
> > driver context.
> >
> > Purpose of this implementation is to provide drivers a generic way to
> > recover with the help of userspace intervention. Different drivers may
> > have different ideas of a "wedged device" depending on their hardware
> > implementation, and hence the vendor agnostic nature of the event.
> > It is up to the drivers to decide when they see the need for recovery
> > and how they want to recover from the available methods.
> >
> > Current implementation defines three recovery methods, out of which,
> > drivers can choose to support any one or multiple of them. Preferred
> > recovery method will be sent in the uevent environment as WEDGED=<method>.
> > Userspace consumers (sysadmin) can define udev rules to parse this event
> > and take respective action to recover the device.
> >
> > =============== ==================================
> > Recovery method Consumer expectations
> > =============== ==================================
> > rebind          unbind + rebind driver
> > bus-reset       unbind + reset bus device + rebind
> > reboot          reboot system
> > =============== ==================================
> >
> >
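
For illustration only, here is a minimal Python sketch of how a userspace
consumer might map the WEDGED value from the uevent environment to the
consumer expectations in the table above. The names RECOVERY_STEPS and
plan_recovery() are hypothetical and not part of this series; only the
WEDGED=<method> key and the three method names come from the patch
description. The sketch only describes the steps and performs none of them.

```python
import os

# Recovery methods and consumer expectations, copied from the table in the
# patch description. The dictionary name is hypothetical.
RECOVERY_STEPS = {
    "rebind": ["unbind driver", "rebind driver"],
    "bus-reset": ["unbind driver", "reset bus device", "rebind driver"],
    "reboot": ["reboot system"],
}


def plan_recovery(env):
    """Return the ordered recovery steps for a uevent environment mapping.

    `env` is any mapping of uevent keys to values (for example os.environ
    in a helper started by a udev rule). An event without WEDGED yields an
    empty plan; an unknown method raises ValueError instead of guessing.
    """
    method = env.get("WEDGED")
    if method is None:
        return []
    if method not in RECOVERY_STEPS:
        raise ValueError(f"unknown WEDGED recovery method: {method!r}")
    return list(RECOVERY_STEPS[method])


if __name__ == "__main__":
    # Print the plan for the current environment, one step per line.
    for step in plan_recovery(os.environ):
        print(step)
```

A udev rule could start such a helper, assuming the event properties are
exported to its environment. How the unbind, bus reset, or reboot is actually
carried out is left to the sysadmin.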
>
> I proposed something similar in the past: https://lore.kernel.org/dri-devel/20221125175203.52481-1-andrealmeid@igalia.com/
>
> The motivation was that amdgpu was getting stuck after every GPU reset, and
> there was just a black screen. The uevent would then trigger a daemon to
> reset the compositor and get things back together. As you can see in my
> thread, the feature was blocked in favor of getting better overall GPU reset
> from the kernel side.
>
> What kinds of scenarios make i915/xe need userspace involvement? I tested
> a bunch of resets in i915 but never managed to get the driver stuck.

Two scenarios:

1. Multiple levels of reset have failed and the device was declared wedged.
   This is rare indeed, as the resets have improved a lot.
2. Debug case. We can boot the driver with an option to declare the device
   wedged at any timeout, so the device can be debugged.

>
> For the bus-reset, amdgpu does that too, but it doesn't require userspace
> intervention.

How do you trigger that?