From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
dri-devel@lists.freedesktop.org,
"Simona Vetter" <simona.vetter@ffwll.ch>,
"Jani Nikula" <jani.nikula@intel.com>,
"Jouni Högander" <jouni.hogander@intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Michel Dänzer" <michel.daenzer@mailbox.org>
Subject: Re: [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence
Date: Mon, 13 Apr 2026 12:58:40 +0300 [thread overview]
Message-ID: <ady-UMqIQUqmKsLv@intel.com> (raw)
In-Reply-To: <b93c6c33-2d94-4da8-8e97-04c3bade2575@amd.com>
On Mon, Apr 13, 2026 at 11:35:23AM +0200, Christian König wrote:
> On 4/13/26 11:11, Ville Syrjälä wrote:
> >>>> I think something is missing in my picture how that is supposed to work.
> >>>
> >>> The problem stems from the fact that on old platforms a GPU reset
> >>> also resets the display hardware,
> >>
> >> Which is true for at least AMD GPUs and I think pretty much everybody else as well, but that wasn't so much of a problem so far.
> >>
> >>> and to do that safely we need:
> >>> 1. shut down display
> >>> 2. perform the GPU reset
> >>> 3. restore the display hardware to its orignal state
> >>
> >> Mhm, I've recently talked with Michel about it and we confirmed that this is perfectly possible without issues. Adding Michel as well.
> >>
> >>> We just do that with essentially with a normal atomic commit.
> >>
> >> I think that is the source of the problem.
> >>
> >> I'm not an expert on that topic but amdgpu and tons of other drivers seem to just use drm_atomic_helper_shutdown() for that.
> >
> > drm_atomic_helper_shutdown() is definitely not the thing to use
> > for this as it would clobber the stored kms state, leaving everything
> > permanently disabled. The drm_atomic_helper_commit_duplicated_state()
> > stuff i915 uses is the correct thing here.
> >
> > But for this problem it doesn't even matter which gets used. Either
> > would get equally stuck behind a previous atomic commit waiting for
> > its fences.
> >
> >>
> >> What is i915 doing differently?
> >
> > I see zero code for any display reset stuff in any other driver. If
> > amdgpu does anything it must be something completely custom, hidden
> > somewhere deep.
>
> The display is just fully reset by any MODE1 reset, you don't need to do anything special for that.
You can't just ignore the fact that there may be a display hardware
reprogramming already happening in parallel. Failing to follow the
correct programming sequence is a recipe for even hard system hangs.
>
> Restoring the display after the reset is either not an atomic commit at all or done by an async worker after the reset completed.
Everything is an atomic commit. No one is going to implement a
complete second modeset codepath just for resets.
>
> >>> But a
> >>> previous atomic commit may already be waiting for a fence, which
> >>> won't signal until the GPU reset happens, and the GPU reset is now
> >>> waiting for that previous atomic commit to finish so that it can do
> >>> its own atomic commit. In order to break the deadlock we need to
> >>> abort the fence waits in the atomic commit, and that's what this
> >>> "reset fence" achieves.
> >>
> >> As far as I can see that approach looks strongly like a no-go.
> >>
> >> You essentially have a lock inversion here and it is documented that it should *never* be resolved by a timeout, the approach you take now is not much better.
> >
> > Unless someone wants to add some kind of extra abort mechanism to
> > dma_fence_wait*() then I think this is probably the best solution.
> > And given this is only a thing for one driver on old hardware,
> > adding extra stuff to dma_fence_wait*() doesn't really seem worth
> > the hassle.
>
> Well adding something to dma_fence_wait() would be a no-go as well.
>
> The point is you should *NOT* wait for an atomic commit to finish from your GPU reset at all. Any GPU reset which waits on a DMA-fence in one way or another is a really big bug.
>
> That is documented and Simona even came up with tons of lockdep annotation to make sure that drivers comply to that.
>
> I'm deeply surprised that a driver like i915 actually tries something like that. And as DMA-buf maintainer I must bluntly NAK that since it goes against documented rules for implementing drivers.
>
> I mean what can be is that this isn't really solved with the atomic mode setting helpers, but since multiple drivers use it it shouldn't be anything special at all.
There is zero support in the atomic design for reordering commits,
which is pretty much what would be needed. A generic solution would
more or less involve a full redesign/rewrite of atomic.
--
Ville Syrjälä
Intel
next prev parent reply other threads:[~2026-04-13 9:58 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 23:34 [PATCH 0/6] drm/i915/reset: Solve display vs. GPU reset deadlock, again Ville Syrjala
2026-04-08 23:34 ` [PATCH 1/6] dma-buf: Remove old lies about dma_fence_wait_any_timeout() not accepting some fences Ville Syrjala
2026-04-09 8:09 ` Jani Nikula
2026-04-09 10:39 ` Christian König
2026-04-08 23:34 ` [PATCH 2/6] drm/i915/reset: Reorganize display reset code Ville Syrjala
2026-04-09 8:13 ` Jani Nikula
2026-04-08 23:34 ` [PATCH 3/6] drm/i915/reset: Move pending_fb_pin handling to i915 Ville Syrjala
2026-04-09 8:17 ` Jani Nikula
2026-04-08 23:34 ` [PATCH 4/6] drm/xe/display: Add init_clock_gating.h stubs Ville Syrjala
2026-04-09 8:19 ` Jani Nikula
2026-04-08 23:34 ` [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence Ville Syrjala
2026-04-09 10:37 ` Jani Nikula
2026-04-09 10:46 ` Christian König
2026-04-09 11:19 ` Ville Syrjälä
2026-04-09 12:17 ` Christian König
2026-04-13 9:11 ` Ville Syrjälä
2026-04-13 9:35 ` Christian König
2026-04-13 9:58 ` Ville Syrjälä [this message]
2026-04-13 10:08 ` Ville Syrjälä
2026-04-13 11:24 ` Christian König
2026-04-13 11:49 ` Ville Syrjälä
2026-04-08 23:34 ` [PATCH 6/6] drm/i915/display: Make fence timeout infinite Ville Syrjala
2026-04-09 10:51 ` Jani Nikula
2026-04-09 3:27 ` ✓ i915.CI.BAT: success for drm/i915/reset: Solve display vs. GPU reset deadlock, again Patchwork
2026-04-09 12:06 ` ✓ i915.CI.Full: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ady-UMqIQUqmKsLv@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=jani.nikula@intel.com \
--cc=jouni.hogander@intel.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=michel.daenzer@mailbox.org \
--cc=simona.vetter@ffwll.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox