public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org,
	"Simona Vetter" <simona.vetter@ffwll.ch>,
	"Jani Nikula" <jani.nikula@intel.com>,
	"Jouni Högander" <jouni.hogander@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Michel Dänzer" <michel.daenzer@mailbox.org>
Subject: Re: [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence
Date: Mon, 13 Apr 2026 14:49:54 +0300	[thread overview]
Message-ID: <adzYYqOkSfY9XP_m@intel.com> (raw)
In-Reply-To: <855c4188-3701-468f-bd78-8292d6143645@amd.com>

On Mon, Apr 13, 2026 at 01:24:04PM +0200, Christian König wrote:
> On 4/13/26 11:58, Ville Syrjälä wrote:
> > On Mon, Apr 13, 2026 at 11:35:23AM +0200, Christian König wrote:
> >> On 4/13/26 11:11, Ville Syrjälä wrote:
> >>>>>> I think something is missing in my picture how that is supposed to work.
> >>>>>
> >>>>> The problem stems from the fact that on old platforms a GPU reset
> >>>>> also resets the display hardware,
> >>>>
> >>>> Which is true for at least AMD GPUs and I think pretty much everybody else as well, but that wasn't so much of a problem so far.
> >>>>
> >>>>> and to do that safely we need:
> >>>>> 1. shut down display
> >>>>> 2. perform the GPU reset
> >>>>> 3. restore the display hardware to its orignal state
> >>>>
> >>>> Mhm, I've recently talked with Michel about it and we confirmed that this is perfectly possible without issues. Adding Michel as well.
> >>>>
> >>>>> We just do that with essentially with a normal atomic commit.
> >>>>
> >>>> I think that is the source of the problem.
> >>>>
> >>>> I'm not an expert on that topic but amdgpu and tons of other drivers seem to just use drm_atomic_helper_shutdown() for that.
> >>>
> >>> drm_atomic_helper_shutdown() is definitely not the thing to use
> >>> for this as it would clobber the stored kms state, leaving everything
> >>> permanently disabled. The drm_atomic_helper_commit_duplicated_state()
> >>> stuff i915 uses is the correct thing here.
> >>>
> >>> But for this problem it doesn't even matter which gets used. Either
> >>> would get equally stuck behind a previous atomic commit waiting for
> >>> its fences.
> >>>
> >>>>
> >>>> What is i915 doing differently?
> >>>
> >>> I see zero code for any display reset stuff in any other driver. If
> >>> amdgpu does anything it must be something completely custom, hidden
> >>> somewhere deep.
> >>
> >> The display is just fully reset by any MODE1 reset, you don't need to do anything special for that.
> > 
> > You can't just ignore the fact that there may be a display hardware
> > reprogramming already happening in parallel. Failing to follow the
> > correct programming sequence is a recipe for even hard system hangs.
> 
> Of course not. We have a RW lock to prevent concurrent HW access while a GPU reset is ongoing to prevent that.

Now that you mention it, I remember that long ago I did something
very similar for i915, but it required quite a bit of surgery to
the atomic core code and Sima didn't want it so it never went in.

Our modeset state is much more complex these days, so attempting
to resurrect it now (and actually getting people to accept it)
just for old hardware doesn't seem worth the effort.

I think apart from the reset fence, the only other *practical* solution
is to effectively revert Sima's commit 9db529aac938 ("drm/i915: More
surgically unbreak the modeset vs reset deadlock") and go back to the
full wedge (== cancel all in flight requests). It's a bit rude since
it will also penalize completely innocent contexts. But meh.

> > There is zero support in the atomic design for reordering commits,
> > which is pretty much what would be needed. A generic solution would
> > more or less involve a full redesign/rewrite of atomic.
> 
> Mhm, I need to ask our DC team and other contacts how that is handled at the moment for amdgpu.
> 
> But we clearly have the same problem and it doesn't require any hacks like that one here. So there must be a solution for it already.

The hacks must be on the modeset side, since as mentioned atomic has
zero support for this at the moment.

-- 
Ville Syrjälä
Intel

  reply	other threads:[~2026-04-13 11:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08 23:34 [PATCH 0/6] drm/i915/reset: Solve display vs. GPU reset deadlock, again Ville Syrjala
2026-04-08 23:34 ` [PATCH 1/6] dma-buf: Remove old lies about dma_fence_wait_any_timeout() not accepting some fences Ville Syrjala
2026-04-09  8:09   ` Jani Nikula
2026-04-09 10:39   ` Christian König
2026-04-08 23:34 ` [PATCH 2/6] drm/i915/reset: Reorganize display reset code Ville Syrjala
2026-04-09  8:13   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 3/6] drm/i915/reset: Move pending_fb_pin handling to i915 Ville Syrjala
2026-04-09  8:17   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 4/6] drm/xe/display: Add init_clock_gating.h stubs Ville Syrjala
2026-04-09  8:19   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence Ville Syrjala
2026-04-09 10:37   ` Jani Nikula
2026-04-09 10:46   ` Christian König
2026-04-09 11:19     ` Ville Syrjälä
2026-04-09 12:17       ` Christian König
2026-04-13  9:11         ` Ville Syrjälä
2026-04-13  9:35           ` Christian König
2026-04-13  9:58             ` Ville Syrjälä
2026-04-13 10:08               ` Ville Syrjälä
2026-04-13 11:24               ` Christian König
2026-04-13 11:49                 ` Ville Syrjälä [this message]
2026-04-08 23:34 ` [PATCH 6/6] drm/i915/display: Make fence timeout infinite Ville Syrjala
2026-04-09 10:51   ` Jani Nikula
2026-04-09  3:27 ` ✓ i915.CI.BAT: success for drm/i915/reset: Solve display vs. GPU reset deadlock, again Patchwork
2026-04-09 12:06 ` ✓ i915.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adzYYqOkSfY9XP_m@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jani.nikula@intel.com \
    --cc=jouni.hogander@intel.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=michel.daenzer@mailbox.org \
    --cc=simona.vetter@ffwll.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox