public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org,
	"Simona Vetter" <simona.vetter@ffwll.ch>,
	"Jani Nikula" <jani.nikula@intel.com>,
	"Jouni Högander" <jouni.hogander@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence
Date: Thu, 9 Apr 2026 14:19:32 +0300	[thread overview]
Message-ID: <adeLRHQ2omAv93yM@intel.com> (raw)
In-Reply-To: <44fa373c-6216-4cc4-a605-94776b3873ad@amd.com>

On Thu, Apr 09, 2026 at 12:46:11PM +0200, Christian König wrote:
> On 4/9/26 01:34, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > The old display vs. GPU reset deadlock is back more or less.
> > The old (working) solution to the problem was originally
> > introduced in commit 9db529aac938 ("drm/i915: More surgically
> > unbreak the modeset vs reset deadlock"), but it got nuked with
> > commit d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces
> > instead of i915_sw_fence").
> > 
> > Apparently no one looked hard enough to see that things didn't
> > work quite properly anymore. What is still saving us for the most
> > part is that we have a timeout on the fence wait
> > (CONFIG_DRM_I915_FENCE_TIMEOUT, 10 seconds by default). But
> > people are perhaps trying to get rid of that so we may need
> > another solution, and 10 seconds is a bit slow.
> 
> Yeah agree that approach with the timeout is usually a big no no.
> 
> > Re-solve the problem yet again with a custom dma-fence that gets
> > signaled just prior to a GPU reset, and have the atomic commit wait
> > for either that or the real fence using dma_fence_wait_any_timeout().
> 
> Hui? I don't fully understand what the source of the problem is, but of hand that approach of solving it doesn't sound like a good idea either.
> 
> > Whichever signals first will let the commit proceed. We create a new
> > "reset fence" whenever someone needs one, and keep it until the next
> > GPU reset has completed. After that the next guy will again get a
> > fresh unsignaled "reset fence".
> 
> And that sounds even worse. A dma_fence which waits for the next GPU reset without triggering it itself would be an indefinite dma_fence which is not allowed.

This is a purely internal thing to the i915 display code. This fence
is never shared with anyone. And we never wait for just this fence,
the wait always happens alongside a real fence using
dma_fence_wait_any_timeout().

> 
> I think something is missing in my picture how that is supposed to work.

The problem stems from the fact that on old platforms a GPU reset
also resets the display hardware, and to do that safely we need:
1. shut down display
2. perform the GPU reset
3. restore the display hardware to its orignal state

We just do that with essentially with a normal atomic commit. But a 
previous atomic commit may already be waiting for a fence, which
won't signal until the GPU reset happens, and the GPU reset is now
waiting for that previous atomic commit to finish so that it can do
its own atomic commit. In order to break the deadlock we need to
abort the fence waits in the atomic commit, and that's what this
"reset fence" achieves.

> 
> Thanks,
> Christian.
> 
> > 
> > Cc: Simona Vetter <simona.vetter@ffwll.ch>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Jani Nikula <jani.nikula@intel.com>
> > Cc: Jouni Högander <jouni.hogander@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c  | 34 +++++---
> >  .../gpu/drm/i915/display/intel_display_core.h |  6 ++
> >  .../drm/i915/display/intel_display_driver.c   |  5 ++
> >  .../drm/i915/display/intel_display_reset.c    | 77 +++++++++++++++++++
> >  .../drm/i915/display/intel_display_reset.h    |  4 +
> >  drivers/gpu/drm/xe/Makefile                   |  1 +
> >  6 files changed, 117 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index 58a654ca0d20..83ccf13c4b16 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -72,6 +72,7 @@
> >  #include "intel_display_driver.h"
> >  #include "intel_display_power.h"
> >  #include "intel_display_regs.h"
> > +#include "intel_display_reset.h"
> >  #include "intel_display_rpm.h"
> >  #include "intel_display_types.h"
> >  #include "intel_display_utils.h"
> > @@ -7149,22 +7150,35 @@ static void skl_commit_modeset_enables(struct intel_atomic_state *state)
> >  
> >  static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state)
> >  {
> > -	struct drm_plane *plane;
> > +	struct intel_display *display = to_intel_display(intel_state);
> >  	struct drm_plane_state *new_plane_state;
> > -	long ret;
> > +	struct dma_fence *reset_fence;
> > +	struct drm_plane *plane;
> >  	int i;
> >  
> > +	reset_fence = intel_display_reset_fence_get(display);
> > +
> >  	for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, i) {
> > -		if (new_plane_state->fence) {
> > -			ret = dma_fence_wait_timeout(new_plane_state->fence, false,
> > -						     i915_fence_timeout());
> > -			if (ret <= 0)
> > -				break;
> > +		struct dma_fence *fences[2] = {
> > +			[0] = new_plane_state->fence,
> > +			[1] = reset_fence,
> > +		};
> > +		long ret;
> >  
> > -			dma_fence_put(new_plane_state->fence);
> > -			new_plane_state->fence = NULL;
> > -		}
> > +		if (!new_plane_state->fence)
> > +			continue;
> > +
> > +		ret = dma_fence_wait_any_timeout(fences, reset_fence ? 2 : 1, false,
> > +						 i915_fence_timeout(), NULL);
> > +		if (ret <= 0)
> > +			break;
> > +
> > +		dma_fence_put(new_plane_state->fence);
> > +		new_plane_state->fence = NULL;
> >  	}
> > +
> > +	if (reset_fence)
> > +		dma_fence_put(reset_fence);
> >  }
> >  
> >  static void intel_atomic_dsb_wait_commit(struct intel_crtc_state *crtc_state)
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_core.h b/drivers/gpu/drm/i915/display/intel_display_core.h
> > index 9e77003addd0..6687b658c51d 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_core.h
> > +++ b/drivers/gpu/drm/i915/display/intel_display_core.h
> > @@ -556,6 +556,12 @@ struct intel_display {
> >  		unsigned long mask;
> >  	} quirks;
> >  
> > +	struct {
> > +		/* protects reset.fence */
> > +		struct mutex mutex;
> > +		struct dma_fence *fence;
> > +	} reset;
> > +
> >  	struct {
> >  		/* restore state for suspend/resume and display reset */
> >  		struct drm_atomic_state *modeset_state;
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c
> > index 23bfecc983e8..fcd31722c731 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_driver.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
> > @@ -34,6 +34,7 @@
> >  #include "intel_display_driver.h"
> >  #include "intel_display_irq.h"
> >  #include "intel_display_power.h"
> > +#include "intel_display_reset.h"
> >  #include "intel_display_types.h"
> >  #include "intel_display_utils.h"
> >  #include "intel_display_wa.h"
> > @@ -257,6 +258,8 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
> >  
> >  	intel_mode_config_init(display);
> >  
> > +	intel_display_reset_fence_init(display);
> > +
> >  	ret = intel_cdclk_init(display);
> >  	if (ret)
> >  		goto cleanup_wq_unordered;
> > @@ -584,6 +587,8 @@ void intel_display_driver_remove(struct intel_display *display)
> >  	if (!HAS_DISPLAY(display))
> >  		return;
> >  
> > +	intel_display_reset_fence_discard(display);
> > +
> >  	flush_workqueue(display->wq.flip);
> >  	flush_workqueue(display->wq.modeset);
> >  	flush_workqueue(display->wq.cleanup);
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.c b/drivers/gpu/drm/i915/display/intel_display_reset.c
> > index ca15dc18ef0f..80dd2ea8a0c2 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_reset.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_reset.c
> > @@ -3,6 +3,8 @@
> >   * Copyright © 2023 Intel Corporation
> >   */
> >  
> > +#include <linux/dma-fence.h>
> > +
> >  #include <drm/drm_atomic_helper.h>
> >  #include <drm/drm_print.h>
> >  
> > @@ -16,6 +18,72 @@
> >  #include "intel_hotplug.h"
> >  #include "intel_pps.h"
> >  
> > +static const char *intel_display_reset_fence_get_driver_name(struct dma_fence *fence)
> > +{
> > +	return "intel_display";
> > +}
> > +
> > +static const char *intel_display_reset_fence_get_timeline_name(struct dma_fence *fence)
> > +{
> > +	return "reset";
> > +}
> > +
> > +static const struct dma_fence_ops intel_display_reset_fence_ops = {
> > +	.get_driver_name = intel_display_reset_fence_get_driver_name,
> > +	.get_timeline_name = intel_display_reset_fence_get_timeline_name,
> > +};
> > +
> > +static void intel_display_reset_create(struct intel_display *display)
> > +{
> > +	struct dma_fence *fence;
> > +
> > +	fence = kzalloc_obj(*fence);
> > +	if (!fence)
> > +		return;
> > +
> > +	dma_fence_init(fence, &intel_display_reset_fence_ops, NULL, 0, 0);
> > +
> > +	display->reset.fence = fence;
> > +}
> > +
> > +struct dma_fence *intel_display_reset_fence_get(struct intel_display *display)
> > +{
> > +	struct dma_fence *fence;
> > +
> > +	mutex_lock(&display->reset.mutex);
> > +
> > +	if (!display->reset.fence)
> > +		intel_display_reset_create(display);
> > +
> > +	fence = display->reset.fence;
> > +	if (fence)
> > +		dma_fence_get(fence);
> > +
> > +	mutex_unlock(&display->reset.mutex);
> > +
> > +	return fence;
> > +}
> > +
> > +void intel_display_reset_fence_discard(struct intel_display *display)
> > +{
> > +	struct dma_fence *fence;
> > +
> > +	mutex_lock(&display->reset.mutex);
> > +
> > +	fence = display->reset.fence;
> > +	if (fence)
> > +		dma_fence_put(fence);
> > +
> > +	display->reset.fence = NULL;
> > +
> > +	mutex_unlock(&display->reset.mutex);
> > +}
> > +
> > +void intel_display_reset_fence_init(struct intel_display *display)
> > +{
> > +	mutex_init(&display->reset.mutex);
> > +}
> > +
> >  bool intel_display_reset_supported(struct intel_display *display)
> >  {
> >  	return HAS_DISPLAY(display);
> > @@ -31,8 +99,15 @@ void intel_display_reset_prepare(struct intel_display *display)
> >  {
> >  	struct drm_modeset_acquire_ctx *ctx = &display->restore.reset_ctx;
> >  	struct drm_atomic_state *state;
> > +	struct dma_fence *reset_fence;
> >  	int ret;
> >  
> > +	reset_fence = intel_display_reset_fence_get(display);
> > +	if (reset_fence) {
> > +		dma_fence_signal(reset_fence);
> > +		dma_fence_put(reset_fence);
> > +	}
> > +
> >  	/*
> >  	 * Need mode_config.mutex so that we don't
> >  	 * trample ongoing ->detect() and whatnot.
> > @@ -110,6 +185,8 @@ void intel_display_reset_finish(struct intel_display *display, bool test_only)
> >  
> >  	drm_atomic_state_put(state);
> >  unlock:
> > +	intel_display_reset_fence_discard(display);
> > +
> >  	drm_modeset_drop_locks(ctx);
> >  	drm_modeset_acquire_fini(ctx);
> >  	mutex_unlock(&display->drm->mode_config.mutex);
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.h b/drivers/gpu/drm/i915/display/intel_display_reset.h
> > index a8aa7729d33f..c36a075c6b4d 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_reset.h
> > +++ b/drivers/gpu/drm/i915/display/intel_display_reset.h
> > @@ -10,6 +10,10 @@
> >  
> >  struct intel_display;
> >  
> > +struct dma_fence *intel_display_reset_fence_get(struct intel_display *display);
> > +void intel_display_reset_fence_discard(struct intel_display *display);
> > +void intel_display_reset_fence_init(struct intel_display *display);
> > +
> >  bool intel_display_reset_supported(struct intel_display *display);
> >  bool intel_display_reset_test(struct intel_display *display);
> >  void intel_display_reset_prepare(struct intel_display *display);
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 110fef511fe2..1a85dfe457f0 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -262,6 +262,7 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
> >  	i915-display/intel_display_power.o \
> >  	i915-display/intel_display_power_map.o \
> >  	i915-display/intel_display_power_well.o \
> > +	i915-display/intel_display_reset.o \
> >  	i915-display/intel_display_rpm.o \
> >  	i915-display/intel_display_rps.o \
> >  	i915-display/intel_display_trace.o \

-- 
Ville Syrjälä
Intel

  reply	other threads:[~2026-04-09 11:19 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08 23:34 [PATCH 0/6] drm/i915/reset: Solve display vs. GPU reset deadlock, again Ville Syrjala
2026-04-08 23:34 ` [PATCH 1/6] dma-buf: Remove old lies about dma_fence_wait_any_timeout() not accepting some fences Ville Syrjala
2026-04-09  8:09   ` Jani Nikula
2026-04-09 10:39   ` Christian König
2026-04-08 23:34 ` [PATCH 2/6] drm/i915/reset: Reorganize display reset code Ville Syrjala
2026-04-09  8:13   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 3/6] drm/i915/reset: Move pending_fb_pin handling to i915 Ville Syrjala
2026-04-09  8:17   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 4/6] drm/xe/display: Add init_clock_gating.h stubs Ville Syrjala
2026-04-09  8:19   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence Ville Syrjala
2026-04-09 10:37   ` Jani Nikula
2026-04-09 10:46   ` Christian König
2026-04-09 11:19     ` Ville Syrjälä [this message]
2026-04-09 12:17       ` Christian König
2026-04-08 23:34 ` [PATCH 6/6] drm/i915/display: Make fence timeout infinite Ville Syrjala
2026-04-09 10:51   ` Jani Nikula
2026-04-08 23:42 ` ✗ CI.KUnit: failure for drm/i915/reset: Solve display vs. GPU reset deadlock, again Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adeLRHQ2omAv93yM@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jani.nikula@intel.com \
    --cc=jouni.hogander@intel.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=simona.vetter@ffwll.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox