public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Jani Nikula <jani.nikula@intel.com>
To: Ville Syrjala <ville.syrjala@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Simona Vetter" <simona.vetter@ffwll.ch>,
	"Christian König" <christian.koenig@amd.com>,
	"Jouni Högander" <jouni.hogander@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>
Subject: Re: [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence
Date: Thu, 09 Apr 2026 13:37:56 +0300	[thread overview]
Message-ID: <b914af7aed6d2d19fbb12e7a4f06b705a7f597d1@intel.com> (raw)
In-Reply-To: <20260408233458.22666-6-ville.syrjala@linux.intel.com>

On Thu, 09 Apr 2026, Ville Syrjala <ville.syrjala@linux.intel.com> wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> The old display vs. GPU reset deadlock is back more or less.
> The old (working) solution to the problem was originally
> introduced in commit 9db529aac938 ("drm/i915: More surgically
> unbreak the modeset vs reset deadlock"), but it got nuked with
> commit d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces
> instead of i915_sw_fence").
>
> Apparently no one looked hard enough to see that things didn't
> work quite properly anymore. What is still saving us for the most
> part is that we have a timeout on the fence wait
> (CONFIG_DRM_I915_FENCE_TIMEOUT, 10 seconds by default). But
> people are perhaps trying to get rid of that so we may need
> another solution, and 10 seconds is a bit slow.
>
> Re-solve the problem yet again with a custom dma-fence that gets
> signaled just prior to a GPU reset, and have the atomic commit wait
> for either that or the real fence using dma_fence_wait_any_timeout().
> Whichever signals first will let the commit proceed. We create a new
> "reset fence" whenever someone needs one, and keep it until the next
> GPU reset has completed. After that the next guy will again get a
> fresh unsignaled "reset fence".
>
> Cc: Simona Vetter <simona.vetter@ffwll.ch>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Jouni Högander <jouni.hogander@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

This all makes sense to me, but I'd like to solicit addition review from
Simona, Christian and/or Maarten.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>

> ---
>  drivers/gpu/drm/i915/display/intel_display.c  | 34 +++++---
>  .../gpu/drm/i915/display/intel_display_core.h |  6 ++
>  .../drm/i915/display/intel_display_driver.c   |  5 ++
>  .../drm/i915/display/intel_display_reset.c    | 77 +++++++++++++++++++
>  .../drm/i915/display/intel_display_reset.h    |  4 +
>  drivers/gpu/drm/xe/Makefile                   |  1 +
>  6 files changed, 117 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 58a654ca0d20..83ccf13c4b16 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -72,6 +72,7 @@
>  #include "intel_display_driver.h"
>  #include "intel_display_power.h"
>  #include "intel_display_regs.h"
> +#include "intel_display_reset.h"
>  #include "intel_display_rpm.h"
>  #include "intel_display_types.h"
>  #include "intel_display_utils.h"
> @@ -7149,22 +7150,35 @@ static void skl_commit_modeset_enables(struct intel_atomic_state *state)
>  
>  static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state)
>  {
> -	struct drm_plane *plane;
> +	struct intel_display *display = to_intel_display(intel_state);
>  	struct drm_plane_state *new_plane_state;
> -	long ret;
> +	struct dma_fence *reset_fence;
> +	struct drm_plane *plane;
>  	int i;
>  
> +	reset_fence = intel_display_reset_fence_get(display);
> +
>  	for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, i) {
> -		if (new_plane_state->fence) {
> -			ret = dma_fence_wait_timeout(new_plane_state->fence, false,
> -						     i915_fence_timeout());
> -			if (ret <= 0)
> -				break;
> +		struct dma_fence *fences[2] = {
> +			[0] = new_plane_state->fence,
> +			[1] = reset_fence,
> +		};
> +		long ret;
>  
> -			dma_fence_put(new_plane_state->fence);
> -			new_plane_state->fence = NULL;
> -		}
> +		if (!new_plane_state->fence)
> +			continue;
> +
> +		ret = dma_fence_wait_any_timeout(fences, reset_fence ? 2 : 1, false,
> +						 i915_fence_timeout(), NULL);
> +		if (ret <= 0)
> +			break;
> +
> +		dma_fence_put(new_plane_state->fence);
> +		new_plane_state->fence = NULL;
>  	}
> +
> +	if (reset_fence)
> +		dma_fence_put(reset_fence);
>  }
>  
>  static void intel_atomic_dsb_wait_commit(struct intel_crtc_state *crtc_state)
> diff --git a/drivers/gpu/drm/i915/display/intel_display_core.h b/drivers/gpu/drm/i915/display/intel_display_core.h
> index 9e77003addd0..6687b658c51d 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_core.h
> +++ b/drivers/gpu/drm/i915/display/intel_display_core.h
> @@ -556,6 +556,12 @@ struct intel_display {
>  		unsigned long mask;
>  	} quirks;
>  
> +	struct {
> +		/* protects reset.fence */
> +		struct mutex mutex;
> +		struct dma_fence *fence;
> +	} reset;
> +
>  	struct {
>  		/* restore state for suspend/resume and display reset */
>  		struct drm_atomic_state *modeset_state;
> diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c
> index 23bfecc983e8..fcd31722c731 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_driver.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_driver.c
> @@ -34,6 +34,7 @@
>  #include "intel_display_driver.h"
>  #include "intel_display_irq.h"
>  #include "intel_display_power.h"
> +#include "intel_display_reset.h"
>  #include "intel_display_types.h"
>  #include "intel_display_utils.h"
>  #include "intel_display_wa.h"
> @@ -257,6 +258,8 @@ int intel_display_driver_probe_noirq(struct intel_display *display)
>  
>  	intel_mode_config_init(display);
>  
> +	intel_display_reset_fence_init(display);
> +
>  	ret = intel_cdclk_init(display);
>  	if (ret)
>  		goto cleanup_wq_unordered;
> @@ -584,6 +587,8 @@ void intel_display_driver_remove(struct intel_display *display)
>  	if (!HAS_DISPLAY(display))
>  		return;
>  
> +	intel_display_reset_fence_discard(display);
> +
>  	flush_workqueue(display->wq.flip);
>  	flush_workqueue(display->wq.modeset);
>  	flush_workqueue(display->wq.cleanup);
> diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.c b/drivers/gpu/drm/i915/display/intel_display_reset.c
> index ca15dc18ef0f..80dd2ea8a0c2 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_reset.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_reset.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2023 Intel Corporation
>   */
>  
> +#include <linux/dma-fence.h>
> +
>  #include <drm/drm_atomic_helper.h>
>  #include <drm/drm_print.h>
>  
> @@ -16,6 +18,72 @@
>  #include "intel_hotplug.h"
>  #include "intel_pps.h"
>  
> +static const char *intel_display_reset_fence_get_driver_name(struct dma_fence *fence)
> +{
> +	return "intel_display";
> +}
> +
> +static const char *intel_display_reset_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +	return "reset";
> +}
> +
> +static const struct dma_fence_ops intel_display_reset_fence_ops = {
> +	.get_driver_name = intel_display_reset_fence_get_driver_name,
> +	.get_timeline_name = intel_display_reset_fence_get_timeline_name,
> +};
> +
> +static void intel_display_reset_create(struct intel_display *display)
> +{
> +	struct dma_fence *fence;
> +
> +	fence = kzalloc_obj(*fence);
> +	if (!fence)
> +		return;
> +
> +	dma_fence_init(fence, &intel_display_reset_fence_ops, NULL, 0, 0);
> +
> +	display->reset.fence = fence;
> +}
> +
> +struct dma_fence *intel_display_reset_fence_get(struct intel_display *display)
> +{
> +	struct dma_fence *fence;
> +
> +	mutex_lock(&display->reset.mutex);
> +
> +	if (!display->reset.fence)
> +		intel_display_reset_create(display);
> +
> +	fence = display->reset.fence;
> +	if (fence)
> +		dma_fence_get(fence);
> +
> +	mutex_unlock(&display->reset.mutex);
> +
> +	return fence;
> +}
> +
> +void intel_display_reset_fence_discard(struct intel_display *display)
> +{
> +	struct dma_fence *fence;
> +
> +	mutex_lock(&display->reset.mutex);
> +
> +	fence = display->reset.fence;
> +	if (fence)
> +		dma_fence_put(fence);
> +
> +	display->reset.fence = NULL;
> +
> +	mutex_unlock(&display->reset.mutex);
> +}
> +
> +void intel_display_reset_fence_init(struct intel_display *display)
> +{
> +	mutex_init(&display->reset.mutex);
> +}
> +
>  bool intel_display_reset_supported(struct intel_display *display)
>  {
>  	return HAS_DISPLAY(display);
> @@ -31,8 +99,15 @@ void intel_display_reset_prepare(struct intel_display *display)
>  {
>  	struct drm_modeset_acquire_ctx *ctx = &display->restore.reset_ctx;
>  	struct drm_atomic_state *state;
> +	struct dma_fence *reset_fence;
>  	int ret;
>  
> +	reset_fence = intel_display_reset_fence_get(display);
> +	if (reset_fence) {
> +		dma_fence_signal(reset_fence);
> +		dma_fence_put(reset_fence);
> +	}
> +
>  	/*
>  	 * Need mode_config.mutex so that we don't
>  	 * trample ongoing ->detect() and whatnot.
> @@ -110,6 +185,8 @@ void intel_display_reset_finish(struct intel_display *display, bool test_only)
>  
>  	drm_atomic_state_put(state);
>  unlock:
> +	intel_display_reset_fence_discard(display);
> +
>  	drm_modeset_drop_locks(ctx);
>  	drm_modeset_acquire_fini(ctx);
>  	mutex_unlock(&display->drm->mode_config.mutex);
> diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.h b/drivers/gpu/drm/i915/display/intel_display_reset.h
> index a8aa7729d33f..c36a075c6b4d 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_reset.h
> +++ b/drivers/gpu/drm/i915/display/intel_display_reset.h
> @@ -10,6 +10,10 @@
>  
>  struct intel_display;
>  
> +struct dma_fence *intel_display_reset_fence_get(struct intel_display *display);
> +void intel_display_reset_fence_discard(struct intel_display *display);
> +void intel_display_reset_fence_init(struct intel_display *display);
> +
>  bool intel_display_reset_supported(struct intel_display *display);
>  bool intel_display_reset_test(struct intel_display *display);
>  void intel_display_reset_prepare(struct intel_display *display);
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 110fef511fe2..1a85dfe457f0 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -262,6 +262,7 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
>  	i915-display/intel_display_power.o \
>  	i915-display/intel_display_power_map.o \
>  	i915-display/intel_display_power_well.o \
> +	i915-display/intel_display_reset.o \
>  	i915-display/intel_display_rpm.o \
>  	i915-display/intel_display_rps.o \
>  	i915-display/intel_display_trace.o \

-- 
Jani Nikula, Intel

  reply	other threads:[~2026-04-09 10:38 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08 23:34 [PATCH 0/6] drm/i915/reset: Solve display vs. GPU reset deadlock, again Ville Syrjala
2026-04-08 23:34 ` [PATCH 1/6] dma-buf: Remove old lies about dma_fence_wait_any_timeout() not accepting some fences Ville Syrjala
2026-04-09  8:09   ` Jani Nikula
2026-04-09 10:39   ` Christian König
2026-04-08 23:34 ` [PATCH 2/6] drm/i915/reset: Reorganize display reset code Ville Syrjala
2026-04-09  8:13   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 3/6] drm/i915/reset: Move pending_fb_pin handling to i915 Ville Syrjala
2026-04-09  8:17   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 4/6] drm/xe/display: Add init_clock_gating.h stubs Ville Syrjala
2026-04-09  8:19   ` Jani Nikula
2026-04-08 23:34 ` [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence Ville Syrjala
2026-04-09 10:37   ` Jani Nikula [this message]
2026-04-09 10:46   ` Christian König
2026-04-09 11:19     ` Ville Syrjälä
2026-04-09 12:17       ` Christian König
2026-04-08 23:34 ` [PATCH 6/6] drm/i915/display: Make fence timeout infinite Ville Syrjala
2026-04-09 10:51   ` Jani Nikula
2026-04-08 23:42 ` ✗ CI.KUnit: failure for drm/i915/reset: Solve display vs. GPU reset deadlock, again Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b914af7aed6d2d19fbb12e7a4f06b705a7f597d1@intel.com \
    --to=jani.nikula@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jouni.hogander@intel.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=simona.vetter@ffwll.ch \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox