From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3B9B1EA3C48 for ; Thu, 9 Apr 2026 10:38:06 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 61B0910E78D; Thu, 9 Apr 2026 10:38:04 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="n2Uv4BRW"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) by gabe.freedesktop.org (Postfix) with ESMTPS id B874710E78D; Thu, 9 Apr 2026 10:38:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1775731083; x=1807267083; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version:content-transfer-encoding; bh=pUToKwWuzW4+i/wcO34uwyMMNZGsv1/eXrXOmyB6gnA=; b=n2Uv4BRWyqw9rHMrjhT4Z8GzI4nW2hsITUBTuVyW4ifSKM81E44scda3 QDAuAtnLG9ntKnKgogAoczEcy+z0hTRBS+8tPFlBHiffJk51+EQMspwx9 +jykJetmyjBbXg0APmymbzzCEwVD9qNrA6jnafLh0lg7QjqDT+KhKKiq2 rUwprDebko/yN5Ld8+yFEkEGh02EfSdvZJ8bvaO5dN+KHUyfX3xQxP7ow OHyng3J2LucNYJCx04s20aAeO3hZWwVF/+WwEPen8E8HhtdkYre7fYSUn bdWfK8E7tcNeBT6LDnSFOUnFmVBWQC9NLRznFDeak5eJk+EM0aNNLqGPw Q==; X-CSE-ConnectionGUID: YZ2Z8jCVQKOhObFKySW8zg== X-CSE-MsgGUID: eZuPV7eaSQ+7x2mxfkTamA== X-IronPort-AV: E=McAfee;i="6800,10657,11753"; a="75768000" X-IronPort-AV: E=Sophos;i="6.23,169,1770624000"; d="scan'208";a="75768000" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Apr 2026 03:38:02 -0700 X-CSE-ConnectionGUID: Ni6X+H0mTCaVhV+54A6hzw== X-CSE-MsgGUID: IWHqL3UGRvGgQpU2CnNZ1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,169,1770624000"; d="scan'208";a="259181229" Received: from ettammin-mobl3.ger.corp.intel.com (HELO localhost) ([10.245.246.92]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Apr 2026 03:37:59 -0700 From: Jani Nikula To: Ville Syrjala , intel-gfx@lists.freedesktop.org Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Simona Vetter , Christian =?utf-8?Q?K=C3=B6nig?= , Jouni =?utf-8?Q?H=C3=B6gander?= , Maarten Lankhorst Subject: Re: [PATCH 5/6] drm/i915/reset: Handle the display vs. GPU reset deadlock using a custom dma-fence In-Reply-To: <20260408233458.22666-6-ville.syrjala@linux.intel.com> Organization: Intel Finland Oy - BIC 0357606-4 - c/o Alberga Business Park, 6 krs Bertel Jungin Aukio 5, 02600 Espoo, Finland References: <20260408233458.22666-1-ville.syrjala@linux.intel.com> <20260408233458.22666-6-ville.syrjala@linux.intel.com> Date: Thu, 09 Apr 2026 13:37:56 +0300 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Thu, 09 Apr 2026, Ville Syrjala wrote: > From: Ville Syrj=C3=A4l=C3=A4 > > The old display vs. GPU reset deadlock is back more or less. > The old (working) solution to the problem was originally > introduced in commit 9db529aac938 ("drm/i915: More surgically > unbreak the modeset vs reset deadlock"), but it got nuked with > commit d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces > instead of i915_sw_fence"). > > Apparently no one looked hard enough to see that things didn't > work quite properly anymore. What is still saving us for the most > part is that we have a timeout on the fence wait > (CONFIG_DRM_I915_FENCE_TIMEOUT, 10 seconds by default). But > people are perhaps trying to get rid of that so we may need > another solution, and 10 seconds is a bit slow. > > Re-solve the problem yet again with a custom dma-fence that gets > signaled just prior to a GPU reset, and have the atomic commit wait > for either that or the real fence using dma_fence_wait_any_timeout(). > Whichever signals first will let the commit proceed. We create a new > "reset fence" whenever someone needs one, and keep it until the next > GPU reset has completed. After that the next guy will again get a > fresh unsignaled "reset fence". > > Cc: Simona Vetter > Cc: Christian K=C3=B6nig > Cc: Jani Nikula > Cc: Jouni H=C3=B6gander > Cc: Maarten Lankhorst > Signed-off-by: Ville Syrj=C3=A4l=C3=A4 This all makes sense to me, but I'd like to solicit addition review from Simona, Christian and/or Maarten. Reviewed-by: Jani Nikula > --- > drivers/gpu/drm/i915/display/intel_display.c | 34 +++++--- > .../gpu/drm/i915/display/intel_display_core.h | 6 ++ > .../drm/i915/display/intel_display_driver.c | 5 ++ > .../drm/i915/display/intel_display_reset.c | 77 +++++++++++++++++++ > .../drm/i915/display/intel_display_reset.h | 4 + > drivers/gpu/drm/xe/Makefile | 1 + > 6 files changed, 117 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/d= rm/i915/display/intel_display.c > index 58a654ca0d20..83ccf13c4b16 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -72,6 +72,7 @@ > #include "intel_display_driver.h" > #include "intel_display_power.h" > #include "intel_display_regs.h" > +#include "intel_display_reset.h" > #include "intel_display_rpm.h" > #include "intel_display_types.h" > #include "intel_display_utils.h" > @@ -7149,22 +7150,35 @@ static void skl_commit_modeset_enables(struct int= el_atomic_state *state) >=20=20 > static void intel_atomic_commit_fence_wait(struct intel_atomic_state *in= tel_state) > { > - struct drm_plane *plane; > + struct intel_display *display =3D to_intel_display(intel_state); > struct drm_plane_state *new_plane_state; > - long ret; > + struct dma_fence *reset_fence; > + struct drm_plane *plane; > int i; >=20=20 > + reset_fence =3D intel_display_reset_fence_get(display); > + > for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state,= i) { > - if (new_plane_state->fence) { > - ret =3D dma_fence_wait_timeout(new_plane_state->fence, false, > - i915_fence_timeout()); > - if (ret <=3D 0) > - break; > + struct dma_fence *fences[2] =3D { > + [0] =3D new_plane_state->fence, > + [1] =3D reset_fence, > + }; > + long ret; >=20=20 > - dma_fence_put(new_plane_state->fence); > - new_plane_state->fence =3D NULL; > - } > + if (!new_plane_state->fence) > + continue; > + > + ret =3D dma_fence_wait_any_timeout(fences, reset_fence ? 2 : 1, false, > + i915_fence_timeout(), NULL); > + if (ret <=3D 0) > + break; > + > + dma_fence_put(new_plane_state->fence); > + new_plane_state->fence =3D NULL; > } > + > + if (reset_fence) > + dma_fence_put(reset_fence); > } >=20=20 > static void intel_atomic_dsb_wait_commit(struct intel_crtc_state *crtc_s= tate) > diff --git a/drivers/gpu/drm/i915/display/intel_display_core.h b/drivers/= gpu/drm/i915/display/intel_display_core.h > index 9e77003addd0..6687b658c51d 100644 > --- a/drivers/gpu/drm/i915/display/intel_display_core.h > +++ b/drivers/gpu/drm/i915/display/intel_display_core.h > @@ -556,6 +556,12 @@ struct intel_display { > unsigned long mask; > } quirks; >=20=20 > + struct { > + /* protects reset.fence */ > + struct mutex mutex; > + struct dma_fence *fence; > + } reset; > + > struct { > /* restore state for suspend/resume and display reset */ > struct drm_atomic_state *modeset_state; > diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/driver= s/gpu/drm/i915/display/intel_display_driver.c > index 23bfecc983e8..fcd31722c731 100644 > --- a/drivers/gpu/drm/i915/display/intel_display_driver.c > +++ b/drivers/gpu/drm/i915/display/intel_display_driver.c > @@ -34,6 +34,7 @@ > #include "intel_display_driver.h" > #include "intel_display_irq.h" > #include "intel_display_power.h" > +#include "intel_display_reset.h" > #include "intel_display_types.h" > #include "intel_display_utils.h" > #include "intel_display_wa.h" > @@ -257,6 +258,8 @@ int intel_display_driver_probe_noirq(struct intel_dis= play *display) >=20=20 > intel_mode_config_init(display); >=20=20 > + intel_display_reset_fence_init(display); > + > ret =3D intel_cdclk_init(display); > if (ret) > goto cleanup_wq_unordered; > @@ -584,6 +587,8 @@ void intel_display_driver_remove(struct intel_display= *display) > if (!HAS_DISPLAY(display)) > return; >=20=20 > + intel_display_reset_fence_discard(display); > + > flush_workqueue(display->wq.flip); > flush_workqueue(display->wq.modeset); > flush_workqueue(display->wq.cleanup); > diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.c b/drivers= /gpu/drm/i915/display/intel_display_reset.c > index ca15dc18ef0f..80dd2ea8a0c2 100644 > --- a/drivers/gpu/drm/i915/display/intel_display_reset.c > +++ b/drivers/gpu/drm/i915/display/intel_display_reset.c > @@ -3,6 +3,8 @@ > * Copyright =C2=A9 2023 Intel Corporation > */ >=20=20 > +#include > + > #include > #include >=20=20 > @@ -16,6 +18,72 @@ > #include "intel_hotplug.h" > #include "intel_pps.h" >=20=20 > +static const char *intel_display_reset_fence_get_driver_name(struct dma_= fence *fence) > +{ > + return "intel_display"; > +} > + > +static const char *intel_display_reset_fence_get_timeline_name(struct dm= a_fence *fence) > +{ > + return "reset"; > +} > + > +static const struct dma_fence_ops intel_display_reset_fence_ops =3D { > + .get_driver_name =3D intel_display_reset_fence_get_driver_name, > + .get_timeline_name =3D intel_display_reset_fence_get_timeline_name, > +}; > + > +static void intel_display_reset_create(struct intel_display *display) > +{ > + struct dma_fence *fence; > + > + fence =3D kzalloc_obj(*fence); > + if (!fence) > + return; > + > + dma_fence_init(fence, &intel_display_reset_fence_ops, NULL, 0, 0); > + > + display->reset.fence =3D fence; > +} > + > +struct dma_fence *intel_display_reset_fence_get(struct intel_display *di= splay) > +{ > + struct dma_fence *fence; > + > + mutex_lock(&display->reset.mutex); > + > + if (!display->reset.fence) > + intel_display_reset_create(display); > + > + fence =3D display->reset.fence; > + if (fence) > + dma_fence_get(fence); > + > + mutex_unlock(&display->reset.mutex); > + > + return fence; > +} > + > +void intel_display_reset_fence_discard(struct intel_display *display) > +{ > + struct dma_fence *fence; > + > + mutex_lock(&display->reset.mutex); > + > + fence =3D display->reset.fence; > + if (fence) > + dma_fence_put(fence); > + > + display->reset.fence =3D NULL; > + > + mutex_unlock(&display->reset.mutex); > +} > + > +void intel_display_reset_fence_init(struct intel_display *display) > +{ > + mutex_init(&display->reset.mutex); > +} > + > bool intel_display_reset_supported(struct intel_display *display) > { > return HAS_DISPLAY(display); > @@ -31,8 +99,15 @@ void intel_display_reset_prepare(struct intel_display = *display) > { > struct drm_modeset_acquire_ctx *ctx =3D &display->restore.reset_ctx; > struct drm_atomic_state *state; > + struct dma_fence *reset_fence; > int ret; >=20=20 > + reset_fence =3D intel_display_reset_fence_get(display); > + if (reset_fence) { > + dma_fence_signal(reset_fence); > + dma_fence_put(reset_fence); > + } > + > /* > * Need mode_config.mutex so that we don't > * trample ongoing ->detect() and whatnot. > @@ -110,6 +185,8 @@ void intel_display_reset_finish(struct intel_display = *display, bool test_only) >=20=20 > drm_atomic_state_put(state); > unlock: > + intel_display_reset_fence_discard(display); > + > drm_modeset_drop_locks(ctx); > drm_modeset_acquire_fini(ctx); > mutex_unlock(&display->drm->mode_config.mutex); > diff --git a/drivers/gpu/drm/i915/display/intel_display_reset.h b/drivers= /gpu/drm/i915/display/intel_display_reset.h > index a8aa7729d33f..c36a075c6b4d 100644 > --- a/drivers/gpu/drm/i915/display/intel_display_reset.h > +++ b/drivers/gpu/drm/i915/display/intel_display_reset.h > @@ -10,6 +10,10 @@ >=20=20 > struct intel_display; >=20=20 > +struct dma_fence *intel_display_reset_fence_get(struct intel_display *di= splay); > +void intel_display_reset_fence_discard(struct intel_display *display); > +void intel_display_reset_fence_init(struct intel_display *display); > + > bool intel_display_reset_supported(struct intel_display *display); > bool intel_display_reset_test(struct intel_display *display); > void intel_display_reset_prepare(struct intel_display *display); > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile > index 110fef511fe2..1a85dfe457f0 100644 > --- a/drivers/gpu/drm/xe/Makefile > +++ b/drivers/gpu/drm/xe/Makefile > @@ -262,6 +262,7 @@ xe-$(CONFIG_DRM_XE_DISPLAY) +=3D \ > i915-display/intel_display_power.o \ > i915-display/intel_display_power_map.o \ > i915-display/intel_display_power_well.o \ > + i915-display/intel_display_reset.o \ > i915-display/intel_display_rpm.o \ > i915-display/intel_display_rps.o \ > i915-display/intel_display_trace.o \ --=20 Jani Nikula, Intel