dri-devel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake
@ 2026-05-09 16:24 Aaron Esau
  2026-05-09 16:24 ` [PATCH 1/3] drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled() Aaron Esau
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Aaron Esau @ 2026-05-09 16:24 UTC (permalink / raw)
  To: intel-gfx
  Cc: intel-xe, dri-devel, jani.nikula, rodrigo.vivi, joonas.lahtinen,
	tursulin, mika.kahola, stable, Aaron Esau

On Meteor Lake with a hybrid Intel/NVIDIA GPU setup, s2idle resume can
leave the CX0 PHY MSGBUS unresponsive. When this happens, the PLL
enable sequence silently fails: register writes via MSGBUS are dropped,
the PLL never locks, but the driver marks it as enabled and proceeds to
drive the pipe.

The root cause of the MSGBUS becoming unresponsive appears to be the
NVIDIA dGPU not participating in S0ix (addressed via the
NVreg_EnableS0ixPowerManagement module parameter). However, the i915
driver should handle PLL enable failures gracefully regardless of the
trigger.

This series:
  1. Fixes intel_cx0_pll_is_enabled() to check the hardware ACK bit,
     not just the driver-set REQUEST bit, so a PLL that failed to lock
     is correctly reported as disabled.
  2. Adds error propagation through the DPLL enable path: changes the
     .enable callback to return int, threads errors through
     _intel_enable_shared_dpll() and intel_dpll_enable(), and checks
     the result in hsw_crtc_enable() and ilk_pch_enable().
  3. Makes the CX0 PLL enable path return -ETIMEDOUT when the PHY
     fails to come out of reset or the PLL fails to lock.

Found on a Lenovo ThinkPad with Intel Ultra 7 155H and NVIDIA RTX 2000
Ada. Kernel traces before each crash:

  i915: Failed to bring PHY A to idle.
  i915: PHY A Read 0c70 failed after 3 retries.
  i915: Timeout waiting for DDI BUF A to get active
  i915: [CRTC:149:pipe A] flip_done timed out

Aaron Esau (3):
  drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
  drm/i915/dpll: add error propagation to DPLL enable path
  drm/i915/cx0: return errors from CX0 PLL enable on failure

 drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 54 ++++++++----
 drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +-
 drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
 drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
 .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
 6 files changed, 117 insertions(+), 49 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
  2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
@ 2026-05-09 16:24 ` Aaron Esau
  2026-05-09 16:24 ` [PATCH 2/3] drm/i915/dpll: add error propagation to DPLL enable path Aaron Esau
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Aaron Esau @ 2026-05-09 16:24 UTC (permalink / raw)
  To: intel-gfx
  Cc: intel-xe, dri-devel, jani.nikula, rodrigo.vivi, joonas.lahtinen,
	tursulin, mika.kahola, stable, Aaron Esau

intel_cx0_pll_is_enabled() only checks the PCLK_PLL_REQUEST bit in
PORT_CLOCK_CTL, which is set by the driver during the PLL enable
sequence. It does not check the PCLK_PLL_ACK bit, which is the
hardware's response indicating the PLL actually locked.

When the CX0 PHY MSGBUS is unresponsive (e.g. after a failed s2idle
resume), the PLL register programming via MSGBUS silently fails and
the PLL never locks, but intel_cx0_pll_is_enabled() returns true
because the driver-set REQUEST bit is present. This causes all
downstream state readout and verification to operate on a PLL that
is not actually enabled.

Check both the REQUEST and ACK bits so that a PLL is only reported
as enabled when the hardware has confirmed it locked.

Fixes: bf8531990380 ("drm/i915/display: Allow display PHYs to reset power state")
Cc: stable@vger.kernel.org
Cc: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Aaron Esau <aaron1esau@gmail.com>
---
 drivers/gpu/drm/i915/display/intel_cx0_phy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
index 7288065d2..4cacea802 100644
--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
@@ -3581,9 +3581,12 @@ static bool intel_cx0_pll_is_enabled(struct intel_encoder *encoder)
 	struct intel_display *display = to_intel_display(encoder);
 	struct intel_digital_port *dig_port = enc_to_dig_port(encoder);
 	u8 lane = dig_port->lane_reversal ? INTEL_CX0_LANE1 : INTEL_CX0_LANE0;
+	u32 val;
 
-	return intel_de_read(display, XELPDP_PORT_CLOCK_CTL(display, encoder->port)) &
-			     intel_cx0_get_pclk_pll_request(lane);
+	val = intel_de_read(display, XELPDP_PORT_CLOCK_CTL(display, encoder->port));
+
+	return (val & intel_cx0_get_pclk_pll_request(lane)) &&
+	       (val & intel_cx0_get_pclk_pll_ack(lane));
 }
 
 void intel_mtl_tbt_pll_disable_clock(struct intel_encoder *encoder)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] drm/i915/dpll: add error propagation to DPLL enable path
  2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
  2026-05-09 16:24 ` [PATCH 1/3] drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled() Aaron Esau
@ 2026-05-09 16:24 ` Aaron Esau
  2026-05-09 16:24 ` [PATCH 3/3] drm/i915/cx0: return errors from CX0 PLL enable on failure Aaron Esau
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Aaron Esau @ 2026-05-09 16:24 UTC (permalink / raw)
  To: intel-gfx
  Cc: intel-xe, dri-devel, jani.nikula, rodrigo.vivi, joonas.lahtinen,
	tursulin, mika.kahola, stable, Aaron Esau

The .enable callback in struct intel_dpll_funcs returns void, providing
no way to report PLL enable failures to callers. This leaves
_intel_enable_shared_dpll() and intel_dpll_enable() unable to detect
when a PLL fails to lock, causing pll->on to be set to true
unconditionally and the CRTC enable sequence to proceed against a
non-functional PLL.

Change the .enable callback to return int. Update all implementations
to return 0 (no functional change for platforms where enable cannot
fail). Thread the error through _intel_enable_shared_dpll() and
intel_dpll_enable(), rolling back active_mask and power domain state
on failure.

Update hsw_crtc_enable() and ilk_pch_enable() to check the return
value and bail out before attempting to drive a pipe with no working
PLL.

No functional change on any platform yet, as all .enable callbacks
return 0. A subsequent patch will make the CX0 PHY PLL enable path
return errors on failure.

Signed-off-by: Aaron Esau <aaron1esau@gmail.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
 drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
 .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
 4 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 0f82bf771..74bfeed31 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -1645,8 +1645,14 @@ static void hsw_crtc_enable(struct intel_atomic_state *state,
 
 	intel_encoders_pre_pll_enable(state, crtc);
 
-	if (new_crtc_state->intel_dpll)
-		intel_dpll_enable(new_crtc_state);
+	if (new_crtc_state->intel_dpll) {
+		if (intel_dpll_enable(new_crtc_state)) {
+			drm_err(display->drm,
+				"[CRTC:%d:%s] PLL enable failed, aborting crtc enable\n",
+				crtc->base.base.id, crtc->base.name);
+			return;
+		}
+	}
 
 	intel_encoders_pre_enable(state, crtc);
 
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index 9aa84a430..78fd2e5f9 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -69,9 +69,9 @@ struct intel_dpll_funcs {
 	 * Hook for enabling the pll, called from intel_enable_dpll() if
 	 * the pll is not already enabled.
 	 */
-	void (*enable)(struct intel_display *display,
-		       struct intel_dpll *pll,
-		       const struct intel_dpll_hw_state *dpll_hw_state);
+	int (*enable)(struct intel_display *display,
+		      struct intel_dpll *pll,
+		      const struct intel_dpll_hw_state *dpll_hw_state);
 
 	/*
 	 * Hook for disabling the pll, called from intel_disable_dpll()
@@ -245,14 +245,28 @@ intel_tc_pll_enable_reg(struct intel_display *display,
 	return MG_PLL_ENABLE(tc_port);
 }
 
-static void _intel_enable_shared_dpll(struct intel_display *display,
-				      struct intel_dpll *pll)
+static int _intel_enable_shared_dpll(struct intel_display *display,
+				     struct intel_dpll *pll)
 {
+	int ret;
+
 	if (pll->info->power_domain)
 		pll->wakeref = intel_display_power_get(display, pll->info->power_domain);
 
-	pll->info->funcs->enable(display, pll, &pll->state.hw_state);
+	ret = pll->info->funcs->enable(display, pll, &pll->state.hw_state);
+	if (ret) {
+		drm_err(display->drm, "%s: PLL enable failed (err %d)\n",
+			pll->info->name, ret);
+		pll->on = false;
+		if (pll->info->power_domain)
+			intel_display_power_put(display, pll->info->power_domain,
+						pll->wakeref);
+		return ret;
+	}
+
 	pll->on = true;
+
+	return 0;
 }
 
 static void _intel_disable_shared_dpll(struct intel_display *display,
@@ -270,17 +284,20 @@ static void _intel_disable_shared_dpll(struct intel_display *display,
  * @crtc_state: CRTC, and its state, which has a DPLL
  *
  * Enable DPLL used by @crtc.
+ *
+ * Returns: 0 on success, negative error code on failure.
  */
-void intel_dpll_enable(const struct intel_crtc_state *crtc_state)
+int intel_dpll_enable(const struct intel_crtc_state *crtc_state)
 {
 	struct intel_display *display = to_intel_display(crtc_state);
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc);
 	struct intel_dpll *pll = crtc_state->intel_dpll;
 	unsigned int pipe_mask = intel_crtc_joined_pipe_mask(crtc_state);
 	unsigned int old_mask;
+	int ret = 0;
 
 	if (drm_WARN_ON(display->drm, !pll))
-		return;
+		return -EINVAL;
 
 	mutex_lock(&display->dpll.lock);
 	old_mask = pll->active_mask;
@@ -305,10 +322,14 @@ void intel_dpll_enable(const struct intel_crtc_state *crtc_state)
 
 	drm_dbg_kms(display->drm, "enabling %s\n", pll->info->name);
 
-	_intel_enable_shared_dpll(display, pll);
+	ret = _intel_enable_shared_dpll(display, pll);
+	if (ret)
+		pll->active_mask &= ~pipe_mask;
 
 out:
 	mutex_unlock(&display->dpll.lock);
+
+	return ret;
 }
 
 /**
@@ -577,7 +598,7 @@ static void ibx_assert_pch_refclk_enabled(struct intel_display *display)
 				 "PCH refclk assertion failure, should be active but is disabled\n");
 }
 
-static void ibx_pch_dpll_enable(struct intel_display *display,
+static int ibx_pch_dpll_enable(struct intel_display *display,
 				struct intel_dpll *pll,
 				const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -604,6 +625,8 @@ static void ibx_pch_dpll_enable(struct intel_display *display,
 	intel_de_write(display, PCH_DPLL(id), hw_state->dpll);
 	intel_de_posting_read(display, PCH_DPLL(id));
 	udelay(200);
+
+	return 0;
 }
 
 static void ibx_pch_dpll_disable(struct intel_display *display,
@@ -707,7 +730,7 @@ static const struct intel_dpll_mgr pch_pll_mgr = {
 	.compare_hw_state = ibx_compare_hw_state,
 };
 
-static void hsw_ddi_wrpll_enable(struct intel_display *display,
+static int hsw_ddi_wrpll_enable(struct intel_display *display,
 				 struct intel_dpll *pll,
 				 const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -717,9 +740,11 @@ static void hsw_ddi_wrpll_enable(struct intel_display *display,
 	intel_de_write(display, WRPLL_CTL(id), hw_state->wrpll);
 	intel_de_posting_read(display, WRPLL_CTL(id));
 	udelay(20);
+
+	return 0;
 }
 
-static void hsw_ddi_spll_enable(struct intel_display *display,
+static int hsw_ddi_spll_enable(struct intel_display *display,
 				struct intel_dpll *pll,
 				const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -728,6 +753,8 @@ static void hsw_ddi_spll_enable(struct intel_display *display,
 	intel_de_write(display, SPLL_CTL, hw_state->spll);
 	intel_de_posting_read(display, SPLL_CTL);
 	udelay(20);
+
+	return 0;
 }
 
 static void hsw_ddi_wrpll_disable(struct intel_display *display,
@@ -1300,10 +1327,11 @@ static const struct intel_dpll_funcs hsw_ddi_spll_funcs = {
 	.get_freq = hsw_ddi_spll_get_freq,
 };
 
-static void hsw_ddi_lcpll_enable(struct intel_display *display,
+static int hsw_ddi_lcpll_enable(struct intel_display *display,
 				 struct intel_dpll *pll,
 				 const struct intel_dpll_hw_state *hw_state)
 {
+	return 0;
 }
 
 static void hsw_ddi_lcpll_disable(struct intel_display *display,
@@ -1393,7 +1421,7 @@ static void skl_ddi_pll_write_ctrl1(struct intel_display *display,
 	intel_de_posting_read(display, DPLL_CTRL1);
 }
 
-static void skl_ddi_pll_enable(struct intel_display *display,
+static int skl_ddi_pll_enable(struct intel_display *display,
 			       struct intel_dpll *pll,
 			       const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -1413,15 +1441,19 @@ static void skl_ddi_pll_enable(struct intel_display *display,
 
 	if (intel_de_wait_for_set_ms(display, DPLL_STATUS, DPLL_LOCK(id), 5))
 		drm_err(display->drm, "DPLL %d not locked\n", id);
+
+	return 0;
 }
 
-static void skl_ddi_dpll0_enable(struct intel_display *display,
+static int skl_ddi_dpll0_enable(struct intel_display *display,
 				 struct intel_dpll *pll,
 				 const struct intel_dpll_hw_state *dpll_hw_state)
 {
 	const struct skl_dpll_hw_state *hw_state = &dpll_hw_state->skl;
 
 	skl_ddi_pll_write_ctrl1(display, pll, hw_state);
+
+	return 0;
 }
 
 static void skl_ddi_pll_disable(struct intel_display *display,
@@ -2053,7 +2085,7 @@ static const struct intel_dpll_mgr skl_pll_mgr = {
 	.compare_hw_state = skl_compare_hw_state,
 };
 
-static void bxt_ddi_pll_enable(struct intel_display *display,
+static int bxt_ddi_pll_enable(struct intel_display *display,
 			       struct intel_dpll *pll,
 			       const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -2158,6 +2190,8 @@ static void bxt_ddi_pll_enable(struct intel_display *display,
 	temp &= ~LANESTAGGER_STRAP_OVRD;
 	temp |= hw_state->pcsdw12;
 	intel_de_write(display, BXT_PORT_PCS_DW12_GRP(phy, ch), temp);
+
+	return 0;
 }
 
 static void bxt_ddi_pll_disable(struct intel_display *display,
@@ -4007,7 +4041,7 @@ static void adlp_cmtg_clock_gating_wa(struct intel_display *display, struct inte
 		drm_dbg_kms(display->drm, "Unexpected flags in TRANS_CMTG_CHICKEN: %08x\n", val);
 }
 
-static void combo_pll_enable(struct intel_display *display,
+static int combo_pll_enable(struct intel_display *display,
 			     struct intel_dpll *pll,
 			     const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -4029,9 +4063,11 @@ static void combo_pll_enable(struct intel_display *display,
 	adlp_cmtg_clock_gating_wa(display, pll);
 
 	/* DVFS post sequence would be here. See the comment above. */
+
+	return 0;
 }
 
-static void icl_tbt_pll_enable(struct intel_display *display,
+static int icl_tbt_pll_enable(struct intel_display *display,
 			       struct intel_dpll *pll,
 			       const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -4050,9 +4086,11 @@ static void icl_tbt_pll_enable(struct intel_display *display,
 	icl_pll_enable(display, pll, TBT_PLL_ENABLE);
 
 	/* DVFS post sequence would be here. See the comment above. */
+
+	return 0;
 }
 
-static void mg_pll_enable(struct intel_display *display,
+static int mg_pll_enable(struct intel_display *display,
 			  struct intel_dpll *pll,
 			  const struct intel_dpll_hw_state *dpll_hw_state)
 {
@@ -4075,6 +4113,8 @@ static void mg_pll_enable(struct intel_display *display,
 	icl_pll_enable(display, pll, enable_reg);
 
 	/* DVFS post sequence would be here. See the comment above. */
+
+	return 0;
 }
 
 static void icl_pll_disable(struct intel_display *display,
@@ -4392,16 +4432,18 @@ static int mtl_pll_get_freq(struct intel_display *display,
 	return intel_cx0pll_calc_port_clock(encoder, &dpll_hw_state->cx0pll);
 }
 
-static void mtl_pll_enable(struct intel_display *display,
+static int mtl_pll_enable(struct intel_display *display,
 			   struct intel_dpll *pll,
 			   const struct intel_dpll_hw_state *dpll_hw_state)
 {
 	struct intel_encoder *encoder = get_intel_encoder(display, pll);
 
 	if (drm_WARN_ON(display->drm, !encoder))
-		return;
+		return -ENODEV;
 
 	intel_mtl_pll_enable(encoder, pll, dpll_hw_state);
+
+	return 0;
 }
 
 static void mtl_pll_disable(struct intel_display *display,
@@ -4422,10 +4464,11 @@ static const struct intel_dpll_funcs mtl_pll_funcs = {
 	.get_freq = mtl_pll_get_freq,
 };
 
-static void mtl_tbt_pll_enable(struct intel_display *display,
+static int mtl_tbt_pll_enable(struct intel_display *display,
 			       struct intel_dpll *pll,
 			       const struct intel_dpll_hw_state *hw_state)
 {
+	return 0;
 }
 
 static void mtl_tbt_pll_disable(struct intel_display *display,
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.h b/drivers/gpu/drm/i915/display/intel_dpll_mgr.h
index 5b71c8605..21fae6fd0 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.h
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.h
@@ -435,7 +435,7 @@ int intel_dpll_get_freq(struct intel_display *display,
 bool intel_dpll_get_hw_state(struct intel_display *display,
 			     struct intel_dpll *pll,
 			     struct intel_dpll_hw_state *dpll_hw_state);
-void intel_dpll_enable(const struct intel_crtc_state *crtc_state);
+int intel_dpll_enable(const struct intel_crtc_state *crtc_state);
 void intel_dpll_disable(const struct intel_crtc_state *crtc_state);
 void intel_dpll_swap_state(struct intel_atomic_state *state);
 void intel_dpll_init(struct intel_display *display);
diff --git a/drivers/gpu/drm/i915/display/intel_pch_display.c b/drivers/gpu/drm/i915/display/intel_pch_display.c
index 16619f7be..cb979a946 100644
--- a/drivers/gpu/drm/i915/display/intel_pch_display.c
+++ b/drivers/gpu/drm/i915/display/intel_pch_display.c
@@ -399,7 +399,12 @@ void ilk_pch_enable(struct intel_atomic_state *state,
 	 * get_dpll unconditionally resets the pll - we need that
 	 * to have the right LVDS enable sequence.
 	 */
-	intel_dpll_enable(crtc_state);
+	if (intel_dpll_enable(crtc_state)) {
+		drm_err(display->drm,
+			"[CRTC:%d:%s] PCH PLL enable failed, aborting PCH enable\n",
+			crtc->base.base.id, crtc->base.name);
+		return;
+	}
 
 	/* set transcoder timing, panel must allow it */
 	assert_pps_unlocked(display, pipe);
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] drm/i915/cx0: return errors from CX0 PLL enable on failure
  2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
  2026-05-09 16:24 ` [PATCH 1/3] drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled() Aaron Esau
  2026-05-09 16:24 ` [PATCH 2/3] drm/i915/dpll: add error propagation to DPLL enable path Aaron Esau
@ 2026-05-09 16:24 ` Aaron Esau
  2026-05-11  8:03 ` [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Imre Deak
  2026-05-11  9:33 ` Jani Nikula
  4 siblings, 0 replies; 7+ messages in thread
From: Aaron Esau @ 2026-05-09 16:24 UTC (permalink / raw)
  To: intel-gfx
  Cc: intel-xe, dri-devel, jani.nikula, rodrigo.vivi, joonas.lahtinen,
	tursulin, mika.kahola, stable, Aaron Esau

intel_cx0pll_enable() silently continues when the PHY fails to come
out of SOC reset or the PLL fails to lock. When the CX0 PHY MSGBUS
is unresponsive, this causes all subsequent PLL register writes to be
silently dropped and the PLL lock request to time out, leaving the
display hardware in a broken state.

Return -ETIMEDOUT from intel_cx0_phy_lane_reset() when the PHY fails
to come out of SOC reset. Return -ETIMEDOUT from intel_cx0pll_enable()
when the PLL fails to lock. Propagate these errors through
intel_mtl_pll_enable() and mtl_pll_enable() to the shared DPLL
framework, which (as of the previous patch) will abort the CRTC
enable sequence rather than driving a pipe with a non-functional PLL.

Fixes: 51390cc0e00a ("drm/i915/mtl: Add Support for C10 PHY message bus and pll programming")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Esau <aaron1esau@gmail.com>
---
 drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 47 ++++++++++++-------
 drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +--
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c |  4 +-
 3 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
index 4cacea802..f5c8444ef 100644
--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
@@ -3103,8 +3103,8 @@ static u32 intel_cx0_get_pclk_refclk_ack(u8 lane_mask)
 	return val;
 }
 
-static void intel_cx0_phy_lane_reset(struct intel_encoder *encoder,
-				     bool lane_reversal)
+static int intel_cx0_phy_lane_reset(struct intel_encoder *encoder,
+				    bool lane_reversal)
 {
 	struct intel_display *display = to_intel_display(encoder);
 	enum port port = encoder->port;
@@ -3121,10 +3121,12 @@ static void intel_cx0_phy_lane_reset(struct intel_encoder *encoder,
 
 	if (intel_de_wait_for_set_us(display, XELPDP_PORT_BUF_CTL1(display, port),
 				     XELPDP_PORT_BUF_SOC_PHY_READY,
-				     XELPDP_PORT_BUF_SOC_READY_TIMEOUT_US))
-		drm_warn(display->drm,
-			 "PHY %c failed to bring out of SOC reset\n",
-			 phy_name(phy));
+				     XELPDP_PORT_BUF_SOC_READY_TIMEOUT_US)) {
+		drm_err(display->drm,
+			"PHY %c failed to bring out of SOC reset\n",
+			phy_name(phy));
+		return -ETIMEDOUT;
+	}
 
 	intel_de_rmw(display, XELPDP_PORT_BUF_CTL2(display, port), lane_pipe_reset,
 		     lane_pipe_reset);
@@ -3160,6 +3162,8 @@ static void intel_cx0_phy_lane_reset(struct intel_encoder *encoder,
 		drm_warn(display->drm,
 			 "PHY %c failed to bring out of lane reset\n",
 			 phy_name(phy));
+
+	return 0;
 }
 
 static void intel_cx0_program_phy_lane(struct intel_encoder *encoder, int lane_count,
@@ -3220,17 +3224,18 @@ static u32 intel_cx0_get_pclk_pll_ack(u8 lane_mask)
 	return val;
 }
 
-static void intel_cx0pll_enable(struct intel_encoder *encoder,
-				const struct intel_cx0pll_state *pll_state)
+static int intel_cx0pll_enable(struct intel_encoder *encoder,
+			       const struct intel_cx0pll_state *pll_state)
 {
 	int port_clock = pll_state->use_c10 ? pll_state->c10.clock : pll_state->c20.clock;
 	struct intel_display *display = to_intel_display(encoder);
 	enum phy phy = intel_encoder_to_phy(encoder);
 	struct intel_digital_port *dig_port = enc_to_dig_port(encoder);
+	struct ref_tracker *wakeref = intel_cx0_phy_transaction_begin(encoder);
 	bool lane_reversal = dig_port->lane_reversal;
 	u8 maxpclk_lane = lane_reversal ? INTEL_CX0_LANE1 :
 					  INTEL_CX0_LANE0;
-	struct ref_tracker *wakeref = intel_cx0_phy_transaction_begin(encoder);
+	int ret;
 
 	/*
 	 * Lane reversal is never used in DP-alt mode, in that case the
@@ -3246,7 +3251,9 @@ static void intel_cx0pll_enable(struct intel_encoder *encoder,
 	intel_program_port_clock_ctl(encoder, pll_state, port_clock, lane_reversal);
 
 	/* 2. Bring PHY out of reset. */
-	intel_cx0_phy_lane_reset(encoder, lane_reversal);
+	ret = intel_cx0_phy_lane_reset(encoder, lane_reversal);
+	if (ret)
+		goto out;
 
 	/*
 	 * 3. Change Phy power state to Ready.
@@ -3296,9 +3303,12 @@ static void intel_cx0pll_enable(struct intel_encoder *encoder,
 	if (intel_de_wait_us(display, XELPDP_PORT_CLOCK_CTL(display, encoder->port),
 			     intel_cx0_get_pclk_pll_ack(INTEL_CX0_BOTH_LANES),
 			     intel_cx0_get_pclk_pll_ack(maxpclk_lane),
-			     XELPDP_PCLK_PLL_ENABLE_TIMEOUT_US, NULL))
-		drm_warn(display->drm, "Port %c PLL not locked\n",
-			 phy_name(phy));
+			     XELPDP_PCLK_PLL_ENABLE_TIMEOUT_US, NULL)) {
+		drm_err(display->drm, "Port %c PLL not locked\n",
+			phy_name(phy));
+		ret = -ETIMEDOUT;
+		goto out;
+	}
 
 	/*
 	 * 11. Follow the Display Voltage Frequency Switching Sequence After
@@ -3320,7 +3330,10 @@ static void intel_cx0pll_enable(struct intel_encoder *encoder,
 						    XELPDP_P2_STATE_READY);
 	}
 
+out:
 	intel_cx0_phy_transaction_end(encoder, wakeref);
+
+	return ret;
 }
 
 void intel_mtl_tbt_pll_calc_state(struct intel_dpll_hw_state *hw_state)
@@ -3458,11 +3471,11 @@ void intel_mtl_tbt_pll_enable_clock(struct intel_encoder *encoder, int port_cloc
 		       port_clock);
 }
 
-void intel_mtl_pll_enable(struct intel_encoder *encoder,
-			  struct intel_dpll *pll,
-			  const struct intel_dpll_hw_state *dpll_hw_state)
+int intel_mtl_pll_enable(struct intel_encoder *encoder,
+			 struct intel_dpll *pll,
+			 const struct intel_dpll_hw_state *dpll_hw_state)
 {
-	intel_cx0pll_enable(encoder, &dpll_hw_state->cx0pll);
+	return intel_cx0pll_enable(encoder, &dpll_hw_state->cx0pll);
 }
 
 void intel_mtl_pll_enable_clock(struct intel_encoder *encoder,
diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.h b/drivers/gpu/drm/i915/display/intel_cx0_phy.h
index ae98ac23e..1d6cc32d7 100644
--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.h
+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.h
@@ -28,9 +28,9 @@ struct intel_hdmi;
 void intel_clear_response_ready_flag(struct intel_encoder *encoder,
 				     int lane);
 bool intel_encoder_is_c10phy(struct intel_encoder *encoder);
-void intel_mtl_pll_enable(struct intel_encoder *encoder,
-			  struct intel_dpll *pll,
-			  const struct intel_dpll_hw_state *dpll_hw_state);
+int intel_mtl_pll_enable(struct intel_encoder *encoder,
+			 struct intel_dpll *pll,
+			 const struct intel_dpll_hw_state *dpll_hw_state);
 void intel_mtl_pll_disable(struct intel_encoder *encoder);
 enum icl_port_dpll_id
 intel_mtl_port_pll_type(struct intel_encoder *encoder,
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index 78fd2e5f9..ce31deadc 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -4441,9 +4441,7 @@ static int mtl_pll_enable(struct intel_display *display,
 	if (drm_WARN_ON(display->drm, !encoder))
 		return -ENODEV;
 
-	intel_mtl_pll_enable(encoder, pll, dpll_hw_state);
-
-	return 0;
+	return intel_mtl_pll_enable(encoder, pll, dpll_hw_state);
 }
 
 static void mtl_pll_disable(struct intel_display *display,
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake
  2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
                   ` (2 preceding siblings ...)
  2026-05-09 16:24 ` [PATCH 3/3] drm/i915/cx0: return errors from CX0 PLL enable on failure Aaron Esau
@ 2026-05-11  8:03 ` Imre Deak
  2026-05-11  8:11   ` Saarinen, Jani
  2026-05-11  9:33 ` Jani Nikula
  4 siblings, 1 reply; 7+ messages in thread
From: Imre Deak @ 2026-05-11  8:03 UTC (permalink / raw)
  To: Aaron Esau
  Cc: intel-gfx, intel-xe, dri-devel, jani.nikula, rodrigo.vivi,
	joonas.lahtinen, tursulin, mika.kahola, stable

On Sat, May 09, 2026 at 11:24:04AM -0500, Aaron Esau wrote:
> On Meteor Lake with a hybrid Intel/NVIDIA GPU setup, s2idle resume can
> leave the CX0 PHY MSGBUS unresponsive. When this happens, the PLL
> enable sequence silently fails: register writes via MSGBUS are dropped,
> the PLL never locks, but the driver marks it as enabled and proceeds to
> drive the pipe.
> 
> The root cause of the MSGBUS becoming unresponsive appears to be the
> NVIDIA dGPU not participating in S0ix (addressed via the
> NVreg_EnableS0ixPowerManagement module parameter). However, the i915
> driver should handle PLL enable failures gracefully regardless of the
> trigger.
> 
> This series:
>   1. Fixes intel_cx0_pll_is_enabled() to check the hardware ACK bit,
>      not just the driver-set REQUEST bit, so a PLL that failed to lock
>      is correctly reported as disabled.
>   2. Adds error propagation through the DPLL enable path: changes the
>      .enable callback to return int, threads errors through
>      _intel_enable_shared_dpll() and intel_dpll_enable(), and checks
>      the result in hsw_crtc_enable() and ilk_pch_enable().
>   3. Makes the CX0 PLL enable path return -ETIMEDOUT when the PHY
>      fails to come out of reset or the PLL fails to lock.
> 
> Found on a Lenovo ThinkPad with Intel Ultra 7 155H and NVIDIA RTX 2000
> Ada. Kernel traces before each crash:
> 
>   i915: Failed to bring PHY A to idle.
>   i915: PHY A Read 0c70 failed after 3 retries.
>   i915: Timeout waiting for DDI BUF A to get active
>   i915: [CRTC:149:pipe A] flip_done timed out

This looks to be an issue in the BIOS/FW leaving the PHY and display
output HW state in general in a broken state. Could you please open a
ticket and provide a full dmesg log booting with drm.debug=0xe, so we
have a better idea on the sequence and proper ways to work around such
issues?

Thanks.

> 
> Aaron Esau (3):
>   drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
>   drm/i915/dpll: add error propagation to DPLL enable path
>   drm/i915/cx0: return errors from CX0 PLL enable on failure
> 
>  drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 54 ++++++++----
>  drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +-
>  drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
>  drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
>  drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
>  .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
>  6 files changed, 117 insertions(+), 49 deletions(-)
> 
> -- 
> 2.54.0
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake
  2026-05-11  8:03 ` [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Imre Deak
@ 2026-05-11  8:11   ` Saarinen, Jani
  0 siblings, 0 replies; 7+ messages in thread
From: Saarinen, Jani @ 2026-05-11  8:11 UTC (permalink / raw)
  To: Deak, Imre, Aaron Esau
  Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, jani.nikula@linux.intel.com,
	Vivi, Rodrigo, joonas.lahtinen@linux.intel.com,
	tursulin@ursulin.net, Kahola, Mika, stable@vger.kernel.org

Hi, 
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Imre
> Deak
> Sent: Monday, 11 May 2026 11.03
> To: Aaron Esau <aaron1esau@gmail.com>
> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org; dri-
> devel@lists.freedesktop.org; jani.nikula@linux.intel.com; Vivi, Rodrigo
> <rodrigo.vivi@intel.com>; joonas.lahtinen@linux.intel.com;
> tursulin@ursulin.net; Kahola, Mika <mika.kahola@intel.com>;
> stable@vger.kernel.org
> Subject: Re: [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on
> Meteor Lake
> 
> On Sat, May 09, 2026 at 11:24:04AM -0500, Aaron Esau wrote:
> > On Meteor Lake with a hybrid Intel/NVIDIA GPU setup, s2idle resume can
> > leave the CX0 PHY MSGBUS unresponsive. When this happens, the PLL
> > enable sequence silently fails: register writes via MSGBUS are
> > dropped, the PLL never locks, but the driver marks it as enabled and
> > proceeds to drive the pipe.
> >
> > The root cause of the MSGBUS becoming unresponsive appears to be the
> > NVIDIA dGPU not participating in S0ix (addressed via the
> > NVreg_EnableS0ixPowerManagement module parameter). However, the
> i915
> > driver should handle PLL enable failures gracefully regardless of the
> > trigger.
> >
> > This series:
> >   1. Fixes intel_cx0_pll_is_enabled() to check the hardware ACK bit,
> >      not just the driver-set REQUEST bit, so a PLL that failed to lock
> >      is correctly reported as disabled.
> >   2. Adds error propagation through the DPLL enable path: changes the
> >      .enable callback to return int, threads errors through
> >      _intel_enable_shared_dpll() and intel_dpll_enable(), and checks
> >      the result in hsw_crtc_enable() and ilk_pch_enable().
> >   3. Makes the CX0 PLL enable path return -ETIMEDOUT when the PHY
> >      fails to come out of reset or the PLL fails to lock.
> >
> > Found on a Lenovo ThinkPad with Intel Ultra 7 155H and NVIDIA RTX 2000
> > Ada. Kernel traces before each crash:
> >
> >   i915: Failed to bring PHY A to idle.
> >   i915: PHY A Read 0c70 failed after 3 retries.
> >   i915: Timeout waiting for DDI BUF A to get active
> >   i915: [CRTC:149:pipe A] flip_done timed out
> 
> This looks to be an issue in the BIOS/FW leaving the PHY and display output
> HW state in general in a broken state. Could you please open a ticket and
> provide a full dmesg log booting with drm.debug=0xe, so we have a better idea
> on the sequence and proper ways to work around such issues?
And this with this instruction https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html

> 
> Thanks.

Br,
Jani
> 
> >
> > Aaron Esau (3):
> >   drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
> >   drm/i915/dpll: add error propagation to DPLL enable path
> >   drm/i915/cx0: return errors from CX0 PLL enable on failure
> >
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 54 ++++++++----
> > drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +-
> > drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
> > drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
> > drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
> > .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
> >  6 files changed, 117 insertions(+), 49 deletions(-)
> >
> > --
> > 2.54.0
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake
  2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
                   ` (3 preceding siblings ...)
  2026-05-11  8:03 ` [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Imre Deak
@ 2026-05-11  9:33 ` Jani Nikula
  4 siblings, 0 replies; 7+ messages in thread
From: Jani Nikula @ 2026-05-11  9:33 UTC (permalink / raw)
  To: Aaron Esau, intel-gfx
  Cc: intel-xe, dri-devel, rodrigo.vivi, joonas.lahtinen, tursulin,
	mika.kahola, stable, Aaron Esau, Marco Nenciarini, Imre Deak,
	Ville Syrjälä

On Sat, 09 May 2026, Aaron Esau <aaron1esau@gmail.com> wrote:
> On Meteor Lake with a hybrid Intel/NVIDIA GPU setup, s2idle resume can
> leave the CX0 PHY MSGBUS unresponsive. When this happens, the PLL
> enable sequence silently fails: register writes via MSGBUS are dropped,
> the PLL never locks, but the driver marks it as enabled and proceeds to
> drive the pipe.
>
> The root cause of the MSGBUS becoming unresponsive appears to be the
> NVIDIA dGPU not participating in S0ix (addressed via the
> NVreg_EnableS0ixPowerManagement module parameter). However, the i915
> driver should handle PLL enable failures gracefully regardless of the
> trigger.

The way I read this is: There's an issue with an out-of-tree proprietary
driver, you can only reproduce the issue with said proprietary driver,
and the upstream driver should jump through hoops to workaround the
issue in the proprietary driver, in ways that we won't be able to test
in our CI. And the expectation to work around this upstream is because
you can't really do anything about the proprietary driver.

Is that about right?

Apart from adding a bunch of generic error handling code superficially
unrelated to the proprietary driver.

The reason the CRTC enable path generally doesn't have error propagation
is that 1) the allowed errors on atomic commit are *very* limited, 2)
nonblocking commits are even more limited, and 3) even on failures the
display pipe must be running.

You simply can't bail out in the middle of hsw_crtc_enable() like
suggested in patch 2.

See [1] for more. Also see parts about tainted kernels in [2].


BR,
Jani.


[1] https://docs.kernel.org/gpu/drm-kms.html#c.drm_mode_config_funcs
[2] https://docs.kernel.org/admin-guide/reporting-issues.html


>
> This series:
>   1. Fixes intel_cx0_pll_is_enabled() to check the hardware ACK bit,
>      not just the driver-set REQUEST bit, so a PLL that failed to lock
>      is correctly reported as disabled.
>   2. Adds error propagation through the DPLL enable path: changes the
>      .enable callback to return int, threads errors through
>      _intel_enable_shared_dpll() and intel_dpll_enable(), and checks
>      the result in hsw_crtc_enable() and ilk_pch_enable().
>   3. Makes the CX0 PLL enable path return -ETIMEDOUT when the PHY
>      fails to come out of reset or the PLL fails to lock.
>
> Found on a Lenovo ThinkPad with Intel Ultra 7 155H and NVIDIA RTX 2000
> Ada. Kernel traces before each crash:
>
>   i915: Failed to bring PHY A to idle.
>   i915: PHY A Read 0c70 failed after 3 retries.
>   i915: Timeout waiting for DDI BUF A to get active
>   i915: [CRTC:149:pipe A] flip_done timed out
>
> Aaron Esau (3):
>   drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
>   drm/i915/dpll: add error propagation to DPLL enable path
>   drm/i915/cx0: return errors from CX0 PLL enable on failure
>
>  drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 54 ++++++++----
>  drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +-
>  drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
>  drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
>  drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
>  .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
>  6 files changed, 117 insertions(+), 49 deletions(-)

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-11  9:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-09 16:24 [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Aaron Esau
2026-05-09 16:24 ` [PATCH 1/3] drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled() Aaron Esau
2026-05-09 16:24 ` [PATCH 2/3] drm/i915/dpll: add error propagation to DPLL enable path Aaron Esau
2026-05-09 16:24 ` [PATCH 3/3] drm/i915/cx0: return errors from CX0 PLL enable on failure Aaron Esau
2026-05-11  8:03 ` [PATCH 0/3] drm/i915/cx0: fix PLL enable failure handling on Meteor Lake Imre Deak
2026-05-11  8:11   ` Saarinen, Jani
2026-05-11  9:33 ` Jani Nikula

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox