public inbox for intel-gfx@lists.freedesktop.org
* [PATCH] drm/i915: Stop engines before reset
@ 2017-09-14 11:10 Mika Kuoppala
  0 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-14 11:10 UTC
  To: intel-gfx

On kbl, evidence indicates that even when the hardware happily
tells us to proceed with the reset, it really isn't ready.
Resetting a freely running batchbuffer after we have the ack for
readiness can still cause a system hang.

We have had similar experiences on older gens. So now
attempt to stop engines before proceeding with the reset, on all
gens where we have a gpu reset. This has been shown to improve reset
reliability and reduce the risk of losing the machine.

Testcase: igt/prime_busy/hang-* # kbl
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
---
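The STOP_RING write in gen3_stop_engine() goes to a masked register. The
following is a minimal user-space sketch of that convention, assuming the
upper 16 bits of the written value select which of the lower 16 bits may
change. The bit positions and every helper name here (masked_bit_enable,
masked_bit_disable, apply_masked_write) are illustrative assumptions for
this sketch, not the driver's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; assumptions for this sketch only. */
#define STOP_RING (1u << 8)
#define MODE_IDLE (1u << 9)

/* Upper 16 bits carry the write-enable mask, lower 16 bits the new value. */
static uint32_t masked_bit_enable(uint32_t bit)
{
	assert(bit <= 0xffffu);
	return (bit << 16) | bit;
}

/* Same mask, but the value bit is left clear so the bit gets reset. */
static uint32_t masked_bit_disable(uint32_t bit)
{
	assert(bit <= 0xffffu);
	return bit << 16;
}

/* Model of how the hardware would apply a masked write to a 16-bit register. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;

	/* Bits outside the mask keep their old value. */
	return (reg & ~mask & 0xffffu) | (val & mask);
}
```

With this model, requesting a stop leaves unrelated bits such as MODE_IDLE
untouched, which is why the wait that follows can poll the same register.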
 drivers/gpu/drm/i915/intel_uncore.c | 72 ++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 97525de2cee4..473325705b92 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1354,33 +1354,38 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 	return ret;
 }
 
-static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+static void gen3_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (intel_wait_for_register_fw(dev_priv,
+				       mode,
+				       MODE_IDLE,
+				       MODE_IDLE,
+				       500))
+		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+				 engine->name);
+
+	I915_WRITE_FW(RING_CTL(base), 0);
+	I915_WRITE_FW(RING_HEAD(base), 0);
+	I915_WRITE_FW(RING_TAIL(base), 0);
+
+	/* Check acts as a post */
+	if (I915_READ_FW(RING_HEAD(base)) != 0)
+		DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+				 engine->name);
+}
+
+static void i915_stop_engines(struct drm_i915_private *dev_priv, unsigned engine_mask)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, dev_priv, id) {
-		const u32 base = engine->mmio_base;
-		const i915_reg_t mode = RING_MI_MODE(base);
-
-		I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
-		if (intel_wait_for_register_fw(dev_priv,
-					       mode,
-					       MODE_IDLE,
-					       MODE_IDLE,
-					       500))
-			DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
-					 engine->name);
-
-		I915_WRITE_FW(RING_CTL(base), 0);
-		I915_WRITE_FW(RING_HEAD(base), 0);
-		I915_WRITE_FW(RING_TAIL(base), 0);
-
-		/* Check acts as a post */
-		if (I915_READ_FW(RING_HEAD(base)) != 0)
-			DRM_DEBUG_DRIVER("%s: ring head not parked\n",
-					 engine->name);
-	}
+	for_each_engine_masked(engine, dev_priv, engine_mask, id)
+		gen3_stop_engine(engine);
 }
 
 static bool i915_reset_complete(struct pci_dev *pdev)
@@ -1415,9 +1420,6 @@ static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 {
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 
-	/* Stop engines before we reset; see g4x_do_reset() below for why. */
-	gen3_stop_rings(dev_priv);
-
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
 	return wait_for(g4x_reset_complete(pdev), 500);
 }
@@ -1432,12 +1434,6 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
 	POSTING_READ(VDECCLK_GATE_D);
 
-	/* We stop engines, otherwise we might get failed reset and a
-	 * dead gpu (on elk).
-	 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
-	 */
-	gen3_stop_rings(dev_priv);
-
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
 	ret =  wait_for(g4x_reset_complete(pdev), 500);
@@ -1742,6 +1738,18 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 	 */
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	for (retry = 0; retry < 3; retry++) {
+
+		/* We stop engines, otherwise we might get a failed reset and a
+		 * dead gpu (on elk). Even a gpu as modern as kbl can suffer
+		 * a system hang if a batchbuffer is progressing when
+		 * the reset is issued, regardless of the READY_TO_RESET ack.
+		 * Thus assume it is best to stop engines on all gens
+		 * where we have a gpu reset.
+		 *
+		 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+		 */
+		i915_stop_engines(dev_priv, engine_mask);
+
 		ret = reset(dev_priv, engine_mask);
 		if (ret != -ETIMEDOUT)
 			break;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH] drm/i915: Stop engines before reset
@ 2017-09-19 14:41 Mika Kuoppala
  2017-09-19 19:14 ` ✓ Fi.CI.BAT: success for drm/i915: Stop engines before reset (rev2) Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-19 14:41 UTC
  To: intel-gfx

On kbl, evidence indicates that even when the hardware happily
tells us to proceed with the reset, it really isn't ready.
Resetting a freely running batchbuffer after we have the ack for
readiness can still cause a system hang.

We have had similar experiences on older gens. So now
attempt to stop engines before proceeding with the reset, on all
gens where we have a gpu reset. This has been shown to improve reset
reliability and reduce the risk of losing the machine.

v2: Add fixme for wa (Joonas)

Testcase: igt/prime_busy/hang-* # kbl
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
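A minimal user-space sketch of the control flow this patch introduces:
the selected engines are stopped before every reset attempt, inside the
retry loop. Everything here is a hypothetical model (struct fake_gpu,
stop_engines, do_reset, gpu_reset); in particular, returning -EIO when a
selected engine is still running is only a stand-in for the hang seen on
real hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_ENGINES 4

/* Hypothetical stand-in for per-engine state; not the driver's struct. */
struct fake_engine {
	int running; /* 1 while a batch is still executing */
};

struct fake_gpu {
	struct fake_engine engines[NUM_ENGINES];
	int timeouts_left; /* reset attempts that will time out first */
	int stop_calls;    /* how many times engines were stopped */
};

/* Stop only the engines selected by engine_mask. */
static void stop_engines(struct fake_gpu *gpu, unsigned int engine_mask)
{
	for (unsigned int id = 0; id < NUM_ENGINES; id++) {
		if (engine_mask & (1u << id))
			gpu->engines[id].running = 0;
	}
	gpu->stop_calls++;
}

/* Modelled reset: refuses to run while a selected engine is busy. */
static int do_reset(struct fake_gpu *gpu, unsigned int engine_mask)
{
	for (unsigned int id = 0; id < NUM_ENGINES; id++) {
		if ((engine_mask & (1u << id)) && gpu->engines[id].running)
			return -EIO;
	}
	if (gpu->timeouts_left > 0) {
		gpu->timeouts_left--;
		return -ETIMEDOUT;
	}
	return 0;
}

/* Up to three attempts; engines are stopped again before each one. */
static int gpu_reset(struct fake_gpu *gpu, unsigned int engine_mask)
{
	int ret = -ETIMEDOUT;

	assert(gpu != NULL);
	for (int retry = 0; retry < 3; retry++) {
		stop_engines(gpu, engine_mask);
		ret = do_reset(gpu, engine_mask);
		if (ret != -ETIMEDOUT)
			break;
	}
	return ret;
}
```

Stopping inside the loop rather than once before it means an engine that
somehow restarted between attempts is parked again before the next try.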
 drivers/gpu/drm/i915/intel_uncore.c | 75 +++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 97525de2cee4..fdd7f93acb4f 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1354,33 +1354,39 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 	return ret;
 }
 
-static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+static void gen3_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (intel_wait_for_register_fw(dev_priv,
+				       mode,
+				       MODE_IDLE,
+				       MODE_IDLE,
+				       500))
+		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+				 engine->name);
+
+	I915_WRITE_FW(RING_CTL(base), 0);
+	I915_WRITE_FW(RING_HEAD(base), 0);
+	I915_WRITE_FW(RING_TAIL(base), 0);
+
+	/* Check acts as a post */
+	if (I915_READ_FW(RING_HEAD(base)) != 0)
+		DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+				 engine->name);
+}
+
+static void i915_stop_engines(struct drm_i915_private *dev_priv,
+			      unsigned engine_mask)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, dev_priv, id) {
-		const u32 base = engine->mmio_base;
-		const i915_reg_t mode = RING_MI_MODE(base);
-
-		I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
-		if (intel_wait_for_register_fw(dev_priv,
-					       mode,
-					       MODE_IDLE,
-					       MODE_IDLE,
-					       500))
-			DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
-					 engine->name);
-
-		I915_WRITE_FW(RING_CTL(base), 0);
-		I915_WRITE_FW(RING_HEAD(base), 0);
-		I915_WRITE_FW(RING_TAIL(base), 0);
-
-		/* Check acts as a post */
-		if (I915_READ_FW(RING_HEAD(base)) != 0)
-			DRM_DEBUG_DRIVER("%s: ring head not parked\n",
-					 engine->name);
-	}
+	for_each_engine_masked(engine, dev_priv, engine_mask, id)
+		gen3_stop_engine(engine);
 }
 
 static bool i915_reset_complete(struct pci_dev *pdev)
@@ -1415,9 +1421,6 @@ static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 {
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 
-	/* Stop engines before we reset; see g4x_do_reset() below for why. */
-	gen3_stop_rings(dev_priv);
-
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
 	return wait_for(g4x_reset_complete(pdev), 500);
 }
@@ -1432,12 +1435,6 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
 	POSTING_READ(VDECCLK_GATE_D);
 
-	/* We stop engines, otherwise we might get failed reset and a
-	 * dead gpu (on elk).
-	 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
-	 */
-	gen3_stop_rings(dev_priv);
-
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
 	ret =  wait_for(g4x_reset_complete(pdev), 500);
@@ -1742,6 +1739,20 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 	 */
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	for (retry = 0; retry < 3; retry++) {
+
+		/* We stop engines, otherwise we might get a failed reset and a
+		 * dead gpu (on elk). Even a gpu as modern as kbl can suffer
+		 * a system hang if a batchbuffer is progressing when
+		 * the reset is issued, regardless of the READY_TO_RESET ack.
+		 * Thus assume it is best to stop engines on all gens
+		 * where we have a gpu reset.
+		 *
+		 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+		 *
+		 * FIXME: Wa for more modern gens needs to be validated
+		 */
+		i915_stop_engines(dev_priv, engine_mask);
+
 		ret = reset(dev_priv, engine_mask);
 		if (ret != -ETIMEDOUT)
 			break;
-- 
2.11.0

* ✓ Fi.CI.BAT: success for drm/i915: Stop engines before reset (rev2)
  2017-09-19 14:41 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
@ 2017-09-19 19:14 ` Patchwork
  2017-09-19 22:40 ` ✓ Fi.CI.IGT: " Patchwork
  2017-09-20 11:24 ` [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2017-09-19 19:14 UTC
  To: Mika Kuoppala; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Stop engines before reset (rev2)
URL   : https://patchwork.freedesktop.org/series/30357/
State : success

== Summary ==

Series 30357v2 drm/i915: Stop engines before reset
https://patchwork.freedesktop.org/api/1.0/series/30357/revisions/2/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                pass       -> FAIL       (fi-ivb-3770) fdo#100215
Test kms_frontbuffer_tracking:
        Subgroup basic:
                dmesg-warn -> PASS       (fi-kbl-7500u)
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                incomplete -> DMESG-WARN (fi-cfl-s) fdo#102294
        Subgroup suspend-read-crc-pipe-a:
                incomplete -> PASS       (fi-kbl-7500u)

fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#102294 https://bugs.freedesktop.org/show_bug.cgi?id=102294

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:437s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:466s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:419s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:505s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:278s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:503s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:503s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:491s
fi-cfl-s         total:289  pass:223  dwarn:34  dfail:0   fail:0   skip:32  time:547s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:417s
fi-glk-1         total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:570s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:421s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:407s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:432s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:487s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:1   skip:28  time:458s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:480s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:574s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:586s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:543s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:449s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:750s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:491s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:473s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:567s
fi-snb-2600      total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:415s

bf6ecf6d25c1c45e576643b7d7a65e8b1e6b4f01 drm-tip: 2017y-09m-19d-17h-23m-04s UTC integration manifest
b6d41b3cd871 drm/i915: Stop engines before reset

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5749/

* ✓ Fi.CI.IGT: success for drm/i915: Stop engines before reset (rev2)
  2017-09-19 14:41 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
  2017-09-19 19:14 ` ✓ Fi.CI.BAT: success for drm/i915: Stop engines before reset (rev2) Patchwork
@ 2017-09-19 22:40 ` Patchwork
  2017-09-20 11:24 ` [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2017-09-19 22:40 UTC
  To: Mika Kuoppala; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Stop engines before reset (rev2)
URL   : https://patchwork.freedesktop.org/series/30357/
State : success

== Summary ==

Test perf:
        Subgroup blocking:
                pass       -> FAIL       (shard-hsw) fdo#102252 +1
Test kms_flip:
        Subgroup rcs-wf_vblank-vs-dpms:
                dmesg-warn -> PASS       (shard-hsw)

fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-hsw        total:2317 pass:1245 dwarn:3   dfail:0   fail:12  skip:1057 time:9589s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5749/shards.html

* Re: [PATCH] drm/i915: Stop engines before reset
  2017-09-19 14:41 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
  2017-09-19 19:14 ` ✓ Fi.CI.BAT: success for drm/i915: Stop engines before reset (rev2) Patchwork
  2017-09-19 22:40 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-09-20 11:24 ` Mika Kuoppala
  2 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-20 11:24 UTC
  To: intel-gfx

Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> On kbl, evidence indicates that even when the hardware happily
> tells us to proceed with the reset, it really isn't ready.
> Resetting a freely running batchbuffer after we have the ack for
> readiness can still cause a system hang.
>
> We have had similar experiences on older gens. So now
> attempt to stop engines before proceeding with the reset, on all
> gens where we have a gpu reset. This has been shown to improve reset
> reliability and reduce the risk of losing the machine.
>
> v2: Add fixme for wa (Joonas)
>
> Testcase: igt/prime_busy/hang-* # kbl
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Pushed, thanks for acks.
-Mika