* [PATCH] drm/i915: Stop engines before reset
@ 2017-09-14 11:10 Mika Kuoppala
2017-09-14 11:29 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-09-14 12:28 ` ✗ Fi.CI.IGT: failure " Patchwork
0 siblings, 2 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-14 11:10 UTC (permalink / raw)
To: intel-gfx
On kbl evidence indicates that even if the hardware happily
tells us to proceed with reset, it really isn't ready.
Resetting a freely running batchbuffer after we have ack for readiness,
still can cause a system hang.
We also have similar experiences on older gens. So now
attempt to stop engines before proceeding for reset, on all
gens where we have a gpu reset. This has shown to improve reset
reliability and reduce the risk of losing the machine.
Testcase: igt/prime_busy/hang-* # kbl
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_uncore.c | 72 ++++++++++++++++++++-----------------
1 file changed, 40 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 97525de2cee4..473325705b92 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1354,33 +1354,38 @@ int i915_reg_read_ioctl(struct drm_device *dev,
return ret;
}
-static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+static void gen3_stop_engine(struct intel_engine_cs *engine)
+{
+ struct drm_i915_private *dev_priv = engine->i915;
+ const u32 base = engine->mmio_base;
+ const i915_reg_t mode = RING_MI_MODE(base);
+
+ I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+ if (intel_wait_for_register_fw(dev_priv,
+ mode,
+ MODE_IDLE,
+ MODE_IDLE,
+ 500))
+ DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+ engine->name);
+
+ I915_WRITE_FW(RING_CTL(base), 0);
+ I915_WRITE_FW(RING_HEAD(base), 0);
+ I915_WRITE_FW(RING_TAIL(base), 0);
+
+ /* Check acts as a post */
+ if (I915_READ_FW(RING_HEAD(base)) != 0)
+ DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+ engine->name);
+}
+
+static void i915_stop_engines(struct drm_i915_private *dev_priv, unsigned engine_mask)
{
struct intel_engine_cs *engine;
enum intel_engine_id id;
- for_each_engine(engine, dev_priv, id) {
- const u32 base = engine->mmio_base;
- const i915_reg_t mode = RING_MI_MODE(base);
-
- I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
- if (intel_wait_for_register_fw(dev_priv,
- mode,
- MODE_IDLE,
- MODE_IDLE,
- 500))
- DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
- engine->name);
-
- I915_WRITE_FW(RING_CTL(base), 0);
- I915_WRITE_FW(RING_HEAD(base), 0);
- I915_WRITE_FW(RING_TAIL(base), 0);
-
- /* Check acts as a post */
- if (I915_READ_FW(RING_HEAD(base)) != 0)
- DRM_DEBUG_DRIVER("%s: ring head not parked\n",
- engine->name);
- }
+ for_each_engine_masked(engine, dev_priv, engine_mask, id)
+ gen3_stop_engine(engine);
}
static bool i915_reset_complete(struct pci_dev *pdev)
@@ -1415,9 +1420,6 @@ static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
{
struct pci_dev *pdev = dev_priv->drm.pdev;
- /* Stop engines before we reset; see g4x_do_reset() below for why. */
- gen3_stop_rings(dev_priv);
-
pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
return wait_for(g4x_reset_complete(pdev), 500);
}
@@ -1432,12 +1434,6 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
POSTING_READ(VDECCLK_GATE_D);
- /* We stop engines, otherwise we might get failed reset and a
- * dead gpu (on elk).
- * WaMediaResetMainRingCleanup:ctg,elk (presumably)
- */
- gen3_stop_rings(dev_priv);
-
pci_write_config_byte(pdev, I915_GDRST,
GRDOM_MEDIA | GRDOM_RESET_ENABLE);
ret = wait_for(g4x_reset_complete(pdev), 500);
@@ -1742,6 +1738,18 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
*/
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
for (retry = 0; retry < 3; retry++) {
+
+ /* We stop engines, otherwise we might get failed reset and a
+ * dead gpu (on elk). Also as modern gpu as kbl can suffer
+ * from system hang if batchbuffer is progressing when
+ * the reset is issued, regardless of READY_TO_RESET ack.
+ * Thus assume it is best to stop engines on all gens
+ * where we have a gpu reset.
+ *
+ * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+ */
+ i915_stop_engines(dev_priv, engine_mask);
+
ret = reset(dev_priv, engine_mask);
if (ret != -ETIMEDOUT)
break;
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 5+ messages in thread
* ✓ Fi.CI.BAT: success for drm/i915: Stop engines before reset
2017-09-14 11:10 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
@ 2017-09-14 11:29 ` Patchwork
2017-09-14 12:28 ` ✗ Fi.CI.IGT: failure " Patchwork
1 sibling, 0 replies; 5+ messages in thread
From: Patchwork @ 2017-09-14 11:29 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Stop engines before reset
URL : https://patchwork.freedesktop.org/series/30357/
State : success
== Summary ==
Series 30357v1 drm/i915: Stop engines before reset
https://patchwork.freedesktop.org/api/1.0/series/30357/revisions/1/mbox/
Test chamelium:
Subgroup dp-crc-fast:
fail -> PASS (fi-kbl-7500u) fdo#102514
Test kms_cursor_legacy:
Subgroup basic-busy-flip-before-cursor-atomic:
fail -> PASS (fi-snb-2600) fdo#100215 +1
Test kms_flip:
Subgroup basic-flip-vs-modeset:
pass -> SKIP (fi-skl-x1585l) fdo#101781
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:444s
fi-bdw-gvtdvm total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:450s
fi-blb-e6850 total:289 pass:224 dwarn:1 dfail:0 fail:0 skip:64 time:377s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:527s
fi-bwr-2160 total:289 pass:184 dwarn:0 dfail:0 fail:0 skip:105 time:271s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:507s
fi-byt-j1900 total:289 pass:254 dwarn:1 dfail:0 fail:0 skip:34 time:500s
fi-byt-n2820 total:289 pass:250 dwarn:1 dfail:0 fail:0 skip:38 time:502s
fi-cfl-s total:289 pass:223 dwarn:34 dfail:0 fail:0 skip:32 time:550s
fi-elk-e7500 total:289 pass:230 dwarn:0 dfail:0 fail:0 skip:59 time:454s
fi-glk-2a total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:591s
fi-hsw-4770 total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:429s
fi-hsw-4770r total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:408s
fi-ilk-650 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:440s
fi-ivb-3520m total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:489s
fi-ivb-3770 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:462s
fi-kbl-7500u total:289 pass:264 dwarn:1 dfail:0 fail:0 skip:24 time:491s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:576s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:582s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:472s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:514s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:497s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:458s
fi-skl-x1585l total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:476s
fi-snb-2520m total:289 pass:251 dwarn:0 dfail:0 fail:0 skip:38 time:568s
fi-snb-2600 total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:424s
fi-pnv-d510 failed to connect after reboot
b21142588a9a16f74988f6b524953d4382be4a81 drm-tip: 2017y-09m-14d-08h-23m-57s UTC integration manifest
e65858da5746 drm/i915: Stop engines before reset
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5697/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 5+ messages in thread
* ✗ Fi.CI.IGT: failure for drm/i915: Stop engines before reset
2017-09-14 11:10 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
2017-09-14 11:29 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-09-14 12:28 ` Patchwork
1 sibling, 0 replies; 5+ messages in thread
From: Patchwork @ 2017-09-14 12:28 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Stop engines before reset
URL : https://patchwork.freedesktop.org/series/30357/
State : failure
== Summary ==
Test kms_flip:
Subgroup dpms-vs-vblank-race:
pass -> FAIL (shard-hsw)
Test kms_setmode:
Subgroup basic:
pass -> FAIL (shard-hsw) fdo#99912
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
shard-hsw total:2313 pass:1243 dwarn:0 dfail:0 fail:15 skip:1055 time:9360s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5697/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] drm/i915: Stop engines before reset
@ 2017-09-19 14:41 Mika Kuoppala
2017-09-20 11:24 ` Mika Kuoppala
0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-19 14:41 UTC (permalink / raw)
To: intel-gfx
On kbl evidence indicates that even if the hardware happily
tells us to proceed with reset, it really isn't ready.
Resetting a freely running batchbuffer after we have ack for readiness,
still can cause a system hang.
We also have similar experiences on older gens. So now
attempt to stop engines before proceeding for reset, on all
gens where we have a gpu reset. This has shown to improve reset
reliability and reduce the risk of losing the machine.
v2: Add fixme for wa (Joonas)
Testcase: igt/prime_busy/hang-* # kbl
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
drivers/gpu/drm/i915/intel_uncore.c | 75 +++++++++++++++++++++----------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 97525de2cee4..fdd7f93acb4f 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1354,33 +1354,39 @@ int i915_reg_read_ioctl(struct drm_device *dev,
return ret;
}
-static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+static void gen3_stop_engine(struct intel_engine_cs *engine)
+{
+ struct drm_i915_private *dev_priv = engine->i915;
+ const u32 base = engine->mmio_base;
+ const i915_reg_t mode = RING_MI_MODE(base);
+
+ I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+ if (intel_wait_for_register_fw(dev_priv,
+ mode,
+ MODE_IDLE,
+ MODE_IDLE,
+ 500))
+ DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+ engine->name);
+
+ I915_WRITE_FW(RING_CTL(base), 0);
+ I915_WRITE_FW(RING_HEAD(base), 0);
+ I915_WRITE_FW(RING_TAIL(base), 0);
+
+ /* Check acts as a post */
+ if (I915_READ_FW(RING_HEAD(base)) != 0)
+ DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+ engine->name);
+}
+
+static void i915_stop_engines(struct drm_i915_private *dev_priv,
+ unsigned engine_mask)
{
struct intel_engine_cs *engine;
enum intel_engine_id id;
- for_each_engine(engine, dev_priv, id) {
- const u32 base = engine->mmio_base;
- const i915_reg_t mode = RING_MI_MODE(base);
-
- I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
- if (intel_wait_for_register_fw(dev_priv,
- mode,
- MODE_IDLE,
- MODE_IDLE,
- 500))
- DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
- engine->name);
-
- I915_WRITE_FW(RING_CTL(base), 0);
- I915_WRITE_FW(RING_HEAD(base), 0);
- I915_WRITE_FW(RING_TAIL(base), 0);
-
- /* Check acts as a post */
- if (I915_READ_FW(RING_HEAD(base)) != 0)
- DRM_DEBUG_DRIVER("%s: ring head not parked\n",
- engine->name);
- }
+ for_each_engine_masked(engine, dev_priv, engine_mask, id)
+ gen3_stop_engine(engine);
}
static bool i915_reset_complete(struct pci_dev *pdev)
@@ -1415,9 +1421,6 @@ static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
{
struct pci_dev *pdev = dev_priv->drm.pdev;
- /* Stop engines before we reset; see g4x_do_reset() below for why. */
- gen3_stop_rings(dev_priv);
-
pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
return wait_for(g4x_reset_complete(pdev), 500);
}
@@ -1432,12 +1435,6 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
POSTING_READ(VDECCLK_GATE_D);
- /* We stop engines, otherwise we might get failed reset and a
- * dead gpu (on elk).
- * WaMediaResetMainRingCleanup:ctg,elk (presumably)
- */
- gen3_stop_rings(dev_priv);
-
pci_write_config_byte(pdev, I915_GDRST,
GRDOM_MEDIA | GRDOM_RESET_ENABLE);
ret = wait_for(g4x_reset_complete(pdev), 500);
@@ -1742,6 +1739,20 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
*/
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
for (retry = 0; retry < 3; retry++) {
+
+ /* We stop engines, otherwise we might get failed reset and a
+ * dead gpu (on elk). Also as modern gpu as kbl can suffer
+ * from system hang if batchbuffer is progressing when
+ * the reset is issued, regardless of READY_TO_RESET ack.
+ * Thus assume it is best to stop engines on all gens
+ * where we have a gpu reset.
+ *
+ * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+ *
+ * FIXME: Wa for more modern gens needs to be validated
+ */
+ i915_stop_engines(dev_priv, engine_mask);
+
ret = reset(dev_priv, engine_mask);
if (ret != -ETIMEDOUT)
break;
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915: Stop engines before reset
2017-09-19 14:41 [PATCH] " Mika Kuoppala
@ 2017-09-20 11:24 ` Mika Kuoppala
0 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-09-20 11:24 UTC (permalink / raw)
To: intel-gfx
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:
> On kbl evidence indicates that even if the hardware happily
> tells us to proceed with reset, it really isn't ready.
> Resetting a freely running batchbuffer after we have ack for readiness,
> still can cause a system hang.
>
> We also have similar experiences on older gens. So now
> attempt to stop engines before proceeding for reset, on all
> gens where we have a gpu reset. This has shown to improve reset
> reliability and reduce the risk of losing the machine.
>
> v2: Add fixme for wa (Joonas)
>
> Testcase: igt/prime_busy/hang-* # kbl
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Pushed, thanks for acks.
-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-09-20 11:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-14 11:10 [PATCH] drm/i915: Stop engines before reset Mika Kuoppala
2017-09-14 11:29 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-09-14 12:28 ` ✗ Fi.CI.IGT: failure " Patchwork
-- strict thread matches above, loose matches on Subject: below --
2017-09-19 14:41 [PATCH] " Mika Kuoppala
2017-09-20 11:24 ` Mika Kuoppala
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox