* [CI PATCH] drm/i915/g4x: Improve gpu reset reliability
@ 2017-05-19 9:13 Mika Kuoppala
2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
0 siblings, 1 reply; 3+ messages in thread
From: Mika Kuoppala @ 2017-05-19 9:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Tomi Sarvela
ELK seems to very picky about the preconditions to reset.
Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
not like if reset occurs when there is active ring.
Ville found out that there is workaround with name
'WaMediaResetMainRingCleanup' which suggests that we need to
cleanup rings before resetting. It is unclear what cleanup
exactly means but evidence shows that stopping the ring
does have an effect on reset reliability. This patch makes
reset succesful on hangs induced by chained batches (the igt ones).
Note that if the hang is inside a shader, it is possible
that our attempts to stop the ring achieves anything.
v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)
v3: specify platform on testcases, comment tidyup (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
Testcase: igt/gem_busy/*-hang #elk
Testcase: igt/gem_ringfill/hang-* #elk
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 7922264e4fe1..df425bf629e3 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev,
return ret;
}
+static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+{
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+
+ for_each_engine(engine, dev_priv, id) {
+ const u32 base = engine->mmio_base;
+ const i915_reg_t mode = RING_MI_MODE(base);
+
+ I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+ if (intel_wait_for_register_fw(dev_priv,
+ mode,
+ MODE_IDLE,
+ MODE_IDLE,
+ 500))
+ DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+ engine->name);
+
+ I915_WRITE_FW(RING_CTL(base), 0);
+ I915_WRITE_FW(RING_HEAD(base), 0);
+ I915_WRITE_FW(RING_TAIL(base), 0);
+
+ /* Check acts as a post */
+ if (I915_READ_FW(RING_HEAD(base)) != 0)
+ DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+ engine->name);
+ }
+}
+
static bool i915_reset_complete(struct pci_dev *pdev)
{
u8 gdrst;
@@ -1473,6 +1502,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
POSTING_READ(VDECCLK_GATE_D);
+ /* We stop engines, otherwise we might get failed reset and a
+ * dead gpu (on elk).
+ * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+ */
+ gen3_stop_rings(dev_priv);
+
pci_write_config_byte(pdev, I915_GDRST,
GRDOM_MEDIA | GRDOM_RESET_ENABLE);
ret = wait_for(g4x_reset_complete(pdev), 500);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 3+ messages in thread* ✗ Fi.CI.BAT: failure for drm/i915/g4x: Improve gpu reset reliability
2017-05-19 9:13 [CI PATCH] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala
@ 2017-05-19 10:16 ` Patchwork
2017-05-19 10:53 ` Mika Kuoppala
0 siblings, 1 reply; 3+ messages in thread
From: Patchwork @ 2017-05-19 10:16 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/g4x: Improve gpu reset reliability
URL : https://patchwork.freedesktop.org/series/24680/
State : failure
== Summary ==
Series 24680v1 drm/i915/g4x: Improve gpu reset reliability
https://patchwork.freedesktop.org/api/1.0/series/24680/revisions/1/mbox/
Test gem_exec_flush:
Subgroup basic-batch-kernel-default-uc:
fail -> PASS (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
Subgroup basic-s3:
pass -> INCOMPLETE (fi-skl-6260u)
Subgroup basic-s4-devices:
pass -> DMESG-WARN (fi-snb-2600) fdo#100125
fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time:442s
fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time:432s
fi-bsw-n3050 total:278 pass:242 dwarn:0 dfail:0 fail:0 skip:36 time:587s
fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time:511s
fi-byt-j1900 total:278 pass:254 dwarn:0 dfail:0 fail:0 skip:24 time:494s
fi-byt-n2820 total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:485s
fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:418s
fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:413s
fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time:421s
fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:493s
fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:463s
fi-kbl-7500u total:278 pass:255 dwarn:5 dfail:0 fail:0 skip:18 time:464s
fi-skl-6260u total:108 pass:104 dwarn:0 dfail:0 fail:0 skip:3
fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time:578s
fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time:468s
fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:499s
fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time:438s
fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:535s
fi-snb-2600 total:278 pass:248 dwarn:1 dfail:0 fail:0 skip:29 time:411s
44200c88d542ceeea19b85832f554a1045aa2512 drm-tip: 2017y-05m-19d-08h-50m-52s UTC integration manifest
c437c0c drm/i915/g4x: Improve gpu reset reliability
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4753/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: ✗ Fi.CI.BAT: failure for drm/i915/g4x: Improve gpu reset reliability
2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2017-05-19 10:53 ` Mika Kuoppala
0 siblings, 0 replies; 3+ messages in thread
From: Mika Kuoppala @ 2017-05-19 10:53 UTC (permalink / raw)
To: Patchwork; +Cc: intel-gfx
Patchwork <patchwork@emeril.freedesktop.org> writes:
> == Series Details ==
>
> Series: drm/i915/g4x: Improve gpu reset reliability
> URL : https://patchwork.freedesktop.org/series/24680/
> State : failure
>
> == Summary ==
>
> Series 24680v1 drm/i915/g4x: Improve gpu reset reliability
> https://patchwork.freedesktop.org/api/1.0/series/24680/revisions/1/mbox/
>
> Test gem_exec_flush:
> Subgroup basic-batch-kernel-default-uc:
> fail -> PASS (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
> Subgroup basic-s3:
> pass -> INCOMPLETE (fi-skl-6260u)
https://bugs.freedesktop.org/show_bug.cgi?id=100129
The patch in question doesn't touch skl codepaths
so it is in CI_DRM_2635.
-Mika
> Subgroup basic-s4-devices:
> pass -> DMESG-WARN (fi-snb-2600) fdo#100125
>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
>
> fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time:442s
> fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time:432s
> fi-bsw-n3050 total:278 pass:242 dwarn:0 dfail:0 fail:0 skip:36 time:587s
> fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time:511s
> fi-byt-j1900 total:278 pass:254 dwarn:0 dfail:0 fail:0 skip:24 time:494s
> fi-byt-n2820 total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:485s
> fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:418s
> fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:413s
> fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time:421s
> fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:493s
> fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:463s
> fi-kbl-7500u total:278 pass:255 dwarn:5 dfail:0 fail:0 skip:18 time:464s
> fi-skl-6260u total:108 pass:104 dwarn:0 dfail:0 fail:0 skip:3
> fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time:578s
> fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time:468s
> fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:499s
> fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time:438s
> fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:535s
> fi-snb-2600 total:278 pass:248 dwarn:1 dfail:0 fail:0 skip:29 time:411s
>
> 44200c88d542ceeea19b85832f554a1045aa2512 drm-tip: 2017y-05m-19d-08h-50m-52s UTC integration manifest
> c437c0c drm/i915/g4x: Improve gpu reset reliability
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4753/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-05-19 10:53 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-19 9:13 [CI PATCH] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala
2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-05-19 10:53 ` Mika Kuoppala
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.