All of lore.kernel.org
 help / color / mirror / Atom feed
* [CI PATCH] drm/i915/g4x: Improve gpu reset reliability
@ 2017-05-19  9:13 Mika Kuoppala
  2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
  0 siblings, 1 reply; 3+ messages in thread
From: Mika Kuoppala @ 2017-05-19  9:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tomi Sarvela

ELK seems to very picky about the preconditions to reset.
Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
not like if reset occurs when there is active ring.

Ville found out that there is workaround with name
'WaMediaResetMainRingCleanup' which suggests that we need to
cleanup rings before resetting. It is unclear what cleanup
exactly means but evidence shows that stopping the ring
does have an effect on reset reliability. This patch makes
reset succesful on hangs induced by chained batches (the igt ones).
Note that if the hang is inside a shader, it is possible
that our attempts to stop the ring achieves anything.

v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)
v3: specify platform on testcases, comment tidyup (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
Testcase: igt/gem_busy/*-hang #elk
Testcase: igt/gem_ringfill/hang-* #elk
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 7922264e4fe1..df425bf629e3 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 	return ret;
 }
 
+static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dev_priv, id) {
+		const u32 base = engine->mmio_base;
+		const i915_reg_t mode = RING_MI_MODE(base);
+
+		I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+		if (intel_wait_for_register_fw(dev_priv,
+					       mode,
+					       MODE_IDLE,
+					       MODE_IDLE,
+					       500))
+			DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+					 engine->name);
+
+		I915_WRITE_FW(RING_CTL(base), 0);
+		I915_WRITE_FW(RING_HEAD(base), 0);
+		I915_WRITE_FW(RING_TAIL(base), 0);
+
+		/* Check acts as a post */
+		if (I915_READ_FW(RING_HEAD(base)) != 0)
+			DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+					 engine->name);
+	}
+}
+
 static bool i915_reset_complete(struct pci_dev *pdev)
 {
 	u8 gdrst;
@@ -1473,6 +1502,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
 	POSTING_READ(VDECCLK_GATE_D);
 
+	/* We stop engines, otherwise we might get failed reset and a
+	 * dead gpu (on elk).
+	 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
+	 */
+	gen3_stop_rings(dev_priv);
+
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
 	ret =  wait_for(g4x_reset_complete(pdev), 500);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915/g4x: Improve gpu reset reliability
  2017-05-19  9:13 [CI PATCH] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala
@ 2017-05-19 10:16 ` Patchwork
  2017-05-19 10:53   ` Mika Kuoppala
  0 siblings, 1 reply; 3+ messages in thread
From: Patchwork @ 2017-05-19 10:16 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/g4x: Improve gpu reset reliability
URL   : https://patchwork.freedesktop.org/series/24680/
State : failure

== Summary ==

Series 24680v1 drm/i915/g4x: Improve gpu reset reliability
https://patchwork.freedesktop.org/api/1.0/series/24680/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                fail       -> PASS       (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
        Subgroup basic-s3:
                pass       -> INCOMPLETE (fi-skl-6260u)
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-snb-2600) fdo#100125

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:442s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:432s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:587s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:511s
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:494s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:485s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:418s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:413s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:421s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:493s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
fi-kbl-7500u     total:278  pass:255  dwarn:5   dfail:0   fail:0   skip:18  time:464s
fi-skl-6260u     total:108  pass:104  dwarn:0   dfail:0   fail:0   skip:3  
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:578s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:468s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:499s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:438s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:535s
fi-snb-2600      total:278  pass:248  dwarn:1   dfail:0   fail:0   skip:29  time:411s

44200c88d542ceeea19b85832f554a1045aa2512 drm-tip: 2017y-05m-19d-08h-50m-52s UTC integration manifest
c437c0c drm/i915/g4x: Improve gpu reset reliability

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4753/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for drm/i915/g4x: Improve gpu reset reliability
  2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2017-05-19 10:53   ` Mika Kuoppala
  0 siblings, 0 replies; 3+ messages in thread
From: Mika Kuoppala @ 2017-05-19 10:53 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Patchwork <patchwork@emeril.freedesktop.org> writes:

> == Series Details ==
>
> Series: drm/i915/g4x: Improve gpu reset reliability
> URL   : https://patchwork.freedesktop.org/series/24680/
> State : failure
>
> == Summary ==
>
> Series 24680v1 drm/i915/g4x: Improve gpu reset reliability
> https://patchwork.freedesktop.org/api/1.0/series/24680/revisions/1/mbox/
>
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>         Subgroup basic-s3:
>                 pass       -> INCOMPLETE (fi-skl-6260u)

https://bugs.freedesktop.org/show_bug.cgi?id=100129

The patch in question doesn't touch skl codepaths
so it is in CI_DRM_2635.
-Mika

>         Subgroup basic-s4-devices:
>                 pass       -> DMESG-WARN (fi-snb-2600) fdo#100125
>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
>
> fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:442s
> fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:432s
> fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:587s
> fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:511s
> fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:494s
> fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:485s
> fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:418s
> fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:413s
> fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:421s
> fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:493s
> fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
> fi-kbl-7500u     total:278  pass:255  dwarn:5   dfail:0   fail:0   skip:18  time:464s
> fi-skl-6260u     total:108  pass:104  dwarn:0   dfail:0   fail:0   skip:3  
> fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time:578s
> fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:468s
> fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:499s
> fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:438s
> fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:535s
> fi-snb-2600      total:278  pass:248  dwarn:1   dfail:0   fail:0   skip:29  time:411s
>
> 44200c88d542ceeea19b85832f554a1045aa2512 drm-tip: 2017y-05m-19d-08h-50m-52s UTC integration manifest
> c437c0c drm/i915/g4x: Improve gpu reset reliability
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4753/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-05-19 10:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-19  9:13 [CI PATCH] drm/i915/g4x: Improve gpu reset reliability Mika Kuoppala
2017-05-19 10:16 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-05-19 10:53   ` Mika Kuoppala

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.