From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4] drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
Date: Fri, 18 May 2018 12:05:17 +0100
Message-ID: <6e157f27-fcc6-ebe0-e549-c2f064f0c340@linux.intel.com>
In-Reply-To: <20180518100933.2239-1-chris@chris-wilson.co.uk>
On 18/05/2018 11:09, Chris Wilson wrote:
> Inside the live_hangcheck (reset) selftests, we occasionally see
> failures like
>
> <7>[ 239.094840] i915_gem_set_wedged rcs0
> <7>[ 239.094843] i915_gem_set_wedged current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms]
> <7>[ 239.094846] i915_gem_set_wedged Reset count: 6239 (global 1)
> <7>[ 239.094848] i915_gem_set_wedged Requests:
> <7>[ 239.095052] i915_gem_set_wedged first 19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
> <7>[ 239.095056] i915_gem_set_wedged last 19a9a [e81:1a] prio=139 @ 5159ms: igt/rcs0[5977]/1
> <7>[ 239.095059] i915_gem_set_wedged active 19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
> <7>[ 239.095062] i915_gem_set_wedged [head 0220, postfix 0280, tail 02a8, batch 0xffffffff_ffffffff]
> <7>[ 239.100050] i915_gem_set_wedged ring->start: 0x00283000
> <7>[ 239.100053] i915_gem_set_wedged ring->head: 0x000001f8
> <7>[ 239.100055] i915_gem_set_wedged ring->tail: 0x000002a8
> <7>[ 239.100057] i915_gem_set_wedged ring->emit: 0x000002a8
> <7>[ 239.100059] i915_gem_set_wedged ring->space: 0x00000f10
> <7>[ 239.100085] i915_gem_set_wedged RING_START: 0x00283000
> <7>[ 239.100088] i915_gem_set_wedged RING_HEAD: 0x00000260
> <7>[ 239.100091] i915_gem_set_wedged RING_TAIL: 0x000002a8
> <7>[ 239.100094] i915_gem_set_wedged RING_CTL: 0x00000001
> <7>[ 239.100097] i915_gem_set_wedged RING_MODE: 0x00000300 [idle]
> <7>[ 239.100100] i915_gem_set_wedged RING_IMR: fffffefe
> <7>[ 239.100104] i915_gem_set_wedged ACTHD: 0x00000000_0000609c
> <7>[ 239.100108] i915_gem_set_wedged BBADDR: 0x00000000_0000609d
> <7>[ 239.100111] i915_gem_set_wedged DMA_FADDR: 0x00000000_00283260
> <7>[ 239.100114] i915_gem_set_wedged IPEIR: 0x00000000
> <7>[ 239.100117] i915_gem_set_wedged IPEHR: 0x02800000
> <7>[ 239.100120] i915_gem_set_wedged Execlist status: 0x00044052 00000002
> <7>[ 239.100124] i915_gem_set_wedged Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled)
> <7>[ 239.100128] i915_gem_set_wedged ELSP[0] count=1, ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
> <7>[ 239.100132] i915_gem_set_wedged ELSP[1] count=1, ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
> <7>[ 239.100135] i915_gem_set_wedged HW active? 0x5
> <7>[ 239.100250] i915_gem_set_wedged E 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
> <7>[ 239.100338] i915_gem_set_wedged E 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
> <7>[ 239.100340] i915_gem_set_wedged Queue priority: 139
> <7>[ 239.100343] i915_gem_set_wedged Q 0 [e98:19] prio=132 @ 5164ms: igt/rcs0[5977]/8
> <7>[ 239.100346] i915_gem_set_wedged Q 0 [e84:19] prio=121 @ 5165ms: igt/rcs0[5977]/2
> <7>[ 239.100349] i915_gem_set_wedged Q 0 [e87:19] prio=82 @ 5165ms: igt/rcs0[5977]/3
> <7>[ 239.100352] i915_gem_set_wedged Q 0 [e84:1a] prio=44 @ 5164ms: igt/rcs0[5977]/2
> <7>[ 239.100356] i915_gem_set_wedged Q 0 [e8b:19] prio=20 @ 5165ms: igt/rcs0[5977]/4
> <7>[ 239.100362] i915_gem_set_wedged drv_selftest [5894] waiting for 19a99
>
> where the GPU saw an arbitration point and idled; AND HAS NOT BEEN RESET!
> The RING_MODE indicates that it is idle and has the STOP_RING bit set, so
> try clearing it.
>
> v2: Only clear the bit on restarting the ring, as we want to be sure the
> STOP_RING bit is kept if reset fails on wedging.
> v3: Spot when the ring state doesn't make sense when re-initialising the
> engine and dump it to the logs so that we don't have to wait for an
> error later and try to guess what happened earlier.
> v4: Prepare to print all the unexpected state, not just the first.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3744f5750624..ba8411ba4abf 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1781,6 +1781,9 @@ static void enable_execlists(struct intel_engine_cs *engine)
>  	I915_WRITE(RING_MODE_GEN7(engine),
>  		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
>
> +	I915_WRITE(RING_MI_MODE(engine->mmio_base),
> +		   _MASKED_BIT_DISABLE(STOP_RING));
It worries me a bit to clear it unconditionally, since the documentation
says nothing (that I can find) about this scenario.
> +
>  	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
>  		   engine->status_page.ggtt_offset);
>  	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
> @@ -1789,6 +1792,19 @@ static void enable_execlists(struct intel_engine_cs *engine)
>  	engine->execlists.csb_head = -1;
>  }
>
> +static bool unexpected_starting_state(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *dev_priv = engine->i915;
> +	bool unexpected = false;
> +
> +	if (I915_READ(RING_MI_MODE(engine->mmio_base)) & STOP_RING) {
> +		DRM_DEBUG_DRIVER("STOP_RING still set in RING_MI_MODE\n");
It also worries me to bury this in a debug message, unless we can find
documentation saying that this is known to happen occasionally.
If we had the check and logging immediately after reset, with a more
visible log level, we could build a larger data set of reported failures
and then go back to the hardware people and ask them about it.
> +		unexpected = true;
> +	}
> +
> +	return unexpected;
> +}
> +
>  static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  {
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -1801,6 +1817,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  	intel_engine_reset_breadcrumbs(engine);
>  	intel_engine_init_hangcheck(engine);
>
> +	if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
> +		struct drm_printer p = drm_debug_printer(__func__);
> +
> +		intel_engine_dump(engine, &p, NULL);
> +	}
> +
>  	enable_execlists(engine);
>
>  	/* After a GPU reset, we may have requests to replay */
>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx