[PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking

Linux kernel -stable discussions
 help / color / mirror / Atom feed

* [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking
@ 2017-03-27  3:28 Chris Wilson
  2017-03-27 10:44 ` [Intel-gfx] " Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2017-03-27  3:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Mika Kuoppala, # v4 . 10+

If the request->wa_tail is 0 (because it landed exactly on the end of
the ringbuffer), when we reconstruct request->tail following a reset we
fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
never able to catch up with RING_TAIL and the GPU spins endlessly. If
the ring contains a couple of breadcrumbs, even our hangcheck is unable
to catch the busy-looping as the ACTHD and seqno continually advance.

Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3fdabba0a32d..c8dd848d2ebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 
 	/* Reset WaIdleLiteRestore:bdw,skl as well */
 	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
+	request->tail &= request->ring->size - 1;
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after        reset tweaking
  2017-03-27  3:28 [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking Chris Wilson
@ 2017-03-27 10:44 ` Mika Kuoppala
  2017-03-27 11:00   ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2017-03-27 10:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If the request->wa_tail is 0 (because it landed exactly on the end of
> the ringbuffer), when we reconstruct request->tail following a reset we
> fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> never able to catch up with RING_TAIL and the GPU spins endlessly. If
> the ring contains a couple of breadcrumbs, even our hangcheck is unable
> to catch the busy-looping as the ACTHD and seqno continually advance.

Tail is past ring size (on hw) and the ring contents has seqno writes.
So we will replay the ring contents over and over and seqno advances
and wraps back to the first breadcrumbs in ring?

> Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: <stable@vger.kernel.org> # v4.10+
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3fdabba0a32d..c8dd848d2ebe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>  
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
>  	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
> +	request->tail &= request->ring->size - 1;
>  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
>  }
>  
> -- 
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after  reset tweaking
  2017-03-27 10:44 ` [Intel-gfx] " Mika Kuoppala
@ 2017-03-27 11:00   ` Chris Wilson
  2017-03-27 11:07     ` Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2017-03-27 11:00 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If the request->wa_tail is 0 (because it landed exactly on the end of
> > the ringbuffer), when we reconstruct request->tail following a reset we
> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> > to catch the busy-looping as the ACTHD and seqno continually advance.
> 
> Tail is past ring size (on hw) and the ring contents has seqno writes.
> So we will replay the ring contents over and over and seqno advances
> and wraps back to the first breadcrumbs in ring?

Yup. It was most confusing to watch. The execlist_port[] was static,
RING_START was static, yet the seqno kept changing. I felt like I was
hallucinating. That or insomnia.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after        reset tweaking
  2017-03-27 11:00   ` Chris Wilson
@ 2017-03-27 11:07     ` Mika Kuoppala
  2017-03-27 11:19       ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2017-03-27 11:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, # v4 . 10+

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > If the request->wa_tail is 0 (because it landed exactly on the end of
>> > the ringbuffer), when we reconstruct request->tail following a reset we
>> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
>> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
>> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
>> > to catch the busy-looping as the ACTHD and seqno continually advance.
>> 
>> Tail is past ring size (on hw) and the ring contents has seqno writes.
>> So we will replay the ring contents over and over and seqno advances
>> and wraps back to the first breadcrumbs in ring?
>
> Yup. It was most confusing to watch. The execlist_port[] was static,
> RING_START was static, yet the seqno kept changing. I felt like I was
> hallucinating. That or insomnia.

/o\

When we reset_common_ring() it is always after a hw reset. So the
'last' in sense of hardware's lrc contexts doesn't mean much.

So can we actually get rid of the tail trickery as for first
request after reset, as the lite restore can't happen and
should not matter?

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after  reset tweaking
  2017-03-27 11:07     ` Mika Kuoppala
@ 2017-03-27 11:19       ` Chris Wilson
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2017-03-27 11:19 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, # v4 . 10+

On Mon, Mar 27, 2017 at 02:07:09PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > If the request->wa_tail is 0 (because it landed exactly on the end of
> >> > the ringbuffer), when we reconstruct request->tail following a reset we
> >> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> >> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> >> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> >> > to catch the busy-looping as the ACTHD and seqno continually advance.
> >> 
> >> Tail is past ring size (on hw) and the ring contents has seqno writes.
> >> So we will replay the ring contents over and over and seqno advances
> >> and wraps back to the first breadcrumbs in ring?
> >
> > Yup. It was most confusing to watch. The execlist_port[] was static,
> > RING_START was static, yet the seqno kept changing. I felt like I was
> > hallucinating. That or insomnia.
> 
> /o\
> 
> When we reset_common_ring() it is always after a hw reset. So the
> 'last' in sense of hardware's lrc contexts doesn't mean much.
> 
> So can we actually get rid of the tail trickery as for first
> request after reset, as the lite restore can't happen and
> should not matter?

So move handling the rare case into the latency sensitive hotpath? ;)

The complaint I feel is that we don't have a great interface, otoh this
manipulation is currently a one-off.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-03-27 11:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-27  3:28 [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking Chris Wilson
2017-03-27 10:44 ` [Intel-gfx] " Mika Kuoppala
2017-03-27 11:00   ` Chris Wilson
2017-03-27 11:07     ` Mika Kuoppala
2017-03-27 11:19       ` Chris Wilson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox