All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915/execlists: Reset CSB pointers by mmio as well
Date: Mon, 04 Nov 2019 16:27:21 +0200	[thread overview]
Message-ID: <87o8xrogxi.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20191104135307.21083-1-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
> leading to carry on updating the old tail of the buffer.
>
> <0>[  618.138490] i915_sel-5636    3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_reset: vecs0 flags=100
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673425523us : __intel_gt_reset: engine_mask=40
> <0>[  618.138490] i915_sel-5636    3.... 673425568us : execlists_reset: vecs0
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
> <0>[  618.138490] i915_sel-5636    3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
> <0>[  618.138490] i915_sel-5636    3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
> <0>[  618.138490] i915_sel-5636    3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
> <0>[  618.138490] i915_sel-5636    3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes
> <0>[  618.138490] i915_sel-5636    3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673435549us : execlists_reset_finish: vecs0: depth->0
> <0>[  618.138490] ksoftirq-21      2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
> <0>[  618.138490] ksoftirq-21      2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
> <0>[  618.138490] ksoftirq-21      2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
> <0>[  618.138490] ksoftirq-21      2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
>
> After the reset, we do another clflush before checking the CSB to be
> sure we see whatever was left in the CSB prior to the reset. So it is
> unlikely to be an incoherent view of the CSB, and more likely that
> Icelake didn't reset its pointers.
>
> References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8d79a965f341..b869a9c7b3c6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2996,6 +2996,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
>  	WRITE_ONCE(*execlists->csb_write, reset_value);
>  	wmb(); /* Make sure this is visible to HW (paranoia?) */
>  
> +	/*
> +	 * Sometimes Icelakes forgets to reset its pointers on a GPU reset.
> +	 * Bludgeon them with a mmio update to be sure.
> +	 */
> +	ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
> +		     reset_value << 8 | reset_value);
> +	ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> +

Big hammer indeed. Next one is that we assert the reset value
and check that first one after reset comes out with 0x0?

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  	invalidate_csb_entries(&execlists->csb_status[0],
>  			       &execlists->csb_status[reset_value]);
>  }
> -- 
> 2.24.0.rc2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

WARNING: multiple messages have this Message-ID (diff)
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/execlists: Reset CSB pointers by mmio as well
Date: Mon, 04 Nov 2019 16:27:21 +0200	[thread overview]
Message-ID: <87o8xrogxi.fsf@gaia.fi.intel.com> (raw)
Message-ID: <20191104142721.6J2dAmqVVUJ7_6IkE9ilFM2_vttouWJzEixqgt4BVIE@z> (raw)
In-Reply-To: <20191104135307.21083-1-chris@chris-wilson.co.uk>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
> leading to carry on updating the old tail of the buffer.
>
> <0>[  618.138490] i915_sel-5636    3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_reset: vecs0 flags=100
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
> <0>[  618.138490] i915_sel-5636    3.... 673425493us : intel_engine_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673425523us : __intel_gt_reset: engine_mask=40
> <0>[  618.138490] i915_sel-5636    3.... 673425568us : execlists_reset: vecs0
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
> <0>[  618.138490] i915_sel-5636    3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
> <0>[  618.138490] i915_sel-5636    3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
> <0>[  618.138490] i915_sel-5636    3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
> <0>[  618.138490] i915_sel-5636    3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
> <0>[  618.138490] i915_sel-5636    3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
> <0>[  618.138490] i915_sel-5636    3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
> <0>[  618.138490] i915_sel-5636    3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes
> <0>[  618.138490] i915_sel-5636    3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
> <0>[  618.138490] i915_sel-5636    3.... 673435549us : execlists_reset_finish: vecs0: depth->0
> <0>[  618.138490] ksoftirq-21      2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
> <0>[  618.138490] ksoftirq-21      2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
> <0>[  618.138490] ksoftirq-21      2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
> <0>[  618.138490] ksoftirq-21      2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
> <0>[  618.138490] ksoftirq-21      2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
>
> After the reset, we do another clflush before checking the CSB to be
> sure we see whatever was left in the CSB prior to the reset. So it is
> unlikely to be an incoherent view of the CSB, and more likely that
> Icelake didn't reset its pointers.
>
> References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8d79a965f341..b869a9c7b3c6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2996,6 +2996,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
>  	WRITE_ONCE(*execlists->csb_write, reset_value);
>  	wmb(); /* Make sure this is visible to HW (paranoia?) */
>  
> +	/*
> +	 * Sometimes Icelakes forgets to reset its pointers on a GPU reset.
> +	 * Bludgeon them with a mmio update to be sure.
> +	 */
> +	ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
> +		     reset_value << 8 | reset_value);
> +	ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> +

Big hammer indeed. Next one is that we assert the reset value
and check that first one after reset comes out with 0x0?

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  	invalidate_csb_entries(&execlists->csb_status[0],
>  			       &execlists->csb_status[reset_value]);
>  }
> -- 
> 2.24.0.rc2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-11-04 14:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-04 13:53 [PATCH] drm/i915/execlists: Reset CSB pointers by mmio as well Chris Wilson
2019-11-04 13:53 ` [Intel-gfx] " Chris Wilson
2019-11-04 14:27 ` Mika Kuoppala [this message]
2019-11-04 14:27   ` Mika Kuoppala
2019-11-04 14:33   ` Chris Wilson
2019-11-04 14:33     ` [Intel-gfx] " Chris Wilson
2019-11-04 15:07 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2019-11-04 15:07   ` [Intel-gfx] " Patchwork
2019-11-04 15:34 ` ✓ Fi.CI.BAT: success " Patchwork
2019-11-04 15:34   ` [Intel-gfx] " Patchwork
2019-11-05 11:32 ` ✓ Fi.CI.IGT: " Patchwork
2019-11-05 11:32   ` [Intel-gfx] " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8xrogxi.fsf@gaia.fi.intel.com \
    --to=mika.kuoppala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.